Pfam
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.
Features
For each family in Pfam one can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
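A domain architecture is the order in which domains occur along a protein chain. As a minimal sketch, assuming domain matches are already available as records of (family, start, end) on one protein, the architecture can be read off by sorting the matches by position. The `DomainHit` record and the family names are hypothetical, not Pfam's own data format.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One domain match on a protein (hypothetical record layout)."""
    family: str  # family identifier, e.g. a Pfam family name or accession
    start: int   # 1-based start position on the protein
    end: int     # 1-based end position (inclusive)


def domain_architecture(hits: list[DomainHit]) -> list[str]:
    """Return family identifiers in N- to C-terminal order."""
    return [h.family for h in sorted(hits, key=lambda h: (h.start, h.end))]


hits = [
    DomainHit("FamilyB", 120, 210),
    DomainHit("FamilyA", 5, 90),
    DomainHit("FamilyB", 230, 320),
]
print(" - ".join(domain_architecture(hits)))  # FamilyA - FamilyB - FamilyB
```

A repeated family shows up once per occurrence, so two copies of the same domain produce two entries in the architecture.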
The descriptions of Pfam families are managed by the general public using Wikipedia.
Seventy-four percent of protein sequences have at least one match to Pfam; this figure is called the sequence coverage.
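Sequence coverage counts each sequence once, however many families it matches. A minimal sketch, assuming the search results are available as hypothetical (sequence ID, family) pairs:

```python
def sequence_coverage(sequence_ids, sequences_with_match) -> float:
    """Fraction of sequences that have at least one family match."""
    ids = set(sequence_ids)
    if not ids:
        raise ValueError("no sequences given")
    # A sequence with several matches is still counted only once.
    matched = ids & set(sequences_with_match)
    return len(matched) / len(ids)


all_ids = ["P1", "P2", "P3", "P4"]
hits = [("P1", "FamilyA"), ("P1", "FamilyB"), ("P3", "FamilyA")]  # hypothetical
matched_ids = {seq_id for seq_id, _family in hits}
print(f"{sequence_coverage(all_ids, matched_ids):.0%}")  # 50%
```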
The Pfam database contains information about protein domains and families. Pfam-A is the manually curated portion of the database and contains over 10,000 entries. For each entry, a multiple sequence alignment and a hidden Markov model are stored. These hidden Markov models can be used to search sequence databases with the HMMER package, written by Sean Eddy. Because the entries in Pfam-A do not cover all known proteins, an automatically generated supplement called Pfam-B is provided. Pfam-B contains a large number of small families derived from clusters produced by an algorithm called ADDA. Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.
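The profile HMMs used by HMMER have match, insert and delete states for every alignment column and carefully calibrated scores, so the following is only a toy illustration of the underlying idea: the forward algorithm sums the probability of a sequence over all hidden state paths, and the result is compared with a background (null) model as a log-odds score in bits. The two-state model, its two-letter alphabet and all probabilities are invented for illustration.

```python
import math


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_prob(seq, states, start_p, trans_p, emit_p) -> float:
    """Forward algorithm: natural-log P(seq) over all hidden paths (seq non-empty)."""
    # alpha[s] = log P(seq[0..i], hidden state at position i is s)
    alpha = {s: _log(start_p[s]) + _log(emit_p[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: logsumexp([alpha[r] + _log(trans_p[r][s]) for r in states])
            + _log(emit_p[s][symbol])
            for s in states
        }
    return logsumexp(list(alpha.values()))


# Toy "family" model (hypothetical numbers): a background state and a
# domain state that strongly prefers residue C over residue A.
STATES = ("bg", "dom")
START = {"bg": 0.9, "dom": 0.1}
TRANS = {"bg": {"bg": 0.9, "dom": 0.1}, "dom": {"bg": 0.2, "dom": 0.8}}
EMIT = {"bg": {"A": 0.5, "C": 0.5}, "dom": {"A": 0.1, "C": 0.9}}
# Null model: every residue drawn independently from background frequencies.
NULL_EMIT = {"A": 0.5, "C": 0.5}


def log_odds_bits(seq: str) -> float:
    """Score in bits: log2 P(seq | family model) - log2 P(seq | null model)."""
    model = forward_log_prob(seq, STATES, START, TRANS, EMIT)
    null = sum(_log(NULL_EMIT[c]) for c in seq)
    return (model - null) / math.log(2)


for query in ("CCCCCC", "AAAAAA"):
    print(query, round(log_odds_bits(query), 2))
```

A positive score means the family model explains the sequence better than the background does: here the C-rich query scores above zero and the A-rich query below zero. Working in log space avoids numerical underflow on long sequences.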
The iPfam database builds on the domain descriptions in Pfam. It investigates whether different proteins that appear together in the same structure in the Protein Data Bank (PDB) are close enough to potentially interact.
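The proximity question can be pictured as a minimum interatomic distance check. The sketch below assumes each protein (or domain) is given as a list of atom coordinates in ångströms and uses an illustrative 5 Å cutoff; neither the cutoff nor this brute-force comparison is taken from iPfam's actual method.

```python
import math
from itertools import product

Coord = tuple[float, float, float]  # (x, y, z) in ångströms


def min_distance(atoms_a: list[Coord], atoms_b: list[Coord]) -> float:
    """Smallest distance between any atom of A and any atom of B (brute force)."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def may_interact(atoms_a: list[Coord], atoms_b: list[Coord], cutoff: float = 5.0) -> bool:
    """True if the two atom sets come within `cutoff` ångströms of each other.

    The 5.0 default is an illustrative assumption, not iPfam's criterion.
    """
    return min_distance(atoms_a, atoms_b) <= cutoff


chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_c = [(20.0, 0.0, 0.0)]
print(may_interact(chain_a, chain_b))  # True  (closest pair is 3.0 Å apart)
print(may_interact(chain_a, chain_c))  # False (closest pair is 19.0 Å apart)
```

Comparing every atom pair costs time proportional to the product of the two atom counts; for full structures a spatial index such as a k-d tree would be the usual way to find close pairs faster.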
The current release of Pfam is "Pfam 25.0" (March 2011; 12,273 families).
See also
- TrEMBL: a database of automatically annotated protein sequences
- InterPro: integration of protein domain and protein family databases
External links
- Pfam - Protein family database at Sanger Institute, UK
- Pfam - Protein family database at Janelia Farm Research Campus, USA
- Pfam - Protein family database at Stockholm Bioinformatics Centre, Sweden
- iPfam - Interactions of Pfam domains in PDB