Pfam
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.

Features

For each family in Pfam one can:
  • Look at multiple alignments
  • View protein domain architectures
  • Examine species distribution
  • Follow links to other databases
  • View known protein structures

The descriptions of Pfam families are managed by the general public using Wikipedia.

74% of protein sequences have at least one match to Pfam. This number is called the sequence coverage.
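
As an illustration of the sequence coverage figure, the calculation can be sketched as the fraction of sequences that have at least one family match. The function name, the sequence identifiers and the family labels below are hypothetical toy values, not Pfam data, and this is only a minimal sketch of the arithmetic rather than the way Pfam itself computes the statistic.

```python
def sequence_coverage(matches_per_sequence):
    """Return the fraction of sequences with at least one family match.

    matches_per_sequence maps a sequence identifier to the list of
    families matched by that sequence (an empty list means no match).
    """
    total = len(matches_per_sequence)
    if total == 0:
        raise ValueError("need at least one sequence")
    covered = sum(1 for families in matches_per_sequence.values() if families)
    return covered / total


# Hypothetical toy data: sequence identifier -> matched family labels.
example = {
    "seq1": ["famA"],
    "seq2": [],
    "seq3": ["famB", "famC"],
    "seq4": ["famA"],
}

coverage = sequence_coverage(example)
print(f"{coverage:.0%}")  # three of the four toy sequences have a match
```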

The Pfam database contains information about protein domains and families. Pfam-A is the manually curated portion of the database and contains over 10,000 entries. For each entry, a protein sequence alignment and a hidden Markov model are stored. These hidden Markov models can be used to search sequence databases with the HMMER package written by Sean Eddy. Because the entries in Pfam-A do not cover all known proteins, an automatically generated supplement called Pfam-B is provided. Pfam-B contains a large number of small families derived from clusters produced by an algorithm called ADDA. Although of lower quality, Pfam-B families can be useful when no Pfam-A families are found.
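
The search step described above can be sketched by assembling a HMMER command line. The sketch assumes HMMER 3 and its hmmscan program, which compares query sequences against a library of profile HMMs and can write a per-domain table with the --domtblout option. The file names are hypothetical, and the HMM library is assumed to have been prepared beforehand as HMMER requires. The code only builds and prints the command; it does not run HMMER.

```python
import shlex


def hmmscan_command(hmm_db, query_fasta, table_out):
    """Build an hmmscan argument list (assumes HMMER 3 is installed).

    hmm_db      -- path to a profile HMM library (hypothetical file name)
    query_fasta -- path to the query protein sequences in FASTA format
    table_out   -- path for the per-domain tabular output
    """
    return ["hmmscan", "--domtblout", table_out, hmm_db, query_fasta]


# Hypothetical file names, for illustration only.
cmd = hmmscan_command("Pfam-A.hmm", "query.fasta", "hits.domtbl")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where HMMER is available.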

The iPfam database builds on the domain descriptions of Pfam. It investigates whether different proteins described together in the protein structure database PDB are close enough to potentially interact.
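
The idea of two structures being "close enough to potentially interact" can be illustrated with a simple distance test on atomic coordinates. The 4.0 ångström cutoff, the function names and the coordinates below are assumptions chosen for illustration; they are not taken from iPfam, whose actual criteria may differ.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def may_interact(atoms_a, atoms_b, cutoff=4.0):
    """True if the closest pair of atoms lies within the cutoff.

    The default cutoff (in angstroms) is a hypothetical value.
    """
    return min_distance(atoms_a, atoms_b) <= cutoff


# Hypothetical toy coordinates (x, y, z) in angstroms.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
domain_b = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
domain_c = [(20.0, 0.0, 0.0)]

print(may_interact(domain_a, domain_b))  # closest pair is 3.0 apart
print(may_interact(domain_a, domain_c))  # closest pair is 19.0 apart
```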

The current release of Pfam is "Pfam 25.0" (March 2011; 12,273 families).

See also

  • TrEMBL - Database performing automated protein sequence annotation
  • InterPro - Integration of protein domain and protein family databases

External links

  • Pfam - Protein family database at Sanger Institute, UK
  • Pfam - Protein family database at Janelia Farm Research Campus, USA
  • Pfam - Protein family database at Stockholm Bioinformatics Centre, Sweden
  • iPfam - Interactions of Pfam domains in PDB
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.