Protein Data Bank
The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids (see also crystallographic database). The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank (wwPDB).
The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies, such as the NIH in the USA, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations, while GO categorizes structures based on genes.
History
Two forces converged to initiate the PDB: (1) a small but growing collection of sets of protein structure data determined by X-ray diffraction, and (2) the newly available (1968) molecular graphics display, the Brookhaven Raster Display (BRAD), used to visualize these protein structures in 3-D. In 1969, with the sponsorship of Dr. Walter Hamilton at the Brookhaven National Laboratory, Dr. Edgar Meyer (Texas A&M University) began to write software to store atomic coordinate files in a common format, making them available for geometric and graphical evaluation. By 1971, one of Dr. Meyer's programs, SEARCH, enabled researchers to remotely access information from the database to study protein structures offline. SEARCH was instrumental in enabling networking, thus marking the functional beginning of the PDB.
Upon Hamilton's death in 1973, Dr. Tom Koetzle took over direction of the PDB for the subsequent 20 years. In January 1994, Dr. Joel Sussman of Israel's Weizmann Institute of Science was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June 1999. The new director was Dr. Helen M. Berman of Rutgers University (one of the member institutions of the RCSB).
In 2003, with the formation of the wwPDB, the PDB became an international organization. The founding members are PDBe (Europe), RCSB (USA), and PDBj (Japan). The BMRB joined in 2006. Each of the four members of the wwPDB can act as a deposition, data processing, and distribution center for PDB data. Data processing here means that wwPDB staff review and annotate each submitted entry. The data are then automatically checked for plausibility (the source code for this validation software has been made available to the public at no charge).
Contents
The PDB database is updated weekly (on Tuesdays), as is the PDB Holdings List. The breakdown of current holdings is as follows:
Experimental Method | Proteins | Nucleic Acids | Protein/Nucleic Acid complexes | Other | Total |
---|---|---|---|---|---|
X-ray diffraction | 62750 | 1323 | 3050 | 2 | 67125 |
NMR | 7962 | 960 | 179 | 7 | 9108 |
Electron microscopy | 262 | 22 | 96 | 0 | 380 |
Hybrid | 41 | 3 | 1 | 1 | 46 |
Other | 133 | 4 | 5 | 13 | 155 |
Total: | 71148 | 2312 | 3331 | 23 | 76814 |
- 56,523 structures in the PDB have a structure factor file.
- 6,410 structures have an NMR restraint file.
- 198 structures in the PDB have a chemical shifts file.
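The share of each experimental method can be recomputed directly from the "Total" column of the holdings table. The short sketch below does only that arithmetic; the dictionary and function names are illustrative and not part of any PDB software.

```python
# Totals per experimental method, copied from the "Total" column of the
# holdings table in this article.
HOLDINGS = {
    "X-ray diffraction": 67125,
    "NMR": 9108,
    "Electron microscopy": 380,
    "Hybrid": 46,
    "Other": 155,
}


def method_shares(holdings):
    """Return each method's share of all entries, as a percentage."""
    total = sum(holdings.values())
    return {method: 100.0 * count / total for method, count in holdings.items()}


shares = method_shares(HOLDINGS)
for method, pct in shares.items():
    print(f"{method:20s} {pct:6.2f}%")
```

With these figures, X-ray diffraction accounts for a little over 87% of entries and NMR for just under 12%.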
These data show that most structures are determined by X-ray diffraction, while about 12% (9,108 of 76,814) are determined by protein NMR. X-ray diffraction yields approximate coordinates for the atoms of the protein, whereas NMR experiments yield estimates of the distances between pairs of atoms. In the NMR case, the final conformation is therefore obtained by solving a distance geometry problem. A few structures are determined by cryo-electron microscopy.
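To give a feel for what "solving a distance geometry problem" means, the toy sketch below places three points in a plane so that their pairwise distances match given values. It is only an illustration of the idea: real NMR structure calculation works in three dimensions with many sparse and imprecise distance estimates, and the function name and the input distances here are made up for the example.

```python
import math


def embed_three_points(d01, d02, d12):
    """Place three points in the plane so that their pairwise distances
    match the inputs. The answer is unique only up to rotation,
    translation and reflection, so point 0 is fixed at the origin and
    point 1 on the positive x-axis."""
    if d01 <= 0:
        raise ValueError("d01 must be positive")
    p0 = (0.0, 0.0)
    p1 = (d01, 0.0)
    # The law of cosines gives the x-coordinate of the third point.
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2.0 * d01)
    y_squared = d02 ** 2 - x ** 2
    if y_squared < -1e-9:
        raise ValueError("distances violate the triangle inequality")
    y = math.sqrt(max(y_squared, 0.0))
    return [p0, p1, (x, y)]


def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical input distances (arbitrary units).
points = embed_three_points(3.0, 4.0, 5.0)
print(points)
print(distance(points[1], points[2]))
```

The coordinates are recovered from distances alone, which is why the result is defined only up to a rigid motion or mirror image.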
The significance of the structure factor files mentioned above is that, for PDB structures determined by X-ray diffraction that have a structure factor file, the electron density map may be viewed. The data for such structures are stored on the "electron density server", where the electron density maps can be viewed.
In the past the number of structures in the PDB has grown at an approximately exponential rate. However, since 2007 the rate of accumulation of new proteins appears to have plateaued, with 7263 proteins added in 2007, 7073 in 2008, 7448 in 2009, and 7971 in 2010.
File format
The file format initially used by the PDB was called the PDB file format. This original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996, the "macromolecular Crystallographic Information File" format, mmCIF, started to be phased in. An XML version of this format, called PDBML, was described in 2005.
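Because the original PDB format descends from punch cards, each record is a fixed-width line in which a field is identified by its column range rather than by a delimiter. The sketch below reads a few fields from a single ATOM record using the column positions of the published PDB format (record name in columns 1-6, serial number in 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46, 47-54). The function name and the sample values are illustrative only.

```python
def parse_atom_line(line):
    """Parse selected fields of one fixed-width ATOM/HETATM record."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return {
        "record": record,
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# An illustrative record, built field by field so the column widths are
# visible. Adjacent string literals are concatenated by Python.
SAMPLE = (
    "ATOM  "     # columns 1-6: record name
    "    1"      # columns 7-11: atom serial number
    " "          # column 12: unused
    " N  "       # columns 13-16: atom name
    " "          # column 17: alternate location indicator
    "VAL"        # columns 18-20: residue name
    " "          # column 21: unused
    "A"          # column 22: chain identifier
    "   1"       # columns 23-26: residue sequence number
    "    "       # columns 27-30: insertion code and padding
    "   6.204"   # columns 31-38: x coordinate
    "  16.869"   # columns 39-46: y coordinate
    "   4.854"   # columns 47-54: z coordinate
)

atom = parse_atom_line(SAMPLE)
print(atom)
```

Slicing by position rather than splitting on whitespace matters here, because adjacent fields in this format are allowed to touch when their values are wide.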
The structure files can be downloaded in any of these three formats. In fact, individual files are easily downloaded into graphics packages using web addresses:
- For PDB format files, use, e.g., http://www.pdb.org/pdb/files/4hhb.pdb.gz or http://pdbe.org/download/4hhb
- For PDBML (XML) files, use, e.g., http://www.pdb.org/pdb/files/4hhb.xml.gz or http://pdbe.org/pdbml/4hhb
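The addresses above differ only in the four-character identifier (here 4hhb) and the file extension, so they can be generated from a template. The following sketch builds such addresses without contacting any server. The URL patterns are copied from the examples in this article and may no longer be current; the function and variable names are hypothetical, and lower-casing the identifier is an assumption based on those examples.

```python
import re

# URL patterns taken from the examples in this article. They are not
# guaranteed to be the addresses the servers use today.
URL_TEMPLATES = {
    "pdb": "http://www.pdb.org/pdb/files/{pdb_id}.pdb.gz",
    "xml": "http://www.pdb.org/pdb/files/{pdb_id}.xml.gz",
}

# The article describes a PDB ID as four alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def structure_url(pdb_id, fmt="pdb"):
    """Build a download address for one entry in the given format."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character alphanumeric ID: {pdb_id!r}")
    if fmt not in URL_TEMPLATES:
        raise ValueError(f"unknown format: {fmt!r}")
    # Assumption: identifiers appear in lower case in the example addresses.
    return URL_TEMPLATES[fmt].format(pdb_id=pdb_id.lower())


print(structure_url("4hhb"))
print(structure_url("4HHB", fmt="xml"))
```

Validating the identifier before building the address catches typing errors early, before any download is attempted.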
Here "4hhb" is the PDB identifier. Each structure published in the PDB receives a four-character alphanumeric identifier, its PDB ID. (This cannot be used as an identifier for biomolecules, because often several structures for the same molecule, in different environments or conformations, are contained in the PDB with different PDB IDs.)
Viewing the data
The structure files may be viewed using one of several open source computer programs. Some other free, but not open source, programs include ICM-Browser, VMD, MDL Chime, PyMOL, UCSF Chimera, RasMol, Swiss-PDB Viewer, StarBiochem (a Java-based interactive molecular viewer with integrated search of the Protein Data Bank), Sirius, and VisProt3DS (a tool for protein visualization in 3D stereoscopic view in anaglyph and other modes). The RCSB PDB website contains an extensive list of both free and commercial molecule visualization programs and web browser plugins.
See also
- Crystallographic database
- Protein structure
- Protein structure databases
- PDBsum: extracts data from other databases about PDB structures
- PDBWiki: a website for community annotation of PDB structures
- Proteopedia: a collaborative 3D encyclopedia of proteins and other molecules
External links
- The Worldwide Protein Data Bank (wwPDB) — parent site to regional hosts (below)
- RCSB Protein Data Bank (USA)
- PDBe (Europe)
- PDBj (Japan)
- BMRB, Biological Magnetic Resonance Data Bank (USA)
- wwPDB Documentation — documentation on both the PDB and PDBML file formats
- Looking at Structures — The RCSB's introduction to crystallography
- PDBsum Home Page — Extracts data from other databases about PDB structures.
- Nucleic Acid Database, NDB — a PDB mirror especially for searching for nucleic acids
- Introductory PDB tutorial sponsored by PDB