Protein Data Bank (file format)
The Protein Data Bank (PDB) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. The PDB format accordingly provides for the description and annotation of protein and nucleic acid structures, including atomic coordinates, observed sidechain rotamers, secondary structure assignments, and atomic connectivity. Structures are often deposited with other molecules such as water, ions, nucleic acids and ligands, which can be described in the PDB format as well. The Protein Data Bank also keeps data on biological macromolecules in the newer mmCIF file format.
Example
A typical PDB file describing a protein consists of hundreds to thousands of lines like the following (taken from a file describing the structure of a synthetic collagen-like peptide):
HEADER    EXTRACELLULAR MATRIX                    22-JAN-98   1A3I
TITLE     X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE    2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA    X-RAY DIFFRACTION
AUTHOR    R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR   2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
...
SEQRES   1 A    9  PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES   1 B    6  PRO PRO GLY PRO PRO GLY
SEQRES   1 C    6  PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
ATOM      5  CB  PRO A   1       6.460  21.723  20.211  1.00 22.26           C
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O
...
HEADER, TITLE and AUTHOR records: provide information about the entry and the researchers who determined the structure; numerous other record types are available for other kinds of information.
REMARK records: can contain free-form annotation, but they also accommodate standardized information; for example, the REMARK 350 BIOMT records describe how to compute the coordinates of the experimentally observed multimer from the explicitly specified coordinates of a single repeating unit.
SEQRES records: give the sequences of the three peptide chains (named A, B and C). The chains are very short in this example; for longer chains the SEQRES records span multiple lines.
ATOM records: describe the coordinates of the atoms that are part of the protein. For example, the first ATOM line above describes the backbone nitrogen (N) atom of the first residue of peptide chain A, which is a proline residue; the first three floating-point numbers are its x, y and z coordinates in units of ångströms. The next three columns are the occupancy, the temperature factor, and the element symbol, respectively.
HETATM records: describe the coordinates of hetero-atoms, that is, atoms that are not part of the protein molecule.
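The coordinate records described above can be read by column position instead of by splitting on whitespace. The following is a minimal sketch in Python, assuming the column layout of the ATOM and HETATM records in the published format guide (serial number in columns 7-11, atom name in 13-16, coordinates in 31-54, and so on). The names AtomRecord, parse_atom_line and parse_atoms are hypothetical and chosen only for illustration; the sample lines are taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Illustrative container for one ATOM or HETATM line (hypothetical name)."""
    record: str
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one coordinate record by fixed column position."""
    line = line.ljust(80)  # pad short lines so every slice is in range
    return AtomRecord(
        record=line[0:6].strip(),       # columns 1-6: record name
        serial=int(line[6:11]),         # columns 7-11: atom serial number
        name=line[12:16].strip(),       # columns 13-16: atom name
        res_name=line[17:20].strip(),   # columns 18-20: residue name
        chain_id=line[21].strip(),      # column 22: chain identifier (may be blank)
        res_seq=int(line[22:26]),       # columns 23-26: residue sequence number
        x=float(line[30:38]),           # columns 31-38: x in angstroms
        y=float(line[38:46]),           # columns 39-46: y in angstroms
        z=float(line[46:54]),           # columns 47-54: z in angstroms
        occupancy=float(line[54:60]),   # columns 55-60: occupancy
        temp_factor=float(line[60:66]), # columns 61-66: temperature factor
        element=line[76:78].strip(),    # columns 77-78: element symbol
    )


def parse_atoms(text: str) -> list[AtomRecord]:
    """Collect all ATOM and HETATM records from the text of a PDB file."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


SAMPLE = (
    "ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N\n"
    "HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C\n"
)

for atom in parse_atoms(SAMPLE):
    print(atom.record, atom.name, atom.res_name, atom.x, atom.y, atom.z)
```

Slicing by column matters because adjacent fields may touch without a separating space, and because some fields, such as the chain identifier in the HETATM line above, may be blank.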
Over the years the file format has undergone many changes and revisions. Its original layout was dictated by the width of computer punch cards (80 columns), and records are still written as fixed-width lines. Version 3.2 is the most recent revision referenced here.
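The 80-column layout can be illustrated by writing a coordinate record with fixed-width fields. The sketch below is an assumption-laden illustration, not a reference implementation: format_atom_line is a hypothetical helper, and it uses a simplified rule for placing the atom name (names shorter than four characters start in column 14), which does not cover every case in the format guide, such as two-letter element symbols.

```python
def format_atom_line(serial, name, res_name, chain_id, res_seq,
                     x, y, z, occupancy, temp_factor, element,
                     record="ATOM"):
    """Return one coordinate record as a fixed-width, 80-column line."""
    # Simplified assumption: short atom names start in column 14.
    padded_name = name if len(name) == 4 else f" {name:<3}"
    line = (
        f"{record:<6}{serial:>5} {padded_name} {res_name:>3} {chain_id:1}"
        f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
        f"{occupancy:6.2f}{temp_factor:6.2f}          {element:>2}"
    )
    return line.ljust(80)  # pad to the full punch-card width


line = format_atom_line(1, "N", "PRO", "A", 1,
                        8.316, 21.206, 21.530, 1.00, 17.44, "N")
print(repr(line))
```

Each field is given an explicit width so that values always land in the same columns regardless of their magnitude, which is what allows readers of the format to slice lines by position.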
See also
- Chemical file format
- ScientificPython: provides an interface for Python
- Software for molecular mechanics modeling
Molecular visualization software capable of displaying PDB files:
- Cn3D
- Coot
- Gabedit
- Jmol
- Molden
- Molekel
- PyMOL
- RasMol
- UGENE
- VisIt
- VMD (Visual Molecular Dynamics)
- Yasara
External links
- PDB Format Guide: version 3.2 of the PDB format specification.
- PDBML: a more recent, alternative XML-based file format for molecular coordinates.
- The RCSB Protein Data Bank
- Protein Data Bank in Europe
- The Molecular Modeling DataBase (MMDB) from NCBI
- The Data Uniformity Project from PDB
- MakeMultimer: an online tool for expanding BIOMT records in PDB files
- Molecules: iPad/iPhone app to display PDB files