CONTENTS
Introduction
Supported and funded by
History
PDB holdings list
Member organizations
Task forces
PDB ID
PDB file format
Browse to WWW.RCSB.ORG/PDB/
PROTEIN DATA BANK (PDB)
The PDB is a single worldwide database; hundreds of secondary databases categorize its data differently.
It is a key resource in structural biology and stores 3D structural data of large biological molecules such as proteins and nucleic acids.
Data are submitted by biologists and biochemists from all around the world, are freely accessible on the internet through the websites of the member organizations, and are updated weekly.
The mission is to maintain a single Protein Data Bank archive of macromolecular structural data.
SUPPORTED AND FUNDED BY
The Protein Data Bank (PDB) is operated by the Research Collaboratory for Structural Bioinformatics (RCSB), which consists of:
Rutgers, The State University of New Jersey
The San Diego Supercomputer Center at the University of California, San Diego
The Center for Advanced Research in Biotechnology of the National Institute of Standards and Technology
The PDB is supported by funds from the National Science Foundation, the Department of Energy, and the National Institutes of Health.
PDB HISTORY
Two forces led to the creation of the PDB:
A growing collection of protein structural data sets determined by X-ray diffraction.
The Brookhaven Raster Display (BRAD), a molecular graphics display for visualizing protein structures in 3D, which emerged in 1968.
In 1969, with the sponsorship of Dr Walter Hamilton at Brookhaven National Laboratory, Dr Edgar Meyer began to write software to store atomic coordinate files in a common format and make them available for geometric and graphical evaluation.
In 1971, one of Dr Meyer's programs, SEARCH, enabled networking: researchers could access information from the database remotely and study protein structures offline.
In 1973, upon Hamilton's death, Dr Tom Koetzle took over direction of the PDB for the next 20 years.
In the 1980s, IUCr guidelines were established, the number of deposited structures increased, and independent biological databases such as the NDB were established.
In the 1990s, the mmCIF project was completed and structural genomics began.
In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in 1999. Dr Helen M. Berman of Rutgers University became the new director.
In 2003, with the formation of the wwPDB, the PDB became an international organization with three member organizations.
In 2006, the BMRB joined the wwPDB.
MEMBER ORGANIZATIONS
The member organizations act as data deposition, data processing and distribution centers for PDB data.
There are three founding member organizations:
PDBe: Protein Data Bank in Europe
PDBj: Protein Data Bank in Japan
RCSB: Research Collaboratory for Structural Bioinformatics
The Biological Magnetic Resonance Data Bank (BMRB) joined later, in 2006.
The Worldwide Protein Data Bank (wwPDB) oversees the PDB. It reviews and annotates each submitted entry, which is then automatically checked for plausibility (the source code of the validation software is available).
TASK FORCES
X-ray diffraction (most of the structures): yields approximations of the coordinates of the atoms of a protein, e.g. lysozyme.
NMR (about 15%, e.g. haemoglobin): yields estimates of the distances between pairs of atoms of a protein. The final conformation is obtained by solving a distance geometry problem.
Cryo-electron microscopy (very few proteins, e.g. crystallin).
PDB IDENTIFIER (PDB ID)
Each structure published in the PDB receives a four-character alphanumeric identifier, or accession number, such as 1ANG or 4hhb.
However, a PDB ID cannot be used as an identifier for a biomolecule, because the PDB contains several structures of the same molecule, in different environments or conformations, under different PDB IDs.
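As a small illustration of the identifier format, the Python sketch below checks and normalizes a PDB ID. It is only a sketch: the function name normalize_pdb_id is hypothetical, and the rule that the first character is a digit from 1 to 9 is an assumption that goes beyond the slide, which only says "four-character alphanumeric".

```python
import re

# Assumption (not stated on the slide): a classic PDB ID is one digit 1-9
# followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text):
    """Return the upper-case form of a PDB ID, or raise ValueError."""
    candidate = text.strip()
    if not PDB_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a four-character PDB ID: {text!r}")
    return candidate.upper()


# The two IDs quoted on the slide differ only in letter case conventions.
print(normalize_pdb_id("1ANG"))  # prints 1ANG
print(normalize_pdb_id("4hhb"))  # prints 4HHB
```

Normalizing the case first makes it easy to compare IDs written as 4hhb and 4HHB; it does not say anything about whether two different IDs describe the same molecule.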
PDB FILE FORMAT
The standard data representation is encoded in a data dictionary.
The metadata model supporting this representation is used by all PDB data processing and database software tools.
The PDB file format was initially restricted to 80 characters per line.
In 1996, the macromolecular Crystallographic Information File (mmCIF) format was introduced.
In 2005, an XML version, called PDBML, was described.
Structure files can be downloaded in any of these three formats.
Using web services, the files can also be loaded directly into graphics packages.
PDB 3D DATA FILE FORMAT
ASCII, column based: 80 columns per line
A keyword for the record type starts at column 1
Header records
Structure records
Atom records (containing coordinates): ATOM, HETATM, ..., TER
PDB FORMAT: COORDINATE SECTION
ATOM record: biopolymer residue atom
HETATM record: non-biopolymer atom
TER record: chain terminator
Example (the first line is a column ruler):
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM      1  N   ALA     1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA     1      11.639   6.071  -5.147  1.00  0.00           C
...
ATOM    293 1HG  GLU    18     -14.861  -4.847   0.361  1.00  0.00           H
ATOM    294 2HG  GLU    18     -13.518  -3.769   0.084  1.00  0.00           H
TER     295      GLU    18
HETATM 5555 CA                   0.000   0.000   0.000
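Because every field of an ATOM record sits in a fixed column range, such a line can be read by slicing rather than by splitting on spaces. The Python sketch below illustrates this. It is not an official parser: the function names format_atom_line and parse_atom_line are hypothetical, and the column ranges used (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, occupancy 55-60, temperature factor 61-66, element 77-78) are assumed from the standard ATOM layout, which the slide shows only through its column ruler.

```python
def format_atom_line(serial, name, res_name, chain_id, res_seq,
                     x, y, z, occupancy=1.0, temp_factor=0.0, element=""):
    """Write one ATOM record; every field is padded to its fixed width."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain_id:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{temp_factor:6.2f}"
        f"          {element:>2}"
    )


def parse_atom_line(line):
    """Read one ATOM/HETATM record by column position (0-based slices)."""
    return {
        "record": line[0:6].strip(),         # columns 1-6
        "serial": int(line[6:11]),           # columns 7-11
        "name": line[12:16].strip(),         # columns 13-16
        "res_name": line[17:20].strip(),     # columns 18-20
        "chain_id": line[21].strip(),        # column 22
        "res_seq": int(line[22:26]),         # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
        "occupancy": float(line[54:60]),     # columns 55-60
        "temp_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),      # columns 77-78
    }


# First atom of the example: nitrogen of ALA 1, no chain identifier.
# The atom name " N" carries its leading blank, as in the PDB convention.
line = format_atom_line(1, " N", "ALA", "", 1, 11.104, 6.134, -6.504, element="N")
atom = parse_atom_line(line)
print(line)
print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Slicing by column is the safer choice here because adjacent fields may touch (for example a five-digit serial number directly after HETATM), in which case splitting on whitespace would merge them.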
mmCIF
mmCIF stands for macromolecular Crystallographic Information File.
mmCIF is based on a subset of the syntax rules of the Self-defining Text Archival and Retrieval (STAR) file.
A Dictionary Description Language (DDL) defines the structure of mmCIF dictionaries.
Dictionaries provide the metadata that define the content of mmCIF data files.
mmCIF data files, dictionaries and DDLs are all expressed in a common syntax.
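To give a feel for the common syntax, the Python sketch below reads a tiny mmCIF-style text: a data_ block name, a single name-value item, and one loop_ table. It is a deliberately simplified sketch and makes several assumptions: the function name parse_simple_mmcif and the block name DEMO are hypothetical, the sample text is hand-written for illustration (it reuses the coordinates of the first two atoms from the PDB example), and quoted values and multi-line values, which real mmCIF files use, are not handled.

```python
SAMPLE = """\
data_DEMO
_entry.id DEMO
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N ALA 11.104 6.134 -6.504
ATOM 2 CA ALA 11.639 6.071 -5.147
"""


def parse_simple_mmcif(text):
    """Parse a data_ name, simple name-value items, and loop_ tables."""
    block = {"name": None, "items": {}, "loops": []}
    columns, rows, in_loop = [], [], False

    def close_loop():
        # Store the loop collected so far (if any) and reset the state.
        nonlocal columns, rows, in_loop
        if columns:
            block["loops"].append({"columns": columns, "rows": rows})
        columns, rows, in_loop = [], [], False

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            close_loop()  # blank and comment lines end a loop in this sketch
        elif line.startswith("data_"):
            block["name"] = line[len("data_"):]
        elif line == "loop_":
            close_loop()
            in_loop = True
        elif line.startswith("_"):
            if in_loop and not rows:
                columns.append(line.split()[0])  # a column name of the loop
            else:
                close_loop()
                parts = line.split(None, 1)
                if len(parts) == 2:  # items without an inline value are skipped
                    block["items"][parts[0]] = parts[1]
        elif in_loop:
            rows.append(dict(zip(columns, line.split())))  # one table row
    close_loop()
    return block


block = parse_simple_mmcif(SAMPLE)
print(block["name"], block["items"])
for row in block["loops"][0]["rows"]:
    print(row["_atom_site.label_atom_id"], row["_atom_site.Cartn_x"])
```

Each item name has the form _category.attribute, so the data file itself says what every value means; this is the role the dictionaries play in the description above. For real files, an established mmCIF library is the better choice.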
POINT YOUR BROWSER TO: WWW.RCSB.ORG/PDB/
Enter either a search term (for example, a protein name) or a PDB ID.
PROTEIN ANNOTATION
If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations; GO categorizes structures based on genes.
The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences.
CATH hierarchy levels:
Class: the overall secondary-structure content of the domain.
Architecture: high structural similarity but no evidence of homology.
Topology: a large-scale grouping of topologies which share particular structural features.
Homologous superfamily: indicative of a demonstrable evolutionary relationship.
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.
VIEWING THE DATA
56,523 structures in the PDB have structure factor files.
6,410 structures in the PDB have NMR restraint files.
198 structures in the PDB have chemical shift files.
The text files can be viewed or modified in a text editor.
Structure files can be viewed with various free and commercial visualization programs and web browser plug-ins.
Open-source PDB software:
Jmol
Molekel
MeshLab (able to import a PDB data set and build surfaces from it)
QuteMol
Avogadro
Others, open but not free: PyMOL, RasMol, VisProt3DS and StarBiochem
http://www.rcsb.org/pdb/static.do?p=software/software_links/molecular_graphics.html
The RCSB PDB website contains an extensive list of both free and commercial molecule visualization programs and web browser plug-ins.
LIMITATION
The Protein Data Bank (PDB) is the central archive of experimentally solved biomolecular structures. However, the PDB only allows data retrieval and does not provide functionality for collaboration or user feedback. In contrast, PDBWiki allows users to share expert knowledge about structures deposited in the PDB. It provides tools for discussing and annotating proteins collaboratively. The goal is to create a central, freely accessible repository of user-contributed information that will be useful for anyone working with PDB structures. As such, PDBWiki can be considered part of a wider effort in community-based curation of biological databases.