SlideShare a Scribd company logo
1 of 19
Mr.Yogesh Joshi
WCBT
Structure database: PDB
Protein Data Bank
 http://www.rcsb.org/pdb/home/home.do
 A repository for 3-D biological
macromolecular structure.
 It includes proteins, nucleic acids.
 Obtained by X-Ray crystallography (80%) or
NMR spectroscopy (16%).
 Transferred to the Research Collaboratory for
Structural Bioinformatics (RCSB) in 1998.
 Currently it holds 141616 released structures.
 freely accessible on the Internet via the websites of its member
organisations (PDBe,PDBj,and RCSB).
 The PDB is overseen by an organization called the Worldwide
Protein Data Bank, wwPDB.
 The PDB is a key resource in areas of structural biology, such
as structural genomics.
 Most major scientific journals, and some funding agencies,
now require scientists to submit their structure data to the PDB.
 Many other databases use protein structures deposited
in the PDB. For example, SCOP and CATH classify
protein structures, while PDBsum provides a graphic
overview of PDB entries using information from other
sources, such as Gene ontology
History
 Founded in 1971 by Brookhaven National
Laboratory, New York.
 In October 1998,the PDB was transferred to the
Research Collaboratory for Structural
Bioinformatics (RCSB);
 In 2003, with the formation of the wwPDB, the
PDB became an international organization.
 The founding members are PDBe (Europe), RCSB
(USA), and PDBj (Japan).
Content
 The PDB database is updated weekly.
 As of 12 May 2017, the breakdown of current
holdings is as follows:
 In the past, the number of structures in the PDB has
grown at an approximately exponential rate passing
the 100,000 structures milestone in 2014.
PDB data formats/File Formats
 The file format initially used by the PDB was called the PDB
file format.
 PDB file format was used to contain the coordinates and related
information.
 In the late 1990’s, macromolecular Crystallographic Information file
(mmCIF) evolved.
 mmCIF and PDBML
 Push in to make structure files completely self-contained descriptions of
the experiment and details of the structure determination.
 PDB file format unstructured and obsolete
PDB File Format
 Text file – you can edit with a text editor e.g. WordPad
 Atomic co-ordinates
 Rich annotation
 Citation
 Experimental Method
 Biological source e.
 Etc.
Viewing the data
 The structure files may be viewed using one of open
software programme, including JMOL, PYMOL, and
RASMOL.
 Some other free, but not open source programs include
ICM-Browser,VMD, MDL Chime, UCSF Chimera, Swiss-
PDB Viewer, StarBiochem(a Java-based interactive
molecular viewer with integrated search of protein
databank), Sirius, and VisProt3DS(a tool for Protein
Visualization in 3D stereoscopic view and other modes).
 The RCSB PDB website contains an extensive list of both
free and commercial molecule visualization programs and
web browser plugins.
 Advanced search
 New features
 File format
PDB File Format
 A deposited set of protein coordinates becomes
an entry in PDB.
 A deposited set of protein coordinates becomes
an entry in PDB.
 One can search a structure in PDB using the
four-letter code or keywords related to its
annotation.
 The identified structure can be viewed directly
online or downloaded to a local computer for
analysis.
 The PDB website provides options for retrieval,
analysis, and direct viewing of macromolecular
 It also provides links to protein structural
classification results
available in databases such as SCOP and CATH.
• The data format in PDB was created in FORTAN
compatible format.
• Header:-
• The header section provides an overview of the
protein and the quality of the structure.
 It contains information about the name of the
molecule, source organism, bibliographic reference,
methods of structure determination, resolution,
crystallographic parameters, protein sequence,
cofactors, and description of structure types and
locations and sometimes secondary structure
information.
 Structure coordinates:-there are a specified number of
columns with predetermined contents.
 The ATOM part refers to protein atom information
whereas the HETATM(for heteroatom group) part refers
to atoms of cofactor or substrate molecules.
 Approximately ten columns of text and numbers are
designated
 They include information for the atom number, atom
name, residue name, polypeptide chain identifier,
residue number, x, y, and z Cartesian coordinates,
temperature factor, and occupancy factor.
 The last two parameters, occupancy and temperature
factors, relate to disorders of atomic positions in crystals.
 END
 Restriction:-
 The field width for polypeptide chains is only one
character in width, meaning that no more than 26 chains
can be used in a multisubunit protein model
mmCIF and MMDB Formats
 The most popular new formats include the
macromolecular crystallographic information file
(mmCIF) and the molecular modeling database
(MMDB) file.
 Both formats are highly parsable by computer
software, meaning that information in each field of
a record can be retrieved separately.
 These new formats facilitate the retrieval and
organization of information from database
structures.
 The mmCIF format is similar to the format for a
relational database in which a set of tables are
used to organize database records.
 Each table or field of information is explicitly
assigned by a tag and linked to other fields
through a special syntax.
 a single line of description in the header section
of PDB is divided into many lines or fields with
each field having explicit assignment of item
names and item values.
 Each field starts with an underscore character
followed by category name and keyword
description separated by a period.
 Using multiple fields with tags for the same
information has the advantage of providing one-
to-one relationship between item names and item
values.
MMDB
 Another new format is the MMDB format
developed by the NCBI to parse and sort pieces
of information in PDB.
 The objective is to allow the information to be
more easily integrated with GenBank and Medline
through Entrez.
 An MMDB file is written in the ASN.1 format which
has information in a record structured as a nested
hierarchy.
 This allows faster retrieval than mmCIF and PDB.
 Furthermore, the MMDB format includes bond
connectivity information for each molecule, called
a “chemical graph,” which is recorded in the
ASN.1 file.

More Related Content

What's hot

What's hot (20)

protein data bank
protein data bankprotein data bank
protein data bank
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Ddbj
DdbjDdbj
Ddbj
 
Structural databases
Structural databases Structural databases
Structural databases
 
Protein information resource (PIR)
Protein information resource (PIR)Protein information resource (PIR)
Protein information resource (PIR)
 
Cath
CathCath
Cath
 
Swiss PROT
Swiss PROT Swiss PROT
Swiss PROT
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Scop database
Scop databaseScop database
Scop database
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Molecular modeling database
Molecular modeling database Molecular modeling database
Molecular modeling database
 
EMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology LaboratoryEMBL- European Molecular Biology Laboratory
EMBL- European Molecular Biology Laboratory
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
TrEMBL
TrEMBLTrEMBL
TrEMBL
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Protein Data Bank
Protein Data BankProtein Data Bank
Protein Data Bank
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 
Fasta
FastaFasta
Fasta
 
Protein Database
Protein DatabaseProtein Database
Protein Database
 
Entrez databases
Entrez databasesEntrez databases
Entrez databases
 

Similar to Protein data bank

Protein structure
Protein structureProtein structure
Protein structure
Pooja Pawar
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
BioinformaticsCentre
 
Bioinformatic_Databases_2xcxzczxcxzxcxzc
Bioinformatic_Databases_2xcxzczxcxzxcxzcBioinformatic_Databases_2xcxzczxcxzxcxzc
Bioinformatic_Databases_2xcxzczxcxzxcxzc
AdiM27
 

Similar to Protein data bank (20)

Protein data bank
Protein data bankProtein data bank
Protein data bank
 
Introduction to pdb
Introduction to pdbIntroduction to pdb
Introduction to pdb
 
Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)Pharmacoinformatics Database basics(sree)
Pharmacoinformatics Database basics(sree)
 
Bioinformatics lecture xxiii
Bioinformatics lecture xxiiiBioinformatics lecture xxiii
Bioinformatics lecture xxiii
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
Types of biological databases-protein database
Types of biological databases-protein databaseTypes of biological databases-protein database
Types of biological databases-protein database
 
Protein sequence databases
Protein sequence databasesProtein sequence databases
Protein sequence databases
 
Protein structure
Protein structureProtein structure
Protein structure
 
R.P Maurya ppt on C C D C & DSSP(Bioinformatics)
R.P Maurya ppt  on C C D C & DSSP(Bioinformatics)R.P Maurya ppt  on C C D C & DSSP(Bioinformatics)
R.P Maurya ppt on C C D C & DSSP(Bioinformatics)
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
 
Protein Data Bank ( PDB ) - Bioinformatics
Protein Data Bank ( PDB ) - BioinformaticsProtein Data Bank ( PDB ) - Bioinformatics
Protein Data Bank ( PDB ) - Bioinformatics
 
Protein database ..... of NCBI
Protein database ..... of NCBI Protein database ..... of NCBI
Protein database ..... of NCBI
 
Databases
DatabasesDatabases
Databases
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
Bioinformatic databases 2
Bioinformatic databases 2Bioinformatic databases 2
Bioinformatic databases 2
 
Bioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.pptBioinformatic_Databases_2.ppt
Bioinformatic_Databases_2.ppt
 
Bioinformatic_Databases_2xcxzczxcxzxcxzc
Bioinformatic_Databases_2xcxzczxcxzxcxzcBioinformatic_Databases_2xcxzczxcxzxcxzc
Bioinformatic_Databases_2xcxzczxcxzxcxzc
 

Recently uploaded

Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Cherry
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
Cherry
 
COMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demeritsCOMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demerits
Cherry
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
Cherry
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 

Recently uploaded (20)

Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNA
 
Concept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfConcept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdf
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Taphonomy and Quality of the Fossil Record
Taphonomy and Quality of the  Fossil RecordTaphonomy and Quality of the  Fossil Record
Taphonomy and Quality of the Fossil Record
 
GBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) MetabolismGBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) Metabolism
 
Daily Lesson Log in Science 9 Fourth Quarter Physics
Daily Lesson Log in Science 9 Fourth Quarter PhysicsDaily Lesson Log in Science 9 Fourth Quarter Physics
Daily Lesson Log in Science 9 Fourth Quarter Physics
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
COMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demeritsCOMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demerits
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdf
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 

Protein data bank

  • 2. Protein Data Bank  http://www.rcsb.org/pdb/home/home.do  A repository for 3-D biological macromolecular structure.  It includes proteins, nucleic acids.  Obtained by X-Ray crystallography (80%) or NMR spectroscopy (16%).  Transferred to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998.  Currently it holds 141616 released structures.
  • 3.  freely accessible on the Internet via the websites of its member organisations (PDBe,PDBj,and RCSB).  The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.  The PDB is a key resource in areas of structural biology, such as structural genomics.  Most major scientific journals, and some funding agencies, now require scientists to submit their structure data to the PDB.
  • 4.  Many other databases use protein structures deposited in the PDB. For example, SCOP and CATH classify protein structures, while PDBsum provides a graphic overview of PDB entries using information from other sources, such as Gene ontology
  • 5. History  Founded in 1971 by Brookhaven National Laboratory, New York.  In October 1998,the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB);  In 2003, with the formation of the wwPDB, the PDB became an international organization.  The founding members are PDBe (Europe), RCSB (USA), and PDBj (Japan).
  • 6. Content  The PDB database is updated weekly.  As of 12 May 2017, the breakdown of current holdings is as follows:
  • 7.  In the past, the number of structures in the PDB has grown at an approximately exponential rate passing the 100,000 structures milestone in 2014.
  • 8. PDB data formats/File Formats  The file format initially used by the PDB was called the PDB file format.  PDB file format was used to contain the coordinates and related information.  In the late 1990’s, macromolecular Crystallographic Information file (mmCIF) evolved.  mmCIF and PDBML  Push in to make structure files completely self-contained descriptions of the experiment and details of the structure determination.  PDB file format unstructured and obsolete
  • 9. PDB File Format  Text file – you can edit with a text editor e.g. WordPad  Atomic co-ordinates  Rich annotation  Citation  Experimental Method  Biological source e.  Etc.
  • 10. Viewing the data  The structure files may be viewed using one of open software programme, including JMOL, PYMOL, and RASMOL.  Some other free, but not open source programs include ICM-Browser,VMD, MDL Chime, UCSF Chimera, Swiss- PDB Viewer, StarBiochem(a Java-based interactive molecular viewer with integrated search of protein databank), Sirius, and VisProt3DS(a tool for Protein Visualization in 3D stereoscopic view and other modes).  The RCSB PDB website contains an extensive list of both free and commercial molecule visualization programs and web browser plugins.
  • 11.  Advanced search  New features  File format
  • 12. PDB File Format  A deposited set of protein coordinates becomes an entry in PDB.  A deposited set of protein coordinates becomes an entry in PDB.  One can search a structure in PDB using the four-letter code or keywords related to its annotation.  The identified structure can be viewed directly online or downloaded to a local computer for analysis.  The PDB website provides options for retrieval, analysis, and direct viewing of macromolecular
  • 13.  It also provides links to protein structural classification results available in databases such as SCOP and CATH. • The data format in PDB was created in FORTAN compatible format. • Header:- • The header section provides an overview of the protein and the quality of the structure.  It contains information about the name of the molecule, source organism, bibliographic reference, methods of structure determination, resolution, crystallographic parameters, protein sequence, cofactors, and description of structure types and locations and sometimes secondary structure information.
  • 14.  Structure coordinates:-there are a specified number of columns with predetermined contents.  The ATOM part refers to protein atom information whereas the HETATM(for heteroatom group) part refers to atoms of cofactor or substrate molecules.  Approximately ten columns of text and numbers are designated  They include information for the atom number, atom name, residue name, polypeptide chain identifier, residue number, x, y, and z Cartesian coordinates, temperature factor, and occupancy factor.  The last two parameters, occupancy and temperature factors, relate to disorders of atomic positions in crystals.  END  Restriction:-  The field width for polypeptide chains is only one character in width, meaning that no more than 26 chains can be used in a multisubunit protein model
  • 15.
  • 16. mmCIF and MMDB Formats  The most popular new formats include the macromolecular crystallographic information file (mmCIF) and the molecular modeling database (MMDB) file.  Both formats are highly parsable by computer software, meaning that information in each field of a record can be retrieved separately.  These new formats facilitate the retrieval and organization of information from database structures.  The mmCIF format is similar to the format for a relational database in which a set of tables are used to organize database records.
  • 17.  Each table or field of information is explicitly assigned by a tag and linked to other fields through a special syntax.  a single line of description in the header section of PDB is divided into many lines or fields with each field having explicit assignment of item names and item values.  Each field starts with an underscore character followed by category name and keyword description separated by a period.  Using multiple fields with tags for the same information has the advantage of providing one- to-one relationship between item names and item values.
  • 18.
  • 19. MMDB  Another new format is the MMDB format developed by the NCBI to parse and sort pieces of information in PDB.  The objective is to allow the information to be more easily integrated with GenBank and Medline through Entrez.  An MMDB file is written in the ASN.1 format which has information in a record structured as a nested hierarchy.  This allows faster retrieval than mmCIF and PDB.  Furthermore, the MMDB format includes bond connectivity information for each molecule, called a “chemical graph,” which is recorded in the ASN.1 file.