SlideShare a Scribd company logo
1 of 23
Protein Database
Presented by:-
Rajpal Choudhary
M.Sc. Biotechnology 2nd year
161103004
Protein database
• SWISS-PROT: Annotated Sequence Database
• TrEMBL: Database of EMBL nucleotide translated sequences
• InterPro: Integrated resource for protein families, domains and functional sites.
• CluSTr: Offers an automatic classification of SWISS-PROT and TrEMBL.
• IPI: A non-redundant human proteome set constructed from SWISS-PROT, TrEMBL, Ensembl and
RefSeq.
• GOA: Provides assignments of gene products to the Gene Ontology (GO) resource.
• Proteome Analysis: Statistical and comparative analysis of the predicted proteomes of fully
sequenced organisms
• Protein Profiles: Tables of SWISS-PROT and TrEMBL entries and alignments for the protein
families of the Protein Profile.
Swiss-Prot
• Annotated protein sequence database established in 1986 and
maintained collaboratively since 1987, by the Department of Medical
Biochemistry of the University of Geneva and EBI
• Complete, Curated, Non-redundant and cross-referenced with 34 other
databases
• Highly cross-referenced
• Available from a variety of servers and through sequence analysis
software tools
• More than 8,000 different species
• First 20 species represent about 42% of all sequences in the database
• More than 1,29,000 entries with 4.7 X 1010 amino acids
• More than 6,22,000 entries in TrEMBL
TrEMBL (Translation of EMBL)
• Computer-annotated supplement to SWISS-PROT, as it is impossible to
cope with the flow of data.
• Well-structure SWISS-PROT-like resource
• Derived from automated EMBL CDS translation maintained at the EBI,
UK.
• TrEMBL is automatically generated and annotated using software tools
(incompatible with the SWISS-PROT in terms of quality)
• TrEMBL contains all what is not yet in SWISS-PROT
SWISS-PROT file format
SWISS-PROT file format
SWISS-PROT file format
SWISS-PROT file format
GenBank
(http://www.ncbi.nlm.nih.gov/genbank/)
• The GenBank is a sequence database that stores nucleotide
sequences and the proteins obtained from them by translations. This
database is maintained by National Center for Biotechnology
Information (NCBI). As of April 2011, the GenBank contains
approximately 126,551,501,141 numbers of bases in 135,440,924
numbers of sequences. Each sequence submitted to GenBank is
assigned a unique GenBank identifier or GenBank accession number.
Structure Databases
•MSD: Macromolecular Structure Database - A relational database
representation of clean Protein Data Bank (PDB).
•3DSeq: 3D sequence alignment server- Annotation of the alignments
between sequence database and the PDB.
•FSSP: Based on exhaustive all-against-all 3D structure comparison of
protein structures currently in the Protein Data Bank (PDB)
•DALI: Fold Classification based on Structure-Structure Assignments.
•3Dee: Database of protein domain definitions wherein the domains
have been clustered on sequence and structural similarity.
•NDB: Nucleic Acid Structure Database
htttp://www.rcsb.org/pdb/
Protein Data Bank (PDB)
• Important in solving real problems in molecular biology
• Protein Databank
• PDB Established in 1972 at Brookhaven National Laboratory
(BNL)
• Sole international repository of macromolecular structure data
• Moved to Research Collaboratory for Structural Bioinformatics
http://www.rcsb.org/
Effective use of PDB
• Queries are of three types
• PDBid - As quoted in paper
• Search Lite - one or more keywords
• Search Fields - A detailed query form
• Query results
• Structure Explorer - details of the structure
• Query Result Browser - for multiple structures
• PDB Viewer
SCOP (Structural Classification of Protein)
(http://scop.mrc-lmb.cam.ac.uk/scop/)
• The Structural Classification of Proteins (SCOP) database is basically
a database with manual classification of protein structural
domains. The whole concept is based on similarities of the amino
acid sequences and three- dimensional structures of the proteins.
The database was originally published in 1995 and it is usually
updated at least once yearly by Alexei G. Murzin and his
colleagues.
• SCOP database uses the following protein structural hierarchy:
• Class—It is the general structural architecture of the protein domains.
• Fold—It represents similar arrangement of regular secondary
structures but without evidence of evolutionary relatedness.
• Superfamily—It represents whether the protein structures have
sufficient structural and functional similarities to each other to infer a
divergent evolutionary relation- ship but not necessarily a detectable
sequence homology.
• Family—Proteins belonging to the same family share some sequence
similarity.
• SCOP has the following classes:
• 1) Proteins with mostly α-helical domains;
• 2) Proteins with mostly β-sheet domains;
• 3) Proteins with α/β domains which contain beta- alpha-beta
structural units or motifs that form mainly parallel β-sheets;
• 4) Proteins with mostly α + β domains consisting of independent α-
helices and mainly antiparallel β-sheets;
CATH (Class Architecture Topology Homology)
(http://www.cathdb.info/)
• The CATH Protein Structure Classification method is a semi-automatic,
hierarchical classification of protein domains. The database was first
published in 1997 by Christine Orengo, Janet Thornton and their
colleagues. CATH carries many broad features with its principal rival,
SCOP. However there are also many areas in which the detailed
classifications in the two databases differ greatly.
• CATH clusters proteins at four major levels:
• 1) Class (C): Class is derived from secondary structure contents of proteins.
It is assigned for more than 90% of protein structures automatically.
• 2) Architecture (A): Architecture describes the gross orientation of
secondary structures, independent of connectivity in proteins.
• 3) Topology (T): Topology level of CATH clusters protein structures
according to their topological connections & numbers of secondary
structures.
• 4) Homologous Superfamily (H): Homologous super- families of CATH
cluster the proteins with highly similar structures & functions.
Conclusion
The different databases discussed here provide different information.
PDB gives both structural and sequence information of
macromolecules whereas SCOP & CATH have structures based on their
evolutionary relationships and folding classes. The structural
classifications of proteins are generally obtained from SCOP and CATH.
On the other hand, the UniProt/Swissprot database provides the
sequence annotations of proteins along with links to the external
databases like PDB. All these databases are increasing day by day. There
are other databases and some new databases are coming. But the
databases discussed here are considered to be the foundation stones
of bioinformatics.
REFERENCES
• G. Murzin, S. E. Brenner, T. Hubbard and C. Chothia, “SCOP: A Structural Classification of Proteins Database
for the Investigation of Sequences and Structures,” Jour- nal of Molecular Biology, Vol. 247, No. 4, 1995, pp.
536- 540. doi:10.1016/S0022-2836(05)80134-2
• L. Lo Conte, S. E. Brenner, T. J. Hubbard, C. Chothia and A. G. Murzin, “SCOP Database in 2002: Refinements
Accommodate Structural Genomics,” Nucleic Acids Re- search, Vol. 30, No. 1, 2002, pp. 264-267.
doi:10.1093/nar/30.1.264
• Andreeva, D. Howorth, S. E. Brenner, T. J. Hubbard, C. Chothia and A. G. Murzin, “SCOP Database in 2004: Re-
finements Integrate Structure and Sequence Family Data,” Nucleic Acids Research, Vol. 32, Suppl. 1, 2004,
pp. D226-D229. doi:10.1093/nar/gkh039
• R. Day, D. A. Beck, R. S. Armen and V. Daggett, “A Consensus View of Fold Space: Combining SCOP, CATH, and
the Dali Domain Dictionary,” Protein Scien- ce, Vol. 12, No. 10, 2003, pp. 2150-2160. doi:10.1110/ps.0306803
Thank you

More Related Content

What's hot

What's hot (20)

Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
Primary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyanaPrimary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyana
 
Swiss prot database
Swiss prot databaseSwiss prot database
Swiss prot database
 
Primary and secondary database
Primary and secondary databasePrimary and secondary database
Primary and secondary database
 
sequence of file formats in bioinformatics
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformatics
 
Genome Database Systems
Genome Database Systems Genome Database Systems
Genome Database Systems
 
NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 
Biological database
Biological databaseBiological database
Biological database
 
Proteins databases
Proteins databasesProteins databases
Proteins databases
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
protein data bank
protein data bankprotein data bank
protein data bank
 
Introduction to NCBI
Introduction to NCBIIntroduction to NCBI
Introduction to NCBI
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Ddbj
DdbjDdbj
Ddbj
 
Sequence file formats
Sequence file formatsSequence file formats
Sequence file formats
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
UniProt
UniProtUniProt
UniProt
 

Similar to Protein database

Protein databases
Protein databasesProtein databases
Protein databases
sarumalay
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
SBituila
 

Similar to Protein database (20)

Types of biological databases-protein database
Types of biological databases-protein databaseTypes of biological databases-protein database
Types of biological databases-protein database
 
Structural database and their classification by abdul qahar
Structural database and their classification by abdul qaharStructural database and their classification by abdul qahar
Structural database and their classification by abdul qahar
 
Protein database
Protein  databaseProtein  database
Protein database
 
PIR- Protein Information Resource
PIR- Protein Information ResourcePIR- Protein Information Resource
PIR- Protein Information Resource
 
Introduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASEIntroduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASE
 
Important protein databases and proteomics softwares
Important protein databases and proteomics softwaresImportant protein databases and proteomics softwares
Important protein databases and proteomics softwares
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Protein databases
Protein databasesProtein databases
Protein databases
 
PROTEIN DATABASE
PROTEIN DATABASEPROTEIN DATABASE
PROTEIN DATABASE
 
Protein Sequence Databases
Protein Sequence Databases Protein Sequence Databases
Protein Sequence Databases
 
Biological databases
Biological databases Biological databases
Biological databases
 
Primary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptxPrimary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptx
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
database retrival.pdf
database retrival.pdfdatabase retrival.pdf
database retrival.pdf
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 

More from Rajpal Choudhary

More from Rajpal Choudhary (20)

Chromosome theory of inheritance
Chromosome theory of inheritanceChromosome theory of inheritance
Chromosome theory of inheritance
 
BLAST Search tool
BLAST Search toolBLAST Search tool
BLAST Search tool
 
Different Levels of protein
Different Levels of proteinDifferent Levels of protein
Different Levels of protein
 
Hybridoma technology and production of monoclonal antibody
Hybridoma technology and production of monoclonal antibodyHybridoma technology and production of monoclonal antibody
Hybridoma technology and production of monoclonal antibody
 
Linkage analysis and genome mapping
Linkage analysis and genome mappingLinkage analysis and genome mapping
Linkage analysis and genome mapping
 
Cancer genetics and diagnosis
Cancer genetics and diagnosisCancer genetics and diagnosis
Cancer genetics and diagnosis
 
Microbial and chemical analysis of potable water
Microbial and chemical analysis of potable waterMicrobial and chemical analysis of potable water
Microbial and chemical analysis of potable water
 
Animal cell culture and its techniques
Animal cell culture and its techniquesAnimal cell culture and its techniques
Animal cell culture and its techniques
 
Epistasis and its different types
Epistasis and its different typesEpistasis and its different types
Epistasis and its different types
 
Escherichia coli as water indicator
Escherichia coli as water indicatorEscherichia coli as water indicator
Escherichia coli as water indicator
 
Advanced techniques in animal cell culture
Advanced techniques in animal cell cultureAdvanced techniques in animal cell culture
Advanced techniques in animal cell culture
 
Cell wall structure and function
Cell wall structure and functionCell wall structure and function
Cell wall structure and function
 
Vaccines AND THEIR ROLE
Vaccines AND THEIR ROLEVaccines AND THEIR ROLE
Vaccines AND THEIR ROLE
 
Enzyme inhibition AND ITS TYPES
Enzyme inhibition AND ITS TYPES Enzyme inhibition AND ITS TYPES
Enzyme inhibition AND ITS TYPES
 
Elisa AND ITS APPLICATION
Elisa AND ITS APPLICATIONElisa AND ITS APPLICATION
Elisa AND ITS APPLICATION
 
Biofertilizer and biopesticides
Biofertilizer and biopesticidesBiofertilizer and biopesticides
Biofertilizer and biopesticides
 
Adenoviral cloning vectors
Adenoviral cloning vectorsAdenoviral cloning vectors
Adenoviral cloning vectors
 
Antigen processing and presentation
Antigen processing and presentationAntigen processing and presentation
Antigen processing and presentation
 
Agrobacterium tumefaciens
Agrobacterium tumefaciensAgrobacterium tumefaciens
Agrobacterium tumefaciens
 
Antigen antibody interaction
Antigen antibody interactionAntigen antibody interaction
Antigen antibody interaction
 

Recently uploaded

The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Silpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 

Recently uploaded (20)

Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Plasmid: types, structure and functions.
Plasmid: types, structure and functions.Plasmid: types, structure and functions.
Plasmid: types, structure and functions.
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
Genome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptxGenome organization in virus,bacteria and eukaryotes.pptx
Genome organization in virus,bacteria and eukaryotes.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 

Protein database

  • 1. Protein Database Presented by:- Rajpal Choudhary M.Sc. Biotechnology 2nd year 161103004
  • 2. Protein database • SWISS-PROT: Annotated Sequence Database • TrEMBL: Database of EMBL nucleotide translated sequences • InterPro: Integrated resource for protein families, domains and functional sites. • CluSTr: Offers an automatic classification of SWISS-PROT and TrEMBL. • IPI: A non-redundant human proteome set constructed from SWISS-PROT, TrEMBL, Ensembl and RefSeq. • GOA: Provides assignments of gene products to the Gene Ontology (GO) resource. • Proteome Analysis: Statistical and comparative analysis of the predicted proteomes of fully sequenced organisms • Protein Profiles: Tables of SWISS-PROT and TrEMBL entries and alignments for the protein families of the Protein Profile.
  • 3. Swiss-Prot • Annotated protein sequence database established in 1986 and maintained collaboratively since 1987, by the Department of Medical Biochemistry of the University of Geneva and EBI • Complete, Curated, Non-redundant and cross-referenced with 34 other databases • Highly cross-referenced • Available from a variety of servers and through sequence analysis software tools • More than 8,000 different species • First 20 species represent about 42% of all sequences in the database • More than 1,29,000 entries with 4.7 X 1010 amino acids • More than 6,22,000 entries in TrEMBL
  • 4. TrEMBL (Translation of EMBL) • Computer-annotated supplement to SWISS-PROT, as it is impossible to cope with the flow of data. • Well-structure SWISS-PROT-like resource • Derived from automated EMBL CDS translation maintained at the EBI, UK. • TrEMBL is automatically generated and annotated using software tools (incompatible with the SWISS-PROT in terms of quality) • TrEMBL contains all what is not yet in SWISS-PROT
  • 9. GenBank (http://www.ncbi.nlm.nih.gov/genbank/) • The GenBank is a sequence database that stores nucleotide sequences and the proteins obtained from them by translations. This database is maintained by National Center for Biotechnology Information (NCBI). As of April 2011, the GenBank contains approximately 126,551,501,141 numbers of bases in 135,440,924 numbers of sequences. Each sequence submitted to GenBank is assigned a unique GenBank identifier or GenBank accession number.
  • 10. Structure Databases •MSD: Macromolecular Structure Database - A relational database representation of clean Protein Data Bank (PDB). •3DSeq: 3D sequence alignment server- Annotation of the alignments between sequence database and the PDB. •FSSP: Based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB) •DALI: Fold Classification based on Structure-Structure Assignments. •3Dee: Database of protein domain definitions wherein the domains have been clustered on sequence and structural similarity. •NDB: Nucleic Acid Structure Database
  • 12. Protein Data Bank (PDB) • Important in solving real problems in molecular biology • Protein Databank • PDB Established in 1972 at Brookhaven National Laboratory (BNL) • Sole international repository of macromolecular structure data • Moved to Research Collaboratory for Structural Bioinformatics http://www.rcsb.org/
  • 13. Effective use of PDB • Queries are of three types • PDBid - As quoted in paper • Search Lite - one or more keywords • Search Fields - A detailed query form • Query results • Structure Explorer - details of the structure • Query Result Browser - for multiple structures • PDB Viewer
  • 14.
  • 15.
  • 16. SCOP (Structural Classification of Protein) (http://scop.mrc-lmb.cam.ac.uk/scop/) • The Structural Classification of Proteins (SCOP) database is basically a database with manual classification of protein structural domains. The whole concept is based on similarities of the amino acid sequences and three- dimensional structures of the proteins. The database was originally published in 1995 and it is usually updated at least once yearly by Alexei G. Murzin and his colleagues.
  • 17. • SCOP database uses the following protein structural hierarchy: • Class—It is the general structural architecture of the protein domains. • Fold—It represents similar arrangement of regular secondary structures but without evidence of evolutionary relatedness. • Superfamily—It represents whether the protein structures have sufficient structural and functional similarities to each other to infer a divergent evolutionary relation- ship but not necessarily a detectable sequence homology. • Family—Proteins belonging to the same family share some sequence similarity.
  • 18. • SCOP has the following classes: • 1) Proteins with mostly α-helical domains; • 2) Proteins with mostly β-sheet domains; • 3) Proteins with α/β domains which contain beta- alpha-beta structural units or motifs that form mainly parallel β-sheets; • 4) Proteins with mostly α + β domains consisting of independent α- helices and mainly antiparallel β-sheets;
  • 19. CATH (Class Architecture Topology Homology) (http://www.cathdb.info/) • The CATH Protein Structure Classification method is a semi-automatic, hierarchical classification of protein domains. The database was first published in 1997 by Christine Orengo, Janet Thornton and their colleagues. CATH carries many broad features with its principal rival, SCOP. However there are also many areas in which the detailed classifications in the two databases differ greatly.
  • 20. • CATH clusters proteins at four major levels: • 1) Class (C): Class is derived from secondary structure contents of proteins. It is assigned for more than 90% of protein structures automatically. • 2) Architecture (A): Architecture describes the gross orientation of secondary structures, independent of connectivity in proteins. • 3) Topology (T): Topology level of CATH clusters protein structures according to their topological connections & numbers of secondary structures. • 4) Homologous Superfamily (H): Homologous super- families of CATH cluster the proteins with highly similar structures & functions.
  • 21. Conclusion The different databases discussed here provide different information. PDB gives both structural and sequence information of macromolecules whereas SCOP & CATH have structures based on their evolutionary relationships and folding classes. The structural classifications of proteins are generally obtained from SCOP and CATH. On the other hand, the UniProt/Swissprot database provides the sequence annotations of proteins along with links to the external databases like PDB. All these databases are increasing day by day. There are other databases and some new databases are coming. But the databases discussed here are considered to be the foundation stones of bioinformatics.
  • 22. REFERENCES • G. Murzin, S. E. Brenner, T. Hubbard and C. Chothia, “SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures,” Jour- nal of Molecular Biology, Vol. 247, No. 4, 1995, pp. 536- 540. doi:10.1016/S0022-2836(05)80134-2 • L. Lo Conte, S. E. Brenner, T. J. Hubbard, C. Chothia and A. G. Murzin, “SCOP Database in 2002: Refinements Accommodate Structural Genomics,” Nucleic Acids Re- search, Vol. 30, No. 1, 2002, pp. 264-267. doi:10.1093/nar/30.1.264 • Andreeva, D. Howorth, S. E. Brenner, T. J. Hubbard, C. Chothia and A. G. Murzin, “SCOP Database in 2004: Re- finements Integrate Structure and Sequence Family Data,” Nucleic Acids Research, Vol. 32, Suppl. 1, 2004, pp. D226-D229. doi:10.1093/nar/gkh039 • R. Day, D. A. Beck, R. S. Armen and V. Daggett, “A Consensus View of Fold Space: Combining SCOP, CATH, and the Dali Domain Dictionary,” Protein Scien- ce, Vol. 12, No. 10, 2003, pp. 2150-2160. doi:10.1110/ps.0306803