1/22/2024 Computational Structural Biology (BIO455) - CC 86
BIO455: Protein databases
Biological databases
Biological data are complex, exception-ridden, vast, and incomplete. A biological database is a collection of
biological data arranged in computer-readable form so that it can be searched and retrieved quickly and
used conveniently.
The main purpose of a biological database is to store and manage biological data and information in
computer-readable form.
Biological databases can be used to retrieve a range of information, such as:
• biological sequences
• structures
• binding sites
• metabolic interactions
• molecular action
• functional relationships
• protein families, motifs, and homologs
Primary databases
• A primary database can also be called an archival database, since it archives the experimental results
submitted by scientists.
• It is populated with experimentally derived data such as genome sequences and macromolecular
structures. The submitted data remain uncurated (the database does not modify them).
• It contains unique data obtained in the laboratory, and these data are made accessible to users
without any change.
• Each entry is given an accession number when it enters the database, and the same entry can later be
retrieved using that number. The accession number identifies the entry uniquely and never changes.
Examples:
Nucleic acid databases: GenBank and DDBJ
Protein databases: PDB, Swiss-Prot, PIR, TrEMBL, MetaCyc, etc.
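Retrieval by accession number can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the two URL templates are assumptions based on the public download services of UniProt and RCSB PDB and may change, and the helper names `entry_url` and `fetch_entry` are hypothetical.

```python
from urllib.request import urlopen

# Assumed URL templates for two public archives (not taken from the slides).
URL_TEMPLATES = {
    "uniprot": "https://rest.uniprot.org/uniprotkb/{acc}.fasta",
    "pdb": "https://files.rcsb.org/download/{acc}.pdb",
}


def entry_url(database: str, accession: str) -> str:
    """Build the download URL for one entry from its accession number."""
    template = URL_TEMPLATES[database.lower()]
    # Accession numbers are stable identifiers; normalise case and whitespace.
    return template.format(acc=accession.strip().upper())


def fetch_entry(database: str, accession: str, timeout: float = 30.0) -> str:
    """Download the entry text (requires network access)."""
    with urlopen(entry_url(database, accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(entry_url("pdb", "4hhb"))
    print(entry_url("uniprot", "P69905"))
```

Because the accession number never changes, the same call should keep returning the same entry even as the database grows.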
Secondary databases
• The data stored in a secondary database are the results of analyzing primary-database entries.
• Computational algorithms are applied to the primary data, and the meaningful, informative results are
stored in the secondary database.
• The data are highly curated (processed before they are presented in the database).
• Because of this analysis and curation, a secondary database usually carries more interpreted knowledge
per entry than the primary database it draws on.
• Examples:
InterPro (protein families, motifs, and domains)
UniProt Knowledgebase (sequence and functional information on proteins)
Derived databases
• The data entered in a derived database are first compared and then filtered according to the desired criteria.
• The initial data are taken from primary databases and then merged according to certain conditions.
• Derived databases contain non-redundant data, which makes sequence searches faster.
Examples:
SCOP, CATH, KEGG
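The compare, filter, and merge steps can be illustrated with a toy example. The sketch below assumes that "redundant" means "identical sequence" and that the filter is a minimum length; real derived databases apply far richer criteria. The function name `merge_non_redundant` and the record layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (accession, sequence)


def merge_non_redundant(*sources: Iterable[Record],
                        min_length: int = 1) -> Dict[str, List[str]]:
    """Merge records from several sources into a non-redundant set.

    Identical sequences are collapsed into one entry that remembers every
    accession it came from; sequences shorter than min_length are dropped.
    """
    merged: Dict[str, List[str]] = {}
    for records in sources:
        for accession, sequence in records:
            seq = sequence.upper()          # compare case-insensitively
            if len(seq) < min_length:       # filter on the chosen criterion
                continue
            merged.setdefault(seq, []).append(accession)  # merge duplicates
    return merged


if __name__ == "__main__":
    source_a = [("A1", "MKV"), ("A2", "mkv")]
    source_b = [("B1", "MKV"), ("B2", "GG")]
    print(merge_non_redundant(source_a, source_b, min_length=3))
```

Keying the result on the sequence itself means a search touches each distinct sequence only once, which is where the speed-up comes from.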
Protein Sequence Databases
PIR (https://proteininformationresource.org/)
• PIR (Protein Information Resource) is a widely used protein sequence database that provides
functionally annotated protein sequences.
• PIR maintains three databases: the Protein Sequence Database (PSD), the Non-redundant Reference (NREF)
sequence database, and the integrated Protein Classification (iProClass) database. Together they contain
annotated protein sequences, classification information, and protein family, function, and structure
information.
Swiss-Prot (integrated with UniProt)
• Swiss-Prot is a protein sequence database that provides a high level of annotation, including
information on a protein's function, domain structure, post-translational modifications, and variants.
• Swiss-Prot is jointly managed by the SIB (Swiss Institute of Bioinformatics) and the EBI (European
Bioinformatics Institute).
• The database distinguishes itself from other protein sequence databases by three criteria:
(i) annotation, which covers a broad range of information,
(ii) minimal redundancy, so that each sequence is, as far as possible, represented only once, and
(iii) integration with other databases, which enables cross-referencing and retrieval of information from
related databases.
TrEMBL
• TrEMBL is a computer-annotated supplement to Swiss-Prot. TrEMBL entries follow the Swiss-Prot format.
• It contains the translations of all EMBL (European Molecular Biology Laboratory) nucleotide sequence entries
that have not yet been integrated into Swiss-Prot.
Protein Structure Databases: Protein Data Bank (PDB)
Protein structure databases are collections of information on the three-dimensional structure and
secondary structure of proteins.
• The Protein Data Bank (PDB) is a database of three-dimensional structural data for large biological
molecules, such as proteins and nucleic acids.
• The data are typically obtained by X-ray crystallography, NMR spectroscopy, or, increasingly, cryo-electron
microscopy, and are submitted by biologists and biochemists from around the world. They are freely
accessible on the Internet via the websites of the member organisations: PDBe, PDBj, RCSB, and BMRB.
• Most major scientific journals and some funding agencies now require scientists to submit their structure
data to the PDB.
• Many other databases use protein structures deposited in the PDB. For example, SCOP and CATH classify
protein structures, while PDBsum provides a graphic overview of PDB entries using information from
other sources, such as Gene Ontology.
Websites: www.wwpdb.org, ebi.ac.uk, www.rcsb.org, bmrb.io, pdbj.org
Structural Classification of Proteins (SCOP) database
http://scop.mrc-lmb.cam.ac.uk/scop/
• SCOP is a largely manual classification of protein structural domains based on similarities of their
structures and amino acid sequences.
• A motivation for this classification is to determine the evolutionary relationships between proteins.
• Proteins with the same shape but little sequence or functional similarity are placed in different
superfamilies, and are assumed to have only a very distant common ancestor.
• Proteins with the same shape and some similarity of sequence and/or function are placed in "families",
and are assumed to have a closer common ancestor.
Levels of the SCOP hierarchy:
I. Class: the broad type of fold, defined by secondary-structure content, e.g., all-beta.
II. Fold: the different shapes of domains within a class.
III. Superfamily: the domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
IV. Family: the domains in a superfamily are grouped into families, which have a more recent common ancestor.
V. Protein domain: the domains in a family are grouped into protein domains, which are essentially the same protein.
VI. Species: the domains in a "protein domain" are grouped according to species.
VII. Domain: part of a protein. For simple proteins, it can be the entire protein.
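The upper four levels of the hierarchy can be made concrete with a short Python sketch. It assumes the dotted "class.fold.superfamily.family" string that SCOP attaches to each domain (the sccs identifier, which is not shown on the slides); the identifiers in the example and the helper names are illustrative only.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

# Class letters as used in SCOP identifiers (a-g cover the seven main classes).
SCOP_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
    "e": "Multi-domain proteins (alpha and beta)",
    "f": "Membrane and cell surface proteins and peptides",
    "g": "Small proteins",
}


def parse_sccs(sccs: str) -> dict:
    """Split an identifier such as 'a.1.1.2' into its four levels."""
    parts = sccs.strip().split(".")
    if len(parts) != len(SCOP_LEVELS):
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return dict(zip(SCOP_LEVELS, parts))


def deepest_shared_level(sccs_a: str, sccs_b: str):
    """Return the most specific level two domains share, or None."""
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    shared = None
    for level in SCOP_LEVELS:
        if a[level] != b[level]:
            break  # lower levels only count if every higher level matches
        shared = level
    return shared


if __name__ == "__main__":
    # Same fold, different superfamily: only a very distant ancestor is assumed.
    print(deepest_shared_level("a.1.1.2", "a.1.2.1"))
```

Two domains that share only the fold level have similar shapes without demonstrated homology, while a shared superfamily or family implies progressively closer common ancestry.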
Classes
1. All alpha proteins: domains consisting of α-helices
2. All beta proteins: domains consisting of β-sheets
3. Alpha and beta proteins (a/b): mainly parallel beta sheets (beta-alpha-beta units)
4. Alpha and beta proteins (a+b): mainly antiparallel beta sheets (segregated alpha and beta regions)
5. Multi-domain proteins (alpha and beta): folds consisting of two or more domains belonging to different classes
6. Membrane and cell surface proteins and peptides: does not include proteins in the immune system
7. Small proteins: usually dominated by metal ligands, cofactors, and/or disulfide bridges
Folds
• Each class contains a number of distinct folds. This classification level indicates similar tertiary structure,
but not necessarily evolutionary relatedness.
• For example, the "All-α proteins" class contains >280 distinct folds, including:
  ◦ Globin-like (core: 6 helices; folded leaf, partly opened),
  ◦ long alpha-hairpin (2 helices; antiparallel hairpin, left-handed twist), and
  ◦ Type I dockerin domains (tandem repeat of two calcium-binding loop-helix motifs)
Superfamily
• Domains within a fold are further classified into superfamilies.
• A superfamily is the largest grouping of proteins for which structural similarity is sufficient to indicate
evolutionary relatedness, and therefore a common ancestor.
• For example, the two superfamilies of the "Globin-like" fold are the Globin-like superfamily and the
alpha-helical ferredoxin superfamily.
CATH database
cathdb.info
The CATH Protein Structure Classification database is a free, publicly available online resource that provides
information on the evolutionary relationships of protein domains.
The four main levels of the CATH hierarchy (Class, Architecture, Topology, Homologous superfamily):
1. Class: the overall secondary-structure content of the domain (equivalent to the SCOP class).
2. Architecture: the overall arrangement of secondary structures in three-dimensional space, regardless of
their connectivity. Domains share high structural similarity, but there is no evidence of homology.
3. Topology/fold: a grouping of domains that share both the overall shape and the connectivity of their
secondary structures (equivalent to the 'fold' level in SCOP).
4. Homologous superfamily: indicative of a demonstrable evolutionary relationship (equivalent to the SCOP
superfamily).
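The four levels map directly onto the dotted numeric code that CATH assigns to each superfamily. The sketch below is a minimal parser for such codes; the code format and the class names in `CATH_CLASSES` are stated from general knowledge of CATH rather than from the slides, and the helper names are hypothetical.

```python
from typing import Dict, NamedTuple

# Names of the main CATH classes, keyed by the first number of the code.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Parse a four-part code such as '1.10.490.10' (C.A.T.H)."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def level_paths(code: str) -> Dict[str, str]:
    """Return the identifier of each ancestor level for one superfamily."""
    c = parse_cath_code(code)
    return {
        "class": f"{c.class_}",
        "architecture": f"{c.class_}.{c.architecture}",
        "topology": f"{c.class_}.{c.architecture}.{c.topology}",
        "superfamily": f"{c.class_}.{c.architecture}.{c.topology}.{c.superfamily}",
    }


if __name__ == "__main__":
    parsed = parse_cath_code("1.10.490.10")
    print(parsed, CATH_CLASSES.get(parsed.class_, "Other"))
    print(level_paths("1.10.490.10"))
```

Each prefix of the code identifies one node of the hierarchy, so two domains whose codes share the first three numbers have the same topology but belong to different homologous superfamilies.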
Protein-Protein Interaction Databases
• Protein-protein interaction databases are collections of information on the interactions between proteins.
• They describe the relationships between different proteins and their functions in biological systems.
BIND (https://bio.tools/bind)
• BIND (Biomolecular Interaction Network Database) stores detailed descriptions of interactions,
molecular complexes, and pathways between various biomolecules, including proteins, nucleic acids, and small
molecules.
• The database is designed for data mining and can be used to study networks of interactions and to map
pathways across different species. It can also provide information for kinetic simulations.
DIP (https://dip.doe-mbi.ucla.edu/dip/Main.cgi)
• DIP (Database of Interacting Proteins) contains protein-protein interaction information that has been
compiled through both manual curation and computational methods.
• It is useful for understanding protein functions and their relationships with other proteins. It can also be used to study
the properties of networks of interacting proteins, evaluate predictions of protein-protein interactions, and explore the
evolution of these interactions.
MINT (https://mint.bio.uniroma2.it/)
• MINT (Molecular INTeraction database) stores information on functional interactions between biological
molecules such as proteins, RNA, and DNA.
• It also stores information on enzymatic modifications of partner molecules.
• The database focuses primarily on experimentally verified protein-protein interactions and considers both direct and
indirect relationships.
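Studying a network of interacting proteins usually starts from a list of binary interactions. The sketch below shows one simple representation, an undirected adjacency map, together with two basic network properties. It does not use the export format of BIND, DIP, or MINT; the protein names and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (protein_a, protein_b) pairs into an undirected adjacency map."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)  # interactions are treated as symmetric
    return dict(network)


def degrees(network: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct interaction partners per protein."""
    return {protein: len(partners) for protein, partners in network.items()}


def shared_partners(network: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Proteins that interact with both a and b."""
    return network.get(a, set()) & network.get(b, set())


if __name__ == "__main__":
    pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
    net = build_network(pairs)
    print(degrees(net))
    print(shared_partners(net, "P1", "P2"))
```

Proteins with an unusually high degree are candidate hubs, and shared partners are one simple signal used when evaluating predicted interactions.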
Protein Pattern and Profile Databases
• Protein pattern and profile databases contain information on motifs found in sequences.
• Sequence motifs correspond to structural or functional features in proteins.
• Protein sequence patterns and profiles are therefore a valuable tool for determining the function of proteins.
InterPro (https://www.ebi.ac.uk/interpro/)
• InterPro is a database that contains information on protein families, domains, and functional sites.
• It was created by combining several major protein signature databases, including PROSITE, Pfam, PRINTS, ProDom, and
SMART, into a single comprehensive resource.
PROSITE (https://prosite.expasy.org/)
• PROSITE is a collection of signatures that identify patterns or profiles in proteins, which can provide information on
their biological functions.
• The signatures in the database are linked to annotation documents that provide information on the protein family or
domain detected, including its name, function, 3D structure, and references.
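A PROSITE pattern can be read as a small regular expression over amino-acid letters. The sketch below translates the core pattern syntax into a Python regex and scans a sequence with it. It is a simplified converter, not the PROSITE scanning tool: it assumes the commonly documented syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends) and ignores rarer constructs. The example pattern is the N-glycosylation site motif, and the test sequence is made up.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = pattern.strip().rstrip(".")      # patterns end with a period
    regex = regex.replace("-", "")           # '-' only separates positions
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("x", ".")          # 'x' matches any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence ends
    return regex


def find_motif(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in compiled.finditer(sequence.upper())]


if __name__ == "__main__":
    n_glycosylation = "N-{P}-[ST]-{P}."
    print(prosite_to_regex(n_glycosylation))
    print(find_motif(n_glycosylation, "MNGSANPTQ"))
```

Note that `finditer` reports non-overlapping matches only, so overlapping occurrences of a motif would need a lookahead-based search.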
Metabolic Pathway Databases
Metabolic pathway databases contain information about enzymes, biochemical reactions, and metabolic pathways.
ENZYME (https://enzyme.expasy.org/)
• ENZYME is a database that stores information on enzyme nomenclature.
• Most metabolic databases, as well as other biomolecular databases, use it as the nomenclature source for
enzyme names and reactions.
KEGG (https://www.genome.jp/kegg/pathway.html)
• KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive database that maps out molecular and cellular
pathways involving interactions between genes and molecules.
• It is composed of pathway maps, molecule tables, gene tables, and genome maps, and is used to build functional maps
of metabolic and regulatory pathways.
