Intro to Biological Databases
BIO 410
Elmhurst University
Examples
https://microbenotes.com/biological-databases-types-and-importance/
Three types of databases
 • Primary
 • Secondary
 • Tertiary
Primary Databases
 • Information (usually sequences and metadata) submitted directly by researchers from experiments
 • Nucleotide sequences:
   • GenBank: GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences
   • DDBJ (Japan) and EMBL (Europe; its nucleotide archive is now the ENA), which exchange data with GenBank through the INSDC
 • Protein sequences:
   • Swiss-Prot, TrEMBL and PIR-PSD (now all subsumed into UniProt)
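
As a rough illustration of how a primary nucleotide record can be retrieved programmatically, the sketch below builds a request URL for NCBI's E-utilities `efetch` endpoint and parses FASTA text. The accession `EXAMPLE0001.1` and the sample sequence are made-up placeholders, and the helper names `build_efetch_url` and `parse_fasta` are hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for one accession (helper name is hypothetical)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder accession: substitute a real GenBank accession before use.
url = build_efetch_url("EXAMPLE0001.1")

# Made-up FASTA text standing in for what the service would return.
sample = ">EXAMPLE0001.1 hypothetical sequence\nATGCGT\nTTAGC\n"
records = parse_fasta(sample)
```

To fetch real data, the URL could be opened with `urllib.request.urlopen` and the decoded response passed to `parse_fasta`.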
Secondary Databases
 • Data derived from post-processing and analysis of data in primary databases, mostly related to protein structure and function
 • UniProt: a combined resource with extensive protein information, such as structure and function (we’ll explore this database later)
 • PDB (protein structure)
 • InterPro (protein families, domains and sites)
 • Other derived predictions (e.g., domains and motifs)
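
A minimal sketch of working with a UniProt entry: FASTA headers from UniProt start with `sp` for Swiss-Prot (reviewed) entries or `tr` for TrEMBL (unreviewed) entries, followed by the accession and the entry name. The function names below are hypothetical, and the header shown is abbreviated for illustration.

```python
def uniprot_fasta_url(accession):
    """URL of the FASTA file for one UniProtKB accession (REST service)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_uniprot_header(header):
    """Split a UniProt FASTA header into its parts.

    Assumes the 'db|accession|entry_name description' layout; any other
    layout raises ValueError when the identifier is unpacked.
    """
    identifier, _, description = header.lstrip(">").partition(" ")
    db, accession, entry_name = identifier.split("|")
    sources = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}
    return {
        "source": sources.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Abbreviated header for human hemoglobin subunit alpha.
info = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
)
fasta_url = uniprot_fasta_url(info["accession"])
```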
Tertiary/Derived/Composite Databases
 • Combine information from multiple databases and integrate the results
 • Gene expression (GEO)
 • Gene ontology (GO)
 • Metabolic pathways (KEGG, PathDB)
 • Functional elements of the human genome (ENCODE)
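
The idea of combining information from several databases can be sketched as a join on a shared key, here a gene identifier. All gene names, term labels and pathway labels below are made-up placeholders, not real annotations, and `integrate` is a hypothetical helper rather than any database's actual method.

```python
from collections import defaultdict

# Made-up rows standing in for exports from two separate databases.
go_annotations = [
    ("GENE1", "GO:example_term_a"),
    ("GENE1", "GO:example_term_b"),
    ("GENE2", "GO:example_term_a"),
]
pathway_memberships = [
    ("GENE1", "pathway_x"),
    ("GENE3", "pathway_y"),
]


def integrate(go_rows, pathway_rows):
    """Merge two annotation sources into one record per gene."""
    merged = defaultdict(lambda: {"go_terms": [], "pathways": []})
    for gene, term in go_rows:
        merged[gene]["go_terms"].append(term)
    for gene, pathway in pathway_rows:
        merged[gene]["pathways"].append(pathway)
    return dict(merged)


combined = integrate(go_annotations, pathway_memberships)
```

Genes that appear in only one source still get a record, with an empty list for the source that lacks them.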
Let’s explore a couple of these
 • NCBI GenBank
 • RefSeq
 • PDB
 • UniProt
 • KEGG
 • GO
 • GEO
 • ENCODE
Now let’s do a couple of tutorials
