Biological Database Systems
Biological Database Systems: Presentation Transcript

  • Biological Database Systems Denis Shestakov, University of Turku/Tampere
  • Course Information
    • Course structure:
      • Lectures: approx. 12 (plus today’s intro and a review lecture at the end of the course)
      • Project work: details will be given next time
      • Exam: easy to pass if project is done
      • URL:
  • Course Information
    • Dates:
      • Period 2: 27.11, 4.12, 11.12
      • Period 3: 10 meetings on Mondays/Wednesdays
    • Contact info:
      • Email:
      • ICT, B6019: at 15-18 on Tuesdays
  • Course Information: Literature
    • Slides
    • References at the end of the slides
    • Books:
      • Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, Morgan Kaufmann, 2003 ISBN-10: 155860829X
      • Database System Concepts, 5th edition by Silberschatz, Korth & Sudarshan, McGraw-Hill, 2005 ISBN-10: 0072958863
    • Articles:
      • Biological database design and implementation by Birney & Clamp (the Ensembl project), Briefings in Bioinformatics, 5(1):31-38, 2004
  • Biological Database Systems
    • 1.1. Course Content
    • 1.2. Course Objectives
    • 1.3. Database and DBMS
    • 1.4. Biological Databases
  • Course content: main topics
    • Database concepts, database design process
    • Relational data model
    • Introduction to SQL
    • XML and XML-based databases
    • Data structures for biological data: storage and querying
    • Model organism databases
  • Course content: main topics
    • LIMS, BioPostgres
    • Analysis workflows, web services
    • Integration of biological data
    • Integration of biological data, example of integration system
    • Research issues in scientific databases
    • * Project discussion, exam preparation
  • Course focus
    • Database issues:
      • Biology-specific
      • Representation of biological data
      • Design of biological databases
    • NOT about:
      • Usage of existing databases
      • Accessing/retrieving data from bio-databases
  • Course goal
    • Give basic knowledge of biological* database design
    * - for molecular biology
  • Do you need to know that?
    • Work in “wet” laboratory:
      • One bioinformatician and many biologists
      • Likely to be IT guru for others
      • Expect to answer IT-related questions
    • Work in bioinformatics lab:
      • Many bioinformaticians
      • Group may maintain several dbs
      • Basics are helpful
    • Create/maintain biological databases
      • Start learning!
      • Ask for more information
  • Database? From Merriam-Webster dictionary: (http://www.merriam-webster.com/dictionary/database)
  • Database?
    • A collection of data:
      • structured
      • searchable (i.e., indexable)
      • updated
      • cross-referenced
    • Objective:
      • Transform “meaningless” raw data into useful information which can be accessed and analysed in the best way
    • Database Management System (DBMS):
      • software designed for the purpose of managing databases (access, insert, delete, update, etc.)
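The properties listed on this slide (structured, searchable, updated, cross-referenced) can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import sqlite3

# In-memory database; every table, column, and row here is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce cross-references

conn.executescript("""
-- Structured: each table has a fixed set of typed columns.
CREATE TABLE gene (
    gene_id  INTEGER PRIMARY KEY,
    symbol   TEXT NOT NULL UNIQUE,
    organism TEXT NOT NULL
);
-- Cross-referenced: every protein row points at a gene row.
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    name       TEXT NOT NULL
);
-- Searchable: an index speeds up lookups by organism.
CREATE INDEX idx_gene_organism ON gene(organism);
""")

conn.execute("INSERT INTO gene VALUES (1, 'geneA', 'Organism one')")
conn.execute("INSERT INTO gene VALUES (2, 'geneB', 'Organism two')")
conn.execute("INSERT INTO protein VALUES (1, 1, 'protein A1')")
conn.execute("INSERT INTO protein VALUES (2, 1, 'protein A2')")
conn.execute("INSERT INTO protein VALUES (3, 2, 'protein B1')")

# Updated: existing data can be corrected in place.
conn.execute("UPDATE gene SET organism = 'Organism 2' WHERE gene_id = 2")

# Follow the cross-reference from gene to protein for one organism.
rows = conn.execute(
    """
    SELECT g.symbol, p.name
    FROM gene AS g JOIN protein AS p ON p.gene_id = g.gene_id
    WHERE g.organism = ?
    ORDER BY p.name
    """,
    ("Organism one",),
).fetchall()
print(rows)
```

The join is what turns two separate lists into "useful information": neither table alone says which proteins belong to which organism.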
  • DBMS
    • A set of tools that:
      • Store
      • Extract
      • Modify
    [Figure: users store, extract, and modify data in the database through the DBMS]
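As a rough sketch of the three DBMS operations named on this slide, the functions below wrap store, extract, and modify around a single table in SQLite. The function names, the `sequence` table, and the accession `X00001` are hypothetical placeholders, not part of any real biological database.

```python
import sqlite3


def open_db():
    """Create an in-memory database with one illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence (accession TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    return conn


def store(conn, accession, residues):
    """Store: insert a new record (committed when the block exits)."""
    with conn:
        conn.execute("INSERT INTO sequence VALUES (?, ?)", (accession, residues))


def extract(conn, accession):
    """Extract: return the residues for an accession, or None if absent."""
    row = conn.execute(
        "SELECT residues FROM sequence WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def modify(conn, accession, residues):
    """Modify: overwrite an existing record; True if exactly one row changed."""
    with conn:
        cur = conn.execute(
            "UPDATE sequence SET residues = ? WHERE accession = ?",
            (residues, accession),
        )
    return cur.rowcount == 1


conn = open_db()
store(conn, "X00001", "ACGT")
before = extract(conn, "X00001")
changed = modify(conn, "X00001", "ACGTTT")
after = extract(conn, "X00001")
print(before, changed, after)
```

Users never touch the stored file directly; every access goes through these operations, which is the role the slide assigns to the DBMS.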
  • Biological Databases?
    • Explosive growth in biological data
    • E.g., a tremendous increase in nucleotide sequences (the first surge in data followed the development of the polymerase chain reaction (PCR) technique in 1983)
    • 1980: 80 genes fully sequenced
  • Biological Databases?
    • EMBL Database Growth:
      • Total nucleotides (Nov 07): 188,490,792,445
      • Number of entries (Nov 07): 106,144,026
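One quick way to read the two November 2007 totals together is to divide them, which gives the average length of an EMBL entry. The snippet below is only this back-of-the-envelope calculation; the variable names are illustrative.

```python
# Figures quoted on the slide for the EMBL database, November 2007.
total_nucleotides = 188_490_792_445
number_of_entries = 106_144_026

# Average entry length in nucleotides (works out to a little under 1,800).
average_entry_length = total_nucleotides / number_of_entries
print(f"{average_entry_length:.1f} nucleotides per entry on average")
```

An average hides a very wide spread, since entries range from short fragments to long genomic records.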
  • Biological Databases?
    • Data (genomic sequences, 3D structures, 2D gel analysis, microarrays, etc.) directly submitted to databases
    • Essential tools for biological research, like reading relevant literature
  • Biological Databases: History
    • 1965
      • Margaret Dayhoff et al. publish “Atlas of Protein Sequence and Structure”
    • 1982
      • EMBL initiates a DNA sequence database, followed within a year by GenBank and in 1984 by the DNA Data Bank of Japan (DDBJ)
    • 1988
      • EMBL/GenBank/DDBJ agree on common format for data elements
  • Biological Databases: some statistics
    • More than 1000 different databases
      • 968 databases reported in The Molecular Biology Database Collection: 2007 update by Galperin, Nucleic Acids Research, 2007, Vol. 35, Database issue D3-D4
      • Metabase: database of biological databases, http://biodatabase.org/index.php/Main_Page
    • Database sizes: <100kB to >100GB (EMBL >500GB)
      • DNA: >100GB
      • Protein: 1GB
      • 3D structure: 5GB
    • Update frequency: daily to annually
    • Freely accessible (as a rule)
  • Some databases in the field of molecular biology
    • AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
    • ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
    • BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
    • BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
    • CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
    • ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
    • CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
    • Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
    • ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
    • ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
    • GCRDB, GDB, GENATLAS, GenBank, GeneCards,
    • Genline, GenLink, GENOTK, GenProtEC, GIFTS,
    • GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
    • HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
    • HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
    • HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
    • KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
    • Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
    • Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,
    • MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
    • OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,
    • PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,
    • PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
    • PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,
    • SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase,
    • SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D,
    • SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE,
    • SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,
    • TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,
    • VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,
    • YPM, etc …
    Find more at http://biodatabase.org
  • Categories of Biological Databases
    • Nucleotide sequences
    • Genomics
    • Mutation/polymorphism
    • Protein sequences
    • Protein domain/family
    • Proteomics (2D gel, MS)
  • Categories of Biological Databases
    • Microarray
    • Organism-specific
    • 3D structure
    • Metabolism
    • Bibliography
    • Others
  • Biological Databases: special features
    • Autonomous: many independent maintainers
    • Heterogeneous data formats: e.g., various data formats for the same data elements
    • Dynamic: frequent and continuous changes in data content (and, more importantly, in data schema)
    • Broad domain knowledge
    • Workflow-oriented: databases + rich set of analysis tools
    • Information integration is essential: aggregate data from several databases
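To make the heterogeneity and integration points concrete, here is a minimal sketch in which the same kind of sequence record arrives in two formats and is normalized into one structure before merging. The FASTA parser is deliberately simplified, the "tagged" format (two-letter tags `AC`, `DE`, `SQ` and a `//` terminator) is a hypothetical simplification loosely inspired by flat-file layouts rather than a real specification, and all accessions and sequences are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Common representation shared by all sources."""
    accession: str
    description: str
    residues: str


def _from_fasta(header, chunks):
    accession, _, description = header.partition(" ")
    return SequenceRecord(accession, description, "".join(chunks).upper())


def parse_fasta(text):
    """Parse simplified FASTA: '>accession description' then sequence lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_from_fasta(header, chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append(_from_fasta(header, chunks))
    return records


def parse_tagged(text):
    """Parse the hypothetical 'TAG value' format; '//' ends each record."""
    records, fields = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "//":
            records.append(SequenceRecord(
                fields["AC"], fields.get("DE", ""), fields.get("SQ", "").upper()))
            fields = {}
        else:
            tag, _, value = line.partition(" ")
            fields[tag] = fields.get(tag, "") + value.strip()
    return records


def integrate(*sources):
    """Merge sources by accession; the first source to supply a record wins."""
    merged = {}
    for records in sources:
        for record in records:
            merged.setdefault(record.accession, record)
    return merged


fasta_text = ">X00001 hypothetical gene A\nacgt\nacgt\n>X00002 hypothetical gene B\nttga\n"
tagged_text = (
    "AC X00002\nDE hypothetical gene B\nSQ ttga\n//\n"
    "AC X00003\nDE hypothetical gene C\nSQ ggcc\nSQ aatt\n//\n"
)

merged = integrate(parse_fasta(fasta_text), parse_tagged(tagged_text))
print(sorted(merged))
```

Real integration systems also have to reconcile conflicting values and schema changes over time; the "first source wins" rule here is only the simplest possible policy.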
  • Biological Databases: integration
    • Figure taken from Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, p. 20