Biological Database Systems

Published in: Business, Technology

  1. Biological Database Systems
      Denis Shestakov, University of Turku/Tampere
  2. Course Information
      • Course structure:
          • Lectures: approx. 12 (plus today’s intro and a review lecture at the end of the course)
          • Project work: details will be given next time
          • Exam: easy to pass if the project is done
          • URL:
  3. Course Information
      • Dates:
          • Period 2: 27.11, 4.12, 11.12
          • Period 3: 10 meetings on Mondays/Wednesdays
      • Contact info:
          • Email:
          • ICT, B6019: Tuesdays, 15-18
  4. Course Information: Literature
      • Slides
      • References at the end of the slides
      • Books:
          • Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, Morgan Kaufmann, 2003, ISBN-10: 155860829X
          • Database System Concepts, 5th edition, by Silberschatz, Korth & Sudarshan, McGraw-Hill, 2005, ISBN-10: 0072958863
      • Articles:
          • Biological database design and implementation by Birney & Clamp (the Ensembl project), Briefings in Bioinformatics, 5(1):31-38, 2004
  5. Biological Database Systems
      • 1.1. Course Content
      • 1.2. Course Objectives
      • 1.3. Database and DBMS
      • 1.4. Biological Databases
  6. Course content: main topics
      • Database concepts, database design process
      • Relational data model
      • Introduction to SQL
      • XML and XML-based databases
      • Data structures for biological data: storage and querying
      • Model organism databases
  7. Course content: main topics
      • LIMS, BioPostgres
      • Analysis workflows, web services
      • Integration of biological data
      • Integration of biological data, example of integration system
      • Research issues in scientific databases
      • * Project discussion, exam preparation
  8. Course focus
      • Database issues:
          • Biology-specific
          • Representation of biological data
          • Design of biological databases
      • NOT about:
          • Usage of existing databases
          • Accessing/retrieving data from bio-databases
  9. Course goal
      • Give basic knowledge of biological* database design
      * for molecular biology
  10. Do you need to know that?
      • Work in a “wet” laboratory:
          • One bioinformatician and many biologists
          • Likely to be the IT guru for others
          • Expect to answer IT-related questions
      • Work in a bioinformatics lab:
          • Many bioinformaticians
          • The group may maintain several databases
          • Basics are helpful
      • Create/maintain biological databases:
          • Start learning!
          • Ask for more information
  11. Database?
      From the Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/database)
  12. Database?
      • A collection of data:
          • structured
          • searchable (i.e., indexable)
          • updated
          • cross-referenced
      • Objective:
          • Transform “meaningless” raw data into useful information that can be accessed and analysed efficiently
      • Database Management System (DBMS):
          • software designed for managing databases (access, insert, delete, update, etc.)
  13. DBMS
      • A set of tools that:
          • Store
          • Extract
          • Modify
      [Diagram: users store, extract, and modify data in the database]
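
As a minimal sketch of the store/extract/modify cycle from slides 12 and 13, the following Python snippet uses the standard-library `sqlite3` module as a stand-in DBMS. The table `gene`, its columns, the index name, and the sample rows are illustrative assumptions and are not taken from the course material.

```python
import sqlite3

# In-memory SQLite database as a stand-in DBMS (illustrative choice).
conn = sqlite3.connect(":memory:")

# Structured: a hypothetical table with a fixed set of typed columns.
conn.execute(
    "CREATE TABLE gene ("
    " id INTEGER PRIMARY KEY,"
    " symbol TEXT NOT NULL,"
    " organism TEXT NOT NULL,"
    " description TEXT)"
)
# Searchable: an index on the column we expect to look up by.
conn.execute("CREATE INDEX idx_gene_symbol ON gene (symbol)")

# Store: insert made-up example rows.
conn.executemany(
    "INSERT INTO gene (symbol, organism, description) VALUES (?, ?, ?)",
    [
        ("geneA", "Homo sapiens", "placeholder description"),
        ("geneB", "Mus musculus", "placeholder description"),
    ],
)

# Extract: query by the indexed column.
row = conn.execute(
    "SELECT symbol, organism FROM gene WHERE symbol = ?", ("geneA",)
).fetchone()

# Modify: update one row and delete another.
conn.execute(
    "UPDATE gene SET description = ? WHERE symbol = ?",
    ("updated description", "geneA"),
)
conn.execute("DELETE FROM gene WHERE symbol = ?", ("geneB",))
conn.commit()

remaining = conn.execute("SELECT symbol, description FROM gene").fetchall()
print(row)
print(remaining)
```

The same four operations (insert, select, update, delete) exist in any relational DBMS; SQLite is used here only because it needs no server setup.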
  14. Biological Databases?
      • Explosive growth in biological data
      • E.g., a tremendous increase in nucleotide sequences (the first surge in data followed the development of the polymerase chain reaction (PCR) technique in 1983)
      • 1980: 80 genes fully sequenced
      • …
  15. Biological Databases?
      • EMBL Database Growth:
          • Total nucleotides (Nov 07): 188,490,792,445
          • Number of entries (Nov 07): 106,144,026
  16. Biological Databases?
      • Data (genomic sequences, 3D structures, 2D gel analysis, microarrays, …) are submitted directly to databases
      • Databases are essential tools for biological research, comparable to reading the relevant literature
  17. Biological Databases: History
      • 1965
          • Margaret Dayhoff et al. publish “Atlas of Protein Sequence and Structure”
      • 1982
          • EMBL initiates DNA sequence databases, followed within a year by GenBank and in 1984 by the DNA Database of Japan
      • 1988
          • EMBL/GenBank/DDBJ agree on a common format for data elements
  18. Biological Databases: some statistics
      • More than 1000 different databases
          • 968 databases reported in The Molecular Biology Database Collection: 2007 update by Galperin, Nucleic Acids Research, 2007, Vol. 35, Database issue, D3-D4
          • Metabase: database of biological databases, http://biodatabase.org/index.php/Main_Page
      • Database sizes: <100kB to >100GB (EMBL >500GB)
          • DNA: >100GB
          • Protein: 1GB
          • 3D structure: 5GB
      • Update frequency: daily to annually
      • Freely accessible (as a rule)
  19. Some databases in the field of molecular biology
      AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc.
      Find more at http://biodatabase.org
  20. Categories of Biological Databases
      • Nucleotide sequences
      • Genomics
      • Mutation/polymorphism
      • Protein sequences
      • Protein domain/family
      • Proteomics (2D gel, MS)
  21. Categories of Biological Databases
      • Microarray
      • Organism-specific
      • 3D structure
      • Metabolism
      • Bibliography
      • Others
  22. Categories of Biological Databases
      • Microarray
      • Organism-specific
      • 3D structure
      • Metabolism
      • Bibliography
      • Others
  23. Biological Databases: special features
      • Autonomous: many independent maintainers
      • Heterogeneous: e.g., various data formats for the same data elements
      • Dynamic: frequent and continuous changes in data content (and, more importantly, in data schema)
      • Broad domain knowledge
      • Workflow-oriented: databases + rich set of analysis tools
      • Information integration is essential: aggregate data from several databases
  24. Biological Databases: integration
      [Figure taken from Bioinformatics: Managing Scientific Data by Lacroix & Critchlow, p. 20]
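
The heterogeneity and integration points from slide 23 can be sketched roughly as follows. Two hypothetical sources describe the same entries with different layouts and field names; the code maps both onto one common record and merges them by accession. The source layouts, field names (`acc_id`, `seq_len`), and all values are made up for illustration and do not correspond to any real database format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Common (integrated) record; the chosen fields are an assumption."""
    accession: str
    organism: Optional[str] = None
    length: Optional[int] = None


def from_source_a(line: str) -> Entry:
    """Hypothetical source A: tab-separated 'accession<TAB>organism'."""
    accession, organism = line.rstrip("\n").split("\t")
    return Entry(accession=accession, organism=organism)


def from_source_b(record: Dict[str, str]) -> Entry:
    """Hypothetical source B: key/value record with its own field names."""
    return Entry(accession=record["acc_id"], length=int(record["seq_len"]))


def integrate(entries: List[Entry]) -> Dict[str, Entry]:
    """Merge entries sharing an accession, keeping the first non-missing value."""
    merged: Dict[str, Entry] = {}
    for e in entries:
        cur = merged.setdefault(e.accession, Entry(accession=e.accession))
        if cur.organism is None:
            cur.organism = e.organism
        if cur.length is None:
            cur.length = e.length
    return merged


# Made-up sample data from the two hypothetical sources.
source_a = ["X0001\tHomo sapiens", "X0002\tMus musculus"]
source_b = [{"acc_id": "X0001", "seq_len": "1200"}]

merged = integrate(
    [from_source_a(line) for line in source_a]
    + [from_source_b(rec) for rec in source_b]
)
print(merged["X0001"])
print(merged["X0002"])
```

Real integration systems also have to cope with schema changes over time and conflicting values between sources, which this sketch deliberately ignores.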
