BITS: Overview of important biological databases beyond sequences

Module 4: Other relevant biological data sources beyond sequences

Part of training session "Basic Bioinformatics concepts, databases and tools" - http://www.bits.vib.be/training

Speaker notes:
  • 'translation', whereas another uses the phrase 'protein synthesis',
  • GO hierarchy can be downloaded (obo format). GO slim: selection of categories.
  • Different representation types: ribbon, cartoon, ball and stick, space filling.
Transcript of "BITS: Overview of important biological databases beyond sequences"

    1. Basic bioinformatics concepts, databases and tools
        Module 4: Beyond the sequences
        Dr. Joachim Jacob, http://www.bits.vib.be
        Updated Nov 2011
        http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
    2. Module 4 broadens our view
    3. To understand life, we need not only sequences but many other concepts
        Bioinformatics is also about storing and analyzing:
        • gene information: variations, isoforms, ...
        • expression data
        • 3D protein structure data
        • interaction data
        • pathways and networks
        "Storing all relevant biological data"
    4. Schematic view II [figure: GeneA, GeneB and GeneC, each with its sequence and annotations (gene expression, pathway, structure, ...); primary database; other sequence databases; additional information sources; analysis; results]
    5. The indispensable databases
        • Gene Ontology: structuring
        • KEGG: biochemical pathways
        • PDB: structure of proteins
        • IntAct: interaction data
        • dbSNP: database of genomic variation
        • Expression sources: microarray data
    6. Gene Ontology structures the way we communicate about life
        The same process may be described as "gene translation", "protein production" or "protein synthesis".
        http://www.arabidopsis.org/help/tutorials/go1.jsp
        http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
    7. Gene Ontology structures life (http://www.geneontology.org/)
        • An agreement on standardized keywords (often referred to as controlled vocabularies) that describe the roles of genes and gene products in a hierarchical way (an ontology).
        • Keywords are assigned to genes based on different types of evidence.
        • Keywords are ordered in a hierarchical, tree-like structure (a directed acyclic graph).
        • Three GO ontologies ("trees") exist, describing "Biological Process", "Cellular Component" and "Molecular Function".
        http://www.arabidopsis.org/help/tutorials/go1.jsp
        http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
    8. A gene can be given different GO terms
        Example, cytochrome c:
        • molecular function: oxidoreductase activity
        • biological process: oxidative phosphorylation and induction of cell death
        • cellular component: mitochondrial matrix and mitochondrial inner membrane
        In each ontology, the terms are organised in a directed acyclic graph: a network in which parent and child terms are the nodes and the edges between them are the relationships (such as "is a" and "part of").
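
Because each ontology is a directed acyclic graph, annotating a gene with a specific term implicitly annotates it with all of that term's ancestors (GO's "true path rule"). The Python sketch below illustrates this on a tiny, hypothetical excerpt: the term names are borrowed from the cytochrome c example, but the parent links are simplified and collapse many intermediate levels of the real GO graph.

```python
# Hypothetical, simplified excerpt of a GO-like DAG: child term -> set of parent terms.
# The real Cellular Component ontology has many more intermediate terms.
PARENTS = {
    "mitochondrial matrix": {"mitochondrion"},
    "mitochondrial inner membrane": {"mitochondrion", "membrane"},
    "mitochondrion": {"cellular_component"},
    "membrane": {"cellular_component"},
    "cellular_component": set(),
}

def ancestors(term):
    """Return every term reachable by following parent edges (the term itself excluded)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, set()):
            if parent not in seen:  # a DAG node can be reached via several paths
                seen.add(parent)
                stack.append(parent)
    return seen

# Direct annotations plus everything they imply through the graph.
direct = {"mitochondrial matrix", "mitochondrial inner membrane"}
implied = set(direct).union(*(ancestors(term) for term in direct))
print(sorted(implied))
```

This propagation is why a search on a broad term also returns genes annotated only with its more specific descendants.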
    9. Different evidence codes can assign a degree of confidence to the assignment
        http://www.geneontology.org/GO.evidence.shtml
        Evidence codes can be grouped as:
        • experimental (e.g. IDA: inferred from direct assay)
        • computational analysis
        • author statement
        • curator statement
        • inferred from electronic annotation (IEA)
        If available, each annotation also has a reference.
    10. Different evidence codes can assign a degree of confidence to the assignment
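
A common practical use of evidence codes is filtering, for example keeping only experimentally supported annotations, or dropping electronic (IEA) ones, before an analysis. A minimal Python sketch: the annotation records are hypothetical, and the experimental set lists the classic GO experimental codes (EXP, IDA, IPI, IMP, IGI, IEP), which should be checked against the current GO evidence code documentation.

```python
# Classic GO experimental evidence codes (check against the current GO documentation).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical annotation records: (gene, GO term, evidence code).
ANNOTATIONS = [
    ("geneA", "oxidoreductase activity", "IDA"),
    ("geneA", "mitochondrial matrix", "IEA"),
    ("geneB", "oxidoreductase activity", "ISS"),
    ("geneB", "induction of cell death", "IMP"),
]

def keep_codes(records, allowed):
    """Keep only annotations whose evidence code is in `allowed`."""
    return [rec for rec in records if rec[2] in allowed]

def drop_electronic(records):
    """Remove annotations inferred from electronic annotation (IEA)."""
    return [rec for rec in records if rec[2] != "IEA"]

experimental = keep_codes(ANNOTATIONS, EXPERIMENTAL_CODES)
non_electronic = drop_electronic(ANNOTATIONS)
print(len(experimental), len(non_electronic))
```

Which filter is appropriate depends on the question: IEA annotations greatly increase coverage, whereas experimental codes trade coverage for confidence.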
    11. Gene Ontology structures all genes according to their biological significance
        • The GO structure and terms can be browsed with AmiGO.
        • QuickGO from EBI offers useful visualisations.
        • The GO wiki is an excellent resource for all your questions.
    12. GO can be used to retrieve all genes (gene products) related to one specific term
        You can search broadly; e.g. an AmiGO search for "diabetes" leads to a matching GO term.
        http://amigo.geneontology.org/
    13. GO can be used to retrieve all genes (gene products) related to one specific term: AmiGO search for "diabetes"
    14. GO can be used to retrieve all genes (gene products) related to one specific term: AmiGO search for "diabetes"
    15. GO is also useful to analyze and compare different gene lists
        Many GO analysis tools are listed on the GO website: http://www.geneontology.org/GO.tools.shtml
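
Many of the tools for comparing gene lists perform an over-representation (enrichment) test: is a GO term annotated to more genes in your list than expected by chance? A minimal Python sketch of the one-sided hypergeometric test that is commonly used for this, with hypothetical counts. Real tools additionally correct for testing many terms at once and differ in how they choose the background gene set.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all annotated genes of the genome)
    K: background genes annotated with the GO term
    n: genes in your list
    k: genes in your list annotated with the GO term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 40 of 200 listed genes carry a term found on 400 of 10,000 background genes.
p = enrichment_p_value(k=40, n=200, K=400, N=10_000)
print(f"p = {p:.3g}")
```

Only about 8 annotated genes would be expected in a random list of 200, so 40 gives a very small p-value.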
    16. Some things to know about GO
        • For analyses, one can use reduced GO sets, the so-called GO slims.
          - GO slims are subsets of broad, high-level GO terms; species-specific slims are available.
          - The GO ontologies can be downloaded in .obo format.
        • Not all information is captured by GO; some needs to be retrieved from other databases:
          - metabolic pathways: KEGG, …
          - phenotypes/diseases
          - mapping files exist, e.g. kegg2go
        http://www.geneontology.org/GO.slims.shtml
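
The .obo download is a plain-text format made of stanzas such as [Term], each holding "tag: value" lines. A minimal Python sketch that extracts id, name, namespace and is_a parents. The sample uses made-up EX: identifiers, and the parser ignores much of the OBO 1.2 syntax (quoted definitions, trailing modifiers, escapes), so a dedicated OBO library is the safer choice for real GO files.

```python
# Made-up sample in OBO style; EX: identifiers are hypothetical.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example parent term
namespace: biological_process

[Term]
id: EX:0000002
name: example child term
namespace: biological_process
is_a: EX:0000001 ! example parent term

[Typedef]
id: part_of
name: part of
"""

def parse_obo_terms(text):
    """Return a list of dicts, one per [Term] stanza (other stanza types are skipped)."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines and non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop the trailing "! comment"
        if tag == "is_a":
            current["is_a"].append(value)
        else:
            current[tag] = value
    return terms

for term in parse_obo_terms(SAMPLE_OBO):
    print(term["id"], term["name"], term["is_a"])
```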
    17. Biological pathway databases organise genes by molecular reactions
        Three important databases on biological pathways:
        • KEGG: http://www.kegg.jp/
        • Reactome (EBI): http://www.reactome.org/
        • MetaCyc: http://metacyc.org
    18. Proteins with enzymatic function receive an Enzyme Commission (EC) number
        http://www.chem.qmul.ac.uk/iubmb/enzyme/
        • EC 1 Oxidoreductases
        • EC 2 Transferases
        • EC 3 Hydrolases
        • EC 4 Lyases
        • EC 5 Isomerases
        • EC 6 Ligases
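
An EC number has four dot-separated levels, from the broad class (the first digit) down to the individual enzyme; unassigned lower levels are written as "-". A minimal Python sketch that splits an EC number and looks up its class. The class list was extended after these slides with EC 7 (translocases), which is included below; the examples are trypsin (EC 3.4.21.4), alcohol dehydrogenase (EC 1.1.1.1) and the peptidase subclass (EC 3.4.-.-).

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",  # added by the IUBMB in 2018, after these slides
}

def parse_ec(ec_number):
    """Split e.g. 'EC 3.4.21.4' into its four levels and name the top-level class."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = text.split(".")
    if len(levels) != 4 or not levels[0].isdigit():
        raise ValueError(f"not a four-level EC number: {ec_number!r}")
    return levels, EC_CLASSES.get(int(levels[0]), "unknown class")

print(parse_ec("EC 3.4.21.4"))  # trypsin
print(parse_ec("1.1.1.1"))      # alcohol dehydrogenase
print(parse_ec("3.4.-.-"))      # peptidases, lower levels unspecified
```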
    19. The IntAct database contains interaction information for proteins
        http://www.ebi.ac.uk/intact
        Three types of interactions are stored:
        • protein-protein
        • protein-DNA
        • protein-small molecule
    20. The IntAct database represents all interactions as binary: caution!
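
Why caution? Many experiments, such as affinity purification, report a group of co-purified proteins rather than pairs, and turning such a group into binary interactions requires an expansion model. A Python sketch of the two usual models: spoke (the bait linked to each prey) and matrix (every member linked to every other member). The protein names are hypothetical; under either model, the resulting pairs are not necessarily direct physical contacts.

```python
from itertools import combinations

def spoke_expansion(bait, preys):
    """Link the bait to each prey: n - 1 binary pairs for n members."""
    return [(bait, prey) for prey in preys]

def matrix_expansion(members):
    """Link every member to every other member: n * (n - 1) / 2 binary pairs."""
    return list(combinations(members, 2))

# Hypothetical pull-down: bait P1 co-purifies with P2..P5.
bait, preys = "P1", ["P2", "P3", "P4", "P5"]
spoke = spoke_expansion(bait, preys)
matrix = matrix_expansion([bait] + preys)
print(len(spoke), "spoke pairs vs", len(matrix), "matrix pairs")
```

The same five-protein experiment thus yields 4 or 10 "interactions" depending on the model, which matters when counting interactions or node degrees in a network.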
    21. Interaction networks can be analysed on your computer using Cytoscape
        Cytoscape training material is available on the BITS website.
    22. PDB hosts three-dimensional structural data on molecules
    23. PDB hosts three-dimensional structural data on molecules
        PDB = Protein Data Bank, http://www.pdb.org/pdb/home/home.do
        Only experimentally determined structures (X-ray crystallography, NMR or other accurate techniques) of:
        • proteins
        • DNA
        • RNA
        • ligands
        Understanding PDB data: tutorial
    24. PDB files can be read by many different tools to display the structure
        • Every PDB entry has its own four-character accession code: a digit followed by three letters or digits (e.g. 4HHB).
        • The PDB file contains the 3D coordinates of every atom modelled in the structure, followed by the occupancy and the temperature (B) factor; the B factor reflects how much that atom's position varies.
        http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-
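
In the legacy PDB text format, each atom is one ATOM (or HETATM) line with fixed column positions, so it should be parsed by column ranges rather than by splitting on whitespace (neighbouring numeric fields can run together). A minimal Python sketch of that column layout; the sample line uses made-up coordinates. The wwPDB also distributes entries in the PDBx/mmCIF format (the only format for very large structures), which this sketch does not handle.

```python
# One ATOM record in fixed-column PDB layout; the coordinates are made up.
SAMPLE = (
    # columns 1-26: record name, serial, atom name, residue name, chain, residue number
    "ATOM      1  N   THR A   1"
    # columns 27-54: x, y, z
    "      17.047  14.099   3.625"
    # columns 55-78: occupancy, temperature (B) factor, element
    "  1.00 13.79           N"
)

def parse_atom_record(line):
    """Parse an ATOM/HETATM line by its fixed columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),  # temperature factor: positional variability
        "element": line[76:78].strip(),
    }

atom = parse_atom_record(SAMPLE)
print(atom["res_name"], atom["chain"], atom["xyz"], atom["b_factor"])
```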
    25. PDB files can be read by many different tools to display the structure
        Tools to visualize (and some to analyze) structures are listed on the BITS wiki: http://www.bits.vib.be/wiki/index.php/Protein_structure
    26. To find a structure for your protein sequence, search for similarity
        Homology modelling: similarity at the sequence level, projected onto a structure
        • BLAST your query against the PDB database with CBLAST, or at ExPASy.
        • PSI-BLAST can detect sequences with similar structures (the twilight zone of low sequence identity!).
        • If still no success: 3D-Jury (a meta approach, including fold recognition and local structure prediction).
        Similarity at the structural level: aligning structures
        • VAST (Structure)
        • DALI (Distance mAtrix aLIgnment)
        BITS training on protein structure analysis
        http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf
        Tools at EBI
        http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
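
DALI's name refers to its core idea: a structure can be described by its intramolecular distance matrix, the pairwise distances between C-alpha atoms. Such a matrix does not change when the molecule is rotated or translated, so two structures can be compared without first superimposing them. A minimal Python sketch of that representation with made-up coordinates, checking the translation case; DALI's actual alignment and scoring are considerably more elaborate.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates [(x, y, z), ...]."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def translate(coords, shift):
    """Shift every coordinate by the same vector."""
    return [tuple(c + s for c, s in zip(point, shift)) for point in coords]

# Made-up C-alpha trace; consecutive C-alpha atoms are roughly 3.8 Å apart.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
original = distance_matrix(trace)
moved = distance_matrix(translate(trace, (10.0, -5.0, 2.5)))

# Every coordinate changed, yet the matrices agree up to floating-point rounding.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(original, moved)
    for a, b in zip(row_a, row_b)
)
print(same)
```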
    27. Structural information is used to classify proteins
        Database cross-references in a PDB entry:
        • SCOP groups proteins based on evolutionary, domain architecture and structural information: http://scop.mrc-lmb.cam.ac.uk/scop/
        • CATH is a manually curated classification of protein domains: http://www.cathdb.info/
    28. dbSNP is a public-domain archive for simple genetic polymorphisms
        • Single Nucleotide Polymorphism database (NCBI)
        • Each dbSNP entry has an identifier rsxx (RefSNP) or ssxx (submitted SNP)
        • It contains:
          - single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs)
          - small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs)
          - retroposable element insertions and microsatellite repeat variations (also called short tandem repeats or STRs)
        • Synchronized with new genome builds
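
The identifier prefixes and the simplest variant classes of dbSNP can be checked programmatically. A minimal Python sketch, assuming alleles are given as plain strings with "-" marking an empty allele (an assumption of this sketch). The classification is deliberately rough: it recognises single-base substitutions and length-changing insertions/deletions only, and does not detect microsatellites or retroposon insertions. The identifiers are arbitrary examples.

```python
import re

DBSNP_ID = re.compile(r"(rs|ss)(\d+)")

def parse_dbsnp_id(identifier):
    """Return ('RefSNP' or 'submitted SNP', number) for ids like 'rs123' or 'ss456'."""
    match = DBSNP_ID.fullmatch(identifier.strip().lower())
    if match is None:
        raise ValueError(f"not a dbSNP identifier: {identifier!r}")
    kind = "RefSNP" if match.group(1) == "rs" else "submitted SNP"
    return kind, int(match.group(2))

def classify_variant(ref, alt):
    """Rough class from two alleles; '-' is assumed to mean an empty allele."""
    ref = "" if ref == "-" else ref.upper()
    alt = "" if alt == "-" else alt.upper()
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"  # single-base substitution
    if len(ref) != len(alt):
        return "DIP"  # deletion/insertion polymorphism
    return "other"

print(parse_dbsnp_id("rs123"), classify_variant("A", "G"), classify_variant("-", "CT"))
```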
    29. Expression data can be sequence-based or hybridisation-based
        • Sequence-based (ESTs, RNA-seq, SAGE): digital gene expression ("digital northern")
        • Microarray databases (hybridisation-based):
          - GEO: Gene Expression Omnibus (NCBI)
              Platform: GPLxxxxxxx
              Series (experiment, several samples): GSExxxxxx
              Sample: GSMxxxxxxxx
              Some experiments are curated into DataSets: GDSxxxxx (online analysis possible)
          - ArrayExpress (EBI)
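
Because the GEO record type is encoded in the accession prefix, a script can tell platforms, series, samples and curated DataSets apart before fetching anything. A minimal Python sketch based on the four GEO prefixes (GPL, GSE, GSM, GDS); the example accession numbers are arbitrary.

```python
import re

GEO_PREFIXES = {
    "GPL": "Platform",
    "GSE": "Series (experiment, several samples)",
    "GSM": "Sample",
    "GDS": "DataSet (curated)",
}

def geo_record_type(accession):
    """Map a GEO accession such as 'GSE1234' to its record type."""
    match = re.fullmatch(r"(GPL|GSE|GSM|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_PREFIXES[match.group(1)]

for acc in ["GPL1", "gse1234", "GSM42", "GDS7"]:
    print(acc, "->", geo_record_type(acc))
```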
    30. Example of expression data at GEO
    31. Example of expression data at GEO
    32. Example of expression data at GEO
    33. Example at ArrayExpress
    34. Example at ArrayExpress
    35. Entrez interconnects the databases at NCBI for easy querying
        • UniGene: sequences grouped by gene
        • PopSet: sequence alignments for population studies and phylogeny
        • Structure: 3D structures (PDB)
        • Genome: genomic maps of chromosomes and plasmids
        • UniSTS: sequence tagged sites
        • PubMed: literature abstracts (MEDLINE, …)
        • OMIM (Online Mendelian Inheritance in Man): literature reviews
        • MeSH (Medical Subject Headings): keywords
        • Taxonomy
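
Entrez can also be queried from a script through NCBI's E-utilities web service, where the db parameter selects one of the interconnected databases (for example pubmed, gene or taxonomy). A minimal Python sketch that only builds an ESearch query URL; fetching it (e.g. with urllib.request.urlopen) returns XML listing the matching record IDs, and NCBI asks scripted users to follow its usage guidelines and rate limits.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term, retmax=20):
    """Build an E-utilities ESearch URL for an Entrez database and query term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"

print(esearch_url("pubmed", "gene ontology AND diabetes"))
```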
    36. Finding relevant data
    37. Summary of the most important links to discover everything you need ...
        Protein data:
        • InterPro (heavily integrated with EBI resources): http://www.interpro.org
        Gene data:
        • Entrez at NCBI (Entrez Gene): http://www.ncbi.nlm.nih.gov/Entrez/
        • EB-eye search at EBI, excellent for cross-species searches: http://www.ebi.ac.uk/ebisearch/
    38. Hold your horses! Phew, where do I place all this?
    39. Bioinformatics is all about different data, as versatile as life itself
        Because databases cross-reference each other extensively, new databases and relevant information are rapidly integrated into existing ones. You can discover them by taking the time to read the entries.
    40. New tools are emerging every day to enable you to browse all data sources ...
        BioGPS: all in one window!
    41. New tools are emerging every day to enable you to browse all data sources ...
    42. Integrative resources are increasingly being organised on a species basis
        • EMAGE: database of in situ gene expression in mouse
        • OMIM: database of diseases in humans
        • Websites that provide an interface integrating all this data are increasingly important.
        • They are often organised on a species basis:
          - TAIR
          - FlyBase
          - WormBase
    43. Organizing biological data and information by species
        By species, why? There is one biological information resource which stays more or less unchanged per species ...