Integration of diverse large-scale datasets
IPAM Proteomics Reunion Conference, UCLA, Lake Arrowhead, California, December 11-16, 2005

    Presentation Transcript

    • Integration of diverse large-scale datasets
    • Lars Juhl Jensen
    • promoter analysis
    • Jensen et al., Bioinformatics, 2000
    • DNA structure
    • genome visualization
    • Pedersen et al., Journal of Molecular Biology, 2000
    • microarray normalization
    • Workman et al., Genome Biology, 2002
    • protein function prediction
    • STRING
    • integrate diverse evidence
    • functional interactions
    • Bork et al., Current Opinion in Structural Biology, 2005
    • 179 proteomes
    • evolution
    • statistics
    • (the original sin)
    • prokaryotes
    • genomic context methods
    • gene fusion
    • gene neighborhood
    • phylogenetic profiles
    • [figure labels: cell, cellulosomes, cellulose]
    • eukaryotes
    • integrate diverse datasets
    • Jensen et al., Drug Discovery Today: Targets, 2004
    • curated knowledge
    • MIPS Munich Information Center for Protein Sequences
    • KEGG Kyoto Encyclopedia of Genes and Genomes
    • STKE Signal Transduction Knowledge Environment
    • Reactome
    • literature mining
    • MEDLINE
    • SGD Saccharomyces Genome Database
    • The Interactive Fly
    • OMIM Online Mendelian Inheritance in Man
    • co-mentioning
    • NLP Natural Language Processing
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [ nxgene The GAL4 gene ]
      • [ nxexpr The expression of [ nxgene the cytochrome genes [ nxpg CYC1 and CYC7 ]]] is controlled by [ nxpg HAP1 ]
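The co-mentioning idea from the literature mining slides can be illustrated with a minimal sketch: count, for every pair of gene names, how many abstracts mention both. The function name `comention_counts`, the toy abstracts, and the naive whitespace tokenization are assumptions made for illustration; they are not taken from the presentation and are much simpler than a real name-recognition pipeline.

```python
from collections import Counter
from itertools import combinations


def comention_counts(abstracts, gene_names):
    """Count, for each pair of gene names, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        # Naive tokenization (assumption): strip commas and periods, split on whitespace.
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        found = sorted(name for name in gene_names if name in tokens)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Toy abstracts (hypothetical), loosely modeled on the example sentence in the slides.
abstracts = [
    "The expression of CYC1 and CYC7 is controlled by HAP1.",
    "HAP1 activates CYC1 transcription.",
    "GAL4 regulates galactose metabolism.",
]
counts = comention_counts(abstracts, {"CYC1", "CYC7", "HAP1", "GAL4"})
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Raw counts like these say nothing about the type of relation; that is the part the NLP step with cue words and verbs is meant to address.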
    • primary experimental data
    • microarray expression data
    • GEO Gene Expression Omnibus
    • physical protein interactions
    • BIND Biomolecular Interaction Network Database
    • MINT Molecular Interactions Database
    • GRID General Repository for Interaction Datasets
    • DIP Database of Interacting Proteins
    • HPRD Human Protein Reference Database
    • problems
    • many sources
    • (different gene identifiers)
    • many types of evidence
    • questionable quality
    • not directly comparable
    • spread over many species
    • huge synonym lists
    • calculate raw quality scores
    • calibrate vs. gold standard
    • KEGG Kyoto Encyclopedia of Genes and Genomes
    • von Mering et al., Nucleic Acids Research, 2005
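Calibrating raw quality scores against a gold standard can be sketched as follows: rank the protein pairs by raw score, cut the ranking into bins, and record for each bin the fraction of pairs found in the gold standard. This is only a simplified illustration, assuming a set of pairs annotated to the same KEGG pathway as the positive set; the names `calibrate`, `scored_pairs`, `same_pathway` and `bin_size` are hypothetical, and the published procedure may differ in detail.

```python
def calibrate(scored_pairs, same_pathway, bin_size):
    """Return one (mean raw score, fraction of gold-standard positives) tuple per bin."""
    # Highest raw scores first, so the first bin holds the best-supported pairs.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for _, score in chunk) / len(chunk)
        hits = sum(1 for pair, _ in chunk if pair in same_pathway)
        curve.append((mean_score, hits / len(chunk)))
    return curve


# Toy data (hypothetical): four scored pairs, three of them in the gold standard.
scored_pairs = [
    (("A", "B"), 0.9),
    (("A", "C"), 0.8),
    (("B", "C"), 0.3),
    (("C", "D"), 0.1),
]
same_pathway = {("A", "B"), ("A", "C"), ("C", "D")}
curve = calibrate(scored_pairs, same_pathway, bin_size=2)
for mean_score, fraction in curve:
    print(f"{mean_score:.2f} -> {fraction:.2f}")
```

A curve fitted through such points would map any raw score onto a common scale, which is what makes different evidence types comparable.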
    • transfer based on orthology
    • combine all evidence
    • Bork et al., Current Opinion in Structural Biology, 2005
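One simple way to combine calibrated scores from several evidence types is to treat them as independent probabilities and compute the probability that at least one of them is correct. The sketch below shows only this independence-based rule; the function name `combine_scores` is hypothetical, and the scheme actually used in STRING may include further corrections.

```python
from math import prod


def combine_scores(scores):
    """Combine per-evidence probabilities, assuming the evidence types are independent."""
    # The interaction is missed only if every evidence type is wrong.
    return 1.0 - prod(1.0 - s for s in scores)


print(combine_scores([0.5, 0.5]))
print(combine_scores([0.9]))
```

Under this rule, two moderately reliable sources together yield a higher score than either alone, and a single source keeps its own score.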
    • cell cycle
    • qualitative modeling
    • Chen et al., Molecular Biology of the Cell, 2004
    • Chen et al., Molecular Biology of the Cell, 2004
    • synchronized cell culture
    • microarray time series
    • periodically expressed genes
    • S. cerevisiae
    • Cho et al.
    • Spellman et al.
    • numerous analysis methods
    • Cho et al.
    • Spellman et al.
    • Zhao et al.
    • Johansson et al.
    • Luan and Li
    • Lu et al.
    • Ahdesmäki et al.
    • Willbrand et al.
    • no benchmarking
    • de Lichtenberg et al., Bioinformatics, 2005
    • reproducibility
    • de Lichtenberg et al., Bioinformatics, 2005
    • regulation vs. periodicity
    • de Lichtenberg et al., Bioinformatics, 2005
    • list of 600 periodic genes
    • S. pombe
    • several expression studies
    • reproducibility
    • Marguerat et al., Yeast, 2006
    • name inconsistencies
    • Marguerat et al., Yeast, 2006
    • different analysis methods
    • no benchmarking
    • Marguerat et al., Yeast, 2006
    • Marguerat et al., Yeast, 2006
    • too many genes suggested
    • Marguerat et al., Yeast, 2006
    • Marguerat et al., Yeast, 2006
    • averaging better than voting
    • Marguerat et al., Yeast, 2006
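To make the difference between the two combination schemes concrete, the sketch below ranks genes across several studies in two ways: by the mean of their per-study scores (averaging), and by the number of studies in which they pass a threshold (voting). It is not the procedure of Marguerat et al.; the function names, the toy scores and the threshold are all assumptions chosen for illustration.

```python
def rank_by_average(study_scores):
    """Order genes by mean score across studies, highest first."""
    genes = set.intersection(*(set(s) for s in study_scores))
    mean = {g: sum(s[g] for s in study_scores) / len(study_scores) for g in genes}
    return sorted(genes, key=lambda g: (-mean[g], g))


def rank_by_votes(study_scores, threshold):
    """Order genes by the number of studies in which they reach the threshold."""
    genes = set.intersection(*(set(s) for s in study_scores))
    votes = {g: sum(1 for s in study_scores if s[g] >= threshold) for g in genes}
    return sorted(genes, key=lambda g: (-votes[g], g))


# Toy periodicity scores (hypothetical) from three studies.
studies = [
    {"g1": 0.9, "g2": 0.55, "g3": 0.1},
    {"g1": 0.4, "g2": 0.55, "g3": 0.2},
    {"g1": 0.95, "g2": 0.6, "g3": 0.3},
]
by_average = rank_by_average(studies)
by_votes = rank_by_votes(studies, threshold=0.5)
print(by_average)
print(by_votes)
```

Voting discards how strongly a gene scores in each study, so a gene that barely passes everywhere can outrank one with very strong signal in most studies; averaging retains that information.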
    • S. cerevisiae
    • list of 600 periodic genes
    • protein interaction data
    • von Mering et al., Nucleic Acids Research, 2005
    • de Lichtenberg et al., Science, 2005
    • dynamic proteins
    • static proteins
    • de Lichtenberg et al., Science, 2005
    • reproduces what is known
    • de Lichtenberg et al., Science, 2005
    • many detailed predictions
    • de Lichtenberg et al., Science, 2005
    • global trends
    • dynamic proteins
    • de Lichtenberg et al., Science, 2005
    • static proteins
    • de Lichtenberg et al., Science, 2005
    • just-in-time assembly
    • de Lichtenberg et al., Science, 2005
    • de Lichtenberg et al., Science, 2005
    • coordinated regulation
    • periodically expressed genes
    • Cdc28p substrates
    • PEST degradation signals
    • the human interactome
    • yeast two-hybrid
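    • [Venn diagram: Stelzl et al., Rual et al., small-scale studies; counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
    • [Venn diagram: Stelzl et al., Rual et al., small-scale studies; counts as extracted: 32, 0, 3, 4, 18, 4, 23]
    • [Diagram: small-scale studies, Stelzl et al., Rual et al.; counts as extracted: 62, 8, 39; 852, 17, 473, 432, 69, 260]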
    • 3.5% and 21% sensitivity
    • in a couple of years
    • the human interactome
    • 100% = 1/5?
    • the yeast interactome
    • five years ago
    • yeast two-hybrid
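    • [Venn diagram: Uetz et al., Ito et al., small-scale studies; counts as extracted: 1150, 117, 117, 72, 4053, 118, 4469]
    • [Venn diagram: Uetz et al., Ito et al., small-scale studies; counts as extracted: 162, 53, 34, 72, 180, 29, 338]
    • [Diagram: small-scale studies, Uetz et al., Ito et al.; counts as extracted: 511, 189, 616; 439, 178, 759, 897, 190, 1347]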
    • 19% and 12% sensitivity
    • the challenge
    • how to get from here …
    • [Venn diagram: Stelzl et al., Rual et al., small-scale studies; counts as extracted: 1936, 13, 4, 4, 1385, 65, 18465]
    • … to there …
    • de Lichtenberg et al., Science, 2005
    • Acknowledgments
      • The STRING team (EMBL)
        • Christian von Mering
        • Berend Snel
        • Martijn Huynen
        • Sean Hooper
        • Mathilde Foglierini
        • Julien Lagarde
        • Peer Bork
      • Literature mining project (EML Research)
        • Jasmin Saric
        • Rossitza Ouzounova
        • Isabel Rojas
      • Cell cycle studies (CBS)
        • Ulrik de Lichtenberg
        • Thomas Skøt Jensen
        • Søren Brunak
      • S. pombe cell cycle (Sanger)
        • Samuel Marguerat
        • Jürg Bähler
      • Inspiration for presentation
        • Lawrence Lessig
        • Dick Clarence Hardt
        • Anders Gorm Pedersen
    • Thank you!