Network integration of heterogeneous data
Lars Juhl Jensen, EMBL Heidelberg
8th Course in Bioinformatics & Systems Biology for Molecular Biologists, Bertinoro di Romagna, Bertinoro, Italy, March 16-20, 2008


  1. Network integration of heterogeneous data. Lars Juhl Jensen, EMBL Heidelberg
  2. association networks
  3. STRING
  4. STITCH
  5. 373 genomes
  6. model organism databases
  7. Ensembl
  8. Genome Reviews
  9. RefSeq
  10. genomic context methods
  11. phylogenetic profiles
  12. Cell, Cellulosomes, Cellulose
  13. conserved neighborhood
  14. operons
  15. bidirectional promoters
  16. gene fusion
  17. primary experimental data
  18. expression profiles
  19. GEO (Gene Expression Omnibus)
  20. expression compendia
  21. protein interactions
  22. yeast two-hybrid
  23. affinity purification
  24. genetic interactions
  25. synthetic lethality
  26. BioGRID (General Repository for Interaction Datasets)
  27. IntAct
  28. MINT (Molecular Interactions Database)
  29. DIP (Database of Interacting Proteins)
  30. BIND (Biomolecular Interaction Network Database)
  31. HPRD (Human Protein Reference Database)
  32. literature mining
  33. co-mentioning
  34. statistical methods
  35. NLP (Natural Language Processing)
  36. NLP example
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  37. MEDLINE
  38. SGD (Saccharomyces Genome Database)
  39. The Interactive Fly
  40. OMIM (Online Mendelian Inheritance in Man)
  41. good synonyms list
  42. manual curation
  43. orthographic variation
  44. disambiguation
  45. curated knowledge
  46. complexes
  47. MIPS (Munich Information center for Protein Sequences)
  48. Gene Ontology
  49. pathways
  50. KEGG (Kyoto Encyclopedia of Genes and Genomes)
  51. Reactome
  52. PID (NCI-Nature Pathway Interaction Database)
  53. STKE (Signal Transduction Knowledge Environment)
  54. variable reliability
  55. raw quality scores
  56. conservation
  57. reproducibility
  58. not comparable
  59. benchmarking
  60. calibrate vs. gold standard
  61. probabilistic scores
  62. combine all evidence
  63. $P = 1 - (1 - P_1) \cdot (1 - P_2) \cdot (1 - P_3) \cdots$
  64. spread over many species
  65. transfer by orthology
  66. two modes
  67. COG mode
  68. protein mode
  69. signaling network
  70. NetworKIN
  71. NetPhorest
  72. phosphoproteomics
  73. mass spectrometry
  74. in vivo phosphosites
  75. kinases are unknown
  76. computational methods
  77. sequence motifs
  78. kinase families
  79. overprediction
  80. context
  81. localization
  82. expression
  83. co-activators
  84. scaffolders
  85. association networks
  86. the idea
  87. NetworKIN
  88. coverage
  89. 69 kinases
  90. benchmarking
  91. small-scale validation
  92. ATM phosphorylates Rad50
  93. Cdk1 phosphorylates 53BP1
  94. high-throughput validation
  95. multiple reaction monitoring
  96. the future
  97. more sequence motifs
  98. NetPhorest
  99. data organization
  100. selection
  101. benchmarking
  102. 179 kinases
  103. 89 SH2 domains
  104. 8 PTB domains
  105. upstream signaling
  106. downstream signaling
  107. signaling pathways
  108. Acknowledgments
      • STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
      • Literature mining: Evangelos Pafilis, Jasmin Saric, Rossitza Ouzounova, Sean O’Donoghue, Isabel Rojas
      • NetworKIN & NetPhorest: Rune Linding, Martin Lee Miller, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Nikolaj Blom, Rob Russell, Peer Bork, Søren Brunak, Michael Yaffe, Tony Pawson
  109. http://larsjuhljensen.wordpress.com
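
The "combine all evidence" slide gives the rule P = 1 - (1 - P1)(1 - P2)(1 - P3)... for merging per-channel probabilistic scores into one confidence value. A minimal Python sketch of that rule follows. It assumes each channel score is already a calibrated probability and that the channels are independent. The function name `combine_scores` and the example scores are hypothetical illustrations, not taken from the slides, and the sketch is not claimed to match the exact STRING implementation.

```python
from typing import Iterable


def combine_scores(scores: Iterable[float]) -> float:
    """Combine per-channel probabilities as P = 1 - prod(1 - P_i).

    Assumption: each score is a probability in [0, 1] and the evidence
    channels are treated as independent.
    """
    prob_all_wrong = 1.0  # probability that every channel is wrong
    for p in scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {p}")
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


if __name__ == "__main__":
    # Hypothetical scores for one protein pair from three evidence channels.
    example = [0.6, 0.3, 0.2]
    print(round(combine_scores(example), 3))
```

Under this rule the combined score is never lower than the strongest single channel, and several weak channels that agree can add up to a confident association.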