Data integration - Integration of functional associations using STRING

EMBO World Practical Course on Computational Biology, Shanghai Jiao Tong University, Shanghai, China, August 22, 2009.

  1. 1. Data integration - Integration of functional associations using STRING, Lars Juhl Jensen
  2. 2. Jensen, Kuhn et al., Nucleic Acids Research, 2009
  3. 3. functional associations
  4. 4. confidence scores
  5. 5. cross-species integration
  6. 6. 630 genomes
  7. 7. model organism databases
  8. 8. Ensembl
  9. 9. RefSeq
  10. 10. defining orthology
  11. 11. two modes
  12. 12. protein mode
  13. 13. von Mering et al., Nucleic Acids Research, 2005
  14. 14. COG mode
  15. 15. von Mering et al., Nucleic Acids Research, 2005
  16. 16. genomic context
  17. 17. gene fusion
  18. 18. Korbel et al., Nature Biotechnology, 2004
  19. 19. conserved neighborhood
  20. 20. operons
  21. 21. Korbel et al., Nature Biotechnology, 2004
  22. 22. bidirectional promoters
  23. 23. Korbel et al., Nature Biotechnology, 2004
  24. 24. phylogenetic profiles
  25. 25. Korbel et al., Nature Biotechnology, 2004
  26. 26. examples
  27. 27. bacterial Cox assembly
  28. 29. Banci et al., PNAS, 2005
  29. 30. Banci et al., PNAS, 2005
  30. 31. cellulose degradation
  31. 35. Cell, Cellulosomes, Cellulose
  32. 36. experimental data
  33. 37. protein interactions
  34. 38. yeast two-hybrid
  35. 39. affinity purification
  36. 40. fragment complementation
  37. 41. Jensen & Bork, Science, 2008
  38. 42. genetic interactions
  39. 43. Beyer et al., Nature Reviews Genetics, 2007
  40. 44. BIND Biomolecular Interaction Network Database
  41. 45. BioGRID General Repository for Interaction Datasets
  42. 46. DIP Database of Interacting Proteins
  43. 47. IntAct
  44. 48. MINT Molecular Interactions Database
  45. 49. HPRD Human Protein Reference Database
  46. 50. PDB Protein Data Bank
  47. 51. inferred associations
  48. 52. gene coexpression
  49. 54. GEO Gene Expression Omnibus
  50. 55. expression compendia
  51. 56. curated knowledge
  52. 57. complexes
  53. 58. MIPS Munich Information center for Protein Sequences
  54. 59. Gene Ontology
  55. 60. pathways
  56. 61. Letunic & Bork, Trends in Biochemical Sciences, 2008
  57. 62. KEGG Kyoto Encyclopedia of Genes and Genomes
  58. 63. MetaCyc
  59. 64. Reactome
  60. 65. PID NCI-Nature Pathway Interaction Database
  61. 66. literature mining
  62. 67. >10 km
  63. 68. MEDLINE
  64. 69. SGD Saccharomyces Genome Database
  65. 70. The Interactive Fly
  66. 71. OMIM Online Mendelian Inheritance in Man
  67. 72. co-mentioning
  68. 73. NLP Natural Language Processing
  69. 74.
      • Gene and protein names
      • Cue words for entity recognition
      • Verbs for relation extraction
      • [nxgene The GAL4 gene]
      • [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  70. 76. easy in theory …
  71. 77. … but not in practice
  72. 78. many data types
  73. 79. not comparable
  74. 80. variable quality
  75. 81. many sources
  76. 82. different file formats
  77. 83. different gene identifiers
  78. 84. partially redundant
  79. 85. spread over 630 genomes
  80. 86. quality scores
  81. 87. reproducibility
  82. 88. von Mering et al., Nucleic Acids Research, 2005
  83. 89. intergenic distances
  84. 91. benchmarking
  85. 92. calibrate vs. gold standard
  86. 93. von Mering et al., Nucleic Acids Research, 2005
  87. 94. raw quality scores
  88. 95. probabilistic scores
  89. 96. integrate over orthologs
  90. 97. protein mode
  91. 98. von Mering et al., Nucleic Acids Research, 2005
  92. 99. COG mode
  93. 100. von Mering et al., Nucleic Acids Research, 2005
  94. 101. combine all evidence
  95. 102. Frishman et al., Modern Genome Annotation, 2009
  96. 103. small molecules
  97. 104. Kuhn et al., Nucleic Acids Research, 2008
  98. 105. metametabolomics
  99. 106. Acknowledgments
      • Christian von Mering
      • Michael Kuhn
      • Manuel Stark
      • Samuel Chaffron
      • Philippe Julien
      • Monica Campillos
      • Tobias Doerks
      • Jan Korbel
      • Berend Snel
      • Martijn Huynen
      • Peer Bork
  100. 107. larsjuhljensen
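
Slides 94 to 101 of the outline move from raw quality scores to probabilistic scores and then to "combine all evidence". The slides do not state how the combination is done, so the following is only a minimal sketch under one stated assumption: each evidence channel gives an independent probability that an association is real, and the channels are merged as one minus the product of the individual failure probabilities. The function name `combine_scores` and the example values are hypothetical and are not taken from the talk.

```python
def combine_scores(scores):
    """Combine per-channel probabilities, ASSUMING the channels are independent.

    Returns 1 - prod(1 - s) over all scores, i.e. the probability that
    at least one channel correctly supports the association.
    """
    remaining_doubt = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
        # Each channel removes a fraction s of the remaining doubt.
        remaining_doubt *= 1.0 - s
    return 1.0 - remaining_doubt


# Hypothetical scores for one protein pair from three evidence channels,
# e.g. conserved neighborhood, coexpression, and literature co-mentioning.
example = [0.4, 0.5, 0.2]
print(f"{combine_scores(example):.2f}")  # prints 0.76
```

Under this assumption, several weak but independent lines of evidence can add up to a confident association, while a single channel's score is returned unchanged.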
