Literature mining and large-scale data integration

Computational and Systems Biology Course, Centre for Computational and Systems Biology (CoSBi), Trento, Italy, March 10-14, 2008

  1. 1. Literature mining and large-scale data integration Lars Juhl Jensen EMBL Heidelberg
  2. 2. literature mining
  3. 3. why?
  4. 5. too much to read
  5. 6. information retrieval
  6. 7. finding the papers
  7. 8. ad hoc retrieval
  8. 9. user-specified query
  9. 10. “yeast AND cell cycle”
  10. 11. stemming
  11. 12. yeast / yeasts
  12. 13. dynamic query expansion
  13. 14. yeast / S. cerevisiae
  14. 15. ranking
  15. 24. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  16. 25. no tool will find it
  17. 26. entity recognition
  18. 27. identifying the substance(s)
  19. 28. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  20. 29. Cdc28 → yeast
  21. 30. Cdc28 → cell cycle
  22. 31. good synonyms list
  23. 32. manual curation
  24. 33. orthographic variation
  25. 34. CDC28
  26. 35. Cdc28p
  27. 36. disambiguation
  28. 37. hairy
  29. 38. SDS
  30. 39. APC
  31. 40. Cdc2
  32. 45. still too much to read
  33. 46. information extraction
  34. 47. formalizing the facts
  35. 49. co-mentioning
  36. 50. statistical methods
  37. 51. NLP (Natural Language Processing)
  38. 52. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  39. 53. Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
  40. 55. no new discoveries
  41. 56. text mining
  42. 57. undiscovered links
  43. 59. Raynaud’s syndrome
  44. 60. fish oil
  45. 62. temporal trends
  46. 64. buzzwords
  47. 66. data integration
  48. 67. association networks
  49. 69. information extraction
  50. 71. curated knowledge
  51. 73. protein interaction data
  52. 75. genetic interaction data
  53. 77. gene expression data
  54. 79. computational predictions
  55. 80. conserved neighborhood
  56. 82. gene fusion
  57. 84. phylogenetic profiles
  58. 86. variable reliability
  59. 87. raw quality scores
  60. 91. not comparable
  61. 92. benchmarking
  62. 93. calibrate vs. gold standard
  63. 95. probabilistic scores
  64. 96. spread over many species
  65. 97. 373 genomes
  66. 99. transfer by orthology
  67. 101. combine all evidence
  68. 102. $P = 1 - (1 - P_1)(1 - P_2)(1 - P_3) \cdots$
  69. 103. web resources
  70. 106. signaling networks
  71. 107. phosphoproteomics
  72. 109. in vivo phosphosites
  73. 110. kinases are unknown
  74. 111. computational methods
  75. 113. overprediction
  76. 114. context
  77. 115. scaffolders
  78. 116. association networks
  79. 118. NetworKIN
  80. 120. benchmarking
  81. 122. 2.5-fold better accuracy
  82. 123. web resources
  83. 126. summary
  84. 127. literature mining is good
  85. 128. data integration is better
  86. 129. Acknowledgments
      Reflect & NLP: Evangelos Pafilis, Jasmin Saric, Rossitza Ouzounova, Sean O’Donoghue, Isabel Rojas
      STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
      NetworKIN & NetPhorest: Rune Linding, Martin Lee Miller, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Nikolaj Blom, Rob Russell, Peer Bork, Søren Brunak, Michael Yaffe, Tony Pawson
  87. 130. http://larsjuhljensen.wordpress.com
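
Worked example for slide 102 (combine all evidence): the slide merges the probabilistic scores of the individual evidence types as P = 1 - (1 - P1)(1 - P2)(1 - P3) and so on. The Python sketch below is a minimal illustration of that formula only. It treats the evidence types as independent, which is what the product form implies. The function name combine_evidence and the example scores are made up for illustration and do not come from the slides or from any of the tools named in them.

```python
def combine_evidence(probabilities):
    """Combine per-evidence probabilities as P = 1 - prod(1 - P_i).

    Assumption (not stated on the slide): the evidence types are
    independent, so the chance that all of them are wrong is the
    product of their individual error probabilities.
    """
    all_wrong = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        all_wrong *= 1.0 - p
    return 1.0 - all_wrong


if __name__ == "__main__":
    # Hypothetical scores for one association from three evidence types.
    scores = [0.6, 0.3, 0.2]
    print(f"combined: {combine_evidence(scores):.3f}")
```

Two weak pieces of evidence with P = 0.5 each combine to 0.75, so the combined score is never lower than the strongest single score. An empty list gives 0.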
