Large-scale data and text mining

Published in: Science

  1. 1. Network biology: Large-scale data and text mining. Lars Juhl Jensen
  2. 2. protein networks
  3. 3. medical networks
  4. 4. guilt by association
  5. 5. protein networks
  6. 6. STRING
  7. 7. functional associations
  8. 8. computational predictions
  9. 9. gene fusion
  10. 10. Korbel et al., Nature Biotechnology, 2004
  11. 11. gene neighborhood
  12. 12. Korbel et al., Nature Biotechnology, 2004
  13. 13. phylogenetic profiles
  14. 14. Korbel et al., Nature Biotechnology, 2004
  15. 15. experimental data
  16. 16. gene coexpression
  17. 17. protein interactions
  18. 18. Jensen & Bork, Science, 2008
  19. 19. curated knowledge
  20. 20. complexes
  21. 21. pathways
  22. 22. Letunic & Bork, Trends in Biochemical Sciences, 2008
  23. 23. many databases
  24. 24. different formats
  25. 25. different identifiers
  26. 26. variable quality
  27. 27. not comparable
  28. 28. not same species
  29. 29. hard work
  30. 30. quality scores
  31. 31. von Mering et al., Nucleic Acids Research, 2005
  32. 32. calibrate vs. gold standard
  33. 33. von Mering et al., Nucleic Acids Research, 2005
  34. 34. homology-based transfer
  35. 35. Franceschini et al., Nucleic Acids Research, 2013
  36. 36. visualization
  37. 37. string-db.org
  38. 38. missing most of the data
  39. 39. text mining
  40. 40. >10 km
  41. 41. too much to read
  42. 42. computer
  43. 43. as smart as a dog
  44. 44. teach it specific tricks
  45. 45. named entity recognition
  46. 46. comprehensive lexicon
  47. 47. CDC2
  48. 48. cyclin dependent kinase 1
  49. 49. expansion rules
  50. 50. hCdc2
  51. 51. CDC2
  52. 52. flexible matching
  53. 53. cyclin-dependent kinase 1
  54. 54. cyclin dependent kinase 1
  55. 55. “black list”
  56. 56. SDS
  57. 57. augmented browsing
  58. 58. Reflect
  59. 59. browser add-on
  60. 60. real-time text mining
  61. 61. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009; O’Donoghue et al., Journal of Web Semantics, 2010
  62. 62. information extraction
  63. 63. co-mentioning
  64. 64. within documents
  65. 65. within paragraphs
  66. 66. within sentences
  67. 67. natural language processing
  68. 68. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  69. 69. text corpus
  70. 70. ~22 million abstracts
  71. 71. millions of full-text articles
  72. 72. medical networks
  73. 73. Jensen et al., Nature Reviews Genetics, 2012
  74. 74. opt-out
  75. 75. opt-in
  76. 76. structured data
  77. 77. Jensen et al., Nature Reviews Genetics, 2012
  78. 78. unstructured data
  79. 79. Danish
  80. 80. busy doctors
  81. 81. psychiatric patients
  82. 82. custom dictionaries
  83. 83. drugs
  84. 84. adverse drug events
  85. 85. complex filters
  86. 86. Eriksson et al., submitted, 2013
  87. 87. new adverse drug reactions
  88. 88. Eriksson et al., submitted, 2013
      Drug substance        ADE                   p-value
      Chlordiazepoxide      Nystagmus             4.0e-8
      Simvastatin           Personality changes   8.4e-8
      Dipyridamole          Visual impairment     4.4e-4
      Citalopram            Psychosis             8.8e-4
      Bendroflumethiazide   Apoplexy              8.5e-3
  89. 89. temporal correlation
  90. 90. diagnosis trajectories
  91. 91. Jensen et al., in preparation, 2013
  92. 92. national discharge registry
  93. 93. 6.2 million patients
  94. 94. 14 years
  95. 95. confounding factors
  96. 96. age and gender
  97. 97. Jensen et al., submitted, 2013 (figure panels: Female, Male; In-patient, Out-patient, Emergency room)
  98. 98. lifestyle
  99. 99. reporting biases
  100. 100. complex trajectories
  101. 101. Jensen et al., submitted, 2013
  102. 102. medical implications
  103. 103. Acknowledgments
       STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
       Text mining: Sune Frankild, Jasmin Saric, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
       EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Robert Eriksson, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak
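
The text-mining slides (named entity recognition, comprehensive lexicon, expansion rules, flexible matching, "black list", and co-mentioning within sentences) name a pipeline without showing one. The following is a minimal sketch of how such steps could fit together, not the speaker's actual implementation. The toy lexicon, the identifiers `CDK1` and `SDS_GENE`, the "h" prefix expansion rule, and all function names are assumptions made for illustration; only the example names (CDC2, cyclin dependent kinase 1, hCdc2, HAP1, SDS) come from the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: identifier -> known names (a real one would be far larger).
LEXICON = {
    "CDK1": ["CDC2", "cyclin dependent kinase 1"],
    "HAP1": ["HAP1"],
    "SDS_GENE": ["SDS"],  # assumed to be an ambiguous name worth blocking
}

# "Black list": lowercased names that should never be tagged.
BLACKLIST = {"sds"}


def expand(name):
    """Assumed expansion rule: add an 'h' (human) prefix to one-word symbols."""
    variants = {name}
    if " " not in name:
        variants.add("h" + name)
    return variants


def compile_name(name):
    """Flexible matching: ignore case and treat hyphens/spaces as interchangeable."""
    parts = [re.escape(p) for p in re.split(r"[-\s]+", name) if p]
    body = r"[-\s]?".join(parts)
    # Lookarounds keep the match from starting or ending inside a longer token.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE)


def build_matchers(lexicon, blacklist):
    """Turn the lexicon into (identifier, pattern) pairs, skipping blacklisted names."""
    matchers = []
    for identifier, names in lexicon.items():
        for name in names:
            if name.lower() in blacklist:
                continue
            for variant in expand(name):
                matchers.append((identifier, compile_name(variant)))
    return matchers


def tag(text, matchers):
    """Return the set of identifiers whose names occur in the text."""
    return {identifier for identifier, pattern in matchers if pattern.search(text)}


def sentence_comentions(text, matchers):
    """Count identifier pairs that are mentioned together in the same sentence."""
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        ids = sorted(tag(sentence, matchers))
        counts.update(combinations(ids, 2))
    return counts


matchers = build_matchers(LEXICON, BLACKLIST)
example = "hCdc2 binds HAP1 in vitro. Cyclin-dependent kinase 1 was run on an SDS gel."
print(sentence_comentions(example, matchers))
```

Counting at the sentence level is the strictest of the three co-mentioning scopes on the slides; splitting on paragraphs or whole documents instead would only require changing the split step in `sentence_comentions`.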
