Large-scale data and text mining

Published in: Science

Transcript

  • 1. Network biology: Large-scale data and text mining (Lars Juhl Jensen)
  • 2. protein networks
  • 3. medical networks
  • 4. guilt by association
  • 5. protein networks
  • 6. STRING
  • 7. functional associations
  • 8. computational predictions
  • 9. gene fusion
  • 10. Korbel et al., Nature Biotechnology, 2004
  • 11. gene neighborhood
  • 12. Korbel et al., Nature Biotechnology, 2004
  • 13. phylogenetic profiles
  • 14. Korbel et al., Nature Biotechnology, 2004
  • 15. experimental data
  • 16. gene coexpression
  • 17. protein interactions
  • 18. Jensen & Bork, Science, 2008
  • 19. curated knowledge
  • 20. complexes
  • 21. pathways
  • 22. Letunic & Bork, Trends in Biochemical Sciences, 2008
  • 23. many databases
  • 24. different formats
  • 25. different identifiers
  • 26. variable quality
  • 27. not comparable
  • 28. not same species
  • 29. hard work
  • 30. quality scores
  • 31. von Mering et al., Nucleic Acids Research, 2005
  • 32. calibrate vs. gold standard
  • 33. von Mering et al., Nucleic Acids Research, 2005
  • 34. homology-based transfer
  • 35. Franceschini et al., Nucleic Acids Research, 2013
  • 36. visualization
  • 37. string-db.org
  • 38. missing most of the data
  • 39. text mining
  • 40. >10 km
  • 41. too much to read
  • 42. computer
  • 43. as smart as a dog
  • 44. teach it specific tricks
  • 45. named entity recognition
  • 46. comprehensive lexicon
  • 47. CDC2
  • 48. cyclin dependent kinase 1
  • 49. expansion rules
  • 50. hCdc2
  • 51. CDC2
  • 52. flexible matching
  • 53. cyclin-dependent kinase 1
  • 54. cyclin dependent kinase 1
  • 55. “black list”
  • 56. SDS
  • 57. augmented browsing
  • 58. Reflect
  • 59. browser add-on
  • 60. real-time text mining
  • 61. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009; O’Donoghue et al., Journal of Web Semantics, 2010
  • 62. information extraction
  • 63. co-mentioning
  • 64. within documents
  • 65. within paragraphs
  • 66. within sentences
  • 67. natural language processing
  • 68. Annotated parse example (gene and protein names, cue words for entity recognition, verbs for relation extraction): [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  • 69. text corpus
  • 70. ~22 million abstracts
  • 71. millions of full-text articles
  • 72. medical networks
  • 73. Jensen et al., Nature Reviews Genetics, 2012
  • 74. opt-out
  • 75. opt-in
  • 76. structured data
  • 77. Jensen et al., Nature Reviews Genetics, 2012
  • 78. unstructured data
  • 79. Danish
  • 80. busy doctors
  • 81. psychiatric patients
  • 82. custom dictionaries
  • 83. drugs
  • 84. adverse drug events
  • 85. complex filters
  • 86. Eriksson et al., submitted, 2013
  • 87. new adverse drug reactions
  • 88. Eriksson et al., submitted, 2013
    Drug substance | ADE (adverse drug event) | p-value
    Chlordiazepoxide | Nystagmus | 4.0e-8
    Simvastatin | Personality changes | 8.4e-8
    Dipyridamole | Visual impairment | 4.4e-4
    Citalopram | Psychosis | 8.8e-4
    Bendroflumethiazide | Apoplexy | 8.5e-3
  • 89. temporal correlation
  • 90. diagnosis trajectories
  • 91. Jensen et al., in preparation, 2013
  • 92. national discharge registry
  • 93. 6.2 million patients
  • 94. 14 years
  • 95. confounding factors
  • 96. age and gender
  • 97. Jensen et al., submitted, 2013 (figure legend: female, male; in-patient, out-patient, emergency room)
  • 98. lifestyle
  • 99. reporting biases
  • 100. complex trajectories
  • 101. Jensen et al., submitted, 2013
  • 102. medical implications
  • 103. Acknowledgments
    STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
    Text mining: Sune Frankild, Jasmin Saric, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
    EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Robert Eriksson, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak
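The “guilt by association” principle on slide 4 can be illustrated with a minimal Python sketch. It assumes an undirected network given as a list of edges and a dictionary of known annotations; all gene names and labels are hypothetical. An unannotated gene receives the most common annotation among its direct neighbours. Real predictors also weight edges by confidence, which this sketch ignores.

```python
from collections import Counter


def predict_function(gene, edges, annotations):
    """Guilt by association: majority vote over the annotations of direct neighbours.

    edges: iterable of (gene_a, gene_b) pairs, treated as undirected.
    annotations: dict mapping gene -> known function label.
    Returns the most common neighbour label, or None if no neighbour is annotated.
    """
    neighbours = set()
    for a, b in edges:
        if a == gene:
            neighbours.add(b)
        elif b == gene:
            neighbours.add(a)
    votes = Counter(annotations[n] for n in neighbours if n in annotations)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy network: X is unannotated; two of its three partners do DNA repair.
edges = [("X", "A"), ("X", "B"), ("C", "X"), ("A", "B")]
annotations = {"A": "DNA repair", "B": "DNA repair", "C": "transport"}
print(predict_function("X", edges, annotations))  # DNA repair
```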
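The phylogenetic-profile method on slide 13 rests on the idea that functionally linked genes tend to be gained and lost together across genomes. The minimal Python sketch below assumes presence/absence profiles over the same ordered list of genomes. It scores each gene pair by the fraction of genomes in which the two profiles agree. The gene names, profiles and the 0.9 threshold are hypothetical. Simple matching also rewards shared absences, so practical methods typically use more careful similarity measures.

```python
from itertools import combinations


def profile_similarity(a, b):
    """Fraction of genomes in which two presence(1)/absence(0) profiles agree."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def associated_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, similarity) for every pair at or above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        s = profile_similarity(profiles[g1], profiles[g2])
        if s >= threshold:
            pairs.append((g1, g2, s))
    return pairs


# Hypothetical profiles across six genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 1, 1, 0, 1, 1],
}
print(associated_pairs(profiles))  # [('geneA', 'geneB', 1.0)]
```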
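Slides 30 to 33 describe turning each evidence type's raw scores into comparable quality scores by calibrating them against a gold standard. The minimal Python sketch below assumes the gold standard simply labels each scored pair as positive or negative, for example according to whether both proteins share a curated pathway. Raw scores are sorted into equal-width bins, and each bin is mapped to the fraction of gold-standard positives it contains. The calibrated per-channel scores are then combined as 1 - ∏(1 - s_i), which assumes the channels are independent. The bin count and data are hypothetical, and STRING's published scoring includes further corrections that are not shown here.

```python
from math import prod


def calibrate(raw, is_positive, n_bins=4):
    """Return a function mapping a raw score to the gold-standard precision of its bin.

    raw: list of raw scores from one evidence channel.
    is_positive: matching list of 0/1 gold-standard labels.
    Bins are equal-width over the observed raw-score range.
    """
    lo, hi = min(raw), max(raw)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all scores are equal

    def bin_of(x):
        # Clamp so that scores outside the training range fall into the edge bins.
        return max(0, min(int((x - lo) / width), n_bins - 1))

    hits = [0] * n_bins
    totals = [0] * n_bins
    for x, pos in zip(raw, is_positive):
        b = bin_of(x)
        totals[b] += 1
        hits[b] += pos
    precision = [h / t if t else 0.0 for h, t in zip(hits, totals)]
    return lambda x: precision[bin_of(x)]


def combine(channel_scores):
    """Combine calibrated scores, assuming independent channels: 1 - prod(1 - s)."""
    return 1.0 - prod(1.0 - s for s in channel_scores)


# Hypothetical raw coexpression scores with gold-standard labels.
coexpr = calibrate([0, 1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 1, 0, 1, 1, 1])
print(coexpr(3), coexpr(7))  # 0.5 1.0
print(combine([0.5, 0.5]))   # 0.75
```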
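Slides 45 to 56 outline dictionary-based named entity recognition. The approach combines four parts:
- a comprehensive lexicon;
- expansion rules that generate variants such as hCdc2 from CDC2;
- flexible matching that tolerates differences in case and in hyphens versus spaces;
- a black list for ambiguous names such as SDS, which usually denotes the detergent sodium dodecyl sulfate rather than a gene.

The minimal Python sketch below is not the tagger used in the talk. It assumes a single hypothetical expansion rule (a human "h" prefix on single-word names), greedy longest-match tagging and hypothetical entity identifiers.

```python
import re


def normalize(name):
    """Flexible matching: ignore case and treat runs of spaces/hyphens as one space."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def build_lexicon(synonyms):
    """Map normalized names (plus expanded variants) to entity identifiers."""
    lexicon = {}
    for entity_id, names in synonyms.items():
        for name in names:
            lexicon[normalize(name)] = entity_id
            if " " not in name:
                # Hypothetical expansion rule: species prefix, e.g. CDC2 -> hCdc2.
                lexicon[normalize("h" + name)] = entity_id
    return lexicon


def tag(text, lexicon, blacklist=(), max_words=5):
    """Greedy longest-match tagging of lexicon names, skipping black-listed names."""
    blocked = {normalize(b) for b in blacklist}
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            key = normalize(phrase)
            if key in lexicon and key not in blocked:
                hits.append((phrase, lexicon[key]))
                i += n
                break
        else:  # no name starts at token i
            i += 1
    return hits


synonyms = {  # hypothetical identifiers
    "GENE:CDK1": ["CDC2", "cyclin dependent kinase 1"],
    "GENE:SDS": ["SDS"],
}
lexicon = build_lexicon(synonyms)
text = "hCdc2 and cyclin-dependent kinase 1 were detected in SDS buffer."
print(tag(text, lexicon, blacklist=["SDS"]))
# [('hCdc2', 'GENE:CDK1'), ('cyclin-dependent kinase 1', 'GENE:CDK1')]
```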
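Slides 63 to 66 score co-mentioning of entities within documents, paragraphs and sentences. The minimal Python sketch below assumes the documents have already been tagged into nested lists: each document is a list of paragraphs, each paragraph a list of sentences, and each sentence a set of entity identifiers. It also assumes hypothetical weights that reward closer co-mentions. Each pair receives the document weight once per document in which both entities occur. It additionally receives the paragraph weight if the two share a paragraph there, and the sentence weight if they share a sentence. The weights and data are illustrative only, and this is not the published scoring scheme.

```python
from collections import Counter
from itertools import combinations


def pairs(entities):
    """All unordered entity pairs, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def co_mention_scores(documents, w_doc=1.0, w_par=2.0, w_sent=4.0):
    """Weighted co-mention score per entity pair (weights are hypothetical)."""
    scores = Counter()
    for doc in documents:
        doc_entities = set()
        par_pairs, sent_pairs = set(), set()
        for par in doc:
            par_entities = set().union(*par)
            doc_entities |= par_entities
            par_pairs |= pairs(par_entities)
            for sent in par:
                sent_pairs |= pairs(sent)
        for pair in pairs(doc_entities):
            scores[pair] += (w_doc
                             + w_par * (pair in par_pairs)
                             + w_sent * (pair in sent_pairs))
    return scores


# Hypothetical tagged corpus: two documents.
documents = [
    [[{"A", "B"}, {"C"}], [{"A", "C"}]],  # doc 1: two paragraphs
    [[{"A"}], [{"B"}]],                   # doc 2: A and B in different paragraphs
]
print(sorted(co_mention_scores(documents).items()))
# [(('A', 'B'), 8.0), (('A', 'C'), 7.0), (('B', 'C'), 3.0)]
```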
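Slides 84 to 88 report drug/adverse-event pairs with p-values, but the transcript does not state which test produced them. As an assumption, the sketch below uses a one-sided Fisher's exact test on a 2×2 table of patients: exposed or not exposed to a drug, crossed with a mention or no mention of the event. It relies only on the Python standard library. The counts are hypothetical, and the test is a stand-in, not necessarily the method of Eriksson et al.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for an excess of cell a.

    a: exposed with event     b: exposed without event
    c: unexposed with event   d: unexposed without event
    Sums hypergeometric probabilities of tables at least as extreme as observed.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    n = a + b + c + d
    row1 = a + b  # exposed patients
    col1 = a + c  # patients with the event
    denom = comb(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1)) / denom


# Hypothetical counts: all three exposed patients, and no unexposed ones, have the event.
print(fisher_greater(3, 0, 0, 3))  # 0.05
```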
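Slides 89 to 94 describe the search for temporally correlated diagnoses in a national registry. One simple way to test whether diagnosis A tends to precede diagnosis B is to count patients with both diagnoses according to which came first. A one-sided binomial test against 50/50 is then applied. The sketch assumes each patient record maps diagnosis codes to the date of first occurrence (as a day number), and all data are hypothetical. It does not adjust for the confounders mentioned on slides 95 to 99 (age, gender, lifestyle, reporting biases).

```python
from math import comb


def directionality(patients, a, b):
    """Count patients with diagnosis a before b and vice versa, plus a binomial p-value.

    patients: dict patient_id -> dict diagnosis_code -> first-occurrence day number.
    Patients with both diagnoses on the same day are ignored.
    Returns (a_first, b_first, p), where p = P(X >= a_first) for X ~ Binomial(n, 0.5).
    """
    a_first = b_first = 0
    for diagnoses in patients.values():
        if a in diagnoses and b in diagnoses:
            if diagnoses[a] < diagnoses[b]:
                a_first += 1
            elif diagnoses[b] < diagnoses[a]:
                b_first += 1
    n = a_first + b_first
    p = sum(comb(n, k) for k in range(a_first, n + 1)) / 2 ** n
    return a_first, b_first, p


# Hypothetical records: A precedes B in two patients, follows it in one, ties in one.
patients = {
    1: {"A": 1, "B": 5},
    2: {"A": 2, "B": 3},
    3: {"A": 4, "B": 4},
    4: {"A": 7, "B": 6},
    5: {"B": 1},
}
print(directionality(patients, "A", "B"))  # (2, 1, 0.5)
```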
