Data and Text Mining

Published in: Technology, Education

Transcript

  • 1. Data and Text Mining, Lars Juhl Jensen
  • 2. sequence analysis
  • 3. protein networks
  • 4. de Lichtenberg, Jensen et al., Science, 2005
  • 5. adverse drug reactions
  • 6. Campillos, Kuhn et al., Science, 2008
  • 7. group leader
  • 8. cofounder
  • 9. data mining
  • 10. proteomics
  • 11. text mining
  • 12. biomedical literature
  • 13. electronic health records
  • 14. protein networks
  • 15. guilt by association
  • 16. STRING
  • 17. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  • 18. computational predictions
  • 19. gene fusion
  • 20. Korbel et al., Nature Biotechnology, 2004
  • 21. gene neighborhood
  • 22. operons
  • 23. Korbel et al., Nature Biotechnology, 2004
  • 24. bidirectional promoters
  • 25. Korbel et al., Nature Biotechnology, 2004
  • 26. phylogenetic profiles
  • 27. Korbel et al., Nature Biotechnology, 2004
  • 28. a real example
  • 29. Cell Cellulosomes Cellulose
  • 30. experimental data
  • 31. gene coexpression
  • 32. protein interactions
  • 33. Jensen & Bork, Science, 2008
  • 34. genetic interactions
  • 35. Beyer et al., Nature Reviews Genetics, 2007
  • 36. curated knowledge
  • 37. complexes
  • 38. pathways
  • 39. Letunic & Bork, Trends in Biochemical Sciences, 2008
  • 40. many databases
  • 41. different formats
  • 42. different identifiers
  • 43. variable quality
  • 44. not comparable
  • 45. not same species
  • 46. hard work
  • 47. quality scores
  • 48. von Mering et al., Nucleic Acids Research, 2005
  • 49. calibrate vs. gold standard
  • 50. von Mering et al., Nucleic Acids Research, 2005
  • 51. homology-based transfer
  • 52. Franceschini et al., Nucleic Acids Research, 2013
  • 53. missing most of the data
  • 54. text mining
  • 55. >10 km
  • 56. too much to read
  • 57. computer
  • 58. as smart as a dog
  • 59. teach it specific tricks
  • 60. named entity recognition
  • 61. comprehensive lexicon
  • 62. CDC2
  • 63. cyclin dependent kinase 1
  • 64. expansion rules
  • 65. hCdc2
  • 66. CDC2
  • 67. flexible matching
  • 68. cyclin-dependent kinase 1
  • 69. cyclin dependent kinase 1
  • 70. “black list”
  • 71. SDS
  • 72. augmented browsing
  • 73. Reflect
  • 74. browser add-on
  • 75. real-time text mining
  • 76. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009; O’Donoghue et al., Journal of Web Semantics, 2010
  • 77. information extraction
  • 78. co-mentioning
  • 79. within documents
  • 80. within paragraphs
  • 81. within sentences
  • 82. text corpus
  • 83. ~22 million abstracts
  • 84. no access
  • 85. millions of full-text articles
  • 86. localization and disease
  • 87. general approach
  • 88. COMPARTMENTS
  • 89. TISSUES
  • 90. DISEASES
  • 91. curated knowledge
  • 92. experimental data
  • 93. text mining
  • 94. computational predictions
  • 95. common identifiers
  • 96. quality scores
  • 97. visualization
  • 98. compartments.jensenlab.org
  • 99. tissues.jensenlab.org
  • 100. dissemination
  • 101. web interfaces
  • 102. web services
  • 103. diseases.jensenlab.org
  • 104. bulk download
  • 105. Acknowledgments. STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork. Text mining: Sune Frankild, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
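
Slides 60 to 71 outline dictionary-based named entity recognition: a comprehensive lexicon, expansion rules (CDC2 to hCdc2), flexible matching (hyphen versus space) and a "black list" (SDS). The slides do not show an implementation, so the following is only a minimal sketch of how those four pieces could fit together. The identifier `example:CDK1`, the toy lexicon, the "h" prefix rule and all function names are assumptions made for illustration, not the method used in the talk.

```python
import re

# Hypothetical toy lexicon: identifier -> names (names taken from slides 62-63 and 71).
LEXICON = {
    "example:CDK1": ["CDC2", "cyclin dependent kinase 1"],
    "example:SDS": ["SDS"],
}

# Names that are never tagged (slide 71 lists SDS); stored in normalized form.
BLACKLIST = {"sds"}


def normalize(name):
    """Flexible matching: ignore case, hyphens and whitespace."""
    return re.sub(r"[\s\-]+", "", name.lower())


def expand(name):
    """Assumed expansion rule: single-token symbols may carry an 'h' (human) prefix."""
    yield name
    if " " not in name:
        yield "h" + name


def build_index(lexicon, blacklist):
    """Map every normalized name variant to its identifier, skipping blacklisted names."""
    index = {}
    for identifier, names in lexicon.items():
        for name in names:
            if normalize(name) in blacklist:
                continue
            for variant in expand(name):
                index[normalize(variant)] = identifier
    return index


def tag(text, index, max_tokens=4):
    """Return (start, end, surface form, identifier) for leftmost-longest matches."""
    tokens = list(re.finditer(r"[A-Za-z0-9]+", text))
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it token by token.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i].start(), tokens[i + n - 1].end()
            key = normalize(text[start:end])
            if key in index:
                hits.append((start, end, text[start:end], index[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return hits


if __name__ == "__main__":
    index = build_index(LEXICON, BLACKLIST)
    sentence = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
    for hit in tag(sentence, index):
        print(hit)
```

In this sketch the blacklist is applied when the index is built, so a blocked name costs nothing at matching time; a real system would likely need larger lexicons and more careful tokenization.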

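Slides 78 to 81 describe information extraction by co-mentioning within documents, paragraphs and sentences. As a rough illustration only, the sketch below counts co-mentions at the three levels and combines them with weights. The weights, the blank-line paragraph convention, the naive sentence splitter, the substring-based entity lookup and the placeholder names `A1`, `B2`, `C3` are all assumptions; the slides do not specify a scoring formula.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed weights: closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def entities_in(text, names):
    """Crude lookup: a name counts as mentioned if it occurs as a substring."""
    lowered = text.lower()
    return {name for name in names if name.lower() in lowered}


def comention_scores(documents, names, weights=WEIGHTS):
    """Sum weighted co-mention counts for every pair of names."""
    scores = Counter()
    for doc in documents:
        units = [("document", doc)]
        # Paragraphs are assumed to be separated by blank lines.
        for paragraph in (p for p in doc.split("\n\n") if p.strip()):
            units.append(("paragraph", paragraph))
            # Naive sentence split on ., ! or ? followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                units.append(("sentence", sentence))
        for level, text in units:
            found = sorted(entities_in(text, names))
            for pair in combinations(found, 2):
                scores[pair] += weights[level]
    return scores


if __name__ == "__main__":
    docs = ["A1 binds B2. C3 is mentioned separately.\n\nC3 again."]
    for pair, score in comention_scores(docs, ["A1", "B2", "C3"]).items():
        print(pair, score)
```

With these assumed weights, a pair that shares a sentence also shares the enclosing paragraph and document, so it accumulates all three contributions, while a pair that only shares a paragraph scores lower.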