Transcript

  • 1. Text and data mining, Lars Juhl Jensen
  • 2. Part 1: text mining
  • 3. exponential growth
  • 4.  
  • 5.  
  • 6. some things are constant
  • 7.  
  • 8. ~45 seconds per paper
  • 9. computer
  • 10. as smart as a dog
  • 11. teach it specific tricks
  • 12.  
  • 13.  
  • 14. named entity identification
  • 15. Reflect
  • 16. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
  • 17. comprehensive lexicon
  • 18. orthographic variation
  • 19. “black list”
  • 20. information extraction
  • 21. no access
  • 22.  
  • 23. collaboration
  • 24.  
  • 25.  
  • 26. Part 2: protein networks
  • 27. guilt by association
  • 28.  
  • 29. STRING
  • 30. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  • 31. genomic context
  • 32. gene fusion
  • 33. Korbel et al., Nature Biotechnology, 2004
  • 34. experimental data
  • 35. physical interactions
  • 36. Jensen & Bork, Science, 2008
  • 37. genetic interactions
  • 38. Beyer et al., Nature Reviews Genetics, 2007
  • 39. gene coexpression
  • 40.  
  • 41. curated knowledge
  • 42. pathways
  • 43. Letunic & Bork, Trends in Biochemical Sciences, 2008
  • 44. text mining
  • 45.  
  • 46. many data types
  • 47. many databases
  • 48. different formats
  • 49. different identifiers
  • 50. variable quality
  • 51. quality scores
  • 52. calibrate vs. gold standard
  • 53. von Mering et al., Nucleic Acids Research, 2005
  • 54. orthology transfer
  • 55. Frishman et al., Modern Genome Annotation, 2009
  • 56. Part 3: drug networks
  • 57. new uses for old drugs
  • 58. shared target(s)
  • 59. chemical similarity
  • 60. Campillos & Kuhn et al., Science, 2008
  • 61. similar drugs share targets
  • 62. Campillos & Kuhn et al., Science, 2008
  • 63. only trivial predictions
  • 64. phenotypic similarity
  • 65. chemical perturbations
  • 66. phenotypic readouts
  • 67. drug treatment
  • 68. side effects
  • 69. no database
  • 70. package inserts
  • 71. Campillos & Kuhn et al., Science, 2008
  • 72. text mining
  • 73. manual validation
  • 74. side-effect correlations
  • 75. Campillos & Kuhn et al., Science, 2008
  • 76. side-effect frequencies
  • 77. Campillos & Kuhn et al., Science, 2008
  • 78. side-effect similarity
  • 79. chemical similarity
  • 80. Campillos & Kuhn et al., Science, 2008
  • 81. categorization
  • 82. Campillos & Kuhn et al., Science, 2008
  • 83. 20 drug–drug pairs
  • 84. in vitro binding assays
  • 85. Ki < 10 µM for 11 of 20
  • 86. cell assays
  • 87. 9 of 9 showed activity
  • 88. Acknowledgments
      • reflect.ws
          • Sune Frankild
          • Heiko Horn
          • Evangelos Pafilis
          • Michael Kuhn
          • Reinhardt Schneider
          • Sean O’Donoghue
      • sideeffects.embl.de
          • Monica Campillos
          • Michael Kuhn
          • Anne-Claude Gavin
          • Peer Bork
      • string-db.org
          • Damian Szklarczyk
          • Andrea Franceschini
          • Michael Kuhn
          • Milan Simonovic
          • Alexander Roth
          • Pablo Minguez
          • Tobias Doerks
          • Manuel Stark
          • Jean Muller
          • Peer Bork
          • Christian von Mering
  • 89. larsjuhljensen
  • 90.
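
Slides 14 to 19 name the ingredients of dictionary-based named entity identification: a comprehensive lexicon, tolerance for orthographic variation, and a "black list" of names that should never be tagged. The following is a minimal sketch of how those three pieces might fit together. It is an illustration under stated assumptions, not the Reflect implementation: the toy lexicon, the `ID:n` identifiers, the function names, and the normalization rule (ignore case and every non-alphanumeric character) are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def normalize(name: str) -> str:
    """Collapse orthographic variation: ignore case, spaces, hyphens, punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_lexicon(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each normalized name to its identifier."""
    return {normalize(name): ident for name, ident in entries}


def tag(text: str, lexicon: Dict[str, str], blacklist: Set[str],
        max_tokens: int = 3) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched text, identifier), preferring the longest match."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize(text[start:end])
            if key in lexicon and key not in blacklist:
                matches.append((start, end, text[start:end], lexicon[key]))
                step = n
                break
        i += step
    return matches


# Hypothetical toy lexicon; the identifiers are placeholders, not real accessions.
lexicon = build_lexicon([("Cdc2", "ID:1"), ("cyclin B1", "ID:2"), ("WAS", "ID:3")])
# Names that collide with common English words are blocked outright.
blacklist = {normalize("was")}

sentence = "CDC2 binds cyclin-B1, and cyclin B1 was degraded."
for start, end, surface, ident in tag(sentence, lexicon, blacklist):
    print(start, end, surface, ident)
```

In this sketch "cyclin-B1" and "cyclin B1" resolve to the same entry because both normalize to the same key, while the word "was" is left untagged even though the toy lexicon contains a gene of that name. Because normalization also ignores punctuation, a real tagger would need stricter rules to avoid joining tokens across sentence or clause boundaries.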