Text and data mining
Lars Juhl Jensen
Part 1: text mining
exponential growth of the biomedical literature
 
 
some things are constant
 
~45 seconds per paper
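As a rough illustration of why a fixed reading speed cannot keep up, the arithmetic below turns ~45 seconds per paper into a daily throughput. The 8-hour reading day is an assumption made for illustration, not a figure from the talk.

```python
# Back-of-envelope reading throughput.
# SECONDS_PER_PAPER comes from the slide; the 8-hour day is an assumption.
SECONDS_PER_PAPER = 45


def papers_per_day(hours_per_day: float, seconds_per_paper: float = SECONDS_PER_PAPER) -> float:
    """Papers one person can get through per day at a fixed reading rate."""
    return hours_per_day * 3600 / seconds_per_paper


print(papers_per_day(8))  # 640.0
```

However the day is chosen, the result is a constant rate, while the number of papers keeps growing.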
computer
as smart as a dog
teach it specific tricks
 
 
named entity identification
Reflect
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
comprehensive lexicon
orthographic variation
“black list”
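A comprehensive lexicon, handling of orthographic variation, and a black list can be combined into a simple dictionary-based tagger. This is a minimal sketch, not Reflect's implementation: the identifiers (GENE:0001, GENE:0002), the normalization rule (ignore case, hyphens and spaces) and the greedy longest-match strategy are all illustrative assumptions.

```python
import re

# Tokens are runs of letters/digits; hyphens, spaces and brackets act as separators.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalize(tokens):
    """Join tokens case-insensitively, so "CDK-1", "cdk 1" and "Cdk1" all become "cdk1"."""
    return "".join(t.lower() for t in tokens)


def build_lexicon(synonyms):
    """Map each normalized synonym to its identifier; also return the longest name in tokens."""
    lexicon, max_len = {}, 1
    for ident, names in synonyms.items():
        for name in names:
            toks = TOKEN.findall(name)
            lexicon[normalize(toks)] = ident
            max_len = max(max_len, len(toks))
    return lexicon, max_len


def tag(text, lexicon, max_len, blacklist=frozenset()):
    """Greedy longest-match tagging; returns (start, end, identifier) character spans."""
    words = list(TOKEN.finditer(text))
    hits, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            span = words[i:i + n]
            key = normalize(m.group() for m in span)
            if key in lexicon and key not in blacklist:
                hits.append((span[0].start(), span[-1].end(), lexicon[key]))
                i += n
                break
        else:  # no name starts at this token
            i += 1
    return hits


synonyms = {
    "GENE:0001": ["CDK1", "cyclin-dependent kinase 1"],  # hypothetical IDs
    "GENE:0002": ["SET"],  # a gene name that is also a common English word
}
lexicon, max_len = build_lexicon(synonyms)
blacklist = {"set"}  # never tag the word "set"
text = "Cdk-1 (cyclin dependent kinase 1) was set aside."
print(tag(text, lexicon, max_len, blacklist))
# [(0, 5, 'GENE:0001'), (7, 32, 'GENE:0001')]
```

Both spelling variants resolve to the same identifier, while the black-listed word "set" is skipped even though it is in the lexicon.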
information extraction
no access
 
collaboration
 
 
Part 2: protein networks
guilt by association
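Guilt by association means predicting the function of an uncharacterized protein from the annotations of its network neighbours. A minimal majority-vote sketch, using a made-up network and made-up annotations; practical methods may also weight each edge by its confidence.

```python
from collections import Counter


def predict_function(protein, edges, annotations):
    """Rank annotations for `protein` by how many of its direct neighbours carry them."""
    neighbours = {b for a, b in edges if a == protein} | {a for a, b in edges if b == protein}
    votes = Counter()
    for n in neighbours:
        votes.update(annotations.get(n, ()))
    return votes.most_common()


# Hypothetical undirected network and annotations; "X" is uncharacterized.
edges = [("X", "A"), ("X", "B"), ("C", "X"), ("A", "B")]
annotations = {"A": {"DNA repair"}, "B": {"DNA repair"}, "C": {"cell cycle"}}
print(predict_function("X", edges, annotations))
# [('DNA repair', 2), ('cell cycle', 1)]
```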
 
STRING
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
experimental data
physical interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
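Coexpression evidence is typically derived by comparing expression profiles across many conditions. The sketch below uses plain Pearson correlation with an arbitrary 0.8 cut-off on invented profiles; the slide does not describe the measure STRING actually uses, so treat both as assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def coexpressed_pairs(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every pair correlating at least `threshold`."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(profiles))  # only geneA/geneB pass the cut-off
```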
 
curated knowledge
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
text mining
 
many data types
many databases
different formats
different identifiers
variable quality
quality scores
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
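Calibrating against a gold standard turns each evidence type's raw scores into comparable probabilities: raw scores are binned, and each bin is assigned the fraction of its pairs that the gold standard (for example, pairs annotated to the same curated pathway) calls positive. The sketch below does that and then combines calibrated channel scores as 1 − ∏(1 − sᵢ), which assumes the channels are independent. The bin edges and data are invented, and the published STRING procedure may differ in its details (later versions, for instance, correct for a prior probability of association).

```python
import bisect
import math


def calibrate(raw_scores, is_positive, bin_edges):
    """Fraction of gold-standard positives per score bin (None for empty bins)."""
    n_bins = len(bin_edges) + 1
    totals, positives = [0] * n_bins, [0] * n_bins
    for score, positive in zip(raw_scores, is_positive):
        b = bisect.bisect_right(bin_edges, score)
        totals[b] += 1
        positives[b] += int(positive)
    return [p / t if t else None for p, t in zip(positives, totals)]


def calibrated_score(raw, bin_edges, table):
    """Look up the calibrated probability for a new raw score."""
    return table[bisect.bisect_right(bin_edges, raw)]


def combine(channel_scores):
    """Combine independent evidence channels: 1 - prod(1 - s_i)."""
    return 1 - math.prod(1 - s for s in channel_scores)


# Invented raw scores for one evidence type, with gold-standard labels.
raw = [0.1, 0.2, 0.6, 0.7, 0.9, 0.95]
gold = [False, False, True, False, True, True]
edges = [0.5, 0.8]  # bins: [0, 0.5), [0.5, 0.8), [0.8, ...)
table = calibrate(raw, gold, edges)
print(table)                                 # [0.0, 0.5, 1.0]
print(calibrated_score(0.65, edges, table))  # 0.5
print(combine([0.5, 0.5]))                   # 0.75
```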
orthology transfer
Frishman et al., Modern Genome Annotation, 2009
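Orthology transfer projects associations known in one organism onto the orthologs of the interacting proteins in another. A minimal sketch, assuming a precomputed ortholog mapping and keeping the best score when several source pairs land on the same target pair; how the real scheme discounts transferred scores is not described on the slide and is not modelled here. Protein names are hypothetical.

```python
def transfer(interactions, orthologs):
    """Project (a, b, score) interactions onto another organism via an ortholog map.

    `orthologs` maps a source protein to a list of its orthologs in the target
    organism. Proteins without orthologs are skipped; if several source pairs
    map to the same target pair, the highest score is kept.
    """
    transferred = {}
    for a, b, score in interactions:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                if ta == tb:
                    continue  # both partners collapse onto one target protein
                pair = tuple(sorted((ta, tb)))
                transferred[pair] = max(transferred.get(pair, 0.0), score)
    return transferred


# Hypothetical source-to-target example.
interactions = [("yA", "yB", 0.9), ("yA", "yC", 0.4)]
orthologs = {"yA": ["hA"], "yB": ["hB1", "hB2"]}  # yC has no ortholog
print(transfer(interactions, orthologs))
# {('hA', 'hB1'): 0.9, ('hA', 'hB2'): 0.9}
```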
Part 3: drug networks
new uses for old drugs
shared target(s)
chemical similarity
Campillos & Kuhn et al., Science, 2008
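Chemical similarity between two drugs is commonly measured with the Tanimoto coefficient on structural fingerprints: the number of features both molecules share divided by the number present in either. The slides do not say which fingerprint was used, so the feature sets below are invented placeholders; in practice they would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


# Hypothetical fingerprints: indices of the bits set for each drug.
drug_x = {1, 2, 3, 4}
drug_y = {2, 3, 4, 5}
print(tanimoto(drug_x, drug_y))  # 0.6
```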
similar drugs share targets
Campillos & Kuhn et al., Science, 2008
only trivial predictions
phenotypic similarity
chemical perturbations
phenotypic readouts
drug treatment
side effects
no database
package inserts
Campillos & Kuhn et al., Science, 2008
text mining
manual validation
side-effect correlations
Campillos & Kuhn et al., Science, 2008
side-effect frequencies
Campillos & Kuhn et al., Science, 2008
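Side-effect frequencies matter when comparing drugs: a side effect reported for nearly every drug says little about a shared mechanism, while a rare one is informative. One way to express this, assumed here rather than taken from the paper, is to weight each side effect by log(N / n), where N is the number of drugs and n the number reporting that side effect, and then compute a weighted Jaccard score. The slides also mention side-effect correlations, which this sketch ignores; the drugs and side-effect lists are invented.

```python
import math
from collections import Counter


def idf_weights(drug_side_effects):
    """Weight each side effect by log(N / n): rare side effects count more."""
    n_drugs = len(drug_side_effects)
    counts = Counter(se for ses in drug_side_effects.values() for se in ses)
    return {se: math.log(n_drugs / n) for se, n in counts.items()}


def weighted_similarity(a, b, weights):
    """Weighted Jaccard similarity between two drugs' side-effect sets."""
    denom = sum(weights[se] for se in a | b)
    if denom == 0:
        return 0.0
    return sum(weights[se] for se in a & b) / denom


# Invented side-effect profiles.
drugs = {
    "d1": {"nausea", "headache", "rash"},
    "d2": {"nausea", "headache"},
    "d3": {"nausea", "rash"},
    "d4": {"nausea"},
}
w = idf_weights(drugs)  # nausea occurs for every drug, so its weight is 0
print(weighted_similarity(drugs["d1"], drugs["d2"], w))  # 0.5
print(weighted_similarity(drugs["d2"], drugs["d4"], w))  # 0.0
```

Unweighted, d2 and d4 would score 0.5; the weighting removes the credit for a side effect that every drug shares.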
side-effect similarity
chemical similarity
Campillos & Kuhn et al., Science, 2008
categorization
Campillos & Kuhn et al., Science, 2008
20 drug–drug pairs
in vitro binding assays
Ki < 10 µM for 11 of 20
cell assays
9 of 9 showed activity
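Acknowledgments
reflect.ws: Sune Frankild, Heiko Horn, Evangelos Pafilis, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
sideeffects.embl.de: Monica Campillos, Michael Kuhn, Anne-Claude Gavin, Peer Bork
string-db.org: Damian Szklarczyk, Andrea Franceschini, Michael Kuhn, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Jean Muller, Peer Bork, Christian von Mering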
larsjuhljensen
 