Data and Text Mining

Published in: Technology, Education

  1. Data and Text Mining, Lars Juhl Jensen
  2. sequence analysis
  3. protein networks
  4. de Lichtenberg, Jensen et al., Science, 2005
  5. adverse drug reactions
  6. Campillos, Kuhn et al., Science, 2008
  7. group leader
  8. cofounder
  9. data mining
  10. proteomics
  11. text mining
  12. biomedical literature
  13. electronic health records
  14. protein networks
  15. guilt by association
  16. STRING
  17. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  18. computational predictions
  19. gene fusion
  20. Korbel et al., Nature Biotechnology, 2004
  21. gene neighborhood
  22. operons
  23. Korbel et al., Nature Biotechnology, 2004
  24. bidirectional promoters
  25. Korbel et al., Nature Biotechnology, 2004
  26. phylogenetic profiles
  27. Korbel et al., Nature Biotechnology, 2004
  28. a real example
  29. Cell, Cellulosomes, Cellulose
  30. experimental data
  31. gene coexpression
  32. protein interactions
  33. Jensen & Bork, Science, 2008
  34. genetic interactions
  35. Beyer et al., Nature Reviews Genetics, 2007
  36. curated knowledge
  37. complexes
  38. pathways
  39. Letunic & Bork, Trends in Biochemical Sciences, 2008
  40. many databases
  41. different formats
  42. different identifiers
  43. variable quality
  44. not comparable
  45. not same species
  46. hard work
  47. quality scores
  48. von Mering et al., Nucleic Acids Research, 2005
  49. calibrate vs. gold standard
  50. von Mering et al., Nucleic Acids Research, 2005
  51. homology-based transfer
  52. Franceschini et al., Nucleic Acids Research, 2013
  53. missing most of the data
  54. text mining
  55. >10 km
  56. too much to read
  57. computer
  58. as smart as a dog
  59. teach it specific tricks
  60. named entity recognition
  61. comprehensive lexicon
  62. CDC2
  63. cyclin dependent kinase 1
  64. expansion rules
  65. hCdc2
  66. CDC2
  67. flexible matching
  68. cyclin-dependent kinase 1
  69. cyclin dependent kinase 1
  70. “black list”
  71. SDS
  72. augmented browsing
  73. Reflect
  74. browser add-on
  75. real-time text mining
  76. Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009; O’Donoghue et al., Journal of Web Semantics, 2010
  77. information extraction
  78. co-mentioning
  79. within documents
  80. within paragraphs
  81. within sentences
  82. text corpus
  83. ~22 million abstracts
  84. no access
  85. millions of full-text articles
  86. localization and disease
  87. general approach
  88. COMPARTMENTS
  89. TISSUES
  90. DISEASES
  91. curated knowledge
  92. experimental data
  93. text mining
  94. computational predictions
  95. common identifiers
  96. quality scores
  97. visualization
  98. compartments.jensenlab.org
  99. tissues.jensenlab.org
  100. dissemination
  101. web interfaces
  102. web services
  103. diseases.jensenlab.org
  104. bulk download
  105. Acknowledgments. STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork. Text mining: Sune Frankild, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
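
Slides 60 to 71 outline dictionary-based named entity recognition: a lexicon of names, expansion rules that generate variants such as hCdc2, flexible matching that treats "cyclin-dependent kinase 1" and "cyclin dependent kinase 1" alike, and a black list for ambiguous strings such as SDS. The slides do not show an implementation, so the following is only a minimal sketch of how those four pieces could fit together. The toy lexicon entries, the single "h" prefix rule, and the names `normalize`, `expand`, `build_index`, and `tag` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: identifier -> synonyms (illustrative entries only).
LEXICON = {
    "CDC2": ["cyclin dependent kinase 1"],
    "SDS": ["serine dehydratase"],
}

# Hypothetical black list: strings that usually mean something else in text
# (SDS most often refers to the detergent, not the gene).
BLACKLIST = {"SDS"}


def normalize(name):
    """Flexible matching: ignore case, treat hyphens and spaces as the same."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def expand(name):
    """Assumed expansion rule: single-word names may carry an 'h' prefix."""
    variants = {name}
    if " " not in name:
        variants.add("h" + name)
    return variants


def build_index(lexicon, blacklist):
    """Map every normalized, non-blacklisted name variant to its identifier."""
    blocked = {normalize(b) for b in blacklist}
    index = {}
    for identifier, synonyms in lexicon.items():
        for name in [identifier] + synonyms:
            for variant in expand(name):
                key = normalize(variant)
                if key not in blocked:
                    index[key] = identifier
    return index


def tag(text, index):
    """Return (matched text, identifier) pairs in order of appearance.

    Overlapping matches are not resolved in this sketch.
    """
    hits = []
    for key, identifier in index.items():
        # Any run of spaces or hyphens may stand where the key has a space.
        words = [re.escape(word) for word in key.split(" ")]
        pattern = r"(?<!\w)" + r"[\s\-]+".join(words) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), identifier))
    return [(matched, identifier) for _, matched, identifier in sorted(hits)]


index = build_index(LEXICON, BLACKLIST)
print(tag("hCdc2 is also called cyclin-dependent kinase 1; SDS is a detergent.", index))
```

In this sketch both "hCdc2" and "cyclin-dependent kinase 1" resolve to the same identifier, while the bare string "SDS" is ignored because the black list removes it from the index before any matching happens.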

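Slides 78 to 81 describe information extraction by co-mentioning, counted within documents, within paragraphs, and within sentences. As a rough sketch of that counting step only, the code below assumes the text has already been tagged, so that each sentence is a set of entity identifiers, each paragraph a list of sentences, and each document a list of paragraphs. The function name `comention_counts` and this input layout are assumptions, and the slides do not say how the three levels are weighted or turned into scores, so no scoring is attempted here.

```python
from collections import Counter
from itertools import combinations


def pairs(entities):
    """All unordered pairs of distinct entities, in a canonical sorted order."""
    return combinations(sorted(set(entities)), 2)


def comention_counts(corpus):
    """Count entity pairs co-mentioned at sentence, paragraph and document level.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity ids.
    """
    counts = {"sentence": Counter(), "paragraph": Counter(), "document": Counter()}
    for document in corpus:
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                counts["sentence"].update(pairs(sentence))
                paragraph_entities.update(sentence)
            counts["paragraph"].update(pairs(paragraph_entities))
            document_entities.update(paragraph_entities)
        counts["document"].update(pairs(document_entities))
    return counts


# Hypothetical pre-tagged corpus with placeholder entity ids A, B and C.
corpus = [
    [[{"A", "B"}, {"C"}], [{"A"}]],
    [[{"A"}, {"C"}]],
]
counts = comention_counts(corpus)
print(counts["sentence"], counts["paragraph"], counts["document"])
```

Keeping the three levels as separate counters leaves the choice of how much a sentence-level co-mention should count relative to a document-level one to a later step.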