The pragmatic text miner: From literature to electronic health records

Published in: Health & Medicine, Technology

  1. Lars Juhl Jensen The pragmatic text miner From literature to electronic health records
  2. why text mining?
  3. data mining
  4. guilt by association
  5. structured data
  6. unstructured text
  7. biomedical literature
  8. >10 km
  9. too much to read
  10. computer
  11. as smart as a dog
  12. teach it specific tricks
  13. named entity recognition
  14. dictionary-based approach
  15. identification required
  16. dictionary
  17. cyclin dependent kinase 1
  18. CDC2
  19. expansion rules
  20. CDC2
  21. hCdc2
  22. flexible matching
  23. cyclin dependent kinase 1
  24. cyclin-dependent kinase 1
  25. “black list”
  26. SDS
  27. >10 km <10 hours
  28. the formal way
  29. benchmark
  30. manually annotated corpus
  31. automatic tagging
  32. compare
  33. quality metrics
  34. precision
  35. recall
  36. F-score
  37. manually annotated corpus
  38. use existing corpus
  39. not new
  40. make new corpus
  41. hard work
  42. natural language processing
  43. part-of-speech tagging
  44. semantic tagging
  45. sentence parsing
  46. Gene and protein names Cue words for entity recognition Verbs for relation extraction [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  47. handle negations
  48. directionality
  49. high precision
  50. poor recall
  51. highly domain specific
  52. the pragmatic way
  53. benchmark light™
  54. requires fewer calories
  55. non-annotated corpus
  56. automatic tagging
  57. random sampling
  58. manual inspection
  59. precision
  60. no recall
  61. relative recall
  62. compare methods
  63. co-mentioning
  64. within documents
  65. within paragraphs
  66. within sentences
  67. weighted score
  68. benchmark
  69. associations good?
  70. tagging good enough
  71. unifying text & data
  72. web resources
  73. text mining
  74. curated knowledge
  75. experimental data
  76. computational predictions
  77. common identifiers
  78. quality scores
  79. proteins
  80. STRING
  81. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  82. small molecules
  83. Kuhn et al., Nucleic Acids Research, 2012
  84. compartments
  85. compartments.jensenlab.org
  86. tissues
  87. tissues.jensenlab.org
  88. diseases
  89. environments
  90. electronic health records
  91. Jensen et al., Nature Reviews Genetics, 2012
  92. structured data
  93. Jensen et al., Nature Reviews Genetics, 2012
  94. unstructured data
  95. clinical narrative
  96. comorbidity
  97. Jensen et al., Nature Reviews Genetics, 2012
  98. Roque et al., PLoS Computational Biology, 2011
  99. in Danish
  100. by busy doctors
  101. confounding factors
  102. age and gender
  103. reporting bias
  104. temporal correlation
  105. diagnosis trajectories
  106. Jensen et al., in preparation, 2013
  107. pharmacovigilance
  108. adverse drug reactions
  109. Eriksson et al., in preparation, 2013
  110. ADR profiles
  111. Eriksson et al., in preparation, 2013
  112. ADR frequencies
  113. Eriksson et al., in preparation, 2013
  114. Acknowledgments STRING Christian von Mering Damian Szklarczyk Michael Kuhn Manuel Stark Samuel Chaffron Chris Creevey Jean Muller Tobias Doerks Philippe Julien Alexander Roth Milan Simonovic Jan Korbel Berend Snel Martijn Huynen Peer Bork Text mining Sune Frankild Evangelos Pafilis Alberto Santos Kalliopi Tsafou Janos Binder Lucia Fanini Sarah Faulwetter Christina Pavloudi Julia Schnetzer Aikaterini Vasileiadou Heiko Horn Michael Kuhn Nigel Brown Reinhard Schneider Sean O’Donoghue EHR mining Robert Eriksson Peter Bjødstrup Jensen Anders Boeck Jensen Francisco S. Roque Henriette Schmock Marlene Dalgaard Massimo Andreatta Thomas Hansen Karen Søeby Søren Bredkjær Anders Juul Tudor Oprea Pope Moseley Thomas Werge Søren Brunak
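The dictionary-based tagging approach of slides 13–27 (a synonym dictionary, expansion rules such as the human-prefix variant hCdc2 of CDC2, flexible matching that treats hyphens and spaces interchangeably, and a "black list" of ambiguous names like SDS) can be sketched roughly as follows. This is a minimal illustration, not the actual tagger; the expansion rules and the greedy longest-match strategy are assumptions for the example.

```python
import re

def expand(name):
    """Hypothetical expansion rules: generate orthographic variants of a name."""
    variants = {name}
    variants.add("h" + name)   # human-prefix variant, e.g. CDC2 -> hCDC2
    variants.add(name.lower())
    return variants

def normalize(text):
    """Flexible matching: lowercase and treat hyphens/spaces interchangeably."""
    return re.sub(r"[-\s]+", " ", text.lower())

def build_dictionary(entries, blacklist):
    """Map normalized name variants to identifiers, skipping blacklisted names."""
    lookup = {}
    for identifier, names in entries.items():
        for name in names:
            for variant in expand(name):
                key = normalize(variant)
                if key not in blacklist:
                    lookup[key] = identifier
    return lookup

def tag(sentence, lookup, max_len=4):
    """Greedy longest-match tagging over whitespace token windows."""
    tokens = sentence.split()
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = normalize(" ".join(tokens[i:i + n]))
            if key in lookup:
                hits.append((" ".join(tokens[i:i + n]), lookup[key]))
                i += n
                break
        else:
            i += 1
    return hits

# Illustrative dictionary entry; "SDS" is blacklisted as too ambiguous.
entries = {"CDK1": ["cyclin dependent kinase 1", "CDC2"]}
lookup = build_dictionary(entries, blacklist={normalize("SDS")})
print(tag("The cyclin-dependent kinase 1 gene, also known as hCDC2", lookup))
# → [('cyclin-dependent kinase 1', 'CDK1'), ('hCDC2', 'CDK1')]
```

Flexible matching lets the hyphenated spelling "cyclin-dependent kinase 1" hit the dictionary entry "cyclin dependent kinase 1", and the expansion rule recovers hCDC2 without listing it explicitly.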
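The "formal way" benchmark of slides 29–36 compares automatic tags against a manually annotated corpus and reports precision, recall, and F-score. A minimal sketch, with made-up tag sets:

```python
def benchmark(predicted, gold):
    """Compare predicted tags against a gold-standard annotated corpus."""
    tp = len(predicted & gold)                      # correctly tagged entities
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)      # harmonic mean
    return precision, recall, f_score

# Illustrative (document, entity) tag sets, not real corpus data.
gold = {("doc1", "CDK1"), ("doc1", "HAP1"), ("doc2", "CYC1")}
predicted = {("doc1", "CDK1"), ("doc2", "CYC1"), ("doc2", "SDS")}
p, r, f = benchmark(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# → precision=0.67 recall=0.67 F=0.67
```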
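The "benchmark light" of slides 53–62 works on a non-annotated corpus: precision comes from manually inspecting a random sample of hits, and since absolute recall is unknown, methods are compared by relative recall. One common way to define relative recall is each method's share of the pooled hits; that definition is an assumption here, and the hit sets are illustrative:

```python
import random

def relative_recall(hits_a, hits_b):
    """Relative recall of two methods: fraction of the pooled hits each finds."""
    pool = hits_a | hits_b                 # union of everything either method found
    return len(hits_a) / len(pool), len(hits_b) / len(pool)

hits_a = {"CDK1", "CDC2", "HAP1", "CYC1"}  # method A's tagged entities
hits_b = {"CDK1", "HAP1", "SDS"}           # method B's tagged entities
print(relative_recall(hits_a, hits_b))     # pool has 5 entities
# → (0.8, 0.6)

# Precision side of "benchmark light": draw a random sample of one
# method's hits for manual inspection instead of annotating a corpus.
sample_for_inspection = random.sample(sorted(hits_a), k=2)
```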
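The co-mentioning score of slides 63–67 weights co-occurrences by granularity: two entities mentioned in the same sentence count for more than entities sharing only a paragraph or a document. A minimal sketch; the specific weights and counts below are invented for illustration:

```python
# Hypothetical weights: tighter textual proximity earns a higher weight.
WEIGHTS = {"document": 0.5, "paragraph": 1.0, "sentence": 2.0}

def comention_score(counts):
    """counts maps granularity -> number of co-mentions at that level."""
    return sum(WEIGHTS[level] * n for level, n in counts.items())

# A pair co-mentioned in 3 documents, 2 paragraphs, and 1 sentence.
print(comention_score({"document": 3, "paragraph": 2, "sentence": 1}))
# → 5.5
```

Such scores can then be benchmarked (slides 68–70) by checking whether high-scoring pairs correspond to known associations.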
