Pragmatic text mining
From literature to electronic health records

Lars Juhl Jensen
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary-based approach
identification required
dictionary
cyclin dependent kinase 1
CDC2
expansion rules
CDC2
hCdc2
flexible matching
hyphens and spaces
“black list”
SDS
efficient tagger
Pafilis et al., PLOS ONE, 2013
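The dictionary-based steps above (dictionary, expansion rules, flexible matching of hyphens and spaces, black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger of Pafilis et al.: the identifier `GENE:0001`, the "h" prefix rule, the presence of SDS as a dictionary synonym, and all function names are assumptions made for the example.

```python
import re

# Hypothetical toy dictionary: name -> identifier (identifier is made up).
DICTIONARY = {
    "cyclin dependent kinase 1": "GENE:0001",
    "CDC2": "GENE:0001",
    "SDS": "GENE:0002",  # assumed ambiguous synonym, blocked below
}
BLACKLIST = {"SDS"}  # names that are too ambiguous to tag


def expand(name):
    """Assumed expansion rule: single-word symbols also match with an 'h' prefix."""
    variants = {name}
    if " " not in name:
        variants.add("h" + name)
    return variants


def normalize(text):
    """Flexible matching: ignore case, hyphens and spaces."""
    return re.sub(r"[-\s]+", "", text.lower())


def build_index(dictionary, blacklist):
    """Map every normalized name variant to its identifier, skipping blocked names."""
    blocked = {normalize(name) for name in blacklist}
    index = {}
    for name, identifier in dictionary.items():
        if normalize(name) in blocked:
            continue
        for variant in expand(name):
            index[normalize(variant)] = identifier
    return index


def tag(text, index, max_words=4):
    """Greedy left-to-right tagging, preferring the longest match at each position."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        for j in range(min(len(tokens), i + max_words), i, -1):
            start, end = tokens[i][0], tokens[j - 1][1]
            identifier = index.get(normalize(text[start:end]))
            if identifier is not None:
                matches.append((start, end, text[start:end], identifier))
                i = j
                break
        else:
            i += 1
    return matches


if __name__ == "__main__":
    index = build_index(DICTIONARY, BLACKLIST)
    for match in tag("hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel.", index):
        print(match)
```

Normalizing both the dictionary and the text span is one simple way to get hyphen- and space-insensitive matching; a production tagger would need a far more efficient lookup than this window scan.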
the formal way
benchmark
manually annotated corpus
automatic tagging
precision
recall
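For the formal benchmark, precision and recall follow directly from comparing the automatic tags with the manual annotations. A minimal sketch, assuming each tag is represented as a hashable tuple such as (document, start, end, identifier); that representation and the function name are illustrative choices, not part of the talk.

```python
def precision_recall(predicted, gold):
    """Compare automatic tags against a manually annotated corpus.

    precision = correct tags / all automatic tags
    recall    = correct tags / all manual annotations
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = [("doc1", 0, 5, "A"), ("doc1", 10, 14, "B"),
            ("doc2", 3, 7, "A"), ("doc2", 20, 25, "C")]
    predicted = [("doc1", 0, 5, "A"), ("doc2", 3, 7, "A"), ("doc2", 30, 33, "D")]
    print(precision_recall(predicted, gold))
```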
natural language processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of
[nxgene th...
hard work
the pragmatic way
“benchmark light”
requires fewer calories
non-annotated corpus
automatic tagging
random inspection
precision
no recall
relative recall
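The "benchmark light" idea can be sketched as below: draw a random sample of automatic tags, judge them by hand to estimate precision, and compare the estimated number of correct tags against a reference run instead of a gold standard. The slides do not define relative recall, so the formula here is one possible reading, and all function names are hypothetical.

```python
import random


def sample_for_inspection(tags, n, seed=0):
    """Draw a reproducible random sample of automatic tags for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(list(tags), min(n, len(tags)))


def estimated_precision(judgements):
    """Fraction of inspected tags judged correct (judgements is a list of booleans)."""
    return sum(judgements) / len(judgements)


def relative_recall(n_tags, precision, n_tags_reference, precision_reference):
    """Estimated correct tags of one run relative to a reference run (assumed definition)."""
    return (n_tags * precision) / (n_tags_reference * precision_reference)


if __name__ == "__main__":
    tags = [f"tag{i}" for i in range(1000)]
    sample = sample_for_inspection(tags, 50)
    # In practice the judgements come from a person reading each sampled tag in context.
    judgements = [True] * 45 + [False] * 5
    print(len(sample), estimated_precision(judgements))
    print(relative_recall(800, 0.9, 1000, 0.8))
```

Without annotated negatives there is no absolute recall, which is why the comparison is made against another run of the tagger rather than against a corpus.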
co-mentioning
within documents
within paragraphs
within sentences
weighted score
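A weighted co-mentioning score might look like the sketch below, where a pair of entities earns more when it is co-mentioned at a finer level. The weights, the corpus layout (documents as lists of paragraphs, paragraphs as lists of sentences, sentences as sets of entity identifiers), and the additive scheme are all assumptions for illustration; the slides do not give the actual scoring function.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights; the talk does not state the values used.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered entity pairs, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(corpus, weights=WEIGHTS):
    """Sum, over documents, the weights of the levels at which each pair co-occurs."""
    scores = defaultdict(float)
    for document in corpus:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
            if pair in paragraph_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += weights["sentence"]
    return dict(scores)


if __name__ == "__main__":
    corpus = [[[{"A", "B"}, {"C"}], [{"D"}]]]
    print(comention_scores(corpus))
```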
unifying text & data
web resources
text mining
curated knowledge
Letunic & Bork, Trends in Biochemical Sciences, 2008
experimental data
von Mering et al., Nucleic Acids Research, 2005
computational predictions
common identifiers
quality scores
proteins
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
small molecules
Kuhn et al., Nucleic Acids Research, 2012
compartments
compartments.jensenlab.org
tissues
tissues.jensenlab.org
diseases
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
Danish
busy doctors
psychiatric patients
pharmacovigilance
structured data
medication
text mining
drug indications
adverse drug events
temporal correlation
complex filtering
Eriksson et al., submitted, 2013
Drug substance ADE
Chlordiazepoxide Nystagmus
Simvastatin Personality changes
Dipyridamole Visual impairment
Citalopram
Ps...
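The combination of structured medication data with text-mined mentions and a temporal filter can be illustrated as follows. This is only a sketch under stated assumptions: the 30-day window, the rule of discarding terms already mentioned before the drug was started (as likely indications or pre-existing conditions), and the record layouts are hypothetical and are not taken from Eriksson et al.

```python
from datetime import date, timedelta


def candidate_ades(prescriptions, mentions, window_days=30):
    """Pair each prescription with terms first mentioned shortly after the drug start.

    prescriptions: iterable of (patient, drug, start_date) from structured data
    mentions:      iterable of (patient, term, mention_date) from text-mined notes
    """
    mentions = list(mentions)
    candidates = []
    for patient, drug, start in prescriptions:
        patient_mentions = [(t, d) for p, t, d in mentions if p == patient]
        # Terms seen before treatment are treated as indications or prior conditions.
        seen_before = {t for t, d in patient_mentions if d < start}
        window_end = start + timedelta(days=window_days)
        for term, day in patient_mentions:
            if start <= day <= window_end and term not in seen_before:
                candidates.append((patient, drug, term, day))
    return candidates


if __name__ == "__main__":
    prescriptions = [("p1", "drugX", date(2012, 1, 10))]
    mentions = [
        ("p1", "nausea", date(2012, 1, 5)),
        ("p1", "nausea", date(2012, 1, 15)),
        ("p1", "dizziness", date(2012, 1, 20)),
        ("p1", "rash", date(2012, 3, 20)),
    ]
    print(candidate_ades(prescriptions, mentions))
```

Candidates produced this way would still need statistical testing across patients before being reported as drug and ADE associations.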
Acknowledgments
Protein networks

Localization and disease

Christian von Mering
Damian Szklarczyk
Michael Kuhn
Manuel Sta...
Published in: Health & Medicine, Technology
Pragmatic text mining: From literature to electronic health records

  1. 1. Pragmatic text mining: From literature to electronic health records. Lars Juhl Jensen
  2. 2. why text mining?
  3. 3. data mining
  4. 4. guilt by association
  5. 5. structured data
  6. 6. unstructured text
  7. 7. biomedical literature
  8. 8. >10 km
  9. 9. too much to read
  10. 10. computer
  11. 11. as smart as a dog
  12. 12. teach it specific tricks
  13. 13. named entity recognition
  14. 14. dictionary-based approach
  15. 15. identification required
  16. 16. dictionary
  17. 17. cyclin dependent kinase 1
  18. 18. CDC2
  19. 19. expansion rules
  20. 20. CDC2
  21. 21. hCdc2
  22. 22. flexible matching
  23. 23. hyphens and spaces
  24. 24. “black list”
  25. 25. SDS
  26. 26. efficient tagger
  27. 27. Pafilis et al., PLOS ONE, 2013
  28. 28. the formal way
  29. 29. benchmark
  30. 30. manually annotated corpus
  31. 31. automatic tagging
  32. 32. precision
  33. 33. recall
  34. 34. natural language processing
  35. 35. Gene and protein names; cue words for entity recognition; verbs for relation extraction. Example: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
  36. 36. hard work
  37. 37. the pragmatic way
  38. 38. “benchmark light”
  39. 39. requires fewer calories
  40. 40. non-annotated corpus
  41. 41. automatic tagging
  42. 42. random inspection
  43. 43. precision
  44. 44. no recall
  45. 45. relative recall
  46. 46. co-mentioning
  47. 47. within documents
  48. 48. within paragraphs
  49. 49. within sentences
  50. 50. weighted score
  51. 51. unifying text & data
  52. 52. web resources
  53. 53. text mining
  54. 54. curated knowledge
  55. 55. Letunic & Bork, Trends in Biochemical Sciences, 2008
  56. 56. experimental data
  57. 57. von Mering et al., Nucleic Acids Research, 2005
  58. 58. computational predictions
  59. 59. common identifiers
  60. 60. quality scores
  61. 61. proteins
  62. 62. Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
  63. 63. small molecules
  64. 64. Kuhn et al., Nucleic Acids Research, 2012
  65. 65. compartments
  66. 66. compartments.jensenlab.org
  67. 67. tissues
  68. 68. tissues.jensenlab.org
  69. 69. diseases
  70. 70. electronic health records
  71. 71. Jensen et al., Nature Reviews Genetics, 2012
  72. 72. structured data
  73. 73. Jensen et al., Nature Reviews Genetics, 2012
  74. 74. unstructured data
  75. 75. clinical narrative
  76. 76. Danish
  77. 77. busy doctors
  78. 78. psychiatric patients
  79. 79. pharmacovigilance
  80. 80. structured data
  81. 81. medication
  82. 82. text mining
  83. 83. drug indications
  84. 84. adverse drug events
  85. 85. temporal correlation
  86. 86. complex filtering
  87. 87. Eriksson et al., submitted, 2013
  88. 88. Drug substance, ADE, p-value: Chlordiazepoxide, Nystagmus, 4.0e-8; Simvastatin, Personality changes, 8.4e-8; Dipyridamole, Visual impairment, 4.4e-4; Citalopram, Psychosis, 8.8e-4; Bendroflumethiazide, Apoplexy, 8.5e-3 (Eriksson et al., submitted, 2013)
  89. 89. Acknowledgments. Protein networks: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Jean Muller, Tobias Doerks, Alexander Roth, Milan Simonovic, Berend Snel, Martijn Huynen, Peer Bork. Localization and disease: Sune Frankild, Alberto Santos, Kalliopi Tsafou, Janos Binder, Reinhard Schneider, Sean O’Donoghue. Electronic health records: Robert Eriksson, Thomas Werge, Søren Brunak
