Medical data mining

  1. Medical data mining. Lars Juhl Jensen
  2. unstructured data
  3. structured data
  4. Jensen et al., Nature Reviews Genetics, 2012
  5. individual hospitals
  6. central registries
  7. opt-out
  8. opt-in
  9. Danish registries
  10. civil registration system
  11. CPR number
  12. established in 1968
  13. Jensen et al., Nature Reviews Genetics, 2012
  14. national discharge registry
  15. 14 years
  16. 6.2 million patients
  17. 45 million admissions
  18. 68 million records
  19. 119 million diagnoses
  20. ICD-10
  21. Jensen et al., Nature Reviews Genetics, 2012
  22. reimbursement
  23. not research
  24. diagnosis trajectories
  25. naïve approach
  26. comorbidity
  27. Jensen et al., Nature Reviews Genetics, 2012
  28. confounding factors
  29. “known knowns”
  30. gender
  31. age
  32. type of hospital encounter
  33. Jensen et al., submitted, 2013 (figure panels: Female, Male; In-patient, Out-patient, Emergency room)
  34. “known unknowns”
  35. smoking
  36. diet
  37. “unknown unknowns”
  38. reporting biases
  39. disease clustering
  40. temporal correlation
  41. Jensen et al., submitted, 2013
  42. diagnosis trajectories
  43. Jensen et al., submitted, 2013
  44. epilepsy
  45. Jensen et al., submitted, 2013
  46. gout
  47. Jensen et al., submitted, 2013
  48. electronic health records
  49. structured data
  50. Jensen et al., Nature Reviews Genetics, 2012
  51. unstructured data
  52. free text
  53. Danish
  54. busy doctors
  55. psychiatric patients
  56. delusions
  57. text mining
  58. computer
  59. as smart as a dog
  60. teach it specific tricks
  61. named entity recognition
  62. custom dictionaries
  63. diseases
  64. drugs
  65. adverse drug events
  66. expansion rules
  67. orthographic variation
  68. typos
  69. “negative modifiers”
  70. negations
  71. family members
  72. detailed disease profiles
  73. Roque et al., PLOS Computational Biology, 2011 (figure: assigned codes vs. text-mined codes)
  74. comorbidity
  75. Roque et al., PLOS Computational Biology, 2011
  76. patient stratification
  77. Roque et al., PLOS Computational Biology, 2011
  78. cluster characterization
  79. Roque et al., PLOS Computational Biology, 2011
  80. adverse drug reactions
  81. structured data
  82. medication
  83. clinical narrative
  84. possible ADRs
  85. semi-structured data
  86. SPC (Summary of Product Characteristics)
  87. drug indications
  88. known ADRs
  89. temporal correlation
  90. link drugs to ADRs
  91. complex filtering
  92. Eriksson et al., submitted, 2013
  93. new ADRs
  94. Eriksson et al., submitted, 2013
      Drug substance        ADE                  p-value
      Chlordiazepoxide      Nystagmus            4.0e-8
      Simvastatin           Personality changes  8.4e-8
      Dipyridamole          Visual impairment    4.4e-4
      Citalopram            Psychosis            8.8e-4
      Bendroflumethiazide   Apoplexy             8.5e-3
  95. ADR frequencies
  96. Eriksson et al., submitted, 2013
  97. heavily medicated
  98. Eriksson et al., submitted, 2013
  99. ADR dose dependency
  100. Eriksson et al., submitted, 2013
  101. ADR similarity
  102. Eriksson et al., submitted, 2013
  103. drug repurposing
  104. Campillos, Kuhn et al., Science, 2008
  105. Acknowledgments
       Disease trajectories: Anders Bøck Jensen, Tudor Oprea, Pope Moseley, Søren Brunak
       Adverse drug reactions: Robert Eriksson, Thomas Werge, Søren Brunak
       EHR text mining: Peter Bjødstrup Jensen, Robert Eriksson, Henriette Schmock, Francisco S. Roque, Anders Juul, Marlene Dalgaard, Massimo Andreatta, Sune Frankild, Eva Roitmann, Thomas Hansen, Karen Søeby, Søren Bredkjær, Thomas Werge, Søren Brunak
  106. Thank you!
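
The text-mining slides (named entity recognition, custom dictionaries, “negative modifiers”, negations, family members) name the ingredients but not how they fit together. The following is a minimal sketch of that idea, not the method used in the cited work. Everything in it is an assumption made for illustration: the three-entry dictionary, the English cue words (the actual notes are in Danish), the three-token context window, and the function name `tag_diseases`. Expansion rules for orthographic variation and typos are left out.

```python
import re

# Hypothetical toy dictionary: surface form -> ICD-10 code (illustrative only).
DISEASE_DICT = {
    "epilepsy": "G40",
    "gout": "M10",
    "diabetes": "E14",
}

# Hypothetical "negative modifier" cues; English stand-ins chosen for this sketch.
NEGATION_CUES = ("no", "not", "denies", "without")
FAMILY_CUES = ("mother", "father", "sister", "brother")

WINDOW = 3  # number of preceding tokens scanned for a negative modifier


def tag_diseases(text):
    """Return sorted ICD-10 codes for dictionary terms that are not
    preceded (within WINDOW tokens) by a negation or family-member cue."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for i, tok in enumerate(tokens):
        code = DISEASE_DICT.get(tok)
        if code is None:
            continue  # token is not a dictionary term
        context = tokens[max(0, i - WINDOW):i]
        if any(c in NEGATION_CUES or c in FAMILY_CUES for c in context):
            continue  # mention is negated or refers to a relative
        found.add(code)
    return sorted(found)


note = "Patient has gout. No epilepsy. Father had diabetes."
print(tag_diseases(note))  # → ['M10']
```

The fixed window ignores sentence boundaries, so a cue at the end of one sentence can suppress a mention at the start of the next; a real system would need sentence splitting and a much larger, curated dictionary.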