Patrick Jamieson M.D.  Logical Semantics, Inc.  Knowledge Discovery and Data Mining of Free Text Radiology Reports
Acknowledgements This presentation was made possible by Grant Number 9R44RR024929-02 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCRR or NIH.
Faculty Collaborators
Josette Jones, Ph.D., Assistant Professor of Informatics – ontology development and organization
Malika Mahoui, Ph.D., Assistant Professor of Informatics – pattern matching and machine learning
 Why do we need a semantic index? What should it look like?
 Demo of MEDAT
 Questions, and hopefully some answers
Text mining: attempting to find information in text which, once located, answers a user’s question
 Correlate laboratory, genomic, or structured medical data with free text medical reports
 Extend decision support over text
 Document summarization
 Support for quality improvement
  Coding: A-Life
  Document Retrieval: MuchMore, iSMART
Canonical Form? “adenocarcinoma of colon, metastatic to liver”
 Colonic adenocarcinoma, metastatic to liver
 Large bowel adenocarcinoma, metastatic to liver
 Large intestine adenocarcinoma, metastatic to liver
 Large intestinal adenocarcinoma, metastatic to liver
 Colon’s adenocarcinoma, metastatic to liver
 Adenocarcinoma of colon, with metastasis to liver
 Adenocarcinoma of colon, with liver metastasis
 Adenocarcinoma of colon, with hepatic metastasis
Uniqueness of Medical Data Mining, Krzysztof J. Cios and G. William Moore, Artificial Intelligence in Medicine, 2002
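As a minimal sketch of what mapping these variants onto one canonical form could look like, the Python below matches only the eight phrasings listed above. The variant tables and the `canonicalize` function are illustrative assumptions, not part of any system named in this presentation; a real index would draw its synonyms from a curated vocabulary rather than a hand-written list.

```python
import re
from typing import Optional

# Hypothetical variant tables, built only from the phrasings listed on the slide.
SITE_VARIANTS = [
    r"colonic adenocarcinoma",
    r"large bowel adenocarcinoma",
    r"large intestine adenocarcinoma",
    r"large intestinal adenocarcinoma",
    r"colon[’']s adenocarcinoma",
    r"adenocarcinoma of colon",
]
SPREAD_VARIANTS = [
    r"metastatic to liver",
    r"with metastasis to liver",
    r"with liver metastasis",
    r"with hepatic metastasis",
]

CANONICAL = "adenocarcinoma of colon, metastatic to liver"

# One site variant, a comma, then one spread variant.
PATTERN = re.compile(
    r"^(?:%s)\s*,\s*(?:%s)$" % ("|".join(SITE_VARIANTS), "|".join(SPREAD_VARIANTS))
)


def canonicalize(phrase: str) -> Optional[str]:
    """Return the canonical form if the phrase is a known variant, else None."""
    text = " ".join(phrase.lower().split())  # lowercase and collapse whitespace
    return CANONICAL if PATTERN.match(text) else None
```

A lookup like this covers only phrasings someone has already enumerated, which is the scaling problem the slide points at.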
Semantic Search and Search Engines
 Two semantically equivalent queries will usually get different results!
   Top 5 tv moments
   Top five tv moments
 Tomasz Imielinski, Alessio Signorini, Jinyun Yan, Rutgers
Semantic Queries using MedLEE  “Human errors can be avoided if the person who formulates the query can accurately translate the eligibility criteria into corresponding semantic classes and attributes, and thoroughly considers all possible combinations of semantic classes and attributes.” Li L, Chase HS, Patel CO, Friedman C, Weng C. Comparing ICD9-Encoded Diagnoses and NLP-Processed Discharge Summaries for Clinical Trials Pre-Screening: A Case Study. AMIA Annu Symp Proc. 2008; 2008: 404–408.
MedLEE Query
 exclusion: atrialfibrillation|atrialflutter|hemorrhage|hemorrhagicstroke|cerebralhemorrhage|TIA|transient ischemic Attack|Afb|afb|transient ischemic
   +sectname:report diagnosis item|report history of present illness item
   -certainty:very low certainty|lowcertainty|minimalcriteria|negative|no|rule out
   -status:resolved|removed|removal|end|healed|inactive|ruleout|familyhistory|atrisk|high risk
 #dv_stent: stent
 #-certainty:very low certainty|lowcertainty|minimalcriteria|negative|no|rule out
 #-sectname: report family history item|report social history item
 #-status:resolved|removed|removal|end|healed|inactive|ruleout|familyhistory|atrisk|high risk
 UMLS ontology
 Searches both coded and non coded content
 A researcher wishing to identify ‘invasive lung carcinoma’ in a pathology report would need to include in the query all synonyms of ‘invasive’, such as infiltrating, infiltrative, encroaching, aggressive, etc.
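The burden this places on the researcher can be sketched as manual query expansion. The snippet below is only an illustration: the `SYNONYMS` table holds the four synonyms named above, and `expand_query` is a hypothetical helper, not the interface of any tool mentioned in this presentation.

```python
# Illustrative synonym table; the entries are the ones listed on the slide.
SYNONYMS = {
    "invasive": ["infiltrating", "infiltrative", "encroaching", "aggressive"],
}


def expand_query(query: str) -> list[str]:
    """Return the query plus one variant per synonym of each known term."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for term, alternatives in SYNONYMS.items():
        if term in tokens:  # whole-word match only
            for alt in alternatives:
                variants.append(
                    " ".join(alt if tok == term else tok for tok in tokens)
                )
    return variants
```

For `"invasive lung carcinoma"` this yields five query strings, and the list is still open-ended because of the "etc." in the original statement.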
Example: Find All Reports “Lungs are normal”
 Visualized lungs are normal?
 There are no pulmonary abnormalities?
 The right lung is normal?
 The left upper lobe is normal?
Most NLP systems simply cannot semantically index all the information accurately.
Knowledge and Language Understanding
Example: “Healing both bone fracture of the distal forearm.”
Meaning: There is a healing distal radius fracture. There is a healing distal ulnar fracture.
Language Understanding
Another Example: “Global atrophy without acute abnormality”
Meaning: There is diffuse cerebral atrophy. There is no acute intracranial abnormality.
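One way to make the two meanings of the “global atrophy” example explicit is to write them as first-order logical assertions. The rendering below is a sketch only: the predicate names (Atrophy, Diffuse, LocatedIn, and so on) are assumptions chosen for illustration and do not come from the presentation.

```latex
\[
\exists x\,\bigl(\mathrm{Atrophy}(x) \land \mathrm{Diffuse}(x)
  \land \mathrm{LocatedIn}(x, \mathrm{cerebrum})\bigr)
\]
\[
\neg\exists y\,\bigl(\mathrm{Abnormality}(y) \land \mathrm{Acute}(y)
  \land \mathrm{LocatedIn}(y, \mathrm{intracranial\ space})\bigr)
\]
```

The first assertion supplies the anatomy that “global” leaves implicit, and the second carries the negation that “without” expresses in the report text.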

Knowledge Discovery And Data Mining Of Free Text Final

  • 43. A widely used method for capturing logical assertions is first-order predicate calculus (FOPC).
  • 45. Natural Language Engineering: “The last two decades have been marked by a complete paradigm shift in computational linguistics. Frustrated by the inability of applications based on explicit linguistic knowledge to scale up to real-world needs, and, perhaps more deeply, frustrated with the dominating theories in formal linguistics, we looked instead to corpora that reflect language use as our sources of (implicit) knowledge.” Shuly Wintner, University of Haifa
  • 46. Semantic Resources Underdeveloped “Natural language systems generally need a large number of training examples to train, refine, or test the system.” Friedman, Carol. Semantic text parsing for patient records in medical informatics. Knowledge Management and Data Mining In Biomedicine. Editors Chen H, Fuller S, Friedman C, Hersh W. Springer 2005, Pg 431.
  • 47. CLEF aims to develop a high quality, secure and interoperable information repository, derived from operational electronic patient records, to enable access to patient information in support of clinical care and biomedical research. The CLEF gold standard corpus contains 167 clinical documents, chosen from the 565K CLEF corpus.
  • 48. Clinical E-Science (CLEF) Annotation Framework. “There is a left lower lobe pulmonary infiltrate”: Arg1: (Condition) “pulmonary infiltrate”, Arg2: (Locus) “left lower lobe”. “A standard anteroposterior radiograph shows a tibial shaft fracture.”: Has_finding, Arg1: (Investigation) “anteroposterior radiograph”, Arg2: (Condition) “tibial shaft fracture”.
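The two annotations on this slide can be pictured as typed predicate-argument records. The sketch below is an assumption-laden illustration, not the CLEF data model: the class names, the `SCHEMA` table, and in particular the relation name `has_location` are hypothetical, since the slide does not name the relation in its first example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    entity_type: str  # e.g. "Condition", "Locus", "Investigation"
    text: str         # span of report text filling the slot


@dataclass(frozen=True)
class Relation:
    name: str
    arg1: Argument
    arg2: Argument


# Hypothetical schema: expected argument types per relation.
SCHEMA = {
    "has_location": ("Condition", "Locus"),  # assumed name, not on the slide
    "has_finding": ("Investigation", "Condition"),
}


def is_well_typed(rel: Relation) -> bool:
    """Check that a relation is known and its arguments have the expected types."""
    return SCHEMA.get(rel.name) == (rel.arg1.entity_type, rel.arg2.entity_type)


infiltrate = Relation(
    "has_location",
    Argument("Condition", "pulmonary infiltrate"),
    Argument("Locus", "left lower lobe"),
)
fracture = Relation(
    "has_finding",
    Argument("Investigation", "anteroposterior radiograph"),
    Argument("Condition", "tibial shaft fracture"),
)
```

Writing the schema down explicitly is what lets a tool check whether the number and type of arguments are defined for each predicate.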
  • 50. Are the number and type of arguments defined for these predicates?
  • 51. Do the concepts, which fill the argument slots, adequately cover the domain?
  • 54. Each sentence was reviewed by three physicians for correctness in annotation.
  • 56. Recall - Information Retrieval Only
  • 61. Phrasal Synonymy Phrases such as “pelvic calcifications consistent with phleboliths” and “several pelvic phleboliths are present” are semantically equivalent, but produce different PASs using MetaMap
  • 62. Concept Representation For example, “gray/white matter differentiation” is defined as the difference in appearance of parts of the brain on a CT scan or MRI (semantic rank 151), but is not in the UMLS or SNOMED CT
  • 64. 15,306 propositions capture the meaning of 2,561,330 sentences.
  • 65. Roughly 60% of the corpus is annotated.
  • 66. Process for Creating Semantic Index
  • 71. Integration of statistical and rule-based semantic indexing
  • 72. Enhancements to KDD user interface