
Haystack 2019 - Ontology and Oncology: NLP for Precision Medicine - Sean Mullane


This session gives an overview of the importance of precision medicine in cancer treatment and describes an approach used by UVA in the TREC 2018 Precision Medicine workshop. The PM track aims to encourage research into precision oncology medicine to provide more relevant information to physicians and researchers.

For this task, we ranked articles from a corpus of biomedical article abstracts from PubMed and MEDLINE by their relevance to the treatment, prevention, and prognosis of the disease, given specific medical information about each patient.

Using a flexible graph-based query expansion method, we demonstrated that existing medical ontologies can be leveraged to improve precision in document relevance ranking with little to no other clinical input.
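
As a rough illustration of what graph-based query expansion means here, the following minimal Python sketch expands one query concept with its most related graph neighbors. It is not the UVA implementation: the toy graph, the relatedness scores, and the `expand_query` function with its `top_k` and `min_score` parameters are all assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: concept -> list of (neighbor, relatedness score).
# The neighbors and scores are invented for illustration only.
CONCEPT_GRAPH: Dict[str, List[Tuple[str, float]]] = {
    "melanoma": [
        ("melanocytic neoplasm", 0.9),
        ("skin neoplasm", 0.6),
        ("neoplasm", 0.2),
    ],
}


def expand_query(
    concept: str,
    graph: Dict[str, List[Tuple[str, float]]],
    top_k: int = 2,
    min_score: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return the original concept plus its best-scoring graph neighbors."""
    neighbors = graph.get(concept, [])
    # Rank neighbors from most to least related.
    ranked = sorted(neighbors, key=lambda pair: pair[1], reverse=True)
    # Keep at most top_k neighbors that clear the relatedness threshold.
    selected = [(name, score) for name, score in ranked if score >= min_score][:top_k]
    # The original concept always comes first with full weight.
    return [(concept, 1.0)] + selected


expanded = expand_query("melanoma", CONCEPT_GRAPH)
print(expanded)
```

The scores attached to each expanded concept could later serve as boosts in the search query, so that weakly related neighbors influence ranking less than the original concept.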



  1. Ontology & Oncology: NLP for Precision Medicine. Sean Mullane, Data Scientist, Quality & Performance Improvement
  2. Why Precision Medicine? And Why Oncology? The Challenge
     • A single patient may have tumors with a variety of mutations and phenotypes.
     • A single treatment may work against one but not against others.
     • Incomplete treatment buys a patient little to no additional time.
  3. Why Precision Medicine? And Why Oncology? The Promise
     • Allow treatments to be tailored to unique genetic changes in a person’s cancer.
     • Target specific molecules that are involved in the growth, progression, and spread of cancer.
     • Can be more effective against cancer cells while being less toxic to the patient.
  4. Milestones in Cancer Understanding: 1937, 1960, 1976, 1979, 1994, 1998, 2001, 2010, 2011, 2014. Source: https://www.cancer.gov/research/progress/250-years-milestones
  5. Fighting Information Overload: The Needle in a Haystack Problem
     • Scientific publications are increasing at an exponential rate.
     • It is increasingly difficult for physicians to keep up.
     • How can we help?
     Figure label: PubMed: “Cancer”
  6. TREC Scientific Abstracts Query Task
     Challenge: Return the most relevant publications for treatment of a unique patient.
     Query topics are synthetic patient cases created by MD Anderson precision oncologists.
  7. Want a Text Model for Query Relevance
     • Vector space-based scoring model
     • Term frequency/inverse document frequency weighting
     • Field length normalization
     • Coordination factors
     • Term/query clause boosting
  8. Simple Solution: Use Elasticsearch
     • JSON-based REST API
     • Lucene query language
     • Open source
  9. Problem Solved? No. How can we improve on this?
     A tremendous amount of knowledge exists in medical science. How do we leverage that collected knowledge to improve query relevance results?
     Solution: Text term embedding and graph-based query expansion.
  10. Medical Ontologies: Unified Medical Language System®
     • A set of ontologies sourced from the National Library of Medicine.
     • The UMLS® Metathesaurus and Semantic Network encode sets of concepts and the relationships between them.
     Major groupings of semantic types:
     • organism
     • anatomical structure
     • biologic function
     • chemical
     • physical object
     • idea or concept
     Major categories of non-hierarchical relationships:
     • physically related to
     • spatially related to
     • temporally related to
     • functionally related to
     • conceptually related to
  11. How to use ontologies effectively? Text Term Embedding
     Apache cTAKES annotates documents with UMLS concepts. It maps words and phrases in the text to one or more discrete, identified concepts (CUIs) in the Metathesaurus.
  12. How to use ontologies effectively? Text Term Embedding
     cTAKES annotates:
     • PubMed article abstracts with UMLS concepts.
     • Queries for diagnosis, gene, demographics.
     Article Embedding: …, C0278883, C0030012, C0025202, C1523298, C0599156, …
     Query Terms: { "diagnosis": "melanoma", "diagnosisCUI": "C0025202", "gene": "BRAF (V600E)", "geneCUI": "C3250916", "demographics": "64-year-old male", "demoCUI": "C0001675" }
  13. Benefit of Concept Embedding
     The terms “malignant neoplasm”, “cancer”, and “malignancy” all map to the same concept, C0006826.
     Embedding reduces dimensionality of the vector space and increases probability of exact article <-> query matches.
  14. Performance Gain from Concept Embedding
  15. Query Expansion
     Definition: Rewriting a given query to improve document retrieval performance.
     Example: “Melanoma” → {Melanoma, Melanocytic Neoplasm, MIRLET7A3}
     Would like to increase both precision and recall.
     • Concept embedding improved precision.
     • Can we improve recall as well?
  16. Introduce Relatedness Graph solution. Use visuals to explain relatedness graph.
  17. Introduce Relatedness Graph solution. Use visuals to explain relatedness graph.
  18. An Exclusivity-Based Relatedness Metric
     Hulpuş, Prangnawarat, and Hayes. “Path-Based Semantic Relatedness on Linked Data and Its Use to Word and Entity Disambiguation,” 2015.
  19. Introduce Relatedness Graph solution. Use visuals to explain relatedness graph.
  20. Full System
     Overall best median performance came from queries expanded with exact concepts combined with boosted, expanded concepts based on graph neighbors selected by relatedness.
  21. Conclusion
     Precision medicine can only succeed with careful targeting of treatments to patients.
     The existing medical knowledge base is difficult to apply, but valuable.
     Concept embedding and concept graph traversal improve precision and recall of relevant literature for individual patients.
  22. Q&A
     https://github.com/seanmullane/TREC_2018_UVA
     sean.mullane@virginia.edu
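
The combination described on the Full System slide (exact concepts plus boosted, expanded concepts) might be expressed as an Elasticsearch bool query along the following lines. This is a sketch under stated assumptions, not the code from the linked repository: the `concepts` field name, the `SURFACE_TO_CUI` table, the `to_cui` and `build_query` helpers, the placeholder neighbor identifier, and the choice to put every clause under `should` are all hypothetical. Only the two CUIs are taken from the slides.

```python
import json
from typing import Dict, List, Optional, Tuple

# Surface forms mapped to UMLS concept identifiers (CUIs).
# The two CUIs come from the slides; the lookup table itself is a
# stand-in for what an annotator such as cTAKES would produce.
SURFACE_TO_CUI: Dict[str, str] = {
    "malignant neoplasm": "C0006826",
    "cancer": "C0006826",
    "malignancy": "C0006826",
    "melanoma": "C0025202",
}


def to_cui(term: str) -> Optional[str]:
    """Normalize a surface term to its CUI, or None if it is unknown."""
    return SURFACE_TO_CUI.get(term.strip().lower())


def build_query(
    exact_cuis: List[str],
    expanded: List[Tuple[str, float]],
    field: str = "concepts",  # hypothetical index field holding article CUIs
) -> dict:
    """Build a bool query: exact concepts at boost 1.0, neighbors at lower boosts."""
    should = [
        {"match": {field: {"query": cui, "boost": 1.0}}} for cui in exact_cuis
    ]
    should += [
        {"match": {field: {"query": cui, "boost": boost}}}
        for cui, boost in expanded
    ]
    return {"query": {"bool": {"should": should}}}


# "HYPOTHETICAL_NEIGHBOR_CUI" is a placeholder, not a real UMLS identifier.
body = build_query([to_cui("melanoma")], [("HYPOTHETICAL_NEIGHBOR_CUI", 0.5)])
print(json.dumps(body, indent=2))
```

Because different surface forms collapse to one CUI before the query is built, an article that says "malignancy" and a query that says "cancer" can match on the same token. The boost on each expanded concept keeps graph neighbors from outweighing the exact concept.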

Editor's Notes

  • From https://www.cancer.gov/about-cancer/treatment/types/targeted-therapies/targeted-therapies-fact-sheet
    Try to get rid of most of this text, maybe use mindmap-like image of factors that affect treatment
  • From https://www.cancer.gov/about-cancer/treatment/types/targeted-therapies/targeted-therapies-fact-sheet
    Try to get rid of most of this text, maybe use image of molecule and receptor
  • Use this slide to describe the evolution in thought on cancer treatment that has led to precision medicine being important in this field
  • Data from http://dan.corlan.net/medline-trend.html
    THIS IS THE SEGUE SLIDE FROM PM  TREC
    Maybe slightly summarize the words here, otherwise good
  • Move bottom to top and top to bottom
  • Emphasize that we wanted to do the things the last point describes, then that elasticsearch was used because it gives us that. Maybe split into two and replace most talk about elasticsearch with the logo
  • Maybe condense the text slightly, but pretty good slide
  • Source: http://wayback.archive-it.org/org-350/20180312141727/https://www.nlm.nih.gov/pubs/factsheets/umlssemn.html
    Maybe highlight and rephrase points at the top
  • Condense text here
  • Rephrase left side points 1,2: maybe use arrows between words to emphasize mapping
    Maybe move 3rd bullet to new slide or next slide

    Embedding reduces dimensionality of the vector space and increases probability of exact article<->query matches.
  • Text + concept embedding confers significantly higher accuracy than raw text or concept embedding alone
