These slides accompany the talk "Mining and Processing of Unstructured Medical Data", given at the 2016 Festival of Genomics workshop "Big Medical Data in Precision Medicine: Challenges or Opportunities?" on Jan 19, 2016, in London.
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medical Data
1. Mining and Processing of Unstructured Medical Data
Cindy Perscheid
Festival of Genomics
London, Jan 19, 2016
2. Unstructured Medical Data: Information Hidden in Text
■ Doctor's and discharge letters
■ Clinical trial descriptions
■ Scientific publications
Perscheid,
Schapranow
Processing of
Unstructured
Medical Data
Chart 2
3. Unstructured Medical Data: Challenges and Limitations
■ Huge amount of data: PubMed references more than 25 million articles
■ Restricted querying: keyword search
■ Multilingual content
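The keyword-search limitation can be made concrete with a minimal inverted-index sketch. The three-document corpus and the naive whitespace tokenizer are hypothetical, and this is not how PubMed itself is implemented; it only illustrates why exact keyword matching misses documents that phrase the same concept differently.

```python
from collections import defaultdict

# Hypothetical mini-corpus standing in for article titles (illustrative only).
DOCS = {
    1: "Mirtazapine in the treatment of major depression",
    2: "Antidepressant therapy for major depressive disorder",
    3: "Outcomes after myocardial infarction in older patients",
}

def tokenize(text):
    """Lower-case whitespace tokenization (deliberately naive)."""
    return text.lower().split()

def build_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def keyword_search(query, index):
    """Boolean AND query: ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

index = build_index(DOCS)
print(keyword_search("major depression", index))  # [1]: misses doc 2 ("major depressive disorder")
print(keyword_search("heart attack", index))      # []: misses doc 3 ("myocardial infarction")
```

The index makes lookups fast even for large collections, but it only answers the literal question asked: synonyms, inflections, and other languages need extra machinery such as dictionaries or entity recognition.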
4. Natural Language Processing: Selected Methods
■ Named Entity Recognition: Identify and classify entities such as persons or diseases
■ Part-Of-Speech Tagging: Identify the grammatical function of words
■ Parsing: Identify sentence structure and components
□ Chunking: Group words into chunks (e.g., noun phrases) based on their POS tags
□ Relation Extraction: Identify relations between sentence parts
■ Semantic Role Labeling: Identify the semantic roles of sentence parts (e.g., who did what to whom)
■ …
Example chunking output: [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP ...
[Figure: the example phrase annotated with word classes (Noun, Adjective, Preposition) and entity types (Person, Disease)]
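The chunking step can be sketched in a few lines that reproduce the slide's bracketed example. This is a minimal sketch, assuming the Penn Treebank POS tags are already given (here they are hand-assigned) and using a hypothetical tag-to-chunk mapping; practical chunkers are usually trained on annotated data such as the CoNLL-2000 chunking corpus rather than hand-written like this.

```python
from itertools import groupby

# Hand-assigned Penn Treebank POS tags for the example phrase
# (assumption: a real pipeline would obtain these from a POS tagger).
TAGGED = [
    ("Patients", "NNS"), ("65", "CD"), ("years", "NNS"),
    ("or", "CC"), ("older", "JJR"),
    ("with", "IN"), ("breast", "NN"), ("cancer", "NN"),
]

# Hypothetical, deliberately tiny mapping from POS tag to chunk type;
# tags not listed (e.g. CC) stay outside any chunk.
CHUNK_OF_TAG = {
    "NN": "NP", "NNS": "NP", "CD": "NP",
    "JJ": "ADJP", "JJR": "ADJP",
    "IN": "PP",
}

def chunk(tagged):
    """Group consecutive tokens whose tags map to the same chunk type."""
    chunks = []
    for label, group in groupby(tagged, key=lambda wt: CHUNK_OF_TAG.get(wt[1])):
        chunks.append((label, [word for word, _ in group]))
    return chunks

def render(chunks):
    """Render chunks in the bracket notation used on the slide."""
    parts = []
    for label, words in chunks:
        text = " ".join(words)
        parts.append(f"[{text}]{label}" if label else text)
    return " ".join(parts)

print(render(chunk(TAGGED)))
# [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP
```

A known weakness of this run-based grouping is that two adjacent noun phrases are merged into one chunk, which is one reason statistical chunkers are preferred in practice.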
5. Outlook: IMDB Textual Analysis Features
■ The in-memory database (IMDB) provides text analysis features, e.g.:
□ Full-text indexing
□ Entity recognition
□ Tokenization/chunking
□ Fuzzy search
■ These mechanisms can be made domain-specific by specifying:
□ Dictionaries
□ CGUL rules containing regular expressions with linguistic attributes
[Figure: Text Retrieval and Extraction; Multi-Core and Parallelization; Reduction of Layers]
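Dictionary-driven entity recognition combined with fuzzy matching can be sketched as follows. This does not use the IMDB's own API; the dictionary entries, entity types, tokenizer, and `max_edits` threshold are hypothetical, and Levenshtein edit distance is one common fuzzy-match measure, not necessarily the one the database applies.

```python
import re

# Hypothetical domain dictionary: surface form -> entity type.
DICTIONARY = {
    "breast cancer": "DISEASE",
    "major depression": "DISEASE",
    "mirtazapine": "DRUG",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def recognize(text, max_edits=1):
    """Return (position, matched text, dictionary entry, type) for fuzzy hits."""
    tokens = tokenize(text)
    found = []
    for entry, entity_type in DICTIONARY.items():
        n = len(entry.split())
        # Slide a window as long as the dictionary entry over the token list.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            if levenshtein(window, entry) <= max_edits:
                found.append((i, window, entry, entity_type))
    return sorted(found)

print(recognize("Mirtazapin is used for major depression."))
# [(0, 'mirtazapin', 'mirtazapine', 'DRUG'), (4, 'major depression', 'major depression', 'DISEASE')]
```

The misspelled drug name is still recognized because it is one edit away from the dictionary entry. A fixed edit budget is crude, though: for very short entries it can produce spurious matches, so a length-dependent threshold is a common refinement.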
6. Natural Language Processing: Applications
■ Text Summarization
■ Question Answering Systems, e.g., "What disease is mirtazapine predominantly used for?" → major depression
■ Machine Translation, e.g., Hello → Bonjour
■ Information Retrieval and Extraction
[Figure: a doctor's letter and an explanation of it]
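The question-answering example can be mimicked with a toy pipeline: a hand-written relation-extraction pattern turns sentences into (drug, disease) facts, and a question pattern looks up the answer. The two-sentence corpus and both regular expressions are hypothetical illustrations, not how any particular QA system works.

```python
import re

# Hypothetical mini-corpus; a real system would have to search millions of abstracts.
SENTENCES = [
    "Mirtazapine is mainly used to treat major depression.",
    "Tamoxifen is used to treat breast cancer.",
]

# Hand-written extraction pattern (assumption): "<drug> is [mainly] used to treat <disease>."
RELATION = re.compile(
    r"^(?P<drug>\w+) is (?:mainly |predominantly )?used (?:to treat|for) (?P<disease>[\w ]+)\.$",
    re.IGNORECASE,
)

# Hand-written question pattern (assumption) covering a single question type.
QUESTION = re.compile(
    r"^what disease is (?P<drug>\w+) (?:mainly |predominantly )?used for\?$",
    re.IGNORECASE,
)

def extract_relations(sentences):
    """Relation extraction: build a drug -> disease map from matching sentences."""
    facts = {}
    for sentence in sentences:
        m = RELATION.match(sentence)
        if m:
            facts[m["drug"].lower()] = m["disease"].lower()
    return facts

def answer(question, facts):
    """Answer a question if it fits the pattern; otherwise return None."""
    m = QUESTION.match(question.strip())
    if not m:
        return None  # question type not covered by the hand-written pattern
    return facts.get(m["drug"].lower())

facts = extract_relations(SENTENCES)
print(answer("What disease is mirtazapine predominantly used for?", facts))
# major depression
```

Any sentence or question phrased outside the two patterns simply yields no fact or `None`, which shows how brittle hand-written rules are for open natural language.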
7. Application Example: Question Answering (Still a Lot to Improve…)
■ In short: slow tools, wrong results
□ Too hard: natural language is complex
□ Too much data: more than 25 million papers in PubMed…
Credit: Dr. Mariana Neves, Hasso Plattner Institute
8. Thanks!
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
August-Bebel-Str. 88
14482 Potsdam, Germany
Dr. Matthieu-P. Schapranow
schapranow@hpi.de
http://we.analyzegenomes.com/
Cindy Perscheid, M. Sc.
cindy.perscheid@hpi.de