Mining and Processing of Unstructured Medical Data
Cindy Perscheid
Festival of Genomics
London, Jan 19, 2016
Unstructured Medical Data
Information Hidden in Text
■  Doctors' letters and discharge letters
■  Clinical trial descriptions
■  Scientific publications
Perscheid, Schapranow: Processing of Unstructured Medical Data
Chart 2
Unstructured Medical Data
Challenges and Limitations
■  Huge amount of data: PubMed alone references more than 25 million articles
■  Restricted querying: Keyword search
■  Multilingual
Chart 3
Natural Language Processing
Selected Methods
Example: [Patients 65 years]NP or [older]ADJP [with]PP [breast cancer]NP ...
■  Named Entity Recognition: Identify keywords
■  Part-Of-Speech Tagging: Identify grammatical function of words
■  Parsing: Identify sentence structure and components
□  Chunking: Combine words and POS tags to chunks
□  Relation Extraction: Identify relations between sentence parts
■  Semantic Role Labeling: Identify specific roles in sentence
■  …
[Figure: the example sentence annotated with the labels Noun, Adjective, Preposition, Person, and Disease]
Chart 4
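The tagging and chunking steps listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used by the authors: the word list `LEXICON`, the tag-to-chunk mapping `CHUNK_OF`, and the function names are all hypothetical, and a real system would use a trained tagger rather than a hand-written lexicon.

```python
# Hypothetical toy lexicon: lower-cased word -> coarse part-of-speech tag.
LEXICON = {
    "patients": "NOUN", "years": "NOUN", "breast": "NOUN", "cancer": "NOUN",
    "65": "NUM", "older": "ADJ", "with": "PREP", "or": "CONJ",
}

# Assumed mapping from POS tag to the chunk type it opens or extends.
CHUNK_OF = {"NOUN": "NP", "NUM": "NP", "ADJ": "ADJP", "PREP": "PP"}


def pos_tag(tokens):
    """Part-of-speech tagging: look each token up in the toy lexicon."""
    return [(tok, LEXICON.get(tok.lower(), "X")) for tok in tokens]


def chunk(tagged):
    """Chunking: merge adjacent tokens whose tags map to the same chunk type."""
    chunks = []
    for tok, tag in tagged:
        label = CHUNK_OF.get(tag)  # None for words outside any chunk
        if chunks and label is not None and chunks[-1][0] == label:
            chunks[-1][1].append(tok)
        else:
            chunks.append((label, [tok]))
    return chunks


def render(chunks):
    """Format chunks in the bracket notation used on the slide."""
    parts = []
    for label, toks in chunks:
        text = " ".join(toks)
        parts.append(f"[{text}]{label}" if label else text)
    return " ".join(parts)


tokens = "Patients 65 years or older with breast cancer".split()
print(render(chunk(pos_tag(tokens))))
```

With this lexicon, the printed string reproduces the bracketed example from the slide. Any word missing from the lexicon is tagged `X` and left outside the chunks.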
Outlook
IMDB Textual Analysis Features
■  The in-memory database (IMDB) provides text analysis features, e.g.
□  Full-text indexing
□  Entity recognition
□  Tokenization/chunking
□  Fuzzy search
■  Mechanisms can be made domain-specific by specifying
□  Dictionaries
□  CGUL rules containing regular expressions with linguistic attributes
[Figure: icons labeled "Text Retrieval and Extraction", "Multi-Core and Parallelization", and "Reduction of Layers"]
Chart 5
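To make the idea of dictionary-driven, fuzzy entity recognition concrete, here is a small Python sketch. It does not use the database's own text analysis interface or CGUL syntax; it substitutes the standard-library `difflib` module for the fuzzy search, and the dictionary contents, the `recognize` function, and the `cutoff` value are assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical domain dictionary: term -> entity type.
DICTIONARY = {
    "mirtazapine": "DRUG",
    "depression": "DISEASE",
    "cancer": "DISEASE",
}


def tokenize(text):
    """Tokenization: split text into lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def recognize(text, cutoff=0.85):
    """Return (token, dictionary term, entity type) for each fuzzy dictionary hit.

    The cutoff is an assumed similarity threshold; lower values tolerate
    more spelling variation but produce more false matches.
    """
    hits = []
    terms = list(DICTIONARY)
    for tok in tokenize(text):
        match = difflib.get_close_matches(tok, terms, n=1, cutoff=cutoff)
        if match:
            hits.append((tok, match[0], DICTIONARY[match[0]]))
    return hits


# Misspelled terms are still mapped to their dictionary entries.
print(recognize("Patient takes mirtazapin for major depresion."))
```

Swapping in a different dictionary is what makes such a mechanism domain-specific; the matching logic itself stays unchanged.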
Natural Language Processing
Applications
■  Text Summarization
■  Question Answering Systems
■  Machine Translation
■  Information Retrieval and Extraction
[Figure: illustrations labeled "Hello" / "Bonjour", "Doctor's Letter" / "Explanation", and the question "What disease is mirtazapine predominantly used for?" alongside the text "major depression"]
Chart 6
Application Example: Question Answering
Still a Lot to Improve…
■  In short: Slow tools, wrong results
□  Too hard: Natural language is complex
□  Too much data: more than 25 million papers in PubMed…
Credit: Dr. Mariana Neves, Hasso Plattner Institute
Chart 7
Thanks!
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
August-Bebel-Str. 88
14482 Potsdam, Germany
Dr. Matthieu-P. Schapranow
schapranow@hpi.de
http://we.analyzegenomes.com/
Cindy Perscheid, M. Sc.
cindy.perscheid@hpi.de
Chart 8
