Apache cTAKES
NLP in Healthcare
Alex Zbarcea (FannieMae / cTAKES committer)
Episode of Care
[Diagram labels: medications; imaging; pathology; inpatient services and procedures; outpatient services and procedures; research; EMR; notes]
Natural Language Processing (NLP)
“A way for computers to analyze, understand and derive meaning from human language” - Algorithmia [1]
[1] - https://blog.algorithmia.com/introduction-natural-language-processing-nlp/
● Feasibility
Big Data / Machine Learning / Apache Projects
● Challenges
Ontology / Specialization / Anonymization
How it works
● Approaches:
Extraction
Generation
● Algorithms:
Rule-based
Machine Learning
● Linguistic annotations (corpus):
Penn Treebank [1]
GENIA [2]
[1] - https://www.clips.uantwerpen.be/pages/mbsp-tags
[2] - https://orbit.nlm.nih.gov/browse-repository/dataset/human-annotated/83-genia-corpus
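A minimal sketch of the rule-based extraction approach named on this slide. The regular expression, the unit list, and the sample note are illustrative assumptions, not rules taken from cTAKES; a production system would combine many such rules with dictionaries and trained models.

```python
import re

# Hypothetical rule: a word followed by a numeric dose and a unit.
# The pattern and the unit list are illustrative, not taken from cTAKES.
DOSE_RULE = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_medications(text):
    """Return (drug, dose, unit) tuples for every match of DOSE_RULE."""
    return [
        (m.group("drug").lower(), float(m.group("dose")), m.group("unit").lower())
        for m in DOSE_RULE.finditer(text)
    ]


note = "Patient started on Metformin 500 mg daily; continue aspirin 81mg."
print(extract_medications(note))
```

A rule like this is easy to read and audit, but it treats any word before a dose as a drug name, which is one reason rule-based and machine-learning techniques are usually combined.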
Apache cTAKES: Overview
plain text
CDA
Named Entity
* drug
* disease/disorder
* sign/symptom
* anatomical site
* procedures
Pipeline based - combining techniques:
● Rule-based
● Machine Learning (ML)
Java, Modular
Measurable performance (standard)
cTAKES System
Boundary detection
Tokenization
Normalization (Lemma)
Part-of-speech
Shallow parsing
Entity recognition
NLM
Apache OpenNLP
SPECIALIST NLP Tools
Apache Lucene
UMLS, SNOMED CT, RxNorm
ICD-10/ICD-9, Mayo Clinic, Custom
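The stage order on this slide can be illustrated with a toy pipeline. This is a sketch under stated assumptions: it does not use cTAKES, OpenNLP, or any UMLS resource, the `DICTIONARY` contents are made up for the example, and the part-of-speech and shallow-parsing stages are skipped.

```python
import re

# Toy dictionary standing in for a terminology lookup (UMLS, SNOMED CT, ...).
# Terms and type labels are illustrative assumptions.
DICTIONARY = {
    "aspirin": "drug",
    "headache": "sign/symptom",
    "pneumonia": "disease/disorder",
}


def split_sentences(text):
    """Boundary detection: naive split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Tokenization: words and punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def normalize(token):
    """Normalization: lowercase and strip a plural 's' (crude stand-in for a lemmatizer)."""
    lowered = token.lower()
    if len(lowered) > 3 and lowered.endswith("s") and not lowered.endswith("ss"):
        return lowered[:-1]
    return lowered


def recognize_entities(text):
    """Run the stages in order and return (surface form, entity type) pairs."""
    entities = []
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            entity_type = DICTIONARY.get(normalize(token))
            if entity_type:
                entities.append((token, entity_type))
    return entities


print(recognize_entities("Patient reports headaches. Started Aspirin for pneumonia."))
```

Each stage only consumes the output of the previous one, which is what makes a pipeline of this kind modular: any single stage can be swapped for a better implementation without touching the others.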
Tasks in NLP
(cTAKES example)
cTAKES: Pipelines
(e.g. examples/pipeline/ProcessDir.piper)
// This file contains commands and parameters to run the ctakes-examples "Hello World" pipeline
readFiles org/apache/ctakes/examples/notes
// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
// Collect discovered Entity information for post-run access
collectEntities
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
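To make the structure of the piper file above concrete, here is a small sketch that splits such text into commands and arguments. `parse_piper` is a hypothetical helper written for this illustration, not part of cTAKES, and it assumes only what the example shows: one command per line, whitespace-separated arguments, and `//` comment lines. Real piper files may use syntax this sketch does not handle; the wiki page linked above is the reference.

```python
def parse_piper(text):
    """Split piper-style text into (command, [arguments]) pairs.

    Simplifying assumptions: one command per line, whitespace-separated
    arguments, and lines starting with '//' are comments.
    """
    commands = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("//"):
            continue
        name, *args = line.split()
        commands.append((name, args))
    return commands


EXAMPLE = """\
readFiles org/apache/ctakes/examples/notes
// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper
// Add non-core annotators
add ContextDependentTokenizerAnnotator
// Collect discovered Entity information for post-run access
collectEntities
"""

for command, arguments in parse_piper(EXAMPLE):
    print(command, arguments)
```

Read this way, the file is an ordered list of steps: read the input, load a shared sub-pipeline, add one annotator, then collect the results.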
cTAKES: Exploring Examples
● Documentation (confluence [3])
● ctakes-examples
● main Classes
[alex ~/ctakes {trunk %} ]$ grep -nRIF --include="*.java" "main(String[] args)" | wc -l
171
● smokingstatus
● coreference
● NegEx
● pipelines
● training
● temporal
● relationextractor
● etc
● Run on real data (e.g. LibreHealth / OpenEMR)
[1] - https://builds.apache.org/analysis
[2] - https://builds.apache.org/view/C/view/Apache%20cTAKES/
[3] - https://cwiki.apache.org/confluence/display/CTAKES
Apache cTAKES Demo
[1] - https://github.com/azbarcea/ctakes-examples
Apache Software Foundation
● Community
○ Linguist experts
○ Users
○ Developers
● Mature Software Lifecycle
○ Support
○ Issues
○ SCM - Collaboration
○ Jenkins
○ Sonar
○ Distribution
● Popularize
Get involved
(You don’t need to be a software developer)
● Help new users and provide feedback
● Give feedback on required features
● Write or Update documentation
● Test the code and report bugs
● Fix bugs
● Write and update the software
● Create artwork
● Extend docs references
● Recommend the project to others
● Gamification
● Volunteer valuable skills
● Learn about communities - the Apache Way
● Requirements Engineering
● Learn about NLP and Healthcare
● Learn what a strong product is about
● Test Automation and Software Engineering
● Develop code with high quality
● Build strong Software Development skills
● Explore your creativity
● Marketing
● Use your time wisely
● Help the research community
git: https://github.com/apache/ctakes
wiki: https://cwiki.apache.org/confluence/display/CTAKES
e-mail: https://ctakes.apache.org/mailing-lists.html
Thanks!
Any questions?
You can find me at:
● https://linkedin.com/in/azbarcea
● alexz@apache.org
