QBRC Tech Talk
Thomas Nate Person
Clinical Sciences
PROSPR Center and QBRC
May 6, 2013
1
NLP: Natural Language Processing
2
Outline
 Basics of NLP
 NLP Toolkits
 Basic Implementation Example
 Questions?
3
What is NLP?
 Not:
 Natural Language Programming (NLP)
 Neuro-Linguistic Programming (NLP)
 “Natural Language Processing (NLP) is a field of
computer science, artificial intelligence, and
linguistics concerned with the interactions between
computers and human (natural) languages.”
-Wikipedia
4
What can’t it do?
 Extract information not understandable or
discernible by “you”.
 Extract deeper meaning.
 Act as a substitute for Regular Expression pattern
matching.
5
Basics of NLP
 Large research field
 From Speech Recognition to Optical Character Recognition
 Examples:
 Watson (Jeopardy!)
 Cleverbot
 Siri/Dragon Speak
 CAPTCHA
 I am only concerned with Information Extraction (IE)
 Sentence detection
 Part of Speech (POS) tagging
 (nouns, verbs, adverbs)
 Named-entity recognition (NER)
 (names, organizations, locations)
 Lemmatisation
 (walk, walked, walks, walking → walk)
 Relationship extraction
 All possible word relationships
 Parsing
 Determining most probable word relationships
 Coreference
 Linking of references between multiple sentences
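Sentence detection, the first IE step listed above, can be sketched in a few lines of Python. This is only a naive illustration, not how any of the toolkits do it: the function name split_sentences and the small ABBREVIATIONS set are assumptions made up for this example.

```python
# Abbreviations that end in a period but do not end a sentence.
# This tiny set is an assumption for illustration only.
ABBREVIATIONS = {"nov.", "dr.", "mr.", "mrs.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences at ., ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".!?" and token.lower() not in ABBREVIATIONS
        if ends_sentence:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


sample = ("Pierre Vinken, 61 years old, will join the board as a "
          "nonexecutive director Nov. 29. The polyps were 30 mm in size.")
sentences = split_sentences(sample)
for sentence in sentences:
    print(sentence)
```

The abbreviation check is what keeps "Nov." from ending the first sentence early; real sentence detectors use trained models rather than a fixed list.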
6
What’s the point of all that?
 Help categorize unstructured text into a more
structured format so that discrete information can
more easily be extracted.
7
NLP Information Extraction Example
“Pierre Vinken, 61 years old, will join the board as a
nonexecutive director Nov. 29.”
8
NLP Information Extraction Example
POS (Part of Speech) Tagging
 “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
Pierre/NNP
Vinken/NNP
,/,
61/CD
years/NNS
old/JJ
,/,
will/MD
join/VB
the/DT
board/NN
as/IN
a/DT
nonexecutive/JJ
director/NN
Nov./NNP
29/CD
./.
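Once text is tagged in this word/TAG form, it is easy to work with programmatically. A minimal Python sketch, assuming the tagger output is available as a single whitespace-separated string (the names parse_tagged and TAGGED are made up for this example):

```python
def parse_tagged(tagged_text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in tagged_text.split():
        # Split on the LAST slash so tokens such as ',/,' and './.' still work.
        word, tag = item.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


TAGGED = ("Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD "
          "join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN "
          "Nov./NNP 29/CD ./.")

pairs = parse_tagged(TAGGED)
# Every Penn Treebank noun tag starts with NN; proper nouns start with NNP.
proper_nouns = [word for word, tag in pairs if tag.startswith("NNP")]
nouns = [word for word, tag in pairs if tag.startswith("NN")]
print(proper_nouns)
print(nouns)
```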
Penn Treebank Tagset
 CC Coordinating conjunction, e.g. and, but, or...
 CD Cardinal number
 DT Determiner
 EX Existential there
 FW Foreign word
 IN Preposition or subordinating conjunction
 JJ Adjective
 JJR Adjective, comparative
 JJS Adjective, superlative
 LS List item marker
 MD Modal, e.g. can, could, might, may...
 NN Noun, singular or mass
 NNP Proper noun, singular
 NNPS Proper noun, plural
 NNS Noun, plural
 PDT Predeterminer, e.g. all, both ... when they precede an article
 POS Possessive ending, e.g. nouns ending in 's
 PRP Personal pronoun, e.g. I, me, you, he...
 PRP$ Possessive pronoun, e.g. my, your, mine, yours...
 RB Adverb: most words that end in -ly, as well as degree words like quite, too and very
 RBR Adverb, comparative: adverbs with the comparative ending -er, with a strictly comparative meaning
 RBS Adverb, superlative
 RP Particle
 SYM Symbol: should be used for mathematical, scientific or technical symbols
 TO to
 UH Interjection, e.g. uh, well, yes, my...
 VB Verb, base form: subsumes imperatives, infinitives and subjunctives
 VBD Verb, past tense: includes the conditional form of the verb to be
 VBG Verb, gerund or present participle
 VBN Verb, past participle
 VBP Verb, non-3rd person singular present
 VBZ Verb, 3rd person singular present
 WDT Wh-determiner, e.g. which, and that when it is used as a relative pronoun
 WP Wh-pronoun, e.g. what, who, whom...
 WP$ Possessive wh-pronoun, e.g. whose
 WRB Wh-adverb, e.g. how, where, why
10
POS Parse Tree
11
POS Parse Tree
“Pierre Vinken, 61 years old, will join the board as a nonexecutive director
Nov. 29.”
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
             (, ,)
             (ADJP (NML (CD 61) (NNS years))
                   (JJ old))
             (, ,))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP-CLR (IN as)
                     (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP-TMP (NNP Nov.) (CD 29))))
     (. .)))
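The bracketed tree above is plain text, so it can be read back into a nested structure with a small recursive parser. The Python sketch below is an assumption-laden illustration (parse_tree and tagged_leaves are hypothetical names, not part of any toolkit): each node becomes a (label, children) tuple, and the leaves recover the word/TAG pairs from the POS tagging slide.

```python
import re

TREE = """
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
             (, ,)
             (ADJP (NML (CD 61) (NNS years))
                   (JJ old))
             (, ,))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP-CLR (IN as)
                     (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP-TMP (NNP Nov.) (CD 29))))
     (. .)))
"""


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening "("
        label = ""
        if tokens[pos] not in ("(", ")"):  # the outermost wrapper has no label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return node()


def tagged_leaves(tree):
    """Collect (word, tag) pairs from the pre-terminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    leaves = []
    for child in children:
        if isinstance(child, tuple):
            leaves.extend(tagged_leaves(child))
    return leaves


tree = parse_tree(TREE)
leaves = tagged_leaves(tree)
print(leaves)
```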
13
NLP Information Extraction Example
NER (Named Entity Recognition) Tagging
 “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
Pierre/NNP/PERSON
Vinken/NNP/PERSON
,/,/O
61/CD/DURATION
years/NNS/NUMBER
old/JJ/DURATION
,/,/O
will/MD/O
join/VB/O
the/DT/O
board/NN/O
as/IN/O
a/DT/O
nonexecutive/JJ/O
director/NN/O
Nov./NNP/DATE
29/CD/DATE
././O
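The per-token NER labels become more useful once neighbouring tokens with the same label are merged into one entity. A minimal Python sketch, assuming the word/POS/NER output is available as one whitespace-separated string (parse_ner and group_entities are hypothetical helper names):

```python
NER_TAGGED = ("Pierre/NNP/PERSON Vinken/NNP/PERSON ,/,/O 61/CD/DURATION "
              "years/NNS/NUMBER old/JJ/DURATION ,/,/O will/MD/O join/VB/O "
              "the/DT/O board/NN/O as/IN/O a/DT/O nonexecutive/JJ/O "
              "director/NN/O Nov./NNP/DATE 29/CD/DATE ././O")


def parse_ner(tagged_text):
    """Turn 'word/POS/NER ...' into (word, pos, ner) triples."""
    # Split on the last two slashes so ',/,/O' and '././O' parse correctly.
    return [tuple(item.rsplit("/", 2)) for item in tagged_text.split()]


def group_entities(triples):
    """Merge consecutive tokens that share the same non-O label."""
    entities = []
    previous = "O"
    for word, _pos, label in triples:
        if label != "O" and label == previous:
            entities[-1][1].append(word)      # continue the current entity
        elif label != "O":
            entities.append((label, [word]))  # start a new entity
        previous = label
    return [(label, " ".join(words)) for label, words in entities]


entities = group_entities(parse_ner(NER_TAGGED))
print(entities)
```

On this sentence the grouping yields "Pierre Vinken" as one PERSON and "Nov. 29" as one DATE, while the inconsistent labels on "61 years old" stay split exactly as the tagger produced them.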
14
NLP Information Extraction Example
Lemmatisation
 “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
Pierre/NNP/PERSON [Pierre]
Vinken/NNP/PERSON [Vinken]
,/,/O [,]
61/CD/DURATION [61]
years/NNS/NUMBER [year]
old/JJ/DURATION [old]
,/,/O [,]
will/MD/O [will]
join/VB/O [join]
the/DT/O [the]
board/NN/O [board]
as/IN/O [as]
a/DT/O [a]
nonexecutive/JJ/O [nonexecutive]
director/NN/O [director]
Nov./NNP/DATE [Nov.]
29/CD/DATE [29]
././O [.]
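To show the idea behind the bracketed lemmas, here is a deliberately crude Python sketch. It is not how a real lemmatiser works (those use dictionaries and morphological analysis); it only strips a suffix chosen by the POS tag and consults a tiny exception table. SUFFIX_RULES, IRREGULAR and lemma are all assumptions invented for this example.

```python
# Suffix to strip for each Penn Treebank tag (illustrative, far from complete).
SUFFIX_RULES = {
    "NNS": "s",    # years   -> year
    "VBZ": "s",    # walks   -> walk
    "VBD": "ed",   # walked  -> walk
    "VBN": "ed",
    "VBG": "ing",  # walking -> walk
}

# Irregular forms that suffix stripping cannot handle (hypothetical sample).
IRREGULAR = {
    ("were", "VBD"): "be",
    ("found", "VBN"): "find",
}


def lemma(word, tag):
    """Return an approximate lemma for a word given its POS tag."""
    key = (word.lower(), tag)
    if key in IRREGULAR:
        return IRREGULAR[key]
    suffix = SUFFIX_RULES.get(tag)
    if suffix and word.endswith(suffix) and len(word) > len(suffix) + 1:
        return word[:-len(suffix)]
    return word  # tags without a rule (NNP, CD, JJ, ...) are left unchanged


for word, tag in [("walk", "VB"), ("walked", "VBD"), ("walks", "VBZ"),
                  ("walking", "VBG"), ("years", "NNS"), ("Pierre", "NNP")]:
    print(word, tag, lemma(word, tag))
```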
15
Relationship Parsing
 “Pierre Vinken, 61 years old, will join the board as a nonexecutive
director Nov. 29.”
nn(Vinken-1, Pierre-0) [nn modifier]
nsubj(join-8, Vinken-1) [nominal subject]
num(years-4, 61-3) [numeric modifier]
npadvmod(old-5, years-4) [noun phrase adverbial modifier]
amod(Vinken-1, old-5) [adjectival modifier]
aux(join-8, will-7) [auxiliary]
det(board-10, the-9) [determiner]
dobj(join-8, board-10) [direct object]
det(director-14, a-12) [determiner]
amod(director-14, nonexecutive-13) [adjectival modifier]
prep_as(join-8, director-14) [prep_collapsed]
tmod(join-8, Nov.-15) [temporal modifier]
num(Nov.-15, 29-16) [numeric modifier]
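Each dependency line has the fixed shape relation(governor-index, dependent-index), so it can be pulled apart with one regular expression. A minimal Python sketch, assuming the dependencies are available as text lines like the ones above (Dependency, DEP_PATTERN and parse_dependency are names made up for this example):

```python
import re
from collections import namedtuple

Dependency = namedtuple(
    "Dependency", "relation governor gov_index dependent dep_index")

# relation(governor-index, dependent-index), optionally followed by a label.
DEP_PATTERN = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)")


def parse_dependency(line):
    """Parse one dependency line; return None if the line does not match."""
    match = DEP_PATTERN.match(line.strip())
    if match is None:
        return None
    relation, governor, gov_index, dependent, dep_index = match.groups()
    return Dependency(relation, governor, int(gov_index),
                      dependent, int(dep_index))


lines = [
    "nn(Vinken-1, Pierre-0) [nn modifier]",
    "nsubj(join-8, Vinken-1) [nominal subject]",
    "prep_as(join-8, director-14) [prep_collapsed]",
    "tmod(join-8, Nov.-15) [temporal modifier]",
]
deps = [parse_dependency(line) for line in lines]

# Everything governed by the verb "join": who joins, as what, and when.
for dep in deps:
    if dep.governor == "join":
        print(dep.relation, dep.dependent)
```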
Relationship Extraction
 root - root
 dep - dependent
 aux - auxiliary
 auxpass - passive auxiliary
 cop - copula
 arg - argument
 agent - agent
 comp - complement
 acomp - adjectival complement
 attr - attributive
 ccomp - clausal complement with internal
subject
 xcomp - clausal complement with external
subject
 complm - complementizer
 obj - object
• dobj - direct object
• iobj - indirect object
• pobj - object of preposition
 mark - marker (word introducing an advcl )
 rel - relative (word introducing a rcmod )
 subj - subject
 nsubj - nominal subject
• nsubjpass - passive nominal subject
 csubj - clausal subject
• csubjpass - passive clausal subject
 cc - coordination
 conj - conjunct
 expl - expletive (expletive “there”)
 mod - modifier
 abbrev - abbreviation modifier
 amod - adjectival modifier
 appos - appositional modifier
 advcl - adverbial clause modifier
 purpcl - purpose clause modifier
 det - determiner
 predet - predeterminer
 preconj - preconjunct
 infmod - infinitival modifier
 mwe - multi-word expression modifier
 partmod - participial modifier
 advmod - adverbial modifier
 neg - negation modifier
 rcmod - relative clause modifier
 quantmod - quantifier modifier
 nn - noun compound modifier
 npadvmod - noun phrase adverbial modifier
 tmod - temporal modifier
 num - numeric modifier
 number - element of compound number
 prep - prepositional modifier
 poss - possession modifier
 possessive - possessive modifier (’s)
 prt - phrasal verb particle
 parataxis - parataxis
 punct - punctuation
 ref - referent
 sdep - semantic dependent
 xsubj - controlling subject
17
NLP Toolkits
 41 different toolkits listed in Wikipedia
 Four of the more popular free open source (FOSS) IE
toolkits

Name                                              Language  License             Creators
OpenNLP                                           Java      Apache License 2.0  Online community
General Architecture for Text Engineering (GATE)  Java      LGPL                GATE open source community
Natural Language Toolkit (NLTK)                   Python    Apache 2.0          Team NLTK
Stanford NLP                                      Java      GPL                 The Stanford Natural Language Processing Group
18
NLP Toolkits
 OpenNLP
 Extensive publications
 Corporate Sponsorship
 Java
19
NLP Toolkits
 General Architecture for Text Engineering (GATE)
 Extensive publications
 Integrated Development Environment (IDE) to assist in
development
 Java
 Java Annotation Patterns Engine (JAPE)
20
NLP Toolkits
 Natural Language Tool Kit (NLTK)
 Extensive publications
 Two published documentation books from O’Reilly and Packt
21
NLP Toolkits
 Stanford Core NLP
 Extensive publications
 Wrappers for Perl, Python, Ruby, and Scala languages
 Plugins for GATE and NLTK
22
Questions from PROSPR to answer
 From the hand typed Colonoscopy report:
 How many Polyps
 Location of Polyps
 Size of Polyps
23
Sample Workflow
 Report Definition
 Report Sectionization
 Formatting the Text
 Process the Section
 Further analysis
Report Example
Gastroenterology Laboratory
Patient Name: Susan Storm Richards
Procedure Date: 5/06/2013 15:00:15 PM
MRN: 123456789
Age: 60
Accession #: 123456
Gender: Female
Order #: 123456789
Ethnicity:
Attending MD: Victor Von Doom MD
Note Status: Finalized
Room: 666
Procedure: Colonoscopy
Referring MD: Reed Richards
Providers: Victor von Doom, MD (Doctor)
Attending Participation: I personally performed the entire procedure.
Medicines: SomeDrug 3 mg IV, OtherDrug 75 micrograms IV
Indications: Screening for colorectal malignant neoplasm
Complications: No immediate complications.
Patient Profile: Refer to note in patient chart for documentation of history and
physical.
Procedure: Pre-Anesthesia Assessment:
- PLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum. ASA Grade Assessment: II - Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.
Findings: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
Estimated Blood Loss: Estimated blood loss: none.
Recommendation: - Discharge patient to home (ambulatory).
- High fiber diet indefinitely.
CPT(c) Code(s): --- Technical ---
G0121, Colorectal cancer screening; colonoscopy on individual
not meeting criteria for high risk
CPT Copyright 2010 American Medical Association. All Rights Reserved.
The codes documented in this report are preliminary and upon coder review may be revised
to meet current compliance requirements.
Victor von Doom
Victor von Doom, MD
5/6/2013 15:10
This report has been signed electronically.
Number of Addenda: 0
25
Sectioned
Findings: Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor incididunt ut
labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Three pedunculated polyps were
found in the mid sigmoid colon and in the
proximal ascending colonThe polyps were 30 mm
in size. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.
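The sectionization step can be sketched as splitting the report at known section headers. The Python below is only an illustration under stated assumptions: the header list is taken from the sample report, the abbreviated REPORT text is a shortened stand-in for the full report, and split_sections is a hypothetical helper name.

```python
import re

# Section headers as they appear in the sample report.
SECTION_HEADERS = ["Indications", "Complications", "Findings",
                   "Estimated Blood Loss", "Recommendation"]

# Shortened stand-in for the full report text.
REPORT = """Complications: No immediate complications.
Findings: Three pedunculated polyps were
found in the mid sigmoid colon and in the
proximal ascending colonThe polyps were 30 mm
in size.
Estimated Blood Loss: Estimated blood loss: none.
Recommendation: - Discharge patient to home (ambulatory).
"""


def split_sections(report, headers=SECTION_HEADERS):
    """Return {header: text} for every known header found at a line start."""
    alternatives = "|".join(re.escape(header) for header in headers)
    pattern = re.compile(rf"^({alternatives}):", re.MULTILINE)
    matches = list(pattern.finditer(report))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next header (or the end of the report).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        body = report[match.end():end]
        sections[match.group(1)] = " ".join(body.split())  # unwrap lines
    return sections


sections = split_sections(REPORT)
print(sections["Findings"])
```

Only the Findings section is passed on to the formatting and NLP steps; the run-together "colonThe" is left untouched here because fixing it is the job of the next step.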
26
Sample
“ Three pedunculated polyps were found in the mid
sigmoid colon and in the proximal ascending
colonThe polyps were 30 mm in size. ”
27
Regex Formatting Text: Removing Spaces
$text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;
$text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;
$text =~ s/^\s+//;
$text =~ s/\s+$//;
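For readers more at home in Python, the same four substitutions can be written with re.sub. This is a sketch that mirrors the Perl patterns rather than a different method; the function name format_text is made up for this example.

```python
import re


def format_text(text):
    """Apply the slide's cleanup rules to one section of report text."""
    # "30 mm" -> "30mm" so the number and its unit stay in one token.
    text = re.sub(r"(\b\d+\b)(\s)(\bmm\b)", r"\1\3", text)
    # "colonThe" -> "colon. The": restore a missing sentence break.
    text = re.sub(r"(\b[a-z]+)([A-Z])([a-z]+\b)", r"\1. \2\3", text)
    # Trim leading and trailing whitespace.
    return text.strip()


sample = (" Three pedunculated polyps were found in the mid sigmoid colon "
          "and in the proximal ascending colonThe polyps were 30 mm in size. ")
formatted = format_text(sample)
print(formatted)
```

The second rule only fires on a lowercase run followed directly by a capitalised word, so ordinary capitalised words such as "Three" are left alone.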
28
Formatted Sample
“Three pedunculated polyps were found in the mid
sigmoid colon and in the proximal ascending colon.
The polyps were 30mm in size.”
NLP Information Extraction Example
Relationship Dependencies
 Original sentence:
Three pedunculated polyps were found in the mid
sigmoid colon and in the proximal ascending colon.
Dependencies:
num(polyps-2, Three-0) [numeric modifier]
amod(polyps-2, pedunculated-1) [adjectival modifier]
nsubjpass(found-4, polyps-2) [nominal passive subject]
auxpass(found-4, were-3) [passive auxiliary]
det(colon-9, the-6) [determiner]
amod(colon-9, mid-7) [adjectival modifier]
nn(colon-9, sigmoid-8) [nn modifier]
prep_in(found-4, colon-9) [prep_collapsed]
det(proximal-13, the-12) [determiner]
prep_in(found-4, proximal-13) [prep_collapsed]
conj_and(colon-9, proximal-13) [conj_collapsed]
partmod(proximal-13, ascending-14) [participial modifier]
dobj(ascending-14, colon-15) [direct object]
 Original sentence:
The polyps were 30mm in size.
Dependencies:
det(polyps-1, The-0) [determiner]
nsubj(30mm-3, polyps-1) [nominal subject]
cop(30mm-3, were-2) [copula]
prep_in(30mm-3, size-5) [prep_collapsed]
30
Output
“Three pedunculated polyps were found in the mid
sigmoid colon and in the proximal ascending colon.
The polyps were 30 mm in size.”
Output
Number of Polyps: 3
Size of Polyps: 30,
Location of Polyps: 1,4,
Perl Example
use strict;
use warnings;
use Lingua::StanfordCoreNLP;
use Lingua::EN::Words2Nums;

# Build the CoreNLP pipeline and define the sample text from the report.
my $pipeline = new Lingua::StanfordCoreNLP::Pipeline(1);
my $text = "Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size.";

# Format the text: "30 mm" -> "30mm", "colonThe" -> "colon. The", then trim.
$text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;
$text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;
$text =~ s/^\s+//;
$text =~ s/\s+$//;

my $result = $pipeline->process($text);
my $polypCount    = '';
my $polypSize     = '';
my $polypLocation = '';

# Walk every dependency of every sentence and pick out the relations of interest.
for my $sentence (@{$result->toArray})
{
    for my $dep (@{$sentence->getDependencies->toArray})
    {
        my $relation = $dep->getRelation;
        my $govern   = $dep->getGovernor->getWord;
        my $depend   = $dep->getDependent->getWord;
        my $num      = words2nums($depend);

        # num(polyps, Three): the polyp count, converted from words to a number.
        if (($relation eq "num") && ($govern =~ /^polyps?$/i))
        {
            $polypCount = $num;
        }
        # nsubj(30mm, polyps): the polyp size in millimetres.
        if (($relation eq "nsubj") && ($govern =~ /^\d+mm$/) && ($depend =~ /^polyps?$/i))
        {
            $govern =~ s/mm$//;
            $polypSize = "$govern,";
        }
        # nn(colon, sigmoid): sigmoid colon, reported as location code 1.
        if (($relation eq "nn") && ($govern =~ /^colon$/i) && ($depend =~ /sigmoid/i))
        {
            $polypLocation = "1,";
        }
        # dobj(ascending, colon): ascending colon, reported as location code 4.
        if (($relation eq "dobj") && ($govern =~ /^ascending$/i) && ($depend =~ /^colon$/i))
        {
            $polypLocation .= "4,";
        }
    }
}
print "Number of Polyps:\t$polypCount\n";
print "Size of Polyps:\t\t$polypSize\n";
print "Location of Polyps:\t$polypLocation\n";
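The rule logic of the Perl example does not depend on CoreNLP itself, only on the (relation, governor, dependent) triples it returns. The Python sketch below replays the same four rules on the triples listed on the Relationship Dependencies slide, so it runs without any parser installed. WORD_NUMBERS and extract_polyp_facts are hypothetical names, and the location codes 1 and 4 are copied from the talk's output, with their meaning inferred from the rules rather than stated in the talk.

```python
import re

# Tiny words-to-numbers table standing in for Lingua::EN::Words2Nums.
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# (relation, governor, dependent) triples from the dependency slide.
DEPENDENCIES = [
    ("num", "polyps", "Three"),
    ("amod", "polyps", "pedunculated"),
    ("nsubjpass", "found", "polyps"),
    ("nn", "colon", "sigmoid"),
    ("prep_in", "found", "colon"),
    ("partmod", "proximal", "ascending"),
    ("dobj", "ascending", "colon"),
    ("nsubj", "30mm", "polyps"),
    ("cop", "30mm", "were"),
]


def extract_polyp_facts(dependencies):
    """Apply the talk's four rules and return count, size and location codes."""
    facts = {"count": None, "size_mm": None, "locations": []}
    for relation, governor, dependent in dependencies:
        gov, dep = governor.lower(), dependent.lower()
        is_polyp = lambda word: word in ("polyp", "polyps")
        if relation == "num" and is_polyp(gov):
            facts["count"] = WORD_NUMBERS.get(
                dep, int(dep) if dep.isdigit() else None)
        if relation == "nsubj" and re.fullmatch(r"\d+mm", gov) and is_polyp(dep):
            facts["size_mm"] = int(gov[:-2])   # drop the "mm" suffix
        if relation == "nn" and gov == "colon" and "sigmoid" in dep:
            facts["locations"].append(1)       # code 1 as in the slide output
        if relation == "dobj" and gov == "ascending" and dep == "colon":
            facts["locations"].append(4)       # code 4 as in the slide output
    return facts


facts = extract_polyp_facts(DEPENDENCIES)
print(facts)
```

Note that the ascending colon is only found through a mis-parse (dobj of "ascending"), which shows how closely such rules are tied to the parser's actual output.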
32
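F-Score
6/26/2013
 Comparison against a manually curated “Gold
Standard”
 Precision = proportion of extracted positives that are true
positives: TP / (TP + FP)
 Recall = proportion of actual positives that were extracted:
TP / (TP + FN)
 F-score = harmonic mean of precision and recall:
2 × Precision × Recall / (Precision + Recall)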
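Scoring the extraction against the gold standard comes down to counting overlaps between two sets of facts. A minimal Python sketch; the function name precision_recall_f1 and the example facts (report id, field, value) are invented for illustration and are not results from the talk.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted facts with the gold standard (both as sets)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)   # extracted and correct
    fp = len(extracted - gold)   # extracted but not in the gold standard
    fn = len(gold - extracted)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical facts as (report id, field, value) tuples.
gold = {("r1", "count", 3), ("r1", "size", 30),
        ("r1", "location", 1), ("r1", "location", 4)}
extracted = {("r1", "count", 3), ("r1", "size", 30),
             ("r1", "location", 1), ("r1", "location", 2)}

precision, recall, f1 = precision_recall_f1(extracted, gold)
print(precision, recall, f1)
```

In this made-up case three of four extracted facts are correct and three of four gold facts are found, so precision, recall and F-score all come out at 0.75.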
33
Questions?!

More Related Content

Similar to Nlp tech talk

Cec2010 araujo santamaria
Cec2010 araujo santamariaCec2010 araujo santamaria
Cec2010 araujo santamariaLourdes Araujo
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Rajnish Raj
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptxsiddhantroy13
 
The noun phrase introducers of npChapter 4the noun phr.docx
The noun phrase  introducers of npChapter 4the noun phr.docxThe noun phrase  introducers of npChapter 4the noun phr.docx
The noun phrase introducers of npChapter 4the noun phr.docxarnoldmeredith47041
 
The noun phrase introducers of npChapter 4the noun phr.docx
The noun phrase  introducers of npChapter 4the noun phr.docxThe noun phrase  introducers of npChapter 4the noun phr.docx
The noun phrase introducers of npChapter 4the noun phr.docxdennisa15
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"nozyh
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsCS, NcState
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
NLP in Practice - Part II
NLP in Practice - Part IINLP in Practice - Part II
NLP in Practice - Part IIDelip Rao
 
natural language processing
natural language processing natural language processing
natural language processing sunanthakrishnan
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Guy De Pauw
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Pythonanntp
 
Arabic syntactic parsing
Arabic syntactic parsingArabic syntactic parsing
Arabic syntactic parsingAmena dheif
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introductionYueshen Xu
 

Similar to Nlp tech talk (20)

Chapter14part2
Chapter14part2Chapter14part2
Chapter14part2
 
Cec2010 araujo santamaria
Cec2010 araujo santamariaCec2010 araujo santamaria
Cec2010 araujo santamaria
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptx
 
OpenNLP demo
OpenNLP demoOpenNLP demo
OpenNLP demo
 
The noun phrase introducers of npChapter 4the noun phr.docx
The noun phrase  introducers of npChapter 4the noun phr.docxThe noun phrase  introducers of npChapter 4the noun phr.docx
The noun phrase introducers of npChapter 4the noun phr.docx
 
The noun phrase introducers of npChapter 4the noun phr.docx
The noun phrase  introducers of npChapter 4the noun phr.docxThe noun phrase  introducers of npChapter 4the noun phr.docx
The noun phrase introducers of npChapter 4the noun phr.docx
 
ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"ACL読み会2014@PFI "Less Grammar, More Features"
ACL読み会2014@PFI "Less Grammar, More Features"
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause Grammars
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
NLP in Practice - Part II
NLP in Practice - Part IINLP in Practice - Part II
NLP in Practice - Part II
 
natural language processing
natural language processing natural language processing
natural language processing
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Python
 
Arabic syntactic parsing
Arabic syntactic parsingArabic syntactic parsing
Arabic syntactic parsing
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Nlp tech talk

  • 1. QBRC Tech Talk
      Thomas Nate Person
      Clinical Sciences
      PROSPR Center and QBRC
      May 6, 2013
      NLP: Natural Language Processing
  • 2. Outline
      - Basics of NLP
      - NLP Toolkits
      - Basic Implementation Example
      - Questions?
  • 3. What is NLP?
      - Not:
          - Natural Language Programming (NLP)
          - Neuro-Linguistic Programming (NLP)
      - “Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.” -Wikipedia
  • 4. What can’t it do?
      - Extract information not understandable or discernible by “you”.
      - Extract deeper meaning.
      - It is not a substitute for regular expression pattern matching.
  • 5. Basics of NLP
      - Large research field
          - From speech recognition to optical character recognition
          - Examples:
              - Watson (Jeopardy)
              - Cleverbot
              - Siri/Dragon Speak
              - CAPTCHA
      - I am only concerned about Information Extraction (IE)
          - Sentence detection
          - Part of Speech (POS) tagging (nouns, verbs, adverbs)
          - Named-entity recognition (NER) (names, organizations, locations)
          - Lemmatisation (walk, walked, walks, walking)
          - Relationship extraction: all possible word relationships
          - Parsing: determining the most probable word relationships
          - Coreference: linking of references between multiple sentences
  • 6. What’s the point of all that?
      - Help categorize unstructured text into a more structured format so that discrete information can more easily be extracted.
  • 7. NLP Information Extraction Example
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
  • 8. NLP Information Extraction Example: POS (Part of Speech) Tagging
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB
      the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
  • 9. Penn Treebank Tagset
      CC    Coordinating conjunction (e.g. and, but, or...)
      CD    Cardinal number
      DT    Determiner
      EX    Existential there
      FW    Foreign word
      IN    Preposition or subordinating conjunction
      JJ    Adjective
      JJR   Adjective, comparative
      JJS   Adjective, superlative
      LS    List item marker
      MD    Modal (e.g. can, could, might, may...)
      NN    Noun, singular or mass
      NNP   Proper noun, singular
      NNPS  Proper noun, plural
      NNS   Noun, plural
      PDT   Predeterminer (e.g. all, both ... when they precede an article)
      POS   Possessive ending (e.g. nouns ending in 's)
      PRP   Personal pronoun (e.g. I, me, you, he...)
      PRP$  Possessive pronoun (e.g. my, your, mine, yours...)
      RB    Adverb (most words that end in -ly, as well as degree words like quite, too and very)
      RBR   Adverb, comparative (adverbs with the comparative ending -er, with a strictly comparative meaning)
      RBS   Adverb, superlative
      RP    Particle
      SYM   Symbol (should be used for mathematical, scientific or technical symbols)
      TO    to
      UH    Interjection (e.g. uh, well, yes, my...)
      VB    Verb, base form (subsumes imperatives, infinitives and subjunctives)
      VBD   Verb, past tense (includes the conditional form of the verb to be)
      VBG   Verb, gerund or present participle
      VBN   Verb, past participle
      VBP   Verb, non-3rd person singular present
      VBZ   Verb, 3rd person singular present
      WDT   Wh-determiner (e.g. which, and that when it is used as a relative pronoun)
      WP    Wh-pronoun (e.g. what, who, whom...)
      WP$   Possessive wh-pronoun
      WRB   Wh-adverb (e.g. how, where, why)
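The tagged output on slide 8 pairs each token with one of the Penn Treebank tags listed on slide 9. As a minimal sketch of how that output could be consumed, assuming the tagger result is available as a plain `word/TAG` string: the helper `parse_tagged` and the `%tag_name` table are hypothetical names introduced here for illustration, and the table is only a subset of the tagset.

```perl
use strict;
use warnings;

# Illustrative subset of the Penn Treebank tagset (not the full list).
my %tag_name = (
    NNP => 'Proper noun, singular',
    CD  => 'Cardinal number',
    NNS => 'Noun, plural',
    JJ  => 'Adjective',
    MD  => 'Modal',
    VB  => 'Verb, base form',
    DT  => 'Determiner',
    NN  => 'Noun, singular or mass',
    IN  => 'Preposition or subordinating conjunction',
);

# Split "word/TAG word/TAG ..." into [word, tag] pairs.
# The tag is whatever follows the last slash, so "./." and ",/," also work.
sub parse_tagged {
    my ($tagged) = @_;
    my @pairs;
    for my $token (split ' ', $tagged) {
        my ($word, $tag) = $token =~ m{^(.+)/([^/]+)$} or next;
        push @pairs, [$word, $tag];
    }
    return @pairs;
}

my $tagged = 'Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD '
           . 'join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN '
           . 'Nov./NNP 29/CD ./.';

my @pairs = parse_tagged($tagged);

# Print each token with its tag description, then list the proper nouns.
for my $pair (@pairs) {
    my ($word, $tag) = @$pair;
    printf "%-14s %-4s %s\n", $word, $tag, $tag_name{$tag} // '(not in subset)';
}
my @proper_nouns = map { $_->[0] } grep { $_->[1] eq 'NNP' } @pairs;
print "Proper nouns: @proper_nouns\n";
```

Filtering on the tag rather than on the word itself is the point of the exercise: the same `grep` picks out proper nouns in any sentence the tagger has processed.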
  • 11. POS Parse Tree
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      ( (S
          (NP-SBJ
            (NP (NNP Pierre) (NNP Vinken))
            (, ,)
            (ADJP (NML (CD 61) (NNS years)) (JJ old))
            (, ,))
          (VP (MD will)
            (VP (VB join)
              (NP (DT the) (NN board))
              (PP-CLR (IN as)
                (NP (DT a) (JJ nonexecutive) (NN director)))
              (NP-TMP (NNP Nov.) (CD 29))))
          (. .)))
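The bracketed parse tree on slide 11 is a nested structure, so it can be read back with a small stack-based parser. The following is a sketch only, assuming balanced brackets and the `(LABEL child ...)` layout shown on the slide; `parse_tree`, `leaves`, and `phrases` are hypothetical helper names, not functions from any of the toolkits.

```perl
use strict;
use warnings;

# Parse a bracketed tree into nested hashes: { label => ..., children => [...] }.
# A child is either another node (hash ref) or a plain word (string).
# Assumes the brackets are balanced.
sub parse_tree {
    my ($text) = @_;
    my @stack = ({ label => 'TOP', children => [] });    # sentinel root
    for my $tok ($text =~ /[()]|[^\s()]+/g) {
        if ($tok eq '(') {
            push @stack, { label => undef, children => [] };
        }
        elsif ($tok eq ')') {
            my $node = pop @stack;                        # close the node
            push @{ $stack[-1]{children} }, $node;        # attach to parent
        }
        elsif (!defined $stack[-1]{label} && !@{ $stack[-1]{children} }) {
            $stack[-1]{label} = $tok;                     # first atom is the label
        }
        else {
            push @{ $stack[-1]{children} }, $tok;         # later atoms are words
        }
    }
    return $stack[0];
}

# Words under a node, left to right.
sub leaves {
    my ($node) = @_;
    return $node unless ref $node;
    return map { leaves($_) } @{ $node->{children} };
}

# Text of every subtree whose label starts with $prefix (e.g. "NP").
sub phrases {
    my ($node, $prefix) = @_;
    return () unless ref $node;
    my @found;
    push @found, join(' ', leaves($node))
        if defined $node->{label} && $node->{label} =~ /^\Q$prefix\E/;
    push @found, map { phrases($_, $prefix) } @{ $node->{children} };
    return @found;
}

my $bracketed = '( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,) '
              . '(ADJP (NML (CD 61) (NNS years)) (JJ old)) (, ,)) '
              . '(VP (MD will) (VP (VB join) (NP (DT the) (NN board)) '
              . '(PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) '
              . '(NP-TMP (NNP Nov.) (CD 29)))) (. .)))';

my $tree = parse_tree($bracketed);
print join(' ', leaves($tree)), "\n";
print "NP: $_\n" for phrases($tree, 'NP');
```

Matching on a label prefix means that `NP`, `NP-SBJ`, and `NP-TMP` are all reported as noun phrases, while the POS tag `NNP` is not.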
  • 12. NLP Information Extraction Example: POS (Part of Speech) Tagging
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB
      the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
  • 13. NLP Information Extraction Example: NER (Named Entity Recognition) Tagging
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      Pierre/NNP/PERSON Vinken/NNP/PERSON ,/,/O 61/CD/DURATION years/NNS/NUMBER
      old/JJ/DURATION ,/,/O will/MD/O join/VB/O the/DT/O board/NN/O as/IN/O a/DT/O
      nonexecutive/JJ/O director/NN/O Nov./NNP/DATE 29/CD/DATE ././O
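NER labels are assigned per token, so multi-word entities such as a full name have to be reassembled from adjacent tokens that carry the same label. A minimal sketch, assuming the `word/POS/ENTITY` layout shown on slide 13 with `O` meaning "no entity" (`group_entities` is a hypothetical helper name):

```perl
use strict;
use warnings;

# Merge runs of adjacent tokens that share the same entity label.
sub group_entities {
    my ($tagged) = @_;
    my @entities;
    my $prev = 'O';
    for my $token (split ' ', $tagged) {
        my ($word, $pos, $label) = $token =~ m{^(.+)/([^/]+)/([^/]+)$} or next;
        if ($label ne 'O') {
            if ($label eq $prev) {
                $entities[-1]{text} .= " $word";          # extend the current run
            }
            else {
                push @entities, { label => $label, text => $word };
            }
        }
        $prev = $label;
    }
    return @entities;
}

my $ner = 'Pierre/NNP/PERSON Vinken/NNP/PERSON ,/,/O 61/CD/DURATION '
        . 'years/NNS/NUMBER old/JJ/DURATION ,/,/O will/MD/O join/VB/O the/DT/O '
        . 'board/NN/O as/IN/O a/DT/O nonexecutive/JJ/O director/NN/O '
        . 'Nov./NNP/DATE 29/CD/DATE ././O';

my @entities = group_entities($ner);
printf "%-9s %s\n", $_->{label}, $_->{text} for @entities;
```

With the labels exactly as they appear on the slide, "Pierre Vinken" and "Nov. 29" each come out as one entity, while "61", "years", and "old" stay separate because their labels alternate.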
  • 14. NLP Information Extraction Example: Lemmatisation
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      Pierre/NNP/PERSON        [Pierre]
      Vinken/NNP/PERSON        [Vinken]
      ,/,/O                    [,]
      61/CD/DURATION           [61]
      years/NNS/NUMBER         [year]
      old/JJ/DURATION          [old]
      ,/,/O                    [,]
      will/MD/O                [will]
      join/VB/O                [join]
      the/DT/O                 [the]
      board/NN/O               [board]
      as/IN/O                  [as]
      a/DT/O                   [a]
      nonexecutive/JJ/O        [nonexecutive]
      director/NN/O            [director]
      Nov./NNP/DATE            [Nov.]
      29/CD/DATE               [29]
      ././O                    [.]
  • 15. Relationship Parsing
      “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
      nn(Vinken-1, Pierre-0)                 [nn modifier]
      nsubj(join-8, Vinken-1)                [nominal subject]
      num(years-4, 61-3)                     [numeric modifier]
      npadvmod(old-5, years-4)               [noun phrase adverbial modifier]
      amod(Vinken-1, old-5)                  [adjectival modifier]
      aux(join-8, will-7)                    [auxiliary]
      det(board-10, the-9)                   [determiner]
      dobj(join-8, board-10)                 [direct object]
      det(director-14, a-12)                 [determiner]
      amod(director-14, nonexecutive-13)     [adjectival modifier]
      prep_as(join-8, director-14)           [prep_collapsed]
      tmod(join-8, Nov.-15)                  [temporal modifier]
      num(Nov.-15, 29-16)                    [numeric modifier]
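Each dependency on slide 15 has the shape `relation(governor-index, dependent-index)`, which is what makes the later regular-expression matching straightforward. A sketch of reading that notation back into a data structure, assuming the dependencies are available as text lines (`parse_dependency` is a hypothetical helper, and only some of the slide's lines are used):

```perl
use strict;
use warnings;

# Turn "rel(gov-i, dep-j)" into a hash; returns nothing if the line does not match.
sub parse_dependency {
    my ($line) = @_;
    my ($rel, $gov, $gov_idx, $dep, $dep_idx) =
        $line =~ /^(\w+)\((.+)-(\d+),\s*(.+)-(\d+)\)/
        or return;
    return {
        relation  => $rel,
        governor  => $gov, gov_index => $gov_idx,
        dependent => $dep, dep_index => $dep_idx,
    };
}

my @lines = (
    'nn(Vinken-1, Pierre-0)',
    'nsubj(join-8, Vinken-1)',
    'num(years-4, 61-3)',
    'aux(join-8, will-7)',
    'dobj(join-8, board-10)',
    'prep_as(join-8, director-14)',
    'tmod(join-8, Nov.-15)',
    'num(Nov.-15, 29-16)',
);

my @deps = map { parse_dependency($_) } @lines;

# Collect everything that hangs directly off the verb "join".
my %by_relation;
for my $d (grep { $_->{governor} eq 'join' } @deps) {
    $by_relation{ $d->{relation} } = $d->{dependent};
}
printf "subject=%s object=%s role=%s when=%s\n",
    @by_relation{qw(nsubj dobj prep_as tmod)};
```

Looking up relations by name around a known governor is the same pattern the Perl example on slide 31 uses to find the polyp count, size, and location.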
  • 16. Relationship Extraction
      - root - root
      - dep - dependent
      - aux - auxiliary
      - auxpass - passive auxiliary
      - cop - copula
      - arg - argument
      - agent - agent
      - comp - complement
      - acomp - adjectival complement
      - attr - attributive
      - ccomp - clausal complement with internal subject
      - xcomp - clausal complement with external subject
      - complm - complementizer
      - obj - object
          • dobj - direct object
          • iobj - indirect object
          • pobj - object of preposition
      - mark - marker (word introducing an advcl)
      - rel - relative (word introducing a rcmod)
      - subj - subject
      - nsubj - nominal subject
          • nsubjpass - passive nominal subject
      - csubj - clausal subject
          • csubjpass - passive clausal subject
      - cc - coordination
      - conj - conjunct
      - expl - expletive (expletive “there”)
      - mod - modifier
      - abbrev - abbreviation modifier
      - amod - adjectival modifier
      - appos - appositional modifier
      - advcl - adverbial clause modifier
      - purpcl - purpose clause modifier
      - det - determiner
      - predet - predeterminer
      - preconj - preconjunct
      - infmod - infinitival modifier
      - mwe - multi-word expression modifier
      - partmod - participial modifier
      - advmod - adverbial modifier
      - neg - negation modifier
      - rcmod - relative clause modifier
      - quantmod - quantifier modifier
      - nn - noun compound modifier
      - npadvmod - noun phrase adverbial modifier
      - tmod - temporal modifier
      - num - numeric modifier
      - number - element of compound number
      - prep - prepositional modifier
      - poss - possession modifier
      - possessive - possessive modifier (’s)
      - prt - phrasal verb particle
      - parataxis - parataxis
      - punct - punctuation
      - ref - referent
      - sdep - semantic dependent
      - xsubj - controlling subject
  • 17. NLP Toolkits
      - 41 different toolkits listed in Wikipedia
      - Four of the more popular free and open-source (FOSS) IE toolkits:

      Name                                             Language  License             Creators
      OpenNLP                                          Java      Apache License 2.0  Online community
      General Architecture for Text Engineering (GATE) Java      LGPL                GATE open source community
      Natural Language Toolkit (NLTK)                  Python    Apache 2.0          Team NLTK
      Stanford NLP                                     Java      GPL                 The Stanford Natural Language Processing Group
  • 18. NLP Toolkits
      - OpenNLP
          - Extensive publications
          - Corporate sponsorship
          - Java
  • 19. NLP Toolkits
      - General Architecture for Text Engineering (GATE)
          - Extensive publications
          - Integrated Development Environment (IDE) to assist in development
          - Java
          - Java Annotation Patterns Engine (JAPE)
  • 20. NLP Toolkits
      - Natural Language Toolkit (NLTK)
          - Extensive publications
          - Two published documentation books from O’Reilly and Packt
  • 21. NLP Toolkits
      - Stanford CoreNLP
          - Extensive publications
          - Wrappers for Perl, Python, Ruby, and Scala languages
          - Plugins for GATE and NLTK
  • 22. Questions from PROSPR to answer
      - From the hand-typed colonoscopy report:
          - How many polyps
          - Location of polyps
          - Size of polyps
  • 23. Sample Workflow
      - Report Definition
      - Report Sectionization
      - Formatting the Text
      - Process the Section
      - Further analysis
  • 24. Report Example
      Gastroenterology Laboratory
      Patient Name: Susan Storm Richards
      Procedure Date: 5/06/2013 15:00:15 PM
      MRN: 123456789
      Age: 60
      Accession #: 123456
      Gender: Female
      Order #: 123456789
      Ethnicity:
      Attending MD: Victor Von Doom MD
      Note Status: Finalized
      Room: 666
      Procedure: Colonoscopy
      Referring MD: Reed Richards
      Providers: Victor von Doom, MD (Doctor)
      Attending Participation: I personally performed the entire procedure.
      Medicines: SomeDrug 3 mg IV, OtherDrug 75 micrograms IV
      Indications: Screening for colorectal malignant neoplasm
      Complications: No immediate complications.
      Patient Profile: Refer to note in patient chart for documentation of history and physical.
      Procedure: Pre-Anesthesia Assessment: - Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      ASA Grade Assessment: II - Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Findings: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
      Estimated Blood Loss: Estimated blood loss: none.
      Recommendation:
      - Discharge patient to home (ambulatory).
      - High fiber diet indefinitely.
      CPT(c) Code(s): --- Technical --- G0121, Colorectal cancer screening; colonoscopy on individual not meeting criteria for high risk
      CPT Copyright 2010 American Medical Association. All Rights Reserved.
      The codes documented in this report are preliminary and upon coder review may be revised to meet current compliance requirements.
      Victor von Doom
      Victor von Doom, MD
      5/6/2013 15:10
      This report has been signed electronically.
      Number of Addenda: 0
  • 25. Sectioned
      Findings: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
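The slides do not show how the report was sectioned, so the following is only one possible approach: treat a fixed list of headings as section boundaries and capture the text between `Findings:` and the next known heading. The heading list and the helper name `extract_section` are assumptions made for this sketch, and the report string is shortened from the sample on slide 24.

```perl
use strict;
use warnings;

# Section headings expected in the report (an assumption based on the sample report).
my @headings = ('Indications', 'Complications', 'Findings',
                'Estimated Blood Loss', 'Recommendation');

# Return the text between "$name:" and the next known heading (or the end of the report).
sub extract_section {
    my ($report, $name) = @_;
    my $others = join '|', map { quotemeta($_) } grep { $_ ne $name } @headings;
    my ($body) = $report =~ /\b\Q$name\E:\s*(.*?)\s*(?=\b(?:$others):|\z)/s
        or return;
    return $body;
}

my $report = 'Indications: Screening for colorectal malignant neoplasm '
           . 'Complications: No immediate complications. '
           . 'Findings: Three pedunculated polyps were found in the mid sigmoid '
           . 'colon and in the proximal ascending colonThe polyps were 30 mm in size. '
           . 'Estimated Blood Loss: Estimated blood loss: none. '
           . 'Recommendation: - Discharge patient to home (ambulatory).';

print extract_section($report, 'Findings'), "\n";
```

The match is case-sensitive on purpose: the lowercase "Estimated blood loss:" inside the section body is not mistaken for the "Estimated Blood Loss:" heading.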
  • 26. Sample
      “Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size.”
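  • 27. Regex Formatting Text: Removing Spaces
      $text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;                 # "30 mm" becomes "30mm"
      $text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;       # "colonThe" becomes "colon. The"
      $text =~ s/^\s+//;                                       # trim leading whitespace
      $text =~ s/\s+$//;                                       # trim trailing whitespace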
  • 28. Formatted Sample
      “Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon. The polyps were 30mm in size.”
  • 29. NLP Information Extraction Example: Relationship Dependencies
      - Original sentence: Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon.
        Dependencies:
          num(polyps-2, Three-0)               [numeric modifier]
          amod(polyps-2, pedunculated-1)       [adjectival modifier]
          nsubjpass(found-4, polyps-2)         [nominal passive subject]
          auxpass(found-4, were-3)             [passive auxiliary]
          det(colon-9, the-6)                  [determiner]
          amod(colon-9, mid-7)                 [adjectival modifier]
          nn(colon-9, sigmoid-8)               [nn modifier]
          prep_in(found-4, colon-9)            [prep_collapsed]
          det(proximal-13, the-12)             [determiner]
          prep_in(found-4, proximal-13)        [prep_collapsed]
          conj_and(colon-9, proximal-13)       [conj_collapsed]
          partmod(proximal-13, ascending-14)   [participial modifier]
          dobj(ascending-14, colon-15)         [direct object]
      - Original sentence: The polyps were 30mm in size.
        Dependencies:
          det(polyps-1, The-0)                 [determiner]
          nsubj(30mm-3, polyps-1)              [nominal subject]
          cop(30mm-3, were-2)                  [copula]
          prep_in(30mm-3, size-5)              [prep_collapsed]
  • 30. Output
      “Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon. The polyps were 30 mm in size.”
      Output:
      Number of Polyps:    3
      Size of Polyps:      30,
      Location of Polyps:  1,4,
  • 31. Perl Example
      use strict;
      use warnings;
      use Lingua::StanfordCoreNLP;
      use Lingua::EN::Words2Nums;

      my $pipeline = new Lingua::StanfordCoreNLP::Pipeline(1);
      my $text = "Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size.";

      # Format the text: join "30 mm" into "30mm", split run-together
      # sentences ("colonThe" becomes "colon. The"), and trim whitespace.
      $text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;
      $text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;
      $text =~ s/^\s+//;
      $text =~ s/\s+$//;

      my $result = $pipeline->process($text);
      my ($polypCount, $polypSize, $polypLocation) = ('', '', '');

      for my $sentence (@{$result->toArray}) {
          for my $dep (@{$sentence->getDependencies->toArray}) {
              my $relation = $dep->getRelation;
              my $govern   = $dep->getGovernor->getWord;
              my $depend   = $dep->getDependent->getWord;
              my $num      = words2nums($depend);   # undef unless $depend is a number word

              # num(polyps, Three): a number modifying "polyp(s)" is the count.
              if ($relation eq "num" && $govern =~ /^polyps?$/i) {
                  $polypCount = $num if defined $num;
              }
              # nsubj(30mm, polyps): "polyps were 30mm" gives the size.
              if ($relation eq "nsubj" && $govern =~ /^\d+mm$/ && $depend =~ /^polyps?$/i) {
                  $govern =~ s/mm$//;
                  $polypSize = "$govern,";
              }
              # nn(colon, sigmoid): sigmoid colon is location code 1.
              if ($relation eq "nn" && $govern =~ /^colon$/i && $depend =~ /sigmoid/i) {
                  $polypLocation = "1,";
              }
              # dobj(ascending, colon): ascending colon is location code 4.
              if ($relation eq "dobj" && $govern =~ /^ascending$/i && $depend =~ /^colon$/i) {
                  $polypLocation .= "4,";
              }
          }
      }

      print "Number of Polyps:\t$polypCount\n";
      print "Size of Polyps:\t\t$polypSize\n";
      print "Location of Polyps:\t$polypLocation\n";
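  • 32. F-Score (6/26/2013)
      - Comparison against a manually curated “Gold Standard”
      - Precision = TP / (TP + FP): the proportion of extracted items that are true positives
      - Recall = TP / (TP + FN): the proportion of actual positives that were extracted
      - F-score (F1) = 2 × Precision × Recall / (Precision + Recall)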
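To make the comparison against a gold standard concrete, here is a small sketch that scores extracted polyp counts per report. The report IDs and values are made up for illustration, `f_score` is a hypothetical helper, and counting a wrong value as both a false positive and a false negative is one common convention rather than something the slides specify.

```perl
use strict;
use warnings;

# Compare extracted values against a gold standard, keyed by report ID.
# An undef value means "nothing extracted" or "nothing present".
sub f_score {
    my ($gold, $predicted) = @_;
    my ($tp, $fp, $fn) = (0, 0, 0);
    my %ids = map { $_ => 1 } (keys %$gold, keys %$predicted);
    for my $id (keys %ids) {
        my ($g, $p) = ($gold->{$id}, $predicted->{$id});
        if (defined $g && defined $p && $g eq $p) {
            $tp++;                     # extracted value matches the gold value
        }
        else {
            $fp++ if defined $p;       # extracted something wrong or spurious
            $fn++ if defined $g;       # missed (or got wrong) a real value
        }
    }
    my $precision = ($tp + $fp) ? $tp / ($tp + $fp) : 0;
    my $recall    = ($tp + $fn) ? $tp / ($tp + $fn) : 0;
    my $f1 = ($precision + $recall)
           ? 2 * $precision * $recall / ($precision + $recall)
           : 0;
    return ($precision, $recall, $f1);
}

# Hypothetical polyp counts for four reports (made-up data).
my %gold      = (r1 => 3, r2 => 1, r3 => 2, r4 => undef);
my %predicted = (r1 => 3, r2 => 1, r3 => 4, r4 => 1);

my ($precision, $recall, $f1) = f_score(\%gold, \%predicted);
printf "precision=%.2f recall=%.2f F1=%.2f\n", $precision, $recall, $f1;
```

Here two of the four extractions are correct (precision 0.50), two of the three real values are recovered (recall 0.67), and the resulting F1 is 4/7, about 0.57.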

Editor's Notes

  1. To make the extraction via regular expression pattern matching much easier.
  2. CD Cardinal Number; DT Determiner; IN Preposition or subordinating conjunction; JJ Adjective; MD Modal e.g. can, could, might, may...; NN Noun, singular or mass; NNP Proper Noun, singular; NNS Noun, plural; VB Verb, base form (subsumes imperatives, infinitives and subjunctives)
  3. CC Coordinating conjunction e.g. and, but, or...; CD Cardinal Number; DT Determiner; EX Existential there; FW Foreign Word; IN Preposition or subordinating conjunction; JJ Adjective; JJR Adjective, comparative; JJS Adjective, superlative; LS List Item Marker; MD Modal e.g. can, could, might, may...; NN Noun, singular or mass; NNP Proper Noun, singular; NNPS Proper Noun, plural; NNS Noun, plural; PDT Predeterminer e.g. all, both ... when they precede an article; POS Possessive Ending e.g. nouns ending in 's; PRP Personal Pronoun e.g. I, me, you, he...; PRP$ Possessive Pronoun e.g. my, your, mine, yours...; RB Adverb (most words that end in -ly as well as degree words like quite, too and very); RBR Adverb, comparative (adverbs with the comparative ending -er, with a strictly comparative meaning); RBS Adverb, superlative; RP Particle; SYM Symbol (should be used for mathematical, scientific or technical symbols); TO to; UH Interjection e.g. uh, well, yes, my...; VB Verb, base form (subsumes imperatives, infinitives and subjunctives); VBD Verb, past tense (includes the conditional form of the verb to be); VBG Verb, gerund or present participle; VBN Verb, past participle; VBP Verb, non-3rd person singular present; VBZ Verb, 3rd person singular present; WDT Wh-determiner e.g. which, and that when it is used as a relative pronoun; WP Wh-pronoun e.g. what, who, whom...; WP$ Possessive wh-pronoun; WRB Wh-adverb e.g. how, where, why
  4. CD Cardinal Number; DT Determiner; IN Preposition or subordinating conjunction; JJ Adjective; MD Modal e.g. can, could, might, may...; NN Noun, singular or mass; NNP Proper Noun, singular; VB Verb, base form (subsumes imperatives, infinitives and subjunctives)
  5. CD Cardinal Number; DT Determiner; IN Preposition or subordinating conjunction; JJ Adjective; MD Modal e.g. can, could, might, may...; NN Noun, singular or mass; NNP Proper Noun, singular; VB Verb, base form (subsumes imperatives, infinitives and subjunctives)
  6. Difference between the number 61 being a Duration and the number 29 being part of a Date. E.g. you just wanted to extract all the characters in a book.
  7. aux - auxiliary; auxpass - passive auxiliary; dobj - direct object; nsubj - nominal subject; nsubjpass - passive nominal subject; conj - conjunct; amod - adjectival modifier; det - determiner; nn - noun compound modifier; npadvmod - noun phrase adverbial modifier; tmod - temporal modifier; number - element of compound number; num - numeric modifier; prep - prepositional modifier
  8. root - root; dep - dependent; aux - auxiliary; auxpass - passive auxiliary; cop - copula; arg - argument; agent - agent; comp - complement; acomp - adjectival complement; attr - attributive; ccomp - clausal complement with internal subject; xcomp - clausal complement with external subject; complm - complementizer; obj - object; dobj - direct object; iobj - indirect object; pobj - object of preposition; mark - marker (word introducing an advcl); rel - relative (word introducing a rcmod); subj - subject; nsubj - nominal subject; nsubjpass - passive nominal subject; csubj - clausal subject; csubjpass - passive clausal subject; cc - coordination; conj - conjunct; expl - expletive (expletive “there”); mod - modifier; abbrev - abbreviation modifier; amod - adjectival modifier; appos - appositional modifier; advcl - adverbial clause modifier; purpcl - purpose clause modifier; det - determiner; predet - predeterminer; preconj - preconjunct; infmod - infinitival modifier; mwe - multi-word expression modifier; partmod - participial modifier; advmod - adverbial modifier; neg - negation modifier; rcmod - relative clause modifier; quantmod - quantifier modifier; nn - noun compound modifier; npadvmod - noun phrase adverbial modifier; tmod - temporal modifier; num - numeric modifier; number - element of compound number; prep - prepositional modifier; poss - possession modifier; possessive - possessive modifier (’s); prt - phrasal verb particle; parataxis - parataxis; punct - punctuation; ref - referent; sdep - semantic dependent; xsubj - controlling subject
  9. GATE = University of Sheffield. Team NLTK = 6 people from: UT Austin; University of Gothenburg, Sweden; University of Melbourne; University of Sydney; Oslo, Norway; Ekaterinburg, Russia.
  11. Takes about 40 sec to run.