Nlp tech talk
NLP Tech Talk for QBRC at UTSW

Published in: Technology, Education
Notes
  • To make the extraction via regular expression pattern matching much easier.
  • CD: Cardinal number; DT: Determiner; IN: Preposition or subordinating conjunction; JJ: Adjective; MD: Modal (e.g. can, could, might, may...); NN: Noun, singular or mass; NNP: Proper noun, singular; NNS: Noun, plural; VB: Verb, base form (subsumes imperatives, infinitives and subjunctives)
  • CD: Cardinal number; DT: Determiner; IN: Preposition or subordinating conjunction; JJ: Adjective; MD: Modal (e.g. can, could, might, may...); NN: Noun, singular or mass; NNP: Proper noun, singular; VB: Verb, base form (subsumes imperatives, infinitives and subjunctives)
  • NER tells you the difference between the number 61 being a Duration and the number 29 being part of a Date. Another use: you just want to extract all the characters in a book.
  • aux: auxiliary; auxpass: passive auxiliary; dobj: direct object; nsubj: nominal subject; nsubjpass: passive nominal subject; conj: conjunct; amod: adjectival modifier; det: determiner; nn: noun compound modifier; npadvmod: noun phrase adverbial modifier; tmod: temporal modifier; number: element of compound number; num: numeric modifier; prep: prepositional modifier
  • GATE = University of Sheffield. Team NLTK = 6 people from: UT Austin; University of Gothenburg, Sweden; University of Melbourne; University of Sydney; Oslo, Norway; Ekaterinburg, Russia.
  • root: root; dep: dependent; aux: auxiliary; auxpass: passive auxiliary; cop: copula; arg: argument; comp: complement; obj: object; dobj: direct object; subj: subject; nsubj: nominal subject; nsubjpass: passive nominal subject; conj: conjunct; mod: modifier; amod: adjectival modifier; det: determiner; partmod: participial modifier; nn: noun compound modifier; num: numeric modifier; number: element of compound number; prep: prepositional modifier
  • Takes about 40 sec to run.
  • Transcript

    • 1. NLP: Natural Language Processing
        • QBRC Tech Talk
        • Thomas Nate Person, Clinical Sciences, PROSPR Center and QBRC
        • May 6, 2013
    • 2. Outline
        • Basics of NLP
        • NLP Toolkits
        • Basic Implementation Example
        • Questions?
    • 3. What is NLP?
        • Not:
            • Natural Language Programming (NLP)
            • Neuro-Linguistic Programming (NLP)
        • “Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.” -Wikipedia
    • 4. What can’t it do?
        • Extract information not understandable or discernible by “you”.
        • Extract deeper meaning.
        • It is not a substitute for regular expression pattern matching.
    • 5. Basics of NLP
        • Large research field
            • From speech recognition to optical character recognition
            • Examples: Watson (Jeopardy), Cleverbot, Siri/Dragon Speak, Captcha
        • I am only concerned about Information Extraction (IE)
            • Sentence detection
            • Part of Speech (POS) tagging (nouns, verbs, adverbs)
            • Named-entity recognition (NER) (names, organizations, locations)
            • Lemmatisation (walk, walked, walks, walking)
            • Relationship extraction: all possible word relationships
            • Parsing: determining the most probable word relationships
            • Coreference: linking of references between multiple sentences
    • 6. What’s the point of all that?
        • Help categorize unstructured text into a more structured format so that discrete information can more easily be extracted.
    • 7. NLP Information Extraction Example
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
    • 8. NLP Information Extraction Example: POS (Part of Speech) Tagging
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
        • Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
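The tagged output above is easy to consume programmatically, which is the point of the speaker note about making regular-expression extraction easier. The following is a minimal sketch in Python (used here only for illustration; the talk's own code is Perl). The function name `parse_tagged` is hypothetical, and the tagged string is copied from the slide rather than produced by a tagger.

```python
def parse_tagged(tagged: str) -> list[tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in tagged.split():
        # rpartition splits on the LAST slash, so ',/,' and './.' work too.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Tagged sentence copied from the slide (not generated here).
TAGGED = (
    "Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB "
    "the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./."
)

pairs = parse_tagged(TAGGED)
proper_nouns = [word for word, tag in pairs if tag == "NNP"]
numbers = [word for word, tag in pairs if tag == "CD"]
print(proper_nouns)
print(numbers)
```

Filtering by tag like this separates proper nouns from cardinal numbers, but it cannot tell that "Nov." is a date rather than a name; that distinction is what the NER step adds.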
    • 9. Penn Treebank Tagset
        • CC: Coordinating conjunction (e.g. and, but, or...)
        • CD: Cardinal number
        • DT: Determiner
        • EX: Existential there
        • FW: Foreign word
        • IN: Preposition or subordinating conjunction
        • JJ: Adjective
        • JJR: Adjective, comparative
        • JJS: Adjective, superlative
        • LS: List item marker
        • MD: Modal (e.g. can, could, might, may...)
        • NN: Noun, singular or mass
        • NNP: Proper noun, singular
        • NNPS: Proper noun, plural
        • NNS: Noun, plural
        • PDT: Predeterminer (e.g. all, both ... when they precede an article)
        • POS: Possessive ending (e.g. nouns ending in 's)
        • PRP: Personal pronoun (e.g. I, me, you, he...)
        • PRP$: Possessive pronoun (e.g. my, your, mine, yours...)
        • RB: Adverb (most words that end in -ly, as well as degree words like quite, too and very)
        • RBR: Adverb, comparative (adverbs with the comparative ending -er, with a strictly comparative meaning)
        • RBS: Adverb, superlative
        • RP: Particle
        • SYM: Symbol (should be used for mathematical, scientific or technical symbols)
        • TO: to
        • UH: Interjection (e.g. uh, well, yes, my...)
        • VB: Verb, base form (subsumes imperatives, infinitives and subjunctives)
        • VBD: Verb, past tense (includes the conditional form of the verb to be)
        • VBG: Verb, gerund or present participle
        • VBN: Verb, past participle
        • VBP: Verb, non-3rd person singular present
        • VBZ: Verb, 3rd person singular present
        • WDT: Wh-determiner (e.g. which, and that when it is used as a relative pronoun)
        • WP: Wh-pronoun (e.g. what, who, whom...)
        • WP$: Possessive wh-pronoun
        • WRB: Wh-adverb (e.g. how, where, why)
    • 10. POS Parse Tree
    • 11. POS Parse Tree
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”

          ( (S
              (NP-SBJ
                (NP (NNP Pierre) (NNP Vinken))
                (, ,)
                (ADJP (NML (CD 61) (NNS years)) (JJ old))
                (, ,))
              (VP (MD will)
                (VP (VB join)
                  (NP (DT the) (NN board))
                  (PP-CLR (IN as)
                    (NP (DT a) (JJ nonexecutive) (NN director)))
                  (NP-TMP (NNP Nov.) (CD 29))))
              (. .)))
    • 12. NLP Information Extraction Example: POS (Part of Speech) Tagging
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
        • Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
    • 13. NLP Information Extraction Example: NER (Named Entity Recognition) Tagging
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
        • Pierre/NNP/PERSON Vinken/NNP/PERSON ,/,/O 61/CD/DURATION years/NNS/NUMBER old/JJ/DURATION ,/,/O will/MD/O join/VB/O the/DT/O board/NN/O as/IN/O a/DT/O nonexecutive/JJ/O director/NN/O Nov./NNP/DATE 29/CD/DATE ././O
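One common way to use NER output like this is to merge adjacent tokens that share a label into a single entity span, so that "Nov." and "29" become one DATE. The sketch below is in Python for illustration only; `entity_spans` is a hypothetical helper, and the token labels are copied verbatim from the slide (including the NUMBER label on "years").

```python
from itertools import groupby

# (word, entity label) pairs copied from the slide; "O" means "no entity".
TOKENS = [
    ("Pierre", "PERSON"), ("Vinken", "PERSON"), (",", "O"),
    ("61", "DURATION"), ("years", "NUMBER"), ("old", "DURATION"), (",", "O"),
    ("will", "O"), ("join", "O"), ("the", "O"), ("board", "O"), ("as", "O"),
    ("a", "O"), ("nonexecutive", "O"), ("director", "O"),
    ("Nov.", "DATE"), ("29", "DATE"), (".", "O"),
]


def entity_spans(tokens):
    """Merge runs of consecutive tokens with the same non-O label."""
    spans = []
    for label, group in groupby(tokens, key=lambda pair: pair[1]):
        if label != "O":
            spans.append((label, " ".join(word for word, _ in group)))
    return spans


spans = entity_spans(TOKENS)
for label, text in spans:
    print(f"{label}: {text}")
```

With the labels as given on the slide, the person name and the date each collapse into one span, while "61", "years" and "old" stay separate because their labels alternate.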
    • 14. NLP Information Extraction Example: Lemmatisation
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
        • Pierre/NNP/PERSON [Pierre]
        • Vinken/NNP/PERSON [Vinken]
        • ,/,/O [,]
        • 61/CD/DURATION [61]
        • years/NNS/NUMBER [year]
        • old/JJ/DURATION [old]
        • ,/,/O [,]
        • will/MD/O [will]
        • join/VB/O [join]
        • the/DT/O [the]
        • board/NN/O [board]
        • as/IN/O [as]
        • a/DT/O [a]
        • nonexecutive/JJ/O [nonexecutive]
        • director/NN/O [director]
        • Nov./NNP/DATE [Nov.]
        • 29/CD/DATE [29]
        • ././O [.]
    • 15. Relationship Parsing
        • “Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.”
        • nn(Vinken-1, Pierre-0) [nn modifier]
        • nsubj(join-8, Vinken-1) [nominal subject]
        • num(years-4, 61-3) [numeric modifier]
        • npadvmod(old-5, years-4) [noun phrase adverbial modifier]
        • amod(Vinken-1, old-5) [adjectival modifier]
        • aux(join-8, will-7) [auxiliary]
        • det(board-10, the-9) [determiner]
        • dobj(join-8, board-10) [direct object]
        • det(director-14, a-12) [determiner]
        • amod(director-14, nonexecutive-13) [adjectival modifier]
        • prep_as(join-8, director-14) [prep_collapsed]
        • tmod(join-8, Nov.-15) [temporal modifier]
        • num(Nov.-15, 29-16) [numeric modifier]
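Each dependency above has the shape relation(governor-index, dependent-index), which can be read back into structured records with a single regular expression. The following Python sketch is illustrative only: the `Dep` record, `DEP_RE` pattern and `parse_dep` function are hypothetical names, and the input lines are a subset copied from the slide.

```python
import re
from typing import NamedTuple


class Dep(NamedTuple):
    relation: str
    governor: str
    gov_index: int
    dependent: str
    dep_index: int


# relation(governor-N, dependent-M); words may contain dots, e.g. "Nov.".
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dep(line: str) -> Dep:
    match = DEP_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a dependency: {line!r}")
    rel, gov, gov_i, dep, dep_i = match.groups()
    return Dep(rel, gov, int(gov_i), dep, int(dep_i))


LINES = [
    "nn(Vinken-1, Pierre-0)",
    "nsubj(join-8, Vinken-1)",
    "aux(join-8, will-7)",
    "dobj(join-8, board-10)",
    "prep_as(join-8, director-14)",
    "tmod(join-8, Nov.-15)",
    "num(Nov.-15, 29-16)",
]

deps = [parse_dep(line) for line in LINES]
# Everything attached directly to the verb "join": who, what, as what, when.
about_join = {d.relation: d.dependent for d in deps if d.governor == "join"}
print(about_join)
```

Collecting the dependents of one governor, as done for "join" here, is the same lookup pattern the Perl example later applies to "polyps" and "colon".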
    • 16. Relationship Extraction
        • root: root
        • dep: dependent
        • aux: auxiliary
        • auxpass: passive auxiliary
        • cop: copula
        • arg: argument
        • agent: agent
        • comp: complement
        • acomp: adjectival complement
        • attr: attributive
        • ccomp: clausal complement with internal subject
        • xcomp: clausal complement with external subject
        • complm: complementizer
        • obj: object
        • dobj: direct object
        • iobj: indirect object
        • pobj: object of preposition
        • mark: marker (word introducing an advcl)
        • rel: relative (word introducing a rcmod)
        • subj: subject
        • nsubj: nominal subject
        • nsubjpass: passive nominal subject
        • csubj: clausal subject
        • csubjpass: passive clausal subject
        • cc: coordination
        • conj: conjunct
        • expl: expletive (expletive “there”)
        • mod: modifier
        • abbrev: abbreviation modifier
        • amod: adjectival modifier
        • appos: appositional modifier
        • advcl: adverbial clause modifier
        • purpcl: purpose clause modifier
        • det: determiner
        • predet: predeterminer
        • preconj: preconjunct
        • infmod: infinitival modifier
        • mwe: multi-word expression modifier
        • partmod: participial modifier
        • advmod: adverbial modifier
        • neg: negation modifier
        • rcmod: relative clause modifier
        • quantmod: quantifier modifier
        • nn: noun compound modifier
        • npadvmod: noun phrase adverbial modifier
        • tmod: temporal modifier
        • num: numeric modifier
        • number: element of compound number
        • prep: prepositional modifier
        • poss: possession modifier
        • possessive: possessive modifier (’s)
        • prt: phrasal verb particle
        • parataxis: parataxis
        • punct: punctuation
        • ref: referent
        • sdep: semantic dependent
        • xsubj: controlling subject
    • 17. NLP Toolkits
        • 41 different toolkits listed in Wikipedia
        • Four of the more popular free open source (FOSS) IE toolkits:

          Name                                                | Language | License            | Creators
          OpenNLP                                             | Java     | Apache License 2.0 | Online community
          General Architecture for Text Engineering (GATE)    | Java     | LGPL               | GATE open source community
          Natural Language Toolkit (NLTK)                     | Python   | Apache 2.0         | Team NLTK
          Stanford NLP                                        | Java     | GPL                | The Stanford Natural Language Processing Group
    • 18. NLP Toolkits: OpenNLP
        • Extensive publications
        • Corporate sponsorship
        • Java
    • 19. NLP Toolkits: General Architecture for Text Engineering (GATE)
        • Extensive publications
        • Integrated Development Environment (IDE) to assist in development
        • Java
        • Java Annotation Patterns Engine (JAPE)
    • 20. NLP Toolkits: Natural Language Toolkit (NLTK)
        • Extensive publications
        • Two published documentation books, from O’Reilly and Packt
    • 21. NLP Toolkits: Stanford CoreNLP
        • Extensive publications
        • Wrappers for Perl, Python, Ruby, and Scala
        • Plugins for GATE and NLTK
    • 22. Questions from PROSPR to answer
        • From the hand-typed colonoscopy report:
            • Number of polyps
            • Location of polyps
            • Size of polyps
    • 23. Sample Workflow
        • Report definition
        • Report sectionization
        • Formatting the text
        • Process the section
        • Further analysis
    • 24. Report Example

          Gastroenterology Laboratory
          Patient Name: Susan Storm Richards
          Procedure Date: 5/06/2013 15:00:15 PM
          MRN: 123456789
          Age: 60
          Accession #: 123456
          Gender: Female
          Order #: 123456789
          Ethnicity:
          Attending MD: Victor Von Doom MD
          Note Status: Finalized
          Room: 666
          Procedure: Colonoscopy
          Referring MD: Reed Richards
          Providers: Victor von Doom, MD (Doctor)
          Attending Participation: I personally performed the entire procedure.
          Medicines: SomeDrug 3 mg IV, OtherDrug 75 micrograms IV
          Indications: Screening for colorectal malignant neoplasm
          Complications: No immediate complications.
          Patient Profile: Refer to note in patient chart for documentation of history and physical.
          Procedure: Pre-Anesthesia Assessment: - PLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
          ASA Grade Assessment: II - Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
          Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
          Findings: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
          Estimated Blood Loss: Estimated blood loss: none.
          Recommendation:
          - Discharge patient to home (ambulatory).
          - High fiber diet indefinitely.
          CPT(c) Code(s): --- Technical --- G0121, Colorectal cancer screening; colonoscopy on individual not meeting criteria for high risk
          CPT Copyright 2010 American Medical Association. All Rights Reserved.
          The codes documented in this report are preliminary and upon coder review may be revised to meet current compliance requirements.
          Victor von Doom
          Victor von Doom, MD
          5/6/2013 15:10
          This report has been signed electronically.
          Number of Addenda: 0
    • 25. Sectioned
        • Findings: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    • 26. Sample
        • “ Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size. ”
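    • 27. Regex Formatting Text: Removing Spaces

          # Join a number and its unit: "30 mm" becomes "30mm"
          $text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;
          # Split run-together sentences: "colonThe" becomes "colon. The"
          $text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;
          # Trim leading and trailing whitespace
          $text =~ s/^\s+//;
          $text =~ s/\s+$//;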
    • 28. Formatted Sample
        • “Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon. The polyps were 30mm in size.”
    • 29. NLP Information Extraction Example: Relationship Dependencies
        • Original sentence: Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon.
        • Dependencies:
            • num(polyps-2, Three-0) [numeric modifier]
            • amod(polyps-2, pedunculated-1) [adjectival modifier]
            • nsubjpass(found-4, polyps-2) [nominal passive subject]
            • auxpass(found-4, were-3) [passive auxiliary]
            • det(colon-9, the-6) [determiner]
            • amod(colon-9, mid-7) [adjectival modifier]
            • nn(colon-9, sigmoid-8) [nn modifier]
            • prep_in(found-4, colon-9) [prep_collapsed]
            • det(proximal-13, the-12) [determiner]
            • prep_in(found-4, proximal-13) [prep_collapsed]
            • conj_and(colon-9, proximal-13) [conj_collapsed]
            • partmod(proximal-13, ascending-14) [participial modifier]
            • dobj(ascending-14, colon-15) [direct object]
        • Original sentence: The polyps were 30mm in size.
        • Dependencies:
            • det(polyps-1, The-0) [determiner]
            • nsubj(30mm-3, polyps-1) [nominal subject]
            • cop(30mm-3, were-2) [copula]
            • prep_in(30mm-3, size-5) [prep_collapsed]
    • 30. Output
        • “Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colon. The polyps were 30 mm in size.”
        • Output:
            • Number of Polyps: 3
            • Size of Polyps: 30,
            • Location of Polyps: 1,4,
    • 31. Perl Example

          use strict;
          use warnings;
          use Lingua::StanfordCoreNLP;
          use Lingua::EN::Words2Nums;

          # Create the CoreNLP pipeline (constructor argument kept from the slide).
          my $pipeline = Lingua::StanfordCoreNLP::Pipeline->new(1);
          my $text = "Three pedunculated polyps were found in the mid sigmoid colon and in the proximal ascending colonThe polyps were 30 mm in size.";

          # Normalize the text before parsing.
          $text =~ s/(\b\d+\b)(\s)(\bmm\b)/$1$3/g;                # "30 mm" becomes "30mm"
          $text =~ s/(\b[a-z]+)([A-Z])([a-z]+\b)/$1. $2$3/g;      # "colonThe" becomes "colon. The"
          $text =~ s/^\s+//;
          $text =~ s/\s+$//;

          my $result = $pipeline->process($text);
          my ($polypCount, $polypSize, $polypLocation) = ('', '', '');

          for my $sentence (@{$result->toArray}) {
              for my $dep (@{$sentence->getDependencies->toArray}) {
                  my $relation = $dep->getRelation;
                  my $govern   = $dep->getGovernor->getWord;
                  my $depend   = $dep->getDependent->getWord;

                  # num(polyps, Three): the number word gives the polyp count.
                  if ($relation eq "num" && $govern =~ /^polyps?$/i) {
                      $polypCount = words2nums($depend);
                  }
                  # nsubj(30mm, polyps): the governor carries the size.
                  if ($relation eq "nsubj" && $govern =~ /^\d+mm$/ && $depend =~ /^polyps?$/i) {
                      $govern =~ s/mm$//;
                      $polypSize = "$govern,";
                  }
                  # nn(colon, sigmoid): location code 1.
                  if ($relation eq "nn" && $govern =~ /^colon$/i && $depend =~ /sigmoid/i) {
                      $polypLocation = "1,";
                  }
                  # dobj(ascending, colon): location code 4.
                  if ($relation eq "dobj" && $govern =~ /^ascending$/i && $depend =~ /^colon$/i) {
                      $polypLocation .= "4,";
                  }
              }
          }

          print "Number of Polyps:\t$polypCount\n";
          print "Size of Polyps:\t\t$polypSize\n";
          print "Location of Polyps:\t$polypLocation\n";
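The rule logic in the Perl example can be tried without installing CoreNLP by feeding it the dependency triples already listed on slide 29. The following Python sketch does that for illustration only. `DEPS` is copied from the slide rather than produced by a parser, `WORD_NUMBERS` is a tiny hypothetical stand-in for `Lingua::EN::Words2Nums`, and `extract` is a made-up name. The location codes 1 and 4 are taken from the slide as-is; the deck does not explain the coding scheme.

```python
import re

# (relation, governor, dependent) triples copied from slide 29.
DEPS = [
    ("num", "polyps", "Three"), ("amod", "polyps", "pedunculated"),
    ("nsubjpass", "found", "polyps"), ("auxpass", "found", "were"),
    ("det", "colon", "the"), ("amod", "colon", "mid"),
    ("nn", "colon", "sigmoid"), ("prep_in", "found", "colon"),
    ("det", "proximal", "the"), ("prep_in", "found", "proximal"),
    ("conj_and", "colon", "proximal"), ("partmod", "proximal", "ascending"),
    ("dobj", "ascending", "colon"),
    ("det", "polyps", "The"), ("nsubj", "30mm", "polyps"),
    ("cop", "30mm", "were"), ("prep_in", "30mm", "size"),
]

# Hypothetical stand-in for Lingua::EN::Words2Nums (assumption, not its API).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def extract(deps):
    """Apply the four slide rules to a list of dependency triples."""
    count, sizes, locations = None, [], []
    for rel, gov, dep in deps:
        if rel == "num" and re.fullmatch(r"polyps?", gov, re.I):
            count = WORD_NUMBERS.get(dep.lower())
        if (rel == "nsubj" and re.fullmatch(r"\d+mm", gov)
                and re.fullmatch(r"polyps?", dep, re.I)):
            sizes.append(int(gov[:-2]))          # strip the "mm" suffix
        if rel == "nn" and gov.lower() == "colon" and "sigmoid" in dep.lower():
            locations.append(1)                  # code used on the slide
        if rel == "dobj" and gov.lower() == "ascending" and dep.lower() == "colon":
            locations.append(4)                  # code used on the slide
    return {"count": count, "sizes_mm": sizes, "locations": locations}


result = extract(DEPS)
print(result)
```

Returning lists instead of comma-terminated strings is a design choice of this sketch; it makes the result easier to compare against a gold standard than the printed "30," and "1,4," form.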
    • 32. F-Score (slide dated 6/26/2013)
        • Comparison against a manually curated “Gold Standard”
        • Precision = proportion of extracted positives that are true positives: TP / (TP + FP)
        • Recall = proportion of actual positives that were extracted: TP / (TP + FN)
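Precision and recall against a gold standard can be computed directly from sets of extracted facts. The Python sketch below is illustrative only: the slide does not say which F-score variant was used, so the balanced F1 (harmonic mean of precision and recall) is an assumption, and the `gold` and `predicted` sets are made-up examples, not results from the talk.

```python
def precision_recall_f1(predicted: set, gold: set):
    """Score extracted facts against a manually curated gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up facts of the form (report id, field, value), for illustration only.
gold = {("r1", "count", 3), ("r1", "size", 30),
        ("r1", "location", 1), ("r1", "location", 4)}
predicted = {("r1", "count", 3), ("r1", "size", 30),
             ("r1", "location", 1), ("r1", "location", 2)}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here three of the four predicted facts appear in the gold standard and three of the four gold facts were found, so precision and recall coincide; in practice they usually differ, which is why the combined score is reported.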
    • 33. Questions?!
