SlideShare a Scribd company logo
1 of 35
600.465 Connecting the dots - I(NLP in Practice) Delip Rao delip@jhu.edu
What is “Text”?
What is “Text”?
What is “Text”?
“Real” World Tons of data on the web A lot of it is text In many languages In many genres Language by itself is complex.  The Web further complicates language.
But we have 600.465 ,[object Object]
1. Formalize some insights
2. Study the formalism mathematically
3. Develop & implement algorithms
4. Test on real dataForward Backward,  Gradient Descent, LBFGS, Simulated Annealing, Contrastive Estimation, … feature functions! f(wi = off, wi+1 = the) f(wi = obama, yi = NP) Adapted from : Jason Eisner
NLP for fun and profit Making NLP more accessible Provide APIs for common NLP tasks vartext = document.get(…); varentities = agent.markNE(text); Big $$$$ Backend to intelligent processing of text
Desideratum: Multilinguality Except for feature extraction, systems should be language agnostic
In this lecture Understand how to solve and ace in NLP tasks Learn general methodology or approaches End-to-End development using an example task Overview of (un)common NLP tasks
Case study: Named Entity Recognition
Case study: Named Entity Recognition Demo: http://viewer.opencalais.com ,[object Object]
How do we find out well we are doing?
How can we improve?,[object Object]
Case study: Named Entity Recognition Collect data to learn from Sentences with words marked as PER, ORG, LOC, NONE How do we get this data?
Pay the experts
Wisdom of the crowds
Getting the data: Annotation Time consuming Costs $$$ Need for quality control Inter-annotator aggreement Kappa score (Kippendorf, 1980) Smarter ways to annotate Get fewer annotations: Active Learning Rationales (Zaidan, Eisner & Piatko, 2007)
Only France and Great Britain backed Fischler ‘s proposal . Only France and Great Britain backed Fischler‘s proposal . Input x Labels y
[object Object]
2. Study the formalism mathematically
3. Develop & implement algorithms
4. Test on real dataOur recipe …
NER: Designing features Need to segment sentences Tokenize the sentences Preprocessing Not as trivial as you think Original text itself might be in an ugly HTML Cleaneval!
NER: Designing features
NER: Designing features
NER: Designing features
NER: Designing features
NER: Designing features These are extracted during preprocessing!
NER: Designing features
NER: Designing features

More Related Content

Similar to NLP in Practice - Part I

Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLLawrie Hunter
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibilityc.titus.brown
 
An-Exploration-of-scientific-literature-using-Natural-Language-Processing
An-Exploration-of-scientific-literature-using-Natural-Language-ProcessingAn-Exploration-of-scientific-literature-using-Natural-Language-Processing
An-Exploration-of-scientific-literature-using-Natural-Language-ProcessingTheodore J. LaGrow
 
Natural Language Processing for Irish
Natural Language Processing for IrishNatural Language Processing for Irish
Natural Language Processing for IrishTeresa Lynn
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptxfahmi324663
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowTechWell
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
Deitel® SerHow To Program SeriesC How to Program.docx
Deitel® SerHow To Program SeriesC How to Program.docxDeitel® SerHow To Program SeriesC How to Program.docx
Deitel® SerHow To Program SeriesC How to Program.docxsimonithomas47935
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language InterfaceIRJET Journal
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automationbenosteen
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologyAamir-sheriff
 
Storytelling for research software engineers
Storytelling for research software engineersStorytelling for research software engineers
Storytelling for research software engineersAlbanLevy
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
Problem-based Learning & Resource-based Learning two complementary approac...
Problem-based Learning & Resource-based Learning  two complementary approac...Problem-based Learning & Resource-based Learning  two complementary approac...
Problem-based Learning & Resource-based Learning two complementary approac...Wilco te Winkel
 

Similar to NLP in Practice - Part I (20)

Gadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALLGadgets pwn us? A pattern language for CALL
Gadgets pwn us? A pattern language for CALL
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
An-Exploration-of-scientific-literature-using-Natural-Language-Processing
An-Exploration-of-scientific-literature-using-Natural-Language-ProcessingAn-Exploration-of-scientific-literature-using-Natural-Language-Processing
An-Exploration-of-scientific-literature-using-Natural-Language-Processing
 
Natural Language Processing for Irish
Natural Language Processing for IrishNatural Language Processing for Irish
Natural Language Processing for Irish
 
Let's pretend
Let's pretendLet's pretend
Let's pretend
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptx
 
Identify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea FlowIdentify Development Pains and Resolve Them with Idea Flow
Identify Development Pains and Resolve Them with Idea Flow
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
2013 arizona-swc
2013 arizona-swc2013 arizona-swc
2013 arizona-swc
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Deitel® SerHow To Program SeriesC How to Program.docx
Deitel® SerHow To Program SeriesC How to Program.docxDeitel® SerHow To Program SeriesC How to Program.docx
Deitel® SerHow To Program SeriesC How to Program.docx
 
IRJET- Querying Database using Natural Language Interface
IRJET-  	  Querying Database using Natural Language InterfaceIRJET-  	  Querying Database using Natural Language Interface
IRJET- Querying Database using Natural Language Interface
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automation
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Storytelling for research software engineers
Storytelling for research software engineersStorytelling for research software engineers
Storytelling for research software engineers
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Problem-based Learning & Resource-based Learning two complementary approac...
Problem-based Learning & Resource-based Learning  two complementary approac...Problem-based Learning & Resource-based Learning  two complementary approac...
Problem-based Learning & Resource-based Learning two complementary approac...
 

Recently uploaded

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

NLP in Practice - Part I

  • 1. 600.465 Connecting the dots - I(NLP in Practice) Delip Rao delip@jhu.edu
  • 2.
  • 6. “Real” World Tons of data on the web A lot of it is text In many languages In many genres Language by itself is complex. The Web further complicates language.
  • 7.
  • 9. 2. Study the formalism mathematically
  • 10. 3. Develop & implement algorithms
  • 11. 4. Test on real dataForward Backward, Gradient Descent, LBFGS, Simulated Annealing, Contrastive Estimation, … feature functions! f(wi = off, wi+1 = the) f(wi = obama, yi = NP) Adapted from : Jason Eisner
  • 12. NLP for fun and profit Making NLP more accessible Provide APIs for common NLP tasks vartext = document.get(…); varentities = agent.markNE(text); Big $$$$ Backend to intelligent processing of text
  • 13. Desideratum: Multilinguality Except for feature extraction, systems should be language agnostic
  • 14. In this lecture Understand how to solve and ace in NLP tasks Learn general methodology or approaches End-to-End development using an example task Overview of (un)common NLP tasks
  • 15. Case study: Named Entity Recognition
  • 16.
  • 17. How do we find out well we are doing?
  • 18.
  • 19. Case study: Named Entity Recognition Collect data to learn from Sentences with words marked as PER, ORG, LOC, NONE How do we get this data?
  • 21. Wisdom of the crowds
  • 22. Getting the data: Annotation Time consuming Costs $$$ Need for quality control Inter-annotator aggreement Kappa score (Kippendorf, 1980) Smarter ways to annotate Get fewer annotations: Active Learning Rationales (Zaidan, Eisner & Piatko, 2007)
  • 23. Only France and Great Britain backed Fischler ‘s proposal . Only France and Great Britain backed Fischler‘s proposal . Input x Labels y
  • 24.
  • 25. 2. Study the formalism mathematically
  • 26. 3. Develop & implement algorithms
  • 27. 4. Test on real dataOur recipe …
  • 28. NER: Designing features Need to segment sentences Tokenize the sentences Preprocessing Not as trivial as you think Original text itself might be in an ugly HTML Cleaneval!
  • 33. NER: Designing features These are extracted during preprocessing!
  • 36. NER: Designing features Can you think of other features? HAS_DIGITS IS_HYPHENATED IS_ALLCAPS FREQ_WORD RARE_WORD USEFUL_UNIGRAM_PER USEFUL_BIGRAM_PER USEFUL_UNIGRAM_LOC USEFUL_BIGRAM_LOC USEFUL_UNIGRAM_ORG USEFUL_BIGRAM_ORG USEFUL_SUFFIX_PER USEFUL_SUFFIX_LOC USEFUL_SUFFIX_ORG WORD PREV_WORD NEXT_WORD PREV_BIGRAM NEXT_BIGRAM POS PREV_POS NEXT_POS PREV_POS_BIGRAM NEXT_POS_BIGRAM IN_LEXICON_PER IN_LEXICON_LOC IN_LEXICON_ORG IS_CAPITALIZED
  • 37. Case: Named Entity Recognition Evaluation Metrics Token accuracy: What percent of the tokens got labeled correctly Problem with accuracy Precision-Recall-F president O Barack B-PER Obama O
  • 38. NER: How can we improve? Engineer better features Design better models Conditional Random Fields Y1 Y2 Y3 Y4 x1 x2 x3 x4
  • 39. NER: How else can we improve? Unlabeled data! example from Jerry Zhu
  • 40. NER : Challenges Domain transfer WSJ NYT WSJ  Blogs ?? WSJ  Twitter ??!? Tough nut: Organizations Non textual data? Entity Extraction is a Boring Solved Problem – or is it? (Vilain, Su and Lubar, 2007)
  • 41. NER: Related application Extracting real estate information from Criagslist Ads Our oversized one, two and three bedroom apartment homes with floor plans featuring 1 and 2 baths offer space unlike any competition. Relax and enjoy the views from your own private balcony or patio, or feel free to entertain, with plenty of space in your large living room, dining area and eat-in kitchen. The lovely pool and sun deck make summer fun a splash. Our location makes commuting a breeze – Near MTA bus lines, the Metro station, major shopping areas, and for the little ones, an elementary school is right next door. Our oversized one, two and three bedroom apartment homes with floor plans featuring 1 and 2 baths offer space unlike any competition. Relax and enjoy the views from your own private balcony or patio, or feel free to entertain, with plenty of space in your large living room, dining area and eat-inkitchen. The lovely pool and sun deck make summer fun a splash. Our location makes commuting a breeze – Near MTA bus lines, the Metro station, major shopping areas, and for the little ones, an elementary school is right next door.
  • 42. NER: Related Application BioNLP: Annotation of chemical entities Corbet, Batchelor & Teufel, 2007
  • 43. Shared Tasks: NLP in practice Shared Task Everybody works on a (mostly) common dataset Evaluation measures are defined Participants get ranked on the evaluation measures Advance the state of the art Set benchmarks Tasks involve common hard problems or new interesting problems