NLP “Crash Course”
Charlie Greenbacker
dcnlp.org
Agenda
• Introduction & Motivation
• Famous Examples
• Basics
• Major Task Areas
• Protips
• Resources
Introduction
& Motivation
By “NLP” we mean...
Natural Language Processing
(#NLProc)
aka Computational Linguistics, Text Analytics, etc.
not Neuro-linguistic Programming! (#NLP)
Introduction
& Motivation
Natural Language Processing is...
Using computers to process (i.e., analyze,
understand, generate, etc.) natural human
languages (e.g., English, Chinese, Klingon).
Hello, world! 你好,世界!
That sounds hard... why should I care?
• Most of the knowledge created by humans
is unstructured text (information overload)
• Need some way to make sense of it all
• Enable quantitative analysis of text data
Introduction
& Motivation
Famous Examples
Siri (Apple, SRI, Nuance)
Speech Recognition/Generation
IBM Watson
Question Answering
Google Translate
Machine Translation
Basics
• Segmentation
• Part-of-speech tagging
• Noun phrase (NP) chunking
• Parsing
• Word sense disambiguation
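Segmentation, the first item above, can be sketched with regular expressions. This is a minimal illustration, not a production tokenizer; the function names split_sentences and tokenize are made up for this example, and the rules assume English-style punctuation and whitespace.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive word segmentation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Hello, world! NLP sounds hard. Why should I care?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens[0])
```

Rules this simple break on abbreviations such as "Dr." and do not apply to languages written without spaces between words (e.g., Chinese), which is one reason toolkits ship trained segmenters.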
Basics
• Stop words, stemming/lemmatization
• Frequency analysis
(terms, n-grams, TF-IDF)
• Machine learning (classification,
clustering, recommendation)
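Stop-word removal, n-grams, and TF-IDF can be sketched in a few lines of standard-library Python. The stop-word list is a tiny made-up one, and the weighting used here (relative term frequency times log(N / document frequency)) is one common TF-IDF variant among several; the helper names are illustrative.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "on"}  # tiny illustrative list, not a standard one

def terms(text):
    """Lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [terms(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

docs = ["the cat sat on the mat", "the dog sat on the log", "the cat chased the dog"]
weights = tf_idf(docs)
print(ngrams(terms(docs[0]), 2))
print(max(weights[0], key=weights[0].get))
```

In the first document, "mat" gets the highest weight because it appears in no other document, while "cat" and "sat" are shared and so are discounted.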
Major Task Areas
Question Answering
• Match query with knowledge base
• Closed domain vs open domain
• Reasoning about intent of question
Major Task Areas
Speech Recognition
• Speech to text
• Trained/untrained user models
• Voice-based interfaces
Major Task Areas
Named Entity Recognition
• Entity extraction
• Persons, organizations, locations
• Grammar, syntax, phrasing
Major Task Areas
Entity Resolution
• Linking names to ground truth
• Disambiguating similar names
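Linking a mention to a known entity can be roughly approximated with fuzzy string matching. The sketch below uses difflib from the Python standard library; the KNOWN_ENTITIES list, the resolve function, and the 0.6 cutoff are assumptions made for illustration. Real entity resolution also uses context, not just spelling.

```python
import difflib

# Hypothetical "ground truth" list, for illustration only.
KNOWN_ENTITIES = ["Apple Inc.", "IBM", "Nuance Communications", "SRI International"]

def resolve(mention, known=KNOWN_ENTITIES, cutoff=0.6):
    """Return the best-matching known entity name, or None if nothing is close enough."""
    lowered = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(mention.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(resolve("Apple"))
print(resolve("Nuance Comm."))
print(resolve("Klingon"))
```

String similarity alone cannot tell apart two different entities with similar names, which is the disambiguation problem noted above.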
Major Task Areas
Co-reference Resolution
• Finding antecedents for pronouns
• Name resolution
Major Task Areas
Relationship Extraction
• Attribute values
• SVO triples
• Populating ontologies
Major Task Areas
Information Retrieval
• Query expansion
• Relevancy of results
• “More like this”
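Relevancy ranking and "more like this" can both be sketched as cosine similarity between bag-of-words vectors. This is a minimal sketch with made-up documents and function names; real search engines add weighting (such as TF-IDF), stemming, and inverted indexes.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, document) pairs, most similar first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "machine translation of idioms and metaphors",
    "speech recognition for voice interfaces",
    "statistical machine translation",
]
results = rank("machine translation", docs)
for score, doc in results:
    print(round(score, 3), doc)
```

Passing a whole document as the query (for example rank(docs[0], docs)) gives a simple form of "more like this".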
Major Task Areas
Assistive Technologies
• Text simplification
• Predictive text input
• Alternative interfaces
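Predictive text input can be illustrated with a bigram model: count which word follows which, then suggest the most frequent followers. The corpus and the function names train_bigrams and suggest are invented for this sketch; real keyboards use much larger models.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Map each word to a Counter of the words observed right after it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following

def suggest(model, word, k=3):
    """Return up to k most frequent next words after `word`."""
    counts = model.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

corpus = [
    "natural language processing",
    "natural language generation",
    "natural language processing",
    "language processing is fun",
]
model = train_bigrams(corpus)
print(suggest(model, "language", 2))
print(suggest(model, "klingon"))
```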
Major Task Areas
NLG + Automatic Summarization
• Generating text from data
• Extractive summarization
• Abstractive summarization
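Extractive summarization selects existing sentences rather than writing new ones. One simple heuristic, sketched below, scores each sentence by the average frequency of its content words; the stop-word list, the summarize function, and the sample text are assumptions for illustration only.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "by"}  # illustrative only

def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]

def summarize(text, k=1):
    """Pick the k sentences whose content words are most frequent in the text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        toks = content_words(sentence)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]  # keep original sentence order

text = ("Text data is growing fast. "
        "Most knowledge created by humans is unstructured text. "
        "People need tools.")
print(summarize(text, k=1))
```

Abstractive summarization, by contrast, generates new wording and cannot be reduced to sentence selection like this.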
Major Task Areas
Machine Translation
• From source to target, and back!
• Single terms work... sometimes
• Idioms, metaphors, cultural references
Major Task Areas
Sentiment Analysis
• Polarity, intensity, direction
• "Easy" for movie/product reviews
• "Impossible" for nearly anything else
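A lexicon-based polarity scorer shows both why reviews are "easy" and why other text is not. The tiny lexicon, the negator list, and the polarity function below are made up for this sketch.

```python
import re

# Tiny hand-made lexicon, for illustration only; real lexicons are far larger.
LEXICON = {"great": 1, "good": 1, "love": 1, "bad": -1, "boring": -1, "terrible": -1}
NEGATORS = {"not", "never", "no"}

def polarity(text):
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, tok in enumerate(tokens):
        score = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total

print(polarity("A great movie, I love it"))
print(polarity("Not good, just boring"))
print(polarity("Oh great, another delay"))
```

The first two inputs score positive and negative as expected. The third is sarcastic but still scores positive, the kind of failure that makes sentiment analysis hard outside of reviews.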
Protips
• Domain adaptation
(retrain your models, social media != news)
• Assume everything is in beta
(error rates compound, translate last,
consult the research literature)
• Evaluation is essential
(human judges, “gold standard” data,
cross-validation, appropriate metrics)
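Two of the points above can be made concrete: scoring system output against gold-standard data, and how error rates compound across pipeline stages. The sketch uses precision, recall, and F1 over sets of extracted items. The sample sets are invented, and pipeline_accuracy assumes stage errors are independent, which is a simplification.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted items against gold-standard items (both treated as sets)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: items found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy if each stage's errors are independent (an assumption)."""
    result = 1.0
    for acc in stage_accuracies:
        result *= acc
    return result

gold = {"Apple", "SRI", "Nuance", "IBM"}        # made-up gold-standard entities
predicted = {"Apple", "SRI", "Siri"}            # made-up system output
p, r, f1 = precision_recall_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))
print(round(pipeline_accuracy([0.9, 0.9, 0.9]), 3))
```

Under the independence assumption, three stages at 90% accuracy each yield about 73% end to end, which is why upstream errors matter so much.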
Resources
(toolkits)
Stanford CoreNLP
Java, GPL
Apache OpenNLP
Java, Apache License
NLTK
Python, Apache License
Resources
(books)
Natural Language
Processing with Python
Bird, Klein, and Loper
Speech and Language
Processing
Jurafsky and Martin
Foundations of Statistical
Natural Language Processing
Manning and Schütze
Resources
(groups)
ACL (Association for
Computational Linguistics)
Conferences, Workshops, Journals, SIGs
DC NLP
NLP Meetups
Data Community DC
NLP Workshops
Questions?
Charlie Greenbacker
dcnlp.org
@greenbacker

Natural Language Processing Crash Course