Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.
2. Agenda
• Introduction & Motivation
• Famous Examples
• Basics
• Major Task Areas
• Protips
• Resources
3. Introduction & Motivation
By “NLP” we mean...
Natural Language Processing
(#NLProc)
aka Computational Linguistics, Text Analytics, etc.
not Neuro-linguistic Programming! (#NLP)
4. Introduction & Motivation
Natural Language Processing is...
Using computers to process (i.e., analyze,
understand, generate, etc.) natural human
languages (e.g., English, Chinese, Klingon).
Hello, world! 你好,世界!
5. Introduction & Motivation
That sounds hard... why should I care?
• Most of the knowledge created by humans
is unstructured text (information overload)
• Need some way to make sense of it all
• Enable quantitative analysis of text data
6. Famous Examples
Siri (Apple, SRI, Nuance)
Speech Recognition/Generation
IBM Watson
Question Answering
Google Translate
Machine Translation
16. Major Task Areas
Assistive Technologies
• Text simplification
• Predictive text input
• Alternative interfaces
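One simple way to picture predictive text input is a bigram model: count which word tends to follow which, then suggest the most frequent continuations. The sketch below is only an illustration under that assumption; the function names and the toy corpus are hypothetical, and real keyboards use far larger models.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, the words that follow it (lowercased, whitespace tokens)."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, word, k=3):
    """Return up to k of the most frequent continuations seen after `word`."""
    counts = following.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# Toy corpus, made up for illustration.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the", k=1))  # ['cat']
```

A word never seen in training yields an empty suggestion list, which is one reason such models need a lot of in-domain text.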
17. Major Task Areas
NLG + Automatic Summarization
• Generating text from data
• Extractive summarization
• Abstractive summarization
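Extractive summarization selects existing sentences rather than writing new ones. A minimal sketch, assuming a plain word-frequency score (one of many possible heuristics, with no stopword handling); the function name and sample text are hypothetical:

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average document frequency of the sentence's words.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

doc = "NLP systems process text. Text is everywhere. Cats sleep."
print(summarize(doc, 1))  # ['Text is everywhere.']
```

Abstractive summarization, by contrast, has to generate new wording, so it cannot be reduced to a sentence-ranking step like this one.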
18. Major Task Areas
Machine Translation
• From source to target, and back!
• Single terms work... sometimes
• Idioms, metaphors, cultural references
19. Major Task Areas
Sentiment Analysis
• Polarity, intensity, direction
• "Easy" for movie/product reviews
• "Impossible" for nearly anything else
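To see why reviews are "easy" and nearly everything else is not, consider a toy lexicon-based polarity scorer. The word lists below are invented for illustration, and this is a sketch of one basic approach, not a recommended system:

```python
import re

# Tiny hypothetical sentiment lexicon.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "boring"}

def polarity(text):
    """Label text by counting positive words minus negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this movie, the acting is excellent"))  # positive
print(polarity("Oh great, another delay"))                     # positive (sarcasm is missed)
```

The first input is a typical review with explicit opinion words. The second shows how sarcasm defeats simple word counting, and the same goes for negation and implied sentiment.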
20. Protips
• Domain adaptation
(retrain your models, social media != news)
• Assume everything is in beta
(error rates compound, translate last,
consult the research literature)
• Evaluation is essential
(human judges, “gold standard” data,
cross-validation, appropriate metrics)
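As a small illustration of evaluating against "gold standard" data, the sketch below computes precision, recall, and F1 for one class. The labels are made up, and which metric is appropriate depends on the task:

```python
def precision_recall_f1(gold, predicted, positive="pos"):
    """Compare predicted labels to gold labels for a single positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical human-annotated labels vs. system output.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are usually reported separately.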
22. Resources
(books)
Natural Language
Processing with Python
Bird, Klein, and Loper
Speech and Language
Processing
Jurafsky and Martin
Foundations of Statistical
Natural Language Processing
Manning and Schütze