Natural Language Processing Crash Course

Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.

Transcript

  • 1. NLP “Crash Course” Charlie Greenbacker dcnlp.org
  • 2. Agenda • Introduction & Motivation • Famous Examples • Basics • Major Task Areas • Protips • Resources
  • 3. Introduction & Motivation By “NLP” we mean... Natural Language Processing (#NLProc), aka Computational Linguistics, Text Analytics, etc., not Neuro-linguistic Programming! (#NLP)
  • 4. Introduction & Motivation Natural Language Processing is... Using computers to process (i.e., analyze, understand, generate, etc.) natural human languages (e.g., English, Chinese, Klingon). Hello, world! 你好,世界!
  • 5. Introduction & Motivation That sounds hard... why should I care? • Most of the knowledge created by humans is unstructured text (information overload) • Need some way to make sense of it all • Enable quantitative analysis of text data
  • 6. Famous Examples • Siri (Apple, SRI, Nuance): Speech Recognition/Generation • IBM Watson: Question Answering • Google Translate: Machine Translation
  • 7. Basics • Segmentation • Part-of-speech tagging • Noun phrase (NP) chunking • Parsing • Word sense disambiguation
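Segmentation, the first item on the slide above, can be sketched with two regular expressions. This is a minimal illustration, not the method used in the talk: the function names `split_sentences` and `tokenize` are made up for this example, and the rules are naive. They will break on abbreviations such as "Dr. Smith", and they do not apply to languages written without spaces between words, such as Chinese.

```python
import re

def split_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # A word may contain one internal apostrophe (e.g. "Isn't").
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "Hello, world! NLP is fun. Isn't it?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

Toolkits such as those listed under Resources ship trained segmenters that handle the cases this sketch gets wrong.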
  • 8. Basics • Stop words, stemming/lemmatization • Frequency analysis (terms, n-grams, TF-IDF) • Machine learning (classification, clustering, recommendation)
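The frequency-analysis items above can be sketched in plain Python. The snippet below is an illustration under stated assumptions: the stop-word list is a tiny hand-picked set, the three documents are invented, and the weighting is one common TF-IDF variant (term frequency normalized by document length, multiplied by the natural log of inverse document frequency). Other variants exist and give different numbers.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "on", "and"}  # tiny illustrative list

def content_terms(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [content_terms(t) for t in [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]]
weights = tf_idf(docs)
print(ngrams(docs[0], 2))
print(weights[0])
```

In the first document, "mat" appears in no other document and so receives a higher weight than "cat" or "sat", which each appear in two of the three.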
  • 9. Major Task Areas Question Answering • Match query with knowledge base • Closed domain vs open domain • Reasoning about intent of question
  • 10. Major Task Areas Speech Recognition • Speech to text • Trained/untrained user models • Voice-based interfaces
  • 11. Major Task Areas Named Entity Recognition • Entity extraction • Persons, organizations, locations • Grammar, syntax, phrasing
  • 12. Major Task Areas Entity Resolution • Linking names to ground truth • Disambiguating similar names
  • 13. Major Task Areas Co-reference Resolution • Finding antecedents for pronouns • Name resolution
  • 14. Major Task Areas Relationship Extraction • Attribute values • SVO triples • Populating ontologies
  • 15. Major Task Areas Information Retrieval • Query expansion • Relevancy of results • “More like this”
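Relevancy ranking and "more like this" are often illustrated with cosine similarity between bag-of-words vectors. The sketch below takes that approach as an assumption; the slide does not name a specific method. The function names and the sample documents are hypothetical, and real systems typically add TF-IDF weighting, stemming, and query expansion on top.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map each lowercase whitespace-separated token to its count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def more_like_this(query, documents):
    """Return documents sharing at least one term with the query, best first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(d)), d) for d in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked if score > 0]

documents = [
    "machine translation of idioms",
    "speech recognition for voice interfaces",
    "statistical machine translation systems and models",
]
results = more_like_this("machine translation", documents)
print(results)
```

The shorter matching document ranks first because the same two shared terms make up a larger share of its vector; the speech recognition document shares no terms with the query and is dropped.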
  • 16. Major Task Areas Assistive Technologies • Text simplification • Predictive text input • Alternative interfaces
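Predictive text input can be sketched with a bigram model that suggests the words most often seen after the current one. This is an illustrative assumption rather than anything described in the talk; `train_bigrams`, `predict_next`, and the three-sentence corpus are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which across a list of sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(following, word, k=3):
    """Return up to k of the most frequent continuations of `word`."""
    counts = following.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

corpus = [
    "natural language processing is fun",
    "natural language generation is hard",
    "natural language processing is useful",
]
model = train_bigrams(corpus)
suggestions = predict_next(model, "language")
print(suggestions)
```

A word never seen in training yields an empty list, which is where production systems fall back to smoothing or shorter contexts.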
  • 17. Major Task Areas NLG + Automatic Summarization • Generating text from data • Extractive summarization • Abstractive summarization
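Extractive summarization selects existing sentences rather than writing new ones. A minimal sketch, assuming a simple frequency heuristic: score each sentence by the average corpus frequency of its non-stop-words and keep the top scorers. The stop-word list, the `summarize` function, and the sample text are all invented for illustration, and averaging tends to favor short sentences.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "for"}

def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]

text = ("Machine translation converts text between languages. "
        "Idioms make translation hard. "
        "Lunch was served at noon.")
summary = summarize(text)
print(summary)
```

Abstractive summarization, by contrast, generates new wording and cannot be reduced to a selection step like this one.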
  • 18. Major Task Areas Machine Translation • From source to target, and back! • Single terms work... sometimes • Idioms, metaphors, cultural references
  • 19. Major Task Areas Sentiment Analysis • Polarity, intensity, direction • "Easy" for movie/product reviews • "Impossible" for nearly anything else
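Polarity scoring is often introduced with a word lexicon. The sketch below assumes a tiny hand-made lexicon and a one-word negation rule, all hypothetical; it is meant to show both why review-style text looks "easy" and how quickly the approach breaks down.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"great", "good", "love", "excellent", "fun"}
NEGATIVE = {"bad", "boring", "hate", "terrible", "awful"}
NEGATORS = {"not", "never", "no"}

def polarity(text):
    """Return a score in [-1, 1]: net sentiment hits divided by total hits."""
    score = hits = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATORS:
            negate = True  # flips only the immediately following word
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)
        if value:
            score += -value if negate else value
            hits += 1
        negate = False
    return score / hits if hits else 0.0

for sample in ["Great movie, I love it!",
               "The plot was not good.",
               "Yeah, great, another sequel."]:
    print(sample, "->", polarity(sample))
```

The third sample is sarcastic, yet it scores as fully positive because the lexicon sees only the word "great". The negation rule is equally brittle: in "not very good" the negator is followed by "very", so "good" is counted as positive.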
  • 20. Protips • Domain adaptation (retrain your models, social media != news) • Assume everything is in beta (error rates compound, translate last, consult the research literature) • Evaluation is essential (human judges, “gold standard” data, cross-validation, appropriate metrics)
  • 21. Resources (toolkits) • Stanford CoreNLP: Java, GPL • Apache OpenNLP: Java, Apache License • NLTK: Python, Apache License
  • 22. Resources (books) • Natural Language Processing with Python: Bird, Klein, and Loper • Speech and Language Processing: Jurafsky and Martin • Foundations of Statistical Natural Language Processing: Manning and Schütze
  • 23. Resources (groups) • ACL (Association for Computational Linguistics): Conferences, Workshops, Journals, SIGs • DC NLP: NLP Meetups • Data Community DC: NLP Workshops
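Two of the protips above lend themselves to a short worked example: scoring system output against "gold standard" data, and the way error rates compound across pipeline stages. The sketch assumes set-based precision, recall, and F1, and it assumes stage errors are independent, which is a simplification. The entity sets and the 0.9 accuracies are made-up illustrative values.

```python
def precision_recall_f1(predicted, gold):
    """Compare a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def pipeline_accuracy(stage_accuracies):
    """End-to-end accuracy if every stage must succeed and errors are independent."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

# Hypothetical gold-standard annotations and system output.
gold = {"Charlie Greenbacker", "DC NLP", "Apple", "IBM"}
predicted = {"Charlie Greenbacker", "Apple", "Siri", "IBM"}

p, r, f1 = precision_recall_f1(predicted, gold)
end_to_end = pipeline_accuracy([0.9, 0.9, 0.9])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"three 90% stages end to end: {end_to_end:.3f}")
```

Under these assumptions, three chained components that are each 90% accurate deliver roughly 73% end to end, which is the arithmetic behind "assume everything is in beta".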
  • 24. Questions? Charlie Greenbacker dcnlp.org @greenbacker
