Natural Language Processing Crash Course

Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.

  1. NLP “Crash Course”: Charlie Greenbacker, dcnlp.org
  2. Agenda: • Introduction & Motivation • Famous Examples • Basics • Major Task Areas • Protips • Resources
  3. Introduction & Motivation: By “NLP” we mean Natural Language Processing (#NLProc), aka Computational Linguistics, Text Analytics, etc., not Neuro-linguistic Programming (#NLP)!
  4. Introduction & Motivation: Natural Language Processing is using computers to process (e.g., analyze, understand, generate) natural human languages (e.g., English, Chinese, Klingon). Hello, world! 你好,世界!
  5. Introduction & Motivation: That sounds hard... why should I care? • Most of the knowledge created by humans is unstructured text (information overload) • We need some way to make sense of it all • NLP enables quantitative analysis of text data
  6. Famous Examples: • Siri (Apple, SRI, Nuance): speech recognition/generation • IBM Watson: question answering • Google Translate: machine translation
  7. Basics: • Segmentation • Part-of-speech tagging • Noun phrase (NP) chunking • Parsing • Word sense disambiguation
  8. Basics: • Stop words, stemming/lemmatization • Frequency analysis (terms, n-grams, TF-IDF) • Machine learning (classification, clustering, recommendation)
  9. Major Task Areas: Question Answering • Matching a query against a knowledge base • Closed domain vs. open domain • Reasoning about the intent of the question
  10. Major Task Areas: Speech Recognition • Speech to text • Trained/untrained user models • Voice-based interfaces
  11. Major Task Areas: Named Entity Recognition • Entity extraction • Persons, organizations, locations • Grammar, syntax, phrasing
  12. Major Task Areas: Entity Resolution • Linking names to ground truth • Disambiguating similar names
  13. Major Task Areas: Co-reference Resolution • Finding antecedents for pronouns • Name resolution
  14. Major Task Areas: Relationship Extraction • Attribute values • Subject-verb-object (SVO) triples • Populating ontologies
  15. Major Task Areas: Information Retrieval • Query expansion • Relevancy of results • “More like this”
  16. Major Task Areas: Assistive Technologies • Text simplification • Predictive text input • Alternative interfaces
  17. Major Task Areas: Natural Language Generation (NLG) + Automatic Summarization • Generating text from data • Extractive summarization • Abstractive summarization
  18. Major Task Areas: Machine Translation • From source to target, and back! • Single terms work... sometimes • Idioms, metaphors, cultural references
  19. Major Task Areas: Sentiment Analysis • Polarity, intensity, direction • "Easy" for movie/product reviews • "Impossible" for nearly anything else
  20. Protips: • Domain adaptation (retrain your models; social media != news) • Assume everything is in beta (error rates compound, translate last, consult the research literature) • Evaluation is essential (human judges, “gold standard” data, cross-validation, appropriate metrics)
  21. Resources (toolkits): • Stanford CoreNLP: Java, GPL • Apache OpenNLP: Java, Apache License • NLTK: Python, Apache License
  22. Resources (books): • Natural Language Processing with Python (Bird, Klein, and Loper) • Speech and Language Processing (Jurafsky and Martin) • Foundations of Statistical Natural Language Processing (Manning and Schütze)
  23. Resources (groups): • ACL (Association for Computational Linguistics): conferences, workshops, journals, SIGs • DC NLP: NLP meetups • Data Community DC: NLP workshops
  24. Questions? Charlie Greenbacker, dcnlp.org, @greenbacker
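Slide 7 lists segmentation first among the basics. As a rough illustration, here is a minimal sketch of sentence and word segmentation for English, assuming simple regular-expression heuristics; the names `SENTENCE_BOUNDARY`, `TOKEN`, `split_sentences`, and `tokenize` are hypothetical and not taken from the talk or from any toolkit. Real segmenters handle abbreviations, URLs, and many other edge cases, and languages written without spaces between words, such as the Chinese example on slide 4, need a dedicated word segmenter.

```python
import re

# Sentence boundary: whitespace preceded by ., ! or ? and followed by an
# uppercase letter. A heuristic, so it mis-splits abbreviations like "Dr. Smith".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Token: a run of word characters (allowing internal apostrophes, as in
# "isn't"), or any single punctuation character.
TOKEN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Split raw text into sentences at the boundaries matched above."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Split one sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)
```

For example, `split_sentences("Dr. Smith left.")` returns `["Dr.", "Smith left."]`, which is exactly the abbreviation problem that more robust segmenters are built to avoid.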
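Slide 8 pairs stop-word removal with stemming. The sketch below combines a tiny stop-word list with a deliberately crude suffix stripper; both the list and the suffix rules are assumptions made up for illustration, and this is not the Porter stemmer or any library's algorithm. Lemmatization differs in that it maps words to dictionary forms (e.g., "better" to "good"), which requires a lexicon and is not shown here.

```python
# A tiny illustrative stop-word list; real toolkits ship much longer,
# language-specific lists.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical suffix rules, tried in order so longer suffixes win.
# This is NOT the Porter stemmer, only a demonstration of the idea.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            base = word[: -len(suffix)]
            return base + "y" if suffix == "ies" else base  # "queries" -> "query"
    return word


def normalize(tokens: list[str]) -> list[str]:
    """Lowercase, drop stop words and non-alphabetic tokens, then stem."""
    return [stem(t.lower()) for t in tokens
            if t.isalpha() and t.lower() not in STOP_WORDS]
```

With these rules, "walked", "walks", and "walking" all reduce to "walk", so a frequency count treats them as one term.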
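The frequency-analysis bullet on slide 8 mentions n-grams and TF-IDF. A minimal sketch, assuming documents are already tokenized and using one common TF-IDF variant (raw term count multiplied by log(N / df), where N is the number of documents and df the number containing the term); the helper names `ngrams` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score every term in every tokenized document with TF-IDF.

    TF is the raw count and IDF is log(N / df); sublinear TF and smoothed
    IDF are common alternatives.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores
```

A term that appears in every document gets log(1) = 0, so frequent but uninformative words such as "is" drop to zero weight, while rarer terms score higher.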
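For the machine-learning bullet on slide 8, text classification is often introduced with multinomial Naive Bayes. The class below is a from-scratch sketch with add-one (Laplace) smoothing, assuming pre-tokenized input; the `NaiveBayes` name and its `fit`/`predict` methods are hypothetical and are not the API of NLTK or any other toolkit.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)           # label -> number of docs
        self.word_counts = defaultdict(Counter)       # label -> word -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens: list[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for w in tokens:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word/class pairing from zeroing out a class.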
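Slide 12 describes entity resolution as linking names to ground truth and disambiguating similar names. One simple approach, sketched here under the assumption of a small hand-built alias table (`ALIASES`, with made-up entity IDs), normalizes each mention and picks the alias with the highest `difflib.SequenceMatcher` ratio, provided it clears a tunable threshold. Real systems also use surrounding context to separate different entities that share a name.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical ground-truth table: entity ID -> known aliases.
ALIASES = {
    "ORG:IBM": ["IBM", "International Business Machines"],
    "ORG:ACL": ["ACL", "Association for Computational Linguistics"],
    "PER:GREENBACKER": ["Charlie Greenbacker", "C. Greenbacker"],
}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def resolve(mention: str, threshold: float = 0.85) -> Optional[str]:
    """Link a mention to the entity with the most similar alias, or None
    if no alias reaches `threshold` (an assumed, tunable cutoff)."""
    query = normalize_name(mention)
    best_id, best_score = None, 0.0
    for entity_id, aliases in ALIASES.items():
        for alias in aliases:
            score = SequenceMatcher(None, query, normalize_name(alias)).ratio()
            if score > best_score:
                best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None
```

Normalization lets "I.B.M." match "IBM" exactly, and fuzzy matching tolerates typos such as "Internatonal Business Machines"; the threshold controls how aggressively near-misses are merged.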
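The “More like this” bullet on slide 15 is commonly implemented by comparing documents as term vectors. A minimal sketch using cosine similarity, assuming whitespace tokenization and raw term counts (TF-IDF weights would usually replace the counts); `cosine` and `more_like_this` are hypothetical helper names.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_doc: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the IDs of the k corpus documents most similar to query_doc."""
    query_vec = Counter(query_doc.lower().split())
    scores = {doc_id: cosine(query_vec, Counter(text.lower().split()))
              for doc_id, text in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Cosine similarity normalizes by vector length, so a long document does not outrank a short one merely by repeating more words.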
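Predictive text input (slide 16) can be approximated with a bigram model that suggests the words most often seen after the previous word. This sketch assumes a small pre-tokenized training corpus; `train_bigrams` and `suggest` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences: list[list[str]]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training data."""
    following = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev.lower()][nxt.lower()] += 1
    return following


def suggest(model: dict[str, Counter], prev_word: str, k: int = 3) -> list[str]:
    """Return up to k of the most frequent next words after prev_word."""
    return [w for w, _ in model.get(prev_word.lower(), Counter()).most_common(k)]
```

A real keyboard model would also condition on longer histories and back off to shorter ones when a context is unseen; an unknown word here simply yields no suggestions.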
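Extractive summarization (slide 17) selects existing sentences rather than writing new ones. A minimal frequency-based sketch, assuming a naive regex sentence splitter and a tiny stop-word list; scoring each sentence by the average corpus frequency of its content words is one simple choice among many, and abstractive summarization is not attempted. The names `content_words` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "it", "of", "on", "the", "to"}


def content_words(sentence: str) -> list[str]:
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, k: int = 2) -> str:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        # Average (not total) frequency, so long sentences are not favoured by default.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))
```

Re-sorting the chosen indices keeps the summary in document order, which usually reads more coherently than ranking order.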
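Slide 19 lists polarity and intensity among the dimensions of sentiment. A lexicon-based sketch makes those two concrete: the sign of each word's score stands for polarity and its magnitude for intensity, with simple handling of negators and intensifiers. The `LEXICON`, `NEGATORS`, and `INTENSIFIERS` tables are invented for illustration, and a scorer this simple is easily fooled by sarcasm or domain-specific vocabulary.

```python
import re

# Invented toy lexicon: word -> score (sign = polarity, magnitude = intensity).
LEXICON = {"great": 2, "good": 1, "fun": 1, "boring": -1, "bad": -1, "awful": -2}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}


def polarity(text: str) -> float:
    """Sum lexicon scores, adjusted by nearby negators and intensifiers."""
    score = 0.0
    tokens = re.findall(r"[a-z']+", text.lower())
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # Look back up to two tokens for modifiers of this sentiment word.
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            if prev in NEGATORS:
                value = -value
        score += value
    return score
```

For instance, "not very good" scores -1.5 and "the plot was not bad" scores +1.0 under these made-up weights.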
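Slide 20 stresses evaluation against “gold standard” data with appropriate metrics and cross-validation. A minimal sketch of both, assuming predictions and gold annotations are sets of hashable items (for example `(text, label)` entity pairs) and a simple round-robin fold assignment; `precision_recall_f1` and `k_fold_indices` are hypothetical names.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compare predicted items with gold-standard items."""
    tp = len(predicted & gold)  # true positives: items in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once.

    Round-robin assignment keeps the example deterministic; shuffle the data
    first in practice so folds are not biased by the original ordering.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for fold in folds[:i] + folds[i + 1:] for j in fold], folds[i])
            for i in range(k)]
```

For example, if the gold data contains three entities and a tagger outputs two, one of them correct, precision is 0.5, recall is 1/3, and F1 is 0.4.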
