ADV Slides: Natural Language Processing Strategies

In this webinar, we will introduce natural language processing (NLP) to the data professional who has a use case for NLP or would like to fit NLP into their environment. NLP is about the deep understanding of human language communication. This webinar is an introduction to the capabilities of NLP, an approach that is utilized across many different use cases, including computer-assisted coding, speech recognition, and machine translation.

This webinar introduces the concept of natural language processing, the steps in NLP, the challenges of NLP, and the core algorithms of word vectors, recurrent and recursive neural networks, and convolutional neural networks.

Published in: Data & Analytics

  1. Natural Language Processing Strategies
     William McKnight
     www.mcknightcg.com
     214-514-1444
     #AdvAnalytics @williammcknight
  2. Language
     • There are 6000+ distinct languages on Earth
     • Languages spread and shrink
     • English is especially difficult
     Graphic credit: Minna Sundberg
  3. Computers are confused by language
     • So NLP must incorporate
       – Linguistics
       – Theoretical Computer Science
       – Math
       – Statistics
       – Artificial Intelligence
       – Psychology
  4. Linguistics
     • Words have
       – Intention (goals, shared knowledge, beliefs)
       – Generation
       – Synthetization
     • Understanding is
       – Perception
       – Interpretation
       – Incorporation
  5. Analyzing Language Data
     • Need Text Analysis and Natural Language Processing
     • Text Analysis: text mining, or text analytics, is the process of deriving meaningful information from natural language
     • Natural Language Processing: the artificial intelligence methods for communicating with intelligent systems using natural language
  6. Garden Path Sentences
     • Don’t bother going.
     • Don’t bother going early.
     • Meet me at five.
     • Meet me at five to four.
     • The old man the boat.
     • The prime number few.
     • The man whistling tunes pianos.
     • The complex houses married and single soldiers and their families.
  7. Consider the News
     (CNN) - Researchers in Canada have released new images of a remarkably well-preserved shipwreck that will shed new light on the ill-fated 1845 Arctic expedition in which famed British explorer John Franklin died.
     The wreck of HMS Terror has effectively been "frozen in time" thanks to the cold, deep waters of Terror Bay in Nunavut, Canada, and a layer of silt which has preserved artifacts such as maps, logs and scientific instruments, according to a study by Parks Canada in conjunction with Inuit researchers.
     HMS Terror and HMS Erebus set off from England in 1845 in search of a route across the North-West Passage but got stuck in sea ice, forcing the 129 crew members to abandon ship in 1848. The men died one by one attempting to walk to safety across the Arctic.
  8. Where does text come from?
     • Internet chat, blogs, reviews, wikis, scientific papers, medical records, books
       – All present specific challenges
  9. Natural Language Processing is the study of the computational treatment of natural language
  10. NLP: 2 Sides
      • Understanding
        – Mapping the given input in natural language into useful representations
        – Analyzing different aspects of the language
      • Generation
        – Text planning: retrieving the relevant content from the knowledge base
        – Sentence planning: choosing the required words, forming meaningful phrases, setting the tone of the sentence
        – Text realization: mapping the sentence plan into sentence structure
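The three generation stages on slide 10 can be illustrated with a minimal template-based sketch. Everything here is an assumption made for illustration: the `KNOWLEDGE_BASE` dictionary, the function names, and the fixed verb choice are hypothetical, and the single fact is taken from the news excerpt on slide 7. Real generation systems are far more elaborate.

```python
# Hypothetical one-entry knowledge base; the fact comes from the slide 7 news excerpt.
KNOWLEDGE_BASE = {
    "HMS Terror": {"departed_from": "England", "year": 1845},
}


def plan_text(entity):
    """Text planning: retrieve the relevant content from the knowledge base."""
    facts = KNOWLEDGE_BASE[entity]
    return {"subject": entity, "origin": facts["departed_from"], "year": facts["year"]}


def plan_sentence(content):
    """Sentence planning: choose the words and phrases that will express the content."""
    return {
        "subject": content["subject"],
        "verb": "set off",  # fixed word choice, purely for illustration
        "modifiers": [f"from {content['origin']}", f"in {content['year']}"],
    }


def realize(plan):
    """Text realization: map the sentence plan onto a surface sentence."""
    return " ".join([plan["subject"], plan["verb"], *plan["modifiers"]]) + "."


print(realize(plan_sentence(plan_text("HMS Terror"))))
```

Keeping the stages as separate functions mirrors the slide: content selection, wording, and surface form can each be changed without touching the others.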
  11. Enterprise Applications of NLP 1/3
      – Querying Image Content
      – Customer Service and Marketing Virtual Digital Assistants
      – Patent Research and Analysis
      – Automated Report Generation
      – Patient Data Processing
      – Converting Paperwork into Digital Data
      – Automated Code Development
      – Contract Analysis
      – Automated CliffsNotes, Study Notes, and Quiz Generation
      – Intelligent Recruitment and Human Resources Systems
      – Sentiment Analysis
      – Healthcare Virtual Digital Assistants
      – Sentiment Analysis for Psychoanalysis
      – Business Application Virtual Digital Assistants
      – E-Commerce and Sales Virtual Digital Assistants
      – Banking and Financial Services
      – Automating Food and Beverage Ordering
      – Social Media Feed Curation
      – Language Translation Services
      – Predictive Typing Assistant
      – Education for Autistic and Speech Deficient Children
      – Automated Grading
      – Text Classification and Mining for Biomedical Literature
      – Mining, Processing, and Making Sense of Clinical Notes
      – Film Script Analysis
      – Dialect Classification
      – Hospital Patient Management System
      – Real-Time News Analysis and Competitive Intelligence
      – Automated Tour Guide and Itinerary Service
  12. Enterprise Applications of NLP 2/3
      • Customer service
        – NLP technologies today are smart enough to transcribe and analyze the massive recorded call data that enterprise databases contain.
        – The most prominent applications of NLP are in customer support.
      • Reputation management
        – Social media platforms have become important
        – Consumers actively participate in reviewing their brand experiences and posting interactions with businesses
        – NLP can analyze content across social media platforms and report the sentiment being conveyed about your brand: positive, negative, or neutral
        – Real-time updates can be made available in dashboards
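The positive/negative/neutral brand sentiment described on slide 12 can be sketched with a simple word-list approach. This is a toy illustration, not how commercial tools work: the word lists, the `sentiment` function, and the one-word negation rule are all assumptions made for the example.

```python
import re

# Tiny hand-made word lists (hypothetical; real lexicons hold thousands of entries).
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"bad", "hate", "terrible", "slow", "rude"}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True  # flips only the word that immediately follows
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1, or 0
        score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ["The support team was helpful and fast",
             "The agent was not helpful",
             "I called the store yesterday"]:
    print(post, "->", sentiment(post))
```

A lexicon approach misses sarcasm, context, and negation at a distance ("not very helpful" is scored as positive here), which is part of why the deck treats sentiment analysis as an area where NLP is still improving.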
  13. Enterprise Applications of NLP 3/3
      • Personalized Advertising
        – Traditionally, enterprises have relied upon demographic and psychographic variables to segment their markets for targeted advertising
        – Search engine browsing and social media activity
        – Identifying patterns in unstructured data spread across several web platforms
        – Segment users into highly nuanced groups, called personas
      • Market and Product Intelligence
        – “Event extraction” is an NLP technique that parses text to mine information about specific events
        – Mergers and acquisitions, key takeovers, changes in the board of directors, key job role changes: any kind of event can be identified by an NLP algorithm
        – This can create a structured database of event information about companies, which is invaluable for an enterprise
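As a rough illustration of event extraction, the sketch below uses regular expressions to turn two sentence shapes into structured records. The patterns, the field names, and the company and person names are all made up for this example; production event extraction generally relies on trained models rather than hand-written patterns.

```python
import re

# One or more capitalized words, e.g. "Acme Corp" (a deliberately naive name pattern).
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hypothetical patterns for two of the event types named on the slide.
PATTERNS = {
    "acquisition": re.compile(
        rf"(?P<acquirer>{NAME}) (?:acquired|has acquired|will acquire) (?P<target>{NAME})"
    ),
    "appointment": re.compile(
        rf"(?P<company>{NAME}) (?:named|appointed) (?P<person>{NAME}) as (?P<role>\w+(?: \w+)*)"
    ),
}


def extract_events(text):
    """Return one structured record per pattern match found in the text."""
    events = []
    for event_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            events.append({"type": event_type, **match.groupdict()})
    return events


# Fictional sentences, for illustration only.
news = "Acme Corp acquired Widget Works in March. Acme Corp named Jane Doe as CEO."
for event in extract_events(news):
    print(event)
```

Each record could be written as a row in the structured event database the slide describes.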
  14. NLP is Getting Better
      • Translation Accuracy
      • Self-Supervised Pre-Training
      • Speech recognition
      • Natural language interpretation
      • Machine translation
      • Sentiment analysis
  15. Widely used NLP Benchmarks
      • GLUE
      • RACE
      • SuperGLUE
  16. Steps in NLP
      • Tokenization
      • Stemming
      • Lemmatization
      • Part of Speech Tagging
      • Named Entity Recognition
      • Chunking
  17. Tokenization
      • The process of segmenting running text into words and sentences.
      • Text needs to be segmented into linguistic units such as words, punctuation, numbers, alphanumerics, etc.
      • In English, words are often separated from each other by blanks (white space), but not all white space is equal.
      • Tokenization is the identification of the basic units to be processed.
      • Identifying units that do not need to be further decomposed for subsequent processing is extremely important.
  18. Steps in Tokenization
      • Segmenting Text into Words
      • Handling Abbreviations
      • Handling Hyphenated Words
      • Numerical and special expressions
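The tokenization steps on slides 17 and 18 can be sketched with Python's standard `re` module. This is a minimal illustration under stated assumptions: the abbreviation list, `TOKEN_PATTERN`, and `tokenize` are hypothetical names, and the rules cover only the four cases listed on the slide.

```python
import re

# Hypothetical abbreviation list; a real tokenizer would use a much larger one.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "U.S.", "e.g.", "i.e."}

TOKEN_PATTERN = re.compile(
    r"""
    \$?\d+(?:[.,:]\d+)*%?     # numerical expressions such as 1845, $3.50, 2:00
  | \w+(?:[-'’]\w+)*          # words, keeping hyphenated words and contractions whole
  | [^\w\s]                   # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split on white space first, then refine each piece into tokens."""
    tokens = []
    for piece in text.split():
        if piece in ABBREVIATIONS:
            tokens.append(piece)  # keep the period attached to a known abbreviation
        else:
            tokens.extend(TOKEN_PATTERN.findall(piece))
    return tokens


print(tokenize("Dr. Smith paid $3.50 for a well-preserved map."))
```

White space alone would leave "map." as one token and cannot tell the period in "Dr." from the one ending the sentence, which is why the abbreviation and punctuation rules are separate steps.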
  19. Stemming and Lemmatization
      • The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form
      • Stemming refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time
      • Lemmatization refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma
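The contrast between the two approaches can be seen in a small sketch. Both functions are illustrative assumptions: `crude_stem` strips a handful of suffixes (it is not the Porter stemmer or any library's algorithm), and `lemmatize` looks words up in a tiny hand-made dictionary that stands in for a real vocabulary with morphological analysis.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order


def crude_stem(word):
    """Stemming: chop off a known suffix, with no check that the result is a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary standing in for a full vocabulary.
LEMMAS = {
    "ran": "run",
    "preparing": "prepare",
    "houses": "house",
    "families": "family",
}


def lemmatize(word):
    """Lemmatization: return the dictionary form when the vocabulary knows the word."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["preparing", "houses", "families", "ran"]:
    print(f"{w}: stem={crude_stem(w)}, lemma={lemmatize(w)}")
```

The stemmer produces non-words such as "prepar" and "famili" and leaves the irregular form "ran" untouched, while the dictionary lookup returns "prepare", "family", and "run".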
  20. Part of Speech Tagging
      • Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech.
      • A part of speech is a category of words with similar grammatical properties.
      • Common English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, etc.
  21. POS Tagging
      • The runner is preparing to start his last race.
      • Start = verb or noun?
      • Last = noun or adjective?
      • Race = verb or noun?
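The ambiguity on slide 21 is resolved by looking at neighboring words. The sketch below shows that idea with a hand-written lexicon and two context rules; the lexicon, the tag names, and the rules are assumptions chosen to fit this one sentence, whereas practical taggers learn such decisions from annotated text.

```python
# Hypothetical lexicon: each word maps to the tags it can take.
LEXICON = {
    "the": ["DET"], "runner": ["NOUN"], "is": ["AUX"], "preparing": ["VERB"],
    "to": ["PART"], "start": ["VERB", "NOUN"], "his": ["PRON"],
    "last": ["ADJ", "NOUN", "VERB"], "race": ["NOUN", "VERB"],
}


def tag(sentence):
    """Tag each word, using the previous tag to choose among ambiguous candidates."""
    words = sentence.lower().rstrip(".").split()
    tags = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        previous = tags[-1] if tags else None
        is_final = i == len(words) - 1
        if len(candidates) == 1:
            choice = candidates[0]
        elif previous == "PART" and "VERB" in candidates:
            choice = "VERB"  # after "to", prefer the verb reading
        elif previous in ("DET", "PRON", "ADJ"):
            # Inside a noun phrase: adjective if more words follow, otherwise noun.
            if "ADJ" in candidates and not is_final:
                choice = "ADJ"
            elif "NOUN" in candidates:
                choice = "NOUN"
            else:
                choice = candidates[0]
        else:
            choice = candidates[0]
        tags.append(choice)
    return list(zip(words, tags))


print(tag("The runner is preparing to start his last race."))
```

With these rules, "start" is tagged as a verb because it follows "to", "last" as an adjective because it sits between "his" and another word, and "race" as a noun because it closes the noun phrase.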
  22. Named Entity Recognition
      • Named Entity Recognition is a process where an algorithm takes a string of text (sentence or paragraph) as input and identifies relevant nouns (people, places, and organizations) that are mentioned in that string.
      • Named Entity Recognition can automatically scan entire articles, Twitter feeds, research papers, etc. and reveal the major people, organizations, and places discussed in them.
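A very small sketch of the input and output of named entity recognition, using a lookup list (a gazetteer) instead of a trained model. The `GAZETTEER` contents, the labels, and `find_entities` are assumptions for illustration; the names are drawn from the news excerpt on slide 7.

```python
import re

# Hypothetical gazetteer built from names in the slide 7 news excerpt.
GAZETTEER = {
    "John Franklin": "PERSON",
    "Parks Canada": "ORGANIZATION",
    "Canada": "PLACE",
    "Nunavut": "PLACE",
    "England": "PLACE",
}

# Try longer names first so that "Parks Canada" is not read as the place "Canada".
_names = sorted(GAZETTEER, key=len, reverse=True)
_pattern = re.compile(r"\b(?:" + "|".join(re.escape(name) for name in _names) + r")\b")


def find_entities(text):
    """Return (entity, label) pairs in the order they appear in the text."""
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in _pattern.finditer(text)]


sentence = ("A study by Parks Canada examined a wreck in Nunavut, Canada, "
            "from the expedition on which John Franklin died.")
print(find_entities(sentence))
```

A lookup list only finds names it already contains; statistical NER systems are used precisely because they can recognize people, places, and organizations they have never seen.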
  23. Parse Tree
      Image credit: Datascience.stackexchange.com
  24. Named Entity Recognition Output
  25. Chunking
      • Chunking is also called shallow parsing or hierarchy of ideas
      • Chunking is a process of extracting phrases from unstructured text
  26. Chunking Challenge Examples
      • Joe ate chicken with waffles.
      • Joe ate chicken with Mary.
      • Joe ate chicken with a knife.
      • Joe ate chicken with fear.
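Phrase extraction can be sketched as grouping runs of POS tags into noun phrases. The example below assumes the words are already tagged (the tags are supplied by hand), and the grouping rule in `np_chunks` is a simplified illustration rather than any library's chunker.

```python
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPN"}   # tags allowed inside a noun phrase
NOUN_TAGS = {"NOUN", "PROPN"}               # a chunk must contain at least one of these


def np_chunks(tagged):
    """Group consecutive determiner/adjective/noun tokens into noun-phrase chunks."""
    chunks, current = [], []

    def flush():
        # Emit the current run only if it actually contains a noun.
        if any(tag in NOUN_TAGS for _, tag in current):
            chunks.append(" ".join(word for word, _ in current))
        current.clear()

    for word, tag in tagged:
        if tag not in NP_TAGS:
            flush()              # a verb, preposition, etc. ends the phrase
            continue
        if tag == "DET" and current:
            flush()              # a determiner starts a new phrase
        current.append((word, tag))
    flush()
    return chunks


# Hand-tagged version of one of the slide's sentences.
tagged = [("Joe", "PROPN"), ("ate", "VERB"), ("chicken", "NOUN"),
          ("with", "ADP"), ("a", "DET"), ("knife", "NOUN")]
print(np_chunks(tagged))
```

The chunker returns a flat list of phrases for every one of the four sentences, and nothing in that output says whether the "with" phrase describes the chicken, a companion, an instrument, or a manner of eating. That attachment question is the challenge the slide points to.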
  27. NLP Open Source Libraries
      • spaCy
      • Textacy
      • Neuralcoref
  28. Build or Buy NLP
      • Building your own NLP system from the ground up:
        – Need engineer with NLP skills + other developers
        – Cost: $x00,000+
        – Time: months to years
        – Usefulness: limited without major additional work
      • Working with an experienced NLP vendor:
        – Cost: $x0,000 (basic text analytics and visualization) to low $x00,000+ (semi-custom NLP application)
        – Time: weeks to months
        – Usefulness: customized to your specific needs
  29. NLP Vendors
      • AppZen
      • Automated Insights
      • Cogito
      • Lexalytics
      • Luminoso
      • M*Modal
      • SmartLogic
      • SyTrue
      • Woebot
  30. Data for NLP
      • Does not fit neatly into tabular relational databases
      • The most common use case for the data is agile data discovery across an enterprise
        – Text Analytics/NLP
      • Look for
        – Search capabilities
        – Data management: quick ingest, no modeling required, secure connections, easy self-service mashups, query operations
        – Deployment options
  31. NLP …
      • Reduces the gap between human and machine communication
      • Automates processes and creates operational efficiency
      • Pushes the barriers of data analysis by bringing unstructured data into play
      • Extends the capability of existing business intelligence assets in the enterprise
  32. Second Thursday of Every Month, at 2:00 ET
      Presented by: William McKnight, President, McKnight Consulting Group
      www.mcknightcg.com
      (214) 514-1444
      #AdvAnalytics
