Natural Language Processing, Techniques, Current Trends and Applications in Industry

  1. 1. Natural Language Processing: Techniques, Current Trends and Applications in Industry Rajkiran Veluri
  2. 2. What we will cover • We will cover some of the common techniques used by NLP practitioners • We will discuss some interesting research trends • We will discuss a few industry cases to illustrate the potential of NLP • Natural Language Processing is a very wide discipline. Hence, we may not be able to cover the entire spectrum of NLP.
  3. 3. What is NLP • Natural Language Processing: methods and techniques that enable machines to analyse and understand natural (human) language. Involves the following concepts: • Understanding language • Reasoning about language • Generating language • Translating language
  5. 5. NLP: Main Components • Morphology: Analysis and description of the structure of words. Morpheme: smallest linguistic unit with semantic meaning. Examples: un, install. Lexeme: unit that corresponds to the set of forms taken by a word. Example: install -> install, installed, installation, installing • Lexicon: The meanings and properties associated with individual words (the vocabulary) • Syntax: The structure and order in which words can be combined to form sentences • Semantics: Combination of morphology and syntax with lexical meaning to form the meaning of words and sentences • Pragmatics: Use of language in a particular context • Discourse Analysis: Analysis of the relationship between sentences as they occur in a sequence. Could be a monologue (one person) or a dialogue (multiple people)
  6. 6. A bit of history… • Machine Translation (MT) was one of the earliest applications (1940s). Based on a dictionary lookup, a sentence in one language could be translated into another. • Machine Translation as code-breaking: a carry-over of the Second World War research on code-breaking. Most important was German to English translation. Problem: ambiguity in language was a challenge to the MT approach. • Linguistics was one of the main sources of contributions to NLP. Noam Chomsky: Generative Grammar approach to understand and generate language (1957) • Contrasting approaches to NLP: statistical and linguistic (1960s)
  7. 7. A bit of history… • Systems (1960 - 1980) focused on Case Grammars (linking verbs and nouns by prepositions), Augmented Transition Networks (using knowledge of language grammars to parse sentences) and Semantic Representations (conceptual dependency between parts of a sentence). They combined domain knowledge and statistical inference to design rule-based systems. • Current systems (2000 - Present): Machine Learning and Deep Learning with faster CPUs, GPUs and storage. Combine linguistics and statistics in machine learning models. Research on contextual understanding and reasoning.
  8. 8. Ambiguity in Natural Language • She wore small shoes and socks. • POS tags: PRP VBD JJ NNS CC NNS • Two interpretations for the noun modifier: "small" may modify only "shoes", or both "shoes" and "socks" • Source: ?
  9. 9. Ambiguity in Natural Language • Coreference Resolution: "The trophy doesn’t fit into the brown suitcase because it is too [large/small]" • With "large", "it" refers to the trophy; with "small", "it" refers to the suitcase • Need to go beyond syntax and semantics • Source: Eisenstein J. (2018), "Natural Language Processing", Ch 1.1, pg 3
  10. 10. Main components of an NLP Pipeline • Sentence Detection • Text Cleaning • Tokenization • Domain Specific Feature Extraction • Stopword Removal • Stemming / Lemmatization • Semantic Role Labeling (SRL) • Word Sense Disambiguation • Part of Speech / Dependency Tagging • Spell Correction • Language Detection • Modeling with Machine Learning Based / Rule Based Algorithm • Downstream tasks
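
A few of the early pipeline stages above (sentence detection, text cleaning, tokenization, stopword removal, stemming) can be sketched in plain Python. This is only a minimal illustration: the stopword list, the suffix rules and all function names are assumptions made for this example, and a real pipeline would normally rely on a dedicated NLP library.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "i"}


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def clean(sentence):
    """Text cleaning: lowercase, then replace anything that is not a letter, digit or space."""
    return re.sub(r"[^a-z0-9\s]", " ", sentence.lower())


def tokenize(sentence):
    """Whitespace tokenization."""
    return sentence.split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Very rough suffix stripping; real stemmers are far more careful."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the stages in order and return one token list per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = remove_stopwords(tokenize(clean(sentence)))
        result.append([crude_stem(t) for t in tokens])
    return result


print(preprocess("The movie was too good. I liked it very much!"))
# [['movie', 'too', 'good'], ['lik', 'very', 'much']]
```

The stem "lik" shows why stemming is described as crude: it trades readable word forms for a shared key across "liked", "liking" and similar forms, whereas lemmatization would return "like".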
  11. 11. Some Examples of Downstream Tasks • Named Entity Recognition (The capital of India is New Delhi -> India = Country, New Delhi = City) • Sentiment Analysis (The movie was too good, I liked it very much -> Sentiment = Positive) • Dialogue Generation (Application: Chatbots) (Example, User: I need to reset my password. ChatBot: I can certainly help you with it) • Question Answering (Example, given a snippet: Where does Bob live? -> Answer: New York) • Sentence or Document Classification (Tweets, Emails and so on) (Example, given an email -> Classification = Spam) • Machine Translation (Example: (English) where have you been? -> (German) wo bist du gewesen?) • Natural Language Inference (NLI) (Sentence 1: Father and son are walking to the store. Sentence 2: Three people are walking to the store -> Inference: Contradiction) • Topic Modeling (Example, given an Airbnb dataset -> Reviews of private rooms)
  12. 12. Applications of NLP: HealthCare • Vast amounts of patient data are generated in healthcare by clinicians, nurses and laboratory reports • A lot of this data is captured in patient Electronic Health Records (EHRs). EHRs preserve historical patient information across hospital visits within an EHR system / healthcare provider. • EHRs contain a lot of unstructured textual data, and the format varies considerably across hospital systems • The text includes domain-specific abbreviations, non-standard observations in short text fragments, hypotheses, clinician notes taken during patient visits (outpatient) as well as nurse notes (inpatient)
  13. 13. Applications of NLP: HealthCare Source:
  14. 14. Information Extraction • Using a medical lexicon • Match terms of interest from a Disease/Diagnosis lexicon (Examined, Normal, Enlarged, …) • Example -> ENT: Examined and Normal • Using regular expressions (regex) • Capture terms of interest based on regular expression patterns • Pattern (schematic, as shown on the slide): (a-zA-Z)[:](a-zA-Z)([ and,.]? (not|no)?(a-zA-Z ){1-3} )* • Example -> Extremities: Ankle scar, no joint damage
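
The two approaches above can be combined in a short Python sketch. The patterns below are not the slide's regular expression. They are a simplified, hypothetical reading of it: a section name, a colon, then findings of one to three words with an optional "no"/"not" in front. The lexicon contents and the function name `extract` are likewise assumptions for illustration.

```python
import re

# Hypothetical lexicon of finding terms, lowercased for lookup.
LEXICON = {"examined", "normal", "enlarged"}

# "Section: free text" -> section name, a colon, then the findings.
SECTION_RE = re.compile(r"^\s*(?P<section>[A-Za-z ]+?)\s*:\s*(?P<body>.+)$")

# Optional negation cue ("no" / "not") followed by one to three words.
FINDING_RE = re.compile(
    r"(?:(?P<neg>no|not)\s+)?(?P<term>[A-Za-z]+(?:\s+[A-Za-z]+){0,2})",
    re.IGNORECASE,
)


def extract(line):
    """Return the section name and its findings, or None if the line has no 'Section:' prefix."""
    match = SECTION_RE.match(line)
    if match is None:
        return None
    findings = []
    # Findings are separated by commas or the word "and".
    for part in re.split(r",|\band\b", match.group("body")):
        part = part.strip(" .")
        found = FINDING_RE.match(part) if part else None
        if found:
            term = found.group("term")
            findings.append({
                "term": term,
                "negated": found.group("neg") is not None,
                "in_lexicon": term.lower() in LEXICON,
            })
    return {"section": match.group("section"), "findings": findings}


print(extract("ENT: Examined and Normal"))
print(extract("Extremities: Ankle scar, no joint damage"))
```

In the second example "joint damage" is flagged as negated, which is the kind of detail a pure lexicon lookup would miss.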
  15. 15. Machine Learning: Why Machine Learning? • Rules are easy to create but need extensive testing for coverage • Rules are difficult to maintain -> if the format of the dataset changes, the rules need to be changed • A machine learning algorithm with good generalisation can outperform rule-based systems • Machine learning algorithms need a good amount of data to be trained • The data is usually a set of labeled examples from which the machine learning model learns its parameters. The model then uses these parameters to generalise to unseen examples. • Labeled training data in some domains is hard to obtain! • HealthCare - hard and expensive to obtain data • News, Government Records (Public) - might be easier to obtain
  16. 16. Deep Learning Using a BiLSTM-CRF-LSTM character-embedding model for information extraction from a clinical note Source:*3OHMG4dTYpGLwcAcyl6t2Q.png
  17. 17. Some other applications of NLP in Healthcare • Analysing medical transcription records • Clinical trial matching • Data mining for research on disease information and public health • Computer assisted code generation for automated billing • Biomarker discovery and computational phenotyping • Clinical decision-making
  18. 18. Applications of NLP: Brand Monitoring • User sentiment and opinion analysis • Going beyond star-ratings. Collect user preferences in detail. • Develop product recommendations from user opinions • Discovering user group sentiments about a particular product at a particular time period • Strategy for new product development and existing product improvement
  19. 19. Brand Monitoring: NLP Overview • User Review Data: Twitter, Facebook, Company Website, Other websites / blogs • Sentiment Analysis: Sentiment analysis on reviews and comments • Classify reviews by product, product type, geography • Calculate similarities within users and products and cluster them • User preferences, product review metrics of self and competitor products • Customer Service, Strategy, Marketing, Development: • Identify top complaints from users • Identify products that have bad reviews • Identify if specific customer segments show specific sentiment (location, user type etc.) • Develop products for specific groups of customers that show similar preferences • Identify products that need improvement over competitor products • Recommend similar products, or recommend products to similar customers
  20. 20. Word Embeddings - Word2Vec • Unsupervised method - provides a notion of relatedness between two words by capturing co-occurrences between words and projecting them onto a vector space. • Shallow neural network. Two models: • CBOW - predicts the centre word from the surrounding context words. Example: A chair made of wood • Skip Gram - predicts the surrounding context words from the centre word. Example: A chair made of wood
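
The difference between the two models is easiest to see in the training pairs each one is built from. The sketch below only generates those pairs for the slide's example sentence. It does not train a network, and the window size of 2 and the function names are assumptions for illustration.

```python
def cbow_pairs(tokens, window=2):
    """CBOW: one (context words, centre word) pair per position."""
    pairs = []
    for i, centre in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        pairs.append((context, centre))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (centre word, context word) pair per context word."""
    return [
        (centre, context_word)
        for context, centre in cbow_pairs(tokens, window)
        for context_word in context
    ]


tokens = "a chair made of wood".split()

# CBOW input/target for the centre word "made":
print(cbow_pairs(tokens)[2])
# (['a', 'chair', 'of', 'wood'], 'made')

# Skip-gram pairs whose centre word is "made":
print([p for p in skipgram_pairs(tokens) if p[0] == "made"])
# [('made', 'a'), ('made', 'chair'), ('made', 'of'), ('made', 'wood')]
```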
  21. 21. Word2Vec Visualization Image Source: Each learnt word is represented by an n-dimensional vector, e.g. KING = [0.43 0.57 0.238 0.66 … 0.5]. Word embeddings can be used as inputs for tasks like classification and Named Entity Recognition
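
Relatedness between two embedded words is commonly measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors: the "king" values reuse the first four numbers from the slide, while "queen" and "apple" are invented for illustration and do not come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings (hypothetical values, not learnt vectors).
embeddings = {
    "king": [0.43, 0.57, 0.238, 0.66],
    "queen": [0.40, 0.60, 0.25, 0.60],
    "apple": [-0.50, 0.10, 0.90, -0.30],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

With these toy values "king" is far closer to "queen" than to "apple", which is the behaviour one hopes a trained model produces for related words.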
  22. 22. Why do we need contextual word embeddings • Non-contextual word embeddings (like word2vec) assign a single vector to each word, so they do not capture multiple meanings of a word. • For example, (1) ship -> dispatch (2) ship -> vehicle for navigating an ocean. • Context 1: The ship sailed across the Indian Ocean. • Context 2: I will ship the required items today. • Contextual word embeddings also capture the context of a word in the sentence in which it occurs: they take the whole sentence into account before assigning the word a vector. It is natural to assume that the meaning of a word depends on the context in which it is used. • Examples of contextual word embeddings: BERT, ELMo, GPT-2
  23. 23. An example of Contextual Embeddings - BERT • BERT is built on the concept of language models coupled with Transformers • The main motivation of BERT is transfer learning: to provide a contextual basis for the learnt embeddings so that they can be used to improve accuracy on downstream tasks. • The BERT paper reports absolute improvements on many downstream tasks: Natural Language Inference (MultiNLI accuracy) - 4.6%, Question Answering (SQuAD v2.0 F1) - 5.1 points, and several other NLP tasks • BERT Paper: • Transformer Paper:
  24. 24. Transformer Source:
  25. 25. BERT - Overview Image Source: BERT (Bidirectional Encoder Representations from Transformers) is a trained Transformer Encoder stack developed by Google (2018). BERT-Base has 12 layers whereas BERT-Large has 24 layers
  26. 26. Using BERT-Fine Tuning Image Credit:
  27. 27. Using BERT-Feature Extraction Image Source:
  28. 28. BERT - Training Image Source: Two pre-training tasks: • Word Masking • Next Sentence Prediction
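
The word masking task can be sketched as follows. The slide only names the task. The 15% selection rate and the 80/10/10 split between "[MASK]", a random token and the unchanged token follow the description in the BERT paper, while the function name, the toy vocabulary and the use of whole words instead of WordPiece tokens are simplifying assumptions.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked tokens, labels); labels are None where nothing is predicted."""
    rng = rng or random.Random()
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")           # 80%: replace with the mask symbol
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(token)              # 10%: keep the token unchanged
        else:
            labels.append(None)
            masked.append(token)
    return masked, labels


tokens = "the ship sailed across the indian ocean".split()
vocab = sorted(set(tokens))  # toy vocabulary, an assumption for this sketch
masked, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(masked)
print(labels)
```

Because positions are chosen at random, a short sentence may end up with no masked token at all. Over a large corpus roughly 15% of tokens are selected.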
  29. 29. Model Interpretability • What is Model Interpretability? • Model interpretability is the ability of humans to understand and explain how a machine learning algorithm arrives at a decision. • Motivations • Reducing training data bias and improving fairness of models • Transparency for ethical and legal reasons. Example: Why was a person’s loan application rejected? • Understanding generalisation and improving model performance
  30. 30. LIME Model: Local Interpretable Model-Agnostic Explanations LIME - General Procedure of Implementation • Take a point which we want to interpret, P • Sample instances around P and weight them by their distance to P • Learn a linear model from the weighted samples • This linear model is a good local representation of the vicinity of P but may not generalise globally • Check out this link if you want to use LIME with sklearn: %20basic%20usage%2C%20two%20class%20case.html Image Source:
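
The procedure above can be sketched for a model with a single numeric input. This is a from-scratch illustration of the idea, not the `lime` package's API: the Gaussian sampling, the exponential distance kernel, the parameter values and the function names are all assumptions.

```python
import math
import random


def local_linear_explanation(predict, p, n_samples=500, sigma=0.1,
                             kernel_width=0.25, seed=0):
    """Fit a weighted straight line to `predict` in the vicinity of point p."""
    rng = random.Random(seed)
    # 1. Sample instances around P.
    xs = [p + rng.gauss(0.0, sigma) for _ in range(n_samples)]
    ys = [predict(x) for x in xs]
    # 2. Weight each sample by its distance to P (closer samples count more).
    ws = [math.exp(-((x - p) ** 2) / kernel_width ** 2) for x in xs]
    # 3. Learn a linear model with weighted least squares (closed form in 1-D).
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return slope, intercept


def black_box(x):
    """Stand-in for an opaque model; deliberately non-linear."""
    return x ** 2


slope, intercept = local_linear_explanation(black_box, p=3.0)
print(slope, intercept)
```

Around P = 3 the fitted slope is close to 6, the local rate of change of x², while near x = 0 the same black box is almost flat. That is the sense in which the linear model is a good local representation but not a global one.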
  31. 31. LIME Model: Local Interpretable Model-Agnostic Explanations • Removing “Posting” and “NNTP” from the input text is expected to reduce the predicted probability of the class “atheism” from 0.58 to about 0.58 - (0.15 + 0.11) = 0.32 Image Source:
  32. 32. Thank you • Email :