Solve the most wicked text categorization problems - MeaningCloud webinar

Discover a new semantic tool to solve the most wicked text categorization problems.
MeaningCloud webinar, June 19, 2019.
More info and webinar contents https://www.meaningcloud.com/blog/recorded-webinar-solve-wicked-text-categorization-problems
MeaningCloud https://www.meaningcloud.com

Published in: Technology
Solve the most wicked text categorization problems - MeaningCloud webinar

  1. 1. Solve the most wicked text categorization problems June 19, 2019 MEANINGCLOUD – 2019 Webinar
  2. 2. Before we get started…
     Presenter: Antonio Matarranz, CMO
     How to participate:
     • Send questions using the chat feature, or
     • Click the “Raise your hand” button to speak and we will enable your mic
     • Afterwards, you’ll be able to access a recording of the webinar and its contents as tutorials on our blog
  3. 3. Why this webinar?
     In the real world, there are wicked text categorization problems.
     A new approach based on semantic analysis can solve them.
  4. 4. Agenda
     • Developing categorization models in the real world
     • Categorization based on pure machine learning
     • Deep Categorization API. Pre-defined models and vertical packs
     • The new Deep Categorization Customization Tool. Semantic rule language
     • Case Study: development of a categorization model
     • Deep Categorization - Text Classification. When to use one or the other
     • Agile model development process. Combination with machine learning
     • Conclusions and Q&A
  5. 5. Text categorization in a perfect world
     [Diagram: humans tag training texts → Model Training → Machine-Learning Categorization Model; input text → Model → categories]
     1) Training: use machine learning to train a model using tagged corpora
        a) Collect a corpus of tagged texts
        b) Represent each text by a feature vector that models structure and semantics
        c) Train a classifier using any suitable supervised learning algorithm (SVM, Naïve Bayes, kNN, Deep Learning…)
     2) Execution: categorize input text using the model
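The two phases on this slide (training, then execution) can be illustrated with a tiny multinomial Naive Bayes classifier. This is only a sketch in plain Python: the four-text corpus, the category names and the whitespace tokenizer are assumptions made for illustration, and a real model would use a much larger tagged corpus and richer features.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real feature extraction."""
    return text.lower().split()

def train(tagged_texts):
    """Training phase: count documents and words per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in tagged_texts:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary

def categorize(model, text):
    """Execution phase: return the category with the highest log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical tagged corpus (illustrative only).
corpus = [
    ("the api returns an error", "bug"),
    ("the plugin crashes with an error", "bug"),
    ("how much does the premium plan cost", "pricing"),
    ("what is the price of the plan", "pricing"),
]
model = train(corpus)
print(categorize(model, "price of the premium plan"))
```

Note that the model can only be as good as the tagged corpus behind it, which is the dependency the following slides question.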
  6. 6. Advantages (and limitations) of machine learning
     Advantages:
     • Building models is easy and fast (provided that we have a sufficient training set)
     • Easy adaptation to new domains
     Limitations:
     • Depends on the availability of enough training data
     • “Black box” model where adding new knowledge is hard or impossible
     • High “inertia”
     • Does not justify categorization results
  7. 7. Does it look familiar?
     “This is our new taxonomy, but it can still be improved.”
     “Training text? We do not have tagged texts.”
     “It is important to differentiate Washington (the city) from Washington (the sports team), from Washington (the surname).”
     “You have to change the names of all our plans and promotions for tomorrow.”
  8. 8. The real world is very difficult: WICKED PROBLEMS
     • Categories are not defined or they are evolving
     • We do not have an adequate training corpus
     • High precision is required to discriminate among categories
     • Context in general is very dynamic
     Result: HUGE DEVELOPMENT, OPERATION AND EVOLUTION COSTS
  9. 9. We need a different way of doing things: Agile Text Analytics
     • Rapid Model Generation
     • Incorporated Domain Knowledge
     • Powerful Configuration and Refinement
     • Quality Assurance
     An inherently iterative and incremental process of continuous improvement
  10. 10. How we solve it
  11. 11. MeaningCloud: Meaning as a Service
     Standard APIs (SaaS and on-premises)
     Use it free at www.meaningcloud.com
  12. 12. The foundation of our solution: Deep Categorization API
     Our API for wicked categorization problems
     Based on the meaning of the text
     ➢ Leverages the deep morphosyntactic and semantic analysis that MeaningCloud performs
     [Diagram: input text → Deep Categorization Model → categories]
  13. 13. Deep Categorization predefined models
     • IAB 2.0: web content
     • Voice of the Customer (*): customer feedback
     • Voice of the Employee (*): employee feedback
     • Intention Analysis (*): stage in customer journey
     (*) Included in MeaningCloud’s Vertical Packs
  14. 14. Now totally customizable
     [Diagram: domain knowledge (+ training text) → Customization Tool → Deep Categorization Model; input text → model → categories]
  15. 15. Categorization based on the meaning of the text
     Use (generally) human-defined rules based on advanced pattern matching
     1. Divide text into words
     2. Normalization (stemming/lemmatization, case conversion, etc.)
     3. Morphosyntactic and semantic analysis
     4. Check and apply rules for detecting categories
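The four steps above can be sketched as a toy pipeline. This is not MeaningCloud's engine: the lemma table, the single rule and the category name are hypothetical, and step 3 (morphosyntactic and semantic analysis) is deliberately left out because it needs a real linguistic analyzer.

```python
import re

# Hypothetical mini lemma table standing in for a real lemmatizer.
LEMMAS = {"bought": "buy", "buying": "buy", "buys": "buy", "iphones": "iphone"}

# Each rule: (set of lemmas that must all be present, category to assign).
RULES = [({"buy", "iphone"}, "Purchase>iPhone")]

def tokenize(text):
    """Step 1: divide text into words."""
    return re.findall(r"[A-Za-z0-9']+", text)

def normalize(tokens):
    """Step 2: case conversion plus a table-based lemmatization."""
    lowered = [token.lower() for token in tokens]
    return [LEMMAS.get(token, token) for token in lowered]

def categorize(text):
    """Step 4: apply every rule whose required lemmas all appear in the text."""
    lemmas = set(normalize(tokenize(text)))
    return [category for required, category in RULES if required <= lemmas]

print(categorize("I bought an iPhone"))
print(categorize("I will never buy an iPhone"))
```

Both calls return the same category, because a bag of lemmas cannot see tense or negation. Telling those cases apart is the job of the analysis step that this sketch skips.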
  16. 16. A difficult endeavour…
     “I'm going to buy an iPhone”
     “I bought an iPhone”
     “I will never buy an iPhone”
     “Washington? What Washington?”
  17. 17. Semantic rule language
     • Modularity and Reuse
     • Operators and Expressions
     • Use of Semantic Information
     • Abstraction
     <Rules> -> #Category
  18. 18. Rule language highlights (1)
     • Literals, regular expressions and (multiword) phrases
     • Logical (AND, OR, AND NOT) and proximity (NEAR) operators
     • Lemmatization and grammatical function vs. exact word forms
         L@produce vs. produces
         [new L@product|L@service@N|L@process@N|L@value@N]~4 -> #Management>Innovation
     • Macros to group words/semantic expressions and reuse them in different rules
         MACRO {pet} = dog|cat|rabbit|turtle
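To make the idea of a proximity operator concrete, here is one plausible reading of it in plain Python: two sets of terms match when any pair of occurrences lies within a given number of tokens. The function name `near` and the window semantics are assumptions for illustration; the exact behavior of NEAR and `~4` in the rule language is not specified on the slide.

```python
def near(tokens, left_terms, right_terms, window):
    """True if some left term occurs within `window` tokens of some right term."""
    left_positions = [i for i, token in enumerate(tokens) if token in left_terms]
    right_positions = [i for i, token in enumerate(tokens) if token in right_terms]
    return any(abs(i - j) <= window
               for i in left_positions
               for j in right_positions)

tokens = "the company announced a new cloud product".split()
targets = {"product", "service", "process", "value"}
print(near(tokens, {"new"}, targets, 4))  # "new" and "product" are 2 tokens apart
print(near(tokens, {"new"}, targets, 1))  # too far apart for a window of 1
```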
  19. 19. Rule language highlights (2)
     • Use of detected entities and concepts and their semantic types
         S@Top>Organization>Company>FinancialCompany>BankingCompany @instance AND NOT Bank_of_America -> #BankAmericaCompetitors
         S@Top>LivingThing>Animal::{pet}-> #NonPetAnimal
     • Geographical information
         {travel} AND G@America>Canada -> #Travel>Canada
     • Use of categories in rules (whether the text is or isn’t classified in a category can be used in the rules)
         #SpeedAgility AND #Channel>App -> #SpeedAgilityWithApp
     • Robustness to spelling mistakes (Bank of Amerca)
  20. 20. Use case
  21. 21. Contact center ticket categorization (MeaningCloud contact center)
     ➢ Information request
     ➢ Prices and conditions
     ➢ Bugs - Website
     ➢ Bugs - APIs
     ➢ Bugs - Integrations
  22. 22. From a ticket sample to the categorization model
  23. 23. Process
     1. Write rules based on a basic knowledge of the categories
     2. Use advanced features to multiply recall and precision
     3. Apply iterative and incremental development to refine and adapt to dynamic scenarios
  24. 24. A simple case
     Category: Bug Report – Web
     • Rule: Validation email
     Sample tickets:
       I didn’t receive the validation mail
       I’m still waiting for the confirmation email
       I’m waiting on confirmation that you have received my e-mail
     Rule:
       receive|wait AND "validation|confirmation e-?mail|mail"
     • Lemma: “I didn’t receive”, “I’m waiting”…
     • Literal multiword expression: “validation mail”, “confirmation email”…
     • Regular expression: “mail”, “email”, “e-mail”
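The behavior of this rule can be approximated with two ordinary regular expressions, one for the verbs and one for the multiword phrase. This is an emulation under assumptions, not the rule engine itself: lemma matching is imitated with a prefix pattern (`receiv\w*`, `wait\w*`), and the function name is hypothetical.

```python
import re

# Rough stand-in for the lemmas "receive" and "wait".
VERB = re.compile(r"\b(receiv\w*|wait\w*)\b", re.IGNORECASE)
# The multiword phrase: "validation" or "confirmation" followed by mail/email/e-mail.
PHRASE = re.compile(r"\b(validation|confirmation)\s+(e-?mail|mail)\b", re.IGNORECASE)

def is_validation_email_ticket(text):
    """Both parts of the rule must match (the AND operator)."""
    return bool(VERB.search(text)) and bool(PHRASE.search(text))

samples = [
    "I didn't receive the validation mail",
    "I'm still waiting for the confirmation email",
    "I'm waiting on confirmation that you have received my e-mail",
]
for sample in samples:
    print(is_validation_email_ticket(sample), "-", sample)
```

Under this emulation the third ticket does not match: it contains both “confirmation” and “e-mail”, but not as one adjacent phrase, which is the kind of false positive a loose keyword rule would let through.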
  25. 25. Including semantic information (1)
     Category: Bug Report – APIs
     • Rule: API error
       Sample ticket: I’m having issues with the sentiment API
       <MeaningCloud API mention> AND error|bug|issue|problem
     Category: Bug Report – Integrations
     • Rule: Integration error
       Sample ticket: I am trying to install the VoE plugin but keep receiving the error below
       <MeaningCloud Integration mention> AND error|bug|issue|problem
  26. 26. Including semantic information (2)
     Creation of a custom dictionary
     • Entities and concepts, with their semantic information
     • Use them in rules
     Dictionary hierarchy: Top > Product
     • API: Topics Extraction, Text Classification, Sentiment Analysis, Deep Categorization, Summarization, …
     • Integration: Excel add-in, GATE plug-in, Google Sheets add-on, RapidMiner extension, Zapier app, …
  27. 27. Including semantic information (3)
     S@Top>Product>API AND error|bug|issue|problem
       (S@Top>Product>API: any mention of an API product)
     S@Top>Product>Integration AND error|bug|issue|problem
       (S@Top>Product>Integration: any mention of an Integration product)
  28. 28. Modularity and reuse applying macros
     E.g., error|bug|issue|problem appears in multiple contexts and rules
     Macros:
       {error} = error|issue|problem|bug
       {agent} = representative|agent|someone|engineer
     Modular reuse:
       S@Top>Product>API AND {error}
       S@Top>Product>Integration AND {error}
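Macro reuse amounts to textual substitution before a rule is evaluated. The sketch below shows that idea in plain Python; the `expand` helper is hypothetical and only illustrates the principle, not how the Customization Tool resolves macros internally.

```python
import re

# Macro definitions copied from the slide.
MACROS = {
    "error": "error|issue|problem|bug",
    "agent": "representative|agent|someone|engineer",
}

def expand(rule):
    """Replace every {name} reference with the macro's definition."""
    return re.sub(r"\{(\w+)\}", lambda match: MACROS[match.group(1)], rule)

print(expand("S@Top>Product>API AND {error}"))
print(expand("S@Top>Product>Integration AND {error}"))
```

Changing the definition of `error` in one place then updates every rule that references it.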
  29. 29. Using categories within rules
     • Conflicts between categories
     • Rules that depend on certain categories having been triggered
     Sample ticket: Hi, I’ve received an error message when using the sentiment analysis tool for Excel that says “you don’t have access to this sm/model yet”
     Bug Report – APIs or Bug Report – Integrations?
       #BR-INT AND #BR-API -> #BR-API
     If both categories are triggered, exclude Bug Report – APIs
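The conflict resolution described on this slide can be pictured as a post-processing step over the set of triggered categories. The following is a minimal sketch under that assumption; the function name is hypothetical and the category labels are the abbreviations used in the slide's rule.

```python
def resolve_conflicts(categories):
    """If both bug categories fired, keep Integrations and drop APIs."""
    resolved = set(categories)
    if {"BR-INT", "BR-API"} <= resolved:
        resolved.discard("BR-API")
    return resolved

# The Excel ticket mentions both an API ("sentiment analysis") and an integration ("Excel").
print(resolve_conflicts({"BR-INT", "BR-API"}))
# A ticket that only triggers the API category is left untouched.
print(resolve_conflicts({"BR-API"}))
```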
  30. 30. Including a new product without modifying rules
     E.g., releasing a new API: Insight Engine
     • Include “Insight Engine” in the dictionary
     • Changes are propagated to the model without needing to modify anything
     [Diagram: verbatims → Deep Categorization API (Deep Categorization Model + Dictionary) → categories]
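Why a dictionary change is enough can be seen in a small sketch where rules refer only to semantic types and product names live in a separate dictionary. Everything here is an illustrative assumption: the data structures, the function name and the crude substring matching are stand-ins for the real dictionary and rule engine.

```python
ERROR_TERMS = ("error", "bug", "issue", "problem")

# Hypothetical dictionary: entity name -> semantic type (product names from the slides).
dictionary = {
    "sentiment analysis": "Top>Product>API",
    "deep categorization": "Top>Product>API",
    "excel add-in": "Top>Product>Integration",
    "zapier app": "Top>Product>Integration",
}

# Rules mention semantic types only, never individual product names.
RULES = {
    "Top>Product>API": "Bug Report - APIs",
    "Top>Product>Integration": "Bug Report - Integrations",
}

def categorize(text, dictionary):
    """Fire a rule when an error term and an entity of the rule's type both appear."""
    lowered = text.lower()
    if not any(term in lowered for term in ERROR_TERMS):
        return []
    found_types = {sem_type for name, sem_type in dictionary.items() if name in lowered}
    return sorted(RULES[sem_type] for sem_type in found_types if sem_type in RULES)

ticket = "Insight Engine returns an error on every call"
before = categorize(ticket, dictionary)

# Releasing the new API only requires a dictionary entry; RULES stays untouched.
dictionary["insight engine"] = "Top>Product>API"
after = categorize(ticket, dictionary)
print(before, after)
```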
  31. 31. Advantages (and limitations) of semantic rules
     Advantages:
     • "White box" model, where adding new knowledge is easy
     • Low "inertia"
     • Errors are easy to correct
     • Accuracy can be as high as desired
     • Does not require tagging a training corpus
     • Justifies categorization results
     Limitations:
     • The development of models requires effort (but less than manually tagging a training set)
     • Adaptation to new domains is relatively expensive
  32. 32. Agile development process
  33. 33. API Comparison: Deep Categorization vs. Text Classification. When to use one or the other?
     Text Classification API (Machine Learning + Basic Rules)
     • Well defined and fixed categories
     • Very big models
     • Plenty of training texts are available
     • Relatively static scenario
     Deep Categorization API (Semantic Rules)
     • Badly defined or evolving categories
     • Models that are not too extensive
     • Not enough training texts are available
     • High precision is required to discriminate among categories
     • Dynamic scenario
     • The justification of categories is a necessity
  34. 34. Agile model development process. Combination with machine learning – Option 1
     [Diagram: training texts → classifier training engine (Model Training) → ML Model; input text → classifier engine (Machine-Learning (ML) Categorization) → intermediate categories → automatic categorization engine (Deep Categorization, with a Rule Model maintained in the rule editor / Model Editor) → categories]
     • Fast model development and high precision from the beginning
     • Transparency, refinement and adaptation
  36. 36. Customer case: contact center call categorization in telco
     • Automatic categorization of call summaries prepared by operators to extract the reason (root cause) of the call
     • Goal: increase satisfaction and reduce calls to the contact center
     • Challenges:
       – High-dimensional, complex model
         ▪ 3 levels: functional area + reason + 2nd order reason / product
         ▪ 56 categories in level 1; 1,615 categories in total
       – High semantic overlap
       – Texts with incorrect capitalization and abundant typos
       – Modular categories, need to reuse definitions
       – Need for evolution over time
       – 10 days
     • Solution:
       – Abundant use of macros and "virtual" categories
       – Complex rules
       – Expansion of rules using Word Embeddings to discover synonyms and related terms
       – Final model with 800 macros and 2,395 rules
       – Recall of 80% of the texts
       – Final precision: 78% in level 1, 75% exact-match
  37. 37. Customer case: categorization of emails in banking
     • Automatic categorization of email messages in the contact center
     • Goal: automatic routing to the area in charge
     • Challenges:
       – Model with 3 orthogonal dimensions (reason + product / service + satisfaction), 39 categories in total
       – 3 different languages
       – High semantic overlap
       – Multi-label scenario (several labels allowed)
       – 4 weeks
     • Solution:
       – One model per language
       – Use of product / service dictionaries
       – Abundant use of macros
       – Rules with weights for relevance calculation
       – Model with 590 - 733 rules, depending on language
       – Final precision: 70% reason, 75% product / service, 93% satisfaction
  38. 38. Conclusion
     Wicked text categorization problems HAPPEN
     Give our agile development process a chance
  39. 39. Q & A time
  40. 40. Stay tuned to our blog and emails
     We’ll be posting a recording of the webinar and its contents as tutorials soon
  41. 41. Thank you for your attention!
     Automating the extraction of Meaning from any information source.
     www.meaningcloud.com
     +1 (646) 403-3104
     3537 36th Street, New York, NY 11106
     amatarranz@meaningcloud.com
