Continuous Learning Systems:
Building ML systems that learn
from their mistakes
Anuj Gupta
(Intuit)
Saurabh Arora, Satyam Saxena, Navaneethan Santhanam
This work was done when the authors were at Freshworks
Agenda
1. Understanding the Problem Statement
● Background
● Metrics that matter
● Observations
2. Solution v1.0
3. Issues
4. Solution v2.0
a. Building feedback loop
b. Global + local
5. Results
6. Conclusions and Way Forward
Background
● Customer support on social media is now a must for all B2C brands.
● Examples: @AppleSupport, @AmazonHelp, @BofA_Help.
● Twitter and Facebook have launched dedicated features for this.
● Most CRM suites support customer service on social (CS@social).
Metrics that matter
● Owing to the public nature of these conversations, brands
care about 2 things:
a. Reply fast
b. Reply well
Both contribute to how a brand is perceived.
● To measure (a), 2 key metrics are:
a. Average First Response Time (AFRT)
b. Average Response Time (ART)
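The slides name the two metrics without defining them. The sketch below assumes AFRT averages the delay of the first agent reply in each conversation and ART averages the delay over every replied-to message. The `Exchange` layout and both function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Exchange:
    """One customer message and the agent reply to it (times in seconds)."""
    received_at: float
    replied_at: float


def average_first_response_time(conversations):
    """Mean delay of the first reply in each conversation (assumed AFRT)."""
    return mean(c[0].replied_at - c[0].received_at for c in conversations if c)


def average_response_time(conversations):
    """Mean delay over every replied-to message (assumed ART)."""
    return mean(e.replied_at - e.received_at for c in conversations for e in c)


# Made-up data: two conversations, three replies in total.
conversations = [
    [Exchange(0, 600), Exchange(900, 1000)],  # replies after 600 s and 100 s
    [Exchange(50, 350)],                      # reply after 300 s
]
afrt = average_first_response_time(conversations)  # (600 + 300) / 2
art = average_response_time(conversations)         # (600 + 100 + 300) / 3
```

Under these assumed definitions, one slow first reply raises AFRT more than ART, which is why the two are tracked separately.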
● Many of our customers (the CS teams of brands) had high AFRT/ART.
● Ask: reduce AFRT/ART.
● Traffic on a brand’s social channel is not just questions or requests. It’s a lot more than that!
Observations
[Figure: sample inbound messages, one marked Actionable (✅) and three marked Noise/spam (❌)]
Observations
● The average number of replies sent per agent per day was relatively low (~12-15), yet
AFRT/ART were high.
● Of the total inbound traffic on support handles, only a fraction of tickets were being replied to,
typically ~5%-40%.
● Between any 2 messages that were responded to, there were many
messages that were not responded to (~3-30).
Most of the agents' time was going into finding
actionable conversations.
Solution v1.0
• Noise filter for CS@social
• Model it as a (binary) classification problem: Actionable vs. Noise/Spam.
• Acquire a good-quality dataset.
• Engineer features – there are some very good indicators.
• Train, test, tune: ~75% accuracy. Deploy.
Issues
● Performance varied across brands.
● For some brands the model worked very well; for others it did very badly.
● As time* went by, even the models that had performed well started doing badly.
*within a couple of weeks of deployment
Behind the Scenes
• Our data was changing: non-stationary distributions.
• A stationary process is time-independent: its averages remain more or less constant.
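To make "the averages remain more or less constant" concrete, here is a crude check that compares the mean of an early window with the mean of a late window. It is only an illustration, not a formal stationarity test. The function name `mean_shift` and the sample numbers are hypothetical.

```python
from statistics import mean


def mean_shift(values, window):
    """Absolute difference between the means of the first and last `window` values."""
    if window <= 0 or len(values) < 2 * window:
        raise ValueError("need at least 2 * window values")
    return abs(mean(values[-window:]) - mean(values[:window]))


# Hypothetical daily fraction of actionable messages for two brands.
stable = [0.20, 0.22, 0.19, 0.21, 0.20, 0.21, 0.19, 0.20]
drifting = [0.20, 0.22, 0.19, 0.21, 0.30, 0.34, 0.37, 0.41]

stable_shift = mean_shift(stable, window=4)      # small: the average stays put
drifting_shift = mean_shift(drifting, window=4)  # large: the average has moved
```

A model trained on the early window of the drifting series would face a different class balance later, which is the kind of change described on this slide.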
Behind the Scenes
• The world of CS@social is not just black (noise) and white (actionable).
• It also has a spectrum of grey in between:
a. “Hi”, “Hello”, “Good morning”
b. “Any new offers today”
c. “The recent ad you launched is very good. Keep it up”
d. Quizzes, engagement posts
• Some brands respond to such traffic, some do not.
• Noise and actionable are merely the 2 extremes of this spectrum.
• The definition of noise and actionable is not consistent across brands.
• The boundary (in the grey region) separating noise from actionable varies from brand to brand.
• A single common classifier for all is doomed to fail!
In a Nutshell
• Based on the last few slides, the degradation in model performance shouldn’t come
as a surprise.
• One model fits all is not going to work.
• Non-stationary distributions are not specific to Twitter data. They are
found in other domains as well:
o Monitoring & anomaly detection (one-class classification) in an adversarial setting
o Recommendations (where user preferences are continuously changing; evolving labels)
o Stock market predictions (concept drift; evolving distributions)
Towards Solution: Exploration
• Build a per-brand model to get brand-specific learning.
• Learn from mistakes: in our system, by looking at which messages are being
replied to and which are not, we know (with a small delay) whether the classification done
by the system was right or wrong.
• The model is not utilizing these signals to improve.
• If feedback is utilized well, the model can:
o Adapt over time to each brand’s definition of noise and actionable.
o Adapt to variations/changes in features.
Incorporate feedback
• Frequently retrain the model on the updated data and redeploy it.
o Training, testing, fine-tuning – 45K models.
Compute heavy. Doesn’t scale at all.
o Lose all old learnings.
• Keep learning from feedback: the model adapts to the new incoming data.
What worked for us
● 2 models: Global + Local
● Global model is common for all brands
○ Batch trained on a large corpus
○ No short-term updates
● 1 Local model per brand
○ Fast learner
○ Short-term updates
Local
• Goals
o Improves with feedback.
• Desired properties
o Fast learner (light compute)
▪ Incorporates most feedback successfully
(After a model update, if the same data point is presented again, the model must predict its class label correctly.)
o Avoids catastrophic forgetting
(After a model update, if the last N data points are presented, the model should predict their class labels with higher accuracy.)
Building feedback loop
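[Diagram: a tweet is sent to the ML model, which outputs <Tweet, YP>; the true label arrives as <Tweet, YT>; if YT ≠ YP, <Tweet, YT> is fed back to the model]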
Possible approaches to incorporate feedback
● Mini-batches: works fine if the velocity of feedback data is high (no long wait to accumulate
a mini-batch of feedback). Many applications don’t have high velocity.
● Instant feedback, tiny batches: very few data points, which can skew the model.
Building feedback loop
• We model a feedback point <Tweet, YT> as a data point presented to the local model
in an online setting.
• Thus, a bunch of feedback points = an incoming data stream.
• We used online learning.
• Online learning:
o Data is modeled as a stream.
o The model makes a prediction (YP) when presented with a data point (X).
o The environment reveals the correct class label (YT).
o If YP ≠ YT, update the model with <X, YT>.
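The protocol above can be sketched as a short loop. The learner here is a plain perceptron used only as a stand-in so the loop is runnable; it is not the model from the talk. The class and function names, the two-feature vectors, and the +1/-1 label encoding are all assumptions.

```python
class Perceptron:
    """Stand-in online learner; labels are +1 (actionable) and -1 (noise)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= 0 else -1

    def update(self, x, y_true):
        # Classic perceptron step: move the weights toward the true label.
        self.w = [wi + y_true * xi for wi, xi in zip(self.w, x)]


def run_online(model, stream):
    """Predict, observe the true label, and update only on mistakes."""
    mistakes = 0
    for x, y_true in stream:
        y_pred = model.predict(x)    # model predicts YP for X
        if y_pred != y_true:         # environment reveals YT
            model.update(x, y_true)  # feed <X, YT> back as feedback
            mistakes += 1
    return mistakes


# Toy stream of (feature vector, true label) pairs.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
model = Perceptron(n_features=2)
mistakes = run_online(model, stream)
```

The model sees each point once, and only misclassified points trigger an update, so the per-point cost stays small.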
Online Algorithms
http://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_comparison.html
Crammer’s PA-II
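For reference, a minimal sketch of a single PA-II update for a linear binary classifier, following the update rule in Crammer et al. (2006): loss = max(0, 1 - y(w·x)), tau = loss / (||x||² + 1/(2C)), w ← w + tau·y·x. The function name `pa2_update`, the dense-list feature representation, and the default `C=1.0` are assumptions, not details from the talk.

```python
def pa2_update(w, x, y, C=1.0):
    """Return the weights after one PA-II step; y is +1 or -1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this example
    if loss == 0.0:
        return list(w)             # passive: margin already satisfied
    # Aggressive step, damped by the 1 / (2C) term that distinguishes PA-II.
    tau = loss / (sum(xi * xi for xi in x) + 1.0 / (2.0 * C))
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


w = pa2_update([0.0, 0.0], x=[1.0, 1.0], y=1, C=1.0)  # tau = 1 / (2 + 0.5) = 0.4
w_same = pa2_update([1.0, 1.0], x=[1.0, 1.0], y=1)    # margin is 2, no change
```

Note that PA-II updates whenever the margin is below 1, not only on outright misclassifications. A smaller `C` makes each step more conservative, which matters when a single feedback point may be wrong.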
Results of Local:
• Dataset – 150K tweets, time sequenced
• Feedback incorporation improves accuracy:
o Trained the model (offline batch mode) on the first 100K data points.
o On the test set (last 50K data points), it gave 75% accuracy (offline batch mode).
o Then ran the model on the same test data (50K data points) in online fashion:
▪ The model made a total of 9,028 mistakes.
▪ These mistakes were instantaneously fed into the local model as feedback.
▪ This gives an accuracy of ~82% across the test set.
o We gained ~7 percentage points of accuracy by incorporating feedback.
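The ~82% figure follows directly from the mistake count, as this short calculation shows. It assumes each of the 9,028 mistakes counts as one wrong prediction out of the 50K test points.

```python
test_points = 50_000
mistakes = 9_028

online_accuracy = 1 - mistakes / test_points  # running accuracy with feedback
offline_accuracy = 0.75                       # batch model on the same test set
gain_points = (online_accuracy - offline_accuracy) * 100  # in percentage points
```

This gives an online accuracy of about 81.9% and a gain of about 6.9 percentage points, matching the rounded ~82% and ~7% on the slide.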
Improving accuracy
[Plot: accuracy of the local model vs. # of test points]
We also tested the local model by feeding it wrong feedback.
Combining global and local
• Scores from the global and local models are combined into a single score, and a
threshold is applied to arrive at a prediction.
• We got an accuracy of ~82%.
[Plot: global, local, and combined scores vs. # of test points]
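The slides do not say how the two scores are combined. One simple possibility, shown purely as an assumption, is a weighted average followed by a threshold. The names `combined_score` and `predict`, the weight `alpha`, and the 0.5 threshold are hypothetical.

```python
def combined_score(global_score, local_score, alpha=0.5):
    """Weighted average of the two scores; alpha is the weight on the global model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * global_score + (1.0 - alpha) * local_score


def predict(global_score, local_score, alpha=0.5, threshold=0.5):
    """Apply a threshold to the combined score to get a class label."""
    score = combined_score(global_score, local_score, alpha)
    return "actionable" if score >= threshold else "noise"


# The global model leans actionable, the brand's local model leans noise.
balanced = predict(global_score=0.8, local_score=0.3, alpha=0.5)
local_heavy = predict(global_score=0.8, local_score=0.3, alpha=0.2)
```

Lowering `alpha` lets a brand's own feedback dominate the decision, while raising it falls back on the shared global model.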
Pros:
• Improved running accuracy
• Personalization : The notion of spam varies from brand to brand. Some
brands treat ‘Hi’, ‘Hello’ as spam while others treat them as actionable. By
learning from feedback, the model adapts to the notions of the brand.
• Local is lightweight and fast, and thus easy to bootstrap, deploy, and scale.
Cons:
● Local can overfit to feedback and thus become biased.
● Need to monitor for bias.
● Reset local when it becomes biased.
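The slides do not describe how bias was monitored. As one hypothetical approach, assuming "biased" means the local model predicts one class almost exclusively, a sliding window over recent predictions can flag when a reset looks warranted. The class name `BiasMonitor` and its parameters are illustrative only.

```python
from collections import deque


class BiasMonitor:
    """Flags the local model for reset when recent predictions are lopsided."""

    def __init__(self, window=100, max_share=0.95):
        self.recent = deque(maxlen=window)
        self.max_share = max_share

    def record(self, predicted_actionable):
        """Store one prediction; return True if a reset looks warranted."""
        self.recent.append(bool(predicted_actionable))
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        share = sum(self.recent) / len(self.recent)
        return share > self.max_share or share < 1.0 - self.max_share


# Tiny window so the behaviour is easy to follow.
monitor = BiasMonitor(window=4, max_share=0.75)
flags = [monitor.record(p) for p in [True, False, True, True, True, True]]
```

Here the flag is raised only on the sixth prediction, once all four predictions in the window are the same class. In practice the window and threshold would need tuning per brand, since some brands legitimately receive mostly noise.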
Future Work
• Instead of a single global model, have vertical-specific global models
• Try other online algorithms
• Handle drift
• Not incorporate every feedback point? Update only on the most important ones.
References
1. “Online Passive-Aggressive Algorithms” – Crammer et al., JMLR 2006
2. “The learning behind gmail priority inbox” – Aberdeen et al., LCCC: NIPS Workshop 2010
3. “Learning with drift detection” – Gama et al., SBIA 2004
4. “Adaptive regularization of weight vectors” – Crammer et al., NIPS 2009
5. LIBOL - A Library for Online Learning Algorithms. https://github.com/LIBOL/LIBOL
Thank You
Please feel free to reach out post this talk or on the interwebs.
@anujgupta82
Anuj Gupta

Building ML Systems that Continuously Learn from Mistakes

  • 1. Continuous Learning Systems: Building ML systems that learn from their mistakes Anuj Gupta (Intuit) Saurabh Arora, Satyam Saxena, Navaneethan Santhanam This work was done when the authors were at Freshworks
  • 2. Agenda 1. Understanding the Problem Statement ● Background ● Metrics that matter ● Observations 1. Solution v1.0 2. Issues 1. Solution v2.0 a. Building Feedback loop b. Global + local 1. Results 2. Conclusions and Way Forward
  • 3. Background ● Customer Support on social is now must for all B2C brands. ● Ex: @AppleSupport, @AmazonHelp, @BofA_Help. ● Twitter, Facebook have launched dedicated features for this. ● Most CRM suites support Customer Service@social
  • 4. Metrics that matter ● Owing to the public nature of conversations, brands care about 2 things: a. Reply fast b. Reply well. Both of these contribute to how a brand is perceived. ● To measure (a), the 2 key metrics are: a. Average First Response Time (AFRT) b. Average Response Time (ART)
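
To make the two metrics concrete, here is a minimal sketch of how AFRT and ART could be computed from conversation logs. The slides do not define the metrics precisely, so this assumes that a response time is the gap between the earliest unanswered customer message and the next agent reply, that AFRT averages the first such gap per conversation, and that ART averages all of them. The function names and the `(timestamp, sender)` log format are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def response_gaps(conversation):
    """Return the response times (in seconds) for one conversation.

    `conversation` is a time-ordered list of (timestamp, sender) tuples,
    where sender is "customer" or "agent" (an assumed log format).
    Each agent reply is paired with the earliest customer message that
    was still waiting for an answer.
    """
    waiting_since = None
    gaps = []
    for ts, sender in conversation:
        if sender == "customer":
            if waiting_since is None:
                waiting_since = ts
        elif waiting_since is not None:
            gaps.append((ts - waiting_since).total_seconds())
            waiting_since = None
    return gaps


def afrt_and_art(conversations):
    """Return (AFRT, ART) in seconds, or None where no reply exists."""
    first_gaps, all_gaps = [], []
    for conv in conversations:
        gaps = response_gaps(conv)
        if gaps:
            first_gaps.append(gaps[0])
            all_gaps.extend(gaps)
    afrt = mean(first_gaps) if first_gaps else None
    art = mean(all_gaps) if all_gaps else None
    return afrt, art


t0 = datetime(2024, 1, 1, 9, 0)
minutes = lambda m: t0 + timedelta(minutes=m)
conversations = [
    [(minutes(0), "customer"), (minutes(10), "agent"),
     (minutes(20), "customer"), (minutes(50), "agent")],
    [(minutes(0), "customer"), (minutes(30), "agent")],
]
afrt, art = afrt_and_art(conversations)
```

Under these assumptions, unanswered noise does not enter either average directly; it hurts the metrics by delaying the agent from reaching the actionable messages.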
  • 5. ● Many of our customers (CS teams of brands) had pretty high AFRT/ART. ● Ask: Reduce AFRT/ART. ● Traffic on a brand’s social channel is not just questions or requests. It's a lot more than that!
  • 7. Observations ● The average number of replies sent per agent per day was relatively low (~12-15), yet the ART/AFRT were pretty high. ● Of the total inbound traffic on support handles, only a fraction of tickets were being replied to, typically ~5%-40%. ● Between any 2 messages that were responded to, there were a lot of messages (~3-30) that were not. ● Most of the agents' time was going into finding actionable conversations.
  • 8. Solution v1.0 • Noise filter for CS@social. • Model it as a (binary) classification problem: Actionable vs. Noise/Spam. • Acquire a good quality dataset. • Engineer features: there are some very good indicators. • Train, test, tune: ~75% accuracy. Deploy.
  • 9. Issues ● Performance varied across brands. ● While for some brands the model worked very well, for some it did very badly. ● As time* went by, even the models that had performed well started doing badly. (*Within a couple of weeks of deployment.)
  • 10. Behind the Scenes • Our data was changing: non-stationary distributions. • A stationary process is time-independent: its averages remain more or less constant.
  • 11. Behind the Scenes • The world of CS@social is not just black (noise) and white (actionable). • It also has a spectrum of grey in between: a. “Hi”, “Hello”, “Good morning” b. “Any new offers today?” c. “The recent ad you launched is very good. Keep it up” d. Quizzes, engagement posts • Some brands respond to such traffic, some do not. • Noise and actionable are merely the 2 extremes of this spectrum. • The definition of noise and actionable is not consistent across brands. • The boundary (in the grey region) separating noise from actionable varies from brand to brand. • A single common classifier for all is doomed to fail!
  • 12. In a Nutshell • Based on the last few slides, the degradation in model performance shouldn’t come as a surprise. • One model fits all is not going to work. • Non-stationary distributions are not specific to Twitter data. They are found in other domains as well: o Monitoring & anomaly detection (one-class classification) in adversarial settings o Recommendations (where user preferences are continuously changing; evolving labels) o Stock market predictions (concept drift; evolving distributions).
  • 13. Towards Solution: Exploration • Build a per-brand model to have brand-specific learning. • Learn from mistakes: in our system, by looking at which messages are being replied to and which are not, we know (with a small delay) whether the classification done by the system was right or wrong. • The model is not utilizing these signals to improve. • If feedback is utilized well, the model can: • With time, adapt to each brand's definition of noise and actionable. • Adapt to variations/changes in features.
  • 14. Incorporate feedback • Option 1: frequently retrain your model on the updated data and deploy it. o Training, testing, fine-tuning 45K models is compute heavy. Doesn’t scale at all. o Lose all old learnings. • Option 2: keep learning from feedback, so the model adapts to the new incoming data.
  • 15. What worked for us ● 2 models: Global + Local ● Global model is common to all brands ○ Batch trained on a large corpus ○ No short-term updates ● 1 Local model per brand ○ Fast learner ○ Short-term updates
  • 16. Local • Goals o Improves with feedback. • Desired properties o Fast learner (light compute) ▪ Incorporates most feedback successfully (after a model update, if the same data point is presented again, it must correctly predict its class label). o Avoids catastrophic forgetting (after a model update, if the last N data points are presented, it should predict their class labels with higher accuracy).
  • 17. Building feedback loop [Diagram: Tweet → ML model → prediction <Tweet, YP>; the true label <Tweet, YT> is fed back to the model if YT ≠ YP]
  • 18. Possible approaches to incorporate feedback ● Mini-batches: works fine if the velocity of feedback data is high (you don’t have to wait long to accumulate a mini-batch of feedback). Many applications don’t have high velocity. ● Instant feedback, tiny batches: very few data points can skew the model.
  • 19. Building feedback loop • We model a feedback point <Tweet, YT> as a data point presented to the local model in an online setting. • Thus, a bunch of feedback points = an incoming data stream. • We used online learning. • Online learning: data is modeled as a stream. The model makes a prediction (YP) when presented with a data point (X). The environment then reveals the correct class label (YT). If YP ≠ YT, update the model with <X, YT>.
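
The predict, reveal, update loop described above can be sketched as follows. The slides do not say which online algorithm the authors deployed; this sketch uses the PA-I update from the Passive-Aggressive family (Crammer et al., 2006) purely as an illustrative choice. The bag-of-words `featurize` helper, the label convention (+1 actionable, -1 noise), and the value of `C` are all assumptions. Following the slide, the model is updated only when the prediction is wrong, whereas standard PA also updates on correct but low-margin predictions.

```python
class PassiveAggressive:
    """Binary online linear classifier using the PA-I update rule."""

    def __init__(self, C=1.0):
        self.C = C    # cap on the step size (aggressiveness)
        self.w = {}   # sparse weight vector: feature -> weight

    def score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        # +1 = actionable, -1 = noise (assumed convention); ties -> +1
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y_true):
        loss = max(0.0, 1.0 - y_true * self.score(x))  # hinge loss
        norm_sq = sum(v * v for v in x.values())
        if loss == 0.0 or norm_sq == 0.0:
            return
        tau = min(self.C, loss / norm_sq)              # PA-I step size
        for f, v in x.items():
            self.w[f] = self.w.get(f, 0.0) + tau * y_true * v


def featurize(text):
    """Hypothetical bag-of-words features: token -> count."""
    feats = {}
    for tok in text.lower().split():
        feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats


def run_stream(model, stream):
    """Predict on each point; on a mistake, feed <X, YT> back to the model."""
    mistakes = 0
    for text, y_true in stream:
        x = featurize(text)
        if model.predict(x) != y_true:
            mistakes += 1
            model.update(x, y_true)
    return mistakes


model = PassiveAggressive(C=1.0)
stream = [
    ("refund my order please", 1),
    ("good morning", -1),
    ("refund please", 1),
    ("good morning everyone", -1),
]
mistakes = run_stream(model, stream)
```

When the step size is not capped by `C`, the update moves the weights just enough for the corrected point to reach a margin of 1, so the same point is classified correctly if presented again. Each update touches only the features of one message, which keeps the learner light on compute.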
  • 21. Results of Local • Dataset: 150K tweets, time sequenced. • Feedback incorporation improves accuracy: o Trained the model (offline batch mode) on the first 100K data points. o On the test set (last 50K data points) it gave 75% accuracy (offline batch mode). o Then ran the model on the test data (50K data points) in online fashion: the model made a total of 9,028 mistakes, which were instantaneously fed into the local model as feedback. This gives an accuracy of ~82% across the test set. o We gained ~7 percentage points of accuracy by incorporating feedback.
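
The reported figures are consistent with a simple running-accuracy calculation, sketched below. It assumes that the ~82% is computed as one minus the fraction of test points the online model got wrong, which the slide implies but does not state.

```python
test_points = 50_000      # last 50K tweets, used as the test stream
online_mistakes = 9_028   # mistakes made while learning online
batch_accuracy = 0.75     # same model, offline batch mode, no feedback

online_accuracy = 1 - online_mistakes / test_points
gain = online_accuracy - batch_accuracy

# Mistakes the batch model would make on the same 50K points,
# and how many of those the feedback loop avoided.
batch_mistakes = round((1 - batch_accuracy) * test_points)
avoided = batch_mistakes - online_mistakes

print(f"online accuracy: {online_accuracy:.2%}")  # 81.94%
print(f"gain over batch: {gain:.2%}")             # 6.94%
print(f"mistakes avoided: {avoided}")             # 3472
```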
  • 22. [Plot: accuracy improving vs. # of test points] We also tested the local model by feeding it wrong feedback.
  • 23. Combining global and local • Scores from the global and local models are combined into a single score, and a threshold is applied to arrive at a prediction. • We got an accuracy of ~82%. [Diagram: Global + Local → combined score]
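
The slides do not say how the two scores were combined. One plausible scheme, shown here only as a sketch, is a weighted average of the two models' scores followed by a threshold. It assumes both scores are on a 0 to 1 scale (probability of "actionable"); `local_weight`, `threshold`, and the function names are hypothetical.

```python
def combine_scores(global_score, local_score, local_weight=0.5):
    """Weighted average of global and local scores (both assumed in [0, 1])."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be between 0 and 1")
    return (1.0 - local_weight) * global_score + local_weight * local_score


def is_actionable(global_score, local_score, local_weight=0.5, threshold=0.5):
    """Apply a threshold to the combined score to get the final prediction."""
    return combine_scores(global_score, local_score, local_weight) >= threshold


# Global model is unsure, but the brand's local model has learned that
# this kind of message gets replied to.
print(is_actionable(global_score=0.45, local_score=0.9))  # True
```

A per-brand `local_weight` would let a brand with plenty of feedback lean on its local model, while a newly onboarded brand starts close to the global model's behaviour.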
  • 24. [Plot; x-axis: # of test points]
  • 25. Pros: • Improved running accuracy. • Personalization: the notion of spam varies from brand to brand. Some brands treat ‘Hi’, ‘Hello’ as spam while others treat them as actionable. By learning from feedback, the model adapts to the notions of the brand. • Local is lightweight and fast, thus easy to bootstrap, deploy and scale. Cons: ● Local can overfit to feedback and thus become biased. ● Need to monitor bias. ● Reset local as and when it becomes biased.
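
The slides do not describe how bias was monitored or when the local model was reset. As one possible sketch, the monitor below keeps a rolling window of feedback and flags a reset when the local model's recent accuracy falls below the global model's by some margin. Treating "biased" as "worse than global on recent feedback" is an assumption, as are the class name and the `window`, `margin`, and `min_samples` parameters.

```python
from collections import deque


class LocalModelMonitor:
    """Rolling comparison of local vs. global accuracy on recent feedback."""

    def __init__(self, window=200, margin=0.05, min_samples=50):
        self.local_hits = deque(maxlen=window)
        self.global_hits = deque(maxlen=window)
        self.margin = margin
        self.min_samples = min_samples

    def record(self, local_pred, global_pred, y_true):
        """Store whether each model agreed with the revealed label."""
        self.local_hits.append(local_pred == y_true)
        self.global_hits.append(global_pred == y_true)

    def should_reset(self):
        """True when local trails global by more than `margin`."""
        n = len(self.local_hits)
        if n < self.min_samples:
            return False  # not enough evidence yet
        local_acc = sum(self.local_hits) / n
        global_acc = sum(self.global_hits) / n
        return local_acc + self.margin < global_acc


monitor = LocalModelMonitor(window=10, margin=0.1, min_samples=5)
for _ in range(5):
    # Local model keeps calling actionable tweets noise; global is right.
    monitor.record(local_pred=-1, global_pred=1, y_true=1)
print(monitor.should_reset())  # True
```

On a reset, the local model's weights could simply be cleared so that predictions fall back to the global model until new feedback accumulates.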
  • 26. Future Work • Instead of a single global model, have vertical-specific global models. • Try other online algorithms. • Handle drift. • Don't incorporate every feedback? Update only on the most important ones.
  • 27. References 1. “Online Passive-Aggressive Algorithms”, Crammer et al., JMLR 2006 2. “The Learning Behind Gmail Priority Inbox”, Aberdeen et al., LCCC: NIPS Workshop 2010 3. “Learning with Drift Detection”, Gama et al., BSAI 2004 4. “Adaptive Regularization of Weight Vectors”, Crammer et al., ANIPS 2009 5. LIBOL: A Library for Online Learning Algorithms. https://github.com/LIBOL/LIBOL
  • 28. Thank You. Please feel free to reach out after this talk or on the interwebs. Anuj Gupta, @anujgupta82