Evaluation strategies
with partially or unlabeled data
30/11/2023
About us
Sharone Dayan
Machine Learning Engineer
@ Contentsquare
sharone.dayan@contentsquare.com
Daria Stefic, PhD
Data Scientist
@ Contentsquare
daria.stefic@contentsquare.com
Agenda
1. Global Picture
Evaluation methods
2. Case studies
a. Purchase Intent Prediction
b. Unsupervised Segment Discovery
c. Anomaly Detection for Alerting
3. Key takeaways
Academia VS Industry
Global Picture
Do we have labels?
● YES → Supervised
● NO or FEW → Semi-supervised/Unsupervised
Evaluation methods (and the matching case studies)
● Can I use public datasets?
● Can I generate artificial data? → Unsupervised Segment Discovery
● Can I design a proxy? → Purchase Intent Prediction
● Can I get a manually labelled dataset? → Anomaly Detection for Alerting
Case study:
Purchase Intent Prediction
“Can I design a proxy?”
Main goal
“Who could have converted?”
[Figure labels: non-buyers, buyers, missed purchase segment, purchase intent segment]
Detect non-converting users who intended to convert, based on
their behaviour (e.g. interaction with product details, add to cart)
Design choices:
● Focus on anonymous users
● Focus on retail clients
● No difference between single-item
and multi-item purchase
● Offline prediction
If we had labels for purchase intent…
…we could:
1. directly train a classifier in a supervised
way to recognise intent
2. evaluate our classifier with standard
classification metrics (e.g. f1-score) directly
on these labels
3. compare different solutions in an unbiased
way
…but:
The only labels we have for purchase intent are
from converting sessions
Purchase Intent Evaluation
Confusion matrix of actual conversion (rows) against predicted intent (columns):
● Actual positive, predicted negative: we don’t want any converters predicted as ‘Not intended to purchase’
● Actual negative, predicted positive: we want some non-converters predicted as ‘Intended to purchase’
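The two expectations on this confusion matrix can be checked with a few lines of Python. This is only a sketch: the function name `intent_evaluation`, the metric names and the toy data are illustrative assumptions, not taken from the talk.

```python
def intent_evaluation(converted, predicted_intent):
    """Summarise predicted intent against actual conversion.

    Both arguments are equal-length sequences of booleans, one entry per session.
    """
    if len(converted) != len(predicted_intent):
        raise ValueError("inputs must have the same length")
    buyers = [p for c, p in zip(converted, predicted_intent) if c]
    non_buyers = [p for c, p in zip(converted, predicted_intent) if not c]
    # Share of converters predicted as 'Intended to purchase' (should be close to 1).
    buyer_recall = sum(buyers) / len(buyers) if buyers else float("nan")
    # Share of non-converters predicted as 'Intended to purchase':
    # the candidate "missed purchase" segment (should be above 0).
    non_buyer_intent_rate = (
        sum(non_buyers) / len(non_buyers) if non_buyers else float("nan")
    )
    return {
        "buyer_recall": buyer_recall,
        "non_buyer_intent_rate": non_buyer_intent_rate,
    }


# Hypothetical toy data: two converting sessions, four non-converting ones.
converted = [True, True, False, False, False, False]
predicted = [True, True, True, False, False, False]
summary = intent_evaluation(converted, predicted)
```

Note that the second number cannot be judged as right or wrong without labels for non-converters, which is why the next slides introduce proxy labels.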
© Contentsquare 2022
We need proxy labels for purchase intent
1. converting sessions have the intention to purchase by definition
2. sessions in which a user adds something to the cart and purchases it
in a later session probably had the intention to purchase
“Strict” proxy set
● Items added to cart in the current session are purchased in a later session of the same user
● Merchandising CS clients only
“Loose” proxy set
● An add-to-cart event in the current session culminates in a later converting session of the same user, on the same day
● All CS clients
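A minimal sketch of how the two proxy sets could be derived from session records. The `Session` structure, its field names and the toy sessions are hypothetical. The strict rule is read here as "at least one carted item is bought later", which is an assumption; the slide does not say whether all items must match.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Session:
    # Hypothetical session record; field names are illustrative.
    user_id: str
    start: datetime
    cart_items: set = field(default_factory=set)       # items added to cart
    purchased_items: set = field(default_factory=set)  # empty if no conversion


def proxy_labels(sessions):
    """Return {session index: (strict, loose)} for non-converting add-to-cart sessions."""
    labels = {}
    for i, s in enumerate(sessions):
        if s.purchased_items or not s.cart_items:
            continue  # only non-converting sessions with an add-to-cart get a proxy label
        later_conversions = [
            t for t in sessions
            if t.user_id == s.user_id and t.start > s.start and t.purchased_items
        ]
        # Strict: a carted item is purchased in a later session of the same user.
        strict = any(s.cart_items & t.purchased_items for t in later_conversions)
        # Loose: any later converting session of the same user, on the same day.
        loose = any(t.start.date() == s.start.date() for t in later_conversions)
        labels[i] = (strict, loose)
    return labels


sessions = [
    Session("u1", datetime(2023, 11, 1, 10, 0), cart_items={"shoes"}),
    Session("u1", datetime(2023, 11, 2, 18, 0), purchased_items={"shoes"}),
    Session("u2", datetime(2023, 11, 1, 11, 0), cart_items={"bag"}),
    Session("u2", datetime(2023, 11, 1, 20, 0), purchased_items={"scarf"}),
]
labels = proxy_labels(sessions)
```

In this toy data the first session is a strict positive only (same item, next day) and the third is a loose positive only (same day, different item).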
Schematic predictions
[Figure: Buyers (training set), Buyers (test set) and Non-buyers (test set). Labels: predict purchase; predict purchase intent; add to cart leading to conversion in next sessions; MISSED PURCHASE.]
Case study:
User Segment Discovery
“Can I generate artificial data?”
Main goal
Understand the main user personas
from users’ page-visit sequences,
using unsupervised learning
Hypothetical outcome:
Here are 3 typical visitor personas:
● Early leaver: Home page - PDP - Exit
● Heavy searcher: Home - Search -
Multiple PDPs - Cart - Exit
● Golden converting journey: Home
page - PLP - PDP - Cart - Delivery -
Payment
“What are the typical paths people take?”
Wait, what… Evaluating unsupervised?
How to include business constraints?
How to benchmark different settings (features, distances,
clustering algorithms, etc.)?
Idea
HSSSSSSS
PSPSPSPSP
HSSSSSSSS
HPPPPPPPP
SSSSSSSS
HSPSPSPSP
HSPPPPPPP
● 3 patterns:
- Mainly visiting
product pages
- Mainly visiting search
results
- Looping between
product pages and
search results
● 3 categories
○ P = Product page
○ S = Search results
○ H = Home page
We need to validate the clustering “health”
Build toy scenarios where we have clear expectations of what the clustering
result should be, and run functional tests around them.
4 types of session generators (artificial data):
- sessions focused on one specific page
- sessions having a given probability for each page group
- “cycling“ sessions always coming back to the same sequence
- sessions containing a specific pattern
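The four generator types could look roughly like the sketch below. Function names, parameters and the single-letter page encoding (H, P, S, borrowed from the toy example earlier in the deck) are assumptions for illustration, not the generators used in the talk.

```python
import random

PAGES = ["H", "P", "S"]  # Home page, Product page, Search results


def focused_session(page, length):
    """Session focused on one specific page."""
    return page * length


def probabilistic_session(weights, length, rng):
    """Session where each page group is drawn with a given probability."""
    pages = list(weights)
    return "".join(rng.choices(pages, weights=[weights[p] for p in pages], k=length))


def cycling_session(cycle, length):
    """Session that always comes back to the same sequence."""
    repeats = length // len(cycle) + 1
    return (cycle * repeats)[:length]


def pattern_session(pattern, length, rng):
    """Random session that contains a specific pattern at a random position."""
    filler = length - len(pattern)
    if filler < 0:
        raise ValueError("pattern is longer than the requested session")
    noise = [rng.choice(PAGES) for _ in range(filler)]
    position = rng.randint(0, filler)  # inclusive bounds
    return "".join(noise[:position]) + pattern + "".join(noise[position:])


rng = random.Random(0)  # fixed seed so the toy scenarios are reproducible
focused = focused_session("S", 8)
mixed = probabilistic_session({"H": 0.1, "P": 0.6, "S": 0.3}, 10, rng)
cycling = cycling_session("PS", 9)
with_pattern = pattern_session("HSP", 10, rng)
```

Because each generated session comes with a known generator, the generator identity can serve as the expected cluster in functional tests.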
Health check results
● basic health check -> mandatory
● complex health check -> areas of improvement
[Table: rows are health checks ordered by difficulty; columns are configurations (Features A, distance a, clustering i; Features A, distance b, clustering i; …; Features B, distance c, clustering j; …)]
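One way to turn a health check into a functional test is sketched below. The check passes when every expected group lands in exactly one cluster and no two groups share a cluster. `dominant_page_clustering` is a deliberately naive stand-in (it is not the clustering from the talk); any function that maps a list of sessions to a list of cluster ids could be plugged in instead.

```python
from collections import Counter


def health_check(cluster_fn, labelled_sessions):
    """labelled_sessions: list of (session string, expected group) pairs."""
    sessions = [session for session, _ in labelled_sessions]
    clusters = cluster_fn(sessions)
    group_to_clusters = {}
    for (_, group), cluster in zip(labelled_sessions, clusters):
        group_to_clusters.setdefault(group, set()).add(cluster)
    # Every expected group must be assigned to a single cluster...
    one_cluster_per_group = all(len(c) == 1 for c in group_to_clusters.values())
    # ...and different groups must not end up in the same cluster.
    distinct = len({frozenset(c) for c in group_to_clusters.values()}) == len(
        group_to_clusters
    )
    return one_cluster_per_group and distinct


def dominant_page_clustering(sessions):
    """Stand-in clustering: the cluster id is the most frequent page."""
    return [Counter(s).most_common(1)[0][0] for s in sessions]


basic = [
    ("PPPPPPPP", "product"), ("HPPPPPPP", "product"),
    ("SSSSSSSS", "search"), ("HSSSSSSS", "search"),
]
complex_case = basic + [("PSPSPSPS", "loop"), ("SPSPSPSP", "loop")]

basic_ok = health_check(dominant_page_clustering, basic)
complex_ok = health_check(dominant_page_clustering, complex_case)
```

Here the naive configuration passes the basic check but fails the complex one, because looping sessions have no dominant page. That is the kind of result the table is meant to surface for each configuration.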
Case study:
Anomaly Detection for Alerting
“Can I get a manually labelled dataset?”
Main goal
Alert clients (in real time) if they have issues on their platform
Anomaly detection in time series
Example: number of users with API errors
The data
[Figures: time series of the number of users with API errors, showing a raised alert, the seasonal pattern and the confidence bounds]
The model = seasonality + bounds
When metric values are beyond the bounds → alert
But how do we know if our model is:
● Raising true alerts?
● Not raising false alerts?
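A minimal sketch of a "seasonality + bounds" model. The slides do not say how the bounds are computed; the per-phase mean ± k standard deviations used here, along with the function names and toy numbers, is an assumption for illustration.

```python
from statistics import mean, stdev


def seasonal_bounds(history, period, k=3.0):
    """Return one (low, high) pair per phase of the seasonal cycle.

    Assumption: bounds are the per-phase mean +/- k standard deviations.
    """
    bounds = []
    for phase in range(period):
        values = history[phase::period]  # all past values at this phase
        centre = mean(values)
        spread = stdev(values) if len(values) > 1 else 0.0
        bounds.append((centre - k * spread, centre + k * spread))
    return bounds


def alerts(series, bounds, start_index=0):
    """Indices of points in `series` that fall outside the bounds of their phase."""
    period = len(bounds)
    flagged = []
    for i, value in enumerate(series):
        low, high = bounds[(start_index + i) % period]
        if value < low or value > high:
            flagged.append(i)
    return flagged


# Toy metric alternating between a low and a high phase (period of 2).
history = [10, 50, 12, 52, 11, 49, 9, 51]
bounds = seasonal_bounds(history, period=2)
new_points = [11, 50, 40, 51]
flagged = alerts(new_points, bounds, start_index=len(history))
```

The value 40 is flagged because it arrives in the low phase, even though it would be unremarkable in the high phase. A single global threshold would not catch it.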
Evaluation #0: visual inspection
Not scalable!
Evaluation #1: forecasting error
Seasonality model: forecasting error (MAE) = 49.25
Last value model: forecasting error (MAE) = 35.24
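For reference, the mean absolute error of the two forecasters can be computed as in the sketch below. The forecaster definitions (previous value, and value one period earlier) and the toy series are assumptions; the numbers are unrelated to the MAE values reported on the slides.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def last_value_mae(series):
    """Forecast every point (from the second on) as the previous observation."""
    return mean_absolute_error(series[1:], series[:-1])


def seasonal_mae(series, period):
    """Forecast every point (from index `period` on) as the value one period earlier."""
    return mean_absolute_error(series[period:], series[:-period])


# Toy series with a perfectly repeating cycle of length 4.
series = [10, 12, 14, 12, 10, 12, 14, 12]
last_mae = last_value_mae(series)
season_mae = seasonal_mae(series, period=4)
```

Forecasting error measures how closely a model tracks the metric, not whether the alerts it raises are the right ones.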
Raised alerts of last value model vs. seasonal model
[Figure panels: Last value; Seasonal. Annotation: False alarm!]
Evaluation #2: anomalous points classification
Do we want to alert on those?
Evaluation #3: anomalous periods
When something happens, it lasts more than 5 min
Example: iPhone launch
Annotated examples for evaluation
Negative periods
● Model is correct if there are no
detections → TN, otherwise FP
Positive periods
● Model is correct if it has >1
detection → TP, otherwise FN
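The period-level rules above can be sketched as follows. The data layout (periods as `(start, end, is_anomalous)` tuples with inclusive bounds, detections as timestamps) and the function names are assumptions. The default `min_detections=2` is a literal reading of ">1 detection"; the threshold is exposed as a parameter because it is a design choice.

```python
def evaluate_periods(periods, detections, min_detections=2):
    """Count TP / FN / TN / FP over annotated periods.

    periods: list of (start, end, is_anomalous), bounds inclusive.
    detections: timestamps at which the model raised a detection.
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for start, end, is_anomalous in periods:
        hits = sum(start <= t <= end for t in detections)
        if is_anomalous:
            # Positive period: enough detections -> TP, otherwise FN.
            counts["TP" if hits >= min_detections else "FN"] += 1
        else:
            # Negative period: no detection at all -> TN, otherwise FP.
            counts["TN" if hits == 0 else "FP"] += 1
    return counts


def precision_recall(counts):
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical annotations, with timestamps in minutes.
periods = [(0, 9, False), (10, 19, True), (20, 29, False), (30, 39, True)]
detections = [12, 13, 25, 31]
counts = evaluate_periods(periods, detections)
precision, recall = precision_recall(counts)
```

Scoring whole periods rather than single points means one long incident counts once, however many points it spans.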
[Table: recall (“We don’t miss any anomalies”) and precision (“We don’t raise false alarms”) for the Last value and Seasonal models]
→ quantifiable results!
Key Takeaways
Academia
● Well defined problem
● Evaluation
● ML Model
● Labels
Industry
● Business problem
● No Evaluation
● ML Model
● No/partial labels
Thank you

 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Evaluation strategies for dealing with partially labelled or unlabelled data

  • 1. Evaluation strategies with partially or unlabeled data 30/11/2023
  • 2. About us Sharone Dayan Machine Learning Engineer @ Contentsquare sharone.dayan@contentsquare.com Daria Stefic, PhD Data Scientist @ Contentsquare daria.stefic@contentsquare.com
  • 3. Agenda: 1. Global picture (evaluation methods); 2. Case studies: a. Purchase Intent Prediction, b. Unsupervised Segment Discovery, c. Anomaly Detection for Alerting; 3. Key takeaways (academia vs. industry)
  • 5. Evaluation methods. Do we have labels? YES: supervised. NO or FEW: semi-supervised/unsupervised, which leads to four questions: Can I use public datasets? Can I generate artificial data? (Unsupervised Segment Discovery) Can I design a proxy? (Purchase Intent Prediction) Can I get a manually labelled dataset? (Anomaly Detection for Alerting)
  • 7. Case study: Purchase Intent Prediction “Can I design a proxy?”
  • 8. Main goal: “Who could have converted?” Diagram labels: non-buyers, buyers, missed purchase segment, purchase intent segment. Detect non-converting users who had converting intentions based on their behaviour (e.g. interaction with product details, add to cart). Design choices: ● Focus on anonymous users ● Focus on retail clients ● No difference between single-item and multi-item purchase ● Offline prediction
  • 9. If we had labels for purchase intent… …we could: 1. directly train a classifier in a supervised way to recognise intent; 2. evaluate our classifier with standard classification metrics (e.g. F1-score) directly on these labels; 3. compare different solutions in an unbiased way. …but: the only labels we have for purchase intent are from converting sessions.
  • 10. Purchase Intent Evaluation: confusion matrix of actual conversion (positive/negative) against predicted intent (positive/negative). We don’t want any converters predicted as ‘Not intended to purchase’. We want some non-converters predicted as ‘Intended to purchase’.
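
The two requirements on this slide can be turned into two numbers: recall on converting sessions (ideally close to 1) and the share of non-converters flagged as having intent (expected to be above 0). The following is a minimal sketch under that reading; the function name `intent_evaluation`, the 0/1 encoding and the toy data are assumptions made for illustration, not part of the deck.

```python
def intent_evaluation(converted, predicted_intent):
    """Summarise an intent model against conversion labels.

    converted / predicted_intent: equal-length lists of 0/1 flags, one per session.
    """
    if len(converted) != len(predicted_intent):
        raise ValueError("inputs must have the same length")
    conv_preds = [p for c, p in zip(converted, predicted_intent) if c == 1]
    non_conv_preds = [p for c, p in zip(converted, predicted_intent) if c == 0]
    # Converters should (almost) all be predicted as 'intended to purchase'.
    recall = sum(conv_preds) / len(conv_preds) if conv_preds else float("nan")
    # Some non-converters should be flagged: they form the missed purchase segment.
    rate = sum(non_conv_preds) / len(non_conv_preds) if non_conv_preds else float("nan")
    return {"converter_recall": recall, "non_converter_intent_rate": rate}


# Toy data (hypothetical): 3 converting and 5 non-converting sessions.
converted = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 1, 0]
result = intent_evaluation(converted, predicted)
print(result)
```

Note that the second number cannot be read as a false positive rate, because non-converters with real intent are exactly what the model is supposed to find.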
  • 11. © Contentsquare 2022. We need proxy labels for purchase intent: 1. converting sessions have the intention to purchase by definition; 2. sessions that put something in the cart and purchase it in a later session probably had intention to purchase. “Strict” proxy set (merchandising CS clients only): items added to cart in the current session are purchased in a later session of the same user. “Loose” proxy set (all CS clients): an add-to-cart event in the current session culminates in a later converting session of the same user, on the same day.
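
As an illustration of the “loose” proxy set, the sketch below labels a session as positive if it converted, or if it contains an add-to-cart event and the same user has a later converting session on the same day. The session fields (`id`, `user`, `start`, `added_to_cart`, `converted`) and the function name are hypothetical; the “strict” set would additionally need item-level data and is not covered here.

```python
from datetime import datetime


def loose_proxy_labels(sessions):
    """Return {session_id: 0 or 1} following the 'loose' proxy definition."""
    labels = {}
    for s in sessions:
        if s["converted"]:
            # Rule 1: converting sessions have purchase intent by definition.
            labels[s["id"]] = 1
            continue
        # Rule 2 (loose): a later converting session, same user, same calendar day.
        later_conversion = any(
            o["user"] == s["user"]
            and o["converted"]
            and o["start"] > s["start"]
            and o["start"].date() == s["start"].date()
            for o in sessions
        )
        labels[s["id"]] = int(s["added_to_cart"] and later_conversion)
    return labels


# Toy sessions (hypothetical data).
sessions = [
    {"id": "a1", "user": "u1", "start": datetime(2023, 11, 30, 9, 0), "added_to_cart": True, "converted": False},
    {"id": "a2", "user": "u1", "start": datetime(2023, 11, 30, 18, 0), "added_to_cart": False, "converted": True},
    {"id": "b1", "user": "u2", "start": datetime(2023, 11, 30, 10, 0), "added_to_cart": True, "converted": False},
    {"id": "b2", "user": "u2", "start": datetime(2023, 12, 1, 10, 0), "added_to_cart": False, "converted": True},
    {"id": "c1", "user": "u3", "start": datetime(2023, 11, 30, 11, 0), "added_to_cart": False, "converted": False},
]
labels = loose_proxy_labels(sessions)
print(labels)
```

Session `b1` stays negative in this toy data because the conversion happens on the following day, which the same-day condition excludes.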
  • 12. Schematic predictions. Diagram labels: Non-buyers (test set); Buyers (training set); Buyers (test set); Predict purchase; Predict purchase intent; Add to cart leading to conversion in next sessions; MISSED PURCHASE
  • 13. Case study: User Segment Discovery “Can I generate artificial data?”
  • 14. Main goal: “What are the typical paths people take?” Understand the main user personas based on the user’s page visit sequence, using unsupervised learning. Hypothetical outcome: here are 3 typical visitor personas: ● Early leaver: Home page - PDP - Exit ● Heavy searcher: Home - Search - Multiple PDPs - Cart - Exit ● Golden converting journey: Home page - PLP - PDP - Cart - Delivery - Payment
  • 15. Wait, what… Evaluating unsupervised? How to include business constraints? How to benchmark different settings (features, distances, clustering algorithms, etc.)?
  • 16. Idea ● 3 categories ○ P = Product page ○ S = Search results ○ H = Home page ● 3 patterns: - Mainly visiting product pages - Mainly visiting search results - Looping between product pages and search results ● Example sessions: HSSSSSSS, PSPSPSPSP, HSSSSSSSS, HPPPPPPPP, SSSSSSSS, HSPSPSPSP, HSPPPPPPP
  • 17. We need to validate the clustering “health”: toy scenarios where we have clear expectations of what the clustering result should be, plus functional tests run around them. 4 types of session generators (artificial data): - sessions focused on one specific page - sessions having a given probability for each page group - “cycling” sessions always coming back to the same sequence - sessions containing a specific pattern
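
The four generator types could look roughly like the sketch below, which encodes sessions as strings over the H/S/P page categories from the “Idea” slide. All function names, parameters and the string encoding are assumptions made for illustration; the deck does not show the actual generators.

```python
import random

PAGES = ["H", "S", "P"]  # Home page, Search results, Product page


def focused_session(page, length):
    """Session focused on one specific page, e.g. 'SSSSSSSS'."""
    return page * length


def probabilistic_session(weights, length, rng):
    """Session where each page is drawn with a given probability (weights: dict page -> weight)."""
    return "".join(rng.choices(PAGES, weights=[weights[p] for p in PAGES], k=length))


def cycling_session(cycle, length):
    """Session that keeps coming back to the same sequence, e.g. 'PSPSPSPSP'."""
    repeats = length // len(cycle) + 1
    return (cycle * repeats)[:length]


def pattern_session(pattern, length, rng):
    """Random session of the given length that contains `pattern` at a random position."""
    if len(pattern) > length:
        raise ValueError("pattern is longer than the session")
    filler_len = length - len(pattern)
    filler = [rng.choice(PAGES) for _ in range(filler_len)]
    pos = rng.randint(0, filler_len)  # inclusive on both ends
    return "".join(filler[:pos]) + pattern + "".join(filler[pos:])


rng = random.Random(0)  # seeded so the toy scenarios are reproducible
toy_sessions = {
    "focused": focused_session("S", 8),
    "probabilistic": probabilistic_session({"H": 0.1, "S": 0.8, "P": 0.1}, 9, rng),
    "cycling": cycling_session("PS", 9),
    "pattern": pattern_session("SPS", 10, rng),
}
print(toy_sessions)
```

Because each generator has a known intent, the generator name can serve as the expected cluster label in functional tests.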
  • 18. Health check results: each setting (Features A, distance a, clustering i; Features A, distance b, clustering i; …; Features B, distance c, clustering j; …) is scored against health checks of increasing difficulty. Basic health check -> mandatory. Complex health check -> areas of improvement.
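
One way to turn a health check into a pass/fail functional test is to compare cluster assignments with the known generator of each artificial session. The sketch below uses cluster purity as the score, which is an assumption: the deck does not name the metric it uses. `pages_visited_clusters` is only a placeholder standing in for a real features + distance + clustering setting, and the expected labels are one reading of the sessions on the “Idea” slide.

```python
from collections import Counter, defaultdict


def cluster_purity(expected, assigned):
    """Share of sessions that belong to the majority expected label of their cluster."""
    members = defaultdict(list)
    for label, cluster in zip(expected, assigned):
        members[cluster].append(label)
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority / len(expected)


def health_check(cluster_fn, sessions, expected, min_purity=1.0):
    """Run one setting on toy sessions and report whether it meets expectations."""
    assigned = cluster_fn(sessions)
    purity = cluster_purity(expected, assigned)
    same_count = len(set(assigned)) == len(set(expected))
    return {"purity": purity, "passed": purity >= min_purity and same_count}


def pages_visited_clusters(sessions):
    # Placeholder "clustering": groups sessions by the set of non-home pages visited.
    return ["".join(sorted(set(s) - {"H"})) for s in sessions]


sessions = ["HSSSSSSS", "PSPSPSPSP", "HSSSSSSSS", "HPPPPPPPP", "SSSSSSSS", "HSPSPSPSP", "HSPPPPPPP"]
expected = ["search", "loop", "search", "product", "search", "loop", "product"]
report = health_check(pages_visited_clusters, sessions, expected)
print(report)
```

In this toy run the placeholder puts `HSPPPPPPP` with the looping sessions, so the check fails, which is the kind of signal that would mark a setting as needing improvement.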
  • 19. Case study: Anomaly Detection for Alerting “Can I get a manually labelled dataset?”
  • 20. Anomaly detection in time series. Main goal: alert clients (in real time) if they have issues on their platform. Example: number of users with API errors. Alert!
  • 21. The data. Example: number of users with API errors, showing a seasonal pattern.
  • 23. The model = seasonality + bounds. When metric values are beyond the bounds → alert. But how do we know if our model is: ● Raising true alerts? ● Not raising false alerts?
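
The deck does not say how the seasonality or the bounds are estimated. The sketch below assumes the simplest version: the seasonal baseline is the mean of past values at the same position in the cycle, and the bounds are that mean plus or minus `k` standard deviations. The names `seasonal_bounds`, `detect_alerts`, `period` and `k` are illustrative, as is the toy data.

```python
from statistics import mean, pstdev


def seasonal_bounds(history, period, k=3.0):
    """Return one (low, high) pair per position in the seasonal cycle."""
    bounds = []
    for phase in range(period):
        values = history[phase::period]  # past values at this position of the cycle
        center = mean(values)
        spread = pstdev(values)
        bounds.append((center - k * spread, center + k * spread))
    return bounds


def detect_alerts(new_values, bounds, start_phase=0):
    """Flag each new value that falls outside the bounds of its phase."""
    period = len(bounds)
    alerts = []
    for i, value in enumerate(new_values):
        low, high = bounds[(start_phase + i) % period]
        alerts.append(not (low <= value <= high))
    return alerts


# Toy metric with a cycle of 4 points (hypothetical data), three past cycles.
history = [10, 20, 30, 20, 12, 22, 28, 18, 11, 18, 32, 22]
bounds = seasonal_bounds(history, period=4)
alerts = detect_alerts([11, 21, 60, 19], bounds)
print(alerts)
```

Only the third new value (60, against a baseline of 30) falls outside its bounds and raises an alert in this example.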
  • 24. Evaluation #0: visual inspection. Not scalable!
  • 25. Evaluation #1: forecasting error. Seasonality model: forecasting error (MAE) = 49.25
  • 26. Evaluation #1: forecasting error. Last value model: forecasting error (MAE) = 35.24
  • 27. Evaluation #1: forecasting error. Raised alerts of last value model vs. seasonal model (plots: last value, seasonal). False alarm!
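
For reference, the mean absolute error used in Evaluation #1 is the average of |actual − forecast| over the evaluated points. The sketch below computes it for a last value forecast and for a seasonal naive forecast (the value one cycle earlier), which is used here as a stand-in because the deck does not specify its seasonality model. The toy series and function names are assumptions.

```python
def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Toy series with a cycle of 4 points (hypothetical data).
series = [10, 20, 30, 20, 12, 22, 28, 18, 11, 18, 32, 22]
period = 4

# Evaluate both models on the same points: everything after the first cycle.
actual = series[period:]
last_value_forecast = series[period - 1:-1]  # forecast for t is the value at t-1
seasonal_forecast = series[:-period]         # forecast for t is the value at t-period

mae_last_value = mae(actual, last_value_forecast)
mae_seasonal = mae(actual, seasonal_forecast)
print(mae_last_value, mae_seasonal)
```

On the deck's real data the ranking is the opposite of this toy example (the last value model has the lower MAE), and the forecasting error alone does not say which model raises the more useful alerts.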
  • 28. Evaluation #2: anomalous points classification. Do we want to alert on those?
  • 29. Evaluation #3: anomalous periods. When something happens, it lasts more than 5 min. Example: iPhone launch
  • 30. Evaluation #3: anomalous periods. Annotated examples for evaluation. Negative periods ● Model is correct if there are no detections → TN, otherwise FP. Positive periods ● Model is correct if it has >1 detection → TP, otherwise FN
  • 31. Evaluation #3: anomalous periods. Annotated examples for evaluation. Negative periods ● Model is correct if there are no detections → TN, otherwise FP. Positive periods ● Model is correct if it has >1 detection → TP, otherwise FN
  • 32. Evaluation #3: anomalous periods. Table comparing the last value and seasonal models on recall (we don’t miss any anomalies) and precision (we don’t raise false alarms) → quantifiable results!
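
The period-based rules can be sketched as follows: each annotated period is counted once as TP, FN, FP or TN depending on how many detections fall inside it, and precision and recall are derived from those counts. The period format, the function names and the `min_detections` parameter are assumptions. The slide writes “>1 detection” for positive periods; the default of 1 used here reads that as “at least one”, and can be set to 2 for the stricter reading.

```python
def evaluate_periods(periods, detections, min_detections=1):
    """Count TP/FN/FP/TN over annotated periods.

    periods: list of (start, end, is_anomalous) with end exclusive.
    detections: list of timestamps at which the model raised an alert.
    """
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for start, end, is_anomalous in periods:
        hits = sum(start <= t < end for t in detections)
        if is_anomalous:
            counts["TP" if hits >= min_detections else "FN"] += 1
        else:
            counts["FP" if hits > 0 else "TN"] += 1
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy annotations (hypothetical): two positive and two negative periods.
periods = [(0, 10, True), (10, 20, False), (20, 30, True), (30, 40, False)]
detections = [3, 4, 15]
counts = evaluate_periods(periods, detections)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Counting per period rather than per point means a long incident with many detections is scored once, so models are compared on the incidents they catch and the quiet periods they disturb.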
  • 34. Academia: well-defined problem; evaluation; ML model; labels. Industry: business problem; no evaluation; ML model; no/partial labels.