Evaluation strategies with partially labelled or unlabelled data
30/11/2023
About us
Sharone Dayan
Machine Learning Engineer
@ Contentsquare
sharone.dayan@contentsquare.com
Daria Stefic, PhD
Data Scientist
@ Contentsquare
daria.stefic@contentsquare.com
Agenda
1. Global Picture: evaluation methods
2. Case studies
a. Purchase Intent Prediction
b. Unsupervised Segment Discovery
c. Anomaly Detection for Alerting
3. Key takeaways: academia vs. industry
Global Picture
Evaluation methods
Do we have labels?
● YES → Supervised
● NO or FEW → Semi-supervised / Unsupervised, and the evaluation questions become:
○ Can I use public datasets?
○ Can I generate artificial data? → case study: Unsupervised Segment Discovery
○ Can I design a proxy? → case study: Purchase Intent Prediction
○ Can I get a manually labelled dataset? → case study: Anomaly Detection for Alerting
Case studies
Case study:
Purchase Intent Prediction
“Can I design a proxy?”
Main goal: “Who could have converted?”
[Figure: buyers vs. non-buyers; labelled regions: purchase intent segment, missed purchase segment]
Detect non-converting users who had the intention to convert, based on their behaviour (e.g. interaction with product details, add-to-cart).
Design choices:
● Focus on anonymous users
● Focus on retail clients
● No difference between single-item and multi-item purchases
● Offline prediction
If we had labels for purchase intent…
…we could:
1. directly train a classifier in a supervised way to recognise intent;
2. evaluate the classifier directly on these labels with standard classification metrics (e.g. F1-score);
3. compare different solutions in an unbiased way.
…but the only purchase-intent labels we have come from converting sessions.
Purchase Intent Evaluation (predicted intent vs. actual conversion)
● Actual converters: we don’t want any converter predicted as ‘Not intended to purchase’ (no false negatives).
● Actual non-converters: we want some non-converters predicted as ‘Intended to purchase’.
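A minimal sketch of how the two requirements in this confusion matrix could be checked, assuming per-session boolean flags for actual conversion and predicted intent (`intent_evaluation` and its arguments are hypothetical names, not part of the deck):

```python
from typing import Sequence


def intent_evaluation(converted: Sequence[bool], predicted_intent: Sequence[bool]) -> dict:
    """Check the two requirements of the purchase-intent confusion matrix.

    converted[i]: True if session i actually converted (the only reliable labels).
    predicted_intent[i]: True if the model predicts 'Intended to purchase' for session i.
    """
    if len(converted) != len(predicted_intent):
        raise ValueError("inputs must have the same length")

    converters = [p for c, p in zip(converted, predicted_intent) if c]
    non_converters = [p for c, p in zip(converted, predicted_intent) if not c]

    # Requirement 1: no converter predicted as 'Not intended to purchase' (recall on converters == 1).
    converter_recall = sum(converters) / len(converters) if converters else float("nan")
    # Requirement 2: some non-converters predicted as 'Intended to purchase' (candidate missed purchases).
    flagged_share = sum(non_converters) / len(non_converters) if non_converters else float("nan")

    return {
        "converter_recall": converter_recall,
        "non_converter_flagged_share": flagged_share,
        "meets_requirements": converter_recall == 1.0 and flagged_share > 0.0,
    }


print(intent_evaluation([True, True, False, False, False], [True, True, True, False, False]))
```

Note that a model flagging every session passes both checks, so the flagged share of non-converters also has to stay plausible; this is one reason proxy labels are useful.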
© Contentsquare 2022
We need proxy labels for purchase intent
1. Converting sessions have the intention to purchase by definition.
2. Sessions that add something to the cart and purchase it in a later session probably had the intention to purchase.
● “Strict” proxy set: items added to the cart in the current session are purchased in a later session of the same user. Merchandising CS clients only.
● “Loose” proxy set: an add-to-cart event in the current session culminates in a later converting session of the same user, on the same day. All CS clients.
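The two proxy definitions can be sketched as follows, assuming a hypothetical `Session` record with a user id, a start time, and the sets of items added to cart and purchased. These field names and the function `proxy_labels` are illustrative assumptions, not Contentsquare's data model:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Session:
    # Hypothetical session record; field names are illustrative assumptions.
    user_id: str
    session_id: str
    start: datetime
    added_items: set = field(default_factory=set)      # items added to cart in this session
    purchased_items: set = field(default_factory=set)  # items purchased in this session

    @property
    def converted(self) -> bool:
        return bool(self.purchased_items)


def proxy_labels(sessions: List[Session]) -> Dict[str, Dict[str, bool]]:
    """Return {session_id: {"strict": bool, "loose": bool}} proxy intent labels."""
    by_user: Dict[str, List[Session]] = {}
    for s in sessions:
        by_user.setdefault(s.user_id, []).append(s)

    labels = {}
    for user_sessions in by_user.values():
        user_sessions.sort(key=lambda s: s.start)
        for i, s in enumerate(user_sessions):
            later = user_sessions[i + 1:]
            # Strict: an item added to cart now is purchased in a later session of the same user.
            strict = any(s.added_items & t.purchased_items for t in later)
            # Loose: an add-to-cart now, then any converting session of the same user on the same day.
            loose = bool(s.added_items) and any(
                t.converted and t.start.date() == s.start.date() for t in later
            )
            labels[s.session_id] = {"strict": strict, "loose": loose}
    return labels
```

The strict rule needs item-level purchase data, which is presumably why it is limited to merchandising clients; the loose rule only needs add-to-cart and conversion events, but restricts the window to the same day.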
Schematic predictions
[Figure: predictions on buyers (training set), buyers (test set) and non-buyers (test set); labels: “Predict purchase”, “Predict purchase intent”, “Add to cart leading to conversion in next sessions”, “MISSED PURCHASE”]
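A minimal sketch of the evaluation on the non-buyer test set, assuming the model outputs an intent score per non-buying session and that proxy labels (add to cart leading to conversion in next sessions) exist for the same sessions. The 0.5 threshold and all names are illustrative assumptions:

```python
from typing import Dict


def missed_purchase_report(
    intent_scores: Dict[str, float],
    proxy_positive: Dict[str, bool],
    threshold: float = 0.5,
) -> dict:
    """Evaluate purchase-intent predictions on non-buying test sessions.

    intent_scores: model score per non-buying session (hypothetical model output).
    proxy_positive: proxy label per session (True if the add-to-cart led to a later conversion).
    threshold: score at or above which a session is predicted as 'Intended to purchase'.
    """
    flagged = {sid for sid, score in intent_scores.items() if score >= threshold}
    positives = {sid for sid, is_pos in proxy_positive.items() if is_pos}
    # Proxy recall: share of proxy-positive sessions that the model flags.
    proxy_recall = len(flagged & positives) / len(positives) if positives else float("nan")
    return {
        "missed_purchase_segment": sorted(flagged),  # non-buyers predicted with intent
        "segment_share": len(flagged) / len(intent_scores) if intent_scores else float("nan"),
        "proxy_recall": proxy_recall,
    }


print(missed_purchase_report({"s1": 0.9, "s2": 0.2, "s3": 0.6, "s4": 0.1},
                             {"s1": True, "s2": True, "s3": False, "s4": False}))
```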
Case study:
User Segment Discovery
“Can I generate artificial data?”
Main goal
Use unsupervised learning to understand the main user personas from users’ page-visit sequences.
Hypothetical outcome, with 3 typical visitor personas:
● Early leaver: Home page - PDP - Exit
● Heavy searcher: Home page - Search - multiple PDPs - Cart - Exit
● Golden converting journey: Home page - PLP - PDP - Cart - Delivery - Payment
(PDP = product detail page, PLP = product listing page)
“What are the typical paths people take?”
Wait, what… evaluating an unsupervised model?
● How do we include business constraints?
● How do we benchmark different settings (features, distances, clustering algorithms, etc.)?
Idea
Toy sessions built from 3 page categories:
○ H = Home page
○ S = Search results
○ P = Product page
Example sequences: HSSSSSSS, PSPSPSPSP, HSSSSSSSS, HPPPPPPPP, SSSSSSSS, HSPSPSPSP, HSPPPPPPP
● 3 patterns we expect the clustering to recover:
- Mainly visiting product pages
- Mainly visiting search results
- Looping between product pages and search results
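To make the idea concrete, here is a minimal sketch of features that separate the three toy patterns, assuming page-category shares and the rate of P↔S switches as features (an illustrative choice, not necessarily the features used in the deck). A clustering run on such features would be expected to recover the three groups:

```python
from collections import Counter
from typing import Dict

# Toy sessions from the slide: H = Home page, S = Search results, P = Product page.
TOY_SESSIONS = ["HSSSSSSS", "PSPSPSPSP", "HSSSSSSSS", "HPPPPPPPP", "SSSSSSSS", "HSPSPSPSP", "HSPPPPPPP"]


def session_features(seq: str) -> Dict[str, float]:
    """Hand-crafted features (illustrative assumption): page shares and P<->S switch rate."""
    if not seq:
        raise ValueError("empty session")
    counts = Counter(seq)
    n = len(seq)
    # Number of consecutive steps that switch between a product page and search results.
    switches = sum(1 for a, b in zip(seq, seq[1:]) if {a, b} == {"P", "S"})
    return {
        "share_P": counts["P"] / n,
        "share_S": counts["S"] / n,
        "ps_switch_rate": switches / max(n - 1, 1),
    }


def expected_pattern(seq: str) -> str:
    """Rule encoding the expected pattern, used as ground truth for the toy data."""
    f = session_features(seq)
    if f["ps_switch_rate"] >= 0.5:
        return "looping P<->S"
    return "mainly product pages" if f["share_P"] > f["share_S"] else "mainly search results"


for s in TOY_SESSIONS:
    print(s, expected_pattern(s))
```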
We need to validate the clustering “health”: build toy scenarios where we have clear expectations of what the clustering result should be, and run functional tests around them.
4 types of session generators (artificial data):
- sessions focused on one specific page
- sessions with a given probability for each page group
- “cycling” sessions that always come back to the same sequence
- sessions containing a specific pattern
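The four generators could be sketched as below, using the toy H/S/P alphabet. The function names, the noise level, the weights and the inserted pattern `HSPP` are illustrative assumptions; the point is that every session carries the label of the generator that produced it, so the expected clustering is known:

```python
import random
from typing import List, Sequence, Tuple

PAGES = ["H", "S", "P"]  # Home page, Search results, Product page (toy alphabet)


def focused_session(page: str, length: int, rng: random.Random, noise: float = 0.1) -> str:
    """Sessions focused on one specific page: mostly `page`, with occasional random pages."""
    return "".join(page if rng.random() >= noise else rng.choice(PAGES) for _ in range(length))


def weighted_session(weights: Sequence[float], length: int, rng: random.Random) -> str:
    """Sessions with a given probability for each page group (weights follow PAGES order)."""
    return "".join(rng.choices(PAGES, weights=weights, k=length))


def cycling_session(cycle: str, length: int) -> str:
    """'Cycling' sessions that always come back to the same sequence, e.g. 'PS' -> 'PSPSPS...'."""
    return (cycle * (length // len(cycle) + 1))[:length]


def pattern_session(pattern: str, length: int, rng: random.Random) -> str:
    """Sessions containing a specific pattern at a random position, padded with random pages."""
    if len(pattern) > length:
        raise ValueError("pattern longer than session")
    filler = rng.choices(PAGES, k=length - len(pattern))
    pos = rng.randint(0, len(filler))
    return "".join(filler[:pos]) + pattern + "".join(filler[pos:])


def make_dataset(n_per_group: int, length: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Labelled toy dataset: (session, generator label) pairs with known ground truth."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_group):
        data.append((focused_session("P", length, rng), "focused_P"))
        data.append((weighted_session([0.1, 0.8, 0.1], length, rng), "mostly_S"))
        data.append((cycling_session("PS", length), "cycling_PS"))
        data.append((pattern_session("HSPP", length, rng), "pattern_HSPP"))
    return data
```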
Health check results
[Figure: grid of health checks ordered by difficulty (basic health check → mandatory; complex health check → areas of improvement) against settings such as “Features A, distance a, clustering i”, “Features A, distance b, clustering i”, …, “Features B, distance c, clustering j”, …]
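One way to fill such a grid, sketched under stated assumptions: each health check pairs toy sessions with the groups we expect, each setting is a function returning cluster labels, and a setting passes a check when the adjusted Rand index (ARI) against the expected groups reaches a threshold (0.9 here, an arbitrary choice). The two settings below are deliberately naive, hypothetical stand-ins for real feature/distance/clustering combinations:

```python
from collections import Counter
from math import comb
from typing import Callable, Dict, List, Sequence, Set, Tuple


def adjusted_rand_index(true: Sequence, pred: Sequence) -> float:
    """Adjusted Rand index between two labelings of the same items (1.0 = same partition)."""
    n = len(true)
    if n != len(pred) or n < 2:
        raise ValueError("need two labelings of the same length, with at least 2 items")
    total = comb(n, 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(true, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both partitions are trivial and identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# A "setting" maps a list of sessions to cluster labels (features + distance + clustering).
Clusterer = Callable[[List[str]], List[str]]
# A check is (level, sessions, expected groups); level is "basic" or "complex".
Check = Tuple[str, List[str], List[str]]


def run_health_checks(
    settings: Dict[str, Clusterer], checks: Dict[str, Check], min_ari: float = 0.9
) -> Tuple[Dict[Tuple[str, str], bool], Set[str]]:
    """Return pass/fail per (check, setting), plus settings failing a mandatory basic check."""
    results = {}
    for check_name, (level, sessions, expected) in checks.items():
        for setting_name, cluster in settings.items():
            ari = adjusted_rand_index(expected, cluster(sessions))
            results[(check_name, setting_name)] = ari >= min_ari
    rejected = {s for (c, s), ok in results.items() if not ok and checks[c][0] == "basic"}
    return results, rejected


# Deliberately naive stand-in settings (hypothetical, for illustration only).
def by_dominant_page(sessions: List[str]) -> List[str]:
    return [Counter(s).most_common(1)[0][0] for s in sessions]


def by_first_page(sessions: List[str]) -> List[str]:
    return [s[0] for s in sessions]


SETTINGS = {"dominant page": by_dominant_page, "first page": by_first_page}
CHECKS = {
    "focused pages (basic)": ("basic", ["HPPPPP", "HPPPPS", "HSSSSS", "HSSSPS"], ["P", "P", "S", "S"]),
    "looping vs product (complex)": (
        "complex", ["PSPSPS", "SPSPSP", "PPPPPS", "PPPPPP"], ["loop", "loop", "prod", "prod"]),
}

results, rejected = run_health_checks(SETTINGS, CHECKS)
print(rejected)  # settings failing a mandatory basic check
```

Failing a basic check rejects a setting, while failing a complex check only flags an area of improvement, mirroring the two difficulty levels on the slide.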
Case study:
Anomaly Detection for Alerting
“Can I get a manually labelled dataset?”
Main goal
Alert clients in real time if they have issues on their platform: anomaly detection in time series.
[Figure: example metric, number of users with API errors, with an “Alert!” marker]
The data
[Figure: number of users with API errors, showing a seasonal pattern]
[Figure: the same series with confidence bounds]
The model = seasonality + bounds
When metric values are beyond the bounds → alert
But how do we know whether our model is:
● raising true alerts?
● not raising false alerts?
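A minimal sketch of “seasonality + bounds”, assuming a seasonal naive forecast (the value one period earlier) and bounds of ±k standard deviations of past forecast residuals. Both choices, and the `min_history` warm-up, are illustrative stand-ins rather than the model used in the deck:

```python
import statistics
from typing import List, Optional, Tuple


def seasonal_bounds_alerts(
    series: List[float], period: int, k: float = 3.0, min_history: Optional[int] = None
) -> List[Tuple[int, float, float, float]]:
    """Flag points outside forecast +/- k * sigma; returns (index, value, lower, upper) per alert.

    Forecast: seasonal naive model, i.e. the value observed one `period` earlier.
    sigma: sample standard deviation of the residuals of all earlier points.
    No alert is raised until at least `min_history` residuals exist (default: one period).
    """
    min_history = max(2, period if min_history is None else min_history)
    alerts = []
    residuals: List[float] = []
    for t in range(period, len(series)):
        forecast = series[t - period]
        if len(residuals) >= min_history:
            sigma = statistics.stdev(residuals)
            lower, upper = forecast - k * sigma, forecast + k * sigma
            if not lower <= series[t] <= upper:
                alerts.append((t, series[t], lower, upper))
        residuals.append(series[t] - forecast)
    return alerts
```

With a naive seasonal forecast, an anomaly also distorts the forecast one period later and inflates the residual spread; a more robust seasonal estimate (e.g. a median over several past periods) would limit both effects.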
Evaluation #0: visual inspection
Not scalable!
Evaluation #1: forecasting error
● Seasonality model: forecasting error (MAE) = 49.25
● Last value model: forecasting error (MAE) = 35.24
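The forecasting-error comparison can be reproduced on any series with a sketch like this one, assuming one-step-ahead forecasts and scoring both baselines on the same time steps (function names are illustrative):

```python
from typing import List


def mae(actual: List[float], forecast: List[float]) -> float:
    """Mean absolute error between aligned actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def compare_forecasters(series: List[float], period: int) -> dict:
    """One-step-ahead MAE of two baseline models, scored on the same points (t >= period)."""
    if not 1 <= period < len(series):
        raise ValueError("period must be between 1 and len(series) - 1")
    actual = series[period:]
    last_value = series[period - 1:-1]  # forecast(t) = series[t - 1]
    seasonal = series[:-period]         # forecast(t) = series[t - period]
    return {"last_value": mae(actual, last_value), "seasonal": mae(actual, seasonal)}


print(compare_forecasters([1, 2, 3, 4, 5, 6], period=2))
```

MAE measures forecast accuracy, not alert quality, so a lower MAE alone does not tell us which model raises the right alerts.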
Raised alerts of the last value model vs. the seasonal model
[Figure: alerts raised by each model (Last value, Seasonal); one alert is marked “False alarm!”]
Evaluation #2: anomalous points classification
Do we want to alert on those?
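Point-level classification treats every time step as one prediction. A minimal sketch, assuming boolean per-step annotations and alerts (names are illustrative), shows how it penalises a model that alerts once during a longer incident:

```python
from typing import Sequence


def pointwise_scores(labels: Sequence[bool], alerts: Sequence[bool]) -> dict:
    """Point-level precision and recall: each time step counts as one classification."""
    if len(labels) != len(alerts):
        raise ValueError("labels and alerts must have the same length")
    tp = sum(1 for y, a in zip(labels, alerts) if y and a)
    fp = sum(1 for y, a in zip(labels, alerts) if not y and a)
    fn = sum(1 for y, a in zip(labels, alerts) if y and not a)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# A 6-step incident caught by a single alert: recall is only 1/6 at point level.
labels = [False] * 4 + [True] * 6 + [False] * 4
alerts = [False] * 5 + [True] + [False] * 8
print(pointwise_scores(labels, alerts))
```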
Evaluation #3: anomalous periods
When something happens, it lasts more than 5 minutes (example: an iPhone launch).
Annotated examples for evaluation
● Negative periods: the model is correct if there are no detections → TN, otherwise FP
● Positive periods: the model is correct if there is at least one detection → TP, otherwise FN
Evaluation #3: anomalous periods, results
[Figure: recall and precision of the Last value and Seasonal models]
● Recall: we don’t miss any anomalies
● Precision: we don’t raise false alarms
→ Quantifiable results!
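The period-level rules can be sketched as follows, assuming annotated periods given as inclusive (start, end) index ranges with an anomalous flag, and alerts given as time indices; alerts outside every annotated period are ignored. `period_scores` is a hypothetical name, and `min_detections` (1 by default) sets how many alerts a positive period needs to count as detected:

```python
from typing import Sequence, Tuple

Period = Tuple[int, int, bool]  # (start, end) inclusive time indices, True if annotated as anomalous


def period_scores(periods: Sequence[Period], alert_times: Sequence[int], min_detections: int = 1) -> dict:
    """Score alerts against annotated periods.

    Positive period: TP if it contains at least `min_detections` alerts, otherwise FN.
    Negative period: TN if it contains no alert, otherwise FP.
    """
    tp = fn = tn = fp = 0
    for start, end, is_anomalous in periods:
        n_alerts = sum(1 for t in alert_times if start <= t <= end)
        if is_anomalous:
            if n_alerts >= min_detections:
                tp += 1
            else:
                fn += 1
        elif n_alerts == 0:
            tn += 1
        else:
            fp += 1
    recall = tp / (tp + fn) if tp + fn else float("nan")     # "we don't miss any anomalies"
    precision = tp / (tp + fp) if tp + fp else float("nan")  # "we don't raise false alarms"
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp, "recall": recall, "precision": precision}


periods = [(0, 9, False), (10, 19, True), (20, 29, False), (30, 39, True)]
print(period_scores(periods, [12, 25]))
```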
Key Takeaways
Academia:
● Well-defined problem
● Labels
● ML model
● Evaluation
Industry:
● Business problem
● No/partial labels
● ML model
● No evaluation
Thank you

Uploaded by Paris Women in Machine Learning and Data Science
