MEASURING
OPERATIONAL
QUALITY OF
RECOMMENDATIONS
Lina Weichbrodt
12th ACM Conference on Recommender Systems
04-10-2018
2
Speaker: Lina Weichbrodt @rmminusrslash
● Research Engineer in the recommendation algorithms team at
Zalando
● Develop scalable machine learning products
● Zalando: Europe’s leading online fashion platform, €4.5 B
revenue in 2017, 17 European countries
3
Article Recommendation
How can we monitor
recommendation quality in real time?
PROBLEM
4
OUTLINE
● Problem Analysis
○ Status Quo Algorithmic Service Monitoring
○ Risk Areas
● Solution
○ Definition of Response Quality
○ How to Select a Quality Metric
● Case Study
5
PROBLEM ANALYSIS
6
STATUS QUO ALGORITHMIC SERVICE MONITORING
ZMON
● Status Quo: speed and errors during processing
(latency and availability)
● Good, but not enough!
7
RISK AREAS: OVERVIEW
8
● The input data changes. Typical causes are a client
application that ships a bug (e.g., lowercasing a
case-sensitive identifier) or changes a feature in a way that
affects the data distribution, such as opening the product
cart to all users when it was previously available only to
logged-in users. If the change goes undetected, training data
and serving data can diverge.
● The model is updated and the new version is inferior to the
previous one.
● The latest deployment of the stack that processes the
request and serves the model contains a bug.
RISK AREAS
9
● Changes concerning external services lead to performance
loss. An example in an e-commerce setting is switching to a
different microservice to obtain article metadata used for
filtering the recommendations.
● Changes in configuration (applied filters, algorithms, A/B
test configuration) are faulty or lead to unforeseen quality
degradation.
RISK AREAS
10
We need quality monitoring of
delivered recommendations
CONCLUSION
11
SOLUTION
12
● Typical definition of a successful response: fast enough,
no HTTP errors.
● Suggested definition: fast enough, successful
according to the business case.
Example: For personalized recommendations on the Zalando
Home Page we expect…
DEFINITION OF RESPONSE QUALITY
more than 95%
HTTP 200 OK responses
in under 200 ms.
13
DEFINITION OF RESPONSE QUALITY
more than 65% of responses have
at least four articles in under 200 ms.
more than 95%
HTTP 200 OK responses
in under 200 ms.
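The difference between the two definitions can be sketched in a few lines of Python. The `Response` record, its field names and the sample data are hypothetical and only serve the illustration; the thresholds (four articles, 200 ms) are the ones from the slide.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical record of one served request.
    status: int         # HTTP status code
    latency_ms: float   # time taken to serve the request
    n_articles: int     # number of recommended articles returned


def technical_success_rate(responses, max_latency_ms=200):
    """Typical definition: HTTP 200 and fast enough."""
    if not responses:
        return 0.0
    ok = sum(1 for r in responses
             if r.status == 200 and r.latency_ms < max_latency_ms)
    return ok / len(responses)


def business_success_rate(responses, min_articles=4, max_latency_ms=200):
    """Suggested definition: fast enough and enough articles delivered."""
    if not responses:
        return 0.0
    ok = sum(1 for r in responses
             if r.status == 200
             and r.latency_ms < max_latency_ms
             and r.n_articles >= min_articles)
    return ok / len(responses)


# Hypothetical sample: every response is a fast 200, but half are nearly empty.
sample = [Response(200, 50, 10), Response(200, 60, 0),
          Response(200, 70, 8), Response(200, 80, 1)]
technical = technical_success_rate(sample)
business = business_success_rate(sample)
```

On this sample the technical rate looks perfect while the business rate falls below the 65% expectation, which is the kind of problem latency and availability monitoring alone would miss.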
14
● Quality metrics are business-case dependent
-> choose your own!
● Measuring the objective or user-perceived quality is
(most likely) not possible. That is not a problem: simple
heuristics are already effective.
HOW TO SELECT A QUALITY METRIC
15
● Suggested criteria
○ comparable across models
○ simple and easy to understand
○ can be collected in real time
○ allows for actionable alerting on problems
HOW TO SELECT A QUALITY METRIC
16
CASE STUDY
17
● Recommendation Team:
○ Dozens of business cases for website, apps and
emails
○ Heavy load with up to 4000 req/sec
○ Low latency requirements
CASE STUDY: ZALANDO
18
● Background: Recommendations are created from a
sequence of configurations (model and filtering rules)
● Simple quality metric
○ Good response: top 5 positions are from best
configuration
○ Poor response: fewer than 5 articles, or the top 5
articles contain a fallback (e.g. a popular item)
CASE STUDY: SELECT QUALITY METRIC
Figure: sequence of configurations, ordered Best Config, Config 2, …, Worst Config
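A minimal sketch of this metric, assuming each returned article is tagged with the configuration that produced it. The labels `"best"` and `"popular"` and the third outcome `"neutral"` (top 5 complete, no fallback, but not all from the best configuration) are assumptions made for illustration; the slide only defines good and poor.

```python
BEST_CONFIG = "best"              # hypothetical label of the best configuration
FALLBACK_CONFIGS = {"popular"}    # hypothetical labels of fallback configurations


def classify_response(article_sources, top_n=5):
    """Classify one response from the per-article configuration labels.

    article_sources: configuration label of each article, in rank order.
    """
    top = article_sources[:top_n]
    if len(top) < top_n:
        return "poor"       # fewer than 5 articles
    if any(source in FALLBACK_CONFIGS for source in top):
        return "poor"       # a fallback made it into the top 5
    if all(source == BEST_CONFIG for source in top):
        return "good"       # top 5 all come from the best configuration
    return "neutral"        # assumption: neither good nor poor


labels = [
    classify_response(["best"] * 6),
    classify_response(["best"] * 4),
    classify_response(["best"] * 4 + ["popular"]),
    classify_response(["best"] * 4 + ["config_2"]),
]
```

Because the label depends only on the response itself, it can be counted per request in real time and compared across models.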
19
CASE STUDY: IMPLEMENTATION
20
CASE STUDY: ALERTING
Use case XY
21
● DO: add specific alerts for important use cases
● DO: if you have so many use cases that choosing
individual thresholds for all of them is not reasonable,
add coarse alerts, e.g. success percentage < 50%
CASE STUDY: ALERTING
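The two recommendations combine naturally into a lookup with a default. In the sketch below the use-case names and the specific 65% threshold are illustrative assumptions; only the coarse 50% threshold is taken from the slide.

```python
COARSE_THRESHOLD = 0.50                      # coarse alert: success percentage < 50%
SPECIFIC_THRESHOLDS = {"home_page": 0.65}    # hypothetical per-use-case override


def use_cases_to_alert(success_rates):
    """Return the use cases whose success rate is below their threshold.

    success_rates: mapping of use case name to share of good responses (0..1).
    """
    fired = []
    for use_case, rate in sorted(success_rates.items()):
        threshold = SPECIFIC_THRESHOLDS.get(use_case, COARSE_THRESHOLD)
        if rate < threshold:
            fired.append(use_case)
    return fired


fired = use_cases_to_alert({"home_page": 0.60, "email": 0.55, "cart": 0.40})
```

Here `home_page` fires on its specific threshold, `cart` fires on the coarse one, and `email` stays quiet.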
22
CASE STUDY: ANALYSIS
Use case XY
23
● Goal: Detect bugs in the new version of the stack
● DO: Deploy model changes and code changes
separately
● Deployment process:
○ Two stack versions are live at the same time
○ Traffic is gradually switched over
○ Each minute, measure the difference in good and
bad percentages between the two stacks -> A/A
test
● Distribution of difference: average is expected to be 0 if
no bug was introduced
CASE STUDY: DEPLOYMENT MONITORING
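One possible way to check whether the average difference is compatible with 0 is a simple z-score on the per-minute differences. The test statistic, the threshold of 3 and the sample numbers are assumptions for illustration; the talk does not state which check was used.

```python
from math import sqrt
from statistics import mean, stdev


def deployment_looks_suspicious(old_good_pct, new_good_pct, z_threshold=3.0):
    """Compare per-minute good percentages of the old and new stack.

    Returns True if the mean difference is far from 0 relative to its
    standard error (assumed check, not the one from the talk).
    """
    diffs = [new - old for old, new in zip(old_good_pct, new_good_pct)]
    if len(diffs) < 2:
        return False                      # not enough data to judge
    avg = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        return avg != 0                   # constant non-zero shift
    standard_error = sd / sqrt(len(diffs))
    return abs(avg / standard_error) > z_threshold


# Hypothetical per-minute good percentages for six minutes.
old_stack = [80, 81, 79, 80, 82, 80]
healthy_new_stack = [81, 80, 80, 79, 82, 81]
buggy_new_stack = [70, 72, 69, 71, 70, 71]

healthy_flag = deployment_looks_suspicious(old_stack, healthy_new_stack)
buggy_flag = deployment_looks_suspicious(old_stack, buggy_new_stack)
```

Deploying model changes and code changes separately matters for this check: if the model is unchanged, any systematic difference between the stacks points at the code.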
24
CASE STUDY: DEPLOYMENT MONITORING
Use cases
25
● Insights
○ Online monitoring for data-driven services must
include quality metrics
○ The definition of quality metrics and their acceptable
level is business-case dependent
○ A very simplistic metric is already very useful
CONTRIBUTIONS
26
● Quality Metrics
○ allow for real-time quality monitoring
○ detect problems that are otherwise hard to spot
○ are easy to integrate with existing tools
○ can be used to set user expectations about the
service performance
CONTRIBUTIONS
27
Work with us in Berlin!
The Reco Algorithms Team is hiring a
Principal Research Engineer.

