Proprietary, Confidential, & Thoughtful
Rohan Singh Rajput
Senior Data Scientist
Headspace
Building Personalized Meditation
Experiences
Proprietary, Confidential, & Thoughtful
Agenda
● Introduction
● History
● Phase I - Modeling
● Phase II - A/B Testing
● Learnings
● Opportunities
● Q/A
2
Proprietary, Confidential, & Thoughtful
Headspace is an online meditation and
mindfulness company
3
Proprietary, Confidential, & Thoughtful
4
5
Proprietary, Confidential, & Thoughtful
History
● Launched in 2020, the content customizer started to
countering the declining retention rate.
● Every headspace user was getting the same editorial content
that used to refresh everyday.
● Headspace had over 1000+ unique content and more than 2
million B2C subscribers
6
Proprietary, Confidential, & Thoughtful
Hypothesis
Personalized recommendations will help members find
more relevant content, thereby increasing their engagement
with Headspace
● Content Customizer (a.k.a CC) is our recommendation
system that serves personalized recommendations
● It uses historical user-item interaction data to generate
recommendation everyday
Proprietary, Confidential, Thoughtful
Proprietary, Confidential, & Thoughtful
Proprietary, Confidential, & Thoughtful
7
Phase I
Modeling
Proprietary, Confidential, & Thoughtful
Components of Modeling
● Data
○ B2C and B2B Subscribers
○ Feature Store
■ User tenure, demographics, language, time of the day etc.
■ Item type, playtime duration, content creation date etc.
○ IOS and Android
● Evaluation
○ Offline: NDGC@K , HITRate@K, Precision@K, etc.
○ Online: A/B Testing
● Inference and Post processing
○ Nightly batched training
○ Payload to prediction service
○ Fix sequence, remove popular and recently completed
Proprietary, Confidential, & Thoughtful
8
9
Proprietary, Confidential, & Thoughtful
Modeling Pipeline
● Deployed for both B2C and B2B members
● Deployed to both iOS and Android platform
Data Pull Model Training
Generate
Prediction
Post
Processing
Send Data to
the Prediction
Service
10
Proprietary, Confidential, & Thoughtful
Version Name Model Type
LightFM model using user and item
features
Feature Based Model
Matrix Factorization model using
historical content interaction
NCF Model using user and item
features
Feature Based Model
Deep learning model that helps to
predict the nonlinear latent factors
and users explicit pos/neg feedback
TiSASRec Model using historical
sequences
Sequence Based Model
Transformer model to capture
sequences of content completes
History of Models
Proprietary, Confidential, & Thoughtful
LightFM Model
● Reasons
○ Matrix Factorization
○ Implicit and Explicit Feedback
○ GPU Optimization
○ Highly scalable
● Result
○ Offline: Precision@k and Recall@k
○ Online: No statistically significant lift (content
start/complete) in A/B test
Kula, Maciej. "Metadata embeddings for user and item
cold-start recommendations." arXiv preprint
arXiv:1507.08439 (2015)
11
Image Credit: Google Developer
Proprietary, Confidential, & Thoughtful
Neural Collaborative Filtering Model
● Reasons
○ Non-linearity
○ User-item features
○ Sparsity
● Result
○ Offline: Precision@k and Recall@k
○ Online: Over 2% statistically significant lift (content
start/complete) in A/B test
He, Xiangnan, et al. "Neural collaborative filtering." Proceedings of the 26th international
conference on world wide web. 2017.
12
Proprietary, Confidential, & Thoughtful
Time Interval Aware Self-Attention for Sequential Recommendation
(TiSASRec)
● Reasons
○ Sequential Data
○ Temporal Dynamics
○ Transformer Architecture
● Result
○ Offline: NDCG@k and HitRate@k
○ Online: Over 4.68% statistically significant lift
(content play) in A/B test
Li, Jiacheng, Yujie Wang, and Julian McAuley. "Time interval aware self-attention for sequential recommendation." Proceedings of the
13th international conference on web search and data mining. 2020.
13
Proprietary, Confidential, & Thoughtful
System Architecture
14
Proprietary, Confidential, & Thoughtful
Proprietary, Confidential, & Thoughtful
15
Phase II
A/B Testing
16
Proprietary, Confidential, & Thoughtful
Introduction
● “If you canʼt measure it, you canʼt improve it.” — Peter Drucker.
● Online Controlled Experiments, a.k.a A/B testing, are the gold
standard for estimating causality with high probability.
● A data-driven decision-making culture helps estimate the
measurement uncertainty to refute the null hypothesis based on
experimental data.
● Furthermore, a random assignment of the users into a
control-treatment group allows us to safely ignore the unobserved
factors and model the parameters as random variables
Proprietary, Confidential, & Thoughtful
17
Proprietary, Confidential, & Thoughtful
Experiment Design
● Recommendation surface area
○ Todayʼs tab
○ Hero and Recommended tab on
Meditate/Sleep/Move/Focus
● Platform Selection
○ iOS
○ Android
● Audience Selection
○ User type: B2C, B2B
○ Subscription type: Paid, Free trials
○ Language: English, German, French
○ Region: North America, International
● Driver Metrics and Business KPI
○ Metrics: Content Start, Content Complete, Content Play,
Content start to complete ratio
○ KPIs: DAU, WAU, MAU, Paid conversion, Churn Rate
Proprietary, Confidential, & Thoughtful
Proprietary, Confidential, & Thoughtful
18
Experiment Design
19
Proprietary, Confidential, & Thoughtful
Experiment Design
● Sample Size and Experiment Duration:
○ Baseline conversion rate
○ Minimum detectable lift
○ Use standard setting with alpha = 0.05 and beta =
20% for 1 sided test
○ Classical test:
https://www.statsig.com/calculator
○ Sequence test:
https://docs.statsig.com/experiments-plus/pow
er-analysis
Proprietary, Confidential, & Thoughtful
20
Proprietary, Confidential, & Thoughtful
Experiment Design
● Important details
○ Prefer A/B test over Multivariate test
○ Use 1 or 2 driver metrics
○ Use guardrail metrics for rollback
○ Avoid early peeking
○ Use confidence intervals over p-values
Proprietary, Confidential, & Thoughtful
21
Proprietary, Confidential, & Thoughtful
Experiment Architecture
● Content Customizer Model: A Machine Learning model
that generates personalized recommendations for every
eligible user. It sends its prediction to the prediction
service using offline inference.
● Prediction Service: A microservice that sends
predictions to the layout service asynchronously. It is
required to update the layout service after
post-processing of the content.
● Layout Service: A bundle of services that creates a
layout for the client for respective tabs. It receives
requests from the client and serves appropriate content
to the layout.
● A/B Testing Tool: A/B testing platform that helps to
administer the online controlled experiment. It
randomly buckets the eligible users into control and
treatment.
Proprietary, Confidential, & Thoughtful
A/B Testing
Tool
22
Proprietary, Confidential, & Thoughtful
Monitoring and Evaluation
● Confidence Interval is much more reliable
indicator of effectiveness than sole p-value
○ p = 0.06 (this)
○ -9% to 20% (v/s this)
● Always use practical significance threshold
along with statistical significance threshold
○ Statistical Significance is a theoretical
indicator of the impact
○ Practical significance is a practical
indicator of the impact
Proprietary, Confidential, & Thoughtful
Proprietary, Confidential, & Thoughtful
Learnings
● Feature Space
○ User features are not rich enough to understand the user behavior
■ Subscription mismatch
■ Sparse interaction
■ Privacy and Compliance
○ Item features are fragmented
■ Many features are stored in multiple system
■ Overlapping content type
● UI Rigidity
○ Fluid design for multiple screens
○ High performing static content
○ Single model everywhere
○ Explicit Feedback
Proprietary, Confidential, & Thoughtful
23
High Performing Content
Static Content
24
Proprietary, Confidential, & Thoughtful
Opportunities
● Client Reform
○ The new shelves based design helps to surface context based
recommendations
○ Provide separate screen to present notified recommendations
○ Search based recommendations
● Modeling
○ Long term value (Align with the productʼs vision)
○ Multi-modal recommendation (text, image, audio)
○ Personalized shelves ranking
○ Causal AI to get explainability
○ Optimize novelty, diversity and relevance
Proprietary, Confidential, & Thoughtful
Personalized
Personalized
24
25
Proprietary, Confidential, & Thoughtful
Opportunities
● Robust E2E System
○ Impression data
○ Analyze data drift, model drift and concept drift
○ Smart caching
○ Real time inference with continuous learning
○ Interconnected recommendation (email, push, notification)
Proprietary, Confidential, & Thoughtful
Personalized
Personalized
25
26
Proprietary, Confidential, & Thoughtful
References
○ Engineering Blog :
■ https://medium.com/headspace-engineering/mindful-experiment
ation-evaluate-recommendation-system-performance-using-a-b-t
esting-at-headspace-3c8c05d0ae3b
○ CXL Guide
■ https://cxl.com/blog/ab-testing-guide/
Proprietary, Confidential, & Thoughtful
27
Proprietary, Confidential, & Thoughtful
Thank you
27
https://www.linkedin.com/in/rohan3/

A_B Testing Personalized Meditation Recommendations.pdf

  • 1.
    Proprietary, Confidential, &Thoughtful Rohan Singh Rajput Senior Data Scientist Headspace Building Personalized Meditation Experiences
  • 2.
    Proprietary, Confidential, &Thoughtful Agenda ● Introduction ● History ● Phase I - Modeling ● Phase II - A/B Testing ● Learnings ● Opportunities ● Q/A 2
  • 3.
    Proprietary, Confidential, &Thoughtful Headspace is an online meditation and mindfulness company 3
  • 4.
  • 5.
    5 Proprietary, Confidential, &Thoughtful History ● Launched in 2020, the content customizer started to countering the declining retention rate. ● Every headspace user was getting the same editorial content that used to refresh everyday. ● Headspace had over 1000+ unique content and more than 2 million B2C subscribers
  • 6.
    6 Proprietary, Confidential, &Thoughtful Hypothesis Personalized recommendations will help members find more relevant content, thereby increasing their engagement with Headspace ● Content Customizer (a.k.a CC) is our recommendation system that serves personalized recommendations ● It uses historical user-item interaction data to generate recommendation everyday Proprietary, Confidential, Thoughtful
  • 7.
    Proprietary, Confidential, &Thoughtful Proprietary, Confidential, & Thoughtful 7 Phase I Modeling
  • 8.
    Proprietary, Confidential, &Thoughtful Components of Modeling ● Data ○ B2C and B2B Subscribers ○ Feature Store ■ User tenure, demographics, language, time of the day etc. ■ Item type, playtime duration, content creation date etc. ○ IOS and Android ● Evaluation ○ Offline: NDGC@K , HITRate@K, Precision@K, etc. ○ Online: A/B Testing ● Inference and Post processing ○ Nightly batched training ○ Payload to prediction service ○ Fix sequence, remove popular and recently completed Proprietary, Confidential, & Thoughtful 8
  • 9.
    9 Proprietary, Confidential, &Thoughtful Modeling Pipeline ● Deployed for both B2C and B2B members ● Deployed to both iOS and Android platform Data Pull Model Training Generate Prediction Post Processing Send Data to the Prediction Service
  • 10.
    10 Proprietary, Confidential, &Thoughtful Version Name Model Type LightFM model using user and item features Feature Based Model Matrix Factorization model using historical content interaction NCF Model using user and item features Feature Based Model Deep learning model that helps to predict the nonlinear latent factors and users explicit pos/neg feedback TiSASRec Model using historical sequences Sequence Based Model Transformer model to capture sequences of content completes History of Models
  • 11.
    Proprietary, Confidential, &Thoughtful LightFM Model ● Reasons ○ Matrix Factorization ○ Implicit and Explicit Feedback ○ GPU Optimization ○ Highly scalable ● Result ○ Offline: Precision@k and Recall@k ○ Online: No statistically significant lift (content start/complete) in A/B test Kula, Maciej. "Metadata embeddings for user and item cold-start recommendations." arXiv preprint arXiv:1507.08439 (2015) 11 Image Credit: Google Developer
  • 12.
    Proprietary, Confidential, &Thoughtful Neural Collaborative Filtering Model ● Reasons ○ Non-linearity ○ User-item features ○ Sparsity ● Result ○ Offline: Precision@k and Recall@k ○ Online: Over 2% statistically significant lift (content start/complete) in A/B test He, Xiangnan, et al. "Neural collaborative filtering." Proceedings of the 26th international conference on world wide web. 2017. 12
  • 13.
    Proprietary, Confidential, &Thoughtful Time Interval Aware Self-Attention for Sequential Recommendation (TiSASRec) ● Reasons ○ Sequential Data ○ Temporal Dynamics ○ Transformer Architecture ● Result ○ Offline: NDCG@k and HitRate@k ○ Online: Over 4.68% statistically significant lift (content play) in A/B test Li, Jiacheng, Yujie Wang, and Julian McAuley. "Time interval aware self-attention for sequential recommendation." Proceedings of the 13th international conference on web search and data mining. 2020. 13
  • 14.
    Proprietary, Confidential, &Thoughtful System Architecture 14
  • 15.
    Proprietary, Confidential, &Thoughtful Proprietary, Confidential, & Thoughtful 15 Phase II A/B Testing
  • 16.
    16 Proprietary, Confidential, &Thoughtful Introduction ● “If you canʼt measure it, you canʼt improve it.” — Peter Drucker. ● Online Controlled Experiments, a.k.a A/B testing, are the gold standard for estimating causality with high probability. ● A data-driven decision-making culture helps estimate the measurement uncertainty to refute the null hypothesis based on experimental data. ● Furthermore, a random assignment of the users into a control-treatment group allows us to safely ignore the unobserved factors and model the parameters as random variables Proprietary, Confidential, & Thoughtful
  • 17.
    17 Proprietary, Confidential, &Thoughtful Experiment Design ● Recommendation surface area ○ Todayʼs tab ○ Hero and Recommended tab on Meditate/Sleep/Move/Focus ● Platform Selection ○ iOS ○ Android ● Audience Selection ○ User type: B2C, B2B ○ Subscription type: Paid, Free trials ○ Language: English, German, French ○ Region: North America, International ● Driver Metrics and Business KPI ○ Metrics: Content Start, Content Complete, Content Play, Content start to complete ratio ○ KPIs: DAU, WAU, MAU, Paid conversion, Churn Rate Proprietary, Confidential, & Thoughtful
  • 18.
    Proprietary, Confidential, &Thoughtful 18 Experiment Design
  • 19.
    19 Proprietary, Confidential, &Thoughtful Experiment Design ● Sample Size and Experiment Duration: ○ Baseline conversion rate ○ Minimum detectable lift ○ Use standard setting with alpha = 0.05 and beta = 20% for 1 sided test ○ Classical test: https://www.statsig.com/calculator ○ Sequence test: https://docs.statsig.com/experiments-plus/pow er-analysis Proprietary, Confidential, & Thoughtful
  • 20.
    20 Proprietary, Confidential, &Thoughtful Experiment Design ● Important details ○ Prefer A/B test over Multivariate test ○ Use 1 or 2 driver metrics ○ Use guardrail metrics for rollback ○ Avoid early peeking ○ Use confidence intervals over p-values Proprietary, Confidential, & Thoughtful
  • 21.
    21 Proprietary, Confidential, &Thoughtful Experiment Architecture ● Content Customizer Model: A Machine Learning model that generates personalized recommendations for every eligible user. It sends its prediction to the prediction service using offline inference. ● Prediction Service: A microservice that sends predictions to the layout service asynchronously. It is required to update the layout service after post-processing of the content. ● Layout Service: A bundle of services that creates a layout for the client for respective tabs. It receives requests from the client and serves appropriate content to the layout. ● A/B Testing Tool: A/B testing platform that helps to administer the online controlled experiment. It randomly buckets the eligible users into control and treatment. Proprietary, Confidential, & Thoughtful A/B Testing Tool
  • 22.
    22 Proprietary, Confidential, &Thoughtful Monitoring and Evaluation ● Confidence Interval is much more reliable indicator of effectiveness than sole p-value ○ p = 0.06 (this) ○ -9% to 20% (v/s this) ● Always use practical significance threshold along with statistical significance threshold ○ Statistical Significance is a theoretical indicator of the impact ○ Practical significance is a practical indicator of the impact Proprietary, Confidential, & Thoughtful
  • 23.
    Proprietary, Confidential, &Thoughtful Learnings ● Feature Space ○ User features are not rich enough to understand the user behavior ■ Subscription mismatch ■ Sparse interaction ■ Privacy and Compliance ○ Item features are fragmented ■ Many features are stored in multiple system ■ Overlapping content type ● UI Rigidity ○ Fluid design for multiple screens ○ High performing static content ○ Single model everywhere ○ Explicit Feedback Proprietary, Confidential, & Thoughtful 23 High Performing Content Static Content
  • 24.
    24 Proprietary, Confidential, &Thoughtful Opportunities ● Client Reform ○ The new shelves based design helps to surface context based recommendations ○ Provide separate screen to present notified recommendations ○ Search based recommendations ● Modeling ○ Long term value (Align with the productʼs vision) ○ Multi-modal recommendation (text, image, audio) ○ Personalized shelves ranking ○ Causal AI to get explainability ○ Optimize novelty, diversity and relevance Proprietary, Confidential, & Thoughtful Personalized Personalized 24
  • 25.
    25 Proprietary, Confidential, &Thoughtful Opportunities ● Robust E2E System ○ Impression data ○ Analyze data drift, model drift and concept drift ○ Smart caching ○ Real time inference with continuous learning ○ Interconnected recommendation (email, push, notification) Proprietary, Confidential, & Thoughtful Personalized Personalized 25
  • 26.
    26 Proprietary, Confidential, &Thoughtful References ○ Engineering Blog : ■ https://medium.com/headspace-engineering/mindful-experiment ation-evaluate-recommendation-system-performance-using-a-b-t esting-at-headspace-3c8c05d0ae3b ○ CXL Guide ■ https://cxl.com/blog/ab-testing-guide/ Proprietary, Confidential, & Thoughtful
  • 27.
    27 Proprietary, Confidential, &Thoughtful Thank you 27 https://www.linkedin.com/in/rohan3/