Leveraging an in-house modeling framework for fun and profit
Mike Skarlinski & Brian Graham
{michael.skarlinski, brian.graham}@weightwatchers.com
June 2019
Outline
• Introduction: data science at WW – the new Weight Watchers
• Problem: scalable, simple modeling and recommendation systems with a small team
• Solution: design and benefits of building a framework
• Implementation: Examples of deployed recommenders
WW is a data-driven application to help members on their wellness journeys:
• Member social network
• Activity & food tracking
• Weight progress & goals
• Recipe & food database
As a new team, we are tasked with building a foundation of data products
Areas: Social Network (Connect), Growth, WW Program, Infrastructure
Products: Churn model, Return model, LTV models, Single Member View, Recipe recommender, Similar recipes, Composite foods ontology, Personalized feed, Groups search, Who to follow, APIs, Primrose
Data science team’s success hinges on effectively sharing work and knowledge
Team: Brian Graham, Reka Daniel-Weiner, Yameng (Eliza) Zhang, Kevin Zecchini, Carl Anderson, Michael (Mike) Skarlinski, plus open positions... (Hint hint)
[Figure: team timeline with labels May 2018, Jan. 2019, Feb. 2019, Mar. 2019, and Dec. 2019]
How can we build software that helps us grow and develop as a team?
WW recommender and modeling challenges
Taking stock of our own challenges at WW
What would make a good recommender system at WW?
• Slow serialization, but our medium data can be kept in RAM...
• No live features, but we know Docker, k8s...
• Easy onboarding: mono repo with config as code...
We built a framework to solve our challenges and enforce our design decisions (Open source coming soon!!!!!)
Primrose: a framework for simple, quick modeling deployments
Primrose has features to address each design consideration
• Data science: a Python in-memory DAG runner, with no serialization between nodes of the DAG
• Infrastructure: DAGs are defined with a configuration-as-code approach, with one container for all models
• People: ML and data manipulation operations are abstracted, so data scientists can easily extend the framework
Primrose (Production In-Memory Solution): a framework for solving WW’s most common use cases, such as caching batched predictions, with machine-learning engineering baked in.
Primrose jobs are executed as Directed Acyclic Graphs (DAGs) in Python
• Flexibility: any number of operations is allowed in a single DAG, across any Python library
• Data and functions are passed between nodes in an object that understands how to extract the correct data for each node
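A minimal sketch of in-memory DAG execution, assuming a dictionary of node callables and a hypothetical `DataObject` that keeps every node's output as a live Python object and hands each node only the outputs of its upstream nodes. This is not Primrose's actual API; it only illustrates why no serialization is needed between nodes when the whole DAG runs in one process.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


class DataObject:
    """Hypothetical container that keeps every node's output in memory."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def add(self, node_name: str, value: Any) -> None:
        self._store[node_name] = value

    def get(self, node_name: str) -> Any:
        return self._store[node_name]

    def get_upstream(self, upstream: List[str]) -> Dict[str, Any]:
        # Hand each node only the outputs of the nodes it depends on.
        return {name: self._store[name] for name in upstream}


def run_dag(nodes: Dict[str, Callable[[Dict[str, Any]], Any]],
            edges: Dict[str, List[str]]) -> DataObject:
    """Run nodes in dependency order; `edges` maps node -> upstream nodes."""
    graph = {name: edges.get(name, []) for name in nodes}
    data = DataObject()
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        inputs = data.get_upstream(graph[name])
        # Outputs stay live Python objects; nothing is written to disk.
        data.add(name, nodes[name](inputs))
    return data


result = run_dag(
    nodes={
        "reader": lambda _: [3, 1, 2],
        "sorter": lambda up: sorted(up["reader"]),
        "writer": lambda up: {"top": up["sorter"][-1]},
    },
    edges={"sorter": ["reader"], "writer": ["sorter"]},
)
print(result.get("writer"))  # {'top': 3}
```

Ordering nodes topologically is what makes "any number of operations in a single DAG" safe: a node never runs before the data it needs exists.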
DAGs are composed of implementation-agnostic, extensible nodes for data science
• Data scientists can write any class that matches the abstract interface and incorporate it in their DAGs
• Data scientists can write individual nodes using any Python framework or library they choose
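One way to express an implementation-agnostic node is an abstract base class. The names `Node` and `execute` below are hypothetical, not Primrose's published interface; the point is that any subclass implementing `execute` can be dropped into a DAG, whatever library it uses internally.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Node(ABC):
    """Hypothetical abstract interface every DAG node must satisfy."""

    def __init__(self, name: str, config: Dict[str, Any]) -> None:
        self.name = name
        self.config = config

    @abstractmethod
    def execute(self, inputs: Dict[str, Any]) -> Any:
        """Consume upstream outputs and return this node's output."""


class LowercaseNames(Node):
    """Example node in plain Python; it could just as well wrap pandas or sklearn."""

    def execute(self, inputs: Dict[str, Any]) -> Any:
        field = self.config.get("field", "name")
        return [row[field].lower() for row in inputs["reader"]]


node = LowercaseNames("clean", {"field": "name"})
print(node.execute({"reader": [{"name": "Fish Tacos"}, {"name": "Cobb Salad"}]}))
# ['fish tacos', 'cobb salad']
```

Because `Node` is abstract, Python refuses to instantiate it (or any subclass that forgets `execute`), so interface mistakes surface when the DAG is built rather than halfway through a run.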
Primrose is run like an ETL pipeline, in a single Docker container for each configuration
For simpler deployments, Primrose uses a “configuration as code” approach
• Object configuration and DAG structure are built in a configuration JSON
• Primrose validates the configuration and instantiates the correct classes at runtime
• Different outputs and results for each DAG
[Figure: Recipe recommender DAG JSON, Churn Model DAG JSON, and Connect Feed DAG JSON, each run in a Primrose container. Success, fame, money...]
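A minimal sketch of configuration as code, assuming a hypothetical JSON schema in which each node names its `class` and its upstream `depends_on` nodes, plus a hypothetical registry mapping class names to Python classes. The real Primrose configuration format may differ; this only shows the validate-then-instantiate step.

```python
import json
from typing import Any, Dict, Type

CONFIG = """
{
  "reader": {"class": "ListReader", "values": [3, 1, 2]},
  "top_n": {"class": "TopN", "n": 2, "depends_on": ["reader"]}
}
"""


class ListReader:
    def __init__(self, values):
        self.values = values

    def execute(self, inputs):
        return list(self.values)


class TopN:
    def __init__(self, n):
        self.n = n

    def execute(self, inputs):
        (upstream,) = inputs.values()  # expects exactly one upstream node
        return sorted(upstream, reverse=True)[: self.n]


# Hypothetical registry: the only class names a config may reference.
REGISTRY: Dict[str, Type] = {"ListReader": ListReader, "TopN": TopN}


def build_nodes(raw: str) -> Dict[str, Any]:
    """Validate the JSON config and instantiate one object per node."""
    config = json.loads(raw)
    nodes = {}
    for name, spec in config.items():
        spec = dict(spec)  # copy so popping keys leaves the config untouched
        class_name = spec.pop("class", None)
        if class_name not in REGISTRY:
            raise ValueError(f"{name}: unknown class {class_name!r}")
        for dep in spec.pop("depends_on", []):
            if dep not in config:
                raise ValueError(f"{name}: unknown dependency {dep!r}")
        # Remaining keys become constructor arguments.
        nodes[name] = REGISTRY[class_name](**spec)
    return nodes


nodes = build_nodes(CONFIG)
print(nodes["top_n"].execute({"reader": nodes["reader"].execute({})}))  # [3, 2]
```

Validating every class name and dependency before anything runs means a typo in the JSON fails fast at startup instead of hours into a batch job.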
The framework has helped our team grow and develop production models
• Deployed 3 production models and 3 production recommenders
• Onboarded 6 members in less than a year; everyone is working in the framework!
We’re going to open-source Primrose!!! Keep a lookout or contact us!
WW Recommender Examples
Food is at the core of our product
We know you and meet you where you are.
[Figure: example foods: coffee, croissant, fish tacos, apple, cobb salad, pasta with red sauce, ice cream]
Personalize your experience using your data
Recipe Recommendations: Similar Recipes and Dinner Recommendations
Similar Recipes Flow
US WW Recipes → Similar Ingredients and Similar Names
• document = ingredient list or name string
• lemmatize, tokenize, TF-IDF
• cosine similarity
• rank
Filters: dietary, course, cuisine, main ingredient
*Only recipes with images*
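The deck implements this step with NLTK lemmatization and scikit-learn's TF-IDF and cosine similarity. The sketch below is a dependency-free stand-in in standard-library Python so the arithmetic is visible: each document is a recipe's ingredient list, the "lemmatizer" is a deliberately naive plural stripper (a hypothetical placeholder for real lemmatization), and the IDF uses the smoothed formula idf = ln((1 + N) / (1 + df)) + 1, which is scikit-learn's default. The `allowed` parameter is a hypothetical stand-in for the business-logic filters, and the recipe data is made up.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-letters, then apply a naive 'lemmatizer'."""
    words = re.findall(r"[a-z]+", text.lower())
    # Hypothetical stand-in for NLTK + custom lemmatization: strip a plural "s".
    return [w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
            for w in words]


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """L2-normalized TF-IDF vectors (raw counts x smoothed IDF) per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    vectors = {}
    for doc_id, c in counts.items():
        vec = {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors[doc_id] = {t: v / norm for t, v in vec.items()}
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors have unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(docs: Dict[str, str], query_id: str, k: int = 2,
                 allowed: Optional[Iterable[str]] = None) -> List[Tuple[str, float]]:
    """Rank the other documents by cosine similarity to `query_id`."""
    vectors = tfidf_vectors(docs)
    allowed_set = None if allowed is None else set(allowed)
    scores = [(doc_id, cosine(vectors[query_id], vec))
              for doc_id, vec in vectors.items()
              if doc_id != query_id and (allowed_set is None or doc_id in allowed_set)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


recipes = {  # made-up ingredient lists
    "fish_tacos": "white fish, corn tortillas, cabbage, lime",
    "shrimp_tacos": "shrimp, corn tortilla, cabbage, lime",
    "cobb_salad": "lettuce, chicken, egg, bacon, avocado",
}
print([doc_id for doc_id, _ in most_similar(recipes, "fish_tacos")])
# ['shrimp_tacos', 'cobb_salad']
```

The same functions serve both branches of the flow: pass ingredient lists for "Similar Ingredients" or recipe names for "Similar Names".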
Productionalize in Primrose DAG
• Google BigQuery data lake reader
• NLTK + custom lemmatization
• Sklearn TF-IDF + cosine similarity
• Business logic (filters)
• Write to GCS bucket and Google MemoryStore
Success!
logging.info('Your newbie DS has written production-quality code.')
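The last node writes precomputed results to a place where the API can read them. A minimal sketch of that write step, assuming a hypothetical key scheme (`similar_recipes:<recipe_id>`) and JSON-encoded values; a plain dictionary stands in for the Redis client so the example runs without a MemoryStore instance. With the `redis` package, the assignment would become a call to the client's `set(key, value)` method.

```python
import json
import logging
from typing import Dict, List, MutableMapping


def write_recommendations(cache: MutableMapping[str, str],
                          recs: Dict[str, List[str]],
                          prefix: str = "similar_recipes") -> int:
    """Store each recipe's ranked list under a namespaced key; return the count."""
    for recipe_id, similar_ids in recs.items():
        # JSON keeps the cached value readable by any client, not just Python.
        cache[f"{prefix}:{recipe_id}"] = json.dumps(similar_ids)
    logging.info("Wrote %d keys with prefix %s", len(recs), prefix)
    return len(recs)


cache: Dict[str, str] = {}  # stand-in for a Redis / MemoryStore client
write_recommendations(cache, {"fish_tacos": ["shrimp_tacos", "cobb_salad"]})
print(cache)  # {'similar_recipes:fish_tacos': '["shrimp_tacos", "cobb_salad"]'}
```

Namespacing keys by recommender lets several Primrose jobs share one cache without colliding.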
Dinner Recommendations Flow
US WW Recipes → Similar Ingredients and Similar Names → Business Logic
Eligible members:
• 2 weeks of tracking history
• Tracked >= 1 recipe
• US members
[Figure: potential recs built from each member's tracked recipes, taking the most similar recipe, then the 2nd most similar, up to n = 4 recommendations]
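The eligibility rules above translate into a simple predicate. This sketch assumes hypothetical member fields (`country`, `first_tracked`, `tracked_recipe_ids`) and reads "2 weeks of tracking history" as at least 14 days since the member first tracked. In the deck this selection lives in a SQL file run by the BigQuery reader, so treat the Python as an illustration of the logic only.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Member:
    member_id: str
    country: str
    first_tracked: date
    tracked_recipe_ids: List[str] = field(default_factory=list)


def is_eligible(member: Member, today: date, min_history_days: int = 14) -> bool:
    """US member, at least two weeks of tracking history, >= 1 tracked recipe."""
    has_history = today - member.first_tracked >= timedelta(days=min_history_days)
    return (
        member.country == "US"
        and has_history
        and len(member.tracked_recipe_ids) >= 1
    )


today = date(2019, 6, 1)
members = [
    Member("a", "US", date(2019, 5, 1), ["fish_tacos"]),
    Member("b", "US", date(2019, 5, 25), ["cobb_salad"]),  # only 7 days of history
    Member("c", "UK", date(2019, 1, 1), ["ice_cream"]),    # not a US member
    Member("d", "US", date(2019, 1, 1)),                   # no tracked recipes
]
print([m.member_id for m in members if is_eligible(m, today)])  # ['a']
```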
Productionalizing is easier the second time
• Same BQ reader class, different SQL input file
• New postprocess class to sort, filter, and interleave potential recommendations
Success!
logging.warning('Data Scientist is developing software engineering skills.')
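The postprocess step can be read as round-robin interleaving: take the most similar recipe for each tracked recipe, then the second most similar, and so on, skipping duplicates and recipes the member already tracked, until n = 4 recommendations are collected. This is a minimal sketch under that reading; the class name `InterleavePostprocess`, its inputs, and the similarity scores are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Set, Tuple


class InterleavePostprocess:
    """Hypothetical postprocess node: sort, filter, and interleave candidates."""

    def __init__(self, n: int = 4) -> None:
        self.n = n

    def execute(self, candidates: Dict[str, List[Tuple[str, float]]],
                tracked: Set[str]) -> List[str]:
        # Sort each tracked recipe's candidates by similarity, best first.
        ranked = [
            [rid for rid, _ in sorted(cands, key=lambda c: c[1], reverse=True)]
            for cands in candidates.values()
        ]
        recs: List[str] = []
        # Round-robin: 1st-ranked of every list, then 2nd-ranked, and so on.
        for tier in zip_longest(*ranked):
            for rid in tier:
                if rid is None or rid in tracked or rid in recs:
                    continue  # filter: padding, already tracked, or duplicate
                recs.append(rid)
                if len(recs) == self.n:
                    return recs
        return recs


candidates = {
    "fish_tacos": [("shrimp_tacos", 0.8), ("fish_burrito", 0.6), ("cobb_salad", 0.1)],
    "cobb_salad": [("chef_salad", 0.9), ("shrimp_tacos", 0.3), ("greek_salad", 0.7)],
}
tracked = {"fish_tacos", "cobb_salad"}
print(InterleavePostprocess(n=4).execute(candidates, tracked))
# ['shrimp_tacos', 'chef_salad', 'fish_burrito', 'greek_salad']
```

Interleaving keeps one heavily tracked recipe from crowding out the rest: every tracked recipe contributes its best match before any contributes a second.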
Final Deployment Architecture
[Figure: two Primrose containers (Similar Recipes and Dinner Recs), each refreshed daily from the BigQuery data lake, write to a Redis cache (Google MemoryStore). A Recipe Recs micro-service container (Flask API) reads the cache and serves an endpoint for Android, iOS, and Web clients.]
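The deck's micro-service is a Flask API that reads from the Redis cache. To keep the example dependency-free, the sketch below uses a bare WSGI callable (the interface Flask applications implement) and a dictionary in place of the Redis client; the route `/similar_recipes/<recipe_id>` and the key scheme are hypothetical. Because the Primrose containers precompute recommendations daily, the request path is only a cache lookup.

```python
import json
from typing import Callable, Dict, Iterable, MutableMapping


def make_app(cache: MutableMapping[str, str]) -> Callable:
    """Return a WSGI app serving GET /similar_recipes/<recipe_id> from the cache."""

    def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        prefix = "/similar_recipes/"
        value = None
        if path.startswith(prefix):
            value = cache.get("similar_recipes:" + path[len(prefix):])
        if value is None:
            start_response("404 Not Found", [("Content-Type", "application/json")])
            return [b'{"error": "not found"}']
        body = value.encode("utf-8")  # the cached value is already JSON text
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


cache = {"similar_recipes:fish_tacos": json.dumps(["shrimp_tacos", "cobb_salad"])}
app = make_app(cache)
statuses = []
body = b"".join(app({"PATH_INFO": "/similar_recipes/fish_tacos"},
                    lambda status, headers: statuses.append(status)))
print(statuses[0], json.loads(body))  # 200 OK ['shrimp_tacos', 'cobb_salad']
```

To serve it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough; in production a WSGI server or Flask would take its place.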
Q & A
Open sourcing Primrose here soon: https://github.com/ww-tech
Tech blog: https://medium.com/ww-tech-blog
Leveraging an in-house modeling framework for fun and profit

More Related Content

What's hot

Predictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal BallPredictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal Ball
DATAVERSITY
 
Data Driven Strategy Analytics Technology Approach Corporate
Data Driven Strategy Analytics Technology Approach CorporateData Driven Strategy Analytics Technology Approach Corporate
Data Driven Strategy Analytics Technology Approach Corporate
SlideTeam
 
1530 track2 reid
1530 track2 reid1530 track2 reid
1530 track2 reid
Rising Media, Inc.
 
Reinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapRReinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapR
Lilia Gutnik
 
Four Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics StrategyFour Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics Strategy
Arcadia Data
 
Building a Data Driven Organization
Building a Data Driven OrganizationBuilding a Data Driven Organization
Building a Data Driven Organization
IT Weekend
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
Tuba Yaman Him
 
1415 gold sanford
1415 gold sanford1415 gold sanford
1415 gold sanford
Rising Media, Inc.
 
Stop searching for that elusive data scientist
Stop searching for that elusive data scientistStop searching for that elusive data scientist
Stop searching for that elusive data scientist
Yogita Bansal
 
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management PurgatoryData-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
DATAVERSITY
 
Analytics Strategy and Roadmap Offering v2 (1)
Analytics Strategy and Roadmap Offering v2 (1)Analytics Strategy and Roadmap Offering v2 (1)
Analytics Strategy and Roadmap Offering v2 (1)
Joey Amanchukwu
 
Webinar: Data Quality, Data Engineering, and Data Science
Webinar: Data Quality, Data Engineering, and Data ScienceWebinar: Data Quality, Data Engineering, and Data Science
Webinar: Data Quality, Data Engineering, and Data Science
DATAVERSITY
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
Sri Ambati
 
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
DATAVERSITY
 
The Five Data Questions
The Five Data QuestionsThe Five Data Questions
The Five Data Questions
crystalpullen
 
RWDG Slides: Using Agile to Justify Data Governance
RWDG Slides: Using Agile to Justify Data GovernanceRWDG Slides: Using Agile to Justify Data Governance
RWDG Slides: Using Agile to Justify Data Governance
DATAVERSITY
 
Data Analyics
Data AnalyicsData Analyics
Data Analyics
SysDiva Consultants
 
Data-Ed Slides: Exorcising the Seven Deadly Data Sins
Data-Ed Slides: Exorcising the Seven Deadly Data SinsData-Ed Slides: Exorcising the Seven Deadly Data Sins
Data-Ed Slides: Exorcising the Seven Deadly Data Sins
DATAVERSITY
 
Stop Searching for That Elusive Data Scientist
Stop Searching for That Elusive Data ScientistStop Searching for That Elusive Data Scientist
Stop Searching for That Elusive Data Scientist
Vaibhav Srivastav
 
1215 daa industry lunch
1215 daa industry lunch1215 daa industry lunch
1215 daa industry lunch
Rising Media, Inc.
 

What's hot (20)

Predictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal BallPredictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal Ball
 
Data Driven Strategy Analytics Technology Approach Corporate
Data Driven Strategy Analytics Technology Approach CorporateData Driven Strategy Analytics Technology Approach Corporate
Data Driven Strategy Analytics Technology Approach Corporate
 
1530 track2 reid
1530 track2 reid1530 track2 reid
1530 track2 reid
 
Reinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapRReinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapR
 
Four Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics StrategyFour Key Considerations for your Big Data Analytics Strategy
Four Key Considerations for your Big Data Analytics Strategy
 
Building a Data Driven Organization
Building a Data Driven OrganizationBuilding a Data Driven Organization
Building a Data Driven Organization
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
1415 gold sanford
1415 gold sanford1415 gold sanford
1415 gold sanford
 
Stop searching for that elusive data scientist
Stop searching for that elusive data scientistStop searching for that elusive data scientist
Stop searching for that elusive data scientist
 
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management PurgatoryData-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
Data-Ed Webinar: The Seven Deadly Data Sins - Emerging from Management Purgatory
 
Analytics Strategy and Roadmap Offering v2 (1)
Analytics Strategy and Roadmap Offering v2 (1)Analytics Strategy and Roadmap Offering v2 (1)
Analytics Strategy and Roadmap Offering v2 (1)
 
Webinar: Data Quality, Data Engineering, and Data Science
Webinar: Data Quality, Data Engineering, and Data ScienceWebinar: Data Quality, Data Engineering, and Data Science
Webinar: Data Quality, Data Engineering, and Data Science
 
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing ZhaoH2O World - Advanced Analytics at Macys.com - Daqing Zhao
H2O World - Advanced Analytics at Macys.com - Daqing Zhao
 
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –...
 
The Five Data Questions
The Five Data QuestionsThe Five Data Questions
The Five Data Questions
 
RWDG Slides: Using Agile to Justify Data Governance
RWDG Slides: Using Agile to Justify Data GovernanceRWDG Slides: Using Agile to Justify Data Governance
RWDG Slides: Using Agile to Justify Data Governance
 
Data Analyics
Data AnalyicsData Analyics
Data Analyics
 
Data-Ed Slides: Exorcising the Seven Deadly Data Sins
Data-Ed Slides: Exorcising the Seven Deadly Data SinsData-Ed Slides: Exorcising the Seven Deadly Data Sins
Data-Ed Slides: Exorcising the Seven Deadly Data Sins
 
Stop Searching for That Elusive Data Scientist
Stop Searching for That Elusive Data ScientistStop Searching for That Elusive Data Scientist
Stop Searching for That Elusive Data Scientist
 
1215 daa industry lunch
1215 daa industry lunch1215 daa industry lunch
1215 daa industry lunch
 

Similar to Leveraging an in-house modeling framework for fun and profit

#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
AI Guild
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Christopher Gutknecht
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
WeCloudData
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
Amazon Web Services
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
CCG
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
MongoDB
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
MongoDB
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
dtz001
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
David Talby
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
Valdas Maksimavičius
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
David Talby
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
WeCloudData
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
Rajesh Muppalla
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Databricks
 
What is Greenstone Digital Library and Tips for Development
What is Greenstone Digital Library and Tips for DevelopmentWhat is Greenstone Digital Library and Tips for Development
What is Greenstone Digital Library and Tips for Development
Ashok Kumar Satapathy
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 

Similar to Leveraging an in-house modeling framework for fun and profit (20)

#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
#Datacaeer - AI Guild workshop on data roles in industry with Adam Green
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
 
Introduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudDataIntroduction to Machine Learning - WeCloudData
Introduction to Machine Learning - WeCloudData
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
What is Greenstone Digital Library and Tips for Development
What is Greenstone Digital Library and Tips for DevelopmentWhat is Greenstone Digital Library and Tips for Development
What is Greenstone Digital Library and Tips for Development
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 

Recently uploaded

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 

Recently uploaded (20)

Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 

Leveraging an in-house modeling framework for fun and profit

  • 1. Leveraging an in-house modeling framework for fun and profit Mike Skarlinski & Brian Graham {michael.skarlinski, brian.graham}@weightwatchers.com June 2019
  • 2. Outline • Introduction: data science at WW – the new Weight Watchers • Problem: scalable, simple modeling and recommendation systems with a small team • Solution: design and benefits of building a framework • Implementation: Examples of deployed recommenders
  • 3.
  • 4. WW is a data driven application to help members on their wellness journeys Member Social Network Activity & Food tracking Weight progress & goals Recipe & food database
  • 5. As a new team, we are tasked with building a foundation of data products Social Network: Connect Growth WW Program Infra- structure Churn model Return model LTV models Single Member View Recipe recommender Similar recipes Composite foods ontology Personalized feed Groups search Who to follow APIs Primrose
  • 6. The data science team’s success hinges on effectively sharing work and knowledge. Team timeline (May 2018 to Dec. 2019): Brian Graham, Reka Daniel-Weiner, Yameng (Eliza) Zhang, Kevin Zecchini, Carl Anderson, Michael (Mike) Skarlinski, plus open positions (Hint hint). How can we build software that helps us grow and develop as a team?
  • 7. WW recommender and modeling challenges
  • 8. Taking stock of our own challenges at WW: what would make a good recommender system at WW? Slow serialization, but our medium data can be kept in RAM... No live features, but we know Docker, k8s... Easy onboarding: mono repo with config as code...
  • 9. We built a framework to solve our challenges and enforce our design decisions (Open source coming soon!!!!!)
  • 10. Primrose: a framework for simple, quick modeling deployments
  • 11. Primrose has features to address each design consideration. Data science: a Python in-memory DAG runner, with no serialization between nodes of the DAG. Infrastructure: the DAG is defined with a configuration-as-code approach; one container for all models. People: abstract ML and data manipulation operations, so data scientists can easily extend the framework. Primrose (Production In-Memory Solution) is a framework for solving WW’s most common use cases: caching batched predictions, with machine-learning engineering baked in.
  • 12. Primrose jobs are executed as Directed Acyclic Graphs (DAGs) in Python. Flexibility: any number of operations is allowed in a single DAG, across any Python library. Data and functions are passed between nodes in an object that understands how to extract the correct data for each node.
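To make "in-memory DAG runner with no serialization between nodes" concrete, here is a minimal sketch. It is not Primrose's API: `run_dag`, `nodes`, and `edges` are hypothetical names, and nodes are assumed to be plain Python callables. The runner orders nodes with the standard library's `graphlib.TopologicalSorter` and hands each node only its upstream outputs, all held in one Python dict.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List

# Hypothetical node signature: a callable that receives its upstream outputs by name.
NodeFn = Callable[[Dict[str, Any]], Any]


def run_dag(nodes: Dict[str, NodeFn], edges: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every node once, in dependency order, keeping all outputs in memory.

    `edges` maps a node name to the names of the nodes it depends on.
    Outputs stay in one dict, so nothing is serialized between nodes.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    graph = {name: edges.get(name, []) for name in nodes}
    missing = {dep for deps in graph.values() for dep in deps} - set(nodes)
    if missing:
        raise ValueError(f"undefined upstream nodes: {sorted(missing)}")
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        # Each node sees only the outputs of its declared upstream nodes.
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = nodes[name](upstream)
    return results


nodes = {
    "reader": lambda up: [3, 1, 2],                  # e.g. rows pulled from a data lake
    "sorter": lambda up: sorted(up["reader"]),
    "writer": lambda up: {"top": up["sorter"][-1]},
}
edges = {"sorter": ["reader"], "writer": ["sorter"]}
out = run_dag(nodes, edges)
print(out["writer"])  # {'top': 3}
```

Because every intermediate result is an ordinary Python object, a node can use pandas, scikit-learn, or any other library without a save/load step in between, which is the trade-off the deck makes for "medium data" that fits in RAM.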
  • 13. DAGs are composed of implementation-agnostic, extensible nodes for data science. Data scientists can write any class that matches the abstract interface and incorporate it in their DAGs, and can write individual nodes using any Python framework or library they choose.
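One way to express "any class that matches the abstract interface" is an abstract base class from Python's `abc` module. The sketch below assumes a hypothetical `Node` interface with a single `run` method and a hypothetical `LowercaseNames` extension; Primrose's real base classes and method names may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Node(ABC):
    """Hypothetical abstract interface that every DAG node implements."""

    def __init__(self, name: str, config: Dict[str, Any]):
        self.name = name
        self.config = config

    @abstractmethod
    def run(self, inputs: Dict[str, Any]) -> Any:
        """Transform upstream outputs (keyed by node name) into this node's output."""


class LowercaseNames(Node):
    """Example extension: lowercases recipe names produced by one upstream node."""

    def run(self, inputs: Dict[str, Any]) -> Any:
        source = self.config["source"]  # name of the upstream node to read from
        return [name.lower() for name in inputs[source]]


node = LowercaseNames("clean_names", {"source": "reader"})
print(node.run({"reader": ["Fish Tacos", "Cobb Salad"]}))  # ['fish tacos', 'cobb salad']
```

Instantiating `Node` directly raises `TypeError`, so a new node type only needs to implement `run`; what happens inside it (NLTK, scikit-learn, plain Python) is up to the data scientist.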
  • 14. Primrose is run like an ETL pipeline, in a single Docker container for each configuration.
  • 15. For simpler deployments, Primrose uses a “configuration as code” approach. Object configuration and DAG structure are built in a JSON configuration file; Primrose validates the configuration and instantiates the correct classes at runtime. Different DAG JSON files (Recipe recommender, Churn Model, Connect Feed) run in the same Primrose container, each producing its own outputs and results. Success, fame, money...
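A sketch of "validate the configuration and instantiate the correct classes at runtime", assuming a class registry keyed by class name. The JSON layout (`nodes`, `class`, `depends_on`, `params`) and the classes `RecipeReader` and `SimilarityModel` are hypothetical illustrations, not Primrose's actual configuration format.

```python
import json
from typing import Any, Dict, Type

# Hypothetical registry: class names that may appear in a DAG config.
REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Class decorator that makes a node class addressable by name from JSON."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class RecipeReader:
    def __init__(self, **params: Any):
        self.params = params


@register
class SimilarityModel:
    def __init__(self, **params: Any):
        self.params = params


def load_dag(text: str) -> Dict[str, Any]:
    """Validate a JSON DAG config, then instantiate one object per node."""
    nodes = json.loads(text)["nodes"]
    for name, spec in nodes.items():
        if spec["class"] not in REGISTRY:
            raise ValueError(f"{name}: unknown class {spec['class']!r}")
        for dep in spec.get("depends_on", []):
            if dep not in nodes:
                raise ValueError(f"{name}: depends on undefined node {dep!r}")
    # Only build objects once the whole config has passed validation.
    return {name: REGISTRY[spec["class"]](**spec.get("params", {})) for name, spec in nodes.items()}


CONFIG = """
{
  "nodes": {
    "reader": {"class": "RecipeReader", "params": {"query": "recipes.sql"}},
    "model": {"class": "SimilarityModel", "depends_on": ["reader"], "params": {"top_k": 10}}
  }
}
"""
dag = load_dag(CONFIG)
print(type(dag["model"]).__name__, dag["model"].params)  # SimilarityModel {'top_k': 10}
```

Validating before instantiating means a typo in a config fails fast at container start-up, and swapping one JSON file for another is all it takes to run a different model in the same container.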
  • 16. The framework has helped our team grow and develop production models: we deployed 3 production models and 3 production recommenders, and onboarded 6 members in less than a year. Everyone is working in the framework! We’re going to open-source Primrose!!! Keep on the lookout or contact us!
  • 18. Food is at the core of our product
  • 19. We know you and meet you where you are. Personalize your experience using your data (example foods: coffee, croissant, fish tacos, apple, cobb salad, pasta with red sauce, ice cream).
  • 20. Recipe Recommendations: Similar Recipes and Dinner Recommendations
  • 21. Similar Recipes flow: US WW recipes → similar ingredients and similar names (document = ingredient list or name string; lemmatize, tokenize, TF-IDF; cosine similarity) → filters (dietary, course, cuisine, main ingredient) → rank. *Only recipes with images.*
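The core of the flow above, TF-IDF vectors over ingredient lists plus cosine similarity, then filtering and ranking, can be sketched in plain Python. The deck uses NLTK and scikit-learn; this dependency-free version swaps in a regex tokenizer instead of lemmatization and reimplements the weighting scheme of scikit-learn's `TfidfVectorizer` defaults (raw counts, smoothed idf, L2 norm). The recipe fields `ingredients`, `course`, and `has_image` are assumptions, and only the course filter is shown.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercased alphabetic tokens; a simple stand-in for NLTK lemmatization.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Sparse, L2-normalized TF-IDF vectors using idf = ln((1 + n) / (1 + df)) + 1."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[doc_id] = {t: w / norm for t, w in vec.items()}
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors have unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def similar_recipes(recipes: Dict[str, dict], target: str, k: int = 3) -> List[str]:
    """Top-k recipes by ingredient similarity, keeping same-course recipes with images."""
    vectors = tfidf_vectors({rid: r["ingredients"] for rid, r in recipes.items()})
    candidates = [
        rid for rid, r in recipes.items()
        if rid != target and r["has_image"] and r["course"] == recipes[target]["course"]
    ]
    candidates.sort(key=lambda rid: cosine(vectors[target], vectors[rid]), reverse=True)
    return candidates[:k]


recipes = {
    "r1": {"ingredients": "fish, tortilla, cabbage, lime", "course": "dinner", "has_image": True},
    "r2": {"ingredients": "shrimp, tortilla, cabbage, lime", "course": "dinner", "has_image": True},
    "r3": {"ingredients": "pasta, tomato, basil", "course": "dinner", "has_image": True},
    "r4": {"ingredients": "fish, tortilla, cabbage, lime", "course": "dinner", "has_image": False},
    "r5": {"ingredients": "fish, lime", "course": "breakfast", "has_image": True},
}
print(similar_recipes(recipes, "r1", k=2))  # ['r2', 'r3']
```

The "similar names" branch would be the same computation with recipe names as the documents; `r4` and `r5` drop out here because of the image and course filters, not because of their similarity scores.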
  • 22. Productionalize in a Primrose DAG: Google BigQuery data lake reader → NLTK + custom lemmatization → sklearn TF-IDF + cosine similarity → business logic (filters) → write to GCS bucket and Google MemoryStore. Success! logging.info('Your newbie DS has written production quality code.')
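The last step of that DAG writes precomputed results to two places: a bucket and a Redis-compatible cache (Google Memorystore). A minimal sketch of preparing both payloads, assuming a hypothetical `similar_recipes:<recipe_id>` key scheme and a JSON-lines file layout; the actual upload and cache calls are left out.

```python
import json
from typing import Dict, List


def build_cache_payload(ranked: Dict[str, List[str]], prefix: str = "similar_recipes") -> Dict[str, str]:
    """Map each recipe to a cache key and a JSON-encoded list of similar recipe ids.

    The `<prefix>:<recipe_id>` key scheme is a hypothetical convention.
    """
    return {f"{prefix}:{rid}": json.dumps(recs) for rid, recs in ranked.items()}


def to_json_lines(ranked: Dict[str, List[str]]) -> str:
    """One JSON object per line: a simple format for a file snapshot in a bucket."""
    return "\n".join(json.dumps({"recipe_id": rid, "similar": recs}) for rid, recs in ranked.items())


ranked = {"r1": ["r2", "r3"], "r2": ["r1"]}
payload = build_cache_payload(ranked)
print(payload["similar_recipes:r1"])          # ["r2", "r3"]
print(to_json_lines(ranked).splitlines()[0])  # {"recipe_id": "r1", "similar": ["r2", "r3"]}
```

With the redis-py client, a dict like `payload` can be written in one round trip with `Redis.mset(payload)`; the JSON-lines string would be uploaded to the bucket as a file, which keeps a snapshot of each daily run alongside the live cache.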
  • 25. Dinner Recommendations flow: US WW recipes → similar ingredients and similar names → business logic. Eligible members: US members with 2 weeks of tracking history who tracked >= 1 recipe. Potential recs: for each tracked recipe, its most similar and 2nd most similar recipes; n = 4 recommendations.
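A sketch of the eligibility rule and candidate generation for dinner recommendations. It reads "2 weeks of tracking history" as a 14-day look-back window, which is an assumption, and the `Member` record and function names are hypothetical. `similar` stands for the output of the similar-recipes job: recipe id to ranked similar recipe ids.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple


@dataclass
class Member:
    member_id: str
    country: str
    tracked: List[Tuple[str, date]]  # (recipe_id, day the recipe was tracked)


def recent_recipes(member: Member, today: date, days: int = 14) -> List[str]:
    """Distinct recipes tracked in the last `days` days, oldest first."""
    start = today - timedelta(days=days)
    recent = sorted((d, rid) for rid, d in member.tracked if start <= d <= today)
    return list(dict.fromkeys(rid for _, rid in recent))  # de-duplicate, keep order


def is_eligible(member: Member, today: date) -> bool:
    """US member who tracked at least one recipe in the 2-week window."""
    return member.country == "US" and len(recent_recipes(member, today)) >= 1


def potential_recs(member: Member, today: date, similar: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each recently tracked recipe, its most and 2nd-most similar recipes."""
    return {rid: similar.get(rid, [])[:2] for rid in recent_recipes(member, today)}


today = date(2019, 6, 1)
similar = {"r1": ["r2", "r3", "r4"], "r5": ["r6", "r7"], "r9": ["r8"]}
m = Member("m1", "US", [("r1", date(2019, 5, 30)), ("r9", date(2019, 4, 1)),
                        ("r5", date(2019, 5, 20)), ("r1", date(2019, 5, 31))])
print(is_eligible(m, today))              # True
print(potential_recs(m, today, similar))  # {'r5': ['r6', 'r7'], 'r1': ['r2', 'r3']}
```

Here `r9` falls outside the window, and the two `r1` entries collapse into one, so the member ends up with four candidates to be trimmed and interleaved downstream.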
  • 26. Productionalizing is easier the second time: the same BQ reader class with a different SQL input file, plus a new postprocess class to sort, filter and interleave potential recommendations. Success! logging.warning('Data Scientist is developing software engineering skills.')
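The "sort, filter and interleave" postprocess step can be sketched as a round-robin over per-recipe candidate lists. The ordering rule (most recently tracked recipe first), the exclusion set (for example, recipes the member already tracked), and the name `postprocess` are assumptions for illustration.

```python
from datetime import date
from itertools import chain, zip_longest
from typing import List, Set, Tuple


def postprocess(potential: List[Tuple[date, List[str]]], exclude: Set[str], n: int = 4) -> List[str]:
    """Sort, filter and interleave per-recipe candidate lists into n recommendations.

    `potential` holds (day the source recipe was tracked, ranked similar recipes).
    """
    # Sort: candidates of the most recently tracked recipe go first.
    ordered = [recs for _, recs in sorted(potential, key=lambda p: p[0], reverse=True)]
    picked: List[str] = []
    # Interleave: zip_longest yields the 1st candidate of every list, then the 2nd, ...
    for rid in chain.from_iterable(zip_longest(*ordered)):
        # Filter: skip padding, excluded ids and duplicates.
        if rid is None or rid in exclude or rid in picked:
            continue
        picked.append(rid)
        if len(picked) == n:
            break
    return picked


potential = [
    (date(2019, 5, 28), ["a1", "a2"]),
    (date(2019, 5, 31), ["b1", "b2"]),
    (date(2019, 5, 30), ["c1", "b1"]),
]
print(postprocess(potential, exclude={"a2"}))  # ['b1', 'c1', 'a1', 'b2']
```

Interleaving rather than concatenating means each tracked recipe contributes its best match before any recipe contributes a second one, so four slots still cover several of the member's recent meals.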
  • 27. Final deployment architecture: BigQuery data lake → Primrose containers for Dinner Recs and Similar Recipes (each refreshed daily) → Redis cache (MemoryStore) → Recipe Recs micro-service (Flask API, in a container) → endpoint → clients (Android, iOS, Web).
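Because the Primrose jobs precompute everything daily, the API's job reduces to a key lookup. A minimal sketch of that lookup with the cache abstracted as a `get` callable (a plain dict stands in for Redis here). The key scheme `dinner_recs:<member_id>` and the fallback behaviour on a cache miss are assumptions; the deck does not say how misses are handled.

```python
import json
from typing import Callable, Dict, List, Optional

# Stand-in for a cache client: any callable that returns the stored value or None.
CacheGet = Callable[[str], Optional[str]]


def get_recommendations(cache_get: CacheGet, kind: str, key: str, fallback: List[str]) -> Dict[str, object]:
    """Read precomputed recommendations from the cache; use `fallback` on a miss."""
    raw = cache_get(f"{kind}:{key}")  # hypothetical "<kind>:<id>" key scheme
    if raw is None:
        return {"source": "fallback", "items": fallback}
    return {"source": "cache", "items": json.loads(raw)}


cache = {"dinner_recs:m1": json.dumps(["r2", "r3", "r7", "r8"])}
print(get_recommendations(cache.get, "dinner_recs", "m1", fallback=["r1"]))
# {'source': 'cache', 'items': ['r2', 'r3', 'r7', 'r8']}
print(get_recommendations(cache.get, "dinner_recs", "m2", fallback=["r1"]))
# {'source': 'fallback', 'items': ['r1']}
```

In a Flask route, the bound `get` method of a redis-py client could be passed as `cache_get`; it returns bytes, which `json.loads` accepts in Python 3.6+. Keeping the handler read-only means the serving container never needs the model code, only the cache.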
  • 28. Q & A | Open-sourcing Primrose here soon: https://github.com/ww-tech | Tech blog: https://medium.com/ww-tech-blog