Leveraging an in-house modeling framework for fun and profit

Leveraging an in-house modeling
framework for fun and profit
Mike Skarlinski & Brian Graham
{michael.skarlinski, brian.graham}@weightwatchers.com
June 2019

Outline
• Introduction: data science at WW – the new Weight Watchers
• Problem: scalable, simple modeling and recommendation systems with a small team
• Solution: design and beneﬁts of building a framework
• Implementation: Examples of deployed recommenders

WW is a data driven application to help members
on their wellness journeys
Member Social
Network
Activity & Food
tracking
Weight progress &
goals
Recipe & food
database

As a new team, we are tasked with building a
foundation of data products
Social
Network:
Connect
Growth
WW
Program
Infra-
structure
Churn model
Return model
LTV models
Single Member View
Recipe recommender
Similar recipes
Composite foods ontology
Personalized feed
Groups search
Who to follow
APIs
Primrose

Data science team’s success hinges on effectively
sharing work and knowledge
openopen
Brian
Graham
Reka
Daniel-Weiner
Yameng
(Eliza) Zhang
Kevin
Zecchini
Carl
Anderson
Michael (Mike)
Skarlinski
open
Dec.
2019
May
2018
Jan.
2019
Mar.
2019
Feb.
2019
...
(Hint hint)
How can we build software that helps us grow and develop as a team?

WW recommender and modeling
challenges

Taking stock of our own challenges at WW
What would make a good recommender system at WW?
Slow serialization
but our medium data
can be kept in RAM...
No live features
but we know Docker, k8s...
Easy onboarding
mono repo with conﬁg as code...

We built a framework to solve our challenges and
enforce our design decisions
(Open source coming soon!!!!!)

Primrose: a framework for simple, quick
modeling deployments

Primrose has features to address each design
consideration
Python in-memory DAG runner, with no
serialization between nodes of the DAG.
DAG is deﬁned as configuration-as-code
approach -- one container for all models
Abstract ML and data manipulation operations,
data scientists can easily extend the framework
Data science Infrastructure People
Primrose: (Production In-Memory Solution) framework for solving
WW’s most common use cases, caching batched predictions with
machine-learning engineering baked-in.

Primrose jobs are executed as Directed Acyclic
Graphs (DAG)s in python
Flexibility: any number of operations
allowed in a single DAG, across any
python library
Data and functions are passed between
nodes in an object that understands how
to extract the correct data for each node

DAGs are composed of implementation agnostic,
extensible nodes for data science
Data scientists can write any class that
matches the abstract interface &
incorporate in their DAGs
Data scientists can write individual nodes using
any Python framework or library they choose

Primrose is run like an ETL pipeline in a single
docker container for each configuration

For simpler deployments: Primrose uses a
“configuration as code” approach
Object configuration and DAG structure
are build in a configuration JSON
Primrose validates the configuration
and instantiates the correct classes at
runtime
Different outputs and results for each
DAG
Recipe recommender DAG JSON
Churn Model DAG JSON
Connect Feed DAG JSON
Primrose container Success, fame, money...

The framework has helped our team grow
and develop production models
Deployed 3 production
models and 3 production
recommenders
Onboarded 6 members in less
than a year, everyone is working
in the framework!
We’re going to open-source Primrose !!! Keep on the lookout or contact us!

Food is at the core of our product

We know you and meet you where you are.
coffee
croissant
fish tacos
apple
cobb salad
pasta with red sauce
ice cream
Personalize your
experience using your data

Recipe Recommendations
Similar Recipes Dinner Recommendations

Similar Recipes Flow
US WW Recipes
Similar Ingredients
Similar Names
Filters
dietary
course
cuisine
main ingredient
document = ingredient list or name string
lemmatize, tokenize, TF-IDF
Cosine similarity
Rank
*Only recipes with images*

Business Logic (filters)
Productionalize in Primrose DAG
Google BigQuery Data lake Reader
NLTK + Custom Lemmatization
Sklearn TF-IDF + cosine similarity
Write to GCS Bucket and Google MemoryStore
Success!
logging.info(‘Your newbie DS has written production quality code.’)

Dinner Recommendations Flow
US WW Recipes
Similar Ingredients
Similar Names Business Logic
Eligible Members
2 weeks of tracking history
Tracked >= 1 recipe
US members
Potential Recs
tracked
most similar
X XX
X
2nd most sim.
n = 4 recommendations

Productionalizing is easier the second time
Same BQ reader class,
different SQL input file
New postprocess class to sort, filter and interleave potential recommendations
Success!
logging.warning(‘Data Scientist is developing software engineering skills.’)

Container
Dinner
Recs
Primrose
Container
Container
Recipe Recs
Micro-Service
Flask API
Similar
Recipes
Primrose
Redis Cache
MemoryStore
Final Deployment Architecture
Datalake
BigQuery
Refresh Daily
Refresh Daily
Android
Endpoint
Clients
iOS
Web

Q & A
Open sourcing primrose here soon:
https://github.com/ww-tech
Tech blog
https://medium.com/ww-tech-blog

Leveraging an in-house modeling framework for fun and profit

Leveraging an in-house modeling framework for fun and profit

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Leveraging an in-house modeling framework for fun and profit

Similar to Leveraging an in-house modeling framework for fun and profit (20)

Recently uploaded

Recently uploaded (20)

Leveraging an in-house modeling framework for fun and profit