Machine Learning Product Managers Meetup Event
1. 1. Meetup 101
2. The data team @ Meetup
3. ML product considerations
Alex Charnas, Product Manager
Ben Schulte, Sr. Engineering Director
Zachary Cohn, Principal Engineer
12. Data and Machine Learning Mission
Data and analytics drive impact for the entire organization
● understand impact
● identify opportunities
● improve the customer experience
Machine learning directly improves the customer experience
● personalization -- batch & low latency
● insights at scale
13. How is the Data team organized @ Meetup?
Machine Learning (ML)
Build quality and relevance into Meetup with
customer products and reusable APIs
Data Science (DS)
Deep insights into Meetup activity and
experimentation for internal customers
Data Platform (DP)
The bedrock for low-latency, accurate data
that power DS, ML and analytics
14. How do the teams work together?
DP ∩ ML
● Implement & operate a machine learning platform to bring ML products to our members
● Empower other teams to use ML models & insights in their products
DS ∩ DP
● Collect, organize and enhance analytic data
● Provide trusted, performant & self-service access to Meetup data & insights
Machine Learning (ML)
Build quality and relevance into Meetup with
customer products and reusable APIs
● Connect members & organizers through
high-quality, highly relevant
recommendations
● Maintain a library of reusable attributes
describing our members, groups &
events
Data Science (DS)
Deep insights into Meetup activity and
experimentation for internal customers
● Establish, maintain and expand a set of
ground truths describing Meetup activity
● Maintain an experiment framework that is
trusted & used by PMs & engineers
Data Platform (DP)
The bedrock for low-latency, accurate data
that power DS, ML and analytics
● Ensure ongoing data fidelity,
low-latency data access and system
stability
● Provide tools for internal customers
to simplify data access and make
development at scale easy
DS ∩ ML
● Apply statistics at scale to describe &
predict meetup activity
17. 1. Do you improve tools or the product?
Tools
● Decrease the cost ($$$)
● Reduce modeling / iteration
cycle time
● Add better data, feature,
model tracking
Product features
● New features
● New models
● Discovery / research
18. Correct answer… false choice!
Ideally* you improve the tools via product work:
Meetup ML product release     | New tooling added & now reused throughout platform
New Group Announcement        | Reusable feature library & distributed XGBoost training
Auto-approve Meetup Groups    | Low-latency features & auto-model retraining
Member → Group recommendation | Airflow scheduling & lambda-served recommendations (burst capacity!) on AWS
Show-up model                 | Reduce model iteration time
Member → Topic recommendation | Cloud compute $$$ pits of success
* $$$ / hours
19. 2. Selecting an Objective Function
● How will success be measured?
● What should the machine try to learn?
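These two questions pull apart in practice: the learning objective is what the trained model minimizes, while success is measured with a product-facing metric. A minimal sketch in scikit-learn on synthetic data (the feature setup, label, and top-decile cutoff are all illustrative assumptions, not Meetup's actual pipeline):

```python
# Separating "what the machine learns" from "how success is measured".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                          # e.g. member/group features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # e.g. "did the member join?"

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]

# What the machine tries to learn: minimize log-loss on the join label.
print("log-loss:", log_loss(y, proba))

# How success is measured: a product metric on what we actually surface,
# here precision among the top-scoring 10% of candidates.
threshold = np.quantile(proba, 0.9)
print("precision@top10%:", precision_score(y, proba >= threshold))
```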
23. But we care about lots of stuff
● Joins per email but also...
○ Are they RSVPing to the events later?
○ Are we seeing an increase in unsubscribes?
○ Do we see an increase in new group successful starts?
● Could try to find one metric to rule them all
○ We prefer a straightforward and interpretable key indicator
○ Other metrics are balancing metrics: we look at them only to identify problems
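The "one key indicator plus balancing metrics" pattern above can be sketched as a simple check: optimize a single interpretable number, and inspect the rest only for regressions. The metric names and thresholds below are illustrative assumptions, not Meetup's real values:

```python
# One key indicator to move, balancing metrics only checked for problems.
def evaluate(experiment, control):
    # Key indicator: the one number we try to improve.
    key_lift = experiment["joins_per_email"] / control["joins_per_email"] - 1

    # Balancing metrics: inspected only to catch regressions, never optimized.
    alerts = []
    if experiment["unsubscribe_rate"] > control["unsubscribe_rate"] * 1.05:
        alerts.append("unsubscribes up >5%")
    if experiment["rsvp_rate"] < control["rsvp_rate"] * 0.95:
        alerts.append("downstream RSVPs down >5%")
    return key_lift, alerts

control = {"joins_per_email": 0.020, "unsubscribe_rate": 0.0010, "rsvp_rate": 0.30}
experiment = {"joins_per_email": 0.023, "unsubscribe_rate": 0.0009, "rsvp_rate": 0.31}

lift, alerts = evaluate(experiment, control)
print(f"key-indicator lift: {lift:+.1%}, balancing alerts: {alerts or 'none'}")
```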
24. 3. Making progress on projects crossing domains
[Slide illustration: a neighbor’s yard with a fence, labeled “Neighbor’s fence” and “Neighbor’s yard”]
28. 4. How to prioritize having data?
I often say that when you can measure what you are
speaking about, and express it in numbers, you
know something about it; but when you cannot
measure it, when you cannot express it in numbers,
your knowledge is of a meagre and unsatisfactory
kind; it may be the beginning of knowledge, but you
have scarcely, in your thoughts, advanced to the
stage of science, whatever the matter may be.
-- Lord Kelvin (and not a pithy Peter Drucker quote.)
31. Back to the Future
● Data is the lifeblood of machine learning.
● Observing the past is easier than predicting the future.
● Observing the past is hard!
● Training requires predicting the future, in the past.
○ That sounds easy -- it’s already in the past.
○ But you need a representation of the state
of the world at arbitrary points of history.
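One common way to reconstruct the state of the world at arbitrary points of history is a point-in-time (as-of) join: each training example sees only feature values from at or before its own timestamp, which prevents label leakage. A minimal sketch with pandas; the snapshot cadence and column names are illustrative assumptions:

```python
# "Predicting the future, in the past": join each labeled example to the
# latest feature snapshot at or before the example's timestamp.
import pandas as pd

# Periodic snapshots of a member attribute (e.g. groups joined so far).
snapshots = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
    "member_group_count": [2, 3, 5],
})

# Labeled examples: did the member RSVP after seeing an email?
examples = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-05", "2024-01-12"]),
    "rsvped": [0, 1],
})

# merge_asof picks the most recent snapshot at or before each example's
# timestamp (both frames must be sorted by the key).
train = pd.merge_asof(examples, snapshots, on="ts")
print(train)
```

The 2024-01-05 example gets the 2024-01-01 snapshot value, never the later (leaky) ones, reconstructing the world as it looked at prediction time.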
32. 5. Translating Local Lift → Global Impact
Starting point: good (not great) impact from new ML model
How do we pump up the added value?
1. Follow the eyeballs → Know where impact is possible (not always easy)
2. Make some friends → What adjacent product could reuse your insight?
3. Socialize your ML portfolio
33. 6. Owned vs. Supported vs. Arbitrated
1. Algorithms aren’t a neutral selection
mechanism -- while they can optimize
content in a “shared” channel (e.g.
what should we promote on our
homepage) these are rarely solely
data-driven decisions.
2. ML teams need a good way to iterate
independently -- offline analysis is
great, but the gold standard is A/B
testing in production. Without a way to
do that, improvements are slower.
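The A/B-testing loop the slide calls the gold standard needs two pieces: stable assignment and a significance check on the outcome. A minimal sketch with hash-based 50/50 bucketing and a two-proportion z-test; the bucketing scheme and the example numbers are illustrative assumptions, not Meetup's experiment framework:

```python
# Deterministic bucketing plus a two-proportion z-test for an A/B test.
import hashlib
import math

def bucket(member_id: str, experiment: str) -> str:
    """Stable 50/50 assignment from a hash of (experiment, member)."""
    h = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 2 else "control"

def z_test(conv_t, n_t, conv_c, n_c):
    """Two-proportion z statistic: treatment vs control conversion."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p = (conv_t + conv_c) / (n_t + n_c)               # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se

print(bucket("member-42", "new-ranker-v2"))
# |z| > 1.96 would be significant at the 5% level; this example is not.
print(f"z = {z_test(230, 10_000, 200, 10_000):.2f}")
```

Hashing on (experiment, member) keeps a member's assignment stable across sessions while decorrelating assignments between experiments, which is what lets teams iterate independently.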