The document discusses culture, data engineering, and approaches to data science at Netflix. It emphasizes that Netflix's data science culture values freedom and responsibility, and providing context rather than control. It also stresses that data engineering is as important as data science, because it allows data scientists to scale their work. The document contrasts two paradigms for structuring work: the "hamburger stand" approach of simply fulfilling requests, versus the "butler" approach of anticipating needs. It also gives an overview of how Netflix has built collaborative data science ecosystems.
2. Outline.
● Culture
● Data Engineering is Important
● Hamburger Stands vs. Butlers
● Data Science Ecosystems
● Questions
3. What I talk about when I talk about
data science at Netflix.
4. Culture
There is no way for me to discuss how data science works at Netflix
without discussing culture.
● Freedom and responsibility.
● Context, not control.
● What we value: Judgment, Communication, Curiosity, Innovation,
Courage, Passion, Selflessness, Inclusion, Integrity, Impact.
https://jobs.netflix.com/culture
5. Culture - Freedom and Responsibility
● Freedom: Choice of programming language.
● Responsibility: Need to integrate with the rest of the tech stack
and communicate clearly how to use.
● Freedom: Choice of projects.
● Responsibility: Use my best judgment to choose projects that
provide highest impact to Netflix.
6. Culture - Context Not Control
Data scientists often want to convince stakeholders to use outputs of
models rather than heuristics.
● Context: Provide documentation (communication) and compelling
reasons (innovation, impact) to use the output of a predictive
model.
● Not Control: Cannot and should not mandate workflows.
8. Our Science and Analytics Team
● Roughly 125 data scientists and analysts on five sub-teams:
○ Growth and Business Operations [25%]
○ Content and Marketing [25%]
○ Member Product [20%]
○ Studio Production and Streaming [20%]
○ Discovery Research [10%]
9. Our Data Engineering Team
● Roughly 125 engineers on five sub-teams:
○ Data Infrastructure [45%]
○ Product, Marketing, and Content Data [15%]
○ Studio Production and Streaming Data [15%]
○ Membership, Corporate, and Platform Data [15%]
○ Experimentation Platform [10%]
10. What to take away from these numbers.
● Data science and analytics touch every area of the company.
○ Context, not control.
● Netflix tends towards applied research.
○ Cultural values: Impact, judgment.
● Data engineering has EQUAL priority with science and analytics.
12. The data/science split.
● Data science is 80% data, 20% science.
● Without good data engineering as a backbone...
○ Best case: Your data scientist will become an okay data engineer.
○ Worst case: Your data scientist will become a highly paid data entry person.
13. Best Case Scenario
Super scrappy data scientist
● Pulls together various internal data
sources, builds a data set, trains a
model, provides useful predictions.
● Success!
14. Best Case Scenario
Super scrappy data scientist
● Can we have that refreshed - daily,
weekly, monthly, quarterly?
16. Best Case Scenario
● What your scrappy data scientist isn’t doing:
○ Improving the original model via experimentation
○ Working on new projects
○ Learning new things for other areas of the business
17. Example - Problem Set Up
To do data science on content, we need data on content.
That data then needs to be organized, cleaned, and
understood.
18. Example - Version 1
title_name | studio
Ex Machina | Universal
Run Lola Run | Prokind
Friends | Warner Brothers
Life of Pi | 20th Century Fox
20. Example - Version 2
title_name | release_year | studio
Ex Machina | 2014 | Universal
Run Lola Run | 1998 | Prokind
Friends | 1994 | Warner Brothers
Life of Pi | 2012 | 20th Century Fox
Frozen | 2010 | Anchor Bay
Frozen | 2013 | Disney
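One reading of the Version 2 table is that title_name alone does not identify a title: two different films are both called "Frozen". The slides do not state this explicitly, so treat the following as a minimal sketch of that interpretation. The helper name duplicate_keys is hypothetical; the rows are copied from the table above.

```python
from collections import Counter

# Rows copied from the Version 2 example table.
titles = [
    {"title_name": "Ex Machina", "release_year": 2014, "studio": "Universal"},
    {"title_name": "Run Lola Run", "release_year": 1998, "studio": "Prokind"},
    {"title_name": "Friends", "release_year": 1994, "studio": "Warner Brothers"},
    {"title_name": "Life of Pi", "release_year": 2012, "studio": "20th Century Fox"},
    {"title_name": "Frozen", "release_year": 2010, "studio": "Anchor Bay"},
    {"title_name": "Frozen", "release_year": 2013, "studio": "Disney"},
]


def duplicate_keys(rows, key_columns):
    """Return the key values that appear in more than one row.

    Hypothetical helper for illustration: an empty result means the
    chosen columns uniquely identify every row.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# title_name alone is ambiguous; adding release_year resolves it.
print(duplicate_keys(titles, ["title_name"]))                  # [('Frozen',)]
print(duplicate_keys(titles, ["title_name", "release_year"]))  # []
```

A check like this is cheap to run whenever a new source is added, before any model is trained on joined data.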
23. Worst Case Scenario
Data is the Hot New Thing™, let’s hire ourselves a data scientist!
24. Worst Case Scenario
Super scrappy data scientist
● Data scientist is given the data in the following format.
○ Becky_data_v1.csv
○ Becky_data_v1-1.csv
○ Becky_Rachel_data_combo_v2.xlsx
○ Data_no_column_headers.csv
○ Becky_data_current_v1.dat
○ data_20171214_final.csv
25. Super scrappy data scientist
Worst Case
Super scrappy, overpaid
data entry person
28. Hamburger Stands vs Butlers
Two paradigms for how you structure the flow of demands from other
areas of the company into a data science team.
29. What they have in common
● Service oriented.
○ Customer happiness matters.
○ Serving a larger cause, not themselves.
● Rarely solo, typically work in teams.
○ Light specialization, with broad skill sets.
● Good at juggling many demands on time.
30. The Hamburger Stand
● Takes requests, fills orders.
○ Primary metrics: Volume,
efficiency, accuracy.
● Not strategic.
● Treats all customers the same.
● Requires customers to specify exactly what they want (especially problematic since they may not know).
31. The Butler
● “We are ahead of our guests’ needs. We like to be there before they ask.”
○ Anticipate needs or problems.
○ Have to understand your customer.
● Find repeatable solutions to a variety of
problems.
● Metrics: Quality, reliability, efficiency.
● “When things are going very, very smoothly and
they don’t notice you, you’re successful.”
○ Cultural value: Selflessness.
http://fortune.com/2013/03/04/inside-the-world-of-the-modern-day-butler/
33. Typical Ecosystem
Hundreds of models deployed for business needs across the company
34. What works for us
● Close partnership with data engineers/dev teams.
○ No throwing it over the fence
○ They are also not hamburger stands
● Emphasis on maintainability
○ Graceful error handling
○ Four-day rule on ETL
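As an illustration of what "graceful error handling" can mean in an ETL step, here is a minimal sketch in which malformed rows are logged and set aside instead of crashing the whole job. The slides do not show any Netflix code; parse_row, transform, and the row layout (title, year, studio) are assumptions made for this example.

```python
import logging

logger = logging.getLogger("etl_sketch")


def parse_row(raw):
    """Parse one raw (title, year, studio) record into a clean dict.

    Hypothetical parser: raises an exception when the record is malformed.
    """
    title, year, studio = raw
    if not title:
        raise ValueError("missing title_name")
    return {
        "title_name": title.strip(),
        "release_year": int(year),
        "studio": studio.strip(),
    }


def transform(raw_rows):
    """Return (clean_rows, rejected_rows) without aborting on bad input."""
    clean, rejected = [], []
    for line_no, raw in enumerate(raw_rows, start=1):
        try:
            clean.append(parse_row(raw))
        except (ValueError, TypeError, AttributeError) as exc:
            # Record the failure and keep going, so one bad row
            # does not block the rest of the load.
            logger.warning("row %d rejected: %s", line_no, exc)
            rejected.append((line_no, raw, str(exc)))
    return clean, rejected


raw_rows = [
    ("Frozen", "2013", "Disney"),
    ("Friends", "n/a", "Warner Brothers"),  # year is not a number
    ("", "2012", "20th Century Fox"),       # title is missing
]
clean, rejected = transform(raw_rows)
```

Keeping the rejected rows, with their line numbers and reasons, gives whoever maintains the pipeline something concrete to investigate later.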
35. What works for us
● Moderate software engineering practices
○ Use Git.
○ Reproducible/documented research
○ Published IPython notebooks
○ Contribution to shared data science ecosystems
○ Microservices for ease of model access (no gatekeeping)
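To make the "microservices for ease of model access" idea concrete, here is a minimal sketch using only the Python standard library. It is not how Netflix exposes its models, which the slides do not describe. The predict function is a stand-in for a trained model, and the names handle_predict, PredictHandler, and serve, along with the port, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model (assumption): averages the features."""
    return sum(features) / len(features)


def handle_predict(body):
    """Map a JSON request body to (HTTP status, JSON-serializable reply)."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("features must not be empty")
    except (ValueError, KeyError, TypeError) as exc:
        # Bad input gets a clear 400 response rather than a server crash.
        return 400, {"error": str(exc)}
    return 200, {"prediction": predict(features)}


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: any client that can POST JSON can use the model."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_predict(self.rfile.read(length))
        data = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the service locally until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling serve() starts the service; a consumer then needs only an HTTP client, not the data scientist's language or environment, which is one way to read "no gatekeeping".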