
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated and Expanded), Chris Stephens, Senior Data Engineer, Netflix

This talk explores how Netflix gives its engineers the freedom to find and introduce the right software for the job, even if it isn't used anywhere else in-house. Examples include how Netflix has enabled analysts to switch fluidly between an MPP RDBMS and an auto-scaling Presto cluster, how Spark and NoSQL stores are used to deploy data sets to internal web apps, and how data scientists can work in the ML framework of their choosing and deploy models as a service.

Slide 1: Rapid Analytics @ Netflix LA
Chris Stephens, Senior Data Engineer

Slide 2: Yes, we’re in LA!
● 500+ employees in Beverly Hills
● growing rapidly
● new 14-story LA headquarters under construction
  ○ planning to move in early 2017

Slide 3: Freedom & Responsibility
Context, not Control
Highly Aligned, Loosely Coupled Teams
Culture + Technology
Courage, Judgement, Honesty, Communication, Curiosity, Passion, Innovation, Impact, Selflessness

Slide 4: Freedom & Responsibility
What does this mean for our day-to-day?

Slide 5: We let everyone drop tables in production

Slide 6: Cost / Benefit
Conscientious people make mistakes, but not very often.
A data warehouse is not an operational system.
What happens if a table is accidentally dropped?
● Do you have backups?
● How quickly can you restore a table?
Is the benefit of restricting access worth the tax on every data / analytical product your team produces?

Slide 7: We have some protection

Slide 8: In Hive, all tables are external tables pointing to S3 locations. ETL writes a new “batch” of data, then updates the metastore to point at it:
s3://[bucket]/hive/schema.db/table/batchid=1459364911
ALTER TABLE table SET LOCATION [path to new batch ID];
Because the tables are external, DROP TABLE does not delete any data.
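
The batch-swap pattern on slide 8 can be modelled in a few lines. The sketch below is a toy, in-memory simulation, not Netflix's implementation: the dictionaries stand in for S3 and the Hive metastore, and `ToyWarehouse`, `write_batch`, `drop_table`, `restore`, and the `example-bucket` name are all hypothetical. It shows why dropping an external table is recoverable: the metastore entry goes away, but every batch path stays in the object store.

```python
from __future__ import annotations

import time


class ToyWarehouse:
    """Hypothetical in-memory model of Hive external tables over S3 batches."""

    def __init__(self, bucket: str = "example-bucket") -> None:
        self.bucket = bucket
        self.object_store: dict[str, list[str]] = {}  # path -> rows (stands in for S3)
        self.metastore: dict[str, str] = {}  # table -> current location

    def write_batch(self, table: str, rows: list[str], batch_id: int | None = None) -> str:
        """Write rows to a fresh batch path, then repoint the table at it."""
        if batch_id is None:
            batch_id = int(time.time())  # assumption: epoch seconds as batch ID
        path = f"s3://{self.bucket}/hive/schema.db/{table}/batchid={batch_id}"
        self.object_store[path] = list(rows)
        # Analogous to: ALTER TABLE <table> SET LOCATION '<path>';
        self.metastore[table] = path
        return path

    def read(self, table: str) -> list[str]:
        return self.object_store[self.metastore[table]]

    def drop_table(self, table: str) -> None:
        """Like DROP TABLE on an external table: removes metadata only."""
        del self.metastore[table]

    def restore(self, table: str, path: str) -> None:
        """Re-register a dropped table against a batch path that still exists."""
        self.metastore[table] = path
```

Because older batch paths are never overwritten in this model, the same mechanism also gives a cheap rollback: point the table back at a previous batch ID.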

Slide 9: In our MPP databases, we have a procedure for upgrading and downgrading our privileges:
CALL admin.UpgradePrivileges('me')
The upgrade lasts for several hours, and usage is logged.
Accidents? Restore from backups, or reload from Hive.
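
The privilege procedure on slide 9 combines two ideas: elevation that expires on its own, and an audit trail. A minimal sketch of that logic follows, assuming a 4-hour window for "several hours". `PrivilegeGrants` and its methods are hypothetical and say nothing about how `admin.UpgradePrivileges` is actually implemented; the current time is passed in explicitly so expiry is deterministic.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class PrivilegeGrants:
    """Hypothetical model of time-limited, audited privilege upgrades."""

    def __init__(self, duration: timedelta = timedelta(hours=4)) -> None:
        self.duration = duration  # assumption: "several hours" taken as 4
        self.expires_at: dict[str, datetime] = {}
        self.audit_log: list[tuple[datetime, str, str]] = []

    def upgrade(self, user: str, now: datetime) -> datetime:
        """Grant elevated privileges to user until now + duration."""
        expiry = now + self.duration
        self.expires_at[user] = expiry
        self.audit_log.append((now, user, "upgrade"))
        return expiry

    def downgrade(self, user: str, now: datetime) -> None:
        """Drop elevated privileges before they expire."""
        self.expires_at.pop(user, None)
        self.audit_log.append((now, user, "downgrade"))

    def is_elevated(self, user: str, now: datetime) -> bool:
        """True while the user's upgrade has not yet expired."""
        expiry = self.expires_at.get(user)
        return expiry is not None and now < expiry
```

Letting the grant lapse by default means nobody has to remember to downgrade, which keeps the "everyone can drop tables" policy from turning into "everyone is permanently an admin."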

Slide 10: When other teams are ready to move to production... we’re done, and moving on to the next thing.
You can trust your people to work the same way.

Slide 11: We share our code

Slide 12: Netflix believes in open source, both inside and out.
● cross-team access to source code repositories
● bi-weekly dedicated time for innovation
● common identity management and access control for APIs

Slide 13: Netflix’s Big Data Portal

Slide 14: Data engineers & analysts use good judgement to decide what work will have the most impact for our customers. Managers provide context and support.

Slide 15: We don’t have an “on call” (we use a “first responder” instead)

Slide 16: Everyone on the team takes a shift: both BI and data engineers (even managers every once in a while!).
First Responder = the first one to respond
● handles the most common failures (e.g., restarting jobs)
● reaches out directly to the ETL owner if escalation is required
● handles communication surrounding ETL delays
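
The first responder's decision on slide 16 is essentially a lookup: apply the documented fix if the failure is a known one, otherwise escalate to the job's owner. The sketch below illustrates that triage under stated assumptions: the failure signatures, the `PLAYBOOK` contents, and the `Failure`/`triage` names are hypothetical, not part of the talk.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical playbook: failure signature -> documented fix.
PLAYBOOK: dict[str, str] = {
    "transient_cluster_error": "restart job",
    "upstream_table_late": "wait for upstream, then restart job",
}


@dataclass
class Failure:
    job: str
    owner: str
    signature: str


def triage(failure: Failure, playbook: dict[str, str] = PLAYBOOK) -> str:
    """Return the first responder's next action for a failed ETL job."""
    fix = playbook.get(failure.signature)
    if fix is not None:
        return f"{failure.job}: {fix}"
    # Not covered by the playbook: escalate directly to the ETL owner.
    return f"{failure.job}: escalate to {failure.owner}"
```

Every new playbook entry moves one more failure type from the escalation branch to the self-service branch, which is the point slide 18 makes about complete entries.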

Slide 17: The goal is to protect the team’s time and focus.

Slide 18: How we do this
● visually define what needs attention and what doesn’t
  ○ “above the line” vs “below the line”
● email alerts for “above the line” jobs that take longer than normal
● a playbook for fixing common stuff
  ○ the more complete your entries are, the less you get called!
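
One way to make "takes longer than normal" concrete is to compare each job's current runtime to the median of its past runs and alert only for jobs above the line. This is a sketch under assumptions: the 1.5x tolerance, the use of the median, and the `jobs_to_alert` name are illustrative choices, not details from the talk.

```python
from __future__ import annotations

from statistics import median


def jobs_to_alert(
    current_runtimes: dict[str, float],
    history: dict[str, list[float]],
    above_the_line: set[str],
    tolerance: float = 1.5,
) -> list[str]:
    """Return above-the-line jobs running longer than tolerance x their median.

    Jobs below the line, or with no runtime history yet, never trigger an alert.
    """
    alerts = []
    for job, runtime in sorted(current_runtimes.items()):
        past = history.get(job)
        if job not in above_the_line or not past:
            continue
        if runtime > tolerance * median(past):
            alerts.append(job)
    return alerts
```

The median is used instead of the mean so that one unusually slow historical run does not inflate the baseline and mask a real slowdown.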

Slide 19: Have a very clear sense of what is urgent, and what isn’t.

Slide 20: Treating every failure like it’s urgent bleeds your team of the time they need to do work.
Build your processes so they can be ignored for 3 days:
● don’t load data if it’s incomplete
● reprocess several days of fact data on each run instead of picking up only the latest
This gives you the freedom to judge whether a failure is worth an interruption.
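
The two rules on slide 20 fit together: each run rebuilds a trailing window of days, and it skips any day whose input is not yet complete. If the pipeline is down for up to the window length, the next successful run fills every missed day automatically. A minimal sketch, assuming a 3-day window and a hypothetical set of dates already known to be complete:

```python
from __future__ import annotations

from datetime import date, timedelta


def reprocess_window(run_date: date, days: int = 3) -> list[date]:
    """Dates to (re)build on this run: run_date and the days-1 days before it."""
    return [run_date - timedelta(days=offset) for offset in range(days - 1, -1, -1)]


def dates_to_load(run_date: date, complete: set[date], days: int = 3) -> list[date]:
    """Skip incomplete dates; a later run picks them up once they are complete."""
    return [d for d in reprocess_window(run_date, days) if d in complete]
```

For example, a run on 2016-03-30 rebuilds 2016-03-28 through 2016-03-30, but leaves 2016-03-30 alone if its source data has not fully landed yet.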

Slide 21: Our engineers use what they need

Slide 22: Netflix doesn’t have a CTO, and doesn’t have architects, technical fellows, or technocrats. Instead, Netflix has amazing engineers.

Slide 23: Example #1: The data team wants to store real-time aggregations of billions of records and make them available for point queries.
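
The requirement in Example #1 is to keep aggregates up to date as records arrive so that a point query reads one precomputed value per key instead of scanning raw records. The slides don't say which store was chosen (the abstract mentions NoSQL stores in general), so the sketch below uses an in-memory dictionary purely as a stand-in for a key-value store; `PointQueryAggregates` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class PointQueryAggregates:
    """In-memory stand-in for a key-value store holding running aggregates.

    Each incoming record updates its key's count and sum in O(1), so a
    point query for one key never touches the raw records.
    """

    def __init__(self) -> None:
        self._count: dict[str, int] = defaultdict(int)
        self._total: dict[str, float] = defaultdict(float)

    def ingest(self, key: str, value: float) -> None:
        self._count[key] += 1
        self._total[key] += value

    def lookup(self, key: str) -> dict[str, float]:
        # .get() avoids creating entries for keys that were never ingested.
        count = self._count.get(key, 0)
        total = self._total.get(key, 0.0)
        return {"count": count, "total": total, "mean": total / count if count else 0.0}
```

Storing count and sum rather than the mean keeps updates associative, so partial aggregates from parallel workers could be merged by adding them.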

Slide 24: Example #2: The data team does most processing on a table in batch, but needs real-time lookups and updates in some cases.
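
A common way to serve Example #2 is a batch-built snapshot plus a small overlay of real-time updates: reads check the overlay first, and a compaction step folds the overlay back into the snapshot. This is offered only as one illustrative pattern, not as what the team actually built; `BatchWithOverlay`, `update`, `lookup`, and `compact` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


class BatchWithOverlay:
    """Batch-built snapshot plus a small overlay of real-time updates."""

    def __init__(self, snapshot: dict[str, Any]) -> None:
        self.snapshot = dict(snapshot)  # produced by the batch job
        self.overlay: dict[str, Any] = {}  # real-time writes since the last batch

    def update(self, key: str, value: Any) -> None:
        """Record a real-time update; visible to lookups immediately."""
        self.overlay[key] = value

    def lookup(self, key: str) -> Any:
        """Prefer the newest real-time value, else fall back to the batch snapshot."""
        if key in self.overlay:
            return self.overlay[key]
        return self.snapshot.get(key)

    def compact(self) -> None:
        """Fold real-time updates into the snapshot, as a batch run might."""
        self.snapshot.update(self.overlay)
        self.overlay.clear()
```

Keeping the overlay small is what makes this cheap: the heavy lifting stays in batch, and only the keys touched since the last run live in the real-time path.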

Slide 25: Our data platform team:
● gives us access to tools, or builds tools that let us do it ourselves
● holds regular “office hours” we can use if we need help

Slide 26: We aren’t defined by our roles

Slide 27: A BI engineer needs data structured a certain way for a report.
Many environments:
● ask a data engineer to build them a table
Our environment:
● let the BI engineer schedule a Hive script and adjust it as necessary

Slide 28: We focus on centers of excellence, not role boundaries

Slide 29: More examples:
● our BI engineers use Python to automate tasks
● our data engineers have Tableau licenses, and use them for quick visualizations and report deployments
For small tasks, this helps us avoid the overhead of interruption and knowledge transfer.

Slide 30: Questions?
