(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014


As Netflix expands its service to more countries, devices, and content, it continues to evolve its big data analytics platform to meet the growing demand for product and consumer insights. This year, Netflix reworked its big data platform: it upgraded to Hadoop 2, transitioned to the Parquet file format, experimented with Pig on Tez for the ETL workload, and adopted Presto as its interactive query engine. In this session, Netflix discusses its latest architecture, how it was built on the Amazon EMR infrastructure, its contributions to the open source community, and some performance numbers for running a big data warehouse on Amazon S3.

Published in: Technology

  1. 1. November 12, 2014 | Las Vegas, NV | Eva Tse, Netflix
  2. 2. Data pipeline diagram: cloud apps, Suro, Ursula (event data, 15 min); Cassandra, SS tables, Aegisthus (dimension data, daily); Amazon S3
  3. 3. Amazon S3 Storage Compute Service Tools
  4. 4. Amazon S3 v2.0 Storage Compute Service Tools
  5. 5. • Works well on Amazon Simple Storage Service (S3)
  6. 6. YARN-1864 YARN-2026 YARN-2012 YARN-2214 YARN-2360 YARN-2540
  7. 7. S3
  8. 8. S3
  9. 9. Pig compilation diagram: Logical Plan, Physical Plan; MRCompiler, MR Plan, MR Execution Engine; TezCompiler, Tez Plan, Tez Execution Engine
  10. 10. A Distributed SQL Query Engine for Big Data
  11. 11.
  12. 12. 21 committed PRs and 14 PRs in review
  13. 13. S3
  14. 14. v2.0
  15. 15.
  16. 16. Amazon S3 v2.0 Storage Compute Service Tools
  17. 17. YARN-1864 YARN-2026 YARN-2012 YARN-2214 YARN-2360 YARN-2540 HIVE-6783 HIVE-6785 HIVE-6938 HIVE-7800 PARQUET-100 PARQUET-106 PARQUET-2 PARQUET-22 PARQUET-70 PARQUET-75 PARQUET-92 PARQUET-99 PIG-3986
  18. 18. Talk | Time | Title
      - PFC-305 | Wednesday, 1:15pm | Embracing Failure: Fault Injection and Service Reliability
      - BDT-403 | Wednesday, 2:15pm | Next Generation Big Data Platform at Netflix
      - PFC-306 | Wednesday, 3:30pm | Performance Tuning EC2
      - DEV-309 | Wednesday, 3:30pm | From Asgard to Zuul, How Netflix’s proven Open Source Tools Can Accelerate and Scale Your Services
      - ARC-317 | Wednesday, 4:30pm | Maintaining a Resilient Front-Door at Massive Scale
      - PFC-304 | Wednesday, 4:30pm | Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
      - ENT-209 | Wednesday, 4:30pm | Cloud Migration, Dev-Ops and Distributed Systems
      - APP-310 | Friday, 9:00am | Scheduling using Apache Mesos in the Cloud
  19. 19.