Cloud Connect 2012, Big Data @ Netflix

Using Big Data at Netflix to Grow our Business
& Retain our Customers.
Offline analysis: Honu, Hadoop & Hive
Online data: Cassandra

Published in: Technology

  1. Big Data @ Netflix: Using Big Data to Grow our Business & Retain our Customers.
     Jerome Boulon, Lead Architect, Hadoop Big Data Infrastructure
     February 15, 2012
     jboulon@netflix.com
  2. Big Data @ Netflix
     Offline analysis:
     • Honu: scalable log analysis system to gain business insights:
       - Error logs (unstructured logs)
       - Statistical logs & performance logs
       - Etc.
     Online analysis:
     • Cassandra for all online activities and user-facing data:
       - A/B testing (test allocation, metadata)
       - Service-level configuration
       - Etc.
  3. Overview
     [Diagram: Application, Collectors, Hive, M/R; labeled "Data collection pipeline" and "Data processing pipeline"]
  4. Honu - Structured Log API
     Using annotations:
     • Convert a Java class to a Hive table dynamically
     • Add/remove columns
     • Supported Java types:
       - All primitives
       - Map
       - Object (using the toString method)
     Using the Key/Value API:
     • Produces the same result as annotations
     • Avoids unnecessary object creation
     • Fully dynamic
     • Thread safe
  5. Honu, what you get:
     log.logEvent(myObject)
     Hive table: movieId, customerId, timestamp, hostname
     Select customerId, count(1) from MyTable group by customerId;
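The mapping on this slide, from a logged object to a Hive table, can be illustrated with a small Python sketch. It is not Honu's Java API: `MovieEvent`, `HIVE_TYPES`, `ENVELOPE_COLUMNS`, `hive_ddl`, and `to_row` are hypothetical names, the type mapping is an assumption, and the sketch assumes that `timestamp` and `hostname` are added by the logging layer rather than declared on the event class.

```python
from dataclasses import asdict, dataclass, fields

# Hypothetical mapping from Python field types to Hive column types.
HIVE_TYPES = {int: "BIGINT", float: "DOUBLE", str: "STRING", bool: "BOOLEAN"}

# Assumption: these columns are appended by the logging layer, not by the event class.
ENVELOPE_COLUMNS = [("timestamp", "BIGINT"), ("hostname", "STRING")]


@dataclass
class MovieEvent:
    movieId: int
    customerId: int


def hive_ddl(event_cls, table_name):
    """Derive a CREATE TABLE statement from the fields of an event class."""
    columns = [(f.name, HIVE_TYPES[f.type]) for f in fields(event_cls)]
    columns += ENVELOPE_COLUMNS
    # Backticks keep names such as `timestamp` from clashing with Hive keywords.
    column_sql = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({column_sql})"


def to_row(event, timestamp, hostname):
    """Flatten one event into a row that matches the derived table."""
    row = asdict(event)
    row["timestamp"] = timestamp
    row["hostname"] = hostname
    return row


print(hive_ddl(MovieEvent, "MyTable"))
```

Adding a field to `MovieEvent` adds a column to the derived statement, which is the "add/remove columns" behavior the previous slide describes.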
  6. December 2009
     - POC for streaming analysis
     - Single AWS zone
     - 1 application
     - 60 million events/day
     - 50 clients
     - Small Hadoop cluster
     - 1 Map/Reduce
     - 1 table
     [Diagram: Application, Collectors, M/R, Oracle]
  7. Feb 2012
     40+ billion events/day
     8+ tables with 1+ TB/day
     100+ smaller tables
     Self-serve:
       → No DBA
       → No pre-provisioning
       → Fully integrated with Hive
     - Multi-region deployments
     - Transparent to our engineers
     - Streaming-based solution
     - Zero configuration
     - 7000+ clients
     - Built-in:
       - Fail-over
       - Load balancing
     Netflix Hive warehouse:
       → One central data warehouse
       → Hourly/daily reports
       → Data retention/expiration
  8. Traceability & Performance analysis
     • Track service-level calls:
       - Instrument the low-level HTTP client
       - Call graph
       - Request processing vs. perceived latency
       - Payload marshalling/unmarshalling: duration, size, etc.
       - Service result: status, error code, exception, etc.
  9. Diagnostic Information
     • Collect latency information for all external operations
     • If latency > threshold, log to Honu:
       - AWS region & zone
       - Instance
       - Service details
     • Open a Jira ticket & attach the diagnostic info
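The threshold rule on this slide can be sketched as a timing wrapper around an external call. This is a minimal Python illustration, not Netflix's implementation: `timed_call`, the `diagnostics` list (a stand-in for the Honu logger), and the region, zone, and instance values are all hypothetical.

```python
import time
from contextlib import contextmanager

# Stand-in for the Honu logger (hypothetical): slow calls are appended here.
diagnostics = []


@contextmanager
def timed_call(service, threshold_ms, context, clock=time.monotonic, sink=diagnostics):
    """Time the wrapped block and record diagnostics only if it exceeds the threshold."""
    start = clock()
    try:
        yield
    finally:
        latency_ms = (clock() - start) * 1000.0
        if latency_ms > threshold_ms:
            # The context carries the fields the slide lists: region, zone, instance.
            sink.append({"service": service, "latency_ms": latency_ms, **context})


# Hypothetical placement values, for illustration only.
where = {"region": "us-east-1", "zone": "us-east-1a", "instance": "i-hypothetical"}

with timed_call("subscriber-service", threshold_ms=100, context=where):
    time.sleep(0.15)  # simulate a slow external operation

print(diagnostics)
```

Fast calls leave no record, so the log volume stays proportional to the number of slow operations rather than to total traffic.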
  10. Mix Offline and Online Data
      Offline data:
      - Fire & forget
      - Scales to very large volumes
      - Cost effective
      Specific conditions:
      - Online data availability is not mandatory
      - If it exists, the data could be useful online
      - Only a subset is useful online
      - Ready to pay a little bit more
      Special collectors:
      - All data goes to Hive
      - A subset goes to a real-time system
      - Still cost effective
      Customer support:
      - Browsing history
      - Historical & non-critical actions
      Debug:
      - Push validation
      - Root cause analysis
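The "special collectors" idea (everything goes to Hive, a subset also goes to a real-time system) can be sketched as a simple routing rule. This is a minimal Python illustration under assumptions: `SpecialCollector`, the in-memory lists standing in for the two destinations, and the event `type` field are hypothetical, and the slide does not say how the subset is selected.

```python
class SpecialCollector:
    """Route every event to the offline store and a chosen subset to a real-time feed."""

    def __init__(self, realtime_types):
        # Assumption: the real-time subset is selected by event type.
        self.realtime_types = set(realtime_types)
        self.hive_batch = []      # stand-in for the Hive-bound pipeline
        self.realtime_feed = []   # stand-in for the real-time system

    def collect(self, event):
        # All data goes to Hive, regardless of type.
        self.hive_batch.append(event)
        # Only the selected subset is duplicated to the real-time system.
        if event.get("type") in self.realtime_types:
            self.realtime_feed.append(event)


collector = SpecialCollector(realtime_types={"playback_error"})
collector.collect({"type": "playback_error", "customerId": 42})
collector.collect({"type": "page_view", "customerId": 42})

print(len(collector.hive_batch), len(collector.realtime_feed))
```

Because the real-time path only ever sees a copy, losing it does not affect the offline data, which matches the slide's point that online availability is not mandatory.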
  11. Honu Realtime usages
      • Movie playback experience:
        - Video quality
        - Network issues
      • Errors summary:
        - Error tracking per service
        - Error tracking per device
      • Customer support:
        - Historical usage
        - Last activity
      • Launch reports:
        - Push validation
        - Root cause analysis
  12. Honu Realtime - Architecture
      [Diagram: Application, Collectors, M/R, Realtime System, Realtime Access; labeled "Realtime data collection pipeline"]
  13. A/B Testing
      Test: an experiment where several competing behaviors are implemented and compared.
      Cell: different experiences within a test that are being compared against each other.
      Allocation: a customer-specific assignment to a cell within a test.
      Online data (> 1 billion records):
      - Cell allocation information
      - Test config: 1 entry/test/customer
      Tracking (example):
      - 1 M customers per test
      - 8 tracking events per day
      - 100 tests = 800 M events/day
      - 3 months = 72 B events
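The tracking volumes in this example follow from multiplying the per-test figures. A short Python check, assuming that the 8 tracking events are per customer per day and that "3 months" means 90 days (neither is stated on the slide); `tracking_volume` is a hypothetical helper name:

```python
def tracking_volume(customers_per_test, events_per_customer_per_day, tests, days):
    """Return (events per day, events over the whole period) for the tracking example."""
    events_per_day = customers_per_test * events_per_customer_per_day * tests
    return events_per_day, events_per_day * days


# Figures from the slide; days=90 is an assumption for "3 months".
per_day, per_quarter = tracking_volume(
    customers_per_test=1_000_000,
    events_per_customer_per_day=8,
    tests=100,
    days=90,
)

print(f"{per_day:,} events/day, {per_quarter:,} events in 3 months")
```

Under those assumptions the result matches the slide: 800 million events per day and 72 billion events over three months.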
  14. Movie Presentation A/B Test
  15. A/B Testing - Architecture
      Online data:
      - Customer test allocation
      - Metadata about the test, e.g.:
        - Start/end date
        - UI directives
        - Logging directives
      Offline data:
      - Test tracking, e.g.:
        - Retention
        - Engagement metrics
  16. Beacon Server
      User behavior:
      - Client-side interactions
      - Search/Play/Stop/Pause Ajax calls
      Device monitoring:
      - Heartbeat
      - Status & key metrics
      [Diagram: three Beacon servers]
  17. BI Integration
      Three main technologies:
      • Teradata (data center)
      • Hive (cloud)
      • Cassandra (cloud)
  18. Hive ↔ BI
      - Dimension tables (daily export from Teradata)
      - Hourly/daily Hive summary queries
      - Hourly/daily export from Hive to BI:
        • Queries run in the cloud
        • Aggregated results go back to our BI solution
  19. Hive Reports
  20. Cassandra → BI
      • Use Cassandra backups to run analytics
      • Export SSTables to Hadoop
      • Use Pig to:
        - Parse the SSTables
        - Extract/group the required information
      • Load the result back into Teradata
  21. jboulon@gmail.com
      www.linkedin.com/in/jboulon
