Cloud Connect 2012, Big Data @ Netflix


Using Big Data at Netflix to Grow our Business
& Retain our Customers.
Offline analysis: Honu, Hadoop & Hive
Online data: Cassandra

Published in: Technology

  1. Big Data @ Netflix: Using Big Data to Grow our Business & Retain our Customers. Jerome Boulon, Lead Architect, Hadoop Big Data Infrastructure. February 15,
  2. Big Data @ Netflix
     Offline analysis:
     •  Honu: scalable log analysis system to gain business insights:
        –  Error logs (unstructured logs)
        –  Statistical logs & performance logs
        –  Etc.
     Online analysis:
     •  Cassandra for all online activities and user-facing data
        –  A/B testing (test allocation, metadata)
        –  Service-level configuration
        –  Etc.
  3. Overview
     [Diagram: data collection pipeline and data processing pipeline. Components shown: Application, Collectors, Hive, M/R.]
  4. Honu - Structured Log API
     Using Annotations:
     •  Convert Java class to Hive table dynamically
     •  Add/remove column
     •  Supported Java types:
        –  All primitives
        –  Map
        –  Object, using the toString method
     Using the Key/Value API:
     •  Produce the same result as annotations
     •  Avoid unnecessary object creation
     •  Fully dynamic
     •  Thread safe
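The class-to-table idea on this slide can be illustrated with a minimal Python sketch. It is not Honu's API (Honu is a Java library); the type mapping, `hive_type`, `create_table_ddl`, and `PlaybackEvent` are all hypothetical names chosen for illustration. It only shows how a schema might be derived from a class definition, with primitives mapped directly, maps kept as maps, and any other object stored as its string form.

```python
from dataclasses import dataclass, fields
from typing import Dict, get_origin

# Hypothetical primitive-to-Hive type mapping (an assumption, not Honu's).
HIVE_TYPES = {int: "BIGINT", float: "DOUBLE", bool: "BOOLEAN", str: "STRING"}


def hive_type(py_type) -> str:
    """Map a Python field type to a Hive column type."""
    if py_type in HIVE_TYPES:
        return HIVE_TYPES[py_type]
    if py_type is dict or get_origin(py_type) is dict:
        return "MAP<STRING,STRING>"
    # Any other object is stored as its string form (the toString analogue).
    return "STRING"


def create_table_ddl(cls) -> str:
    """Derive a CREATE TABLE statement from a dataclass definition."""
    cols = ", ".join(f"{f.name} {hive_type(f.type)}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({cols})"


@dataclass
class PlaybackEvent:  # hypothetical event class
    movieId: int
    customerId: int
    hostname: str
    tags: Dict[str, str]


ddl = create_table_ddl(PlaybackEvent)
print(ddl)
```

Adding or removing a field on the class changes the derived column list, which is the effect the "Add/remove column" bullet describes.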
  5. Honu, what you get:
     log.logEvent(myObject) → Hive table with columns: movieId, customerId, timestamp, hostname
     Select customerId, count(1) from MyTable group by customerId;
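To make the query on this slide concrete, here is a small Python sketch of what the Hive `group by` computes: one count of logged events per `customerId`. The rows are invented sample data, and `events_per_customer` is a hypothetical helper, not part of Honu or Hive.

```python
from collections import Counter

# Invented sample rows with the four columns shown on the slide.
rows = [
    {"movieId": 10, "customerId": 1, "timestamp": 1000, "hostname": "host-a"},
    {"movieId": 11, "customerId": 1, "timestamp": 1005, "hostname": "host-a"},
    {"movieId": 10, "customerId": 2, "timestamp": 1010, "hostname": "host-b"},
]


def events_per_customer(table):
    """Count rows per customerId, like count(1) ... group by customerId."""
    return Counter(row["customerId"] for row in table)


counts = events_per_customer(rows)
for customer_id, n in sorted(counts.items()):
    print(customer_id, n)
```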
  6. December 2009
     –  POC for streaming analysis
     –  Single AWS zone
     –  1 application
     –  60 million events/day
     –  50 clients
     –  Small Hadoop cluster
     –  1 Map/Reduce
     –  1 table
     [Diagram components: Application, Collectors, M/R, Oracle]
  7. Feb 2012
     •  40+ billion events/day
     •  8+ tables with 1+ TB/day
     •  100+ smaller tables
     •  Self-serve:
        →  No DBA
        →  No pre-provisioning
        →  Fully integrated with Hive
     -  Multi-region deployments
     -  Transparent to our engineers
     -  Streaming-based solution
     -  Zero configuration
     -  7000+ clients
     -  Built-in:
        -  Fail-over
        -  Load balancing
     Netflix Hive warehouse:
        →  One central data warehouse
        →  Hourly/daily reports
        →  Data retention/expiration
  8. Traceability & Performance analysis
     •  Track service-level calls
        –  Instrument low-level HTTP client
        –  Call graph
        –  Request processing vs. perceived latency
        –  Payload marshalling/unmarshalling: duration, size, etc.
        –  Service result: status, error code, exception, etc.
  9. Diagnostic Information
     •  Collect latency information for all external operations
     •  If latency > threshold, log to Honu:
        –  AWS region & zone
        –  Instance
        –  Service details
     •  Open Jira/ticket & attach diagnostic info
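The threshold rule on this slide can be sketched as a small wrapper around any external call. This is an illustration only: `timed_call`, the `sink` list standing in for the Honu logger, and the context field names are all assumptions, not Netflix code.

```python
import time


def timed_call(operation, *, threshold_ms, context, sink, clock=time.monotonic):
    """Run operation(); if it took longer than threshold_ms, append a
    diagnostic record (context plus measured latency) to sink."""
    start = clock()
    result = operation()
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms > threshold_ms:
        # sink stands in for the structured logger (hypothetical).
        sink.append({**context, "latency_ms": elapsed_ms})
    return result


# Usage with a fake clock so the example is deterministic.
diagnostics = []
context = {"region": "us-east-1", "zone": "us-east-1a",
           "instance": "i-example", "service": "example-service"}

slow_clock = iter([0.0, 0.25]).__next__   # simulates a 250 ms call
fast_clock = iter([0.0, 0.05]).__next__   # simulates a 50 ms call

slow = timed_call(lambda: "ok", threshold_ms=100, context=context,
                  sink=diagnostics, clock=slow_clock)
fast = timed_call(lambda: "ok", threshold_ms=100, context=context,
                  sink=diagnostics, clock=fast_clock)
```

Only the slow call leaves a record, so the volume of diagnostic data stays proportional to the number of problem calls rather than to total traffic.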
  10. Mix Offline and Online Data
      Offline data:
      -  Fire & forget
      -  Scale to very large volumes
      -  Cost effective
      Specific conditions:
      -  Online data availability is not mandatory
      -  If it exists, data could be useful online
      -  Only a subset useful online
      -  Ready to pay a little bit more
      Special collectors:
      -  All data goes to Hive
      -  A subset goes to a real-time system
      -  Still cost effective
      Customer support:
      -  Browsing history
      -  Historical & non-critical actions
      Debug:
      -  Push validation
      -  Root cause analysis
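The "special collectors" behaviour described on this slide, where every event goes to Hive and only a subset also goes to a real-time system, could look roughly like the following sketch. The `route` function, the event `type` field, and the two lists standing in for the Hive and real-time sinks are hypothetical.

```python
def route(event, offline_sink, realtime_sink, realtime_types):
    """Send every event offline; additionally send selected types online."""
    offline_sink.append(event)            # all data goes to the offline store
    if event["type"] in realtime_types:   # only a subset is useful online
        realtime_sink.append(event)


offline, realtime = [], []
realtime_types = {"playback_error", "browse"}   # assumed subset

for event in [
    {"type": "browse", "customerId": 1},
    {"type": "perf_stat", "customerId": 1},
    {"type": "playback_error", "customerId": 2},
]:
    route(event, offline, realtime, realtime_types)
```

Because the offline path never depends on the real-time path, a real-time outage would not block collection, which matches the "online data availability is not mandatory" condition.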
  11. Honu Realtime usages
      •  Movie playback experience
         –  Video quality
         –  Network issue
      •  Errors summary
         –  Error tracking per service
         –  Error tracking per device
      •  Customer support
         –  Historical usage
         –  Last activity
      •  Launch reports
         –  Push validation
         –  Root cause analysis
  12. Honu Realtime - Architecture
      [Diagram: realtime data collection pipeline. Components shown: Application, Collectors, Realtime Access, Realtime System, M/R.]
  13. A/B Testing
      Test: an experiment where several competing behaviors are implemented and compared.
      Cell: different experiences within a test that are being compared against each other.
      Allocation: a customer-specific assignment to a cell within a test.
      Online data:
      -  Cell allocation information
      -  Test config: 1 entry/test/customer
      -  > 1 billion records
      Tracking (example):
      -  1 M customers per test
      -  8 tracking events per day
      -  100 tests = 800 M events/day
      -  3 months = 72 B events
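The tracking figures on this slide follow from simple multiplication. The short check below reproduces them, assuming that the 8 tracking events are per customer per day and that "3 months" is approximated as 90 days (both are readings of the slide, not stated explicitly).

```python
customers_per_test = 1_000_000
events_per_customer_per_day = 8   # assumption: per customer, per test
tests = 100
days = 90                         # assumption: 3 months taken as 90 days

events_per_day = customers_per_test * events_per_customer_per_day * tests
events_total = events_per_day * days

print(f"{events_per_day:,} events/day")
print(f"{events_total:,} events in {days} days")
```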
  14. Movie Presentation A/B Test
  15. A/B Testing - Architecture
      Online data:
      -  Customer test allocation
      -  Metadata about the test. Ex:
         -  Start/end date
         -  UI directives
         -  Logging directives
      Offline data:
      -  Test tracking. Ex:
         -  Retention
         -  Engagement metrics
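As an illustration of the online side, the sketch below assigns a customer to a cell once and then serves the stored assignment on later lookups. The slides do not say how cells are chosen, so the hash-based choice in `allocate_cell` is an assumption, and the plain dictionary stands in for the online store (Cassandra in the talk).

```python
import hashlib


def allocate_cell(customer_id: int, test_id: str, num_cells: int) -> int:
    """Pick a cell deterministically from a hash of test and customer
    (assumed scheme, for illustration only)."""
    digest = hashlib.sha256(f"{test_id}:{customer_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_cells


def get_or_allocate(store: dict, customer_id: int, test_id: str,
                    num_cells: int) -> int:
    """Return the stored allocation, creating one entry per test/customer."""
    key = (test_id, customer_id)
    if key not in store:
        store[key] = allocate_cell(customer_id, test_id, num_cells)
    return store[key]


store = {}
first = get_or_allocate(store, 42, "movie-presentation", 3)
second = get_or_allocate(store, 42, "movie-presentation", 3)
```

Storing the allocation, rather than recomputing it, keeps a customer in the same cell even if the number of cells in the test changes later.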
  16. Beacon Server
      User behavior:
      -  Client-side interactions
      -  Search/Play/Stop/Pause Ajax calls
      Device monitoring:
      -  Heartbeat
      -  Status & key metrics
      [Diagram: three Beacon servers]
  17. BI Integration
      Three main technologies:
      •  Teradata (data center)
      •  Hive (cloud)
      •  Cassandra (cloud)
  18. Hive ↔ BI
      –  Dimension tables (daily export from Teradata)
      –  Hourly/daily Hive summary queries
      –  Hourly/daily export from Hive to BI
         •  Queries run in the cloud
         •  Aggregated results go back to our BI solution
  19. Hive Reports
  20. Cassandra → BI
      •  Use Cassandra backups to run analytics
      •  Export SSTable to Hadoop
      •  Pig to:
         –  Parse SSTable
         –  Extract/group required information
      •  Load the result back to Teradata