Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)

Overview of the data platform as a service architecture at Netflix. We examine the tools and services built around the Netflix Hadoop platform that are designed to make access to big data at Netflix easy, efficient, and self-service for our users.

From the perspective of a user of the platform, we walk through how various services in the architecture can be used to build a recommendation engine. Sting, a tool for fast in-memory aggregation and data visualization, and Lipstick, our workflow visualization and monitoring tool for Apache Pig, are discussed in depth. Lipstick is now part of Netflix OSS; clone it on GitHub, or learn more from our tech blog post: http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html.
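
As a rough illustration of the kind of in-memory slicing and aggregation described for Sting, the sketch below caches a small result set as a list of rows and answers slice and group-by queries from memory. It is not Sting's API: the function names, the row layout, and the sample values are all assumptions made up for this example; only the show titles come from the slides.

```python
from collections import defaultdict

# Hypothetical cached result set, standing in for the output of a query job.
# The titles appear in the slides; hours and percentages are invented samples.
CACHED_ROWS = [
    {"title": "House of Cards", "hour": 0, "pct_consumed": 10.0},
    {"title": "House of Cards", "hour": 1, "pct_consumed": 15.0},
    {"title": "Arrested Development", "hour": 0, "pct_consumed": 20.0},
    {"title": "Arrested Development", "hour": 1, "pct_consumed": 5.0},
    {"title": "Hemlock Grove", "hour": 0, "pct_consumed": 30.0},
]


def slice_rows(rows, **filters):
    """Slice: keep only rows whose fields equal every given filter value."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def aggregate(rows, group_by, measure):
    """Aggregate: sum `measure` for each distinct value of `group_by`."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


if __name__ == "__main__":
    # Total per title across all hours, answered without re-running a job.
    print(aggregate(CACHED_ROWS, "title", "pct_consumed"))
    # Slice to a single hour, then aggregate the slice.
    print(aggregate(slice_rows(CACHED_ROWS, hour=1), "title", "pct_consumed"))
```

Because the rows stay in memory, each new slice or grouping is a pass over the cached list rather than a new cluster job, which is the general idea behind fast interactive exploration of a cached result.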

Published in: Technology, Business

  1. Watching Pigs Fly with the Netflix Hadoop Toolkit Hadoop Summit 2013 San Jose, CA
  2. Data should be accessible, easy to discover, and easy to process for everyone. Our Motivation
  3. Our Users Analysts Engineers
  4. Hadoop Platform as a Service
  5. Hadoop Platform as a Service S3
  6. Hadoop Platform as a Service Data Platform
  7. Data Platform as a Service Franklin (Metadata API) Sting (Adhoc Visualization) Forklift (Data Movement) Looper (Backloading) Ignite (A/B Test Analytics) Spock (Data Auditing) Genie (Hadoop PaaS) Lipstick (Pig Workflow Visualization) Event Service (Orchestration) Hadoop S3 Other Processing
  8. Let’s solve a problem using the data!
  9. Build a recommender.
  10. But, what makes good recommendations? Similarity Personalization
  11. COLORS!
  12. COLORS! Box art is colorful…
  13. We’re Sorry COLORS! Box art is colorful…
  14. Where can I find the data?
  15. Hadoop Platform as a Service S3
  16. Hadoop Platform as a Service S3, Cassandra, Teradata, Redshift, RDS
  17. Data Platform as a Service Franklin (Metadata API) S3, Cassandra, Teradata, Redshift, RDS
  18. Data Platform as a Service Franklin (Metadata API)
  19. Create a dataset for box art and color.
  20. Whether your dataset is large or small, being able to visualize it makes it easier to explain.
  21. Data Platform as a Service Franklin (Metadata API) Sting (Adhoc Visualization)
  22. Sting • Allows users to cache the results of a Genie job in memory • Sub-second response to OLAP-style operations (slicing, dicing, aggregations) • Adhoc / recurring schedule • Easy to use!
  23. Hive Query Schema
  24. % Content Consumed / Hour
  25. Hemlock Grove House of Cards Arrested Development
  26. Similarity
  27. House of Cards Macbeth
  28. Toddlers & Tiaras Star Trek: Voyager
  29. Personalization
  30. # of subscribers X # of titles = ???,000,…,000 (big data) Big Data
  31. Netflix Apache Pig
  32. Data Platform as a Service Franklin (Metadata API) Sting (Adhoc Visualization)
  33. Lipstick • Allows users to visualize their data flow • Allows users to see common errors • Allows users to easily monitor their jobs • Empowers users to support themselves • Facilitates communication between infrastructure team and users
  34. Lipstick
  35. Overall Job Progress
  36. Logical Plan Overall Job Progress
  37. Logical Operator (reduce side) Logical Operator (map side) Map/Reduce Job Intermediate Row Count Records Loaded
  38. Hadoop Counters
  39. My Job has stalled. Common Problem #1
  40. Unoptimized/Optimized Logical Plan Toggle Dangling Operator
  41. I didn’t get the data I was expecting Common Problem #2
  42. I don’t understand why my job failed. Common Problem #3
  43. Failed Job (light red background) Successful Job (light blue background)
  44. Wrapping up • Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie). • Lipstick is part of Netflix OSS. • Clone it on GitHub at http://github.com/Netflix/Lipstick • We welcome feedback and contributions!
  45. • Charles Smith: charsmith@netflix.com • Jeff Magnusson: jmagnusson@netflix.com Thank you! Jobs: http://jobs.netflix.com Netflix OSS: http://netflix.github.io Tech Blog: http://techblog.netflix.com/
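
Slide 40 points out a dangling operator in a logical plan. The slides do not define the term, so the sketch below assumes one plausible reading: an operator whose output never reaches a STORE. It models a plan as a plain dictionary and walks backwards from the STORE operators to find everything that contributes to stored output. The plan layout, the aliases, and the function name are hypothetical and are not Lipstick's or Pig's API.

```python
from collections import deque

# Hypothetical logical plan: alias -> (operator type, list of input aliases).
# Aliases and structure are invented for illustration only.
PLAN = {
    "titles": ("LOAD", []),
    "colors": ("LOAD", []),
    "joined": ("JOIN", ["titles", "colors"]),
    "grouped": ("GROUP", ["joined"]),
    "filtered": ("FILTER", ["titles"]),  # computed, but never stored
    "out": ("STORE", ["grouped"]),
}


def dangling_operators(plan):
    """Return aliases from which no STORE operator can be reached.

    Assumed definition of "dangling": the operator's output does not feed,
    directly or indirectly, into any STORE.
    """
    useful = set()
    # Start from every STORE and follow input edges backwards.
    queue = deque(alias for alias, (op, _) in plan.items() if op == "STORE")
    while queue:
        alias = queue.popleft()
        if alias in useful:
            continue
        useful.add(alias)
        queue.extend(plan[alias][1])
    return sorted(set(plan) - useful)


if __name__ == "__main__":
    print(dangling_operators(PLAN))
```

In this toy plan only `filtered` is reported, since every other operator lies on a path to `out`. A visual tool can surface the same information by highlighting such nodes in the plan graph.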
