Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)

Overview of the data platform as a service architecture at Netflix. We examine the tools and services built around the Netflix Hadoop platform that are designed to make access to big data at Netflix easy, efficient, and self-service for our users.

From the perspective of a platform user, we walk through how various services in the architecture can be used to build a recommendation engine. Sting, a tool for fast in-memory aggregation and data visualization, and Lipstick, our workflow visualization and monitoring tool for Apache Pig, are discussed in depth. Lipstick is now part of Netflix OSS; clone it on GitHub, or learn more from our tech blog post at http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html
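
To make the idea of fast in-memory aggregation more concrete, here is a minimal sketch in Python. It is not Sting's implementation or API, which the slides do not show. It only illustrates the general pattern of holding a job's result rows in memory and answering slice and roll-up requests against them. The row fields (`title`, `hour`, `views`), the function names, and all values in `ROWS` are invented for illustration.

```python
from collections import defaultdict

# Toy stand-in for cached job output. Every value here is invented.
ROWS = [
    {"title": "House of Cards", "hour": 1, "views": 120},
    {"title": "House of Cards", "hour": 2, "views": 80},
    {"title": "Hemlock Grove", "hour": 1, "views": 50},
    {"title": "Hemlock Grove", "hour": 2, "views": 70},
    {"title": "Arrested Development", "hour": 1, "views": 90},
]


def slice_rows(rows, **filters):
    """Keep only the rows whose fields equal every given filter value."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


def aggregate(rows, group_by, measure):
    """Sum `measure` for each distinct value of the `group_by` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Roll up over the whole cached result, then over a single-hour slice.
    print(aggregate(ROWS, "title", "views"))
    print(aggregate(slice_rows(ROWS, hour=1), "title", "views"))
```

Because the rows already sit in memory, each request is a single pass over a small list rather than a new cluster job, which is the property the description attributes to Sting.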

Published in: Technology, Business

  1. Watching Pigs Fly with the Netflix Hadoop Toolkit. Hadoop Summit 2013, San Jose, CA
  2. Our Motivation: Data should be accessible, easy to discover, and easy to process for everyone.
  3. Our Users: Analysts, Engineers
  4. Hadoop Platform as a Service
  5. Hadoop Platform as a Service: S3
  6. Hadoop Platform as a Service; Data Platform
  7. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization), Forklift (Data Movement), Looper (Backloading), Ignite (A/B Test Analytics), Spock (Data Auditing), Genie (Hadoop PaaS), Lipstick (Pig Workflow Visualization), Event Service (Orchestration), Hadoop, S3, Other Processing
  8. Let’s solve a problem using the data!
  9. Build a recommender.
  10. But, what makes good recommendations? Similarity, Personalization
  11. COLORS!
  12. COLORS! Box art is colorful…
  13. We’re Sorry. COLORS! Box art is colorful…
  14. Where can I find the data?
  15. Hadoop Platform as a Service: S3
  16. Hadoop Platform as a Service: S3, Cassandra, Teradata, Redshift, RDS
  17. Data Platform as a Service: Franklin (Metadata API); S3, Cassandra, Teradata, Redshift, RDS
  18. Data Platform as a Service: Franklin (Metadata API)
  19. Create a dataset for box art and color.
  20. Whether your dataset is large or small, being able to visualize it makes it easier to explain.
  21. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
  22. Sting
      • Allows users to cache the results of a Genie job in memory
      • Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
      • Adhoc / recurring schedule
      • Easy to use!
  23. Hive Query, Schema
  24. % Content Consumed / Hour
  25. Hemlock Grove, House of Cards, Arrested Development
  26. Similarity
  27. House of Cards, Macbeth
  28. Toddlers & Tiaras, Star Trek: Voyager
  29. Personalization
  30. Big Data: # of subscribers X # of titles = ???,000,…,000 (big data)
  31. Netflix Apache Pig
  32. Data Platform as a Service: Franklin (Metadata API), Sting (Adhoc Visualization)
  33. Lipstick
      • Allows users to visualize their data flow
      • Allows users to see common errors
      • Allows users to easily monitor their jobs
      • Empowers users to support themselves
      • Facilitates communication between infrastructure team and users
  34. Lipstick
  35. Overall Job Progress
  36. Logical Plan; Overall Job Progress
  37. Logical Operator (reduce side), Logical Operator (map side), Map/Reduce Job, Intermediate Row Count, Records Loaded
  38. Hadoop Counters
  39. Common Problem #1: My job has stalled.
  40. Unoptimized/Optimized Logical Plan Toggle; Dangling Operator
  41. Common Problem #2: I didn’t get the data I was expecting.
  42. Common Problem #3: I don’t understand why my job failed.
  43. Failed Job (light red background); Successful Job (light blue background)
  44. Wrapping up
      • Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).
      • Lipstick is part of Netflix OSS.
      • Clone it on GitHub at http://github.com/Netflix/Lipstick
      • We welcome feedback and contributions!
  45. Thank you!
      • Charles Smith: charsmith@netflix.com
      • Jeff Magnusson: jmagnusson@netflix.com
      • Jobs: http://jobs.netflix.com
      • Netflix OSS: http://netflix.github.io
      • Tech Blog: http://techblog.netflix.com/
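
The recommender walkthrough in slides 19 and 26 to 30 can be sketched in a few lines of Python. The slides do not say how similarity is computed, so cosine similarity is swapped in here as one common choice. It is not presented as the method used in the talk. The title names, the color vectors, and the function names are all hypothetical. The subscriber and title counts stay as parameters because the slide elides the real figures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Invented (red, green, blue) proportions for three hypothetical titles.
BOX_ART_COLORS = {
    "title_a": (0.7, 0.2, 0.1),
    "title_b": (0.6, 0.3, 0.1),
    "title_c": (0.1, 0.2, 0.7),
}


def most_similar(title, colors):
    """Return the other title whose color vector is closest to `title`'s."""
    target = colors[title]
    scored = (
        (other, cosine_similarity(target, vec))
        for other, vec in colors.items()
        if other != title
    )
    return max(scored, key=lambda pair: pair[1])[0]


def candidate_pairs(num_subscribers, num_titles):
    """Number of (subscriber, title) pairs a personalized model must score."""
    return num_subscribers * num_titles


if __name__ == "__main__":
    print(most_similar("title_a", BOX_ART_COLORS))
```

The `candidate_pairs` product grows with both inputs at once, which is why personalization in slide 30 becomes a big data problem even when each individual similarity computation is cheap.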
