
Genie - Hadoop Platform as a Service at Netflix

In a prior tech blog post (http://nflx.it/XoySYR), we discussed the architecture of our petabyte-scale data warehouse in the cloud. Salient features of that architecture include Amazon’s Simple Storage Service (S3) as our "source of truth", multiple dynamically resizable Hadoop clusters that use the elasticity of the cloud to support various workloads, and Genie, our horizontally scalable Hadoop Platform as a Service.

We are pleased to announce that Genie is now open source (http://nflx.it/15rd6pJ), and available to the public from the Netflix OSS GitHub site (https://github.com/Netflix/genie).

Published in: Technology, Business

  1. Genie - Hadoop Platform as a Service at Netflix. Sriram Krishnan, Hadoop Summit, June 26, 2013
  2. Netflix does Hadoop
  3. Netflix does Hadoop at scale
  4. Netflix does Hadoop at scale*
  5. Netflix does Hadoop at scale in the cloud
  6. S3 as the Cloud Data Warehouse (diagram label: Cloud Data Warehouse)
  7. Multiple Hadoop Clusters (diagram labels: Cloud Data Warehouse; Hadoop (EMR) Clusters)
  8. Data Platform as a Service (diagram labels: Cloud Data Warehouse; Hadoop (EMR) Clusters; Hadoop Platform as a Service; Job Execution; Resource Configuration & Management; Metadata Service (Franklin))
  9. Large Ecosystem of Clients & Tools (diagram labels: Cloud Data Warehouse; Hadoop (EMR) Clusters; Hadoop Platform as a Service; Job Execution; Resource Configuration & Management; Metadata Service (Franklin))
  10. Why Genie?
      • Simple API for job submission and management
      • Accessible from the data center and the cloud
      • Abstraction of physical details of back-end Hadoop clusters
  11. What Genie is Not
      • A workflow scheduler, such as Oozie
      • A task scheduler, such as fair share or capacity schedulers
      • An end-to-end resource management tool
  12. Genie: Job Execution
      • API to run Hadoop, Hive and Pig jobs
      • Auto-magic submission of jobs to the right Hadoop cluster
      • Abstracting away cluster details from clients
  13. Genie: Resource Configuration
      • API for management of cluster metadata
      • Status: up, out of service, or terminated
      • Site-specific Hadoop, Hive and Pig configurations
      • Cluster naming/tagging for job submissions
  14. Architecture diagram, built on Netflix OSS (http://netflix.github.com)
      • Components: Eureka Service; Client (Eureka Client, Ribbon Client); Python API (Eureka Client)
      • Actors: End-users; Admins
      • Arrows: Registers service; Discovers service; Invokes (submits job); Launches cluster(s); Launches job; Registers cluster
      • Stack labels: Karyon, Archaius, Eureka Client, Ribbon, Servo, Hadoop, Hive, Pig
  15. Genie: Job Execution (REST call)
      • Job Type: {hadoop, hive, pig}
      • File dependencies (script, UDFs, etc.)
      • Command-line arguments
      • Schedule: {adhoc, sla}
      • Configuration: {prod, test, unittest}
  16. Genie: Job Execution
      • Response: job ID*
      • * Used to query status, get outputs, kill job
  17. Genie Job Details
      • Job ID
      • Script to execute
      • Standard output and error
      • Pig logs
      • Job conf directory
  18. Genie - Use Cases Enabled at Netflix
      • Running nightly short-lived “bonus” clusters to augment ETL processing
      • Re-routing traffic between clusters
      • “Red/black” pushes for clusters
      • Attaching stand-alone gateways to clusters
      • Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
  19. Nightly Short-lived Bonus Clusters (diagram: Execution Service, Configuration Service)
      • Prod SLA Cluster: Schedule: sla; Configurations: prod
  20. Nightly Short-lived Bonus Clusters (diagram: Execution Service, Configuration Service)
      • Bonus Cluster: Schedule: bonus; Configurations: prod
      • Prod SLA Cluster: Schedule: sla; Configurations: prod
      • {Schedule=bonus, Configuration=prod}
  21. Nightly Short-lived Bonus Clusters (diagram: Execution Service, Configuration Service)
      • Bonus Cluster: Schedule: bonus; Configurations: prod; Status: OUT_OF_SERVICE
      • Prod SLA Cluster: Schedule: sla; Configurations: prod
      • {Schedule=sla, Configuration=prod}
  22. Nightly Short-lived Bonus Clusters (diagram: Execution Service, Configuration Service)
      • Bonus Cluster: Schedule: bonus; Configurations: prod; Status: TERMINATED
      • Prod SLA Cluster: Schedule: sla; Configurations: prod
      • {Schedule=sla, Configuration=prod}
  23. Rerouting Traffic Between Clusters (diagram: Execution Service, Configuration Service)
      • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test
      • Prod SLA Cluster: Schedule: sla; Configurations: prod
      • {Schedule=sla, Configuration=prod}
  24. Rerouting Traffic Between Clusters (diagram: Execution Service, Configuration Service)
      • Ad-hoc Cluster: Schedule: adhoc, sla; Configurations: prod, test
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE
      • {Schedule=sla, Configuration=prod}
  25. Rerouting Traffic Between Clusters (diagram: Execution Service, Configuration Service)
      • Ad-hoc Cluster: Schedule: adhoc; Configurations: prod, test
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
      • {Schedule=sla, Configuration=prod}
  26. “Red/Black” Pushes for Clusters (diagram: Execution Service, Configuration Service)
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
      • {Schedule=sla, Configuration=prod}
  27. “Red/Black” Pushes for Clusters (diagram: Execution Service, Configuration Service)
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: OUT_OF_SERVICE
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
      • {Schedule=sla, Configuration=prod}
  28. “Red/Black” Pushes for Clusters (diagram: Execution Service, Configuration Service)
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: TERMINATED
      • Prod SLA Cluster: Schedule: sla; Configurations: prod; Status: UP
      • {Schedule=sla, Configuration=prod}
  29. Genie Usage at Netflix
      • Usage statistics brought to you by “Sherlock”
      • Pig job to gather Hadoop job statistics
      • Tableau-based visualization
  30. Cloud Deployment
      • Asgard is also part of Netflix OSS
      • https://github.com/Netflix/asgard
  31. Auto Scaling in the Cloud
  32. Genie is now part of Netflix OSS!
      • http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
      • Clone it on GitHub at: https://github.com/Netflix/genie
      • Still “version 0” - work in progress!
      • All contributions and feedback welcome!
      • Come talk to us and check out live demos at the Netflix Booth
  33. Watching Pigs Fly with the Netflix Hadoop Toolkit
  34. Sriram Krishnan. We’re hiring! Thank you!
      • Home: http://www.netflix.com
      • Jobs: http://jobs.netflix.com
      • Tech Blog: http://techblog.netflix.com/
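
Slides 15 and 16 list what a job submission carries (job type, file dependencies, command-line arguments, schedule, configuration) and say the response is a job ID. The sketch below shows one way such a request body might be assembled and validated on the client side. It is only an illustration: the class name `JobRequest`, the JSON field names, and the example S3 path are assumptions made here, not the actual Genie REST schema, and no network call is made.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Allowed values copied from slide 15. Other slides show further schedule
# tags (for example "bonus"), so a real deployment may accept more.
JOB_TYPES = {"hadoop", "hive", "pig"}
SCHEDULES = {"adhoc", "sla"}
CONFIGURATIONS = {"prod", "test", "unittest"}


@dataclass
class JobRequest:
    """Hypothetical client-side model of a job submission (not Genie's schema)."""

    job_type: str
    file_dependencies: List[str] = field(default_factory=list)
    cmd_args: str = ""
    schedule: str = "adhoc"
    configuration: str = "test"

    def __post_init__(self) -> None:
        # Reject values outside the sets listed on the slide.
        for label, value, allowed in (
            ("job_type", self.job_type, JOB_TYPES),
            ("schedule", self.schedule, SCHEDULES),
            ("configuration", self.configuration, CONFIGURATIONS),
        ):
            if value not in allowed:
                raise ValueError(
                    f"{label} must be one of {sorted(allowed)}, got {value!r}"
                )

    def to_json(self) -> str:
        # Serialize to the JSON text a REST client would send as the body.
        return json.dumps(asdict(self), sort_keys=True)


# Example: a production SLA Hive job; the S3 path is a placeholder.
request = JobRequest(
    job_type="hive",
    file_dependencies=["s3://example-bucket/scripts/query.q"],
    cmd_args="-f query.q",
    schedule="sla",
    configuration="prod",
)
body = request.to_json()
print(body)
```

Note that the request names only a schedule and a configuration, never a cluster. That matches the "abstracting away cluster details from clients" point on slide 12.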

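Slides 19 through 28 all rely on the same idea: a job is tagged with a schedule and a configuration, and it is routed to a cluster whose metadata matches those tags and whose status is UP. The sketch below models that matching rule and replays the rerouting scenario from slides 23 and 24. It is a simplified assumption of how the lookup might work, not Genie's implementation: the `Cluster` class, the `select_cluster` function, and the "first match wins" policy are all invented for illustration.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical cluster metadata record, mirroring the slide labels."""

    name: str
    schedules: FrozenSet[str]
    configurations: FrozenSet[str]
    status: str = "UP"  # UP, OUT_OF_SERVICE, or TERMINATED


def select_cluster(
    clusters: Iterable[Cluster], schedule: str, configuration: str
) -> Optional[Cluster]:
    """Return the first UP cluster carrying both tags, or None if none match."""
    for cluster in clusters:
        if (
            cluster.status == "UP"
            and schedule in cluster.schedules
            and configuration in cluster.configurations
        ):
            return cluster
    return None


# Slide 23: SLA jobs match the production SLA cluster.
prod_sla = Cluster("prod-sla", frozenset({"sla"}), frozenset({"prod"}))
adhoc = Cluster("adhoc", frozenset({"adhoc"}), frozenset({"prod", "test"}))
print(select_cluster([prod_sla, adhoc], "sla", "prod").name)

# Slide 24: the SLA cluster is marked OUT_OF_SERVICE and the ad-hoc cluster
# is additionally tagged "sla", so the same request now lands on it.
prod_sla_down = replace(prod_sla, status="OUT_OF_SERVICE")
adhoc_plus_sla = replace(adhoc, schedules=frozenset({"adhoc", "sla"}))
print(select_cluster([prod_sla_down, adhoc_plus_sla], "sla", "prod").name)
```

Under this model, the bonus-cluster and red/black scenarios need no client changes either: operators only edit cluster tags and status, and the same {Schedule, Configuration} request resolves to a different cluster.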