Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack

Published in: Health & Medicine, Technology

  1. Savanna: Hadoop on OpenStack. Ilya Elterman (Mirantis), Matthew Farrellee (Red Hat), Sergey Lukjanov (Mirantis)
  2. Agenda ● Savanna Overview ● Current state ○ EDP overview ○ other features ● Roadmap ● Live Demo
  3. Agenda ● Savanna Overview ● Current state ○ EDP overview ○ other features ● Roadmap ● Live Demo
  4. OpenStack Data Processing: Savanna. Mission: to provide the OpenStack community with an open, cutting-edge, performant and scalable data processing stack and associated management interfaces ● Provision and operate Hadoop clusters ● Schedule and operate Hadoop jobs
  5. Hadoop - Big Data Platform
  6. Popularity: Hadoop and OpenStack (Google Trends) http://www.google.com/trends/explore?q=hadoop+openstack#q=openstack%2C%20hadoop&cmpt=q
  7. Use Cases ● Self-service provisioning of Hadoop clusters ● Utilization of unused compute capacity for bursty workloads ● Run Hadoop workloads in a few clicks without expertise in Hadoop ops
  8. Architecture Overview (diagram). Components shown: ● Front ends: Horizon (Savanna Pages), Savanna Python Client, REST API ● Savanna internals: Auth, Cluster Configuration Manager, Vendors Plugins, Job Manager, Job Sources, Data Sources, Data Access Layer, DB, Resources Orchestration Manager ● OpenStack services: Keystone, Nova, Glance, Heat, Cinder, Neutron, Swift, Trove ● Provisioned Hadoop VMs
  9. Savanna Status ● Officially incubated OpenStack project ● v0.3 released 17 Oct 2013 ● Supported Hadoop distros: ○ Vanilla Apache Hadoop (reference implementation) ○ Hortonworks Data Platform 1.3.x ○ Intel Distribution (in review) ○ Cloudera Distribution (in blueprint) ● Included in OpenStack distros: ○ RDO - http://openstack.redhat.com ○ Mirantis OpenStack - http://software.mirantis.com
  10. Cluster Provisioning Performance
  11. Agenda ● Savanna Overview ● Current state ○ EDP overview ○ other features ● Roadmap ● Live Demo
  12. EDP Overview ● End users have data and questions ○ The data lives in a data repository ○ The questions are embodied in code ● Savanna Elastic Data Processing (EDP) brings the Hadoop ecosystem to the end user ○ Hides all cluster management behind the scenes
  13. EDP “Customers launch millions of Amazon EMR clusters every year.” http://aws.amazon.com/elasticmapreduce/
  14. EDP ● The variety and depth of value-added offerings on top of clouds are growing ● Offerings are rarely open and rarely allow for choice ● Examples: Google Cloud, Azure, AWS
  15. EDP Savanna and EDP can match and exceed the use cases covered by most public clouds
  16. EDP in Savanna v0.3 ● UI, integrated into Horizon, for ad-hoc analytics queries based on Hive or Pig ● API to execute MapReduce jobs without exposing details of underlying infrastructure ● Pluggable data sources: Swift ● Supported job types: Jar, Pig, Hive ● Integration with Oozie for workflow management
  17. Agenda ● Savanna Overview ● Current state ○ EDP overview ○ other features ● Roadmap ● Live Demo
  18. Cluster Ops in Savanna 0.3 ● REST API ● Configuration templates ● Manual cluster scaling ● Data node anti-affinity and location control ● Full support of data locality: rack and 4-level awareness for HDFS and Swift ● Swift integration
  19. OpenStack Integration in Savanna 0.3 ● OpenStack Dashboard plugin ● Both Neutron and Nova Network support ● Keystone trusts used for async operations ● Python client
  20. Agenda ● Savanna Overview ● Current state ○ EDP overview ○ other features ● Roadmap ● Live Demo
  21. Live Demo
  22. Icehouse Roadmap ● Integration with OpenStack ecosystem ○ Heat ○ Tempest ○ DevStack ○ Ceilometer ○ Ironic ● EDP enhancements ● Code hardening ● Polished API v2 ● Performance testing
  23. Design Summit Sessions Friday, November 8 ● 1:30pm Network and installation topologies ● 2:20pm Heat integration and scalability ● 3:10pm Further OpenStack integration ● 4:10pm Savanna in Icehouse http://goo.gl/2iEv8u
  24. Q&A
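
The "Configuration templates" and "Manual cluster scaling" items on the Cluster Ops slide can be illustrated with a small sketch. This is not Savanna's API or data model: the class names (NodeGroup, ClusterTemplate), the helper functions, the flavor names, and the process names are all hypothetical and chosen only for illustration. The sketch treats a template as a reusable description of node groups and treats scaling as producing a new description with a different instance count for one group.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical node group: a set of identical Hadoop VMs."""
    name: str
    flavor: str                  # assumed VM flavor name
    processes: Tuple[str, ...]   # assumed Hadoop process names
    count: int


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster template: a reusable list of node groups."""
    name: str
    plugin: str                  # e.g. "vanilla" for Vanilla Apache Hadoop
    node_groups: Tuple[NodeGroup, ...]


def total_instances(template: ClusterTemplate) -> int:
    """Number of VMs the template would provision."""
    return sum(group.count for group in template.node_groups)


def scale(template: ClusterTemplate, group_name: str, new_count: int) -> ClusterTemplate:
    """Return a copy of the template with one node group resized."""
    if new_count < 1:
        raise ValueError("a node group needs at least one instance")
    if group_name not in {group.name for group in template.node_groups}:
        raise KeyError(group_name)
    groups = tuple(
        replace(group, count=new_count) if group.name == group_name else group
        for group in template.node_groups
    )
    return replace(template, node_groups=groups)


# Example: one master plus three workers, then scale the workers to five.
template = ClusterTemplate(
    name="demo",
    plugin="vanilla",
    node_groups=(
        NodeGroup("master", "m1.medium", ("namenode", "jobtracker"), 1),
        NodeGroup("workers", "m1.large", ("datanode", "tasktracker"), 3),
    ),
)
scaled = scale(template, "workers", 5)
print(total_instances(template), total_instances(scaled))  # prints "4 6"
```

The objects are immutable, so the original template stays reusable after scaling; that mirrors the idea of a template as a shared starting point, but it is a design choice of this sketch rather than something the slides state.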
