One-click Hadoop Cluster Deployment on OpenPOWER Systems
Pradeep K Surisetty
IBM
#OpenPOWERSummit
© 2014 IBM Corporation
#Whoami
• Systems & Infrastructure Engineer
• 9+ years of Linux and virtualization
• Believe in Open Source Everything
• Virtualization Test Lead/Solution Engineer
• pradeepkumars@in.ibm.com
This is a team effort:
Core Team:
• Pradipta Kumar, Pradeep K Surisetty, Ashish Kumar, Yogananth Subramaniyan, Poornima Nayak, Sudeesh John
Acknowledgements:
• Dipankar Sarma, Vaidyanathan Srinivasan, Tarundeep S Kalra, Anbazhagan Mani, Ashish Billore, Akash Gunjal
Join the conversation at #OpenPOWERSummit
Elastic Hadoop on OpenPOWER Systems
Goal
● Make deployment and operation of Hadoop clusters simple on OpenPOWER systems
● Managed by OpenStack
● Run Hadoop performance benchmarks on this cluster
Key characteristics
● Open source Hadoop
● OpenStack native
● Example: OpenStack Sahara-based elastic Hadoop on OpenPOWER servers
● Benchmark results
Intro on OpenStack and Sahara
OpenStack core components
Compute - Nova
Networking - Neutron
Object Storage - Swift
Block Storage - Cinder
Dashboard - Horizon
Identity Service - Keystone
Image Service - Glance
OpenStack Sahara Project
The Sahara project is an initiative to provision Hadoop on top of OpenStack (started by Mirantis, Hortonworks and Red Hat).
High Level Architecture Overview
[Diagram: two PowerKVM compute nodes, each running VM 1 and VM 2; HDFS on local disk; OpenStack controller node]
Compute + Storage Node
● Nova-compute
● Cinder-volume
Controller Node
● Nova-api
● Cinder-scheduler
● Cinder-api
● Glance
● Neutron
● Horizon
● Sahara
Test Environment Details
Hypervisor
  Version: PowerKVM-2.1.1
  Kernel: 3.10.42-2015.1.pkvm2_1_1.40.
VM
  OS: RHEL7 PPC64
  Kernel: 3.10.0-123.el7
  VCPU: 8
  Memory: 40G
OpenStack
  Version: Juno
  Sahara: Upstream
  Diskimage-builder: Upstream
Infrastructure
  Hardware: IBM S822L
  Socket: 2
  CPU: 12
  Memory: 1TB
  Disk: 7.2TB
  RAID: 0
Hadoop Cluster
  Hadoop: 2.5.2
  Data Node: 2
  Name Node: 1
Elastic Hadoop Workflow
1. Set up the OpenStack controller with the Sahara plugin
2. Add PowerKVM compute nodes to the OpenStack controller
3. Create Power arch (ppc64) images for Sahara
   sahara-image-elements/diskimage-create/diskimage-create.sh -p vanilla -v 2.4 -i fedora
4. Register the image with Sahara
5. Create node group templates based on the processes required on the nodes
   ● Worker template with only the Data Node
   ● Master template with Name Node, Resource Manager, Node Manager
6. Create a cluster template as required
7. Launch a cluster based on the template
8. Submit jobs to the cluster
Demo Video: https://www.youtube.com/watch?v=JMprhJAF8FQ
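Steps 5 to 7 of the workflow can be sketched as plain data. This is a minimal illustration, not the team's actual templates. The field names (plugin_name, hadoop_version, node_processes, flavor_id) and the lowercase process names follow the general shape of Sahara's template JSON as an assumption. The template names, the flavor ID, and the version string (taken from the -v 2.4 image-build option) are hypothetical placeholders. The sketch builds one master and one worker node group template and combines them into a cluster template with the 1 Name Node + 2 Data Node layout used in the test environment.

```python
import json

# Assumed values: plugin name and version mirror the "-p vanilla -v 2.4"
# options of the image-build command; the flavor ID is a hypothetical
# placeholder for a flavor sized like the test VMs (8 VCPU, 40G memory).
PLUGIN = "vanilla"
HADOOP_VERSION = "2.4"
FLAVOR_ID = "hadoop-8vcpu-40g"


def node_group_template(name, processes, flavor_id=FLAVOR_ID):
    """Describe one kind of node by the Hadoop processes it runs."""
    return {
        "name": name,
        "plugin_name": PLUGIN,
        "hadoop_version": HADOOP_VERSION,
        "flavor_id": flavor_id,
        "node_processes": list(processes),
    }


def cluster_template(name, groups):
    """Combine (node group template, instance count) pairs into a cluster."""
    return {
        "name": name,
        "plugin_name": PLUGIN,
        "hadoop_version": HADOOP_VERSION,
        "node_groups": [
            {
                "name": template["name"],
                "count": count,
                "flavor_id": template["flavor_id"],
                "node_processes": template["node_processes"],
            }
            for template, count in groups
        ],
    }


def total_nodes(template):
    """Number of VMs the cluster template would launch."""
    return sum(group["count"] for group in template["node_groups"])


# Step 5: master and worker node group templates, as listed on the slide.
master = node_group_template(
    "master", ["namenode", "resourcemanager", "nodemanager"]
)
worker = node_group_template("worker", ["datanode"])

# Step 6: a cluster template with 1 master and 2 workers.
cluster = cluster_template("hadoop-ppc64", [(master, 1), (worker, 2)])

if __name__ == "__main__":
    # Step 7 would hand a document like this to Sahara to launch the cluster.
    print(json.dumps(cluster, indent=2))
```

Keeping the node group templates separate from the cluster template means the worker count can be changed without touching the process layout of either node type.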
Performance Results
TeraSort on a 500 GB workload took 7000 seconds in this environment, with 2 Data Nodes and 1 Name Node.
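To put the figure in perspective, the sort throughput can be derived from the two numbers on the slide. The short calculation below assumes decimal units (1 GB = 1000 MB) and defines throughput simply as input size divided by elapsed time. The per-node split is an even division across the 2 Data Nodes, which is an assumption rather than a measured value.

```python
# Figures from the slide.
DATA_GB = 500
ELAPSED_S = 7000
DATA_NODES = 2


def throughput_mb_per_s(data_gb, elapsed_s, mb_per_gb=1000):
    """Input size divided by elapsed time, in MB/s (decimal units assumed)."""
    return data_gb * mb_per_gb / elapsed_s


aggregate = throughput_mb_per_s(DATA_GB, ELAPSED_S)
per_node = aggregate / DATA_NODES  # assumes work is split evenly

if __name__ == "__main__":
    print(f"aggregate: {aggregate:.1f} MB/s")        # aggregate: 71.4 MB/s
    print(f"per Data Node: {per_node:.1f} MB/s")     # per Data Node: 35.7 MB/s
    print(f"elapsed: {ELAPSED_S / 3600:.2f} hours")  # elapsed: 1.94 hours
```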
Upstream Contributions for PPC64 support
https://review.openstack.org/#/c/149045/
https://review.openstack.org/#/c/149165/
https://review.openstack.org/#/c/153404/
1. Ramdisk-image-create: Add support for vmlinux file
2. Add support for using local PowerPC VM image
3. Enable vm element to create PowerPC image
Reference & Demo Video

Hadoop on PowerKVM video:
https://www.youtube.com/watch?v=JMprhJAF8FQ

Creating an OpenStack cloud using DevStack and Power8 Compute Nodes
http://goo.gl/ZHYsot

Creating an OpenStack cloud using IBM Cloud Manager and Power8 Compute Nodes
http://goo.gl/3f46Lv

Hadoop Releases:
http://goo.gl/MOTq1x
http://hadoop.apache.org/releases.html/
Summary
• Hadoop deployment and operation can be done seamlessly on OpenPOWER systems using OpenStack and Sahara
Post your questions here
pradeepkumars@in.ibm.com
bpradipta@in.ibm.com
