Big Data Technologies
Sahara Intro & Future Plan
Weiting Chen
weiting.chen@intel.com
SSG / STO / BDT
Legal Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this
document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from
course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information
provided here is subject to change without notice. Contact your Intel representative to obtain the latest
forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause
deviations from published specifications. Current characterized errata are available on request.
© 2015 Intel Corporation.
WHO WE ARE
Bring the Cloudera CDH 5.3 Plugin into OpenStack Sahara
Add all the services in Cloudera CDH 5.3 and integrate them into the Sahara CDH plugin
Provide Complete Integration Tests for a Better User Experience
Complete integration testing in OpenStack Sahara to help deliver a good user experience with the Sahara CDH plugin
Ranked #3 Company by Commits Contributed to Sahara
Ranked after #1 Mirantis and #2 Red Hat
OPENSTACK HISTORY
[Figure: timeline of OpenStack releases and projects, 2010 to 2015]
Releases: Austin, Bexar, Cactus, Diablo, Essex, Folsom, Grizzly, Havana, Icehouse, Juno, Kilo
Projects: Nova, Swift, Glance, Horizon, Keystone, Quantum, Cinder, Ceilometer, Trove, Sahara, Ironic
Incubation: Zaqar, Manila, Designate, Barbican
Move Focus from IaaS to PaaS and SaaS
More and more applications (xxx-as-a-service) are built on OpenStack infrastructure
FROM THE MARKET
Big Data: ~25.9% CAGR
The Big Data market is expected to grow from $16.5 billion (2014) to $41.5 billion (2018). This includes the cloud infrastructure segment, which grows from $1.2 billion (2014) to $4.7 billion (2018).
Cloud Market: 200 Billion
The cloud market will hit $118 billion in 2015 and $200 billion by 2018, up from $95.8 billion in 2014.
X-as-a-Service: Trend
Cloud-based solutions will shape IT spending for years. IDC estimates that cloud services spending will continue to grow at double-digit rates for the next few years.
Source: IDC, 2014
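The ~25.9% figure can be checked from the Big Data endpoints quoted above. This is a minimal sketch, assuming the standard compound annual growth rate formula and a four-year span (2014 to 2018); the function name `cagr` is illustrative and does not come from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints quoted on the slide, in billions; 2014 -> 2018 is four years.
big_data = cagr(16.5, 41.5, 4)
cloud_infra_segment = cagr(1.2, 4.7, 4)

print(f"Big Data market CAGR: {big_data:.1%}")
print(f"Cloud infrastructure segment CAGR: {cloud_infra_segment:.1%}")
```

The first line prints 25.9%, which matches the slide. The slide does not state a rate for the cloud infrastructure segment, so the second value is derived only from the two endpoints.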
THE VISION
Internet of Things
Data will come from a diverse range of devices.
Big Data
Data processing models process the data and turn it into high value.
Cloud Computing
A shared-resource infrastructure supports a flexible IT environment and fulfills requirements on demand.
OpenStack vs Hadoop
Most companies that run an OpenStack cluster in their IT environment are also preparing a separate Hadoop cluster for Big Data analytics.
Sahara is a solution that brings Hadoop and OpenStack together.
SAHARA BACKGROUND
The basic idea comes from Amazon Elastic MapReduce (EMR)
Let users easily provision Hadoop clusters by specifying a few parameters
Analytics as a Service for data scientists and analysts
ARCHITECTURE
Sahara Key Features - Provision Cluster
Create/Terminate Cluster
• Heat API/Nova Direct API
• Neutron/Nova Network
• Floating IP Management
• Anti-affinity
Cluster Scaling
• Add Node/Remove Node
Supported Plugins
• Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR
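The provisioning features above (node groups, floating IPs, anti-affinity, scaling) can be pictured as one cluster description. The sketch below uses plain dictionaries. Every field name, flavor, pool name, and process label in it is an illustrative assumption rather than a confirmed Sahara API field, and `scale` and `total_nodes` are hypothetical helpers.

```python
from copy import deepcopy

# Hypothetical node groups: names, flavors and process labels are placeholders.
master_group = {
    "name": "master",
    "flavor": "m1.large",
    "floating_ip_pool": "public",
    "processes": ["namenode", "resourcemanager"],
    "count": 1,
}
worker_group = {
    "name": "worker",
    "flavor": "m1.medium",
    "floating_ip_pool": "public",
    "processes": ["datanode", "nodemanager"],
    "count": 3,
}

cluster = {
    "name": "demo-cluster",
    "plugin": "vanilla",
    "node_groups": [master_group, worker_group],
    # Processes whose instances should be spread across different hosts.
    "anti_affinity": ["datanode"],
}


def scale(cluster: dict, group_name: str, new_count: int) -> dict:
    """Return a copy of the cluster with one node group resized."""
    if new_count < 0:
        raise ValueError("new_count must be non-negative")
    resized = deepcopy(cluster)
    for group in resized["node_groups"]:
        if group["name"] == group_name:
            group["count"] = new_count
            return resized
    raise KeyError(f"unknown node group: {group_name}")


def total_nodes(cluster: dict) -> int:
    """Number of VMs the cluster description asks for."""
    return sum(group["count"] for group in cluster["node_groups"])


bigger = scale(cluster, "worker", 5)
print(total_nodes(cluster), "->", total_nodes(bigger))
```

Returning a resized copy instead of mutating the input keeps the original description available for comparison, which is convenient when reasoning about an add-node or remove-node request.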
Sahara Key Features - Elastic Data Processing
Supported Job Types
• Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
Support Data Locality
• Rack/Hypervisor/Swift
Data Source
• Internal: Ephemeral Disk/Cinder
• External: Swift
Run Jobs in Transient Clusters
*Different plugins provide different capabilities
WORKING FLOW
Fast Cluster Provisioning
Select Hadoop Version → Select Base Image w/ Hadoop → Define Cluster Configuration → Provision Cluster → Operate Cluster → Terminate Cluster
• Provide the Hadoop configuration details, such as size, topology, and others
• Sahara provisions the VMs, then installs and configures Hadoop
• Scale the cluster by adding or removing nodes
Analytics as a Service using Elastic Data Processing
Select Hadoop Version → Configure Jobs → Set Limit for Cluster → Execute Jobs → Get the Result
• Choose the type of job: pig, hive, jar-file, etc.
• Select the input and output data locations (Swift is supported)
• The cluster is removed automatically after the job completes
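The Elastic Data Processing flow above (configure a job, run it, and have the cluster removed automatically) can be sketched as a small simulation. `TransientCluster` and `run_job` are hypothetical names introduced only for illustration. Nothing here calls Sahara, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TransientCluster:
    """Toy stand-in for a cluster that exists only while a job runs."""
    hadoop_version: str
    max_nodes: int
    active: bool = False
    events: List[str] = field(default_factory=list)

    def provision(self) -> None:
        self.active = True
        self.events.append(
            f"provisioned (Hadoop {self.hadoop_version}, up to {self.max_nodes} nodes)"
        )

    def terminate(self) -> None:
        self.active = False
        self.events.append("terminated")


def run_job(
    cluster: TransientCluster,
    job_type: str,
    input_url: str,
    output_url: str,
    execute: Callable[[str, str], str],
) -> str:
    """Provision, run one job, and always terminate the cluster afterwards."""
    cluster.provision()
    try:
        cluster.events.append(f"running {job_type} job")
        return execute(input_url, output_url)
    finally:
        # Mirrors "the cluster is removed automatically after the job completes".
        cluster.terminate()


demo = TransientCluster(hadoop_version="2.6", max_nodes=4)
result = run_job(
    demo,
    "pig",
    "swift://demo-container/input",    # placeholder location
    "swift://demo-container/output",   # placeholder location
    execute=lambda src, dst: dst,      # stand-in for the real job
)
print(result)
print(demo.events)
```

Putting the teardown in `finally` means the cluster is also released when the job fails, which is the property that makes a transient cluster safe to use for one-off analytics.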
CLOUDERA CDH PLUGIN
[Figure: End Customer, Controller, and CDH Cluster]
Controller
• Horizon (OpenStack Dashboard)
• Sahara Service with CDH Plugin
• Cloudera Manager API Python Client (Migrate from CM-API Client)
Computing Node1
• VM1 - Master: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node
• VM2 - Slave: Data Node, Node Manager
Step 1: Create the VMs via Heat by using a Cluster Template. Cloudera Manager (CM) must be included on one master machine.
Step 2: Use the CM API Client to connect to CM and provision the other services in the cluster.
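The two steps above depend on one constraint: Cloudera Manager has to be present on exactly one machine before the remaining services can be provisioned through it. The sketch below checks that constraint and splits a layout into the two steps. The process labels, the node group layout, and the functions `validate` and `plan` are hypothetical and are not taken from the Sahara CDH plugin.

```python
from typing import Dict, List, Tuple

MANAGER = "CLOUDERA_MANAGER"  # hypothetical process label


def manager_instances(node_groups: List[Dict]) -> int:
    """Count VMs that would host Cloudera Manager."""
    return sum(g["count"] for g in node_groups if MANAGER in g["processes"])


def validate(node_groups: List[Dict]) -> None:
    """Reject layouts without exactly one Cloudera Manager instance."""
    found = manager_instances(node_groups)
    if found != 1:
        raise ValueError(f"expected exactly one {MANAGER} instance, found {found}")


def plan(node_groups: List[Dict]) -> Tuple[List[str], List[str]]:
    """Return (VMs to create in step 1, services to provision in step 2)."""
    validate(node_groups)
    vms = [f"{g['name']}-{i}" for g in node_groups for i in range(g["count"])]
    services = sorted(
        {p for g in node_groups for p in g["processes"] if p != MANAGER}
    )
    return vms, services


layout = [
    {"name": "master", "count": 1,
     "processes": [MANAGER, "NAMENODE", "RESOURCEMANAGER", "OOZIE_SERVER"]},
    {"name": "slave", "count": 2,
     "processes": ["DATANODE", "NODEMANAGER"]},
]

step1_vms, step2_services = plan(layout)
print("Step 1, create VMs:", step1_vms)
print("Step 2, provision via CM:", step2_services)
```

Validating before any VM is created keeps a misconfigured template from producing a cluster that step 2 could never finish configuring.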
DATA PROCESSING MODEL
[Figure: two patterns, each with a Collector Agent collecting data for MapReduce in OpenStack virtual clusters; labels: Swift, HDFS, Cinder, Ephemeral Disk, Data Stream]
Pattern 1: Internal - HDFS Only
OpenStack supports creating HDFS on Cinder or Ephemeral Disk. Ephemeral Disk provides better data processing performance, while Cinder persists the data at lower performance.
Pattern 2: External - Swift
OpenStack uses Swift as a data source to store input and output data. The benefit is that the data is processed directly and persisted via Swift.
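The trade-off between the two patterns can be written down as a small lookup. This is a sketch under stated assumptions: the dictionary layout and `options_for` are illustrative, and marking Ephemeral Disk as non-persistent is inferred from the slide's contrast with Cinder rather than stated outright.

```python
# Trade-offs as described on the slide; the dictionary layout is illustrative.
STORAGE_OPTIONS = {
    "ephemeral": {"pattern": "Internal - HDFS Only", "persistent": False},
    "cinder": {"pattern": "Internal - HDFS Only", "persistent": True},
    "swift": {"pattern": "External - Swift", "persistent": True},
}


def options_for(need_persistence: bool) -> list:
    """List storage options that satisfy the persistence requirement."""
    return sorted(
        name
        for name, traits in STORAGE_OPTIONS.items()
        if traits["persistent"] or not need_persistence
    )


print(options_for(need_persistence=True))
print(options_for(need_persistence=False))
```

When the data must outlive the cluster, only Cinder and Swift remain; when it need not, Ephemeral Disk becomes available and is the option the slide associates with better processing performance.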
Current Issue
~30% Performance Loss
We used Sahara with KVM to create a Hadoop cluster (HDFS on Ephemeral Disk) and compared it with bare-metal Hadoop on the same servers.
Different workloads (HiBench) may show different results.
Beyond The Performance…
Performance may always be an issue when comparing a hypervisor with bare metal
IT Integration
Sahara must provide an elastic platform to fulfill customers' requests and to adopt big data infrastructure. Supporting more technologies helps Sahara integrate seamlessly into the customer's IT environment.
Workload-based EDP
EDP should provide a simple interface so that data scientists can focus on their own expertise without worrying about how to deploy clusters. Analytics-as-a-Service is a future trend.
MORE …
Bare Metal Support
• OpenStack Ironic
Docker Support
• Nova-docker driver, OpenStack Magnum
Support More Storage Backends
• OpenStack Manila, External HDFS
Support More Data Processing Models
• Hadoop, Spark, etc.
WHAT’S NEW IN KILO
• Vanilla plugin supports Hadoop 1.2.1 and Hadoop 2.6
• Spark Plugin
• Cloudera CDH Plugin
• MapR Plugin
• Storm Plugin
• New Horizon UI with New Guide Panel
• Default Template Support
20150314 sahara intro and the future plan for open stack meetup


Editor's Notes

  • #4 IOT-BIG DATA-CLOUD COMPUTING
  • #7 By 2016, 11% of IT budgets will move away from traditional in-house IT toward cloud-based solutions. By 2017, 35% of new applications will use cloud-enabled
  • #16 External HDFS is supported, but some configuration must be done manually.
  • #17 The root cause of the performance loss is the difference between KVM and bare metal.