© 2009 VMware Inc. All rights reserved
vSphere Big Data Extensions Deep Dive
路广
Senior Manager, Big Data R&D
VMware China R&D Center
Get your Hadoop cluster in minutes
Manual process, takes days:
• Hadoop installation and configuration
• Network configuration
• OS installation
• Server preparation
Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
1/1000 of the human effort, with minimal Hadoop operations knowledge required
Automated by Serengeti on vSphere with best practices
Serengeti deployment architecture
• Serengeti is packaged as a virtual appliance that can be easily deployed on vCenter (VC).
• Serengeti works as a VC extension and establishes an SSL connection with VC.
• Serengeti clones VMs from a template and controls and configures them through VC.
Evolution of Hadoop on VMs – Data/Compute separation
[Diagram labels: Current Hadoop: Combined Storage/Compute; Storage; Compute; T1, T2; VMs]
Hadoop in VM
• VM lifecycle determined by Datanode
• Limited elasticity
Separate Storage
• Separate compute from data
• Remove the elasticity constraint imposed by the Datanode
• Elastic compute
• Raise utilization
Separate Compute Clusters
• Separate virtual compute
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation
Slave Node
Elastic Scalability & Multi-Tenancy
Deploy separate compute clusters for different tenants sharing HDFS.
Commission/decommission compute nodes according to priority and
available resources
[Diagram labels: Experimentation; Production recommendation engine; Dynamic resource pool; Data layer; Compute layer (multiple Compute VMs); Job Tracker (two)]
VMware vSphere + Serengeti
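The slide says compute nodes are commissioned or decommissioned according to priority and available resources, without giving the policy. As a minimal sketch of one possible policy (not Serengeti's actual algorithm), the code below grants compute VMs to tenants in priority order until capacity runs out. The `Tenant` class, the numeric priority convention, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    priority: int   # assumed convention: a higher value is served first
    requested: int  # compute VMs the tenant would like to run


def allocate_compute_nodes(tenants, capacity):
    """Grant compute VMs in priority order until capacity is exhausted."""
    grants = {}
    remaining = capacity
    for tenant in sorted(tenants, key=lambda t: t.priority, reverse=True):
        granted = min(tenant.requested, remaining)
        grants[tenant.name] = granted
        remaining -= granted
    return grants


def plan_changes(current, target):
    """Positive values mean commission nodes, negative mean decommission."""
    return {name: target[name] - current.get(name, 0) for name in target}


if __name__ == "__main__":
    tenants = [
        Tenant("production", priority=10, requested=12),
        Tenant("experimentation", priority=1, requested=8),
    ]
    target = allocate_compute_nodes(tenants, capacity=16)
    print(target)
    print(plan_changes({"production": 8, "experimentation": 8}, target))
```

With 16 compute VMs available, the higher-priority tenant gets everything it asks for and the lower-priority tenant gives nodes back.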
Serengeti architecture diagram
Rapid Deployment of a Hadoop/HBase Cluster with Serengeti
Step 1: Deploy Serengeti virtual appliance on vSphere.
Step 2: A few clicks to stand up Hadoop Cluster.
Done
Customizing your Hadoop/HBase cluster with Serengeti
• Choice of distros
• Storage configuration
  - Choice of shared storage or local disk
• Resource configuration
• High availability option
• # of nodes
…
"distro":"apache",
"groups":[
{ "name":"master",
"roles":[
"hadoop_namenode",
"hadoop_jobtracker"],
"storage": {
"type": "SHARED",
"sizeGB": 20},
"instance_type":"MEDIUM",
"instance_num":1,
"ha":true},
{"name":"worker",
"roles":[
"hadoop_datanode",
"hadoop_tasktracker"
],
"instance_type":"SMALL",
"instance_num":5,
"ha":false
…
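A spec like the excerpt above can be checked before it is submitted. The sketch below is a hedged illustration: it uses the key names shown on the slide, closes the elided JSON in an assumed way so that it parses, and the `summarize` helper is hypothetical rather than part of Serengeti.

```python
import json

# The slide elides the start and end of the spec; the closing brackets and
# the enclosing object here are assumptions made so the text parses.
SPEC = """
{
  "distro": "apache",
  "groups": [
    {"name": "master",
     "roles": ["hadoop_namenode", "hadoop_jobtracker"],
     "storage": {"type": "SHARED", "sizeGB": 20},
     "instance_type": "MEDIUM",
     "instance_num": 1,
     "ha": true},
    {"name": "worker",
     "roles": ["hadoop_datanode", "hadoop_tasktracker"],
     "instance_type": "SMALL",
     "instance_num": 5,
     "ha": false}
  ]
}
"""


def summarize(spec_text):
    """Return (total node count, list of problems found in the spec)."""
    spec = json.loads(spec_text)
    problems = []
    total = 0
    for group in spec.get("groups", []):
        for key in ("name", "roles", "instance_num"):
            if key not in group:
                problems.append(f"group {group.get('name', '?')} is missing {key}")
        total += group.get("instance_num", 0)
    return total, problems


if __name__ == "__main__":
    print(summarize(SPEC))
```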
Cluster creation workflow – VM creation
[Diagram: a create cluster request from the UI or CLI goes to the Serengeti Web Service. Labelled actions, numbered 1 to 4 in the figure: Analyze spec, Query resource (against VC), VM placement calculation, and VM creation on each host (Clone VM from the template VM, Add disk, Configure VM). The hosts run DN and TT VMs.]
Cluster Spec
{
"groups":[
"name":
"roles":
"placementPolicies": {
}
]
}
Workflow - Hadoop Package Deployment
Components: Admin, Serengeti Server, Chef Server, Package Server, Hadoop Nodes (running chef-client)
1) Download Hadoop tarballs or create a yum repo on the Package Server.
2) Configure the tarball URLs or yum repo URLs for each distro in the manifest file.
3) Run ‘cluster create’ to create a cluster for a Hadoop distro; the tarball URLs or yum repo URLs are saved in the Chef Server.
4) Remotely SSH to the Hadoop nodes and execute chef-client.
5) Read the tarball URLs or yum repo URLs from the Chef Server, then download and extract the Hadoop tarballs to /usr/lib/hadoop/, or yum install the RPMs from the Package Server.
6) Generate Hadoop configuration files on all nodes.
7) Start Hadoop daemons on all nodes simultaneously, with synchronization between NN, DNs, JT, and TTs.
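Step 7 starts the daemons on all nodes at once but synchronizes between the NameNode, DataNodes, JobTracker, and TaskTrackers. The slides do not spell out that synchronization, so the sketch below only shows one way to express it: roles are grouped into start waves from a dependency table. The dependencies in `START_AFTER` are an assumption (HDFS before MapReduce, masters before workers), and `start_waves` is a hypothetical helper, not Serengeti or Chef code.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed dependencies: each role maps to the roles that must be up first.
START_AFTER = {
    "hadoop_namenode": set(),
    "hadoop_datanode": {"hadoop_namenode"},
    "hadoop_jobtracker": {"hadoop_namenode"},
    "hadoop_tasktracker": {"hadoop_jobtracker"},
}


def start_waves(deps):
    """Group roles into waves; roles in one wave can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(start_waves(START_AFTER), start=1):
        print(number, wave)
```

Every node can run its install and configure steps in parallel; only the final daemon start has to wait for the previous wave.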
Cluster creation workflow – Software installation
Components: Serengeti Web Service, Ironfan (Thrift Service), Chef Server, Package Server, and a Chef Client on each node (DN1, TT1)
Cluster Spec for Ironfan (sent with the software bootstrap request)
"cluster_data": {
"rack_topology_policy": "NONE",
"groups": [
{
"name": "ComputeMaster",
"roles": [
"hadoop_jobtracker"
],
"instances": [
{
"name": "sample-ComputeMaster-0",
……}
}
"distro_package_repos": [
"http://<server ip>mapr/2.1.3/mapr-m5.repo"
],
……
Numbered steps in the figure:
1) Analyze spec
2) Create Chef nodes
3) SSH to start chef-client
4) Log in to the Chef Server and download cookbooks (REST API)
5) Execute cookbooks (DataNode cookbook, TaskTracker cookbook)
6) Download bits from the Package Server (Hadoop binary; Pig, Hive, etc.)
Cluster creation workflow – Software installation - continued
Components: Serengeti Web Service, Ironfan (Thrift Service), Chef Server, and a Chef Client on each node (DN1, TT1)
Numbered steps in the figure, continued:
7) Get properties (REST API)
8) Configure Hadoop, then start Hadoop daemons with synchronization between NN, DNs, JT, and TTs
Bootstrap status flow: get bootstrap status, persist bootstrap status, and bootstrap status query from the Serengeti Web Service
Note: Software installation on all nodes is executed simultaneously.
Configure/reconfigure Hadoop with ease by Serengeti
Modify Hadoop cluster configuration from Serengeti
• Use the “configuration” section of the json spec file
• Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml,
hadoop-env.sh, log4j.properties
• Apply new Hadoop configuration using the edited spec file
"configuration": {
"hadoop": {
"core-site.xml": {
// check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/core-default.html
},
"hdfs-site.xml": {
// check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html
},
"mapred-site.xml": {
// check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html
"io.sort.mb": "300"
} ,
"hadoop-env.sh": {
// "HADOOP_HEAPSIZE": "",
// "HADOOP_NAMENODE_OPTS": "",
// "HADOOP_DATANODE_OPTS": "",
…
> cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
Workflow - Tuning Hadoop Configuration
Components: Admin, Serengeti Server, Chef Server, Hadoop Nodes (running chef-client)
1) Run ‘cluster export’ to export the cluster spec, and set the Hadoop configuration parameters in the spec.
2) Run ‘cluster config’ to apply the new Hadoop configuration to the whole cluster or to one node group of the cluster.
3) Save the new Hadoop configuration into the Chef Server.
4) Remotely SSH to the Hadoop nodes and execute chef-client.
5) Read the Hadoop configuration from the Chef Server.
6) Generate new Hadoop configuration files on all nodes.
7) Restart the corresponding Hadoop daemons on all nodes simultaneously to apply the new configuration.
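Step 6 generates Hadoop configuration files from the spec. As a rough illustration of what that involves, the sketch below turns one entry of the "configuration" section into the `<configuration>/<property>` layout that Hadoop site files use. How the Chef cookbooks actually render the files is not shown in the slides; `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a {name: value} mapping in the Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Same shape as the "configuration" section of the JSON spec file.
    configuration = {"hadoop": {"mapred-site.xml": {"io.sort.mb": "300"}}}
    for file_name, props in configuration["hadoop"].items():
        print(file_name)
        print(render_site_xml(props))
```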
Rolling operation
A rolling operation works on one node at a time, so it does not disrupt job execution on the cluster as a whole.
Supported functions:
• Cluster scale up/down
• Cluster fix
Workflow
• The workflow for each node is similar to the whole-cluster operation.
• The next node starts only after the previous node has finished all steps.
• Each node is restarted during the operation.
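The rolling workflow above can be sketched as a loop that finishes every step on one node before touching the next. This is an illustration of the idea only: the step functions are hypothetical placeholders, and stopping at the first failure is an assumption about the desired behaviour, not something the slides state.

```python
def rolling_operation(nodes, steps):
    """Apply every step to one node before moving on to the next node.

    Returns (completed_nodes, failure), where failure is None on success
    or a (node, exception) pair for the node that stopped the rollout.
    """
    completed = []
    for node in nodes:
        try:
            for step in steps:
                step(node)
        except Exception as exc:  # assumed policy: stop, leave the rest untouched
            return completed, (node, exc)
        completed.append(node)
    return completed, None


if __name__ == "__main__":
    trace = []

    def reconfigure(node):  # hypothetical step
        trace.append(("reconfigure", node))

    def restart(node):  # hypothetical step
        trace.append(("restart", node))

    print(rolling_operation(["worker-0", "worker-1"], [reconfigure, restart]))
    print(trace)
```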
One click to scale out your cluster with Serengeti
Easily scale out using Serengeti
[Diagram: NN, JT, and worker VMs on a virtualization platform spanning several hosts]
• Use case:
  - When the cluster capacity is not big enough
  - New hardware is available
• Through Serengeti
  - One click in UI to scale out cluster
VC adapter
Leverages VLSI to connect to VC
Keeps a VC object cache to improve VC query performance
Listens for VC events
• VM power on, VM power off, VM creation, etc.
• If a VM's status is changed in VC outside of Serengeti, the cluster list immediately shows the change
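The combination of an object cache and an event listener can be illustrated with a small sketch: status reads are served from memory, and events keep the cached copy current without re-querying VC. Everything here is hypothetical, including the class name, the event strings, and the status values; it does not use the VLSI library or any real vSphere API.

```python
# Hypothetical mapping from event kind to the VM status it implies.
EVENT_TO_STATUS = {
    "created": "poweredOff",
    "power_on": "poweredOn",
    "power_off": "poweredOff",
}


class VmStatusCache:
    def __init__(self, query_vc):
        self._query_vc = query_vc  # callable returning {vm_name: status}
        self._status = None
        self.query_count = 0       # how many times VC was actually queried

    def _load(self):
        # Query VC only once; later reads and events use the cached copy.
        if self._status is None:
            self._status = dict(self._query_vc())
            self.query_count += 1

    def status(self, vm_name):
        self._load()
        return self._status.get(vm_name)

    def on_event(self, vm_name, event):
        """Apply a VC event so the cache reflects changes made outside Serengeti."""
        self._load()
        new_status = EVENT_TO_STATUS.get(event)
        if new_status is not None:  # unknown events are ignored
            self._status[vm_name] = new_status


if __name__ == "__main__":
    cache = VmStatusCache(lambda: {"worker-0": "poweredOff"})
    cache.on_event("worker-0", "power_on")
    print(cache.status("worker-0"), cache.query_count)
```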
VM placement - Fine-grained control for data/compute (DC) separated clusters
Constrain the number of nodes on each host
Group association:
• Put compute nodes close to data nodes
VM placement - Rack aware placement
Balance number of nodes across multiple racks
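Two of the placement rules above, a cap on nodes per host and balancing across racks, can be combined in a simple greedy pass. The sketch below is one possible approach under stated assumptions, not Serengeti's placement algorithm: `place_nodes` is a hypothetical function, host names are assumed unique across racks, and group association (keeping compute nodes close to data nodes) is not covered.

```python
def place_nodes(node_count, racks, max_per_host):
    """Place nodes one by one on the least-loaded rack, then least-loaded host.

    racks maps a rack name to its list of host names.
    Returns a list of (rack, host) pairs, one per node.
    """
    host_load = {host: 0 for hosts in racks.values() for host in hosts}
    rack_load = {rack: 0 for rack in racks}
    placement = []
    for _ in range(node_count):
        candidates = [
            (rack_load[rack], host_load[host], rack, host)
            for rack, hosts in racks.items()
            for host in hosts
            if host_load[host] < max_per_host  # per-host constraint
        ]
        if not candidates:
            raise ValueError("not enough host capacity for the requested nodes")
        _, _, rack, host = min(candidates)
        rack_load[rack] += 1
        host_load[host] += 1
        placement.append((rack, host))
    return placement


if __name__ == "__main__":
    racks = {"rack1": ["h1", "h2"], "rack2": ["h3"]}
    print(place_nodes(4, racks, max_per_host=2))
```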
Disk placement
Data disks
• Even split on local disks
• Aggregate on shared storage
Separated system disk
• Separated virtual system disks on specified local storage
• Separated virtual system disks on shared storage
[Diagram: each option is shown on a host running DN and CN VMs, with the system disk and data disks marked]
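The two data-disk options can be illustrated with a little arithmetic: an even split divides the requested size across the local disks, while aggregation keeps it as a single disk on shared storage. This is a sketch only; the function names, datastore names, and the whole-gigabyte rounding rule are assumptions.

```python
def split_evenly(total_gb, local_datastores):
    """Split total_gb across local datastores in whole GB, as evenly as possible."""
    base, extra = divmod(total_gb, len(local_datastores))
    return {
        name: base + (1 if index < extra else 0)
        for index, name in enumerate(local_datastores)
    }


def aggregate(total_gb, shared_datastore):
    """Keep the whole allocation as one disk on a shared datastore."""
    return {shared_datastore: total_gb}


if __name__ == "__main__":
    print(split_evenly(50, ["local-1", "local-2", "local-3"]))
    print(aggregate(50, "shared-1"))
```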
VHM: Example Architecture
[Diagram labels: ESX hosts; DATA VMs; Local Disks; SAN/NAS; Non-Hadoop VMs; Hadoop Compute VMs (JT, TTs); Hadoop HDFS VMs (NN); VHM; VirtualCenter Management Server; DRS]
JT: JobTracker
TT: TaskTracker
NN: NameNode
VHM: Virtual Hadoop Manager
Virtual Hadoop Manager
[Diagram labels: Serengeti; Job Tracker; vCenter DB]
• State, stats (slots used, pending work)
• Commands (decommission, recommission)
• Stats and VM configuration
• Manual/auto power on/off
Virtual Hadoop Manager (VHM)
[Diagram labels: Job Tracker; Task Trackers; vCenter Server; Serengeti Configuration]
• VC state and stats
• Hadoop state and stats
• VC actions
• Hadoop actions
• Algorithms
• Cluster configuration
Q&A

 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

3. vSphere Big Data Extensions

  • 1. © 2009 VMware Inc. All rights reserved. vSphere Big Data Extensions Deep Dive. 路广 (Lu Guang), Senior Manager, Big Data R&D, VMware China R&D Center
  • 2. Get your Hadoop cluster in minutes. Manual process (takes days): server preparation, OS installation, network configuration, Hadoop installation and configuration. Automated by Serengeti on vSphere with best practices: a fully automated process that stands up a Hadoop/HBase cluster from scratch in 10 minutes, with 1/1000 of the human effort and minimal Hadoop operations knowledge.
  • 3. Serengeti deployment architecture • Serengeti is packaged as a virtual appliance that can be easily deployed on vCenter (VC). • Serengeti works as a VC extension and establishes an SSL connection with VC. • Serengeti clones VMs from a template and controls and configures them through VC.
  • 4. Evolution of Hadoop on VMs – Data/Compute separation. Current Hadoop: combined storage/compute on each slave node. Hadoop in VM: VM lifecycle determined by the Datanode; limited elasticity. Separate Storage: separate compute from data; remove the elasticity constraint imposed by the Datanode; elastic compute; raise utilization. Separate Compute Clusters: separate virtual compute; compute cluster per tenant; stronger VM-grade security and resource isolation. [Diagram labels: Storage, Compute, T1, T2, VM.]
  • 5. Elastic Scalability & Multi-Tenancy. Deploy separate compute clusters for different tenants sharing HDFS. Commission/decommission compute nodes according to priority and available resources. [Diagram: a data layer and a compute layer on VMware vSphere + Serengeti; the compute layer holds two tenant clusters, Experimentation and Production (recommendation engine), each with its own Job Tracker and Compute VMs drawn from a dynamic resource pool.]
  • 7. Rapid Deployment of a Hadoop/HBase Cluster with Serengeti. Step 1: Deploy the Serengeti virtual appliance on vSphere. Step 2: A few clicks to stand up a Hadoop cluster. Done.
  • 8. Customizing your Hadoop/HBase cluster with Serengeti: choice of distros; storage configuration (choice of shared storage or local disk); resource configuration; high availability option; number of nodes. Sample spec: … "distro":"apache", "groups":[ { "name":"master", "roles":[ "hadoop_namenode", "hadoop_jobtracker" ], "storage": { "type": "SHARED", "sizeGB": 20 }, "instance_type":"MEDIUM", "instance_num":1, "ha":true }, { "name":"worker", "roles":[ "hadoop_datanode", "hadoop_tasktracker" ], "instance_type":"SMALL", "instance_num":5, "ha":false …
  • 9. Cluster creation workflow – VM creation. A create-cluster request from the UI or CLI reaches the Serengeti Web Service with a cluster spec ({ "groups": [ "name": …, "roles": …, "placementPolicies": { } ] }). The diagram's numbered steps (1 to 4) cover: query resources from VC; analyze the spec; calculate VM placement; and create the VMs by cloning the template VM onto hosts, adding disks, and configuring each VM. [Diagram: hosts running DN and TT VMs.]
  • 10. Workflow - Hadoop Package Deployment. Actors: Admin, Serengeti Server, Package Server, Chef Server, Hadoop Nodes. 1) Download Hadoop tarballs or create a yum repo on the Package Server. 2) Configure tarball URLs or yum repo URLs for each distro in the manifest file. 3) Run ‘cluster create’ to create a cluster for a Hadoop distro; the tarball URLs or yum repo URLs are saved in the Chef Server. 4) Remotely SSH to the Hadoop nodes and execute chef-client. 5) chef-client reads the tarball URLs or yum repo URLs from the Chef Server, then downloads and extracts the Hadoop tarballs to /usr/lib/hadoop/ or yum-installs the RPMs from the Package Server. 6) Generate Hadoop configuration files on all nodes. 7) Start Hadoop daemons on all nodes simultaneously, with synchronization between the NN, DNs, JT, and TTs.
  • 11. Cluster creation workflow – Software installation. Components: Serengeti Web Service, Ironfan Thrift Service, Chef Server, Package Server, and a Chef Client on each node (DN1, TT1). Steps: 1) Analyze the spec and send a software bootstrap request to Ironfan. 2) Create Chef nodes. 3) SSH to the nodes to start the chef client. 4) Log in to the Chef server and download the cookbooks (REST API). 5) Execute the cookbooks (DataNode cookbook, TaskTracker cookbook). 6) Download bits (Hadoop binary, Pig, Hive, etc.). Cluster spec for Ironfan: "cluster_data": { "rack_topology_policy": "NONE", "groups": [ { "name": "ComputeMaster", "roles": [ "hadoop_jobtracker" ], "instances": [ { "name": "sample-ComputeMaster-0", ……} } "distro_package_repos": [ "http://<server ip>mapr/2.1.3/mapr-m5.repo" ], ……
  • 12. Cluster creation workflow – Software installation (continued). 7) The Chef Clients get properties from the Chef Server (REST API). 8) Configure Hadoop and start the Hadoop daemons, with synchronization between the NN, DNs, JT, and TTs. The Ironfan Thrift Service gets the bootstrap status and persists it; the Serengeti Web Service queries the bootstrap status. Note: Software installation is executed on all nodes simultaneously.
  • 13. Configure/reconfigure Hadoop with ease by Serengeti. Modify the Hadoop cluster configuration from Serengeti: • Use the “configuration” section of the JSON spec file. • Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, and log4j.properties. • Apply the new Hadoop configuration using the edited spec file. Sample: "configuration": { "hadoop": { "core-site.xml": { // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/core-default.html }, "hdfs-site.xml": { // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html }, "mapred-site.xml": { // check for all settings at http://hadoop.apache.org/common/docs/r1.0.0/mapred-default.html "io.sort.mb": "300" }, "hadoop-env.sh": { // "HADOOP_HEAPSIZE": "", // "HADOOP_NAMENODE_OPTS": "", // "HADOOP_DATANODE_OPTS": "", … Command: > cluster config --name myHadoop --specFile /home/serengeti/myHadoop.json
  • 14. Workflow - Tuning Hadoop Configuration. Actors: Admin, Serengeti Server, Chef Server, Hadoop Nodes. 1) Run ‘cluster export’ to export the cluster spec, then set the Hadoop configuration parameters in the spec. 2) Run ‘cluster config’ to apply the new Hadoop configuration to the whole cluster or to one node group of the cluster. 3) Save the new Hadoop configuration into the Chef Server. 4) Remotely SSH to the Hadoop nodes and execute chef-client. 5) chef-client reads the Hadoop configuration from the Chef Server. 6) Generate new Hadoop configuration files on all nodes. 7) Restart the corresponding Hadoop daemons on all nodes simultaneously to apply the new configuration.
  • 15. Rolling operation. A rolling operation works on one node at a time, so it does not disrupt job execution across the whole cluster. Supported functions: • Cluster scale up/down • Cluster fix. Workflow: • The workflow for each node is similar to the whole-cluster operation. • The next node starts only after the current node has finished all steps. • Each node is restarted during the operation.
  • 16. One click to scale out your cluster with Serengeti
  • 17. Easily scale out using Serengeti. Use case: the cluster capacity is not big enough; new hardware is available. Through Serengeti: one click in the UI to scale out the cluster. [Diagram: NN, JT, and worker nodes on hosts of the virtualization platform, before and after scale-out.]
  • 18. VC adapter. Leverages VLSI to connect to VC. Keeps a VC object cache to improve VC query performance. Listens for VC events: • VM power on, VM power off, VM creation, etc. • If a VM's status is changed in VC outside of Serengeti, the cluster list shows the change immediately.
  • 19. VM placement - Fine control of data/compute (DC) separation clusters. Constrain the number of nodes on each host. Group association: • Put compute nodes close to data nodes.
  • 20. VM placement - Rack-aware placement. Balance the number of nodes across multiple racks.
  • 21. Disk placement. On local disks: even split across the disks. On shared storage: aggregate. [Diagram: two hosts, each running DN and CN VMs.]
  • 22. Separated system disk. Option 1: separated virtual system disks on specified local storage. Option 2: separated virtual system disks on shared storage. [Diagram: two hosts running DN and CN VMs, each VM with a system disk and data disks.]
  • 23. VHM: Example Architecture. [Diagram: ESX hosts managed by a VirtualCenter Management Server with DRS; the hosts run Hadoop compute VMs (JT, TT), Hadoop HDFS VMs (NN, data VMs), and non-Hadoop VMs, backed by local disks and SAN/NAS; the VHM is shown with the management server.] Legend: JT: JobTracker; TT: TaskTracker; NN: NameNode; VHM: Virtual Hadoop Manager.
  • 24. Virtual Hadoop Manager. Components shown: Serengeti, vCenter Server, vCenter DB, the Virtual Hadoop Manager (VHM) with its algorithms, a Job Tracker, and Task Trackers. Inputs to the VHM: Serengeti configuration and cluster configuration (manual/auto, power on/off); VC state and stats (stats and VM configuration); Hadoop state and stats (slots used, pending work). Outputs from the VHM: VC actions; Hadoop actions (commands: decommission, recommission).
  • 25. Q&A
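
The rolling operation described on slide 15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Serengeti's implementation: the step names, the `run_step` callback, and the choice to stop at the first failed step are all hypothetical. The slides state only that nodes are handled one at a time and that the next node starts after the current one has finished all steps.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical per-node steps. The slides say only that each node runs
# through "all steps" and is restarted during the operation.
STEPS = ("reconfigure", "restart", "verify")


def rolling_operation(
    nodes: Iterable[str],
    run_step: Callable[[str, str], bool],
    steps: Tuple[str, ...] = STEPS,
) -> Tuple[List[str], List[str]]:
    """Apply `steps` to one node at a time and return (done, not_done).

    The next node starts only after the current node has finished every
    step. Stopping at the first failure is an assumption of this sketch.
    """
    pending = list(nodes)
    done: List[str] = []
    while pending:
        node = pending[0]
        for step in steps:
            if not run_step(node, step):
                # Leave the failed node and all later nodes untouched.
                return done, pending
        done.append(pending.pop(0))
    return done, pending
```

A caller would pass its own `run_step`, for example a function that runs one step on the named node and returns `True` on success. Because only one node is out of service at any moment, the rest of the cluster keeps executing jobs.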

Editor's Notes

  1. Brief description of the key modules: The web service, running on Tomcat, is the central controller of the cluster management workflow and uses the Spring Batch library. The VM placement algorithm and the disk placement policy are handled by the VM placement module in the web service layer. Serengeti talks to VC through the VC adapter layer, which maintains several VC sessions to execute different VC tasks and listen for VC events. Serengeti is distro-neutral, so the Hadoop software is installed and configured after the VMs are created. The open source projects Chef and Ironfan are used to install and configure Hadoop services. Chef is a popular distributed software configuration tool. Runtime Manager is responsible for Hadoop cluster elasticity control. Serengeti communicates with VHM through RabbitMQ.
  2. Chef Server and Package Server are currently deployed in the same VM as the Serengeti Server. They can be deployed on separate VMs to support large-scale clusters (200+ nodes).
  3. Step 4: The chef client connects to the Chef server and downloads the cookbook through the REST API.
  4. Chef provides a flexible software deployment and configuration mechanism, so it is easy to add more services to Serengeti. During VM placement, several performance-improving settings are applied based on the host and VM CPU/memory size.
  5. At this stage, you will be constantly configuring and reconfiguring your cluster to tune it for optimal results. With Serengeti, this process is very simple. Taking the JSON spec file I showed earlier, you can specify the various Hadoop attributes per configuration file and apply the new configuration to the cluster. We automatically change the Hadoop cluster according to your specification, and the changes are propagated to the entire cluster. You don't need to reconfigure one node at a time.
  6. Sample Hadoop Configuration: { … … // we suggest running convert-hadoop-conf.rb to generate the "configuration" section and paste the output here "configuration": { "hadoop": { "core-site.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/core-default.html // note: any value (int, float, boolean, string) must be enclosed in double quotes and here is a sample: // "io.file.buffer.size": "4096" }, "hdfs-site.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/hdfs-default.html }, "mapred-site.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/mapred-default.html }, "hadoop-env.sh": { // "HADOOP_HEAPSIZE": "", // "HADOOP_NAMENODE_OPTS": "", // "HADOOP_DATANODE_OPTS": "", // "HADOOP_SECONDARYNAMENODE_OPTS": "", // "HADOOP_JOBTRACKER_OPTS": "", // "HADOOP_TASKTRACKER_OPTS": "", // "HADOOP_CLASSPATH": "", // "JAVA_HOME": "", // "PATH": "" }, "log4j.properties": { // "hadoop.root.logger": "INFO,RFA", // "log4j.appender.RFA.MaxBackupIndex": "10", // "log4j.appender.RFA.MaxFileSize": "100MB", // "hadoop.security.logger": "DEBUG,DRFA" }, "fair-scheduler.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/fair_scheduler.html // "text": "the full content of fair-scheduler.xml in one line" }, "capacity-scheduler.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/capacity_scheduler.html }, "mapred-queue-acls.xml": { // check for all settings at http://hadoop.apache.org/docs/stable/cluster_setup.html#Configuring+the+Hadoop+Daemons // "mapred.queue.queue-name.acl-submit-job": "", // "mapred.queue.queue-name.acl-administer-jobs": "" } } } }
  7. The choice of disk placement rule is not configurable.
  8. The separated system disk can be configured in the cluster spec at the node group level as follows: dsNames4System:<ds name used to put system disk> dsNames4Data:<ds name used to put data disk> If these attributes are not set, default values are used.
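
To make the "generate Hadoop configuration files" step more concrete, here is a minimal Python sketch that turns the `*-site.xml` entries of a "configuration" section (as in note 6) into the standard Hadoop `<configuration>/<property>` XML layout. It is an illustration only: in Serengeti this work is done by Chef cookbooks, and the function names `render_site_xml` and `render_hadoop_config` are hypothetical. The two sample values come from the slides and notes (`io.sort.mb` = 300, `io.file.buffer.size` = 4096).

```python
import xml.etree.ElementTree as ET
from typing import Dict


def render_site_xml(properties: Dict[str, str]) -> str:
    """Render flat name/value pairs in the Hadoop *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def render_hadoop_config(configuration: Dict[str, dict]) -> Dict[str, str]:
    """Map each *-site.xml entry under configuration["hadoop"] to XML text.

    Other entries (hadoop-env.sh, log4j.properties, scheduler files) use
    different formats and are skipped in this sketch.
    """
    files = {}
    for filename, props in configuration.get("hadoop", {}).items():
        if filename.endswith("-site.xml"):
            files[filename] = render_site_xml(props)
    return files


# Sample "configuration" section using values that appear in the deck.
spec_configuration = {
    "hadoop": {
        "core-site.xml": {"io.file.buffer.size": "4096"},
        "mapred-site.xml": {"io.sort.mb": "300"},
        "hadoop-env.sh": {},
    }
}

rendered = render_hadoop_config(spec_configuration)
print(rendered["mapred-site.xml"])
```

Keeping every value as a string mirrors the note in the sample spec that all values must be enclosed in double quotes.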