Introduction and agenda: Ops benefits, Tech benefits, Architecture, Use cases, Demo video, Hybrid data model, Current directions, Q&A, Supplementals
Adobe is a Big Data company. Virtualizing Hadoop has both business and technical justifications for Adobe and allows competitive differentiation. Analytics is a core competency of DMBU.
• Rapid provisioning: Much of the cluster deployment process can be automated using existing tools.
• High availability: HA protection can be provided through the virtualization platform to protect the single points of failure in the Hadoop system.
• Elasticity: Hadoop capacity can be scaled up and down on demand in a virtual environment.
• Multi-tenancy: Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation.
Expect a lot of questions on this slide. It falls about halfway through, so it is a good time for intermediate Q&A if Chris wants to discuss some of the physical design. We can defer questions on use cases and workflows, since those follow immediately.
Prod and dev review
Video walkthrough of vCAC deployment and auto-discovery via Cloudera Manager
Hybrid storage model to get the best of both worlds, or for flexibility.
Master nodes: NameNode and JobTracker on shared storage. Leverage vSphere vMotion, HA and FT.
Slave nodes: TaskTracker and DataNode on local storage. Lower cost, scalable bandwidth.
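The hybrid model in this note can be sketched as a simple role-to-storage mapping. This is a minimal illustration only: the function and constant names are hypothetical and are not part of BDE, vCAC, or Hadoop.

```python
# Hypothetical sketch of the hybrid storage model: master roles go on
# shared storage, slave roles go on local storage. All names here are
# illustrative assumptions, not an actual BDE or Hadoop API.
SHARED = "shared"
LOCAL = "local"

MASTER_ROLES = {"NameNode", "JobTracker"}
SLAVE_ROLES = {"TaskTracker", "DataNode"}


def storage_tier(role):
    """Return the storage tier for a Hadoop role under the hybrid model."""
    if role in MASTER_ROLES:
        # Shared storage lets vSphere vMotion, HA and FT protect the masters.
        return SHARED
    if role in SLAVE_ROLES:
        # Local disks are cheaper and bandwidth scales with the node count.
        return LOCAL
    raise ValueError("unknown Hadoop role: %s" % role)


def plan_cluster(roles):
    """Map every requested role to its storage tier."""
    return {role: storage_tier(role) for role in roles}
```

Calling `plan_cluster(["NameNode", "DataNode"])` would place the NameNode on shared storage and the DataNode on local storage.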
Identify acronyms, DMBU and vCAC first.
Integration with Adobe DMBU Private Cloud: IaaS environment leveraging the VMware stack (vCAC + vCOPs + vCenter).
HDFS Storage Integration: The storage team currently manages more than 10 PB of data on Isilon. This layer is presented, via HDFS, to multiple product teams as a single view.
Service Blueprints in vCAC: Offering multiple blueprints for various cluster types and sizes within vCAC. These blueprints are presented to the Service Catalog and our internal self-provisioning portal.
Contributed back to Hadoop community
BDE Components Supplemental
1. © 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Hadoop-as-a-Service for Lifecycle Management Simplicity
Chris Mutchler | Adobe Compute Platform Engineer | @chrismutchler
Andrew Nelson | VMware Staff Systems Engineer | @vmwnelson | virtual-hiking.blogspot.com
Operational Approach to Virtualizing Hadoop
Why even bother?
These are the four reasons why I wanted to tackle this
» Excited about the idea of developing an
internal Platform-as-a-Service offering.
» Solves a common “shadow IT” problem in
infrastructure organizations and saves $$$.
» Adobe is a Big Data company, so it makes
sense for us to have a Hadoop offering.
» It’s bleeding edge. Innovate quickly.
Benefits of Virtualizing Hadoop
Private Cloud Resource Pools
VMware vCloud Automation Center
VMware Big Data Extensions
Two unique use-cases
Production A Production X
Service A… …Service X
Environment: Multiple teams with
zero Hadoop experience with a desire
to investigate Hadoop.
Production Environment: Digital
Marketing products looking to take
advantage of existing data managed
3rd Party Integrated Deployment
What are the two questions ops teams must answer?
» Where is my data?
» How do I access it?
Local Storage | Shared Storage
• Integration with Adobe DMBU Private Cloud
• HDFS Storage Integration
• Service Blueprints in vCAC
Data Layer – Hadoop on Isilon
Elastic Virtual Compute Layer
VMware vSphere BDE and Hadoop Resources
• VMware vSphere BDE web site
• Virtualized Hadoop Performance with VMware vSphere 5.1
• Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5
• Hadoop Virtualization Extensions (HVE)
• Apache Hadoop High Availability Solution on VMware vSphere 5.1
• Hadoop-as-a-Service workflows for vCloud Automation Center
• Project Serengeti website
Hadoop Virtualization Extensions (HVE)
» Topology Extensions:
• Enable Hadoop to recognize additional virtualization layer for read/write/balancing for proper
• Enable compute/data node separation without losing locality
» Elasticity Extensions:
• Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to
• Enables runtime elasticity of Hadoop nodes
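The topology extensions above can be illustrated with a simplified replica-placement sketch. It is not the actual HVE placement policy: it only shows the idea that, once Hadoop knows about node groups (VMs sharing one physical host), it can avoid putting two copies of a block in the same node group. The `Node` type, the `choose_targets` function, and the assumption that node group names are unique across racks are all hypothetical.

```python
from collections import namedtuple

# Hypothetical node record: node group names are assumed unique cluster-wide.
Node = namedtuple("Node", ["name", "rack", "node_group"])


def choose_targets(nodes, writer, count=3):
    """Pick replica targets so that no two replicas share a node group.

    Simplified illustration, not the real HVE policy: the first replica
    stays on the writer, then off-rack candidates are preferred so a
    single rack failure cannot lose every copy.
    """
    chosen = [writer]
    used_groups = {writer.node_group}
    # sorted() is stable; key False (off-rack) sorts before True (same rack).
    candidates = sorted(
        (n for n in nodes if n != writer),
        key=lambda n: n.rack == writer.rack,
    )
    for node in candidates:
        if len(chosen) == count:
            break
        if node.node_group in used_groups:
            continue  # same physical host as an existing replica
        chosen.append(node)
        used_groups.add(node.node_group)
    return chosen


cluster = [
    Node("N1", "R1", "NG1"),
    Node("N2", "R1", "NG1"),
    Node("N3", "R1", "NG2"),
    Node("N4", "R2", "NG3"),
    Node("N5", "R2", "NG3"),
]
targets = choose_targets(cluster, cluster[0])
```

Without the node group layer, N1 and N2 would look like independent nodes on rack R1 even though, in this example, they share one host.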
HVE Adds a New Layer in Hadoop Network Topology
• D = data center
• R = rack
• NG = node group
• N = node
R1 R2 R3 R4
NG1 NG2 NG3 NG4 NG5 NG6 NG7 NG8
N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13
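The extra node group layer can be sketched as one more level in the topology path. The example below assumes paths of the form `/data center/rack/node group/node` and computes distance as the number of hops to the closest common ancestor from each side; the path strings and the `distance` helper are illustrative assumptions, not Hadoop's own API.

```python
def split_path(path):
    """Split a topology path such as '/D1/R1/NG1/N1' into its levels."""
    return [part for part in path.split("/") if part]


def distance(a, b):
    """Hops from each node up to their closest common ancestor, summed."""
    pa, pb = split_path(a), split_path(b)
    if len(pa) != len(pb):
        raise ValueError("paths must have the same number of levels")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


same_node = distance("/D1/R1/NG1/N1", "/D1/R1/NG1/N1")
same_group = distance("/D1/R1/NG1/N1", "/D1/R1/NG1/N2")
same_rack = distance("/D1/R1/NG1/N1", "/D1/R1/NG2/N3")
other_rack = distance("/D1/R1/NG1/N1", "/D1/R2/NG3/N4")
```

Under these assumptions, two nodes in the same node group are closer to each other than two nodes that only share a rack, which is what lets reads and balancing prefer VMs on the same physical host.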
vSphere Big Data Extensions Architecture
vCloud Automation Center
Big Data Extensions
Stats and VM configuration
Virtual Hadoop Manager (VHM)
state and stats
Resource Management Module