Hadoop-as-a-Service for Lifecycle Management Simplicity
  • Introduction and agenda; Ops benefits; Tech benefits; Architecture; Use cases; Demo video; Hybrid data model; Current directions; Q&A; Supplementals
  • Adobe is a Big Data company. Adopting a virtualized approach to Hadoop has both business and technical justification for Adobe and allows competitive differentiation. Analytics is a core competency of DMBU.
  • Rapid provisioning: Much of the cluster deployment process can be automated using existing tools.
    High availability: HA protection can be provided through the virtualization platform to protect the single points of failure in the Hadoop system.
    Elasticity: Hadoop capacity can be scaled up and down on demand in a virtual environment.
    Multi-tenancy: Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation.
    Operational Simplicity; Rapid Deployment; Self-service tools; Performance; Maximize Resource Utilization; True multi-tenancy; Elastic scaling; Avoid dedicated hardware; VM-based isolation; Increase resource utilization; Architect Scalable Platform; Deployment choice; Maintain management flexibility at scale; Control Costs; Leverage toolsets; Security
  • Expecting a lot of questions on this one, and it falls halfway through, so it is a good time for an intermediate Q&A if Chris wants to discuss some of the physical design. We can defer questions on use cases and workflows since those follow immediately.
  • Prod and dev review
  • Video walkthrough of vCAC deployment and auto-discovery via Cloudera Manager
  • Hybrid storage model to get the best of both worlds, or for flexibility
    Master nodes: NameNode, JobTracker on shared storage; leverage vSphere vMotion, HA and FT
    Slave nodes: TaskTracker, DataNode on local storage; lower cost, scalable bandwidth
  • Identify the acronyms DMBU and vCAC first.
    Integration with Adobe DMBU Private Cloud: IaaS environment leveraging the VMware stack (vCAC + vCOPs + vCenter).
    HDFS Storage Integration: The storage team currently manages >10PB of data on Isilon. This layer is presented, via HDFS, to multiple product teams from a single view.
    Service Blueprints in vCAC: Multiple blueprints are offered for various cluster types and sizes within vCAC. These blueprints are presented to the Service Catalog and our internal self-provisioning portal.
  • Q&A slide
  • Supplementary links
  • Contributed back to Hadoop community
  • HVE Supplemental
  • BDE Components Supplemental
  • BDE supplementary
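
The hybrid storage model in the notes above (NameNode and JobTracker on shared storage, TaskTracker and DataNode on local storage) can be written down as a small layout check. This is only a sketch under stated assumptions: the field names (`node_groups`, `roles`, `storage_type`, `ha`) and the helper `check_hybrid_layout` are hypothetical and are not taken from the presentation or from any VMware or Cloudera schema.

```python
# Hypothetical cluster layout for the hybrid storage model described in the notes.
# All field names are illustrative assumptions, not a real product schema.
MASTER_ROLES = {"namenode", "jobtracker"}
WORKER_ROLES = {"datanode", "tasktracker"}

cluster_spec = {
    "node_groups": [
        # Master roles on shared storage so vMotion, HA and FT can protect them.
        {"name": "master", "roles": ["namenode", "jobtracker"],
         "storage_type": "shared", "ha": True},
        # Worker roles on local disks for lower cost and scalable bandwidth.
        {"name": "worker", "roles": ["datanode", "tasktracker"],
         "storage_type": "local", "ha": False},
    ]
}


def check_hybrid_layout(spec):
    """Return a list of human-readable violations of the hybrid storage rule."""
    problems = []
    for group in spec["node_groups"]:
        roles = set(group["roles"])
        if roles & MASTER_ROLES and group["storage_type"] != "shared":
            problems.append(f"{group['name']}: master roles should sit on shared storage")
        if roles & WORKER_ROLES and group["storage_type"] != "local":
            problems.append(f"{group['name']}: worker roles should sit on local storage")
    return problems


print(check_hybrid_layout(cluster_spec))
```

A layout that mixes master and worker roles in one group is flagged either way, which matches the intent of keeping the two tiers on different storage.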

Hadoop-as-a-Service for Lifecycle Management Simplicity Presentation Transcript

  • 1. Hadoop-as-a-Service for Lifecycle Management Simplicity
    Chris Mutchler | Adobe Compute Platform Engineer | @chrismutchler
    Andrew Nelson | VMware Staff Systems Engineer | @vmwnelson | virtual-hiking.blogspot.com
    © 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  • 2. Operational Approach to Virtualizing Hadoop
    Why even bother? These are the four reasons I wanted to tackle this problem:
    » Excited about the idea of developing an internal Platform-as-a-Service offering.
    » Solves a common “shadow IT” problem in infrastructure organizations and saves $$$.
    » Adobe is a Big Data company, so it makes sense for us to have a Hadoop offering.
    » It’s bleeding edge. Innovate quickly and scale.
  • 3. Benefits of Virtualizing Hadoop
  • 4. Reference Architecture
    Diagram labels: ESXi; Tier-0 Flash; OS/App (three VMs); Private Cloud Resource Pools; VMware vCenter; VMware vCloud Automation Center; VMware Big Data Extensions
  • 5. Adobe Use-Cases: two unique use-cases
    Pre-Production Environment: Multiple teams with zero Hadoop experience with a desire to investigate Hadoop.
    Production Environment: Digital Marketing products looking to take advantage of existing data managed by Ops.
    Diagram labels: Experimentation; Test/Dev; Test; Production; Production A; Production X; Service A… …Service X; Engineering
  • 6. 3rd Party Integrated Deployment
  • 7. Big Questions
    What are the two questions ops teams must answer?
    » Where is my data?
    » How do I access it?
    Diagram labels: Local Storage; Shared Storage
  • 8. Solution
    • Integration with Adobe DMBU Private Cloud
    • HDFS Storage Integration
    • Service Blueprints in vCAC
    Diagram labels: Data Layer – Hadoop on Isilon; Elastic Virtual Compute Layer
  • 9. Q&A
  • 10. VMware vSphere BDE and Hadoop Resources
    VMware vSphere BDE web site: http://www.vmware.com/bde
    Virtualized Hadoop Performance with VMware vSphere 5.1: http://www.vmware.com/resources/techresources/10220
    Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5: http://vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf
    Hadoop Virtualization Extensions (HVE): http://www.vmware.com/files/pdf/Hadoop-Virtualization-Extensions-on-VMware-vSphere-5.pdf
    Apache Hadoop High Availability Solution on VMware vSphere 5.1: http://vmware.com/files/pdf/Apache-Hadoop-VMware-HA-solution.pdf
    Hadoop-as-a-Service workflows for vCloud Automation Center: https://solutionexchange.vmware.com/store/products/hadoop-as-a-service-vmware-vcloud-automation-center-and-big-data-extension
    Project Serengeti website: http://www.projectserengeti.org and https://github.com/vmware-serengeti
  • 11. Hadoop Virtualization Extensions (HVE)
    Topology Extensions:
    • Enable Hadoop to recognize the additional virtualization layer for read/write/balancing, so that replicas are placed properly
    • Enable compute/data node separation without losing locality
    Elasticity Extensions:
    • Ability to dynamically adjust the resources allocated (CPU, memory, map/reduce slots) to compute nodes
    • Enables runtime elasticity of Hadoop nodes
  • 12. HVE Adds a New Layer in Hadoop Network Topology
    • D = data center
    • R = rack
    • NG = node group
    • N = node
    Diagram labels: root (/); D1, D2; R1 to R4; NG1 to NG8; N1 to N13
  • 13. vSphere Big Data Extensions Architecture
    Diagram labels: vCloud Automation Center; Big Data Extensions; vCenter Operations Manager
  • 14. Diagram labels: Virtual Hadoop Manager (VHM); Algorithms; Cluster Configuration; Resource Management Module; Serengeti; Serengeti Configuration; Job Tracker; Task Tracker (three instances); vCenter Server; vCenter DB; State, stats (Slots used, Pending work); Commands (Decommission, Recommission); Stats and VM configuration; Manual/Auto Power on/off; VC state and stats; Hadoop state and stats; VC actions; Hadoop actions
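
The extra node-group layer described in slides 11 and 12 can be illustrated with a small distance calculation over topology paths. The sketch below assumes paths of the form /data center/rack/node group/node, following the D, R, NG and N legend on slide 12. The specific placement of nodes under racks and node groups is invented for illustration (the transcript does not preserve it), and `topology_distance` is a hypothetical helper rather than Hadoop's own implementation.

```python
# Minimal sketch: tree distance between two nodes in a four-level topology
# (/data center/rack/node group/node). Node placements below are assumptions.

def topology_distance(path_a, path_b):
    """Count hops from each node up to their closest common ancestor."""
    parts_a = path_a.strip("/").split("/")
    parts_b = path_b.strip("/").split("/")
    if len(parts_a) != len(parts_b):
        raise ValueError("paths must have the same number of levels")
    common = 0
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        common += 1
    return (len(parts_a) - common) + (len(parts_b) - common)


reader = "/D1/R1/NG1/N1"
candidates = {
    "same node":        "/D1/R1/NG1/N1",
    "same node group":  "/D1/R1/NG1/N2",   # typically VMs on one physical host
    "same rack":        "/D1/R1/NG2/N3",
    "same data center": "/D1/R2/NG3/N5",
    "other data center": "/D2/R3/NG5/N8",
}

for label, path in candidates.items():
    print(label, topology_distance(reader, path))
```

Without the node-group level, two VMs on the same host and two VMs on different hosts in one rack would look equally close; the extra level lets a scheduler or replica placer tell them apart.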