Hadoop-as-a-Service for Lifecycle Management Simplicity

  • Introduction and agenda
    Ops benefits
    Tech benefits
    Architecture
    Use cases
    Demo video
    Hybrid data model
    Current directions
    Q&A
    Supplementals
  • Adobe is a Big Data company.
    Virtualizing Hadoop at Adobe has both business and technical justifications and allows competitive differentiation.
    Analytics is a core competency of DMBU.

  • Rapid provisioning: Much of the cluster deployment process can be automated using existing tools.
    High availability: HA protection can be provided through the virtualization platform to protect the single points of failure in the Hadoop system.
    Elasticity: Hadoop capacity can be scaled up and down on demand in a virtual environment.
    Multi-tenancy: Different tenants running Hadoop can be isolated in separate VMs, providing stronger VM-grade resource and security isolation.

    Operational Simplicity
    Rapid Deployment
    Self service tools
    Performance
    Maximize Resource Utilization
    True multi-tenancy
    Elastic scaling
    Avoid dedicated hardware
    VM-based isolation
    Increase resource utilization
    Architect Scalable Platform
    Deployment choice
    Maintain management flexibility at scale
    Control Costs
    Leverage toolsets
    Security

  • Expecting a lot of questions on this one, and it is the halfway point, so this is a good time for an intermediate Q&A if Chris wants to discuss some of the physical design. We can defer questions on use cases and workflows since those follow immediately.
  • Prod and dev review
  • Video walkthrough of vCAC deployment and auto-discovery via Cloudera Manager
  • Hybrid storage model to get the best of both worlds
    Or for flexibility
    Master nodes:
    NameNode, JobTracker on shared storage
    Leverage vSphere vMotion, HA and FT
    Slave nodes:
    TaskTracker, DataNode on local storage
    Lower cost, scalable bandwidth
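    The hybrid split above can be read as a simple role-to-storage mapping. The following is a minimal sketch of that reading only; the function name `storage_for` and the tier labels "shared" and "local" are hypothetical names chosen for illustration, not part of any VMware or Hadoop API.

```python
# Role-to-storage mapping as described in the slide notes:
# master roles on shared storage, slave roles on local storage.
MASTER_ROLES = {"NameNode", "JobTracker"}
SLAVE_ROLES = {"TaskTracker", "DataNode"}


def storage_for(role: str) -> str:
    """Return the storage tier (hypothetical labels) for a Hadoop role."""
    if role in MASTER_ROLES:
        # Shared storage lets vSphere vMotion, HA and FT protect these VMs.
        return "shared"
    if role in SLAVE_ROLES:
        # Local disks: lower cost, bandwidth scales with the node count.
        return "local"
    raise ValueError(f"unknown Hadoop role: {role}")


if __name__ == "__main__":
    for role in ("NameNode", "JobTracker", "TaskTracker", "DataNode"):
        print(f"{role}: {storage_for(role)} storage")
```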
  • Identify the acronyms DMBU and vCAC first.
    Integration with Adobe DMBU Private Cloud: IaaS environment leveraging the VMware stack (vCAC + vCOPs + vCenter).
    HDFS Storage Integration: The storage team currently manages more than 10 PB of data on Isilon and presents this layer, via HDFS, to multiple product teams as a single view.
    Service Blueprints in vCAC: Offering multiple blueprints for various cluster types and sizes within vCAC. These blueprints are published to the Service Catalog and our internal self-provisioning portal.
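    One way to picture "multiple blueprints for various cluster types and sizes" is as a small catalog that a self-service portal filters by type. This is an illustrative sketch only: the blueprint names, cluster types and node counts below are invented placeholders, and nothing here reflects the actual vCAC blueprint format or Adobe's real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterBlueprint:
    name: str           # hypothetical catalog entry name
    cluster_type: str   # hypothetical type label
    worker_nodes: int   # hypothetical size


# Placeholder catalog entries, invented for illustration.
CATALOG = [
    ClusterBlueprint("experiment-small", "experimentation", 3),
    ClusterBlueprint("experiment-medium", "experimentation", 6),
    ClusterBlueprint("production-large", "production", 20),
]


def blueprints_for(cluster_type: str) -> list[ClusterBlueprint]:
    """Return the catalog entries offered for one cluster type."""
    return [b for b in CATALOG if b.cluster_type == cluster_type]


if __name__ == "__main__":
    for bp in blueprints_for("experimentation"):
        print(bp.name, bp.worker_nodes)
```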
  • Q&A slide
  • Supplementary links
  • Contributed back to Hadoop community
  • HVE Supplemental
  • BDE Components Supplemental
  • BDE supplementary

Transcript

  • 1. © 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
    Hadoop-as-a-Service for Lifecycle Management Simplicity
    Chris Mutchler | Adobe Compute Platform Engineer | @chrismutchler
    Andrew Nelson | VMware Staff Systems Engineer | @vmwnelson | virtual-hiking.blogspot.com
  • 2. Operational Approach to Virtualizing Hadoop
    Why even bother? These are the four reasons why I wanted to tackle this problem.
    » Excited about the idea of developing an internal Platform-as-a-Service offering.
    » Solves a common “shadow IT” problem in infrastructure organizations and saves $$$.
    » Adobe is a Big Data company, so it makes sense for us to have a Hadoop offering.
    » It’s bleeding edge. Innovate quickly and scale.
  • 3. Benefits of Virtualizing Hadoop
  • 4. Reference Architecture
    [Diagram labels: ESXi; Tier-0 Flash; OS/App virtual machines; Private Cloud Resource Pools; VMware vCenter; VMware vCloud Automation Center; VMware Big Data Extensions]
  • 5. Adobe Use-Cases
    Two unique use-cases
    [Diagram labels: Experimentation; Production A; Production X; Test/Dev; Production; Test; Service A… …Service X]
    Engineering Pre-Production Environment: Multiple teams with zero Hadoop experience with a desire to investigate Hadoop.
    Production Environment: Digital Marketing products looking to take advantage of existing data managed by Ops.
  • 6. 3rd Party Integrated Deployment
  • 7. Big Questions
    What are the two questions ops teams must answer?
    » Where is my data?
    » How do I access it?
    [Diagram labels: Local Storage; Shared Storage]
  • 8. Solution
    • Integration with Adobe DMBU Private Cloud
    • HDFS Storage Integration
    • Service Blueprints in vCAC
    Data Layer – Hadoop on Isilon
    Elastic Virtual Compute Layer
  • 9. Q&A
  • 10. VMware vSphere BDE and Hadoop Resources
    • VMware vSphere BDE web site: http://www.vmware.com/bde
    • Virtualized Hadoop Performance with VMware vSphere 5.1: http://www.vmware.com/resources/techresources/10220
    • Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5: http://vmware.com/files/pdf/VMW-Hadoop-Performance-vSphere5.pdf
    • Hadoop Virtualization Extensions (HVE): http://www.vmware.com/files/pdf/Hadoop-Virtualization-Extensions-on-VMware-vSphere-5.pdf
    • Apache Hadoop High Availability Solution on VMware vSphere 5.1: http://vmware.com/files/pdf/Apache-Hadoop-VMware-HA-solution.pdf
    • Hadoop-as-a-Service workflows for vCloud Automation Center: https://solutionexchange.vmware.com/store/products/hadoop-as-a-service-vmware-vcloud-automation-center-and-big-data-extension
    • Project Serengeti website: http://www.projectserengeti.org and https://github.com/vmware-serengeti
  • 11. Hadoop Virtualization Extensions
    Topology Extensions:
    • Enable Hadoop to recognize the additional virtualization layer for read/write/balancing for proper replica placement
    • Enable compute/data node separation without losing locality
    Elasticity Extensions:
    • Ability to dynamically adjust resources allocated (CPU, memory, map/reduce slots) to compute nodes
    • Enables runtime elasticity of Hadoop nodes
  • 12. HVE Adds a New Layer in Hadoop Network Topology
    • D = data center
    • R = rack
    • NG = node group
    • N = node
    [Diagram: root /; data centers D1, D2; racks R1 to R4; node groups NG1 to NG8; nodes N1 to N13]
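    To make the extra layer concrete, here is a minimal sketch of how a hop-count distance could be computed over four-level paths of the form /data-center/rack/node-group/node. It assumes the usual Hadoop convention that distance is the number of hops from each node up to their closest common ancestor. The function `distance` and the sample paths are hypothetical and written for illustration; this is not the HVE implementation, and the slide does not say which nodes sit in which node group.

```python
def distance(path_a: str, path_b: str) -> int:
    """Hop count between two nodes via their closest common ancestor.

    Paths are assumed to look like /d1/r1/ng1/n1 (hypothetical naming).
    """
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    if len(a) != len(b):
        raise ValueError("paths must have the same depth")
    # Count leading path components the two nodes share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Each node climbs (depth - common) levels to reach the shared ancestor.
    return 2 * (len(a) - common)


if __name__ == "__main__":
    print(distance("/d1/r1/ng1/n1", "/d1/r1/ng1/n2"))  # same node group
    print(distance("/d1/r1/ng1/n1", "/d1/r1/ng2/n3"))  # same rack
    print(distance("/d1/r1/ng1/n1", "/d2/r3/ng5/n9"))  # different data center
```

    Under this reading, two VMs in the same node group (typically the same physical host) are closer than two VMs that merely share a rack, which is what lets replica placement avoid putting every copy on one host.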
  • 13. vSphere Big Data Extensions Architecture
    [Diagram labels: vCloud Automation Center; Big Data Extensions; vCenter Operations Manager]
  • 14. [Diagram: Virtual Hadoop Manager (VHM)]
    [Diagram labels: State, stats (Slots used, Pending work); Commands (Decommission, Recommission); Stats and VM configuration; Serengeti; Job Tracker; vCenter DB; Manual/Auto; Power on/off; Task Tracker (x3); vCenter Server; Serengeti Configuration; VC state and stats; Hadoop state and stats; VC actions; Hadoop actions; Algorithms; Cluster Configuration; Resource Management Module]
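    The diagram labels suggest a control loop: the manager reads Hadoop stats (slots used, pending work) and issues Decommission or Recommission commands. The sketch below shows one possible decision rule of that shape. It is an assumption made for illustration: the function `plan_action`, its parameters and the 25% low watermark are all hypothetical, and the slide does not describe the algorithms VHM actually uses.

```python
def plan_action(slots_used: int, slots_total: int, pending_tasks: int,
                powered_off_trackers: int, low_watermark: float = 0.25) -> str:
    """Pick a hypothetical elasticity action from Job Tracker stats."""
    # Work is queued, every slot is busy, and spare Task Trackers exist:
    # bring one back into the cluster.
    if pending_tasks > 0 and slots_used >= slots_total and powered_off_trackers > 0:
        return "recommission"
    # Nothing is queued and utilization is below the watermark:
    # release a Task Tracker so its resources can be reclaimed.
    if pending_tasks == 0 and slots_total > 0 and slots_used / slots_total < low_watermark:
        return "decommission"
    return "hold"


if __name__ == "__main__":
    print(plan_action(slots_used=8, slots_total=8, pending_tasks=5, powered_off_trackers=2))
    print(plan_action(slots_used=1, slots_total=8, pending_tasks=0, powered_off_trackers=0))
```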