Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Hadoop has made it into the enterprise mainstream as a Big Data technology. But what about Hadoop as a private or public cloud service on shared infrastructure? This session looks at a Hadoop solution with virtualization, shared storage, and multi-tenancy, and discusses how service providers can use Pivotal Hadoop Distribution, Isilon, and Serengeti to offer Hadoop-as-a-Service.


After this session you will be able to:
Objective 1: Understand Hadoop and its deployment challenges.
Objective 2: Understand the EMC HDaaS solution architecture and the use cases it addresses.
Objective 3: Understand Pivotal Hadoop Distribution, Serengeti, and Isilon's Hadoop features.

Presentation Transcript

  • Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, And EMC Isilon Bernd Kaponig EMC Solutions Group © Copyright 2013 EMC Corporation. All rights reserved. 1
  • Roadmap Information Disclaimer
    – EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
    – Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
    – Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
  • Goal Of This Session
    – Demonstrate How Greenplum/Pivotal HD, Project Serengeti, And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context
  • What Is Hadoop-As-A-Service?
    – Service Layers: Analytics-as-a-Service, Hadoop-as-a-Service, Infrastructure-as-a-Service
    – Consumers: Tenants And Their Data Scientists
    – Service Provider Functions: Tenant/User Management, Self-Service Portal, Metering, Provisioning
  • How “Classic” Hadoop Works
    – Master: JobTracker + NameNode; Workers: TaskTracker + DataNode, Each On Physical Hardware
    – 1: HDFS Client Creates A File (Via The NameNode)
    – 2: Client Writes Data To A DataNode
    – 3: DataNode Replicates The Data To Other DataNodes
  • How “Classic” Hadoop Works
    – 1: MapReduce Application Submits A Job To The JobTracker
    – 2: TaskTrackers Check For Tasks
    – 3: TaskTrackers Retrieve Task Resources
  • How “Classic” Hadoop Works  Physical Hardware Is Dedicated To Node  Each Node Works With Local Storage  Physical Network Topology JOB TRKR NAME NODE Master © Copyright 2013 EMC Corporation. All rights reserved. TASK TRKR DATA NODE Worker TASK TRKR DATA NODE Worker TASK TRKR DATA NODE Physical Hardware Worker 7
  • Pivotal HD Architecture
    – Pivotal HD Enterprise Builds On Apache Components: HDFS, MapReduce, YARN (Resource Management & Workflow), HBase, Pig, Hive, Mahout, Zookeeper, Sqoop, Flume
    – Pivotal HD Added Value: Command Center (Configure, Deploy, Monitor, Manage), Hadoop Virtualization Extensions (HVE), DataLoader
  • “Classic” Hadoop Challenges
    – Hard To Deploy And Operate
    – Poor Utilization Of Storage And/Or CPU
    – Inefficient Data Staging And Loading Processes
    – Lack Of Multi-Tenancy
    – Backup And Disaster Recovery Missing
    – Cluster Sprawl
  • The Road To Hadoop-As-A-Service
    – Physical → Virtual
    – Dedicated → Shared, Elastic Compute And Shared, Elastic Storage
    – Single Tenant → Multi-Tenant, Multi-App
    – As-A-Service: Tenant/User Management, Self-Service Portal, Metering, Provisioning
  • Virtualized Hadoop With Local Storage
    – Virtual Infrastructure: Master VM + VMDK (JobTracker, NameNode); Worker VMs + VMDK (TaskTracker, DataNode)
    – Physical Hardware: Servers With Direct-Attached Storage (DAS)
  • Virtualized Hadoop With Local Storage
    – Master (JobTracker, NameNode) And Workers (TaskTracker, DataNode) Run On Servers + DAS
    – Unified Operations
    – Shared Resources = Higher Utilization
    – Elastic Resources = Faster Provisioning
    – 5-10x Better CPU Utilization!
  • Hadoop Runs Well Virtualized
    – [Chart: Elapsed Time In Seconds (Lower Is Better), Native vs. 1 VM, For TeraGen, TeraSort, And TeraValidate]
    – Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf
  • Project Serengeti
    – Deploy A Hadoop Cluster In 10 Minutes
    – Customize The Hadoop Cluster
    – One-Stop Command Center
    – Open Source Project Backed By VMware, Launched In June 2012
  • Virtualized Hadoop With Shared Storage
    – Virtual Infrastructure: Master (JobTracker, NameNode); Workers (TaskTracker, DataNode)
    – Physical Hardware: Servers + DAS
  • Virtualized Hadoop With Shared Storage
    – Virtual Infrastructure: Master (JobTracker); Workers (TaskTracker)
    – NameNode And DataNode Functions Move To Isilon
    – Physical Hardware: Servers + Isilon
  • Virtualized Hadoop With Isilon
    – Master (JobTracker) And Workers (TaskTracker) Run On Servers; NameNode And DataNode Run On Isilon
    – Efficient Data Loading
    – No Single Point Of Failure (SPOF)
    – End-To-End Data Protection
    – Leading Storage Efficiency: Replication Overhead Only 20% Rather Than 200%!
    – Native HDFS Support (Plus NFS, CIFS, etc.)
    – Independent Scaling Of Compute And Storage
    – Multi-App Scale-Out Storage Platform
  • Hadoop With Software-Defined Storage
    – Virtual Infrastructure: Master (JobTracker), Workers (TaskTracker), And An Isilon VM Providing NameNode And DataNode
    – Physical Hardware: Servers + Any NAS
  • Making It As-A-Service
    – Self-Service Portal: WaveMaker
    – Workflows, Metering, User Management, Tenant Management: vCenter Orchestrator & ChargeBack, Postgres
    – Hadoop Lifecycle Management (HD LCM): Serengeti
    – Hadoop Management: Pivotal HD Command Center
    – Infrastructure Management: vCenter
    – Hadoop Cluster: JobTracker, TaskTrackers, NameNode, DataNode
  • HDaaS Solution Component Interaction
    – Data Scientist Uses The Portal UI (WaveMaker) To Analyze And Manage
    – 1: AAA (Authentication, Authorization, Accounting) Against User/Tenant Management (Postgres)
    – 2: Invoke HDaaS Workflows (vCenter Orchestrator)
    – 3: Provision Via The Serengeti Client API, The Isilon REST API, And The vCenter & ChargeBack APIs
    – 4: Serengeti Server Instantiates Serengeti Agents, The Pivotal HD Master, And Pivotal HD Workers
    – Service Tiers: Platinum, Gold, Silver, Bronze
  • Tenant Isolation On Isilon
    – One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist (e.g. /ifs/HDFS/tenant1/ds1, /ifs/HDFS/tenant2/ds2)
    – Access Controlled By Group And User Rights
    – Leverage SmartQuotas To Set Resource Limits And Report Usage
    – Separate Subnets For Tenants, Load-Balanced With SmartConnect
  • Demo
  • Summary
    – HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!
    – Compute: Pivotal HD Brings Features Like Virtualization Support To Hadoop; Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
    – Storage: Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS
  • What’s Next? HAWQ
    – HAWQ: Advanced Database Services (ANSI SQL + Analytics, Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining) On Top Of Pivotal HD Enterprise
    – Pivotal HD Enterprise: Apache HDFS, MapReduce, YARN, HBase, Pig, Hive, Mahout, Zookeeper, Sqoop, Flume, Plus Command Center, HVE, And DataLoader
  • Resources  HDaaS Solution Collateral – White Paper, Presentations, Demos – http://powerlink.emc.com  EMC Solution Pavillion  Related Sessions – Hadoop for Powerful Processing of Unstructured Data for Valuable Insights – Virtualize Big Data to Make the Elephant Dance – Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight © Copyright 2013 EMC Corporation. All rights reserved. 25