Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Hadoop has made it into the enterprise mainstream as a Big Data technology. But what about Hadoop as a private or public cloud service on a shared infrastructure? This session looks at a Hadoop solution with virtualization, shared storage, and multi-tenancy, and discusses how service providers can use Pivotal Hadoop Distribution, Isilon, and Serengeti to offer Hadoop-as-a-Service.


After this session you will be able to:
Objective 1: Understand Hadoop and its deployment challenges.
Objective 2: Understand the EMC HDaaS solution architecture and the use cases it addresses.
Objective 3: Understand Pivotal Hadoop Distribution, Serengeti and Isilon's Hadoop features.


Transcript

  • 1. Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, And EMC Isilon. Bernd Kaponig, EMC Solutions Group. © Copyright 2013 EMC Corporation. All rights reserved.
  • 2. Roadmap Information Disclaimer
    • EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
    • Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
    • Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
  • 3. Goal Of This Session
    • Demonstrate How Greenplum/Pivotal HD, Project Serengeti And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context
  • 4. What Is Hadoop-As-A-Service?
    [Diagram labels: Tenant, Data Scientist; Analytics-as-a-Service, Hadoop-as-a-Service, Infrastructure-as-a-Service; Self-Service Portal, Tenant/User Management, Metering, Provisioning; Service Provider]
  • 5. How “Classic” Hadoop Works
    [Diagram: an HDFS client; steps 1: Create file, 2: Write, 3: Replicate; one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each), all on physical hardware]
  • 6. How “Classic” Hadoop Works
    [Diagram: a MapReduce application; steps 1: Submit job, 2: Check for tasks, 3: Retrieve task resources; one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each), all on physical hardware]
  • 7. How “Classic” Hadoop Works
    • Physical Hardware Is Dedicated To Node
    • Each Node Works With Local Storage
    • Physical Network Topology
    [Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each), all on physical hardware]
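The write path labelled on slide 5 (create file, write, replicate) can be illustrated with a toy model. This is a minimal sketch, not Hadoop code: the class names, the round-robin placement, and the whole-file "block" are simplifying assumptions made for illustration, and the replication factor of 3 is the stock HDFS default rather than a value stated on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """A worker that keeps block data on its own local storage."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """The master: records where each file lives, but stores no file data."""
    datanodes: list
    replication: int = 3  # assumption: stock HDFS default
    files: dict = field(default_factory=dict)

    def create_file(self, path):
        # Step 1 (Create file): the client asks the NameNode, which picks
        # the DataNodes that will hold the data (round-robin here, which
        # is a simplification of real placement policies).
        if len(self.datanodes) < self.replication:
            raise ValueError("not enough DataNodes for the replication factor")
        start = len(self.files) % len(self.datanodes)
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self.files[path] = [dn.name for dn in targets]
        return targets


def write_file(namenode, path, data):
    """Client-side write: create, write to one worker, then replicate."""
    targets = namenode.create_file(path)
    first, rest = targets[0], targets[1:]
    # Step 2 (Write): the client sends the data to the first DataNode.
    first.blocks[path] = data
    # Step 3 (Replicate): the data is copied on to the remaining DataNodes.
    for dn in rest:
        dn.blocks[path] = first.blocks[path]
    return namenode.files[path]


workers = [DataNode(f"worker{i}") for i in range(1, 4)]
namenode = NameNode(workers)
locations = write_file(namenode, "/logs/a.txt", b"hello")
print(locations)
```

Because every copy lands on a worker's local disk, storage and compute can only grow together in this model, which is the coupling the later shared-storage slides set out to remove.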
  • 8. Pivotal HD Architecture
    [Architecture diagram, Pivotal HD Enterprise. Component labels: HDFS, Map Reduce, Yarn, Zookeeper, HBase, Pig, Hive, Mahout, Sqoop, Flume, Resource Management & Workflow, Command Center, Hadoop Virtualization (HVE), DataLoader. Function labels: Configure, Deploy, Monitor, Manage. Legend: Apache; Pivotal HD Added Value]
  • 9. “Classic” Hadoop Challenges
    • Hard To Deploy And Operate
    • Poor Utilization Of Storage And/Or CPU
    • Inefficient Data Staging And Loading Processes
    • Lack Of Multi-Tenancy
    • Backup And Disaster Recovery Missing
    • Cluster Sprawl
  • 10. The Road To Hadoop-As-A-Service
    • Physical
    • Virtual
    • Dedicated
    • Shared, Elastic Compute
    • Shared, Elastic Storage
    • Multi-Tenant
    • Single Tenant
    • Multi-App
    • As-A-Service
    [Diagram labels: Tenant/User Management, Self-Service Portal, Metering, Provisioning]
  • 11. Virtualized Hadoop With Local Storage
    [Diagram: virtual infrastructure with one master VM + VMDK (JobTracker, NameNode) and three worker VMs + VMDK (TaskTracker, DataNode each); physical hardware of four servers + DAS]
  • 12. Virtualized Hadoop With Local Storage
    • Unified Operations
    • Shared Resources = Higher Utilization
    • Elastic Resources = Faster Provisioning
    • 5-10x Better CPU Utilization!
    [Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each), each on a server + DAS]
  • 13. Hadoop Runs Well Virtualized
    [Chart: elapsed time in seconds (lower is better), Native vs. 1 VM, for TeraGen, TeraSort and TeraValidate]
    Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf
  • 14. Project Serengeti
    • Deploy Hadoop Cluster In 10 Minutes
    • Customize Hadoop Cluster
    • One-Stop Command Center
    • Open Source Project Backed By VMware, Launched In June 2012
  • 15. Virtualized Hadoop With Shared Storage
    [Diagram: virtual infrastructure with one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each); physical hardware of four servers + DAS]
  • 16. Virtualized Hadoop With Shared Storage
    [Diagram: virtual infrastructure with one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each); physical hardware of two servers and two Isilon nodes, with a NameNode label on the physical layer]
  • 17. Virtualized Hadoop With Isilon
    • Efficient Data Loading
    • No SPOF
    • End-To-End Data Protection
    • Leading Storage Efficiency
    • Native HDFS Support (Plus NFS, CIFS etc.)
    • Independent Scaling
    • Multi-App Scale-Out Storage Platform
    • Replication Overhead Only 20% Rather Than 200%!
    [Diagram: master (JobTracker) and workers (TaskTracker) on servers; NameNode and DataNode on Isilon]
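The "20% rather than 200%" callout on slide 17 is easier to weigh as raw capacity. The sketch below is a worked calculation only: the 100 TB usable figure is an arbitrary assumption, the 200% case is derived from the stock HDFS default of three copies per block (two extra copies), and the 20% figure is taken from the slide as given.

```python
def replication_overhead_pct(copies):
    """Overhead of keeping `copies` full copies, as a percentage of the data."""
    return (copies - 1) * 100


def raw_capacity_tb(usable_tb, overhead_pct):
    """Raw capacity needed to hold `usable_tb` of data at a given overhead."""
    return usable_tb * (100 + overhead_pct) / 100


usable_tb = 100  # assumption: example data set size

# Three full copies of every block: 2 extra copies = 200% overhead.
hdfs_overhead = replication_overhead_pct(3)
hdfs_raw = raw_capacity_tb(usable_tb, hdfs_overhead)

# The slide's figure for Isilon: 20% overhead.
isilon_raw = raw_capacity_tb(usable_tb, 20)

print(f"3x replication: {hdfs_raw:.0f} TB raw for {usable_tb} TB of data")
print(f"20% overhead:   {isilon_raw:.0f} TB raw for {usable_tb} TB of data")
```

Under these assumptions the same 100 TB of data needs 300 TB of raw disk with three-way replication and 120 TB at 20% overhead.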
  • 18. Hadoop With Software-Defined Storage
    [Diagram: virtual infrastructure with one master (JobTracker), two workers (TaskTracker each) and an Isilon VM (NameNode, DataNode); physical hardware of two servers and two units labelled “Any NAS”]
  • 19. Making It As-A-Service
    [Diagram labels. Service functions: Self-Service, Workflows, Metering, User Mgmt, Tenant Mgmt. Components: Portal, WaveMaker, HD LCM, Serengeti, vCenter O & CB, Postgres, HD Cmd Center, vCenter, Infrastr. Mgmt. Hadoop layer: JobTracker, TaskTrackers, NameNodes, DataNodes]
  • 20. HDaaS Solution Component Interaction
    [Diagram steps: 1: AAA, 2: Invoke, 3: Provision, 4: Instantiate]
    [Diagram labels: Data Scientist (Analyze, Manage); Portal UI; WaveMaker; Serengeti Client API; HDaaS Workflows; vCenter Orchestrator; Serengeti Server; Serengeti Agent; Pivotal HD Master; Pivotal HD Worker; Isilon REST API; Isilon; vCenter & ChargeBack; vC & CB APIs; User/Tenant Mgmt; Postgres; tiers Platinum, Gold, Silver, Bronze]
  • 21. Tenant Isolation On Isilon
    • One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist
    • Access Controlled By Group And User Rights
    • Leverage SmartQuotas To Set Resource Limits And Report Usage
    • Separate Subnets For Tenants, Load-Balanced With SmartConnect
    [Diagram: directory tree rooted at /ifs/HDFS with labels /tenant1, /ds1, /tenant2, /ds2]
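The directory convention on slide 21 (one directory per tenant under /ifs/HDFS, one subdirectory per data scientist) can be sketched as a small path helper. This is an illustration only: the function names are hypothetical, the check is purely lexical, and nothing here calls OneFS, SmartQuotas, or SmartConnect; actual access control would still rest on the group and user rights the slide mentions.

```python
from pathlib import PurePosixPath

# Root taken from the slide; everything below it is per tenant.
HDFS_ROOT = PurePosixPath("/ifs/HDFS")


def _safe(name):
    """Reject names that could escape the intended directory level."""
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid directory name: {name!r}")
    return name


def tenant_dir(tenant):
    """One directory per tenant, e.g. /ifs/HDFS/tenant1."""
    return HDFS_ROOT / _safe(tenant)


def scientist_dir(tenant, scientist):
    """One subdirectory per data scientist inside the tenant directory."""
    return tenant_dir(tenant) / _safe(scientist)


def belongs_to(path, tenant):
    """Lexical check that `path` is the tenant directory or lies inside it."""
    path = PurePosixPath(path)
    root = tenant_dir(tenant)
    return path == root or root in path.parents


home = scientist_dir("tenant1", "ds1")
print(home)
print(belongs_to(home, "tenant1"), belongs_to(home, "tenant2"))
```

Keeping the tenant as the first path component under the root means a per-tenant quota or permission set on that one directory covers every data scientist beneath it.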
  • 22. Demo
  • 23. Summary
    • HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!
    • Compute
      – Pivotal HD Brings Features Like Virtualization Support To Hadoop
      – Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
    • Storage
      – Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS
  • 24. What’s Next? HAWQ
    [Architecture diagram. HAWQ – Advanced Database Services: ANSI SQL + Analytics, Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining. Pivotal HD Enterprise component labels: HDFS, Map Reduce, Yarn, Zookeeper, HBase, Pig, Hive, Mahout, Sqoop, Flume, Resource Management & Workflow, Command Center, Hadoop Virtualization (HVE), DataLoader. Function labels: Configure, Deploy, Monitor, Manage. Legend: Apache; Pivotal HD Added Value]
  • 25. Resources
    • HDaaS Solution Collateral
      – White Paper, Presentations, Demos
      – http://powerlink.emc.com
    • EMC Solution Pavilion
    • Related Sessions
      – Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
      – Virtualize Big Data to Make the Elephant Dance
      – Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight