Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, and EMC Isilon

Bernd Kaponig
EMC Solutions Group


Hadoop has made it into the enterprise mainstream as a Big Data technology. But what about Hadoop as a private or public cloud service on a shared infrastructure? This session looks at a Hadoop solution with virtualization, shared storage, and multi-tenancy, and discusses how service providers can use Pivotal Hadoop Distribution, Isilon, and Serengeti to offer Hadoop-as-a-Service.


After this session you will be able to:
Objective 1: Understand Hadoop and its deployment challenges.
Objective 2: Understand the EMC HDaaS solution architecture and the use cases it addresses.
Objective 3: Understand Pivotal Hadoop Distribution, Serengeti and Isilon's Hadoop features.

Published in: Technology


  1. Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, and EMC Isilon. Bernd Kaponig, EMC Solutions Group. © Copyright 2013 EMC Corporation. All rights reserved.
  2. Roadmap Information Disclaimer
     • EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
     • Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
     • Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
  3. Goal Of This Session
     • Demonstrate How Greenplum/Pivotal HD, Project Serengeti And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context
  4. What Is Hadoop-As-A-Service?
     [Diagram: a service provider offering Infrastructure-as-a-Service, Hadoop-as-a-Service, and Analytics-as-a-Service to tenants and their data scientists. Supporting functions shown: Tenant/User Management, Self-Service Portal, Metering, Provisioning.]
  5. How “Classic” Hadoop Works
     [Diagram: an HDFS client and the steps 1: Create file, 2: Write, 3: Replicate. The master runs the JobTracker and NameNode; each of three workers runs a TaskTracker and a DataNode. All nodes run on physical hardware.]
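The three labeled steps on this slide can be sketched as a toy model. This is only an illustration of the create/write/replicate sequence, not Hadoop's actual client API: the class names `ToyNameNode` and `ToyDataNode`, the `write_file` helper, and the simple target-selection rule are all assumptions made for the example. A replication factor of 3 is used because that is the HDFS default.

```python
class ToyDataNode:
    """Stores file contents; forwards each write down the replication pipeline."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, path, data, pipeline):
        # Store locally, then pass the data on to the next node, if any.
        self.blocks[path] = data
        if pipeline:
            pipeline[0].write(path, data, pipeline[1:])


class ToyNameNode:
    """Holds metadata only: which DataNodes are responsible for each file."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}

    def create_file(self, path):
        # Hypothetical placement rule: rotate through the nodes per file.
        start = len(self.block_map) % len(self.datanodes)
        targets = [
            self.datanodes[(start + i) % len(self.datanodes)]
            for i in range(self.replication)
        ]
        self.block_map[path] = targets
        return targets


def write_file(namenode, path, data):
    targets = namenode.create_file(path)        # 1: Create file
    targets[0].write(path, data, targets[1:])   # 2: Write, 3: Replicate
    return [node.name for node in targets]


workers = [ToyDataNode(f"worker{i}") for i in range(1, 4)]
namenode = ToyNameNode(workers)
holders = write_file(namenode, "/data/a.txt", b"hello")
print(holders)
```

The point of the sketch is the division of labor: the NameNode never touches the data, and the client writes once while the DataNodes replicate among themselves.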
  6. How “Classic” Hadoop Works
     [Diagram: a MapReduce application and the steps 1: Submit job, 2: Check for tasks, 3: Retrieve task resources. The master runs the JobTracker and NameNode; each of three workers runs a TaskTracker and a DataNode. All nodes run on physical hardware.]
  7. How “Classic” Hadoop Works
     • Physical Hardware Is Dedicated To Node
     • Each Node Works With Local Storage
     • Physical Network Topology
     [Diagram: master (JobTracker, NameNode) and three workers (TaskTracker, DataNode) on physical hardware.]
  8. Pivotal HD Architecture
     [Diagram: the Pivotal HD Enterprise stack, with a legend distinguishing Apache components from Pivotal HD added value. Labels shown: HDFS; MapReduce; YARN; HBase; Pig, Hive, Mahout; ZooKeeper; Sqoop; Flume; Resource Management & Workflow; Hadoop Virtualization (HVE); Command Center (Configure, Deploy, Monitor, Manage); DataLoader.]
  9. “Classic” Hadoop Challenges
     • Hard To Deploy And Operate
     • Poor Utilization Of Storage And/Or CPU
     • Inefficient Data Staging And Loading Processes
     • Lack Of Multi-Tenancy
     • Backup And Disaster Recovery Missing
     • Cluster Sprawl
  10. The Road To Hadoop-As-A-Service
     Service functions shown: Tenant/User Management, Self-Service Portal, Metering, Provisioning
     • Physical
     • Virtual
     • Dedicated
     • Shared, Elastic Compute
     • Shared, Elastic Storage
     • Multi-Tenant
     • Single Tenant
     • Multi-App
     • As-A-Service
  11. Virtualized Hadoop With Local Storage
     [Diagram: on the virtual infrastructure, one master VM (JobTracker, NameNode) and three worker VMs (TaskTracker, DataNode), each VM with a VMDK. On the physical hardware, each VM runs on a server with DAS.]
  12. Virtualized Hadoop With Local Storage
     [Diagram: master (JobTracker, NameNode) and three workers (TaskTracker, DataNode), each on a server with DAS.]
     • Unified Operations
     • Shared Resources = Higher Utilization
     • Elastic Resources = Faster Provisioning
     5-10x Better CPU Utilization!
  13. Hadoop Runs Well Virtualized
     [Chart: elapsed time in seconds (lower is better, axis 0 to 450) for TeraGen, TeraSort, and TeraValidate, comparing Native and 1 VM.]
     Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf
  14. Project Serengeti
     • Deploy Hadoop Cluster In 10 Minutes
     • Customize Hadoop Cluster
     • One-Stop Command Center
     • Open Source Project Backed By VMware, Launched In June 2012
  15. Virtualized Hadoop With Shared Storage
     [Diagram: on the virtual infrastructure, a master (JobTracker, NameNode) and three workers (TaskTracker, DataNode). On the physical hardware, four servers with DAS.]
  16. Virtualized Hadoop With Shared Storage
     [Diagram: on the virtual infrastructure, a master (JobTracker, NameNode) and three workers (TaskTracker, DataNode). On the physical hardware, two servers and two Isilon nodes, with a NameNode shown at the physical layer.]
  17. Virtualized Hadoop With Isilon
     [Diagram: master (JobTracker) and workers (TaskTracker) run on servers; the NameNode and DataNode roles are provided by Isilon.]
     • Efficient Data Loading
     • No SPOF
     • End-To-End Data Protection
     • Leading Storage Efficiency
     • Native HDFS Support (Plus NFS, CIFS etc.)
     • Independent Scaling
     • Multi-App Scale-Out Storage Platform
     Replication Overhead Only 20% Rather Than 200%!
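The "20% rather than 200%" callout is easier to judge with numbers. The sketch below assumes that the 200% figure refers to HDFS's default of three copies per block (two extra copies of every byte) and that the 20% figure is the protection overhead claimed for Isilon on the slide. The function name and the 100 TB data set are illustrative only.

```python
def raw_capacity_tb(usable_tb, overhead_pct):
    """Raw capacity needed to hold usable_tb of data at a given protection overhead."""
    return usable_tb * (100 + overhead_pct) / 100


usable = 100  # TB of actual data (illustrative figure)

hdfs_raw = raw_capacity_tb(usable, 200)    # three copies of every block
isilon_raw = raw_capacity_tb(usable, 20)   # overhead figure quoted on the slide

print(f"3x replication: {hdfs_raw:.0f} TB raw")
print(f"20% overhead:   {isilon_raw:.0f} TB raw")
print(f"difference:     {hdfs_raw - isilon_raw:.0f} TB")
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with triple replication but 120 TB at 20% overhead, a difference of 180 TB.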
  18. Hadoop With Software-Defined Storage
     [Diagram: on the virtual infrastructure, a master (JobTracker), two workers (TaskTracker), and an Isilon VM providing the NameNode and DataNode roles. On the physical hardware, two servers and "Any NAS" storage.]
  19. Making It As-A-Service
     [Diagram: service functions and the products shown next to them. Self-Service: WaveMaker. HD LCM: Serengeti. Workflows and Metering: vCenter O & CB. User Mgmt and Tenant Mgmt: Postgres. Also shown: Portal, HD Cmd Center, vCenter, Infrastr. Mgmt., and the Hadoop roles JobTracker, TaskTracker, NameNode, DataNode.]
  20. HDaaS Solution Component Interaction
     [Diagram: a data scientist analyzes and manages clusters through the portal UI (WaveMaker) and the Serengeti client API.]
     Numbered steps shown: 1: AAA, 2: Invoke, 3: Provision (appears three times), 4: Instantiate
     Components shown: HDaaS Workflows (vCenter Orchestrator); Serengeti Server; Serengeti Agents on the Pivotal HD master and workers; Isilon REST API; vCenter & ChargeBack APIs; User/Tenant Mgmt (Postgres)
     Service tiers shown: Platinum, Gold, Silver, Bronze
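The numbered steps on this slide (1: AAA, 2: Invoke, 3: Provision, 4: Instantiate) suggest an ordered workflow in which a later step only runs if the earlier ones succeed. The sketch below illustrates that idea only. It does not call vCenter Orchestrator, Serengeti, or Isilon; every function in it is a hypothetical stand-in, and what each step does to the request is an assumption made for the example.

```python
def run_workflow(steps, request):
    """Run steps in order; stop at the first one that fails. Returns completed labels."""
    completed = []
    for label, action in steps:
        if not action(request):
            break
        completed.append(label)
    return completed


# Hypothetical stand-ins for the real components on the slide.
def authenticate(request):
    # 1: AAA - assume a request must name both a user and a tenant.
    return bool(request.get("user")) and bool(request.get("tenant"))


def invoke(request):
    # 2: Invoke - assume the portal selects a workflow to run.
    request["workflow"] = "create_cluster"
    return True


def provision(request):
    # 3: Provision - assume compute and storage are reserved here.
    request["provisioned"] = True
    return True


def instantiate(request):
    # 4: Instantiate - assume the cluster is named after its tenant.
    request["cluster"] = f"{request['tenant']}-cluster"
    return True


STEPS = [
    ("1: AAA", authenticate),
    ("2: Invoke", invoke),
    ("3: Provision", provision),
    ("4: Instantiate", instantiate),
]

ok_request = {"user": "ds1", "tenant": "tenant1"}
print(run_workflow(STEPS, ok_request), ok_request["cluster"])

bad_request = {"tenant": "tenant1"}  # no user, so AAA fails
print(run_workflow(STEPS, bad_request))
```

Placing authentication first means a rejected request never reaches the provisioning stage, which is the behavior one would expect from a multi-tenant portal.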
  21. Tenant Isolation On Isilon
     • One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist
     • Access Controlled By Group And User Rights
     • Leverage SmartQuotas To Set Resource Limits And Report Usage
     • Separate Subnets For Tenants, Load-Balanced With SmartConnect
     [Diagram: directory tree rooted at /ifs/HDFS with tenant directories /tenant1 and /tenant2 and data scientist subdirectories /ds1 and /ds2.]
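The directory convention on this slide (one directory per tenant under /ifs/HDFS, one subdirectory per data scientist) can be sketched as two small path helpers. This is an assumption-laden illustration: the function names are invented for the example, and the code only manipulates path strings. It does not touch OneFS, set permissions, or configure SmartQuotas.

```python
import posixpath

HDFS_ROOT = "/ifs/HDFS"  # root directory shown on the slide


def scientist_dir(tenant, scientist):
    """Build the home directory for one data scientist of one tenant."""
    for part in (tenant, scientist):
        # Reject names that could escape the tenant's directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return posixpath.join(HDFS_ROOT, tenant, scientist)


def owning_tenant(path):
    """Return the tenant whose directory contains path, or None if outside the root."""
    normalized = posixpath.normpath(path)
    prefix = HDFS_ROOT + "/"
    if not normalized.startswith(prefix):
        return None
    return normalized[len(prefix):].split("/")[0]


print(scientist_dir("tenant1", "ds1"))
print(owning_tenant("/ifs/HDFS/tenant2/ds2/input.csv"))
print(owning_tenant("/ifs/HDFS/tenant1/../tenant2/x"))
```

Normalizing the path before checking ownership matters: without it, a path containing `..` could appear to belong to one tenant while actually resolving into another tenant's directory.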
  22. Demo
  23. Summary
     • HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service. Free!
     Compute
     • Pivotal HD Brings Features Like Virtualization Support To Hadoop
     • Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
     Storage
     • Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS
  24. What’s Next? HAWQ
     [Diagram: the Pivotal HD Enterprise stack from the Pivotal HD Architecture slide, with HAWQ (Advanced Database Services) added on top. HAWQ labels shown: ANSI SQL + Analytics; Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining. Stack labels shown: HDFS; MapReduce; YARN; HBase; Pig, Hive, Mahout; ZooKeeper; Sqoop; Flume; Resource Management & Workflow; Hadoop Virtualization (HVE); Command Center (Configure, Deploy, Monitor, Manage); DataLoader. Legend: Apache, Pivotal HD Added Value.]
  25. Resources
     • HDaaS Solution Collateral
       - White Paper, Presentations, Demos
       - http://powerlink.emc.com
     • EMC Solution Pavilion
     • Related Sessions
       - Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
       - Virtualize Big Data to Make the Elephant Dance
       - Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight