Is Your Cloud Ready for Big Data?
Richard McDougall
CTO, Storage and Application Services

© 2009 VMware Inc. All rights r...
Not Just for the Web Giants – The Intelligent Enterprise

2
Real-time analysis allows
instant understanding of
market dynamics.
Retailers can have intimate
understanding of their
cus...
The Emerging Pattern of Big Data Systems: Retail Example

Real-Time
Processing
Data Science

Real-Time
Streams

Machine
Le...
Storage: Plan for Peta-scale Data Storage and Processing
Analytics Rapidly Outgrows Traditional Data Size
by 100x

1000

1...
Unprecedented Scale

We are creating an Exabyte
of data every minute in 2013

Yottabyte by 2030

“Data transparency,
ampli...
A single GE Jet Engine produces
10 Terabytes of data in one hour
– 90 Petabytes per year.
Enabling early detection of
faul...
The Emerging Pattern of Big Data Systems: Manufacturing
Real-Time
Sensor
Analytics

Product
Engineering

Support

Data Sci...
Cloud Platform

© 2009 VMware Inc. All rights reserved
Cloud Platform: Supporting Mixed Big Data Workloads

Machine
Learning
Machine
Learning

Real-Time
Analytics

Real-Time
Ana...
Cloud Platform: Supporting Multiple Tenants

Web User
Analytics

Historical Customer
Behavior

Financial
Analysis

Cloud I...
What if you can…
Recommendation engine

Production

Test

Ad targeting

Production

One physical platform to support multi...
Values of a Cloud Platform for Big Data

1

Agility / Rapid deployment

2

Isolation for resource control
and security

3
...
Hadoop as a Service

Operational Simplicity

High Availability

Elasticity & Multi-tenancy

!  Rapid deployment

!  High a...
Self Service Access to Big Data Environments

Developer

Data Scientist

High priority

…

•  3 Hadoop nodes
•  Cloudera, ...
Big Data needs a Mix of Workloads

Other

Hadoop
batch analysis

Compute
layer

NoSQL
Big SQL

HBase
real-time queries

Im...
Strong Isolation between Workloads is Key

Hungry
Workload 1

Reckless
Workload 2

Virtualization Platform

17

Nosy
Workl...
Community activity in Isolation and Resource Management

!  YARN
•  Goal: Support workloads other than M-R on Hadoop
•  In...
Use case: Elastic Hadoop with Tiered SLA

•  Production workloads has high priority
•  Experimentation workloads has lower...
Cloud Enabled Auto-elastic Hadoop

J
T

N
N

T
T

VHM

T
T

T
T

J
T

T
T

T
T

DATA VM

DATA VM

DATA VM

ESX

ESX

ESX

...
Hadoop Performance with Virtualization
32 hosts/3.6GHz 8 cores/15K RPM 146GB SAS disks/10GbE/72-96GB RAM

(lower is better...
Network Platform

© 2009 VMware Inc. All rights reserved
A Typical Network Architecture
Aggrega7ng%
Switch%

Aggrega7ng%
Switch%

L2%Switch%

L2%Switch%

L2%Switch%

L2%Switch%

T...
Traditional Networks: Core Switch is the Choke Point

Core

Aggregation

Network
Topology

Rack

Hosts

Hosts

Non Uniform...
Modern Networks: Great for Big Data

Spine

Network
Topology

Leaf

Hosts

Uniform Bandwidth

25

Modeled
Bandwidth
Flat Networks Allow for New Infrastructure Models

Converged
Storage

Separated
Storage

Top%of%Rack%Switch%

Top%of%Rack%...
Storage Platform

© 2009 VMware Inc. All rights reserved
Use Local Disk where it’s Needed

SAN Storage

Local Storage

$2 - $10/Gigabyte

$1 - $5/Gigabyte

$0.05/Gigabyte

$1M get...
Storage Economics: Traditional vs. Scale-out

$5.50

Traditional
SAN/NAS

$5.00
$4.50
$4.00
$3.50
$3.00
Cost per GB

$2.50...
Big Data Storage Architectures
External SAN
With HDFS

Local Disks
With HDFS or
Other

HDFS,
CEPH,
MAPR,
Gluster,
Scality,...
Features from New Storage Solutions

Snapshots

Universal File Store

Clones

Geo Replication

Erasure Coding
NFS Access
P...
Summary

© 2009 VMware Inc. All rights reserved
Customers Winning from Consolidated Big Data Platforms
“Dedicated hardware makes no
sense”
“Software-defined Datacenter
en...
Cloud Infrastructure is Ready for Big Data – Are you?

Cloud Infrastructure

34
Is Your Cloud Ready for Big Data?
Richard McDougall
CTO, Storage and Application Services
@richardmcdougll

© 2009 VMware ...
Upcoming SlideShare
Loading in...5
×

Is your cloud ready for Big Data? Strata NY 2013

826

Published on

Published in: Technology, Education
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
826
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
27
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Is your cloud ready for Big Data? Strata NY 2013

  1. 1. Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services © 2009 VMware Inc. All rights reserved
  2. 2. Not Just for the Web Giants – The Intelligent Enterprise 2
  3. 3. Real-time analysis allows instant understanding of market dynamics. Retailers can have intimate understanding of their customers needs and use direct targeted marketing. Market Segment Analysis ! Personalized Customer Targeting` 3
  4. 4. The Emerging Pattern of Big Data Systems: Retail Example Real-Time Processing Data Science Real-Time Streams Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 4
  5. 5. Storage: Plan for Peta-scale Data Storage and Processing Analytics Rapidly Outgrows Traditional Data Size by 100x 1000 100 10 Online Apps PB of Data Analytics 1 0.1 0.01 2000 5 2003 2006 2009 2012 2015
  6. 6. Unprecedented Scale We are creating an Exabyte of data every minute in 2013 Yottabyte by 2030 “Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data 6
  7. 7. A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback. Post Mortem ! Proactively Maintained Connected Product 7
  8. 8. The Emerging Pattern of Big Data Systems: Manufacturing Real-Time Sensor Analytics Product Engineering Support Data Science Real-Time Processing Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 8
  9. 9. Cloud Platform © 2009 VMware Inc. All rights reserved
  10. 10. Cloud Platform: Supporting Mixed Big Data Workloads Machine Learning Machine Learning Real-Time Analytics Real-Time Analytics Hadoop Cloud Infrastructure Compute Hadoop Management Storage/Availability Network/Security 10
  11. 11. Cloud Platform: Supporting Multiple Tenants Web User Analytics Historical Customer Behavior Financial Analysis Cloud Infrastructure Compute Management Storage/Availability Network/Security 11
  12. 12. What if you can… Recommendation engine Production Test Ad targeting Production One physical platform to support multiple virtual big data clusters Test Experimentation Production recommendation engine Experimentation 12 Experimentation Test/Dev Production Ad Targeting
  13. 13. Values of a Cloud Platform for Big Data 1 Agility / Rapid deployment 2 Isolation for resource control and security 3 Lower Capex 4 Operational efficiency Compute Management Storage/Availability Network/Security 13
  14. 14. Hadoop as a Service Operational Simplicity High Availability Elasticity & Multi-tenancy !  Rapid deployment !  High availability for entire Hadoop stack !  Shrink and expand cluster on demand !  One click to setup !  Independent scaling of Compute and data !  One stop command center !  Easy to configure/ reconfigure 14 !  Battle-tested !  Strong multi-tenancy
  15. 15. Self Service Access to Big Data Environments Developer Data Scientist High priority … •  3 Hadoop nodes •  Cloudera, Pivotal MapR •  Small VM •  Local storage •  No HA •  … •  •  •  •  •  •  •  •  •  •  •  •  •  … •  … 5 Hadoop nodes Cloudera, Pivotal Hive, Pig Medium VM HA … 50 Hadoop nodes Cloudera Hive, Pig Large VM HA … Templates for Different Cloud Users 15
  16. 16. Big Data needs a Mix of Workloads Other Hadoop batch analysis Compute layer NoSQL Big SQL HBase real-time queries Impala, Pivotal HawQ Cassandra, Mongo, etc Spark, Shark, Solr, Platfora, Etc,… File System/Data Store Platform Virtualization Technology Host 16 Host Host Host Host Host Host
  17. 17. Strong Isolation between Workloads is Key Hungry Workload 1 Reckless Workload 2 Virtualization Platform 17 Nosy Workload 3
  18. 18. Community activity in Isolation and Resource Management !  YARN •  Goal: Support workloads other than M-R on Hadoop •  Initial need is for MPI/M-R from Yahoo •  Non-posix File system self selects workload types !  Mesos •  Distributed Resource Broker •  Mixed Workloads with some RM •  Active project, in use at Twitter •  Leverages OS Virtualization – e.g. cgroups !  Virtualization •  Virtual machine as the primary isolation, resource management and versioned deployment container •  Basis for Project Serengeti 18
  19. 19. Use case: Elastic Hadoop with Tiered SLA •  Production workloads has high priority •  Experimentation workloads has lower priority Experimentation Mapreduce Compute layer Dynamic resourcepool Compute VM Compute VM Experimentation Experimentation Production Mapreduce Compute VM Compute VM Compute VM 19 Compute VM Compute VM Production recommendation engine Production vSphere Data layer Compute VM
  20. 20. Cloud Enabled Auto-elastic Hadoop J T N N T T VHM T T T T J T T T T T DATA VM DATA VM DATA VM ESX ESX ESX Local Disks JT: JobTracker TT: TaskTracker NN: NameNode VHM: Virtual Hadoop Manager 20 T T Hadoop HDFS VMs Hadoop Compute VMs SAN/NAS Non-Hadoop VMs
  21. 21. Hadoop Performance with Virtualization 32 hosts/3.6GHz 8 cores/15K RPM 146GB SAS disks/10GbE/72-96GB RAM (lower is better) [http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013] 21
  22. 22. Network Platform © 2009 VMware Inc. All rights reserved
  23. 23. A Typical Network Architecture Aggrega7ng% Switch% Aggrega7ng% Switch% L2%Switch% L2%Switch% L2%Switch% L2%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% 23 Host% Host% Host% Host%
  24. 24. Traditional Networks: Core Switch is the Choke Point Core Aggregation Network Topology Rack Hosts Hosts Non Uniform Bandwidth 10s of Gbits 100s of Gbits 24 Modeled Bandwidth
  25. 25. Modern Networks: Great for Big Data Spine Network Topology Leaf Hosts Uniform Bandwidth 25 Modeled Bandwidth
  26. 26. Flat Networks Allow for New Infrastructure Models Converged Storage Separated Storage Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Compute Top%of%Rack%Switch% Host% Storage Top%of%Rack%Switch% Host% Storage% Host% Storage% Host% Storage% Host% Storage% Host% Host% Host% Separated Storage Host% Host% Host% 26 Storage%
  27. 27. Storage Platform © 2009 VMware Inc. All rights reserved
  28. 28. Use Local Disk where it’s Needed SAN Storage Local Storage $2 - $10/Gigabyte $1 - $5/Gigabyte $0.05/Gigabyte $1M gets: 0.5Petabytes 200,000 IOPS 8Gbyte/sec 28 NAS Filers $1M gets: 1 Petabyte 200,000 IOPS 10Gbyte/sec $1M gets: 10 Petabytes 400,000 IOPS 250 Gbytes/sec
  29. 29. Storage Economics: Traditional vs. Scale-out $5.50 Traditional SAN/NAS $5.00 $4.50 $4.00 $3.50 $3.00 Cost per GB $2.50 $2.00 $1.50 Distributed Object Storage $1.00 $0.50 $0.5 1 2 4 8 16 Petabytes Deployed Scale-out NAS Isilon 29 32 64 128 HDFS MAPR CEPH Gluster Scality
  30. 30. Big Data Storage Architectures External SAN With HDFS Local Disks With HDFS or Other HDFS, CEPH, MAPR, Gluster, Scality, … 30 External Scale-out NAS
  31. 31. Features from New Storage Solutions Snapshots Universal File Store Clones Geo Replication Erasure Coding NFS Access Posix Support 31 QoS Controls SSD Capability
  32. 32. Summary © 2009 VMware Inc. All rights reserved
  33. 33. Customers Winning from Consolidated Big Data Platforms “Dedicated hardware makes no sense” “Software-defined Datacenter enables rapid deployment multiple tenants and labs” “Our mixed workloads include Hadoop, Database, ETL and App-servers” Compute Management Storage/Availability Network/Security 33 “Any performance penalties are minor”
  34. 34. Cloud Infrastructure is Ready for Big Data – Are you? Cloud Infrastructure 34
  35. 35. Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services @richardmcdougll © 2009 VMware Inc. All rights reserved
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×