• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Is your cloud ready for Big Data? Strata NY 2013
 

Is your cloud ready for Big Data? Strata NY 2013

on

  • 542 views

 

Statistics

Views

Total Views
542
Views on SlideShare
539
Embed Views
3

Actions

Likes
2
Downloads
19
Comments
0

2 Embeds 3

http://www.linkedin.com 2
https://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Is your cloud ready for Big Data? Strata NY 2013 Is your cloud ready for Big Data? Strata NY 2013 Presentation Transcript

    • Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services © 2009 VMware Inc. All rights reserved
    • Not Just for the Web Giants – The Intelligent Enterprise 2
    • Real-time analysis allows instant understanding of market dynamics. Retailers can have intimate understanding of their customers needs and use direct targeted marketing. Market Segment Analysis ! Personalized Customer Targeting` 3
    • The Emerging Pattern of Big Data Systems: Retail Example Real-Time Processing Data Science Real-Time Streams Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 4
    • Storage: Plan for Peta-scale Data Storage and Processing Analytics Rapidly Outgrows Traditional Data Size by 100x 1000 100 10 Online Apps PB of Data Analytics 1 0.1 0.01 2000 5 2003 2006 2009 2012 2015
    • Unprecedented Scale We are creating an Exabyte of data every minute in 2013 Yottabyte by 2030 “Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data 6
    • A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback. Post Mortem ! Proactively Maintained Connected Product 7
    • The Emerging Pattern of Big Data Systems: Manufacturing Real-Time Sensor Analytics Product Engineering Support Data Science Real-Time Processing Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 8
    • Cloud Platform © 2009 VMware Inc. All rights reserved
    • Cloud Platform: Supporting Mixed Big Data Workloads Machine Learning Machine Learning Real-Time Analytics Real-Time Analytics Hadoop Cloud Infrastructure Compute Hadoop Management Storage/Availability Network/Security 10
    • Cloud Platform: Supporting Multiple Tenants Web User Analytics Historical Customer Behavior Financial Analysis Cloud Infrastructure Compute Management Storage/Availability Network/Security 11
    • What if you can… Recommendation engine Production Test Ad targeting Production One physical platform to support multiple virtual big data clusters Test Experimentation Production recommendation engine Experimentation 12 Experimentation Test/Dev Production Ad Targeting
    • Values of a Cloud Platform for Big Data 1 Agility / Rapid deployment 2 Isolation for resource control and security 3 Lower Capex 4 Operational efficiency Compute Management Storage/Availability Network/Security 13
    • Hadoop as a Service Operational Simplicity High Availability Elasticity & Multi-tenancy !  Rapid deployment !  High availability for entire Hadoop stack !  Shrink and expand cluster on demand !  One click to setup !  Independent scaling of Compute and data !  One stop command center !  Easy to configure/ reconfigure 14 !  Battle-tested !  Strong multi-tenancy
    • Self Service Access to Big Data Environments Developer Data Scientist High priority … •  3 Hadoop nodes •  Cloudera, Pivotal MapR •  Small VM •  Local storage •  No HA •  … •  •  •  •  •  •  •  •  •  •  •  •  •  … •  … 5 Hadoop nodes Cloudera, Pivotal Hive, Pig Medium VM HA … 50 Hadoop nodes Cloudera Hive, Pig Large VM HA … Templates for Different Cloud Users 15
    • Big Data needs a Mix of Workloads Other Hadoop batch analysis Compute layer NoSQL Big SQL HBase real-time queries Impala, Pivotal HawQ Cassandra, Mongo, etc Spark, Shark, Solr, Platfora, Etc,… File System/Data Store Platform Virtualization Technology Host 16 Host Host Host Host Host Host
    • Strong Isolation between Workloads is Key Hungry Workload 1 Reckless Workload 2 Virtualization Platform 17 Nosy Workload 3
    • Community activity in Isolation and Resource Management !  YARN •  Goal: Support workloads other than M-R on Hadoop •  Initial need is for MPI/M-R from Yahoo •  Non-posix File system self selects workload types !  Mesos •  Distributed Resource Broker •  Mixed Workloads with some RM •  Active project, in use at Twitter •  Leverages OS Virtualization – e.g. cgroups !  Virtualization •  Virtual machine as the primary isolation, resource management and versioned deployment container •  Basis for Project Serengeti 18
    • Use case: Elastic Hadoop with Tiered SLA •  Production workloads has high priority •  Experimentation workloads has lower priority Experimentation Mapreduce Compute layer Dynamic resourcepool Compute VM Compute VM Experimentation Experimentation Production Mapreduce Compute VM Compute VM Compute VM 19 Compute VM Compute VM Production recommendation engine Production vSphere Data layer Compute VM
    • Cloud Enabled Auto-elastic Hadoop J T N N T T VHM T T T T J T T T T T DATA VM DATA VM DATA VM ESX ESX ESX Local Disks JT: JobTracker TT: TaskTracker NN: NameNode VHM: Virtual Hadoop Manager 20 T T Hadoop HDFS VMs Hadoop Compute VMs SAN/NAS Non-Hadoop VMs
    • Hadoop Performance with Virtualization 32 hosts/3.6GHz 8 cores/15K RPM 146GB SAS disks/10GbE/72-96GB RAM (lower is better) [http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013] 21
    • Network Platform © 2009 VMware Inc. All rights reserved
    • A Typical Network Architecture Aggrega7ng% Switch% Aggrega7ng% Switch% L2%Switch% L2%Switch% L2%Switch% L2%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% 23 Host% Host% Host% Host%
    • Traditional Networks: Core Switch is the Choke Point Core Aggregation Network Topology Rack Hosts Hosts Non Uniform Bandwidth 10s of Gbits 100s of Gbits 24 Modeled Bandwidth
    • Modern Networks: Great for Big Data Spine Network Topology Leaf Hosts Uniform Bandwidth 25 Modeled Bandwidth
    • Flat Networks Allow for New Infrastructure Models Converged Storage Separated Storage Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Compute Top%of%Rack%Switch% Host% Storage Top%of%Rack%Switch% Host% Storage% Host% Storage% Host% Storage% Host% Storage% Host% Host% Host% Separated Storage Host% Host% Host% 26 Storage%
    • Storage Platform © 2009 VMware Inc. All rights reserved
    • Use Local Disk where it’s Needed SAN Storage Local Storage $2 - $10/Gigabyte $1 - $5/Gigabyte $0.05/Gigabyte $1M gets: 0.5Petabytes 200,000 IOPS 8Gbyte/sec 28 NAS Filers $1M gets: 1 Petabyte 200,000 IOPS 10Gbyte/sec $1M gets: 10 Petabytes 400,000 IOPS 250 Gbytes/sec
    • Storage Economics: Traditional vs. Scale-out $5.50 Traditional SAN/NAS $5.00 $4.50 $4.00 $3.50 $3.00 Cost per GB $2.50 $2.00 $1.50 Distributed Object Storage $1.00 $0.50 $0.5 1 2 4 8 16 Petabytes Deployed Scale-out NAS Isilon 29 32 64 128 HDFS MAPR CEPH Gluster Scality
    • Big Data Storage Architectures External SAN With HDFS Local Disks With HDFS or Other HDFS, CEPH, MAPR, Gluster, Scality, … 30 External Scale-out NAS
    • Features from New Storage Solutions Snapshots Universal File Store Clones Geo Replication Erasure Coding NFS Access Posix Support 31 QoS Controls SSD Capability
    • Summary © 2009 VMware Inc. All rights reserved
    • Customers Winning from Consolidated Big Data Platforms “Dedicated hardware makes no sense” “Software-defined Datacenter enables rapid deployment multiple tenants and labs” “Our mixed workloads include Hadoop, Database, ETL and App-servers” Compute Management Storage/Availability Network/Security 33 “Any performance penalties are minor”
    • Cloud Infrastructure is Ready for Big Data – Are you? Cloud Infrastructure 34
    • Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services @richardmcdougll © 2009 VMware Inc. All rights reserved