Is your cloud ready for Big Data? Strata NY 2013


Published on

Published in: Technology, Education
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Is your cloud ready for Big Data? Strata NY 2013

  1. 1. Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services © 2009 VMware Inc. All rights reserved
  2. 2. Not Just for the Web Giants – The Intelligent Enterprise 2
  3. 3. Real-time analysis allows instant understanding of market dynamics. Retailers can have intimate understanding of their customers needs and use direct targeted marketing. Market Segment Analysis ! Personalized Customer Targeting` 3
  4. 4. The Emerging Pattern of Big Data Systems: Retail Example Real-Time Processing Data Science Real-Time Streams Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 4
  5. 5. Storage: Plan for Peta-scale Data Storage and Processing Analytics Rapidly Outgrows Traditional Data Size by 100x 1000 100 10 Online Apps PB of Data Analytics 1 0.1 0.01 2000 5 2003 2006 2009 2012 2015
  6. 6. Unprecedented Scale We are creating an Exabyte of data every minute in 2013 Yottabyte by 2030 “Data transparency, amplified by Social Networks generates data at a scale never seen before” - The Human Face of Big Data 6
  7. 7. A single GE Jet Engine produces 10 Terabytes of data in one hour – 90 Petabytes per year. Enabling early detection of faults, common mode failures, product engineering feedback. Post Mortem ! Proactively Maintained Connected Product 7
  8. 8. The Emerging Pattern of Big Data Systems: Manufacturing Real-Time Sensor Analytics Product Engineering Support Data Science Real-Time Processing Machine Learning Parallel Data Processing Exa-scale Data Store Cloud Infrastructure 8
  9. 9. Cloud Platform © 2009 VMware Inc. All rights reserved
  10. 10. Cloud Platform: Supporting Mixed Big Data Workloads Machine Learning Machine Learning Real-Time Analytics Real-Time Analytics Hadoop Cloud Infrastructure Compute Hadoop Management Storage/Availability Network/Security 10
  11. 11. Cloud Platform: Supporting Multiple Tenants Web User Analytics Historical Customer Behavior Financial Analysis Cloud Infrastructure Compute Management Storage/Availability Network/Security 11
  12. 12. What if you can… Recommendation engine Production Test Ad targeting Production One physical platform to support multiple virtual big data clusters Test Experimentation Production recommendation engine Experimentation 12 Experimentation Test/Dev Production Ad Targeting
  13. 13. Values of a Cloud Platform for Big Data 1 Agility / Rapid deployment 2 Isolation for resource control and security 3 Lower Capex 4 Operational efficiency Compute Management Storage/Availability Network/Security 13
  14. 14. Hadoop as a Service Operational Simplicity High Availability Elasticity & Multi-tenancy !  Rapid deployment !  High availability for entire Hadoop stack !  Shrink and expand cluster on demand !  One click to setup !  Independent scaling of Compute and data !  One stop command center !  Easy to configure/ reconfigure 14 !  Battle-tested !  Strong multi-tenancy
  15. 15. Self Service Access to Big Data Environments Developer Data Scientist High priority … •  3 Hadoop nodes •  Cloudera, Pivotal MapR •  Small VM •  Local storage •  No HA •  … •  •  •  •  •  •  •  •  •  •  •  •  •  … •  … 5 Hadoop nodes Cloudera, Pivotal Hive, Pig Medium VM HA … 50 Hadoop nodes Cloudera Hive, Pig Large VM HA … Templates for Different Cloud Users 15
  16. 16. Big Data needs a Mix of Workloads Other Hadoop batch analysis Compute layer NoSQL Big SQL HBase real-time queries Impala, Pivotal HawQ Cassandra, Mongo, etc Spark, Shark, Solr, Platfora, Etc,… File System/Data Store Platform Virtualization Technology Host 16 Host Host Host Host Host Host
  17. 17. Strong Isolation between Workloads is Key Hungry Workload 1 Reckless Workload 2 Virtualization Platform 17 Nosy Workload 3
  18. 18. Community activity in Isolation and Resource Management !  YARN •  Goal: Support workloads other than M-R on Hadoop •  Initial need is for MPI/M-R from Yahoo •  Non-posix File system self selects workload types !  Mesos •  Distributed Resource Broker •  Mixed Workloads with some RM •  Active project, in use at Twitter •  Leverages OS Virtualization – e.g. cgroups !  Virtualization •  Virtual machine as the primary isolation, resource management and versioned deployment container •  Basis for Project Serengeti 18
  19. 19. Use case: Elastic Hadoop with Tiered SLA •  Production workloads has high priority •  Experimentation workloads has lower priority Experimentation Mapreduce Compute layer Dynamic resourcepool Compute VM Compute VM Experimentation Experimentation Production Mapreduce Compute VM Compute VM Compute VM 19 Compute VM Compute VM Production recommendation engine Production vSphere Data layer Compute VM
  20. 20. Cloud Enabled Auto-elastic Hadoop J T N N T T VHM T T T T J T T T T T DATA VM DATA VM DATA VM ESX ESX ESX Local Disks JT: JobTracker TT: TaskTracker NN: NameNode VHM: Virtual Hadoop Manager 20 T T Hadoop HDFS VMs Hadoop Compute VMs SAN/NAS Non-Hadoop VMs
  21. 21. Hadoop Performance with Virtualization 32 hosts/3.6GHz 8 cores/15K RPM 146GB SAS disks/10GbE/72-96GB RAM (lower is better) [, Jeff Buell, Apr 2013] 21
  22. 22. Network Platform © 2009 VMware Inc. All rights reserved
  23. 23. A Typical Network Architecture Aggrega7ng% Switch% Aggrega7ng% Switch% L2%Switch% L2%Switch% L2%Switch% L2%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% Host% 23 Host% Host% Host% Host%
  24. 24. Traditional Networks: Core Switch is the Choke Point Core Aggregation Network Topology Rack Hosts Hosts Non Uniform Bandwidth 10s of Gbits 100s of Gbits 24 Modeled Bandwidth
  25. 25. Modern Networks: Great for Big Data Spine Network Topology Leaf Hosts Uniform Bandwidth 25 Modeled Bandwidth
  26. 26. Flat Networks Allow for New Infrastructure Models Converged Storage Separated Storage Top%of%Rack%Switch% Top%of%Rack%Switch% Host% Host% Host% Compute Top%of%Rack%Switch% Host% Storage Top%of%Rack%Switch% Host% Storage% Host% Storage% Host% Storage% Host% Storage% Host% Host% Host% Separated Storage Host% Host% Host% 26 Storage%
  27. 27. Storage Platform © 2009 VMware Inc. All rights reserved
  28. 28. Use Local Disk where it’s Needed SAN Storage Local Storage $2 - $10/Gigabyte $1 - $5/Gigabyte $0.05/Gigabyte $1M gets: 0.5Petabytes 200,000 IOPS 8Gbyte/sec 28 NAS Filers $1M gets: 1 Petabyte 200,000 IOPS 10Gbyte/sec $1M gets: 10 Petabytes 400,000 IOPS 250 Gbytes/sec
  29. 29. Storage Economics: Traditional vs. Scale-out $5.50 Traditional SAN/NAS $5.00 $4.50 $4.00 $3.50 $3.00 Cost per GB $2.50 $2.00 $1.50 Distributed Object Storage $1.00 $0.50 $0.5 1 2 4 8 16 Petabytes Deployed Scale-out NAS Isilon 29 32 64 128 HDFS MAPR CEPH Gluster Scality
  30. 30. Big Data Storage Architectures External SAN With HDFS Local Disks With HDFS or Other HDFS, CEPH, MAPR, Gluster, Scality, … 30 External Scale-out NAS
  31. 31. Features from New Storage Solutions Snapshots Universal File Store Clones Geo Replication Erasure Coding NFS Access Posix Support 31 QoS Controls SSD Capability
  32. 32. Summary © 2009 VMware Inc. All rights reserved
  33. 33. Customers Winning from Consolidated Big Data Platforms “Dedicated hardware makes no sense” “Software-defined Datacenter enables rapid deployment multiple tenants and labs” “Our mixed workloads include Hadoop, Database, ETL and App-servers” Compute Management Storage/Availability Network/Security 33 “Any performance penalties are minor”
  34. 34. Cloud Infrastructure is Ready for Big Data – Are you? Cloud Infrastructure 34
  35. 35. Is Your Cloud Ready for Big Data? Richard McDougall CTO, Storage and Application Services @richardmcdougll © 2009 VMware Inc. All rights reserved