VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps

VMworld 2013

Chris Greer, FedEx
Richard McDougall, VMware

  1. Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps
     Chris Greer, FedEx; Richard McDougall, VMware. Session VAPP5402 (#VAPP5402)
  2. Beyond Mission Critical: Virtualizing Big-Data, Hadoop and Cloud Apps
     Richard McDougall, CTO, Storage and Application Services, VMware
     Chris Greer, Enterprise Architect, FedEx
     © 2013 VMware Inc. All rights reserved.
  3. Virtualize Everything: Next Generation Apps
     • Traditional applications: traditional enterprise storage (SAN/NAS); hardware-based resiliency and QoS
     • Next-gen cloud apps: scale-out storage, flash, and DAS; application-specific storage
     • Storage options on vSphere: virtual storage arrays, SAN/NAS, object/BLOB stores, all-SSD arrays, server-side flash
  4. The Complexity Enterprise IT and Developers Face Today
     • From idea to running app: an idea for a cool app → spec a server config → justify server costs → procurement process → wait for hardware to arrive → wait for IT ops to image the server → install a database → line-of-business (LOB) architecture approval → central IT architecture approval → justify more servers for scale testing → wait for more hardware → configure ACLs and load balancers
     • Compounding the problem: new infrastructures, new languages and frameworks, new devices and domains, new data types and requirements
  5. Cloud Foundry on vSphere: Announced Today
     • Micro clouds, public clouds, and private clouds
     • Services: data services, messaging services, other services; .js
  6. Big Data: Not Just for the Web Giants, Now for the Intelligent Enterprise
  7. From Market Segment Analysis to Personalized Customer Targeting
     Real-time analysis allows instant understanding of market dynamics. Retailers can gain an intimate understanding of their customers' needs and use direct, targeted marketing.
  8. The Emerging Pattern of Big Data Systems: Retail Example
     Components: real-time streams, real-time processing, an exa-scale data store, parallel data processing, machine learning, and data science, all running on cloud infrastructure.
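One way to read this pattern, as a minimal sketch: each incoming event is both landed in a durable store and applied to a real-time view, while a parallel batch job recomputes the same answer from the store by partitioning it. The event shape, the `RealTimeView` class, and the product names are hypothetical, and a Python list stands in for the exa-scale data store.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical retail event: (store_id, product_sku)
Event = Tuple[str, str]

class RealTimeView:
    """Real-time processing: keep running per-product totals as events stream in."""
    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> None:
        self.counts[event[1]] += 1

def count_partition(events: List[Event]) -> Counter:
    """Work done independently for one partition (one node in a real cluster)."""
    return Counter(sku for _, sku in events)

def batch_recompute(store: List[Event], partitions: int = 4) -> Counter:
    """Parallel data processing: split the store, count each part, merge the results."""
    parts = [store[i::partitions] for i in range(partitions)]
    total: Counter = Counter()
    for partial in map(count_partition, parts):  # each call could run on a different node
        total.update(partial)
    return total

stream = [("s1", "tv"), ("s2", "phone"), ("s1", "phone"), ("s3", "tv"), ("s2", "phone")]
store: List[Event] = []   # stand-in for the large, durable data store
view = RealTimeView()
for event in stream:
    store.append(event)   # land the raw event for later batch analysis
    view.on_event(event)  # and update the real-time view immediately

print(view.counts == batch_recompute(store))  # True
```

Because the batch path recomputes from raw events, it can verify or extend what the real-time view shows; in a real deployment each partition would be processed on a different node.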
  9. Storage: Plan for Peta-scale Data Storage and Processing
     [Chart: petabytes of data (log scale, 0.01 to 1000 PB) for online apps vs. analytics, 2000 to 2015]
     Analytics rapidly outgrows traditional data size by 100x.
  10. Unprecedented Scale
     “Data transparency, amplified by Social Networks generates data at a scale never seen before” (The Human Face of Big Data)
     • We are creating an exabyte of data every minute in 2013
     • A yottabyte by 2030
  11. From Post-Mortem to a Proactively Maintained, Connected Product
     A single GE jet engine produces 10 terabytes of data in one hour, or 90 petabytes per year. This data enables early detection of faults and common-mode failures, and provides feedback to product engineering.
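The slide's two figures agree if the engine is assumed to run continuously; that assumption is ours, not the slide's. A quick check of the arithmetic in decimal units:

```python
# Figure from the slide: 10 TB of sensor data per engine-hour.
TB_PER_HOUR = 10
# Assumption: the engine runs around the clock, every day of the year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours
TB_PER_PB = 1000           # decimal (SI) units

tb_per_year = TB_PER_HOUR * HOURS_PER_YEAR  # 87,600 TB
pb_per_year = tb_per_year / TB_PER_PB

print(f"{pb_per_year:.1f} PB/year")  # 87.6 PB/year, roughly the slide's 90 PB
```

An engine in service for fewer hours per year would produce proportionally less, so under this assumption 90 PB/year reads as an upper bound per engine.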
  12. Cloud Infrastructure Supports Mixed Big Data Workloads
     Machine learning, Hadoop, and real-time analytics share one cloud infrastructure (compute, storage/availability, network/security, management).
  13. Cloud Infrastructure Supports Multiple Tenants
     Tenants such as web user analytics, financial analysis, and historical customer behavior share one cloud infrastructure (compute, storage/availability, network/security, management).
  14. Software-defined Datacenter: Compute. The Core Values of Virtualization Apply to Big Data
     1. Agility / rapid deployment
     2. Lower capex
     3. Isolation for resource control and security
     4. Operational efficiency
  15. Strong Isolation Between Workloads Is Key
     A shared cloud infrastructure must keep a hungry workload (1), a reckless workload (2), and a nosy workload (3) from affecting one another.
  16. Virtualizing Hadoop
     • Elasticity and multi-tenancy: shrink and expand the cluster on demand; independent scaling of compute and data; strong multi-tenancy
     • High availability: HA for the entire Hadoop stack; one-click setup; battle-tested
     • Operational simplicity: rapid deployment; one-stop command center; easy to configure and reconfigure
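As a minimal sketch of "shrink and expand the cluster on demand", assume a hypothetical policy that sizes only the Hadoop compute tier from the task backlog and leaves the data VMs untouched. The `ScalingPolicy` class, its thresholds, and the slot-based sizing are illustrative assumptions, not how VHM or Serengeti actually decide.

```python
import math
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Hypothetical elasticity policy for a virtual Hadoop compute tier."""
    min_vms: int = 2       # never shrink below this many compute VMs
    max_vms: int = 20      # never grow beyond this (e.g. the resource pool's size)
    slots_per_vm: int = 4  # task slots each compute VM provides

    def target_vms(self, pending_tasks: int, running_tasks: int) -> int:
        """Compute VMs needed for all current work, clamped to [min_vms, max_vms]."""
        needed = math.ceil((pending_tasks + running_tasks) / self.slots_per_vm)
        return max(self.min_vms, min(self.max_vms, needed))

def scaling_action(current_vms: int, target: int) -> str:
    """Translate a target size into a power-on/power-off decision for compute VMs only."""
    if target > current_vms:
        return f"power on {target - current_vms} compute VM(s)"
    if target < current_vms:
        return f"power off {current_vms - target} compute VM(s)"
    return "no change"

policy = ScalingPolicy()
print(scaling_action(3, policy.target_vms(pending_tasks=30, running_tasks=12)))
# power on 8 compute VM(s)
print(scaling_action(11, policy.target_vms(pending_tasks=0, running_tasks=1)))
# power off 9 compute VM(s)
```

Restricting the decision to compute VMs is what makes independent scaling of compute and data cheap: powering a compute VM on or off does not require moving or rebalancing stored blocks.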
  17. Big Data Extensions: Core Components (the core is open source)
     • Serengeti: a tool to simplify virtualized Hadoop deployment and operations
     • Hadoop Virtualization Extensions (HVE): virtualization changes for core Hadoop, contributed back to Apache Hadoop
     • Virtual Hadoop Manager (VHM): advanced resource management on vSphere
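HVE's best-known change, as we understand it, is a node-group layer in Hadoop's network topology that represents the physical host, so replica placement can avoid putting two copies of a block on VMs that share one ESX host. The following is a simplified, hypothetical placement routine that enforces that rule; `DataNode` and `place_replicas` are illustrative names, not Hadoop's block placement code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str
    host: str  # the physical ESX host, i.e. the "node group" layer

def place_replicas(nodes: List[DataNode], writer: DataNode, replicas: int = 3) -> List[DataNode]:
    """Simplified node-group-aware placement (a sketch, not Hadoop's implementation).

    Rule enforced: no two replicas on VMs that share a physical host.
    Preference: put the second replica on a different rack when possible.
    """
    chosen = [writer]
    used_hosts = {writer.host}

    def pick(prefer_other_rack: bool) -> Optional[DataNode]:
        candidates = [n for n in nodes if n not in chosen and n.host not in used_hosts]
        if prefer_other_rack:
            off_rack = [n for n in candidates if n.rack != writer.rack]
            candidates = off_rack or candidates  # fall back to same rack if needed
        return candidates[0] if candidates else None

    while len(chosen) < replicas:
        node = pick(prefer_other_rack=(len(chosen) == 1))
        if node is None:
            break  # not enough distinct physical hosts left
        chosen.append(node)
        used_hosts.add(node.host)
    return chosen

nodes = [
    DataNode("vm1a", rack="r1", host="h1"),
    DataNode("vm1b", rack="r1", host="h1"),  # shares physical host h1 with vm1a
    DataNode("vm2a", rack="r1", host="h2"),
    DataNode("vm3a", rack="r2", host="h3"),
    DataNode("vm3b", rack="r2", host="h3"),
]
print([n.name for n in place_replicas(nodes, writer=nodes[0])])  # ['vm1a', 'vm3a', 'vm2a']
```

Without the host check, `vm1b` would be a valid target, and losing the single physical host `h1` would then lose two replicas at once.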
  18. Big Data Family of Frameworks
     • Hadoop: batch analysis
     • HBase: real-time queries
     • NoSQL: Cassandra, Mongo, etc.
     • Big SQL: Impala, Pivotal HAWQ
     • Other: Spark, Shark, Solr, Platfora, etc.
     All of these form a compute layer over a shared file system/data store, running on a virtualization layer across the hosts.
  19. Traditional Hadoop vs. Elastic Hadoop
     • Traditional Hadoop: converged compute and storage
     • Elastic Hadoop: elastic compute over scale-out network storage
  20. Software-defined Datacenter: Storage. Requirements of Next-Generation Storage
     1. 10x lower cost of storage
     2. Handle explosive data growth
     3. Support a variety of application types
     4. Solve the privacy and security issues
  21. HDFS Model
     [Diagram: three ESX hosts, managed by a VirtualCenter Management Server with DRS, run Hadoop HDFS VMs (HDFS or MapR) on local disks, Hadoop compute VMs (a JobTracker and TaskTrackers), a NameNode, and non-Hadoop VMs on SAN/NAS, with the Virtual Hadoop Manager (VHM) alongside.]
     Legend: JT = JobTracker, TT = TaskTracker, NN = NameNode, VHM = Virtual Hadoop Manager
  22. Big Data Using Local Disks
     • Servers with local disks: 16-24 core server, 12-24 SATA disks of 2-4 TB each, 10 GbE adapter
     • iSCSI/NFS shared storage for vMotion, etc.
     • High-performance 10 GbE top-of-rack switch per rack
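To put the per-server disk configuration in perspective, here is a rough capacity calculation. It assumes HDFS's default three-way replication and ignores filesystem overhead and temporary space.

```python
# Per-server figures from the slide: 12-24 SATA disks of 2-4 TB each.
DISKS = (12, 24)
TB_PER_DISK = (2, 4)
# Assumption: data is stored with HDFS's default 3-way replication.
REPLICATION = 3

raw_tb = (DISKS[0] * TB_PER_DISK[0], DISKS[1] * TB_PER_DISK[1])  # (24, 96)
usable_tb = tuple(r / REPLICATION for r in raw_tb)                # (8.0, 32.0)

print(f"raw per server: {raw_tb[0]}-{raw_tb[1]} TB")
print(f"usable per server at {REPLICATION}x replication: {usable_tb[0]:.0f}-{usable_tb[1]:.0f} TB")
```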
  23. Scale-out Storage for Big Data
     [Chart: cost per GB ($0 to $5.50) vs. petabytes deployed (0.5 to 128 PB) for traditional SAN/NAS, scale-out NAS (Isilon, NTAP), and distributed object storage (HDFS, MapR, Ceph)]
  24. Big Data Storage
     Elastic compute over scale-out network storage, which provides:
     • Hadoop protocol access
     • POSIX apps with full NFS access
     • Snapshots
     • Replication
     • Erasure coding
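Erasure coding protects data with less overhead than full replication. The smallest possible example is a single XOR parity block, which tolerates the loss of any one block; production systems typically use stronger codes such as Reed-Solomon that survive multiple failures. The function names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal blocks (zero-padded) and append one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]

def recover(blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a single missing block (marked None) by XOR-ing all surviving blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    blocks = list(blocks)
    if missing:
        blocks[missing[0]] = reduce(xor_bytes, [b for b in blocks if b is not None])
    return blocks

original = b"shipment-records"
stored: list = encode(original, k=4)  # 4 data blocks + 1 parity block
stored[2] = None                      # simulate losing one block (e.g. a failed disk)
rebuilt = recover(stored)
# Stripping padding is fine here because the sample text has no trailing NUL bytes.
print(b"".join(rebuilt[:4]).rstrip(b"\0") == original)  # True
print(f"storage overhead: {len(stored) / 4:.2f}x vs 3.00x for triple replication")
```

With four data blocks and one parity block, the overhead is 1.25x, compared with 3x for keeping three full copies.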
  25. Big Data Using Scale-out NAS
     • Shared data: Isilon scale-out NAS, reached through each rack's top-of-rack switch
     • Temp data: local disk or SSD in each host, for transient data
  26. Chris Greer, FedEx Services
  27. Breakthrough Use Cases
     • Web log analysis: initial exploration focused on detecting mobile devices accessing the website. Analysis of 570 billion web server log entries took approximately 9 minutes on a small cluster.
     • ZIP code analysis: determining which ZIP codes are the largest sources or destinations of shipments.
     • Shipment analysis: finding patterns in shipment information that may delay a package.
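The web log use case maps naturally onto MapReduce. As a hedged sketch, the map step classifies each log line by its user-agent string and the reduce step sums the counts. The Apache combined log format and the `MOBILE_MARKERS` substrings are assumptions for illustration, not FedEx's actual logic, and a single-process loop stands in for Hadoop's distributed shuffle.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical heuristic: substrings that commonly appear in mobile user agents.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone", "iPad")

# In the Apache "combined" log format, the user agent is the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')

def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (device_class, 1) for each parseable log line."""
    match = USER_AGENT.search(line)
    if match is None:
        return  # skip malformed lines
    agent = match.group(1)
    device = "mobile" if any(m in agent for m in MOBILE_MARKERS) else "other"
    yield device, 1

def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: sum the counts emitted for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def run(lines: Iterable[str]) -> Dict[str, int]:
    # Single-process stand-in for the shuffle between map and reduce.
    return reduce_counts(pair for line in lines for pair in map_line(line))

sample = [
    '1.2.3.4 - - [01/Jan/2013:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) Mobile/10A403"',
    '5.6.7.8 - - [01/Jan/2013:00:00:01 +0000] "GET /track HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64)"',
    'garbled line without quotes',
]
print(run(sample))  # {'mobile': 1, 'other': 1}
```

Because each line is classified independently, the map step parallelizes across as many task slots as the cluster provides, which is what lets a small cluster work through billions of entries.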
  28. Agile Big Data at FedEx
     • Security: trusted isolation; a well-known, auditable platform
     • Agility: deploy in minutes; optimize for shifts in workload characteristics
     • Elasticity: true multi-tenancy; mixed workloads
  29. Hadoop Service at FedEx: vSphere + Isilon Storage
     • Scale-out Isilon cluster: shared data; NAS + Hadoop access
     • Elastic vSphere cluster: mixed workloads; vSphere on existing rack-mount servers
  30. Agility: Automation of Hadoop Cluster Management
     • Deploy
     • Resize: elastic scaling
     • Customize: incorporate best practices
     • Manage: tune configuration
     • Run: execute jobs; access HDFS
  31. Agility: Ease of Management Due to Consolidation
     Consolidation simplifies hardware procurement and sizing, cluster setup and provisioning, and monitoring.
  32. Elasticity: Mixed Workloads on a Shared Platform
     [Diagram: Dept A (Marketing) and Dept B (Operations), each with production, test, and experimentation clusters, drawing on log files, social data, transaction data, and historical data]
     • Common infrastructure: a common infrastructure can be shared by multiple logical Hadoop clusters and prioritized with VMware resource pools.
     • Data segregation: data that should not be shared can be kept separate, using VMware security controls for isolation.
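Resource pools let one shared cluster favor production over test and experimentation. The following is a simplified, hypothetical model of how shares, reservations, and limits interact, not vSphere's actual scheduler: each pool first gets its reservation, and the remaining capacity is split in proportion to shares, capped by each pool's demand and limit.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Pool:
    """Hypothetical resource-pool model; fields loosely mirror vSphere's settings."""
    name: str
    shares: int         # relative priority under contention (assumed positive)
    reservation: float  # guaranteed minimum (e.g. GHz of CPU)
    limit: float        # hard upper bound
    demand: float       # what the pool's VMs currently want

def allocate(capacity: float, pools: List[Pool]) -> Dict[str, float]:
    """Proportional-share allocation with reservations and limits (simplified model).

    Assumes admission control already ensured the sum of reservations <= capacity.
    """
    def cap(p: Pool) -> float:
        return min(p.demand, p.limit)

    # 1. Every pool first receives its reservation (no more than it can use).
    alloc = {p.name: min(p.reservation, cap(p)) for p in pools}
    remaining = capacity - sum(alloc.values())

    # 2. Split what is left in proportion to shares; a pool that reaches its cap
    #    drops out, and its unused portion is redistributed on the next pass.
    active = [p for p in pools if alloc[p.name] < cap(p)]
    while remaining > 1e-9 and active:
        total_shares = sum(p.shares for p in active)
        if total_shares == 0:
            break
        granted = 0.0
        for p in active:
            give = min(remaining * p.shares / total_shares, cap(p) - alloc[p.name])
            alloc[p.name] += give
            granted += give
        remaining -= granted
        active = [p for p in active if alloc[p.name] < cap(p) - 1e-9]
    return alloc

pools = [
    Pool("production", shares=4000, reservation=20, limit=100, demand=80),
    Pool("test", shares=2000, reservation=0, limit=100, demand=80),
    Pool("experimentation", shares=1000, reservation=0, limit=30, demand=200),  # the "hungry" tenant
]
result = allocate(100, pools)
print({name: round(value, 1) for name, value in result.items()})
# {'production': 65.7, 'test': 22.9, 'experimentation': 11.4}
```

The experimentation pool asks for 200 units but receives only its proportional share, so a hungry tenant cannot starve production.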
  33. Security
     • Known security model: VMs provide the required levels of isolation for different workloads
     • Trusted, auditable platform: leverage virtualization as the platform; known to auditors; accepted as a valid deployment model
  34. Summary
  35. Customers Winning from Consolidated Big Data Platforms
     • “Dedicated hardware makes no sense”
     • “Software-defined Datacenter enables rapid deployment multiple tenants and labs”
     • “Our mixed workloads include Hadoop, Database, ETL and App-servers”
     • “Any performance penalties are minor”
  36. Q&A
  37. Other VMware Activities Related to This Session
     • HOL-SDC-1309: vSphere Big Data Extensions
     • VAPP5484: Big Data Extensions Advanced Features
     • VAPP5626: Big Data Panel
  38. THANK YOU