
Key trends in Big Data and new reference architecture from Hewlett Packard Enterprise / Gilles Noisette (Hewlett Packard)


The rapid development of Big Data processing tools is giving rise to new approaches to improving performance. Key new technologies in Hadoop 2.0, such as YARN labeling and Storage Tiering, are already used by Yahoo and eBay. These technologies open the way to a substantial increase in the efficiency of IT infrastructure for Hadoop, with performance gains of several tens of percent and, at the same time, lower memory and power consumption.

HP's reference architecture for Hadoop, the HP Big Data Reference Architecture, proposes using specialized HP Moonshot "microservers" together with high-density HP Apollo storage nodes to achieve the best return on hardware currently available for Hadoop.

Published in: Engineering


  1. 1. A modern, flexible approach to Hadoop implementation: HPE Big Data Reference Architecture. Gilles Noisette, HPE EMEA Big Data Center Of Excellence, November 2015
  2. 2. Agenda • Big Data IT infrastructure trends • Hadoop Evolution & Architecture trends • Hadoop YARN Labelling • Hadoop Storage Tiering • New HPE Architecture approach to Big Data • HPE Big Data Reference Architecture • Scaling Hadoop more efficiently • HPE BDRA Components • HPE BDRA in a virtualized context • HPE Big Data Architecture long term view
  3. 3. IT infrastructures must evolve to handle Big Data demands. Challenges: • Multiple silos with multiple copies of the same data • Difficult to standardize on a consistent server architecture • Less elastic than other virtualized or converged infrastructure • Large scale makes density, cost and power problematic
  4. 4. The Analytic Cycle
  5. 5. The Pace of Change
  6. 6. The Pace of Change. And how people are buying Hadoop is changing too.
  7. 7. Hadoop YARN Labelling: running applications on a particular set of nodes. YARN Labelling (Node-labels / Hadoop 2.6 / JIRA YARN-796) is the capability to create groups of similar nodes so that different types of applications, each with a different workload, run on the most appropriate group of nodes. • Admin tags nodes with labels (e.g.: GPU, Storm) − One node can have more than one label (e.g.: GPU, m710) • Applications can include labels in container requests. Enabling the next generation of Hadoop applications. Diagram: an Application Master requests "I want a GPU"; NodeManager [Storm]; NodeManager [GPU, m710] on an HPE Moonshot cartridge; NodeManager [Analytic, XL170r] on HPE Apollo blades.
  8. 8. YARN Labels are used in production: YARN Labelling case studies. Yahoo uses YARN labels: • Vinod Vavilapalli (@Tshooter), 1:43 AM - 16 Apr 2015: "Yahoo! uses machines with GPUs on #Hadoop clusters (#YARN) to model 'beautiful' images on Flickr. #hadoopsummit" • Vinod Vavilapalli (@Tshooter), 1:49 AM - 16 Apr 2015: ".@pcnudde talking about #Yahoo using custom #Hadoop #YARN apps together with Node labels / High CPU machines for learning. #hadoopsummit" eBay clusters use YARN labels (Mayank Bansal, eBay) to: • Separate Machine Learning workloads from regular workloads • Separate licensed software to some machines • Enable GPU workloads • Separate organizational workloads
  9. 9. Hadoop Storage Tiering: Hadoop Architecture trends. HDFS Tiering / Heterogeneous Storage Tiers (HDFS-2832): allows a single cluster to have multiple storage tiers such as ARCHIVE, DISK, SSD, RAM-disk. Awareness of storage media allows HDFS to make better decisions about the placement of block data with input from applications. Distribution of replicas can be based on their performance and durability requirements. • Phase 2: – HDFS-5682 - Application APIs for heterogeneous storage – HDFS-7228 - SSD storage tier – HDFS-5851 - Memory as a storage tier. HDFS Archival Storage Design (HDFS-6584): – Introduces a new concept of storage policies. To accommodate future storage technology and different cluster characteristics, cluster administrators will be able to modify the predefined storage policies and/or define custom storage policies. – Data policy names: Very Hot → Hot → Warm → Lukewarm → Cold
  10. 10. eBay uses Tiered Storage for its Hadoop cluster: HDFS Tiering case study (Antony Benoy, eBay). • A 40 PB / 2000-node cluster was getting full. HDFS Tiering features: • Data reside on the same cluster in a standard HDFS • Data can easily move back and forth, to and from the Archive • Tiered storage is operated using storage types and storage policies • Archival policy is based on access pattern. Diagram: one HDFS with 40 PB / 2000 nodes of DISK and 10 PB / 48 nodes of ARCHIVAL.
  11. 11. Hadoop gets asymmetric: but I thought we were taking the work to the data… • YARN Labels: allow applications running in YARN containers to be constrained to designated nodes in the cluster (diagram labels: App A, App B, nodes, labels L1, Isolate) • HDFS Tiering: allows the creation of pools of storage for SSD, HDD, Archive and RAM-disk, leveraging different server configurations • Storage policies: Hot = all replicas on DISK; Warm = 1 replica on DISK, others on ARCHIVE; Cold = all replicas on ARCHIVE (diagram: a Hadoop cluster of DISK nodes and ARCHIVE nodes). What about data locality?
  12. 12. New complementary approach to address Big Data demands Storage Optimized Servers
  13. 13. Benefits of HPE Big Data Reference Architecture: HPE Moonshot and Apollo servers address a variety of enterprise big data needs. • Cluster consolidation: multiple big data environments can directly access a shared pool of data • Flexibility to scale: scale compute and storage independently • Maximum elasticity: rapidly provision compute without affecting storage • Breakthrough economics: significantly better density, cost and power through workload optimized components
  14. 14. DFSIO testing on Big Data Reference architecture Better numbers with optimized IO Servers for HDFS
  15. 15. HPE Big Data Reference Architecture: Hadoop and its ecosystem take advantage of the BDRA. Diagram labels: Ethernet Network Switches, East - West Networking, Impala
  16. 16. HPE Hadoop Traditional architecture versus HPE Big Data Reference Architecture: • 2X Hadoop MapReduce performance with the same footprint • 2.5X HBase performance with the same footprint • 2x higher density • 2.4x memory density • 46% less power (Watts). Note: comparison configuration is ProLiant DL380 Gen9 servers.
  17. 17. 1.5PB configuration example: comparable Hadoop performance and raw compute (SpecInt) power. BDRA compared to 2U rackmount: • Acquisition cost: 3% lower • Power: 54% lower • Density (total rack U): 2x density • 5 year power/cooling savings (assume $.20/kWh): $472K
  18. 18. Independent scaling of compute and storage, from HOT to COLD: Traditional Architecture [ HPE ProLiant DL380 Gen9 ] vs HPE Big Data Reference Architecture [ HPE Moonshot for Computing + HPE ProLiant Apollo 4200 for Storage ]. Three configurations shown: • 2.8x compute, 97% of the storage capacity, 4x the memory • 1.6x compute, 1.5x the storage capacity, 2.5x the memory • 90% of the compute, 2.1x the storage capacity, 1.5x the memory
  19. 19. HPE BDRA Components
  20. 20. HPE Big Data Reference Architecture scale-out building blocks: Hadoop performance density > 2 times better, power consumption = 0.5. • Storage optimized servers: HPE Apollo Scalable System, a cost-effective industry standard storage server purpose built for big data, with converged infrastructure that offers high density, energy-efficient storage • HPE Network Switches: East – West Networking • Compute optimized servers: HPE Moonshot System with 45 x m710 compute nodes; HPE Apollo 2200 with 4 x XL170r Gen9 high compute nodes (front and rear views)
  21. 21. HPE Moonshot 1500: 45 low-power Hadoop compute nodes per enclosure! • 45 hot-plug cartridges: 1-node cartridges = 45 servers in a chassis; 4-node cartridges = 180 servers in a chassis • 2 internal switches: HP Moonshot-45G (45 x 1Gb ports), HP Moonshot-180G (180 x 1Gb ports), HP Moonshot-45XG (45 x 10Gb ports) • Cartridges and workloads: web-cache, 64-bit ARM (m400); remote PCs, XenDesktop (m700); Big Data, Hadoop, video transcoding (m710p); real-time analytics, telecom, finance (m800); web-hosting, 180 servers in 4.3U (m350); dedicated hosting (m300) • Full web infrastructure in a single chassis
  22. 22. Big Data Compute Node
  23. 23. Big data Storage Node HPE Apollo 4200 - Bringing Big Data storage server density to enterprise
  24. 24. Big data Storage Node for Backup or Archival: HPE Apollo 4510 - Very High density Big Data storage server. Scalable density, lower TCO, workload optimized. • Rack-scale storage server density: up to 5.44 PB in a 42U rack • Cost effective: 68 LFF HDDs/SSDs in a 4U server chassis for low-cost, power- and space-efficient solutions • Configuration flexibility: balance capacity, cost and throughput with flexible options for disks, CPUs, I/O and interconnects
  25. 25. HPE BDRA in a Virtualized context: usage example
  26. 26. HPE BDRA used for multi-tenancy or Hadoop as a Service. Multi-tenancy or Hadoop as a Service, often based on a virtualized environment, is made easier by separating the data processing service from the storage management service, as this brings: – Better workload isolation between YARN applications – More flexibility by scaling compute and storage independently – Full elasticity on the computing side – Rapid provisioning and decommissioning of compute without affecting storage
  27. 27. HPE BDRA used in a fully elastic virtualized environment: compute and storage nodes are virtualized in a different manner. Diagram labels: Virtualization Hosts, Host VM, VMDK, Ext4, 3PAR F400, Hadoop DataNode, Hadoop Compute Node, BDRA Storage Node, BDRA Compute Nodes
  28. 28. Summarizing & HPE Big Data Architecture long term view
  29. 29. HPE Big Data Reference Architecture – The HPE BDRA is a complementary Hadoop reference architecture that brings • Elasticity → extreme elasticity brought to Hadoop • Flexibility → adaptive architecture that makes IT more responsive • Efficiency → scale compute and storage independently – It takes advantage of new Hadoop trends and features like • Hadoop YARN Labels • Hadoop HDFS Tiering – The target customers are • Mature Hadoop customers who want to consolidate clusters • People who need virtualization, multi-tenancy or elasticity, or who want to build a smart Data Lake • People who want to optimize density and power consumption (breakthrough economics) – The BDRA works with fully standard Hadoop stacks (no patches, not proprietary) • Cloudera Enterprise 5 • Hortonworks Data Platform 2 • MapR M5
  30. 30. HPE BDRA Optimized Compute & Storage nodes Support multiple compute and storage blocks
  31. 31. Converged Infrastructure benefits for Big Data Hadoop Node Labels feature (jira YARN-796) • Combined with the HPE Big Data Reference Architecture, compute nodes can be dynamically assigned as there is no need for data repartitioning • HPE contributed IP into the Hadoop trunk, working with Hortonworks • Allows scheduling of YARN containers to specific pools of nodes
  32. 32. HPE BDRA CI for Big Data long term view: evolve to support multiple compute and storage blocks. • Multi-temperature storage using HDFS Tiering and ObjectStores • Workload optimized compute nodes to accelerate various big data software
  33. 33. Thank you! Learn more on how your organization can benefit from HPE Big Data Reference Architecture: • HPE Big Data Reference Architecture: Overview • HPE Big Data Reference Architecture: Hortonworks implementation • HPE Big Data Reference Architecture: Cloudera implementation • HPE Big Data Reference Architecture: MapR implementation • Running HBase on the HPE Big Data Reference Architecture • http://www.hpe.com/go/hadoop
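
Slide 7 describes node labels as tags that an admin puts on nodes and that applications name in container requests. A minimal sketch of that matching idea follows. It is a toy model, not the YARN scheduler: the `Node` class, the `eligible_nodes` function and the node names `nm1` to `nm3` are hypothetical, and the rule that an unlabelled request only matches unlabelled nodes is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A cluster node with the labels an admin attached to it (hypothetical model)."""
    name: str
    labels: frozenset = frozenset()


def eligible_nodes(nodes, requested_label=None):
    """Return the nodes that may host a container asking for requested_label.

    Assumption: a request without a label only matches unlabelled nodes.
    """
    if requested_label is None:
        return [n for n in nodes if not n.labels]
    return [n for n in nodes if requested_label in n.labels]


# Labels taken from the slide's diagram; node names are made up.
cluster = [
    Node("nm1", frozenset({"Storm"})),
    Node("nm2", frozenset({"GPU", "m710"})),
    Node("nm3", frozenset({"Analytic", "XL170r"})),
]

# "I want a GPU": only the node tagged GPU qualifies.
gpu_hosts = [n.name for n in eligible_nodes(cluster, "GPU")]
print(gpu_hosts)
```

In this model the "I want a GPU" request from the slide resolves to the single node carrying the GPU label, which is the isolation effect the talk attributes to labels.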
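
Slide 11 defines three storage policies by where the replicas of a block go: Hot keeps all replicas on DISK, Warm keeps one on DISK and the rest on ARCHIVE, Cold keeps all on ARCHIVE. The sketch below just encodes that table. The function name `replica_storage_types` is hypothetical, the default of three replicas is an assumption (the usual HDFS default), and this is not HDFS code.

```python
def replica_storage_types(policy, replication=3):
    """Map a storage policy name to the storage type of each replica.

    Encodes only the Hot/Warm/Cold rules stated on the slide.
    replication=3 is an assumed default, not taken from the slides.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    name = policy.upper()
    if name == "HOT":
        return ["DISK"] * replication
    if name == "WARM":
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)
    if name == "COLD":
        return ["ARCHIVE"] * replication
    raise ValueError(f"policy not covered by this sketch: {policy}")


for policy in ("Hot", "Warm", "Cold"):
    print(policy, replica_storage_types(policy))
```

The Warm case is the interesting one for the data-locality question the slide raises: only one replica stays on a DISK node, so fewer nodes can read the block from local disk.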
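
Slide 17 quotes a 5-year power/cooling saving of $472K at $0.20/kWh. As a rough sanity check, that figure can be converted back into energy and an average power difference. The sketch assumes the saving is purely an energy bill, that the cluster runs continuously, and that a year has 8760 hours; none of these assumptions is stated on the slide.

```python
# Figures from the slide.
savings_usd = 472_000
price_usd_per_kwh = 0.20
years = 5

# Assumption: 24x7 operation, 365-day years.
hours = years * 8760

energy_saved_kwh = savings_usd / price_usd_per_kwh
avg_power_saved_kw = energy_saved_kwh / hours

print(f"energy saved: {energy_saved_kwh:,.0f} kWh")
print(f"average power difference: {avg_power_saved_kw:.1f} kW")
```

Under these assumptions the saving corresponds to about 2.36 million kWh over five years, or an average combined power and cooling difference of roughly 54 kW between the two configurations.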
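
Slide 24 gives two density figures for the Apollo 4510: 68 LFF drives per 4U chassis and up to 5.44 PB per 42U rack. The short calculation below shows how the two relate. It assumes the rack is filled with as many whole 4U chassis as fit and that PB and TB are decimal units; the per-drive capacity it derives is an inference, not a number from the slides.

```python
# Figures from the slide.
rack_units = 42
chassis_units = 4
drives_per_chassis = 68
rack_capacity_pb = 5.44

# Assumption: only whole chassis, nothing else in the rack.
chassis_per_rack = rack_units // chassis_units
drives_per_rack = chassis_per_rack * drives_per_chassis

# Assumption: decimal units (1 PB = 1000 TB).
tb_per_drive = rack_capacity_pb * 1000 / drives_per_rack

print(chassis_per_rack, "chassis,", drives_per_rack, "drives")
print(f"implied capacity per drive: {tb_per_drive:.1f} TB")
```

Ten chassis give 680 drives, so the 5.44 PB figure is consistent with 8 TB drives under these assumptions.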
