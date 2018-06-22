
How to use flash drives with Apache Hadoop 3.x: Real-world use cases and proof points for better results and better economics
Apache Hadoop 3.x ushers in major architectural advances such as erasure coding in HDFS, containerized workload flexibility, GPU resource pooling, and a host of other features. Combined with high-speed, high-capacity solid state drives (SSDs), these enhancements deliver real benefits.

Micron is a user of Apache Hadoop as well as an innovator in next-generation IT architecture, pushing the envelope on flash storage with the latest 3D NAND. Benchmarks from Micron's labs show that adding a single SSD to existing HDD-based cluster nodes can make Hadoop 200% faster, at a fraction of the cost of adding new nodes.

In this session, Micron and Hortonworks will show real-world results demonstrating the tangible benefits of combining Apache Hadoop 3.x with the latest non-volatile storage: an updated IT infrastructure built on NVMe™ solid state drives in well-designed platforms. We will explore specific workloads and applications accelerated by Apache Hadoop 3.x with SSDs, and how to build analytics platforms that deliver low latency and high-performance active archives, with better results and reduced storage overhead, for a sustainable competitive advantage across many applications.

Saumitra Buragohain, Hortonworks, Sr. Director, Product Management
Mike Cunliffe, Micron Technology, Data Management Architect

  1. How to Use Flash in Hadoop: Real-world Use Cases and Proof Points. Saumitra Buragohain, Hortonworks; Mike Cunliffe, Micron Technology; Greg Kincade, Micron Technology. ©2018 Micron Technology, Inc. All rights reserved. Information, products, and/or specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Statements regarding products, including regarding their features, availability, functionality, or compatibility, are provided for informational purposes only and do not modify the warranty, if any, applicable to any product. Drawings may not be to scale. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. All other trademarks are the property of their respective owners.
  2. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks or Micron to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  3. Agenda: Welcome and introductions; New in HDP 3.x; Micron SBU: Real-world testing on HDP; Micron IS: Hive running on flash storage; Q&A. Topics: ▪ Major HDP advances: erasure coding, GPU resource pooling, etc. ▪ Three use cases with flash storage in a Hadoop environment.
  4. Today’s Presenters: ▪ Saumitra Buragohain, Senior Director, Product Management, Hortonworks (Santa Clara, California); manages the Hadoop Data Products solution. ▪ Mike Cunliffe, Lead Data Architect, IS Group, Micron Technology (Boise, Idaho); oversees precision manufacturing and Big Data analysis for Micron fabs. ▪ Greg Kincade, Senior Ecosystem Enablement Program Manager, Micron Technology, Storage Business Unit (Austin, Texas); develops and manages Micron’s enterprise flash storage product portfolio.
  5. What’s New in HDP 3.x: Faster, Smarter, Hybrid Data
  6. Market Drivers: ▪ AGILITY: faster time to deployment (containerized microservices; App 1, App 2); release agility (decoupled HDP components). ▪ SCALE: infinitely scalable (billions of files, exabytes); low TCO (less storage overhead). ▪ DEEP LEARNING & GPU: deep learning frameworks (TensorFlow, Caffe); GPU pooling/isolation. ▪ REAL-TIME EDW: one SQL layer (across historical and real-time data). ▪ SECURED & GOVERNED: from data swamp to data lake. ▪ ISV STRATEGY: bring ISVs (Hadoop ecosystem); IBM DSX, third-party apps. Platform areas shown: Data Science, EDW, Security & Governance, Operational Data Store, HDP Core.
  7. HDP 3.0: Faster, Smarter, Hybrid Data
  8. HDP 3.0: Our At-Rest Platform for Global Data Management. [Chart: Hortonworks Data Platform component versions ("Ongoing Innovation in Apache") for HDP 2.5 (Aug 2016), HDP 2.6.4 (Q4 2017), HDP 2.6.5 (Q2 2018), and HDP 3.0.0 (Q3 2018), grouped into HDP Core, Enterprise Data Warehouse, Operational Data Store, Data Science, Stream Processing, Operations, Security, Governance, and HDP Search, with removed/moved components (Solr).] Notes: [1] HDP 2.6 shows the current Apache branches being used; final component versions are subject to change based on the Apache release process. [2] HDP 2.6 supports both Spark 1.6.3 and Spark 2.1 as GA. [3] Hive 2.1 is GA within HDP 2.6. [4] Apache Solr is available as an add-on product, HDP Search. [5] Spark 2.2 is GA.
  9. Use Cases: [Figure: storage tiers (Performance Solid State Drive, Capacity Solid State Drive, Archive, S3) plotted by access to data (0 days, 30 days, 90 days, forever) against probability of reuse (100% to 0%); panels labeled Performance Solid State Drive and Tiering.]
  10. Micron Technology: Both a User of HDP Hive and a Solution Provider
  11. Micron Performed an HDP 3.0 Beta Evaluation. Problem: Can solid state drives (SSDs) benefit the Hadoop ecosystem without wrecking the $/GB profile? Approach: Introduce a single TLC* SSD per node to accelerate application performance. Baseline test: Hortonworks cluster nodes with 15K hard disk drives, running Hive as the application. Comparison test: the same cluster with a single Micron NVMe TLC SSD added per node for caching YARN local/log directories to accelerate Hive queries. *TLC = triple-level cell, a common high-density SSD architecture.
  12. Test Configuration for HDP 3.0 Beta Evaluation. Hardware (all-HDD): 4x Supermicro SYS-2028-TNRT+ nodes; 12x 15K HDDs (300GB); 256GB Micron DDR4-2666 DRAM; 2x Broadcom 25GbE dual-port NICs. Hardware (HDD-SSD hybrid): 4x Supermicro SYS-2028-TNRT+ nodes; 12x 15K HDDs (300GB); 1x Micron 9200MAX NVMe SSD (3.4TB); 256GB Micron DDR4-2666 DRAM; 2x Broadcom 25GbE dual-port NICs. Software: RHEL 7.5 (7.5.1804); Hortonworks HDP 3.0 (beta); Hive / TPC-DS (94 queries). Test methodology: NVMe SSDs used for YARN cache; 2:1 ratio of data to memory (2TB database used with 822GB total cluster memory).
  13. Up to 2.5x Query Performance Improvement: long queries with shuffle and large aggregation yielded high gains with SSDs. TPC-DS Hive, longest-running queries (RunDAG time, 15K HDD vs. NVMe, with NVMe performance relative to all-HDD): Query 16: 9,950 s vs. 5,338 s (186%); Query 67: 3,593 s vs. 1,406 s (256%); Query 23: 2,981 s vs. 1,730 s (172%); Query 94: 2,596 s vs. 1,579 s (164%); Query 14: 2,316 s vs. 1,781 s (130%); Query 75: 1,436 s vs. 701 s (205%). The NVMe YARN cache has the greatest impact on queries that create shuffle.
  14. Question: What’s the performance gain with a single SSD per node? Answer: 1.7x the performance of the all-15K HDD cluster; a greater performance multiplier is expected when the comparison is to an all-7.2K HDD cluster. WaitIO %CPU: with 15K HDDs the bottleneck is disk I/O; with the NVMe SSD there is no disk I/O bottleneck. Across queries: highest gain 256%; lowest gain 96%; average 172%; 53 queries greater than 140%; 45 queries greater than 150%; 14 queries greater than 175%; 6 queries greater than 200%.
  15. Micron Technology IS: Real-World Uses of HDP and Hive in Manufacturing
  16. Micron’s Hadoop Environment for Semiconductor Manufacturing: 2 clusters; 506 data nodes; 28PB HDFS storage; 1,984 logical CPUs; 179TB memory; 64,222 Hive tables; 576K jobs/day; 10M Hive partitions.
  17. LLAP Architecture. Benefits of the LLAP architecture include: ▪ LLAP uses persistent daemons to avoid long startup times and deliver fast SQL. ▪ LLAP shares its cache among all SQL users, maximizing the use of this scarce resource. ▪ LLAP is 100% compatible with existing Hive SQL and Hive tools. [Diagram: SQL queries (ODBC/JDBC) go to HiveServer2 (query endpoint) and its query coordinators; LLAP daemons (query executors) run in a YARN cluster and share an in-memory/NVMe cache across all users; deep storage is HDFS and compatible stores (S3, WASB, Isilon).]
  18. Cache Options. [Chart, log scale: cost per TB¹, maximum sustainable transfer rate (MB/s), and average latency (ns) for HDD SATA (Seagate 7200 RPM), NVMe SSD (Micron 9200), and memory (Crucial).] Callouts: 40x latency, 16x throughput, 14x cost; 300,000x latency, 89x throughput, 390x cost. ¹Retail cost from CDW and Crucial; Seagate ST6000NM0235 6TB 7200 RPM.
  19. HDP LLAP Test Configuration. Cluster: 10 data nodes; 256GB memory/node; 40 logical CPUs/node; 72TB SATA 7200RPM storage/node (12 drives); 3.2TB NVMe storage/node (1 drive). Memory cache: hive.llap.daemon.num.executors: 12; hive.llap.io.thread: 12; hive.llap.daemon.yarn.container.mb: 106496; llap_heap_size: 78643; llap_headroom_space: 12288; hive.llap.io.memory.size: 12244. NVMe cache: hive.llap.daemon.num.executors: 12; hive.llap.io.thread: 12; hive.llap.daemon.yarn.container.mb: 106496; llap_heap_size: 78643; llap_headroom_space: 12288; hive.llap.io.memory.size: 1048576; hive.llap.io.allocator.mmap: TRUE; hive.llap.io.allocator.mmap.path: /opt/hadoop/cache/01/hadoop/hive/llap; yarn.nodemanager.local-dirs: /opt/hadoop/cache/01/hadoop/yarn/local.
  20. TPC-DS 1TB Results (Normalized): 3.7x average performance increase with LLAP + NVMe; 2.6x average performance increase with LLAP.
  21. Simple Scale-Out Model: How much would be gained by adding an NVMe SSD to existing nodes? A 200-node cluster plus NVMe SSD cache is the equivalent of 80 additional nodes.
  22. Thank you for joining us. Visit with Hortonworks & Micron in the Exhibit Hall. Hortonworks.com | Micron.com/bigdata
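The HDP 3.0 beta evaluation slide says the single NVMe SSD per node was used for caching YARN local/log directories. A minimal sketch of the corresponding yarn-site.xml fragment, assuming a hypothetical NVMe mount point `/mnt/nvme0` (not from the slides). `yarn.nodemanager.local-dirs` and `yarn.nodemanager.log-dirs` are the standard NodeManager properties for these directories; the subdirectory layout is an illustrative assumption. The script only prints XML and does not touch a cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount point for the single NVMe SSD on each node (assumption, not from the slides).
NVME_MOUNT = "/mnt/nvme0"


def yarn_site_overrides(mount: str) -> str:
    """Return a yarn-site.xml <configuration> fragment placing YARN local and log dirs on `mount`."""
    props = {
        # Localized resources and intermediate data (such as shuffle output) for containers.
        "yarn.nodemanager.local-dirs": f"{mount}/hadoop/yarn/local",
        # Container log files.
        "yarn.nodemanager.log-dirs": f"{mount}/hadoop/yarn/log",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(yarn_site_overrides(NVME_MOUNT))
```

Both properties accept a comma-separated list of directories; a single path mirrors the one-SSD-per-node setup in the test.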
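On the "Up to 2.5x Query Performance Improvement" slide, each percentage equals the 15K HDD run time divided by the NVMe run time for that query. The pairing of times to queries is inferred from those ratios, because the chart layout did not survive extraction. A small check under that assumption:

```python
# (15K HDD seconds, NVMe seconds) per query; pairing inferred by matching
# each ratio to the percentage printed on the slide.
QUERY_TIMES = {
    "Query 16": (9950, 5338),
    "Query 67": (3593, 1406),
    "Query 23": (2981, 1730),
    "Query 94": (2596, 1579),
    "Query 14": (2316, 1781),
    "Query 75": (1436, 701),
}


def relative_performance(hdd_seconds: int, nvme_seconds: int) -> int:
    """NVMe performance as a percentage of the all-HDD run: 100 * HDD time / NVMe time, rounded."""
    return round(100 * hdd_seconds / nvme_seconds)


if __name__ == "__main__":
    for query, (hdd, nvme) in QUERY_TIMES.items():
        print(f"{query}: {hdd} s vs {nvme} s -> {relative_performance(hdd, nvme)}%")
```

The largest value, Query 67 at 256%, is the source of the "up to 2.5x" headline.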
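The "What’s the performance gain?" slide summarizes per-query gains as a highest value, a lowest value, an average, and counts above several thresholds. A sketch of that kind of summary, assuming "average" means the arithmetic mean of per-query percentages (the slide does not say how it averaged). The sample input is only the six percentages from the longest-running-queries slide; the 94-query figures cannot be recomputed from the data shown.

```python
def summarize(percentages, thresholds=(140, 150, 175, 200)):
    """Highest, lowest, arithmetic mean, and count of values strictly above each threshold."""
    return {
        "highest": max(percentages),
        "lowest": min(percentages),
        "average": sum(percentages) / len(percentages),
        "greater_than": {t: sum(p > t for p in percentages) for t in thresholds},
    }


# Six per-query percentages from the longest-running-queries slide (illustration only).
SAMPLE = [186, 256, 172, 164, 130, 205]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```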
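The HDP LLAP Test Configuration slide lists a memory-cache profile and an NVMe-cache profile that share most settings. A sketch holding both as plain Python dicts (values copied from the slide as strings; the slide gives no units) and printing only the settings that change. The dict-and-diff structure is illustrative, not an Ambari or Hive API.

```python
# Settings shared by both LLAP profiles on the slide, copied verbatim.
COMMON = {
    "hive.llap.daemon.num.executors": "12",
    "hive.llap.io.thread": "12",
    "hive.llap.daemon.yarn.container.mb": "106496",
    "llap_heap_size": "78643",
    "llap_headroom_space": "12288",
}

MEMORY_CACHE = {**COMMON, "hive.llap.io.memory.size": "12244"}

NVME_CACHE = {
    **COMMON,
    "hive.llap.io.memory.size": "1048576",
    "hive.llap.io.allocator.mmap": "TRUE",
    "hive.llap.io.allocator.mmap.path": "/opt/hadoop/cache/01/hadoop/hive/llap",
    "yarn.nodemanager.local-dirs": "/opt/hadoop/cache/01/hadoop/yarn/local",
}


def diff_profiles(base, target):
    """Return {key: (base_value, target_value)} for keys whose values differ; None means unset."""
    keys = sorted(set(base) | set(target))
    return {k: (base.get(k), target.get(k)) for k in keys if base.get(k) != target.get(k)}


if __name__ == "__main__":
    for key, (old, new) in diff_profiles(MEMORY_CACHE, NVME_CACHE).items():
        print(f"{key}: {old} -> {new}")
```

Per the Hive configuration documentation, the `hive.llap.io.allocator.mmap` settings let the LLAP cache be mapped onto flash storage under the given path. That is consistent with `hive.llap.io.memory.size` growing from 12244 to 1048576 without changing the daemon's container or heap size.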
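The TPC-DS 1TB results slide reports 3.7x with LLAP + NVMe and 2.6x with LLAP alone. Dividing the two gives a rough estimate of what the NVMe cache adds on top of the LLAP memory cache, assuming both averages are normalized to the same baseline (the slide does not name it). A ratio of two averages only approximates the average of per-query ratios.

```python
def incremental_speedup(with_llap_and_nvme: float, with_llap_only: float) -> float:
    """Speedup of LLAP + NVMe over LLAP alone, assuming both share the same normalization baseline."""
    return with_llap_and_nvme / with_llap_only


if __name__ == "__main__":
    print(f"{incremental_speedup(3.7, 2.6):.2f}x")  # prints 1.42x
```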
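The Simple Scale-Out Model slide equates 200 NVMe-upgraded nodes with 80 additional plain nodes, but does not state the per-node multiplier behind that figure. Assuming linear scale-out (cluster throughput proportional to node count times per-node speed), the implied multiplier and the inverse calculation can be sketched as:

```python
def implied_speedup(nodes: int, equivalent_extra_nodes: int) -> float:
    """Per-node multiplier implied by 'N upgraded nodes match N + extra plain nodes' (linear scaling)."""
    return (nodes + equivalent_extra_nodes) / nodes


def equivalent_extra_nodes(nodes: int, speedup: float) -> float:
    """Plain nodes needed to match `nodes` upgraded nodes with the given multiplier (linear scaling)."""
    return nodes * (speedup - 1)


if __name__ == "__main__":
    s = implied_speedup(200, 80)
    print(s)  # 1.4
    print(round(equivalent_extra_nodes(200, s)))  # 80
```

Under this assumption, the slide’s figure corresponds to each upgraded node doing about 1.4x the work of a plain node.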
