Breaching the 100TB Mark with SQL Over Hadoop

During the second half of 2016, IBM built a state-of-the-art Hadoop cluster with the aim of running massive scale workloads. The amount of data available to derive insights continues to grow exponentially in this increasingly connected era, resulting in larger and larger data lakes year after year. SQL remains one of the most commonly used languages for performing such analysis, but how do today’s SQL-over-Hadoop engines stack up to real BIG data? To find out, we decided to run a derivative of the popular TPC-DS benchmark using a 100 TB dataset, which stresses both the performance and SQL support of data warehousing solutions! Over the course of the project, we encountered a number of challenges such as poor query execution plans, uneven distribution of work, out of memory errors, and more. Join this session to learn how we tackled such challenges and the type of tuning that was required at the various layers in the Hadoop stack (including HDFS, YARN, and Spark) to run SQL-on-Hadoop engines such as Spark SQL 2.0 and IBM Big SQL at scale!

Speaker
Simon Harris, Cognitive Analytics, IBM Research


Transcript

  1. © 2017 IBM Corporation, Sept 2017. Breaching the 100TB mark with SQL over Hadoop: Analytics Performance Highlights (leadership in performance and scalability, partitioning options, memory and caching improvements, ORC file format enhancements). Simon Harris (siharris@au1.ibm.com), IBM Research. Priya Tiruthani (ntiruth@us.ibm.com), Big SQL Offering Manager
  2. 2. © 2017 IBM Corporation2 Acknowledgements and Disclaimers Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. © Copyright IBM Corporation 2017. All rights reserved. — U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM, the IBM logo, ibm.com, BigInsights, and Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. 
If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera. Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries. Other company, product, or service names may be trademarks or service marks of others.
  3. 3. © 2017 IBM Corporation3 Big SQL is the only SQL-on-Hadoop solution to understand SQL syntax from other vendors and products, including: Oracle, IBM Db2 and Netezza. For this reason, Big SQL is the ultimate hybrid engine to optimize EDW workloads on an open Hadoop platform What is IBM Big SQL?
  4. 4. © 2017 IBM Corporation4 Want to modernize your EDW without long and costly migration efforts Offloading historical data from Oracle, Db2, Netezza because reaching capacity Operationalize machine learning Need to query, optimize and integrate multiple data sources from one single endpoint Slow query performance for SQL workloads Require skill set to migrate data from RDBMS to Hadoop / Hive Do you have any of these challenges?
  5. 5. © 2017 IBM Corporation5 Here’s How Big SQL Addresses These Challenges  Compatible with Oracle, Db2 & Netezza SQL syntax  Modernizing EDW workloads on Hadoop has never been easier  Application portability (eg: Cognos, Tableau, MicroStrategy,…)  Federates all your data behind a single SQL engine  Query Hive, Spark and HBase data from a single endpoint  Federate your Hadoop data using connectors to Teradata, Oracle, Db2 & more  Query data sources that have Spark connectors  Addresses a skill set gap needed to migrate technologies  Delivers high performance & concurrency for BI workloads  Unlock Hadoop data with analytics tools of choice  Provides greater security while accessing data  Robust role-based access control and Ranger integration  Operationalize machine learning through integration with Spark  Bi-directional integration with Spark exploits Spark’s connectors as well as ML capabilities
  6. 6. © 2017 IBM Corporation6 IBM’s Big SQL Preserves Open Source Foundation Leverages Hive metastore and storage formats. No Lock-in. Data part of Hadoop, not Big SQL. SQL Execution Engines Big SQL (IBM) Hive (Open Source) Hive Storage Model (open source) CSV Parquet ORC Others…Tab Delim. Hive Metastore (open source) Applications
  7. 7. © 2017 IBM Corporation7 Big SQL queries heterogeneous systems in a single query - only SQL-on-Hadoop that virtualizes more than 10 different data sources: RDBMS, NoSQL, HDFS or Object Store Big SQL Fluid Query (federation) Oracle SQL Server Teradata DB2 Netezza (PDA) Informix Microsoft SQL Server Hive HBase HDFS Object Store (S3) WebHDFS Big SQL allows query federation by virtualizing data sources and processing where data resides Hortonworks Data Platform (HDP) Data Virtualization
  8. 8. © 2017 IBM Corporation8  Easy porting of enterprise applications  Ability to work seamlessly with Business Intelligence tools like Cognos to gain insights  Big SQL integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which allows:  Analyze assets  Utilize assets in jobs  Designate stewards for the assets Oracle SQL DB2 SQL Netezza SQL Big SQL SQL syntax tolerance (ANSI SQL Compliant) Cognos Analytics InfoSphere Metadata Asset Manager Big SQL is a synergetic SQL engine that offers SQL compatibility, portability and collaborative ability to get composite analysis on data Data Offloading and Analytics
  9. 9. © 2017 IBM Corporation9 BRANCH_A FINANCE (security admin)BRANCH_B Role Based Access Control enables separation of Duties / Audit Row Level Security Row and Column Level Security Big SQL offers row and column level access control (RBAC) among other security settings Data Security
  10. Themes
• Integration: bringing together the different components of the Hadoop ecosystem and making sure Big SQL offers enhanced capabilities and smooth interoperability.
• Performance: execution of queries, simple or complex, needs to complete with low latency. Big SQL continues to focus on improving query execution for all open source file formats.
• Usability & Serviceability: by simplifying the complexity of setup and troubleshooting that comes with the Hadoop ecosystem, users benefit from increased productivity in their use cases.
• Enterprise, Governance & Security: enterprise needs center on application portability and data security. Big SQL has high application portability and continues to enhance it, and also focuses on centralized governance and auditing.
  11. 11. © 2017 IBM Corporation11 • At a High Level: • Bi-directional Spark integration allows you to run Spark jobs from Big SQL • Ranger integration provide centralized security • Yarn integration for easy flex up/down of Big SQL workers • Integration with HDP v2.6.x Bi-directional Spark integration+ Ranger integration+ HDP integration+ Integration Big SQL v5.0.x focuses on providing integration with the following… YARN integration+
  12. 12. © 2017 IBM Corporation12 • At a High Level: • Launch multiple Big SQL workers on a node using Turbo Boost technology for SQL execution • Enhancements in handling ORC file format has shown marked increase in performance (at par with parquet file format) • New benchmark shows great performance ORC enhancement+ Performance Constant upgrades to performance helps Big SQL to lead in performance for complex queries Performance benchmarks+ Elastic Boost technology+
  13. 13. © 2017 IBM Corporation13 • At a High Level: • Oracle compatibility for application portability • Ranger integration provide centralized security Oracle compatibility+ Ranger integration+ Enterprise, Governance & Security
  14. 14. © 2017 IBM Corporation14 • At a High Level: • UI driven simple install with a few clicks • Data loaded for immediate use • Tutorial guided using Zeppelin Sandbox+ Big SQL Sandbox A single node sandbox to visualize data using Zeppelin
  15. 15. © 2017 IBM Corporation15 Right Tool for the Right Job Not Mutually Exclusive. Hive, Big SQL & Spark SQL can co-exist and complement each other in a cluster Big SQL Federation Complex Queries High Concurrency Enterprise ready Application portability All open source file formats Spark SQL Machine learning Data exploration Simpler SQL Hive In-memory cache Geospatial analytics ACID capabilities Fast ingest Ideal tool for Data Scientists and discovery Ideal tool for BI Data Analysts and production workloads Ideal tool for BI Data Analysts and production workloads
  16. 16. © 2017 IBM Corporation16 To Summarize - Core Themes of Big SQL SQL Compatibility •Understands different SQL dialects •Reuse skills and applications with less/no changes Federation •Connect to remote data sources •Query pushdown •Spark connectors for more data sources & ML models Performance •Execute all 99 TPCDS queries •Scales linearly with increased concurrency Enterprise & Security •Automatic memory management •Role/column based data security •SQL compatible with: •Applications work as-is without any changes •Federates to more than 10 data sources: RDBMS, NoSQL and/or Object Stores •Integrates bi-directionally with Spark, like no other •Operationalizes ML models •Exhibits high performance even when data scales up to 100TB with complex SQLs •Handles many concurrent users without relinquishing performance •Secures data using SQL with roles •Integrates with Ranger for centralized management Big SQL is the only SQL-on-Hadoop engine that……
  17. 17. © 2017 IBM Corporation17 Hadoop-DS @ 100TB Breaching the 100TB mark: The Environment F1 ClusterLoad Single Stream 4-Streams
  18. 18. © 2017 IBM Corporation18 About the Hadoop-DS Workload Aim: To provide the fairest and most meaningful comparison of SQL over Hadoop solutions Hadoop-DS benchmark is based on the TPC-DS* benchmark. Strives to follow latest (v2.3) TPC-DS specification whenever possible. Key deviations:  No data maintenance or data persistence phases - not possible across all vendors  Uses a sub-set of queries that all solutions can successfully execute at that scale factor  Queries are not cherry picked Is STILL the most complete TPC-DS like benchmark executed so far Includes database build, single stream run and multi-stream run First published in Oct 2014, using Big SQL, Impala and Hive This publication compares Big SQL v5.0 with Spark 2.1 and focuses on 4- stream run
  19. 19. © 2017 IBM Corporation19 What is TPC-DS?  TPC = Transaction Processing Council  Non-profit corporation (vendor independent)  Defines various industry driven database benchmarks…. DS = Decision Support  Models a multi-domain data warehouse environment for a hypothetical retailer Retail Sales Web Sales Inventory Demographics Promotions Multiple scale factors: 100GB, 300GB, 1TB, 3TB, 10TB, 30TB and 100TB 99 Pre-Defined Queries Query Classes: 1.Reporting 4.Ad Hoc2.Iterative OLAP 3.Data Mining TPC-DS now at Version 2.5 (http://www.tpc.org/tpcds/default.asp)
  20. 20. © 2017 IBM Corporation20 SO WHAT DOES IT TAKE TO BREACH 100TB F1 CLUSTER: DESIGNED FOR SPARK HARDWARE 28 LENOVO x3650 M5 NODES 100 GbE MELLANOX SN2700 448 TB SSD PCIe INTEL NVMe 1008 INTEL BROADWELL CORES 42,000 GB RAM SOFTWARE 4.2 IOP 7.2 RHEL 2.1 SPARK 5.0 Big SQL CLUSTER BANDWIDTH 375 GB/S NETWORK 480 GB/S DISK IO ALL 4 SEASONS OF HOUSE OF CARDS + ORANGE IS THE NEW BLACK LOADED INTO RAM IN 1 SEC DATA PREP 7 HOURS TO GENERATE 100TB RAW DATA 39 HOURS TO PARTITION AND LOAD (PARQUET) 39.7 TB ON-HDFS SIZE FOR PARQUET FILES COMPRESSION 60%SPACE SAVED WITH PARQUET ENERGY USE PER NODE 167 WATTS AT STAND-BY 560 WATTS AT PEAK LOAD 475 WATTS LOAD AVERAGE PEAK CPU USAGE 96% TOTAL 73.7% USER 22% SYS <1% IOWAIT
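The headline totals on this slide can be cross-checked against the per-node hardware on the backup "Cluster Details" slide (28 data nodes, 2x 18-core Broadwell sockets, 24x 64GB DIMMs and 8x 2TB NVMe SSDs per node). A quick arithmetic check in Python:

```python
# Sanity-check the F1 cluster totals quoted on this slide from the
# per-node specs on the backup "Cluster Details" slide.
NODES = 28
cores = NODES * 2 * 18        # 2 sockets x 18 Broadwell cores each
ram_gb = NODES * 24 * 64      # 24x 64GB DDR4 DIMMs per node
ssd_tb = NODES * 8 * 2        # 8x 2TB NVMe SSDs per node

print(cores)    # 1008 cores
print(ram_gb)   # 43008 GB (the slide rounds this to "42,000 GB RAM")
print(ssd_tb)   # 448 TB SSD

# Parquet compression: 100TB raw -> 39.7TB on HDFS
space_saved_pct = (1 - 39.7 / 100) * 100
print(round(space_saved_pct))   # 60% space saved
```

The cores and SSD totals match the slide exactly; the memory figure is quoted as roughly 42,000 GB.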
  21. HADOOP-DS @ 100TB: TUNING THE STACK
1. Low level machine tuning: file system optimization & mounts, network tuning, disable CPU scaling
2. HDFS Tuning: 5 properties, mainly for NameNode
3. MR2 Tuning (for data load): 7 properties; map/reduce memory, java heap size, io.sort.factor
4. Parquet <-> HDFS block alignment to reduce unnecessary io ops: block size = 128MB
5. HDFS Rebalance: balance data across nodes to reduce uneven load
6. YARN Tuning: 10 properties, mainly for container allocation
7. Spark SQL Tuning: 11 properties, plus 3 spark-submit properties
8. Big SQL Tuning (with elastic boost): 11 properties (5 are now used as the defaults)
Phases: basic cpu, io & network throughput tests; generate data; load data; re-load data; rebalance; Spark SQL queries; Big SQL queries.
Full details are in the backup slides.
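Step 4 (Parquet <-> HDFS block alignment) deserves a note: a Parquet row group that straddles an HDFS block boundary forces reads against two blocks, one of which may sit on a remote DataNode. A minimal sketch of the arithmetic, with illustrative sizes (not taken from the benchmark):

```python
# Why Parquet <-> HDFS block alignment cuts I/O ops: a row group that
# straddles an HDFS block boundary needs a second (possibly remote)
# block read. Offsets and sizes below are illustrative only.
MB = 1024 * 1024

def blocks_touched(offset, rowgroup, block=128 * MB):
    """Number of HDFS blocks a row group starting at byte `offset` spans."""
    first = offset // block
    last = (offset + rowgroup - 1) // block
    return last - first + 1

# Aligned: 128MB row groups on 128MB blocks -> exactly 1 block each.
print(blocks_touched(0 * MB, 128 * MB))    # 1
# Misaligned: the same row group starting mid-block spans 2 blocks.
print(blocks_touched(64 * MB, 128 * MB))   # 2
```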
  22. 22. © 2017 IBM Corporation22 100TB Hadoop-DS is BIGdata
  23. 23. © 2017 IBM Corporation23 100TB Database Build Parquet (with compression) was chosen as the storage format for Big SQL and Spark SQL Fact tables were partitioned (compliant) to take advantage of new ‘partition elimination thru join keys’ available in Big SQL v5 Both Big SQL and Spark SQL used exactly the same partitioned parquet files Spark SQL did not require the Analyze & Stats View build stages Load stage took ~39 hours  STORE_SALES table is heavily skewed on null partition (SS_SOLD_DATE_SK=NULL)  Most of time spent loading this null partition (~ 570GB, other partitions are ~20GB). In LOAD this is done by a single reducer  INSERT..SELECT.. using multiple logical Big SQL workers may be faster (we ran out of time before we could try it)
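The STORE_SALES skew described above (a ~570GB null partition versus ~20GB for typical partitions) is cheap to check for before loading. A hedged sketch of such a check; the partition sizes and the 5x-median threshold below are illustrative, not from the benchmark:

```python
from statistics import median

# Flag partitions far larger than the median, e.g. the ~570GB
# SS_SOLD_DATE_SK=NULL partition vs ~20GB typical partitions noted on
# this slide. Sizes (GB) and the threshold factor are illustrative.
def skewed_partitions(sizes_gb, factor=5):
    """Return partitions larger than `factor` x the median size."""
    med = median(sizes_gb.values())
    return {p: s for p, s in sizes_gb.items() if s > factor * med}

parts = {"SS_SOLD_DATE_SK=NULL": 570, "SS_SOLD_DATE_SK=2450816": 21,
         "SS_SOLD_DATE_SK=2450817": 19, "SS_SOLD_DATE_SK=2450818": 20}
print(skewed_partitions(parts))   # {'SS_SOLD_DATE_SK=NULL': 570}
```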
  24. 24. © 2017 IBM Corporation24 Query compliance through the scale factors  Spark SQL has made impressive strides since v1.6 to run all 99 TPC-DS compliant queries out of the box  But this is only at the lower scale factors  AT 100TB, 16 of the 99 queries fail with runtime errors or timeout (> 10 hours)  Big SQL has been successfully executing all 99 queries since Oct 2014  IBM is the only vendor that has proven SQL compatibility at scale factors up to 100TB  With compliant DDL and query syntax For an apples-to-apples comparison, the 83 queries which Spark could successfully execute were used for the comparison. Big SQLSpark SQL
  25. 25. © 2017 IBM Corporation25 Performance: Single Stream Run  Single stream run represents a power run  Interesting engineering exercise, but not representative of real life usage. Big SQL is 3.8x faster than Spark 2.1 for single stream run 27859 18145 68735 0 10000 20000 30000 40000 50000 60000 70000 80000 BIG SQL V5.0 BIG SQL V5.0 SPARK SQL V2.1 Total elapsed time (secs) Total elapsed time for Hadoop DS workload @ 100TB. Single stream. (shorter bars are better) 83 queries 99 queries 83 queries
  26. 26. © 2017 IBM Corporation26 Performance: 4 concurrent streams  Multi-stream query execution most closely represents real-life usage.  Analysis focus on 4-stream runs Big SQL is 3.2x faster than Spark 2.1 for 4 concurrent streams 81329 49217 155515 0 50000 100000 150000 200000 BIG SQL V5.0 BIG SQL V5.0 SPARK SQL V2.1 Total elapsed time (secs) Total elapsed time for Hadoop DS workload @ 100TB. 4 concurrent streams. (shorter bars are better) 83 queries*4 streams = 332 99 queries*4 streams = 396 queries 83 Q * 4 Strms = 332 queries
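The 3.8x and 3.2x figures on these two slides follow directly from the quoted elapsed times for the common 83-query subset:

```python
# Headline speedups derived from the quoted elapsed times (seconds)
# for the common 83-query subset at 100TB.
single_stream = {"big_sql": 18145, "spark_sql": 68735}
four_streams  = {"big_sql": 49217, "spark_sql": 155515}

print(round(single_stream["spark_sql"] / single_stream["big_sql"], 1))  # 3.8
print(round(four_streams["spark_sql"] / four_streams["big_sql"], 1))    # 3.2
```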
  27. CPU Profile for Big SQL vs. Spark SQL (Hadoop-DS @ 100TB, 4 Concurrent Streams)
• Average CPU consumption across the 4-stream run is 76.4% for Big SQL compared to 88.2% for Spark SQL
• Spark SQL uses almost 3x more system CPU; these are wasted CPU cycles
• Very little io wait time for both (SSDs are fast)
• Since the Spark SQL run is 3.2x longer than Big SQL, Spark SQL actually consumes more than 3x the CPU resources to complete the same workload
• Big SQL: some nodes have higher CPU consumption than others, showing imbalance in the distribution of work amongst the nodes
• Spark SQL: even distribution of CPU across the nodes, indicating work is more evenly distributed
• Big SQL is much more efficient in how it uses the available CPU resources
  28. 28. © 2017 IBM Corporation28 Big SQL vs Spark SQL Memory Consumption Hadoop-DS @ 100TB, 4 Concurrent Streams  Big SQL is only ‘actively’ using approx. 1/3rd of the available memory  Indicating more memory could be assigned to bufferpools and sort space etc…  So could Big SQL be even faster and/or support greater concurrency !!!  Spark SQL is doing a better job at utilizing the available memory, but consequently has less room for improvement via tuning Big SQL Spark SQL Active Inactive Free
  29. 29. © 2017 IBM Corporation29 I/O Activity: 4-streams.  Indicates Spark SQL needs to do more I/O to complete the workload, but when high I/O throughput is required, Big SQL can drive the SSDs harder than Spark SQL  Spark SQL is performing more I/O than Big SQL  Since Spark SQL run lasts 3.2x longer than Big SQL, Spark SQL is actually reading ~12x more data than Big SQL and writing ~30x more data. Indicates greater efficiency within the mature Big SQL optimizer & execution engine.
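Combining the quoted multipliers gives a feel for the average I/O rates: reading ~12x the data over a run that is only 3.2x longer means Spark SQL sustains a higher average read rate, even though Big SQL reaches the higher peak throughput:

```python
# Decompose the I/O multipliers quoted on this slide. Spark SQL's run
# is 3.2x longer, and in that time it reads ~12x and writes ~30x more
# data, so its *average* I/O rates are higher even though Big SQL
# achieves the higher peak throughput per node.
runtime_ratio = 3.2   # Spark SQL elapsed / Big SQL elapsed
read_ratio = 12       # total bytes read, Spark SQL / Big SQL
write_ratio = 30      # total bytes written, Spark SQL / Big SQL

print(round(read_ratio / runtime_ratio, 2))    # 3.75x average read rate
print(round(write_ratio / runtime_ratio, 2))   # 9.38x average write rate
```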
  30. 30. © 2017 IBM Corporation30 PERFORMANCE Big SQL 3.2x faster 4 concurrent query streams. HADOOP-DS @ 100TB: SUMMARY WORKING QUERIES CPU (vs Spark) Big SQL uses 3.7x less CPU I/O (vs Spark) Big SQL reads 12x less data Big SQL writes 30x less data COMPRESSION 60%SPACE SAVED WITH PARQUET AVERAGE CPU: 76.4% MAX I/O THROUGHPUT (per node): READ 4.4 GB/sec WRITE 2.8 GB/sec
  31. 31. © 2017 IBM Corporation31 Lessons Learned: General!  Building a brand-new cluster from the ground up is tough!  Full stack tuning required in order to get the cluster to a state capable of handling 100TB  Pay close attention to how the data is loaded, and think carefully about the partitioning scheme  Be cognizant of data skew - especially on your partitioning scheme Concurrency is much more difficult than single stream Complex queries pose a significant problem for most SQL over Hadoop solutions at scale  Near best performance often achieved in the first 5-8 runs, absolute best may take much longer – the 80:20 rule.
  32. 32. © 2017 IBM Corporation32 Lessons Learned: Spark SQL!  Spark SQL has come a long way, very quickly. BUT…  Success at lower scale factors does not guarantee success at higher scale factors  Significant effort required to tune failing queries Spark SQL still relies heavily (almost entirely) on manual tuning…  To get the best out of Spark SQL, the level of parallelism (num-executors) and memory assigned to them (executor-memory) needs to be tuned for each query, and the optimal values vary depending on how many other Spark queries are running in the cluster at that particular time.  Very difficult, if not impossible, to manage this in a production environment
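To make the tuning burden concrete, here is a deliberately simplified, hypothetical model of that per-query decision. The helper, the even resource split, and the cores-per-executor value are invented for illustration and are not from the benchmark:

```python
# Illustration of why per-query Spark tuning is hard to productionize:
# the "right" num-executors / executor-memory depend on how many other
# queries share the cluster at that moment. All numbers hypothetical.
CLUSTER_CORES = 1008
CLUSTER_MEM_GB = 42000

def executor_plan(concurrent_queries, cores_per_executor=4):
    """Naively split the cluster evenly across concurrent queries."""
    cores_per_query = CLUSTER_CORES // concurrent_queries
    num_executors = cores_per_query // cores_per_executor
    mem_per_executor_gb = CLUSTER_MEM_GB // concurrent_queries // num_executors
    return num_executors, mem_per_executor_gb

print(executor_plan(1))   # (252, 166): one stream gets the whole cluster
print(executor_plan(4))   # (63, 166): four streams get a quarter each
```

The point is that `num-executors` changes with concurrency, so a value tuned for a single-stream power run is wrong the moment a second query arrives.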
  33. 33. © 2017 IBM Corporation33 Lessons Learned: Big SQL!  4 query failures in early runs using product defaults:  Quickly eliminated via product tuning  Big SQL defaults changed as a result  Focused on hardening “Elastic Boost” capability to gain maximum throughput  Extensive development work in the Scheduler to evenly distribute work amongst the logical workers  Spare capacity (memory, cpu) could be better utilized  Could have done better!  Big SQL has unique tuning features to help with stubborn queries  Only a limited set of these are allowed by the Hadoop-DS rules, but could be deployed in production clusters
  34. 34. © 2017 IBM Corporation34 ORC performance evaluation V5.0.1 V2.1 LLAP on TEZ Hadoop-DS @ 10TB Load Single Stream HARDWARE 17 LENOVO x3650 M4 NODES 640 LOGICAL CORES 2,048 GB RAM 288 TB DISK SPACE 10 Gb ETHERNET6- Streams
  35. 35. © 2017 IBM Corporation35 PERFORMANCE: 6-streams BIG SQL 2.3X FASTER HADOOP-DS @ 10TB BIG SQL V5.0.1 AND HIVE 2.1 (LLAP WITH TEZ) AT A GLANCE: 85 COMMON QUERIES WORKING COMPLIANT QUERIES: 6-streams WORKLOAD SCALE FACTOR: 10 TB FILE FORMAT: ORC (ZLIB) CONCURRENCY: 6 STREAMS QUERY SUBSET: 85 QUERIES RESOURCE UTILIZATION: 6-STREAMS Big SQL: 1.5x FEWER CPU CYCLES USED STACK HDP 2.6.1 BIG SQL 5.0.1 HIVE 2.1 LLAP ON TEZ INTERESTING FACTS FASTEST QUERY 5.4X FASTER (BIG SQL: 1.5 SEC, HIVE: 8.1 SEC) SLOWEST QUERY (QUERY 67) 1.7X FASTER (BIG SQL: 6827 SEC, HIVE: 11830 SEC) BIG SQL FASTER FOR 80% OF QUERIES RUN PERFORMANCE: 1-stream BIG SQL 1.8X FASTER hrs hrs
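The "interesting facts" ratios are reproducible from the per-query times quoted on the slide:

```python
# Per-query ratios quoted on this slide (Big SQL 5.0.1 vs Hive 2.1
# LLAP on Tez, Hadoop-DS @ 10TB), derived from elapsed times in seconds.
fastest = {"big_sql": 1.5, "hive_llap": 8.1}      # fastest query
query67 = {"big_sql": 6827, "hive_llap": 11830}   # slowest query (Q67)

print(round(fastest["hive_llap"] / fastest["big_sql"], 1))   # 5.4
print(round(query67["hive_llap"] / query67["big_sql"], 1))   # 1.7
```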
  36. 36. © 2017 IBM Corporation36 WHY ??? Advanced Autonomics Self Tuning Memory Manager Integrated Work Load Manager World Class Cost Based Optimizer Query rewrite Advanced Statistics Advanced Partitioning Native Row & Columnar stores Hardened runtime Elastic Boost SQL Compatibility
  37. 37. © 2017 IBM Corporation37 So, what does all this boil down to ?  Data Scientists/Business Analysts can be 3-4 times more productive using Big SQL compared to Spark SQL.  With Big SQL, users can focus on what they want to do, and not worry about how it is executed. Proof points:  Able to successfully run all 99 TPC-DS queries @ 100TB in 4-concurrent streams  Performance leadership  Uses fewer cluster resources  Simpler configuration with mature self-tuning and workload management features  Big SQL is the best SQL over Hadoop engine for complex analytical workloads  No one else has published @100TB (or anywhere close)
  38. 38. © 2017 IBM Corporation38 Questions? https://developer.ibm.com/hadoop/category/bigsql/
  39. 39. © 2017 IBM Corporation39 Thank you!
  40. 40. © 2017 IBM Corporation Backup slides
  41. 41. © 2017 IBM Corporation41 SQL over Hadoop use cases SQL Adhoc data preparation for analytics. Federation Transactional with fast lookups. Fewer users. Adhoc queries and discovery. ELT and simple, large scale queries. Complex SQL. Many users. Deep analytics Operational Data Store Need to balance “best tool for the job” paradigm with maintainability and support costs. Big SQL Hive Spark SQL Big SQL Hbase Big SQL Phoenix Spark SQL Hive Big SQL Hbase Big SQL Phoenix But SQL can do so much more…
  42. 42. © 2017 IBM Corporation42 Big SQL v5 + YARN Integration Dynamic Allocation / Release of Resources Big SQL Head NM NM NM NM NM NM Slider Client YARN Resource Manager & Scheduler Big SQL AM Big SQL Worker Big SQL Worker Big SQL Worker Container YARN components Slider Components Big SQL Components Big SQL Worker Big SQL Worker Big SQL Worker Stopped workers release memory to YARN for other jobs Stopped workers release memory to YARN for other jobs Stopped workers release memory to YARN for other jobs Big SQL Slider package implements Slider Client APIs HDFS
  43. 43. © 2017 IBM Corporation43 Big SQL v5 Elastic Boost – Multiple Workers per Host More Granular Elasticity NM NM NM NM NM NM YARN Resource Manager & Scheduler Big SQL AM Container YARN components Slider Components Big SQL Components Big SQL Head Slider Client Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Big SQL Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker HDFS
  44. Cluster Details (F1 Spark cluster): Designed for Spark
• Totals across all Cluster Data nodes: 1,080 cores, 2,160 threads; 45TB memory; 100TB database storage replicated 3X plus temp, 480TB raw, 240 NVMe
• Hardware Details: the 100TB SQL Database requires a 28-node cluster: 2 management nodes (Lenovo x3650 M5), co-located with data nodes; 28 data nodes (Lenovo x3650 M5); 2 racks, 20x 2U servers per rack (42U racks); 1 switch, 100GbE, 32 ports, 1U (Mellanox SN2700)
• Each data node:
  CPU: 2x E5-2697 v4 @ 2.3GHz (Broadwell) (18c), Passmark: 23,054 / 46,108
  Memory: 1.5TB per server, 24x 64GB DDR4 2400MHz
  Flash Storage: 8x 2TB SSD PCIe NVMe (Intel DC P3700), 16TB per server
  Network: 100GbE adapter (Mellanox ConnectX-5 EN)
  IO Bandwidth per server: 16GB/s; Network bandwidth 12.5GB/s
  IO Bandwidth per cluster: 480GB/s; Network bandwidth 375GB/s
  45. Query compliance through the scale factors (cont.)
  • Almost half (7) of the Spark SQL 2.1 queries that fail at 100TB can be classified as complex in nature
    - No surprise, since Spark is a relatively immature technology
    - In line with findings from the original Hadoop-DS work in 2014
  • Big SQL's RDBMS heritage is the key to providing enterprise-grade SQL for complex analytical workloads
  • Any query that does not complete, or that requires modification or tuning, impacts business productivity and wastes valuable human and machine resources
  46. Hadoop tuning
  HDFS settings (default → 100TB):
  • NameNode Java heap: 4G → 20G
  • NameNode new generation size: 512M → 2.5G
  • NameNode maximum new generation size: 512M → 2.5G
  • Hadoop maximum Java heap size: 4G → 10G
  • DataNode max data transfer threads (helps HDFS data rebalance): 4096 → 16384
  MR2 settings, applicable to load operations (default → 100TB):
  • MapReduce Framework map memory: 2G → 35G
  • MapReduce Framework reduce memory: 4G → 69G
  • MapReduce sort allocation memory (helps with HDFS rebalancing): 1G → 2G
  • MR map Java heap size (MB): 1638 → 28262
  • MR reduce Java heap size (MB): 7840 → 56524
  • mapreduce.jobhistory.intermediate-done-dir: /var → /data15/var
  • mapreduce.task.io.sort.factor: 100 → 1000
  YARN settings (default → 100TB):
  • Percentage of physical CPU allocated for all containers: 80% → 90%
  • Number of virtual cores: 57 (80%) → 72
  • Minimum container size: 44G → 20G
  • ResourceManager Java heap size: 1G → 8G
  • NodeManager Java heap size: 1G → 8G
  • AppTimelineServer Java heap size: 1G → 8G
  • YARN Java heap size: 1G → 2G
  • yarn.resourcemanager.connect.retry-interval.ms (Advanced: Fault Tolerance): 30000 → 250
  • hadoop.registry.rm.enabled (Advanced yarn-site): false → true
  • yarn.client.nodemanager-connect.retry-interval-ms (Advanced yarn-site): 10000 → 250
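The Ambari-level names above map onto Hadoop configuration properties. As one illustration, the three advanced YARN settings correspond to yarn-site.xml entries like the following sketch (the heap and container sizes are normally managed through Ambari env settings instead):

```xml
<!-- Advanced yarn-site settings, 100TB column -->
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>250</value>
</property>
<property>
  <name>hadoop.registry.rm.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
  <value>250</value>
</property>
```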
  47. Spark tuning (default → 10TB → 100TB):
  • spark.rpc.askTimeout (s): 120 → 1200 → 36000
  • spark.kryoserializer.buffer.max (mb): 64 → 768 → 768
  • spark.yarn.executor.memoryOverhead (mb): 384 → 1384 → 8192
  • spark.driver.maxResultSize: 1G → 8G → 40G
  • spark.local.dir: /tmp → /data[1-10]/tmp → /data[1-10]/tmp
  • spark.network.timeout: 120 → 1200 → 36000
  • spark.sql.broadcastTimeout: 120 → 1600 → 36000
  • spark.buffer.pageSize: computed → computed → 64m
  • spark.shuffle.file.buffer: computed → computed → 512k
  • spark.memory.fraction: 0.6 → 0.8 → 0.8
  • spark.scheduler.listenerbus.eventqueue.size: 10K → 120K → 600K
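Expressed as a spark-defaults.conf fragment, the 100TB column would read roughly as below. This is a sketch: the unit suffixes follow Spark's usual property conventions, and spark.local.dir is left in the slide's /data[1-10] shorthand rather than expanded:

```properties
# 100TB-scale Spark settings from the slide, as spark-defaults.conf entries
spark.rpc.askTimeout                         36000s
spark.network.timeout                        36000s
spark.sql.broadcastTimeout                   36000
spark.kryoserializer.buffer.max              768m
spark.yarn.executor.memoryOverhead           8192
spark.driver.maxResultSize                   40g
spark.buffer.pageSize                        64m
spark.shuffle.file.buffer                    512k
spark.memory.fraction                        0.8
spark.scheduler.listenerbus.eventqueue.size  600000
# spark.local.dir: one directory per local disk (slide notation: /data[1-10]/tmp)
```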
  48. Big SQL tuning (default → 100TB):
  • Big SQL workers per node: 1 → 12
  • INSTANCE_MEMORY: 25% → 97%
  • DB2_CPU_BINDING: 25% → MACHINE_SHARE=94
  • DB2_EXT_TABLE_SETTINGS: DFSIO_MEM_RESERVE:20 → DFSIO_MEM_RESERVE:0
  • DFT_DEGREE: 8 → 4
  • SORTHEAP: computed → 4.4G
  • SHEAPTHRES_SHR: computed → 70G
  • Buffer pool size: computed → 15G
  • scheduler.cache.splits: false → true
  • scheduler.assignment.algorithm: GREEDY → MLN_RANDOMIZED
  • scheduler.dataLocationCount: computed → max:28
  • scheduler.maxWorkerThreads: computed → 8192
  (On the original slide, green highlighting marks defaults that will be changed in v5.0.)
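The scheduler.* entries are Big SQL scheduler properties rather than Db2 parameters. Assuming they are set in Big SQL's configuration file (bigsql-conf.xml; the exact file and mechanism are assumptions here), the 100TB values would look like:

```xml
<!-- Big SQL scheduler settings, 100TB column (configuration file assumed) -->
<property>
  <name>scheduler.cache.splits</name>
  <value>true</value>
</property>
<property>
  <name>scheduler.assignment.algorithm</name>
  <value>MLN_RANDOMIZED</value>
</property>
<property>
  <name>scheduler.dataLocationCount</name>
  <value>max:28</value>
</property>
<property>
  <name>scheduler.maxWorkerThreads</name>
  <value>8192</value>
</property>
```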
  49. What about other SQL-on-Hadoop TPC-DS benchmarks?
  • Cloudera, Sept 2016, Impala 2.6 on AWS: claims 42% more performant than AWS Redshift; 70-query subset; 3TB
  • Cloudera, August 2016, Impala 2.6: claims 22% faster for TPC-DS than previous version; 17 queries referenced; data volume not specified
  • Cloudera, April 2016, Impala 2.5: claims 4.3x faster for TPC-DS than previous version; 24-query subset; 15TB *2
  • Hortonworks, July 2016, Hive 2.1 with LLAP: claims 25x faster for TPC-DS than Hive 1.2; 15-query subset; 1TB
  • Radiant Advisors *1, June 2016, Impala 2.5 on CDH: 62 queries successful, 37 fail; 100GB / 1TB
  • Radiant Advisors *1, June 2016, Presto .141t on Teradata Hadoop Appliance (HDP, CDH): 78 successful, 21 fail; 100GB / 1TB
  • Radiant Advisors *1, June 2016, Hive 1.2.1 with Tez 0.7.0 on HDP: 63 successful, 35 fail; 100GB / 1TB
  No other vendor has demonstrated the ability to execute all 99 TPC-DS queries, even at lower scale factors.
