SCALING IMPALA
Manish Maheshwari | Strata London 2019 | #StrataData
manish@cloudera.com
2 © Cloudera, Inc. All rights reserved.
AGENDA
• Impala overview
• KRPC Improvements
• Scaling issues and solutions
• Understanding query profiles
• Key Takeaways
Apache Impala
• Open source
• Fast
  • Massively parallel processing (MPP)
  • C++, runtime code generation, streaming
• Flexible
  • Multiple storage engines (HDFS, S3, ADLS, Apache Kudu, …)
  • Multiple file formats (Parquet, Text, Sequence, Avro, ORC, …)
• Enterprise-grade
  • Authorization, authentication, lineage tracing, auditing, encryption
  • >1,400 customers, >97,000 machines
• Scalable (now even more!)
  • Large clusters with 400+ nodes
Impala Architecture
[Diagram: each Impala daemon contains a Java frontend (query compiler, query coordinator, metadata cache) and a C++ backend (query executor). A shared Catalog service and StateStore distribute metadata sourced from the Hive MetaStore, Sentry, and the HDFS NameNode. Storage layers: HDFS, Kudu, S3, ADLS, HBase.]
Select Query Flow in Impala
• Request arrives via ODBC/JDBC
• Planner turns the request into a collection of plan fragments
• Coordinator initiates execution on remote Impala daemons
• Intermediate results are streamed between Impala daemons
• Query results are streamed back to the client
[Diagram: a SQL application connects over ODBC/JDBC to one daemon's coordinator, which distributes plan fragments to the executors on the other daemons.]
What's KRPC?
• Replaces Thrift RPC for inter-daemon communication from CDH 5.15+ for certain RPCs
• Reduces the number of connections in the cluster
  • Reduces stress on the MIT KDC / AD
• KRPC supports both synchronous and asynchronous RPCs
• KRPC supports connection multiplexing
  • One connection per direction between every pair of hosts
  • Fixed-size thread pool maintained internally by KRPC
• Query performance during concurrent execution is up by 2x-3x on average
KRPC Performance, Stability, and Throughput
[Benchmark charts comparing Thrift RPC and KRPC; not reproduced in this transcript.]
Scaling Issues
• I've got 20 more use cases to onboard
• And our data volumes just went up 10x
• And we have 10x more tables and 100x more partitions
• And we have 10x more concurrent queries
• And our ETL is now real-time
• Sure, just add more nodes, right?
• OK, did that too… my queries are still 3x slower ☹
Metadata/Catalog Cache Woes
• The Impala catalog and daemons cache the HMS metadata plus HDFS block locations
• Memory = (num tables * 5 KB) + (num partitions * 2 KB) + (num files * 750 B) + (num file blocks * 300 B) + sum(incremental column stats per table)
• Incremental stats
  • For each table: num columns * num partitions * 400 B
• e.g., a large telco has over 56K tables
  • and partitions and files and blocks and replicas…
• Catalog memory approx. 80 GB; GC times go up significantly
  • Reduced memory available for query execution
  • OOM issues on the catalogd
• Long metadata loading times; long time for the StateStore to send the catalog to all the daemons
• Slow DDL and DML statements; even `describe table` requires the catalog cache for the whole table
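The sizing formula above can be turned into a quick back-of-the-envelope estimator. This is a hypothetical helper, not tooling from the deck; the per-object constants are the rough figures quoted on this slide, not exact measurements:

```python
def catalog_memory_bytes(tables, partitions, files, blocks,
                         cols_per_table=0, parts_per_table=0):
    """Rough catalog/daemon metadata-cache estimate using the slide's constants:
    5 KB per table, 2 KB per partition, 750 B per file, 300 B per block,
    plus ~400 B per (column, partition) pair of incremental stats."""
    base = tables * 5_000 + partitions * 2_000 + files * 750 + blocks * 300
    incr_stats = tables * cols_per_table * parts_per_table * 400
    return base + incr_stats

# 56K tables, 100 partitions each, 10 files per partition, 1 block per file,
# 50 columns per table with incremental stats:
est = catalog_memory_bytes(tables=56_000, partitions=5_600_000,
                           files=56_000_000, blocks=56_000_000,
                           cols_per_table=50, parts_per_table=100)
print(f"{est / 1e9:.1f} GB")  # → 182.3 GB
```

Note how the incremental-stats term (columns × partitions × 400 B) alone dominates here, which is why compressed incremental stats and on-demand metadata matter at this scale.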
Metadata/Catalog Cache Woes
[Diagram: the CatalogD loads metadata for tables A, B, C, and D from the HMS and the NameNode; the StateStore then broadcasts the full catalog (A, B, C, D) to every daemon, so each daemon caches all tables' metadata regardless of which tables it actually queries.]
Metadata/Catalog Cache - Solutions
• Regularly merge small files
• And run a Refresh Table
• Avoid data ingestion processes that produce many small files
• Use a larger block size (can be over 2GB)
• Optimal partitioning strategy
• Don’t over or under partition
• HDFS file handle cache
• Consider using HBase, Kudu, …
• If nothing works, just delete the data 😜
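One common way to merge small files is to rewrite the data from within Impala and then refresh the affected partition. A minimal sketch, assuming a Parquet table `events` partitioned by a string column `ds` (all names hypothetical):

```sql
-- Compact one partition into a staging table using a single writer.
SET NUM_NODES=1;              -- one writer => fewest, largest output files
CREATE TABLE events_compact LIKE events;
INSERT OVERWRITE events_compact PARTITION (ds)
  SELECT * FROM events WHERE ds = '2019-04-30';

-- If files were instead merged by an external tool (Hive, Spark, distcp),
-- tell Impala to reload the file and block metadata for that partition:
REFRESH events PARTITION (ds = '2019-04-30');
```

The staging-table detour avoids reading and overwriting the same partition in a single statement; swap or copy the compacted data back as a separate step.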
Dedicated Coordinators
• Coordinators compile the queries and create the execution plans
  • Need table metadata!!
• Executors execute the query plans and send the results to other executors / the coordinator
  • Do not need table metadata!!
• Benefits
  • Executors need less memory
  • The StateStore does not need to send metadata to all nodes
  • Faster metadata updates and propagation
  • Coordinator nodes don't need to be datanodes
Dedicated Coordinators
[Diagram: only the coordinator nodes receive the catalog (A, B, C, D) from the CatalogD via the StateStore; the executor nodes run query fragments without caching any table metadata.]
16 © Cloudera, Inc. All rights reserved.
Dedicated Coordinators – Best Practices
• Rule of thumb: 1 coordinator per 50 executors
• Start with just one coordinator and run it on an edge node (it can run on datanodes too)
  • Needs at least a few disks to write any spill data
• Add another coordinator when CPU / network utilization > 80%
• When using a load balancer, double the coordinator count, but set only half as active and the rest as backup
  • Use sticky connections on the load balancer
• Increase fe_service_threads on the coordinators to handle client connections
• Increase the Java heap of the coordinators according to the catalog size
• Increase num_metadata_loading_threads
  • Default is 16; increase it if you have lots of tables and partitions
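As a configuration sketch, the split above maps to the impalad startup flags below. The role flags are standard Impala flags; the numeric values are illustrative starting points, not recommendations from the deck:

```
# Dedicated coordinator (values illustrative):
--is_coordinator=true
--is_executor=false
--fe_service_threads=256           # more threads for client connections
--num_metadata_loading_threads=32  # default 16; raise for many tables/partitions

# Dedicated executor:
--is_coordinator=false
--is_executor=true
```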
Further Catalog Cache Improvements (In Beta)
• On-demand metadata for coordinators from the CatalogD
• Metadata LRU cache (forget old tables)
• Metadata release on memory pressure
• Smart cache invalidation
• Compressed incremental stats
• See IMPALA-7127 for the full list of improvements
[Diagram: each coordinator fetches and caches only the tables it actually touches — `describe A` on coordinator #1 caches only A, `describe B` on #2 only B, `describe C` on #3 only C — pulling metadata on demand from the CatalogD, which is kept current via HMS notifications.]
HMS Notifications
[Diagram: the Hive Metastore sends lightweight notifications to the Catalog service, which pulls the changed metadata (plus block locations from the HDFS NameNode) and propagates it through the StateStore to the query coordinators.]
Metadata on-demand (In Beta)
• Use the local catalog cache
  • Set --catalog_topic_mode=minimal on the catalog daemons
  • Set --use_local_catalog=true on the coordinators
• Time-based catalog cache eviction
  • Set invalidate_tables_timeout_s on both catalogd and coordinators
  • e.g., invalidate_tables_timeout_s=3600 will invalidate tables that are older than 1 hour
• Memory-based catalog cache eviction
  • Set invalidate_tables_on_memory_pressure=true on both catalogd and impalad
  • When memory pressure reaches 60% of the JVM heap size after a Java garbage collection in catalogd, Impala invalidates 10% of the least recently used tables
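Put together, a minimal on-demand-metadata configuration from the flags above would look like this (flag spellings as given on this slide; the timeout value is the slide's one-hour example):

```
# catalogd flags:
--catalog_topic_mode=minimal
--invalidate_tables_timeout_s=3600
--invalidate_tables_on_memory_pressure=true

# coordinator impalad flags:
--use_local_catalog=true
--invalidate_tables_timeout_s=3600
--invalidate_tables_on_memory_pressure=true
```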
Impala Architecture – Coordinators and Executors
[Diagram: one or more Impala coordinators keep the Java frontend (query compiler, query coordinator, metadata cache); the executor nodes run only the C++ query executor. Metadata still flows from the Hive MetaStore, Sentry, and the HDFS NameNode via the Catalog service and StateStore; storage remains HDFS, Kudu, S3, ADLS, HBase.]
Admission Control Woes
• Impala Admission Control not enabled / default memory limit not set for each pool
• Heuristics-based memory estimation is not 100% accurate, and worse if table stats are unavailable
  • group by estimates can be particularly off when there is a large number of group by columns:
  • Mem estimate = NDV of group by column 1 * NDV of group by column 2 * … * NDV of group by column n
• Under-admission due to higher-than-required memory reserved on each daemon
• Queries will OOM out unnecessarily
  • Query Status: Admission for query exceeded timeout 60000ms in pool root.nprd_tst_hadoop_data_appl_readonly. Queued reason: Not enough aggregate memory available in pool root.nprd_tst_hadoop_data_appl_readonly with max mem resources 150.00 GB. Needed 40.00 GB but only 30.00 GB was available.
• Always enable IAC
• Limit the amount of memory used by an individual query with a per-query mem_limit
  • Set it from impala-shell / Hue: set mem_limit=<per query limit>
• Set a default memory limit per pool
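The multiplicative estimate above blows up quickly. A tiny sketch (with hypothetical per-column NDVs) of how a planner-style product grows with each added group-by column:

```python
from math import prod

# Hypothetical per-column NDVs for a GROUP BY over 4 columns,
# e.g. customer_id, day, store, product_line:
ndvs = [10_000, 365, 200, 50]

# Planner-style cardinality estimate: product of the group-by columns' NDVs.
for n in range(1, len(ndvs) + 1):
    print(n, "columns ->", prod(ndvs[:n]), "estimated groups")

# The 4-column estimate is 36,500,000,000 groups, even though the real data
# may contain far fewer combinations -- which inflates the memory estimate
# and leads to under-admission.
```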
Query Concurrency Woes
• Impala Admission Control is decentralized: each coordinator makes an independent decision based on the last known running queries on the cluster, as communicated by the StateStore
• This makes IAC decisions fast, but they can be a little imprecise during times of heavy load across many daemons. This is called over admission.
• Using dedicated coordinators limits this over admission of queries!!
• You get controlled query concurrency, so each query runs faster and overall query throughput is higher
• Ideal total query concurrency = number of cores on the executors / datanodes
Impala Resource Pools
Resource pool design:
• 10 Impala daemons, 200 GB per daemon – 2 TB total, 8 tenants
• OK, let's divide the memory among the tenants. Everyone gets what they pay for. Good design, right?
Issues:
• Unused memory cannot be used by other tenants
• Busy tenants queue up queries in admission control, causing overall "slowness" in query execution
• Small tenants running large queries will spill to disk until the spill-to-disk limit and eventually OOM out
• How bad does it get? 25+ tenants, only 2/3 active at any given time
Scaling Impala Resource Pools
• Design resource pools according to peak memory needed
• Use simple grouping to create small, medium, and large pools
• Use the Cloudera Manager APIs for user chargeback if needed
Metadata Operations
• Invalidate Metadata
  • Runs async to discard the loaded metadata from the catalog cache; a metadata load will be triggered by any subsequent queries
  • Should be run when:
    • New tables are created / tables are dropped by Hive/Spark
    • Block locations are changed by the HDFS balancer
• Recover Partitions
  • Scans HDFS to check whether any new partition directories were added, and caches block metadata for those files
• Refresh Table / Refresh Table Partition
  • After adding/removing/overwriting files in partitions via Hive/Spark
  • After running operations like ALTER TABLE
  • Reloads metadata for the table from the HMS and does an incremental reload of the file and block metadata
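In Impala SQL, the three operations above look like this (table and partition names hypothetical):

```sql
-- Table created or dropped outside Impala, or block locations moved:
INVALIDATE METADATA sales;          -- prefer table-scoped over global

-- Partition directories added directly on HDFS:
ALTER TABLE sales RECOVER PARTITIONS;

-- Files added/removed/overwritten in an existing partition by Hive/Spark:
REFRESH sales PARTITION (ds = '2019-04-30');
```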
Scaling Metadata Operations
• How bad does it get?
  • What not to do: 18K Invalidate Metadata calls per day
• Best practices:
  • No metadata commands are needed if operations/ETL run in Impala
  • Always run REFRESH <table> <partition> when adding data
  • Run RECOVER PARTITIONS when partitions are added
  • REFRESH TABLE for other changes
  • Limit INVALIDATE METADATA to a specific <table> only
Automatic Metadata Sync (In Beta)
• The CatalogD polls Hive Metastore (HMS) notification events
• Invalidates tables when it receives ALTER TABLE events, or when their partitions are altered, added, or dropped
• Adds tables or databases when it receives CREATE TABLE or CREATE DATABASE events
• Removes tables from the catalogd when it receives DROP TABLE or DROP DATABASE events
• Operations that do not generate events in HMS, such as adding new data to existing tables/partitions from Spark, are not supported
  • i.e., Load / Insert still needs a refresh of the table partition
Automatic Metadata Sync (In Beta)
• To disable the event-based HMS sync for a new database, set the impala.disableHmsSync database property in Hive
  • CREATE DATABASE <name> WITH DBPROPERTIES ('impala.disableHmsSync'='true');
• To enable or disable the event-based HMS sync for a new table
  • CREATE TABLE <name> … TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false');
• To change the event-based HMS sync at the table level
  • ALTER TABLE <name> SET TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false');
• When both table- and database-level properties are set, the table-level property takes precedence
• If the property is changed from true (events are skipped) to false (events are not skipped), issue a manual INVALIDATE METADATA command to reset
Scaling Compute Stats
• Compute Stats is very CPU-intensive; its cost depends on the number of rows, the number of data files, the total size of the data files, and the file format
• For partitioned tables, the numbers are calculated per partition, and as totals for the whole table
• Limit the column list: only compute stats on columns involved in filters, join conditions, group by, or partition by clauses
• Re-compute stats only when there is > 30% data change
• Run compute stats on weekends/nights; it is not needed after every data load
• If you reload a completely new set of data for a table, but the number of rows and the number of distinct values for each column are relatively unchanged, you do not need to recompute stats for the table
• Use enable_stats_extrapolation (experimental)
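Column-scoped and partition-scoped stats, as suggested above, can be computed like this (table and column names hypothetical; the column-list form requires a reasonably recent Impala release):

```sql
-- Only the columns used in joins/filters/group by need column stats:
COMPUTE STATS sales (customer_id, ds, amount);

-- Partitioned tables: compute incrementally, one partition at a time:
COMPUTE INCREMENTAL STATS sales PARTITION (ds = '2019-04-30');
```

Remember the memory cost noted earlier: incremental stats add roughly 400 B per column per partition to the catalog, so scope them deliberately.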
Set Statistics Manually
• Quick fix as part of data load, while compute stats can be scheduled on weekends
• Set total number of rows. Applies to both unpartitioned and partitioned tables.
  • alter table <table_name> set tblproperties('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true');
• Set total number of rows for a specific partition. Applies to partitioned tables only. You must specify all the partition key columns in the PARTITION clause.
  • alter table <table_name> partition (keycol1=val1, keycol2=val2, ...) set tblproperties('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true');
• Column stats:
  • ALTER TABLE <table_name> SET COLUMN STATS <col_name> ('numDVs'='100')
  • Compute numDVs with "SELECT NDV(col)"
Other Scalability Considerations
• Use star schemas and integer join keys
• Check for hot spotting; increase the replication factor for master data / frequently queried data
• Avoid casts, implicit or explicit (easily over 10% improvement for larger volumes)
• Increase RUNTIME_FILTER_WAIT_TIME_MS for complicated queries, but coordinators will need to do more work
• Use the HDFS file handle cache
• Give the OS enough free memory to cache data blocks
• Set a default compression codec; it improves disk read performance
• Use high-CPU nodes with fast processors
• For multiple Impala clusters, DistCp the data from the remote cluster
BI Tools
• Always, always close queries
  • idle_query_timeout = 60
  • idle_session_timeout = 1800
• Use handcrafted SQL
• Use different pools for different queries and encourage use of set mem_limit
• Use JDBC over the Kerberos authentication mechanism
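The two timeouts above are impalad startup flags; shown as a configuration sketch with the deck's values:

```
# impalad flags:
--idle_query_timeout=60       # cancel queries idle (no fetch) for 60 s
--idle_session_timeout=1800   # expire sessions idle for 30 min
```

Per-query memory caps can then be set from the client session before running a query, e.g. `SET MEM_LIMIT=2g;` (the 2 GB value is illustrative).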
Understanding Query Profiles
• Impala query profiles can be retrieved from Cloudera Manager, the Impala coordinator web UI, or from the command line by executing `profile`
• Includes nanosecond timers for all operations on all nodes
• Quite detailed and exhaustive, but the basics are easy
• We can easily answer:
  • What's the bottleneck for this query?
  • Why is this run fast but that run slow?
  • How can I tune this query to improve its performance?
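From the command line, this looks like the following (coordinator hostname hypothetical):

```
# Run a query, then dump its profile from the same impala-shell session:
impala-shell -i coordinator-host.example.com
> select count(*) from sales;
> profile;

# Or print the profile automatically after every query with -p:
impala-shell -i coordinator-host.example.com -p -q 'select count(*) from sales'
```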
Understanding Query Profiles
• Always check the Impala version and the default query options set
• Check warnings
• Check the query state – running, cancelled
• Check the query type – Query, DDL, etc.
Understanding Query Profiles
• Check per-node peak memory usage
  • Tells you what the memory limit for these queries should be
  • Shows skews in memory usage among nodes
Understanding Query Profiles
• For completed queries, read the summary in detail
• Check what's taking the most time and the most memory; check for skews in the data
• Check row estimates; depending on whether stats are available, these can be skewed
• Check the join order; it is determined entirely by estimated total size (#rows * column width)
  • Try to ensure that, after partition pruning, the RHS is smaller than the LHS
• Broadcast joins are the default; partitioned joins are used for large tables of roughly equal size
Understanding Query Profiles
• Read the query timeline in detail
• Check which step is taking the most time, and why
• Usual culprits:
  • Metadata load
  • Completed admission
  • ClientFetchWaitTimer
  • First dynamic filter received
  • Last row fetched
Understanding Query Profiles
• Check each plan fragment
  • Tells us what it did and how many hosts it ran on
  • How much data it processed
  • Partition pruning stats for HDFS scans
  • Parquet predicate push-down
Key Takeaways
• Always use dedicated coordinators/executors
• IAC should be enabled and memory limits set
• Metadata management is significantly improved
  • Zero-touch metadata coming soon
• Follow best practices for Impala queries and performance tuning; refer to the Impala Cookbook
Rate today’s session

standardisation of garbhpala offhgfffghh
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 

Strata London 2019 Scaling Impala.pptx

  • 1. SCALING IMPALA Manish Maheshwari | Strata London 2019 | #StrataData manish@cloudera.com
  • 2. 2 © Cloudera, Inc. All rights reserved. AGENDA • Impala overview • KRPC Improvements • Scaling issues and solutions • Understanding query profiles • Key Takeaways
  • 3. 3 © Cloudera, Inc. All rights reserved. • Open source • Fast • Massively parallel processing (MPP) • C++, run time code generation, streaming • Flexible • Multiple storage engines (HDFS, S3, ADLS, Apache Kudu, …) • Multiple file formats (Parquet, Text, Sequence, Avro, ORC, …) • Enterprise-grade • Authorization, authentication, lineage tracing, auditing, encryption • >1400 customers, >97000 machines • Scalable (Now even more!) • Large clusters with 400+ nodes Apache Impala
  • 4. 4 © Cloudera, Inc. All rights reserved. HDFS Kudu S3 HBase Impala Architecture Query Compiler Query Executor Query Coordinator Metadata HDFS NameNode StateStore Catalog FE (Java) BE (C++) Metadata Cache Impala Daemons Metadata Execution Storage ADLS Sentry Hive MetaStore Query Compiler Query Executor Query Coordinator Metadata Query Compiler Query Executor Query Coordinator Metadata
  • 5. HDFS 0 1 Query Compiler Query Executor Query Coordinator Metadata Kudu S3/ADLS HBase HDFS Kudu S3/ADLS HBase HDFS Kudu S3/ADLS HBase Impala Daemons SQL App ODBC/JDBC 2 3 4 5 6 7 • Request arrives via ODBC/JDBC • Planner turns request into collection of plan fragments • Coordinator initiates execution on remote Impala daemons • Intermediate results are streamed between Impala daemons • Query results are streamed back to the client Select Query flow in Impala Query Compiler Query Executor Query Coordinator Metadata Query Compiler Query Executor Query Coordinator Metadata Impala Daemons Impala Daemons
  • 6. 6 © Cloudera, Inc. All rights reserved. What’s KRPC? • Replaces Thrift RPC for inter-daemon communication from CDH 5.15+ for certain RPCs • Reduces # of connections in the cluster • Reduces stress on MIT KDC / AD • KRPC supports both synchronous and asynchronous RPCs • KRPC supports connection multiplexing • One connection per direction between every pair of hosts • Fixed-size thread pool maintained internally by KRPC • Query performance during concurrent execution improves by 2x-3x on average
  • 7. 7 © Cloudera, Inc. All rights reserved. KRPC Performance
  • 8. 8 © Cloudera, Inc. All rights reserved. KRPC Stability
  • 9. 9 © Cloudera, Inc. All rights reserved. KRPC Throughput
  • 10. 10 © Cloudera, Inc. All rights reserved. Scaling Issues • I got 20 more use cases to onboard • And our data volumes just went up 10x • And we have 10x more tables and 100x more partitions • And we have 10x more concurrent queries • And our ETL is now real time • Sure, just add more nodes, right? • OK, did that too… my queries are still 3x slower ☹
  • 11. 11 © Cloudera, Inc. All rights reserved. Metadata/Catalog Cache Woes • The Impala catalog and daemons cache the HMS metadata + HDFS block locations • Memory = num of tables * 5KB + num of partitions * 2KB + num of files * 750B + num of file blocks * 300B + sum(incremental col stats per table) • Incremental stats • For each table, num columns * num partitions * 400B • e.g. A large telco has over 56K tables • and partitions and files and blocks and replicas… • Catalog memory approx. 80 GB, GC times go up significantly • Reduced memory for query execution • OOM issues on the catalogd • Long metadata load times, long time for the StateStore to send the catalog to all the daemons • Slow DDL and DML statements; even `describe table` requires the catalog cache for the whole table
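The sizing formula above can be turned into a quick back-of-the-envelope calculator. A minimal sketch (the per-object byte costs are the rough figures quoted on the slide, not exact constants, and the incremental-stats term is approximated with an average column count):

```python
# Rough catalog-memory estimator based on the per-object costs quoted above:
# ~5 KB per table, ~2 KB per partition, ~750 B per file, ~300 B per block,
# plus ~400 B per (column x partition) for incremental stats.

def catalog_memory_bytes(tables, partitions, files, blocks,
                         avg_cols_per_table=0):
    base = (tables * 5 * 1024 +
            partitions * 2 * 1024 +
            files * 750 +
            blocks * 300)
    # Incremental stats: num columns * num partitions * 400 B, summed over
    # tables; approximated here with an average column count.
    incr_stats = avg_cols_per_table * partitions * 400
    return base + incr_stats

# Example: 56K tables, 2M partitions, 20M files, 40M blocks, 50 cols on average.
gb = catalog_memory_bytes(56_000, 2_000_000, 20_000_000, 40_000_000, 50) / 1024**3
print(f"{gb:.1f} GB")
```

With these inputs the estimate lands in the tens of gigabytes, consistent with the ~80 GB figure cited for a large deployment.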
  • 12. 12 © Cloudera, Inc. All rights reserved. Metadata/Catalog Cache Woes Daemon #1 CatalogD A,B,C,D Table A HMS Table B Table C Table D Namenode Daemon #1 A,B,C,D Daemons A,B,C,D Daemon #1 A,B,C,D Daemon #1 A,B,C,D Daemons A,B,C,D Daemon #1 A,B,C,D Daemon #1 A,B,C,D Daemons A,B,C,D StateStore Daemon
  • 13. 13 © Cloudera, Inc. All rights reserved. Metadata/Catalog Cache - Solutions • Regularly merge small files • And run a Refresh Table • Avoid data ingestion processes that produce many small files • Use a larger block size (can be over 2GB) • Optimal partitioning strategy • Don’t over or under partition • HDFS file handle cache • Consider using HBase, Kudu, … • If nothing works, just delete the data 😜
  • 14. 14 © Cloudera, Inc. All rights reserved. Dedicated Coordinators • Coordinators - compile the queries and create the execution plan • Need table metadata!! • Executors - execute the query plans and send the results to other executors / the coordinator • Do not need table metadata!! • Benefits • Executors need less memory • Statestore does not need to send metadata to all nodes • Faster metadata updates and propagation • Coordinator nodes don’t need to be datanodes
  • 15. 15 © Cloudera, Inc. All rights reserved. Dedicated Coordinators Coordinators CatalogD A,B,C,D Table A HMS Table B Table C Table Z Namenode Executors Executors StateStore Daemon Coordinators A,B,C,D Executors Executors Executors Executors
  • 16. 16 © Cloudera, Inc. All rights reserved. Dedicated Coordinators – Best Practices • Rule of thumb – 1 coordinator per 50 executors • Start with just one coordinator. Run it on an edge node. (Can run on datanodes too) • Needs at least a few disks to write any spill data • Add another coordinator when CPU / network utilization > 80% • When using a load balancer, double the coordinator count, but set only half as active and the rest as backup • Use sticky connections on the load balancer • Increase fe_service_threads on the coordinators to handle client connections • Increase the Java heap of the coordinators according to the catalog size • Increase num_metadata_loading_threads • Default is 16. Increase if you have lots of tables and partitions.
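The sizing rules above can be sketched as a small helper. This is a hypothetical convenience function, not an Impala tool; it just encodes the 1-per-50 ratio and the double-behind-a-load-balancer guidance from this slide:

```python
import math

def coordinators_needed(num_executors, behind_load_balancer=False):
    """Rule of thumb: 1 coordinator per 50 executors (minimum 1).
    Behind a load balancer, provision double: half active, half backup."""
    active = max(1, math.ceil(num_executors / 50))
    total = active * 2 if behind_load_balancer else active
    return {"active": active, "total": total}

print(coordinators_needed(120))                             # 3 active coordinators
print(coordinators_needed(120, behind_load_balancer=True))  # 3 active + 3 backup
```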
  • 17. 17 © Cloudera, Inc. All rights reserved. Further Catalog Cache Improvements (In Beta) Coordinator #1 CatalogD Coordinator #2 CatalogD A B impala-shell> describe A impala-shell> describe B Table A HMS + Namenode Table B Table C Table Z • On-Demand metadata for coordinators from the CatalogD • Metadata LRU cache (forget old tables) • Metadata release on memory pressure • Smart Cache Invalidation • Compressed Incremental Stats See IMPALA-7127 for full list of improvements HMS Notifications Coordinator #3 C impala-shell> describe C
  • 18. 18 © Cloudera, Inc. All rights reserved. HMS Notifications Hive Metastore HDFS NameNode StateStore Catalog Metadata 2 3 Lightweight Notifications Query Coordinators Coordinators Lightweight Notifications HMS Notifications Metadata Query Coordinators
  • 19. 19 © Cloudera, Inc. All rights reserved. Metadata on-demand (In Beta) • Use the local catalog cache • Set --catalog_topic_mode = minimal on the catalog daemons • Set --use_local_catalog = true on the coordinators • Time-based catalog cache eviction • Set invalidate_tables_timeout_s on both catalogd and coordinators • E.g., invalidate_tables_timeout_s = 3600 will invalidate tables that are older than 1 hour • Memory-based catalog cache eviction • Set invalidate_tables_on_memory_pressure = true on both catalogd and impalad • When memory pressure reaches 60% of the JVM heap size after a Java garbage collection in catalogd, Impala invalidates 10% of the least recently used tables.
  • 20. 20 © Cloudera, Inc. All rights reserved. Impala Architecture – Coordinators and Executors HDFS Kudu S3 HBase Query Compiler Query Executor Query Coordinator Metadata Query Executor HDFS NameNode StateStore Catalog FE (Java) BE (C++) Metadata Cache Impala Coordinator (‘s) Impala Executors Metadata Execution Storage Query Executor Query Executor Query Executor Query Executor Query Executor Query Executor Query Executor Query Executor Query Executor ADLS Sentry Hive MetaStore
  • 21. 21 © Cloudera, Inc. All rights reserved. Admission Control Woes • Impala Admission Control not enabled / default memory limit not set for each pool • Heuristics-based memory estimation, not 100% accurate. Worse if table stats are unavailable • group by’s estimate can be particularly off when there’s a large number of group by columns • Mem estimate = NDV of group by column 1 * NDV of group by column 2 * … NDV of group by column n • Under-admission due to higher than required memory reserved on each daemon • Queries will OOM out unnecessarily • Query Status: Admission for query exceeded timeout 60000ms in pool root.nprd_tst_hadoop_data_appl_readonly. Queued reason: Not enough aggregate memory available in pool root.nprd_tst_hadoop_data_appl_readonly with max mem resources 150.00 GB. Needed 40.00 GB but only 30.00 GB was available. • Always enable IAC • Limit the amount of memory used by an individual query using a per-query mem_limit • Set it from impala-shell / Hue - set mem_limit=<per query limit> • Set a default memory limit per pool
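The group-by estimate described above is essentially a product of per-column NDVs, which is why it blows up with many grouping columns. A minimal illustration of the naive product (real Impala applies caps and row-count limits; this only shows why the estimate can far exceed the actual row count):

```python
from functools import reduce
from operator import mul

def groupby_cardinality_estimate(ndvs):
    """Worst-case planner-style estimate: product of the NDVs of the
    group-by columns, as in the formula on the slide above."""
    return reduce(mul, ndvs, 1)

# Three group-by columns with NDVs 1000, 500 and 365 already estimate
# 182.5M distinct groups, even if the table has far fewer rows.
est = groupby_cardinality_estimate([1000, 500, 365])
print(est)
```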
  • 22. 22 © Cloudera, Inc. All rights reserved. Query Concurrency Woes • Impala Admission Control is decentralized; each coordinator makes an independent decision based on the last known running queries on the cluster, as communicated by the StateStore • This makes IAC decisions fast, but they might be a little imprecise during times of heavy load across many daemons. This is called over-admission. • Using dedicated coordinators limits this over-admission of queries!! • You get controlled query concurrency, so each query runs faster and overall query throughput is higher • Ideal total query concurrency = # cores on the executors / datanodes
  • 23. 23 © Cloudera, Inc. All rights reserved. Impala Resource Pools Resource Pools Design - • 10 Impala daemons, 200 GB per daemon – 2 TB total, 8 tenants • OK, let’s divide the memory among the tenants. Everyone gets what they pay for. Good design, right? Issues • Unused memory cannot be used by other tenants • Busy tenants queue up queries in admission control, causing overall "slowness" in query execution • Small tenants running large queries will spill to disk until the spill-to-disk limit and eventually OOM out • How bad does it get - 25+ tenants, only 2/3 active at any given time
  • 24. 24 © Cloudera, Inc. All rights reserved. Scaling Impala Resource Pools • Design resource pools according to peak memory needed • Use simple grouping to create small, medium and large pools • Use Cloudera Manager API’s for user chargeback if needed
  • 25. 25 © Cloudera, Inc. All rights reserved. Metadata Operations • Invalidate Metadata • Runs async to discard the loaded metadata catalog cache; a metadata load will be triggered by any subsequent queries • Should be run when • New tables are created / tables dropped by Hive/Spark • Block locations changed by the HDFS load balancer • Recover Partitions • Scans HDFS to check if any new partition directories were added, and caches block metadata for those files • Refresh Table / Refresh Table Partition • Adding/removing/overwriting files in partitions via Hive/Spark • Running operations like ALTER TABLE • Reloads metadata for the table from HMS and does an incremental reload of the file and block metadata
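The which-command-for-which-change decision above can be captured in a small lookup. This is an illustrative mapping only, not an Impala API; the event names are hypothetical labels for the situations listed on the slide:

```python
# Hypothetical event names -> the metadata command recommended above.
METADATA_COMMANDS = {
    "table_created_outside_impala": "INVALIDATE METADATA <table>",
    "table_dropped_outside_impala": "INVALIDATE METADATA <table>",
    "block_locations_changed":      "INVALIDATE METADATA <table>",
    "partition_dirs_added_on_hdfs": "ALTER TABLE <table> RECOVER PARTITIONS",
    "files_added_to_partition":     "REFRESH <table> PARTITION (<spec>)",
    "table_altered_outside_impala": "REFRESH <table>",
    "etl_ran_inside_impala":        None,  # no command needed
}

def metadata_command(event):
    """Return the metadata command for a change made outside Impala."""
    return METADATA_COMMANDS[event]

print(metadata_command("files_added_to_partition"))
```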
  • 26. 26 © Cloudera, Inc. All rights reserved. Scaling Metadata Operations • How bad does it get? • 18K invalidate metadata commands per day • What to do instead • No commands are needed if operations/ETL run in Impala • Always run refresh <table> <partition> when adding data • Recover partitions when partitions are added • Refresh Table for other changes • Limit Invalidate Metadata to a specific table only
  • 27. 27 © Cloudera, Inc. All rights reserved. Automatic Metadata Sync (In Beta) • CatalogD polls Hive Metastore (HMS) notification events • Invalidates tables when it receives ALTER TABLE events, or when their partitions are altered, added, or dropped • Adds tables or databases when it receives CREATE TABLE or CREATE DATABASE events • Removes tables from catalogd when it receives DROP TABLE or DROP DATABASE events • Operations that do not generate events in HMS, such as adding new data to existing tables/partitions from Spark, are not supported • i.e., Load / Insert still needs a refresh <table> <partition>
  • 28. 28 © Cloudera, Inc. All rights reserved. Automatic Metadata Sync (In Beta) • To disable the event-based HMS sync for a new database, set the impala.disableHmsSync database property in Hive • CREATE DATABASE <name> WITH DBPROPERTIES ('impala.disableHmsSync'='true'); • To enable or disable the event-based HMS sync for a table • CREATE TABLE <name> ... TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false'); • To change the event-based HMS sync at the table level • ALTER TABLE <name> SET TBLPROPERTIES ('impala.disableHmsSync'='true' | 'false'); • When both table- and database-level properties are set, the table-level property takes precedence. • If the property is changed from true (meaning events are skipped) to false (meaning events are not skipped), issue a manual INVALIDATE METADATA command to reset.
  • 29. 29 © Cloudera, Inc. All rights reserved. Scaling Compute Stats • Compute Stats is very CPU-intensive – based on the number of rows, number of data files, total size of the data files, and the file format • For partitioned tables, the numbers are calculated per partition, and as totals for the whole table • Limit the number of columns: only compute stats on columns involved in filters, join conditions, group by or partition by clauses • Re-compute stats only when there is > 30% data change • Run compute stats on weekends/nights. Not needed after every data load. • If you reload a complete new set of data for a table, but the number of rows and the number of distinct values for each column are relatively unchanged, you do not need to recompute stats for the table • Use enable_stats_extrapolation (experimental)
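The ">30% data change" guideline above can be expressed as a tiny scheduling check. A sketch: the threshold comes from this slide, but the function itself is a hypothetical helper for an ETL job:

```python
def should_recompute_stats(rows_at_last_compute, rows_now, threshold=0.30):
    """Recompute stats only when the row count has drifted by more than
    ~30% since the last COMPUTE STATS, per the guidance above."""
    if rows_at_last_compute == 0:
        return True  # stats never computed
    change = abs(rows_now - rows_at_last_compute) / rows_at_last_compute
    return change > threshold

print(should_recompute_stats(1_000_000, 1_200_000))  # 20% change -> skip
print(should_recompute_stats(1_000_000, 1_500_000))  # 50% change -> recompute
```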
  • 30. 30 © Cloudera, Inc. All rights reserved. Set Statistics Manually • Quick fix as part of data load, while compute stats can be scheduled on weekends • Set total number of rows. Applies to both unpartitioned and partitioned tables. • alter table <table_name> set tblproperties('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true'); • Set total number of rows for a specific partition. Applies to partitioned tables only. -- You must specify all the partition key columns in the PARTITION clause. • alter table <table_name> partition (keycol1=val1,keycol2=val2...) set tblproperties('numRows'='new_value', 'STATS_GENERATED_VIA_STATS_TASK'='true'); • Column stats: • ALTER TABLE <table_name> SET COLUMN STATS <col_name> ('numDVs'='100') • Compute numDVs with "SELECT NDV(col)"
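The manual-stats statement above can be generated from the load job itself. A hypothetical helper that builds the same ALTER TABLE shown on this slide (the function name and parameters are illustrative):

```python
def set_numrows_sql(table, num_rows, partition_spec=None):
    """Build the ALTER TABLE ... SET TBLPROPERTIES statement shown above.
    partition_spec, if given, must list all partition key columns,
    e.g. "year=2019, month=5"."""
    part = f" PARTITION ({partition_spec})" if partition_spec else ""
    return (f"ALTER TABLE {table}{part} SET TBLPROPERTIES("
            f"'numRows'='{num_rows}', "
            f"'STATS_GENERATED_VIA_STATS_TASK'='true')")

print(set_numrows_sql("sales", 1_000_000, "year=2019"))
```

A load pipeline could emit this statement right after each INSERT, leaving the full COMPUTE STATS for the weekend schedule.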
  • 31. 31 © Cloudera, Inc. All rights reserved. Other Scalability Considerations • Use star schemas, integer join keys • Check for hot spotting - increase the replication factor for master data / frequently queried data • Avoid casts – implicit or explicit (easily over 10% improvement for larger volumes) • Increase RUNTIME_FILTER_WAIT_TIME_MS for complicated queries, but coordinators need to do more work • Use the HDFS file handle cache • Give the OS enough free memory to cache data blocks • Set a default compression codec - improves disk read performance • Use high-CPU nodes, fast processors • For multiple Impala clusters - DistCp the data from the remote cluster
  • 32. 32 © Cloudera, Inc. All rights reserved. BI Tools • Always close queries • idle_query_timeout = 60 • idle_session_timeout = 1800 • Use handcrafted SQL • Use different pools for different queries and encourage use of set mem_limit; • Use JDBC with the Kerberos authentication mechanism
  • 33. 33 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • Impala query profiles can be retrieved from Cloudera Manager, the Impala coordinator web UI, or from the command line by executing `profile` • Includes nanosecond timers for all operations on all nodes • Quite detailed and exhaustive, but the basics are easy • We can easily answer - • What’s the bottleneck for this query? • Why is this run fast but that run slow? • How can I tune to improve this query’s performance?
  • 34. 34 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • Always check impala version and default query options set • Check warnings • Query state – running, cancelled • Check query type – Query, DDL, etc
  • 35. 35 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • Check per-node peak memory usage • Tells you what the memory limit for these queries should be • Shows skews in memory usage among nodes
  • 36. 36 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • For completed queries, read the summary in detail • Check what’s taking the most time and the most memory; check for skews in data • Check row estimates; depending on whether stats are available, these can be skewed • Note that join order is determined entirely by total size (#rows * column width) • Try to ensure that, after partition pruning, the RHS is smaller than the LHS • Broadcast joins are the default; partitioned joins for large tables of roughly equal size
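The broadcast-vs-partitioned choice above can be illustrated with the usual network-cost comparison. This is a simplified model for intuition only, not Impala's actual cost function:

```python
def cheaper_join_strategy(lhs_bytes, rhs_bytes, num_nodes):
    """Simplified network-cost model: a broadcast join ships the RHS to
    every node; a partitioned join hashes both sides across the network
    once. Pick whichever moves fewer bytes."""
    broadcast_cost = rhs_bytes * num_nodes
    partitioned_cost = lhs_bytes + rhs_bytes
    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"

# Small RHS -> broadcast wins; two big tables of similar size -> partitioned.
print(cheaper_join_strategy(100_000_000_000, 10_000_000, 20))
print(cheaper_join_strategy(100_000_000_000, 80_000_000_000, 20))
```

This is why ensuring the smaller input ends up on the RHS (e.g. after partition pruning) matters so much for plan quality.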
  • 37. 37 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • Read the query timeline in detail • Check which step is taking the most time and why? • Usual culprits • Metadata load • Completed admission • ClientFetchWaitTimer • First dynamic filter received • Last row fetched
  • 38. 38 © Cloudera, Inc. All rights reserved. Understanding Query Profiles • Check each plan fragment • Tells us what it did, how many hosts did it run on • How much data it processed • Partition pruning stats for HDFS scans • Parquet push down predicates
  • 39. 39 © Cloudera, Inc. All rights reserved. Key Takeaways • Always use dedicated coordinators/executors • IAC should be enabled and memory limits set • Metadata management is significantly improved • Zero-touch metadata coming soon • Follow best practices for Impala queries and performance tuning - refer to the Impala Cookbook
  • 40. 40 © Cloudera, Inc. All rights reserved. Rate today’s session