Apache Impala Internals
Tanel Poder
@tanelpoder
http://gluent.com
Gluent - who we are
• Long term Oracle Database & Data Warehousing guys, focused on performance & scale ...
• Alumni 2009-2016
• ... We are on a mission to liberate enterprise data!
Tanel Poder
• Co-founder & CEO & still a performance geek
• I was an independent consultant for many years, doing Oracle performance & scalability work
• I also co-authored the Expert Oracle Exadata book
Agenda
• Explain some interesting details about Apache Impala internals
• Including low level geekery!
• Use typical tools for everyday (query performance) work
• Compare some of these items to a typical Oracle Database setup
• Hacking session: Let’s just run stuff and see what happens!
Google’s Dremel Paper
• Google Dremel paper: https://research.google.com/pubs/pub36632.html
• Impala, Drill, Druid, BigQuery, ...
Apache Impala (incubating)
• Interactive analytic database engine for Hadoop
• Low latency: queries under 1 second are possible (even over large datasets)
• Initially built & announced by Cloudera in 2012
• http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
• Open source, submitted to Apache Software Foundation
• Incubating, more contributors welcome!
• http://impala.apache.org/
• Shipped by Cloudera, MapR, Oracle (with CDH on BDA)
• Was also part of earlier versions of the Amazon EMR image
Hadoop-native SQL engine
• http://impala.apache.org/overview.html
[Diagram: Impala architecture over HDFS]
Fully Symmetric MPP SQL engine
• All nodes can perform any query execution task
• Any node can accept connections + do reads, writes, joins, aggr + talk to other nodes!
[Diagram labels: Impala, Hive, SparkSQL, Drill, Presto; five impalad daemons; HDFS]
Impala 2.9.x supports specifying dedicated coordinator nodes:
https://issues.apache.org/jira/browse/IMPALA-3807
Hive Metastore Integration
• Hive Metastore = Data Dictionary of Hadoop
• Multiple (SQL) engines can share the same Hive Metastore backend
• ... and see all data
[Diagram labels: Hive Metastore; Impala, Hive, SparkSQL, Drill, Presto; HDFS]
Not your typical SQL database engine
• Decoupled computation engine & storage location + format
• Text, Avro, Sequencefile
• Parquet
• Open Source Columnar data format for efficient scanning & analytics
• Just like a database with lots of external tables
• Other engines can also read the same datafiles concurrently
• There’s more!
• Kudu - column store with transactional capabilities!
• S3 - Amazon S3 support
• ADLS - Azure Data Lake Store support (HDFS in Azure)
One data, many engines
Columnar Data Structure
Logical Table
Col A | Col B | Col C
A1 | B1 | C1
A2 | B2 | C2
A3 | B3 | C3
A4 | B4 | C4
Row-oriented physical layout
A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4
• Great for random reads and writes of entire records (OLTP)
Column-oriented physical layout
A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4
• Great for sequential scanning of many records
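The difference between the two physical layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4x3 table reuses the slide's A1..C4 cell names, and the helper names (`row_oriented`, `column_oriented`, `cells_between`) are made up for this example rather than taken from any storage engine.

```python
# Hypothetical 4-row, 3-column logical table using the slide's cell names.
rows = [
    ("A1", "B1", "C1"),
    ("A2", "B2", "C2"),
    ("A3", "B3", "C3"),
    ("A4", "B4", "C4"),
]


def row_oriented(table):
    """Store complete records back to back."""
    return [cell for row in table for cell in row]


def column_oriented(table):
    """Store all values of one column back to back, column after column."""
    return [cell for column in zip(*table) for cell in column]


def cells_between(layout, first, last):
    """Number of cells spanned when reading from `first` to `last`, inclusive."""
    return layout.index(last) - layout.index(first) + 1


row_layout = row_oriented(rows)
col_layout = column_oriented(rows)

# Scanning column B is one contiguous run in the columnar layout,
# but it is spread across almost the entire row-oriented layout.
col_b_span_columnar = cells_between(col_layout, "B1", "B4")
col_b_span_row = cells_between(row_layout, "B1", "B4")

print(row_layout)
print(col_layout)
print(col_b_span_columnar, col_b_span_row)
```

In the row-oriented list a whole record sits together, which suits single-record reads and writes; in the column-oriented list a scan of one column touches only that column's run.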
Columnar Data Structure (physical layout)
[Figure, two panels: "Scanning a column in a row-oriented data block" and "Scanning a column in a column-oriented data block", each showing blocks of col 1 ... col 6 values]
• Columnar compression
• Reduced disk I/O
• Scan only required columns
• Process only required columns (CPU)
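A back-of-the-envelope calculation shows why scanning only the required columns reduces I/O. The column widths and row count below are invented for illustration, and the model deliberately ignores compression, block headers and read-ahead, so treat the numbers as a sketch rather than a measurement.

```python
# Hypothetical average on-disk width of each column, in bytes.
COLUMN_WIDTHS = {
    "col1": 8,
    "col2": 8,
    "col3": 4,
    "col4": 4,
    "col5": 100,
    "col6": 20,
}


def bytes_scanned(num_rows, column_widths, needed_columns, columnar):
    """Estimate bytes read to scan `needed_columns` over `num_rows` rows."""
    if columnar:
        # Column store: only the requested columns are read.
        return num_rows * sum(column_widths[c] for c in needed_columns)
    # Row store: every block holds whole records, so all columns are read.
    return num_rows * sum(column_widths.values())


NUM_ROWS = 1_000_000
row_store_bytes = bytes_scanned(NUM_ROWS, COLUMN_WIDTHS, ["col2"], columnar=False)
column_store_bytes = bytes_scanned(NUM_ROWS, COLUMN_WIDTHS, ["col2"], columnar=True)

print(row_store_bytes, column_store_bytes, row_store_bytes / column_store_bytes)
```

With these assumed widths, a single-column scan reads 18 times less data from the columnar layout, before any compression benefit is counted.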
Parquet Structure
Source: http://apache.parquet.io
Typical structure sizes:
• Row group: 128MB, 256MB, ..., 1GB
• Page: 64kB...1MB (can be even 8kB)
Parquet internal layout
Source: https://www.slideshare.net/julienledem/the-columnar-roadmap-apache-parquet-and-apache-arrow
Credit: Julien Le Dem / http://twitter.com/J_
Sophisticated encoding & compression techniques
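Two encodings commonly used in columnar formats, dictionary encoding and run-length encoding, can be sketched as follows. This is a simplified illustration and not Parquet's actual on-disk format (which combines run-length encoding with bit-packing, among other things); the sample column and function names are assumptions made for the example.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code into a dictionary."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, repeat_count) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


def decode(dictionary, runs):
    """Reverse both encodings to get the original column back."""
    return [dictionary[code] for code, count in runs for _ in range(count)]


# Hypothetical low-cardinality column, e.g. a country code.
column = ["EE", "EE", "EE", "US", "US", "EE", "FI", "FI", "FI", "FI"]
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)

print(dictionary)
print(codes)
print(runs)
```

Because a column holds values of one type that often repeat, ten strings shrink to a three-entry dictionary plus four runs; the same trick works poorly on row-oriented data where neighbouring values belong to different columns.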
impalad (process layout)
$ ps -ef | grep /usr/lib/impala/sbin
impala 5161 1 0 19:27 ? 00:00:00 /usr/lib/impala/sbin/statestored -log_dir=/var/log/impala...
impala 5618 1 11 19:27 ? 00:00:05 /usr/lib/impala/sbin/catalogd -log_dir=/var/log/impala ...
impala 5664 1 14 19:27 ? 00:00:06 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala ...
$ file ./impalad
./impalad: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs)
$ ps -Lfp `pgrep impala` | wc -l
245
$ sudo pmap `pgrep impalad`
5664: /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala
0000000000400000 36200K r-x-- /usr/lib/impala/sbin-retail/impalad
000000000275a000 1332K rw--- /usr/lib/impala/sbin-retail/impalad
00000000028a7000 328K rw--- [ anon ]
000000000404f000 81548K rw--- [ anon ]
...
0000003f9f000000 1576K r-x-- /lib64/libc-2.12.so
0000003f9f400000 92K r-x-- /lib64/libpthread-2.12.so
...
00007f6bb76cc000 44K r--s- /usr/java/jdk1.7.0_67-cloudera/jre/lib/charsets.jar
00007f6bb76d7000 40K r--s- /usr/java/jdk1.7.0_67-cloudera/jre/lib/resources.jar
...
00007f6bd3188000 156K rw--- [ anon ]
00007f6bd31af000 11712K r-x-- /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
00007f6bd3d1f000 2044K ----- /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
00007f6bd3f1e000 788K rw--- /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
Impala uses both C++ and Java under the hood!
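The C++ plus Java mix can be made visible by grouping pmap mappings by origin. The sketch below parses a few lines copied from the listing above; the category names and the classification rules (anything under `/jre/` counts as JVM, paths ending in `/impalad` count as the native binary) are assumptions chosen for this example, not an official breakdown.

```python
import re

# address, size in KB, permissions, mapping name
PMAP_LINE = re.compile(r"^([0-9a-f]+)\s+(\d+)K\s+(\S+)\s+(.*)$")

# A subset of the pmap output shown on the slide.
SAMPLE = """\
5664: /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala
0000000000400000 36200K r-x-- /usr/lib/impala/sbin-retail/impalad
000000000275a000 1332K rw--- /usr/lib/impala/sbin-retail/impalad
00000000028a7000 328K rw--- [ anon ]
0000003f9f000000 1576K r-x-- /lib64/libc-2.12.so
00007f6bb76cc000 44K r--s- /usr/java/jdk1.7.0_67-cloudera/jre/lib/charsets.jar
00007f6bd31af000 11712K r-x-- /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
"""


def classify(mapping):
    """Assign a mapping to a coarse category (rules are assumptions)."""
    if "/jre/" in mapping:
        return "jvm"
    if mapping.startswith("["):
        return "anonymous"
    if mapping.endswith("/impalad"):
        return "impalad binary"
    return "other native"


def summarize(pmap_text):
    """Sum mapped kilobytes per category; non-mapping lines are skipped."""
    totals = {}
    for line in pmap_text.splitlines():
        match = PMAP_LINE.match(line.strip())
        if match is None:
            continue
        kind = classify(match.group(4).strip())
        totals[kind] = totals.get(kind, 0) + int(match.group(2))
    return totals


totals = summarize(SAMPLE)
for kind, size_kb in sorted(totals.items()):
    print(f"{kind:15s} {size_kb:8d} KB")
```

On a live system the same function could be fed the full output of `sudo pmap <pid>`; note that these are mapped sizes, not resident memory.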
Geekery: Execution internals & LLVM
• Why Java and C++?
• Most of Hadoop ecosystem operates in Java (JVM) world
• Lots of tools, libraries, integration
• Still need more control & performance for some tasks
• Memory management
• Tight loops like scanning, filtering, aggregation etc
• Use Java for non-performance-critical (metadata) operations
• Use C++ for low level stuff
• Generate single-use special purpose binary executable code for tight loops
• LLVM.org
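The idea of generating single-use, special-purpose code for a tight loop can be illustrated with a Python analogy. Impala emits native machine code through LLVM; the sketch below only generates and compiles Python source, so it shows the principle (specialise once per query, then run a loop with no per-row interpretation of the predicate), not Impala's implementation. The predicate format and all names are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}

# Hypothetical predicate: col0 == 10 AND 5 <= col2 <= 20,
# described as (column_index, operator, constant) terms.
PREDICATE = [(0, "==", 10), (2, ">=", 5), (2, "<=", 20)]


def interpreted_filter(rows, predicate):
    """Generic path: walks the predicate description again for every row."""
    kept = []
    for row in rows:
        if all(OPS[op](row[col], const) for col, op, const in predicate):
            kept.append(row)
    return kept


def generate_filter(predicate):
    """Codegen path: emit source specialised to this predicate, compile once."""
    for _, op, _ in predicate:
        if op not in OPS:
            raise ValueError(f"unsupported operator: {op}")
    condition = " and ".join(
        f"row[{int(col)}] {op} {const!r}" for col, op, const in predicate
    )
    source = (
        "def generated_filter(rows):\n"
        f"    return [row for row in rows if {condition}]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["generated_filter"], source


rows = [(10, "a", 7), (10, "b", 30), (3, "c", 7), (10, "d", 20)]
generated_filter, source = generate_filter(PREDICATE)

print(source)
print(generated_filter(rows))
```

The generated function has the column positions, operators and constants baked in, which is the same kind of branch and lookup removal the slide attributes to runtime code generation.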
Impala Runtime Code Generation (LLVM compiler module)
Source: http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf
• It’s all about CPU efficiency!
It’s all about CPU-efficiency
• Minimize branches & complexity in tight loops
• Oracle 12c In-Memory Option example
• Not instruction codegen: a jump table to previously compiled special-purpose functions
generate pcodes
3 opcodes
0: new order 0 opcode 646 cost 2
1: new order 1 opcode 646 cost 2
2: new order 2 opcode 646 cost 2
PCODE
------
version: PCODE1.0, flags: 0x0 size: 280, numbvs: 2
expeal: 0xcc16440
consts: 0x1b proj-pcode: 0x0
[0x7f114c13d658] [== constant] Filt0xffffffffffffffff BV1 = Col0, 0x3fbfaf2280(len=2)
[0x7f114c13d688] [branch if ] if (BV1 == 0) goto 0x7f114c13d708
[0x7f114c13d6a8] [>= and <= ] Filt0x1 BV2 = Col2, 0x3fbfaf1fc8(len=2), 0x3fbfaf1e30(len=3)
[0x7f114c13d6e8] [and ] BV1 = BV1 & BV2
[0x7f114c13d708] [end]
More info on CPU-efficiency topics:
http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/
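The jump-table approach in the trace above can be mimicked with a table of pre-written functions that each process a whole column and return a bit vector. This is a loose sketch of the general technique only: the kernel names, the program format and the sample data are invented, and nothing here reflects Oracle's internal code.

```python
def eq_const(column, const):
    """Pre-built kernel: bit vector of rows where value == const."""
    return [1 if value == const else 0 for value in column]


def between(column, low, high):
    """Pre-built kernel: bit vector of rows where low <= value <= high."""
    return [1 if low <= value <= high else 0 for value in column]


# The "jump table": operation name -> previously written function.
KERNELS = {"eq_const": eq_const, "between": between}

# Hypothetical program, shaped like the trace:
#   BV1 = (Col0 == const); if BV1 == 0 goto end; BV2 = Col2 in range; BV1 &= BV2
PROGRAM = [
    ("eq_const", 0, ("CA",)),
    ("between", 2, (10, 200)),
]


def run(program, columns):
    """Dispatch each step through the table and AND the bit vectors."""
    result = None
    for name, column_index, args in program:
        bits = KERNELS[name](columns[column_index], *args)
        result = bits if result is None else [a & b for a, b in zip(result, bits)]
        if not any(result):
            break  # nothing can match any more, skip the remaining filters
    return result


columns = [
    ["CA", "NY", "CA", "CA"],   # Col0
    [1, 2, 3, 4],               # Col1 (unused by the program)
    [50, 60, 500, 10],          # Col2
]
bit_vector = run(PROGRAM, columns)
print(bit_vector)
```

No code is generated at run time here; the only per-query work is choosing which existing functions to call, and each call then runs a simple loop over one column.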
Typical Everyday Performance Tools
• summary;
• profile;
• set live_summary = 1;
• set live_progress = 1;
• http://any-impala-host:25000
Impala Execution Plans (port 25000 of any coordinator node)
Impala Query Plans
“Shuffling” in an Oracle SQL execution plan
• Hashing CPU
• Network I/O CPU
• Network traffic
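A hash-based shuffle can be sketched as follows: every row is hashed on its join key and sent to the node that owns that hash bucket, so matching keys from both tables meet on the same node. The tables, node count and function names are hypothetical, and CRC32 is used only because it is deterministic across runs; real engines use their own hash functions.

```python
import zlib


def target_node(join_key, num_nodes):
    """Pick the destination node for a join key (this is the hashing CPU cost)."""
    return zlib.crc32(str(join_key).encode("utf-8")) % num_nodes


def shuffle(rows, key_index, num_nodes):
    """Redistribute rows so equal keys land on the same node.

    Every row that changes node in a real cluster costs network traffic
    plus CPU for sending and receiving it.
    """
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[target_node(row[key_index], num_nodes)].append(row)
    return partitions


NUM_NODES = 3
# Hypothetical tables: (customer_id, name) and (order_id, customer_id).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]
orders = [(100, 1), (101, 3), (102, 3), (103, 4), (104, 2)]

customer_parts = shuffle(customers, key_index=0, num_nodes=NUM_NODES)
order_parts = shuffle(orders, key_index=1, num_nodes=NUM_NODES)

for node in range(NUM_NODES):
    print(node, customer_parts[node], order_parts[node])
```

After the shuffle each node can join its own partitions locally, which is why the cost of the exchange shows up as hashing CPU, network I/O CPU and network traffic rather than as join work.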
IO skipping (storage index)
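The principle behind I/O skipping is to keep the minimum and maximum value of a column for each storage chunk and to skip chunks whose range cannot contain the value being searched for. The sketch below is a minimal illustration under assumed data; the chunk contents and function names are invented, and real implementations keep these statistics in storage-level metadata.

```python
def build_index(chunks):
    """Record (min, max) of the column for every chunk."""
    return [(min(chunk), max(chunk)) for chunk in chunks]


def scan_equal(chunks, index, wanted):
    """Return matching values and how many chunks actually had to be read."""
    matches = []
    chunks_read = 0
    for chunk, (low, high) in zip(chunks, index):
        if wanted < low or wanted > high:
            continue  # the value cannot be in this chunk, skip the I/O
        chunks_read += 1
        matches.extend(value for value in chunk if value == wanted)
    return matches, chunks_read


# Hypothetical column values stored in four chunks.
chunks = [
    [1, 3, 5],
    [10, 12, 15],
    [20, 22, 29],
    [12, 40, 41],
]
index = build_index(chunks)
matches, chunks_read = scan_equal(chunks, index, wanted=12)

print(index)
print(matches, chunks_read)
```

Skipping works best when data is loaded in an order that keeps each chunk's value range narrow; the last chunk above has a wide range and must be read even though it holds only one match.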
Additional Reading
Impala
• https://blog.acolyer.org/2015/02/05/impala-a-modern-open-source-sql-engine-for-hadoop/
• http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
• https://blog.cloudera.com/blog/2015/11/new-in-cloudera-enterprise-5-5-support-for-complex-types-in-impala/
• http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
• http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf
Parquet
• https://www.slideshare.net/RyanBlue3/parquet-performance-tuning-the-missing-guide
• https://www.slideshare.net/julienledem/the-columnar-roadmap-apache-parquet-and-apache-arrow
Dremel
• http://cloud.berkeley.edu/data/dremel.pptx
CPU-efficiency & columnar world
• http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/
Thank You!
• Gluent customer webinar next Wednesday!
• July 26 @ 9am-10am CDT
• http://gluent.com/events/
