1 
Impala 2.0 Update
Sho Shimauchi, Cloudera
2014/10/31
2 
Today's Topic
• What is Cloudera Impala?
• Impala 1.4 / 2.0 update
• Performance Improvement
• Query Language
• Resource Management and Security
• Others
3 
Who am I?
• Pre-sales Solutions Architect
• joined Cloudera in 2011, the first Japanese employee at Cloudera
• email: sho@cloudera.com
• twitter: @shiumachi
4 
Cloudera Impala
5 
What is Impala?
• MPP SQL query engine for Hadoop environments
• written in native code for maximum hardware efficiency
• open-source!
  • http://impala.io/
• Supported by Cloudera, Amazon, and MapR
• History
  • 2012/10 Public Beta released
  • 2013/04 Impala 1.0 released
  • current version: Impala 2.0
6 
Impala is easy to use
• create tables as virtual views over data stored in HDFS / HBase
  • schema metadata is stored in Metastore
  • shared with Hive, Pig, etc.
• connect via ODBC / JDBC
• authenticate via Kerberos / LDAP
• run standard SQL
  • ANSI SQL-92 based
  • limited to SELECT and bulk INSERT
  • correlated subqueries: not supported before 2.0, available in 2.0
  • UDF / UDAF
7 
Impala 1.4 (2014/07)
• DECIMAL(<precision>, <scale>)
• HDFS caching DDL
• column definition based on Parquet file (CREATE TABLE … LIKE PARQUET)
• ORDER BY without LIMIT
• LDAP connections through TLS
• SHOW PARTITIONS
• YARN-integrated resource manager is production ready
• Llama HA support
• CREATE TABLE … STORED AS AVRO
• SUMMARY command in impala-shell (provides a high-level summary of the query plan)
• faster COMPUTE STATS
• Performance improvements for partition pruning
• impala-shell supports UTF-8 characters
• additional built-ins from EDW systems
8 
Impala 2.0 (2014/10)
• hash table can spill to disk
  • join and aggregate tables of arbitrary size
• Subquery enhancements
  • allowed in WHERE clauses
  • EXISTS / NOT EXISTS
  • IN / NOT IN can operate on the result set from a subquery
  • correlated / uncorrelated subqueries
  • scalar subqueries
• SQL 2003 compliant analytic window functions
  • LEAD(), LAG(), RANK(), FIRST_VALUE(), etc.
• New Data Types: VARCHAR, CHAR
• Security Enhancements
  • multiple authentication methods
  • GRANT / REVOKE / CREATE ROLE / DROP ROLE / SHOW ROLES / etc.
• text + gzip / bzip2 / Snappy
• Hint inside views
• QUERY_TIMEOUT_S
• DATE_PART() / EXTRACT()
• Parquet default block size is changed to 256MB (was: 1GB)
• LEFT ANTI JOIN / RIGHT ANTI JOIN
• impala-shell can read settings from $HOME/.impalarc
9 
Performance Improvement
10 
HDFS caching
• When HDFS files are cached in memory, Impala can read the cached data without any disk reads, and without making an additional copy of the data in memory
  • avoids checksumming and data copies
• new HDFS API is available in CDH 5.0
• configure cache with Impala DDL
  • CREATE TABLE tbl_name CACHED IN '<pool>'
  • ALTER TABLE tbl_name ADD PARTITION … CACHED IN '<pool>'
11 
Partition Pruning improvement
• Previously, Impala typically queried tables with up to approximately 3000 partitions. With the performance improvement in partition pruning, Impala can now comfortably handle tables with tens of thousands of partitions.
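As a rough illustration of what partition pruning means (plain Python, not Impala code), the sketch below keeps only the partitions whose key satisfies the query predicate, so files in the other partitions are never opened. The `partitions` mapping, its paths, and the `prune` helper are hypothetical names made up for this example.

```python
from typing import Callable, Dict, List


def prune(partitions: Dict[int, List[str]],
          predicate: Callable[[int], bool]) -> List[str]:
    """Return data files only from partitions whose key passes the predicate."""
    files: List[str] = []
    for key, paths in sorted(partitions.items()):
        if predicate(key):  # decided from partition metadata alone
            files.extend(paths)
    return files


# Hypothetical table partitioned by year.
partitions = {
    2011: ["/data/census/year=2011/part-0"],
    2012: ["/data/census/year=2012/part-0"],
    2013: ["/data/census/year=2013/part-0", "/data/census/year=2013/part-1"],
}

# Similar in spirit to: SELECT ... FROM census WHERE year >= 2012
to_scan = prune(partitions, lambda year: year >= 2012)
print(to_scan)
```

The cost of this step grows with the number of partitions, which is why faster pruning matters for tables with tens of thousands of them.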
12 
Spilling to Disk SQL Operation
• write temporary data to disk when Impala is close to exceeding its memory limit
• In PROFILE, the BlockMgr.BytesWritten counter reports how much data was written to disk during the query
13 
Query Language
14 
Subquery
Scalar subquery: produces a result set with a single row containing a single column
SELECT x FROM t1 WHERE x > (SELECT MAX(y) FROM t2);
Uncorrelated subquery: does not refer to any tables from the outer block of the query
SELECT x FROM t1 WHERE x IN (SELECT y FROM t2);
Correlated subquery: compares one or more values from the outer query block to values referenced in the WHERE clause of the subquery
SELECT employee_name, employee_id FROM employees one WHERE
  salary > (SELECT avg(salary) FROM employees two WHERE one.dept_id = two.dept_id);
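The three subquery forms can be tried without a cluster. The sketch below runs the same statements against an in-memory SQLite database from Python; SQLite is only a stand-in here, not Impala, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (5), (10), (20);
    INSERT INTO t2 VALUES (5), (10);
    CREATE TABLE employees (employee_name TEXT, employee_id INTEGER,
                            dept_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        ('alice', 1, 10, 100.0), ('bob', 2, 10, 200.0),
        ('carol', 3, 20, 300.0), ('dave', 4, 20, 500.0);
""")

# Scalar subquery: the inner query yields one row with one column (MAX(y) = 10).
scalar = conn.execute(
    "SELECT x FROM t1 WHERE x > (SELECT MAX(y) FROM t2) ORDER BY x").fetchall()

# Uncorrelated subquery: the inner query never mentions t1.
uncorrelated = conn.execute(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x").fetchall()

# Correlated subquery: the inner WHERE refers to the outer alias "one".
correlated = conn.execute("""
    SELECT employee_name, employee_id FROM employees one
    WHERE salary > (SELECT avg(salary) FROM employees two
                    WHERE one.dept_id = two.dept_id)
    ORDER BY employee_id""").fetchall()

print(scalar, uncorrelated, correlated)
```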
15 
Analytic Functions (a.k.a. Window Functions)
• supported in 2.0 and later
• supported functions
  • RANK() / DENSE_RANK()
  • FIRST_VALUE() / LAST_VALUE()
  • LAG() / LEAD()
  • ROW_NUMBER()
  • Aggregate functions are already implemented
    • MAX(), MIN(), AVG(), SUM(), etc.
16 
Analytic Functions Example
For each day, the query prints the closing price alongside the previous day's closing price:
select stock_symbol, closing_date, closing_price,
  lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
  from stock_ticker
  order by closing_date;
+--------------+---------------------+---------------+-------------------+
| stock_symbol | closing_date        | closing_price | yesterday closing |
+--------------+---------------------+---------------+-------------------+
| JDR          | 2014-09-13 00:00:00 | 12.86         | NULL              |
| JDR          | 2014-09-14 00:00:00 | 12.89         | 12.86             |
| JDR          | 2014-09-15 00:00:00 | 12.94         | 12.89             |
| JDR          | 2014-09-16 00:00:00 | 12.55         | 12.94             |
| JDR          | 2014-09-17 00:00:00 | 14.03         | 12.55             |
| JDR          | 2014-09-18 00:00:00 | 14.75         | 14.03             |
| JDR          | 2014-09-19 00:00:00 | 13.98         | 14.75             |
+--------------+---------------------+---------------+-------------------+
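To make the LAG() semantics concrete, here is a small pure-Python sketch of "partition by, order by, then look one row back". The helper name `lag_by_partition` is hypothetical and this is not how Impala evaluates the function; it only reproduces the first rows of the result above.

```python
from itertools import groupby
from operator import itemgetter


def lag_by_partition(rows, partition_key, order_key, value_key, offset=1):
    """Yield (row, lagged_value); lagged_value is None where no earlier row exists."""
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        group = list(group)
        for i, row in enumerate(group):
            prev = group[i - offset][value_key] if i >= offset else None
            yield row, prev


rows = [
    {"stock_symbol": "JDR", "closing_date": "2014-09-15", "closing_price": 12.94},
    {"stock_symbol": "JDR", "closing_date": "2014-09-13", "closing_price": 12.86},
    {"stock_symbol": "JDR", "closing_date": "2014-09-14", "closing_price": 12.89},
]

result = [
    (row["closing_date"], row["closing_price"], prev)
    for row, prev in lag_by_partition(
        rows, "stock_symbol", "closing_date", "closing_price")
]
print(result)
```

The first row of each partition has no predecessor, which is why the query output shows NULL for 2014-09-13.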
17 
Approximation features
• APPX_COUNT_DISTINCT query option
  • rewrite COUNT(DISTINCT) calls to use NDV()
  • speeds up the operation
  • allows multiple COUNT(DISTINCT) in a single query
• APPX_MEDIAN()
  • returns a value that is approximately the median (midpoint) of values in the set of input values
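Why an approximate distinct count is cheaper can be seen from a toy estimator. The sketch below uses a k-minimum-values (KMV) sketch, chosen here only because it is short; it is not a description of how NDV() is implemented, and the function name `approx_count_distinct` and the parameter `k` are assumptions for this example. Memory stays bounded by `k` no matter how many distinct values arrive.

```python
import hashlib
import heapq


def approx_count_distinct(values, k=256):
    """Estimate the number of distinct values with a k-minimum-values sketch."""

    def h(v):
        # Map each value to a pseudo-uniform number in [0, 1).
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    smallest = set()  # the k smallest distinct hash values seen so far
    heap = []         # the same values, negated, so heap[0] is the largest
    for v in values:
        x = h(v)
        if x in smallest:
            continue
        if len(heap) < k:
            smallest.add(x)
            heapq.heappush(heap, -x)
        elif x < -heap[0]:
            evicted = -heapq.heappushpop(heap, -x)
            smallest.discard(evicted)
            smallest.add(x)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: the count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


estimate = approx_count_distinct(i % 5000 for i in range(20000))
print(estimate)  # close to 5000, not exactly 5000
```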
18 
Approx. functions example
[localhost:21000] > select min(x), max(x), avg(x) from million_numbers;
+-------------------+-------------------+-------------------+
| min(x)            | max(x)            | avg(x)            |
+-------------------+-------------------+-------------------+
| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
+-------------------+-------------------+-------------------+
[localhost:21000] > select appx_median(x) from million_numbers;
+----------------+
| appx_median(x) |
+----------------+
| 24721.6        |
+----------------+
19 
CREATE TABLE … LIKE PARQUET
• CREATE TABLE ... LIKE PARQUET 'hdfs_path_of_parquet_file'
• The column names and data types are automatically configured based on the Parquet data file
20 
ORDER BY without LIMIT
• LIMIT clause is now optional for queries that use the ORDER BY clause
• Impala automatically uses a temporary disk work area to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular data node.
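The general idea of sorting with a temporary disk work area can be sketched as a classic external merge sort: sort bounded chunks in memory, write each as a run file, then merge the runs. This is a generic illustration in Python under that assumption, not Impala's sorter; `external_sort` and `max_in_memory` are hypothetical names.

```python
import heapq
import os
import tempfile


def external_sort(numbers, max_in_memory=1000):
    """Sort an iterable of ints while buffering at most max_in_memory items."""
    run_paths = []
    buffer = []

    def spill():
        # Write the current buffer as one sorted run in a temporary file.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{n}\n" for n in buffer)
        run_paths.append(path)
        buffer.clear()

    for n in numbers:
        buffer.append(n)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge streams the sorted runs, holding one line per run.
        for line in heapq.merge(*files, key=int):
            yield int(line)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


data = [(i * 7919) % 5003 for i in range(5000)]
first_five = list(external_sort(data, max_in_memory=512))[:5]
print(first_five)
```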
21 
DECODE()
SELECT event, DECODE(day_of_week, 1, "Monday", 2, "Tuesday", 3,
  "Wednesday", 4, "Thursday", 5, "Friday", 6, "Saturday", 7,
  "Sunday", "Unknown day")
  FROM calendar;
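The argument layout in the query above (an expression, then search/result pairs, then an optional trailing default) can be mirrored in a few lines of Python. This `decode` function is a hypothetical stand-in written for illustration; using `None` for SQL NULL when nothing matches and no default is given is an assumption of the sketch.

```python
def decode(expr, *args):
    """Return the result paired with the first search value equal to expr.

    args is search1, result1, search2, result2, ..., optionally followed by
    a single default. With no match and no default, None stands in for NULL.
    """
    pairs, default = args, None
    if len(args) % 2 == 1:  # odd count: the last item is the default
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


day_name = decode(3, 1, "Monday", 2, "Tuesday", 3, "Wednesday", "Unknown day")
print(day_name)
```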
22 
ANTI JOIN
LEFT ANTI JOIN / RIGHT ANTI JOIN are supported in Impala 2.0
[localhost:21000] > create table t1 (x int);
[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);

[localhost:21000] > create table t2 (y int);
[localhost:21000] > insert into t2 values (2), (4), (6);

[localhost:21000] > select x from t1 left anti join t2 on (t1.x = t2.y);
+---+
| x |
+---+
| 1 |
| 3 |
| 5 |
+---+
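A left anti join returns the left-side rows that have no match on the right. The sketch below reproduces the result above using Python's sqlite3 module as a stand-in; SQLite has no LEFT ANTI JOIN syntax, so the equivalent NOT EXISTS form is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5), (6);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t2 VALUES (2), (4), (6);
""")

# Keep rows of t1 for which no row of t2 satisfies the join condition.
rows = conn.execute("""
    SELECT x FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.y = t1.x)
    ORDER BY x
""").fetchall()
anti_join_result = [x for (x,) in rows]
print(anti_join_result)
```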
23 
new data types
• DECIMAL (Impala 1.4)
  • column_name DECIMAL[(precision[,scale])]
  • DECIMAL with no precision or scale values is equivalent to DECIMAL(9,0)
• VARCHAR (Impala 2.0)
  • STRING with a max length
• CHAR (Impala 2.0)
  • STRING with a fixed length
24 
new built-in functions
• EXTRACT(): returns one date or time field from a TIMESTAMP value
• TRUNC(): truncates date/time values to year, month, etc.
• ADD_MONTHS(): alias for MONTHS_ADD()
• ROUND(): rounds DECIMAL values
• for computing properties for statistical distributions
  • STDDEV()
  • STDDEV_SAMP() / STDDEV_POP()
  • VARIANCE()
  • VARIANCE_SAMP() / VARIANCE_POP()
• MAX_INT() / MIN_SMALLINT()
• IS_INF() / IS_NAN()
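The _SAMP and _POP suffixes distinguish the sample and population forms of the statistic, assuming the usual meaning of those terms. The Python sketch below shows the difference with the standard `statistics` module on a small made-up data set; it illustrates the math only and does not call Impala.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population forms divide the sum of squared deviations by n.
var_pop = statistics.pvariance(values)
stddev_pop = statistics.pstdev(values)

# Sample forms divide by n - 1, so they are slightly larger.
var_samp = statistics.variance(values)
stddev_samp = statistics.stdev(values)

print(var_pop, stddev_pop, var_samp, stddev_samp)
```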
25 
SHOW PARTITIONS
[localhost:21000] > show partitions census;
+-------+-------+--------+------+---------+
| year  | #Rows | #Files | Size | Format  |
+-------+-------+--------+------+---------+
| 2000  | -1    | 0      | 0B   | TEXT    |
| 2004  | -1    | 0      | 0B   | TEXT    |
| 2008  | -1    | 0      | 0B   | TEXT    |
| 2010  | -1    | 0      | 0B   | TEXT    |
| 2011  | 4     | 1      | 22B  | TEXT    |
| 2012  | 4     | 1      | 22B  | TEXT    |
| 2013  | 1     | 1      | 231B | PARQUET |
| Total | 9     | 3      | 275B |         |
+-------+-------+--------+------+---------+
26 
SUMMARY
• impala-shell command
• easy-to-digest overview of the timings for the different phases of execution for a query
[localhost:21000] > select avg(ss_sales_price) from store_sales where ss_coupon_amt = 0;
+---------------------+
| avg(ss_sales_price) |
+---------------------+
| 37.80770926328327   |
+---------------------+
[localhost:21000] > summary;
+--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail          |
+--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1     | 1          | 48.00 KB | -1 B          | MERGE FINALIZE  |
| 02:EXCHANGE  | 1      | 0ns      | 0ns      | 1     | 1          | 0 B      | -1 B          | UNPARTITIONED   |
| 01:AGGREGATE | 1      | 30.79ms  | 30.79ms  | 1     | 1          | 80.00 KB | 10.00 MB      |                 |
| 00:SCAN HDFS | 1      | 5.45s    | 5.45s    | 2.21M | -1         | 64.05 MB | 432.00 MB     | tpc.store_sales |
+--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
27 
SET statement
• Before Impala 2.0, SET could be used only in impala-shell
• In Impala 2.0, you can use SET in client apps through JDBC / ODBC APIs.
28 
Resource Management and Security
29 
Admission Control (Impala 1.3)
• Fast and lightweight resource management mechanism
• avoids oversubscription of resources for concurrent workloads
  • queries are queued when reaching configurable limits
• Runs on every impalad
  • no SPOF
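The queueing behaviour described above can be pictured with a toy model: queries run until a configurable limit is reached, and later ones wait in a queue until a slot frees up. The class below is a hypothetical single-process sketch; the name `AdmissionController`, the single `max_running` limit, and the FIFO order are all assumptions, and it does not model how the decision is distributed across impalad processes.

```python
from collections import deque


class AdmissionController:
    """Admit up to max_running queries at once; queue the rest in FIFO order."""

    def __init__(self, max_running):
        self.max_running = max_running
        self.running = set()
        self.queue = deque()

    def submit(self, query_id):
        if len(self.running) < self.max_running:
            self.running.add(query_id)
            return "admitted"
        self.queue.append(query_id)
        return "queued"

    def finish(self, query_id):
        self.running.discard(query_id)
        # A freed slot goes to the oldest queued query, if any.
        if self.queue and len(self.running) < self.max_running:
            self.running.add(self.queue.popleft())


controller = AdmissionController(max_running=2)
states = [controller.submit(q) for q in ("q1", "q2", "q3")]
controller.finish("q1")
print(states, sorted(controller.running))
```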
30 
YARN and Llama
• Llama: Low Latency Application MAster
• Subdivides coarse-grain YARN scheduling into finer granularity for low-latency and short-lived queries
• Llama registers one long-lived AM per YARN pool
• Llama caches resources allocated by YARN for a short time, so that they can be quickly re-allocated to Impala queries
  • much faster than waiting for YARN
• Impala 1.4: GA. Llama HA support
31 
Query Timeout
• A new query option, QUERY_TIMEOUT_S, lets you specify a timeout period in seconds for individual queries
• Note: The timeout clock for queries and sessions only starts ticking when the query or session is idle
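The note about the clock only ticking while idle can be modelled as a timer that restarts on every activity. The sketch below is a hypothetical Python model, not Impala code; `IdleTimer` and its methods are invented names, and time is passed in explicitly so the example is deterministic.

```python
class IdleTimer:
    """Tracks idle time; any activity restarts the clock."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_activity = now

    def touch(self, now):
        # Called whenever the query or session does some work.
        self.last_activity = now

    def expired(self, now):
        return now - self.last_activity >= self.timeout_s


timer = IdleTimer(timeout_s=30, now=0)
timer.touch(now=29)  # activity at t=29 restarts the idle clock
still_alive_at_50 = not timer.expired(now=50)  # only 21 s idle
timed_out_at_59 = timer.expired(now=59)        # 30 s idle
print(still_alive_at_50, timed_out_at_59)
```

Under this model a long-running query that keeps making progress is never cancelled by the timeout; only idle time counts.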
32 
Security
• Impala 2.0 can accept either kind of auth. request
  • ex) host A with Kerberos, and host B with LDAP
• Security related statements
  • GRANT
  • REVOKE
  • CREATE ROLE
  • DROP ROLE
  • SHOW ROLES
  • SHOW ROLE GRANT
• --disk_spill_encryption option
33 
Others
34 
Text + gzip, bzip2, and Snappy
• In Impala 2.0 and later, Impala supports using text data files that employ gzip, bzip2, or Snappy compression
• use ROW FORMAT with delimiter and escape character to create table
CREATE TABLE csv_compressed (a STRING, b STRING, c STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
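For reference, a gzip-compressed, comma-delimited text file with three string columns can be produced with Python's standard library as sketched below. The file name and sample rows are made up, and how the file is then placed into the table's HDFS directory is outside this sketch. Python's csv quoting rules are also not the same thing as the table's escape-character setting, so the sample data avoids embedded commas.

```python
import csv
import gzip
import os
import tempfile

rows = [["1", "apple", "red"], ["2", "banana", "yellow"]]

path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")

# Write comma-delimited text through a gzip stream.
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter=",").writerows(rows)

# Read it back: each line splits into three string fields.
with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
    loaded = list(csv.reader(f, delimiter=","))

# The file on disk is compressed: gzip streams start with bytes 0x1f 0x8b.
with open(path, "rb") as f:
    magic = f.read(2)

print(loaded, magic)
```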
35 
impala-shell
• UTF-8 support (1.4)
• .impalarc file (2.0)
[impala]
verbose=true
default_db=tpc_benchmarking
write_delimited=true
output_delimiter=,
output_file=/home/tester1/benchmark_results.csv
show_profiles=true
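The file above uses an INI-style layout with a single [impala] section. As a quick way to inspect such a file from a script, the sketch below parses the same text with Python's `configparser`; this only illustrates the format and makes no claim about how impala-shell itself reads the file.

```python
import configparser

impalarc_text = """\
[impala]
verbose=true
default_db=tpc_benchmarking
write_delimited=true
output_delimiter=,
output_file=/home/tester1/benchmark_results.csv
show_profiles=true
"""

parser = configparser.ConfigParser()
parser.read_string(impalarc_text)
settings = parser["impala"]

verbose = settings.getboolean("verbose")
default_db = settings["default_db"]
delimiter = settings["output_delimiter"]
print(verbose, default_db, delimiter)
```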
36 
Documentation
• Cluster Sizing Guidelines for Impala
  • http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_cluster_sizing.html
37

More Related Content

What's hot

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
markgrover
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
michaelguia
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
markgrover
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User Group
Cloudera, Inc.
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
Manish Maheshwari
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
nvvrajesh
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
Schubert Zhang
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
Carl Steinbach
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Toshihiro Suzuki
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
Rahul Jain
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
Yahoo Developer Network
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Arseny Chernov
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database
huguk
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
Corley S.r.l.
 
Barcelona mysqlnd qc
Barcelona mysqlnd qcBarcelona mysqlnd qc
Barcelona mysqlnd qcAnis Berejeb
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
Amazon Web Services
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
Alex Moundalexis
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
lucenerevolution
 
Sqoop
SqoopSqoop
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 

What's hot (20)

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User Group
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database Apache Sqoop: Unlocking Hadoop for Your Relational Database
Apache Sqoop: Unlocking Hadoop for Your Relational Database
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Barcelona mysqlnd qc
Barcelona mysqlnd qcBarcelona mysqlnd qc
Barcelona mysqlnd qc
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
Sqoop
SqoopSqoop
Sqoop
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 

Similar to Impala 2.0 Update #impalajp

Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAsOracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
Alex Zaballa
 
Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c  - New Features for Developers and DBAsOracle Database 12c  - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
Alex Zaballa
 
Optimizer percona live_ams2015
Optimizer percona live_ams2015Optimizer percona live_ams2015
Optimizer percona live_ams2015
Manyi Lu
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
MYXPLAIN
 
An Approach to Sql tuning - Part 1
An Approach to Sql tuning - Part 1An Approach to Sql tuning - Part 1
An Approach to Sql tuning - Part 1
Navneet Upneja
 
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdfNOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
cookie1969
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
GUSS
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
David Barbarin
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
Bjoern Rost
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
Carlos Oliveira
 
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
Alex Zaballa
 
Oracle SQL Tuning
Oracle SQL TuningOracle SQL Tuning
Oracle SQL Tuning
Alex Zaballa
 
Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
mCloud
 
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
mCloud
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?
Zohar Elkayam
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
Guy Harrison
 
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Kristofferson A
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
Michael Keane
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data Warehouses
Connor McDonald
 

Similar to Impala 2.0 Update #impalajp (20)

Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAsOracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
 
Oracle Database 12c - New Features for Developers and DBAs
Oracle Database 12c  - New Features for Developers and DBAsOracle Database 12c  - New Features for Developers and DBAs
Oracle Database 12c - New Features for Developers and DBAs
 
Optimizer percona live_ams2015
Optimizer percona live_ams2015Optimizer percona live_ams2015
Optimizer percona live_ams2015
 
Master tuning
Master   tuningMaster   tuning
Master tuning
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
An Approach to Sql tuning - Part 1
An Approach to Sql tuning - Part 1An Approach to Sql tuning - Part 1
An Approach to Sql tuning - Part 1
 
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdfNOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
 
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
Oracle Database 12c - The Best Oracle Database 12c Tuning Features for Develo...
 
Oracle SQL Tuning
Oracle SQL TuningOracle SQL Tuning
Oracle SQL Tuning
 
Developers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman OracleDevelopers' mDay 2017. - Bogdan Kecman Oracle
Developers' mDay 2017. - Bogdan Kecman Oracle
 
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
Developers’ mDay u Banjoj Luci - Bogdan Kecman, Oracle – MySQL Server 8.0
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ...
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data Warehouses
 

More from Cloudera Japan

Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Impala + Kudu を用いたデータウェアハウス構築の勘所 (仮)
Cloudera Japan
 
機械学習の定番プラットフォームSparkの紹介
機械学習の定番プラットフォームSparkの紹介機械学習の定番プラットフォームSparkの紹介
機械学習の定番プラットフォームSparkの紹介
Cloudera Japan
 
HDFS Supportaiblity Improvements
HDFS Supportaiblity ImprovementsHDFS Supportaiblity Improvements
HDFS Supportaiblity Improvements
Cloudera Japan
 
分散DB Apache Kuduのアーキテクチャ DBの性能と一貫性を両立させる仕組み 「HybridTime」とは
分散DB Apache KuduのアーキテクチャDBの性能と一貫性を両立させる仕組み「HybridTime」とは分散DB Apache KuduのアーキテクチャDBの性能と一貫性を両立させる仕組み「HybridTime」とは
分散DB Apache Kuduのアーキテクチャ DBの性能と一貫性を両立させる仕組み 「HybridTime」とは
Cloudera Japan
 
Apache Impalaパフォーマンスチューニング #dbts2018
Apache Impalaパフォーマンスチューニング #dbts2018Apache Impalaパフォーマンスチューニング #dbts2018
Apache Impalaパフォーマンスチューニング #dbts2018
Cloudera Japan
 
Apache Hadoop YARNとマルチテナントにおけるリソース管理
Apache Hadoop YARNとマルチテナントにおけるリソース管理Apache Hadoop YARNとマルチテナントにおけるリソース管理
Apache Hadoop YARNとマルチテナントにおけるリソース管理
Cloudera Japan
 
HBase Across the World #LINE_DM
HBase Across the World #LINE_DMHBase Across the World #LINE_DM
HBase Across the World #LINE_DM
Cloudera Japan
 
Cloudera のサポートエンジニアリング #supennight
Cloudera のサポートエンジニアリング #supennightCloudera のサポートエンジニアリング #supennight
Cloudera のサポートエンジニアリング #supennight
Cloudera Japan
 
Train, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning modelTrain, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning model
Cloudera Japan
 
Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側Apache Kuduを使った分析システムの裏側
Apache Kuduを使った分析システムの裏側
Cloudera Japan
 
Cloudera in the Cloud #CWT2017
Cloudera in the Cloud #CWT2017Cloudera in the Cloud #CWT2017
Cloudera in the Cloud #CWT2017
Cloudera Japan
 
先行事例から学ぶ IoT / ビッグデータの始め方
先行事例から学ぶ IoT / ビッグデータの始め方先行事例から学ぶ IoT / ビッグデータの始め方
先行事例から学ぶ IoT / ビッグデータの始め方
Cloudera Japan
 
Clouderaが提供するエンタープライズ向け運用、データ管理ツールの使い方 #CW2017
Clouderaが提供するエンタープライズ向け運用、データ管理ツールの使い方 #CW2017Clouderaが提供するエンタープライズ向け運用、データ管理ツールの使い方 #CW2017
Clouderaが提供するエンタープライズ向け運用、データ管理ツールの使い方 #CW2017
Cloudera Japan
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017
Cloudera Japan
 
Apache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentechApache Kudu - Updatable Analytical Storage #rakutentech
Apache Kudu - Updatable Analytical Storage #rakutentech
Cloudera Japan
 
Hue 4.0 / Hue Meetup Tokyo #huejp
Hue 4.0 / Hue Meetup Tokyo #huejpHue 4.0 / Hue Meetup Tokyo #huejp
Hue 4.0 / Hue Meetup Tokyo #huejp
Cloudera Japan
 
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Apache Kuduは何がそんなに「速い」DBなのか? #dbts2017
Cloudera Japan
 
Cloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadeda
Cloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadedaCloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadeda
Cloudera Data Science WorkbenchとPySparkで 好きなPythonライブラリを 分散で使う #cadeda
Cloudera Japan
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera Japan
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016
Cloudera Japan
 

Impala 2.0 Update #impalajp

  • 1. Impala 2.0 Update
      Sho Shimauchi, Cloudera
      2014/10/31
  • 2. Today’s Topic
      • What is Cloudera Impala?
      • Impala 1.4 / 2.0 update
      • Performance Improvement
      • Query Language
      • Resource Management and Security
      • Others
  • 3. Who am I?
      • Pre-sales Solutions Architect
      • joined Cloudera in 2011, the first Japanese employee at Cloudera
      • email: sho@cloudera.com
      • twitter: @shiumachi
  • 5. What is Impala?
      • MPP SQL query engine for Hadoop environment
      • written in native code for maximum hardware efficiency
      • open-source!
      • http://impala.io/
      • Supported by Cloudera, Amazon, and MapR
      • History
      • 2012/10 Public Beta released
      • 2013/04 Impala 1.0 released
      • current version: Impala 2.0
  • 6. Impala is easy to use
      • create tables as virtual views over data stored in HDFS / HBase
      • schema metadata is stored in Metastore
      • shared with Hive, Pig, etc.
      • connect via ODBC / JDBC
      • authenticate via Kerberos / LDAP
      • run standard SQL
      • ANSI SQL-92 based
      • limited to SELECT and bulk INSERT
      • correlated subqueries are available as of 2.0 (not supported in earlier releases)
      • UDF / UDAF
  • 7. Impala 1.4 (2014/07)
      • DECIMAL(<precision>, <scale>)
      • HDFS caching DDL
      • column definition based on Parquet file (CREATE TABLE … LIKE PARQUET)
      • ORDER BY without LIMIT
      • LDAP connections through TLS
      • SHOW PARTITIONS
      • YARN integrated resource manager will be production ready
      • Llama HA support
      • CREATE TABLE … STORED AS AVRO
      • SUMMARY command in impala-shell (provides high-level summary of query plan)
      • faster COMPUTE STATS
      • Performance improvements for partition pruning
      • impala-shell supports UTF-8 characters
      • additional built-ins from EDW systems
  • 8. Impala 2.0 (2014/10)
      • hash table can spill to disk
      • join and aggregate tables of arbitrary size
      • Subquery enhancements
      • allowed in WHERE queries
      • EXISTS / NOT EXISTS
      • IN / NOT IN can operate on the result set from a subquery
      • correlated / uncorrelated subqueries
      • scalar subqueries
      • SQL 2003 compliant analytic window functions
      • LEAD(), LAG(), RANK(), FIRST_VALUE(), etc.
      • New Data Type: VARCHAR, CHAR
      • Security Enhancements
      • multiple authentication methods
      • GRANT / REVOKE / CREATE ROLE / DROP ROLE / SHOW ROLES / etc.
      • text + gzip / bzip2 / Snappy
      • Hint inside views
      • QUERY_TIMEOUT_S
      • DATE_PART() / EXTRACT()
      • Parquet default block size is changed to 256MB (was: 1GB)
      • LEFT ANTI JOIN / RIGHT ANTI JOIN
      • impala-shell can read settings from $HOME/.impalarc
  • 10. HDFS caching
      • When HDFS files are cached in memory, Impala can read the cached data without any disk reads, and without making an additional copy of the data in memory
      • avoids checksumming and data copies
      • new HDFS API is available in CDH 5.0
      • configure cache with Impala DDL
      • CREATE TABLE tbl_name CACHED IN '<pool>'
      • ALTER TABLE tbl_name ADD PARTITION … CACHED IN '<pool>'
  • 11. Partition Pruning improvement
      • Previously, Impala typically queried tables with up to approximately 3000 partitions. With the performance improvement in partition pruning, Impala can now comfortably handle tables with tens of thousands of partitions.
  • 12. Spilling to Disk SQL Operation
      • writes temporary data to disk when Impala is close to exceeding its memory limit
      • In PROFILE, the BlockMgr.BytesWritten counter reports how much data was written to disk during the query
  • 14. Subquery
      Scalar subquery: produces a result set with a single row containing a single column

        SELECT x FROM t1 WHERE x > (SELECT MAX(y) FROM t2);

      Uncorrelated subquery: does not refer to any tables from the outer block of the query

        SELECT x FROM t1 WHERE x IN (SELECT y FROM t2);

      Correlated subquery: compares one or more values from the outer query block to values referenced in the WHERE clause of the subquery

        SELECT employee_name, employee_id FROM employees one WHERE
          salary > (SELECT avg(salary) FROM employees two WHERE one.dept_id = two.dept_id);
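To make the correlated case concrete, the following is a minimal Python sketch of what the last query computes: for each outer row, the inner average is taken over rows of the same department. The sample rows and the helper name `above_dept_average` are hypothetical and not from the slides, NULL handling is ignored, and this only mirrors the query's semantics, not how Impala executes it.

```python
from collections import defaultdict

# Hypothetical sample rows: (employee_name, employee_id, dept_id, salary)
employees = [
    ("Ann", 1, 10, 500.0),
    ("Bob", 2, 10, 300.0),
    ("Cho", 3, 20, 400.0),
    ("Dee", 4, 20, 400.0),
]


def above_dept_average(rows):
    """Return (name, id) for employees paid strictly more than their department average."""
    # Equivalent of the inner query: avg(salary) grouped by the correlated column dept_id.
    totals = defaultdict(lambda: [0.0, 0])
    for _, _, dept_id, salary in rows:
        totals[dept_id][0] += salary
        totals[dept_id][1] += 1
    dept_avg = {dept: total / count for dept, (total, count) in totals.items()}

    # Equivalent of the outer query: compare each row with the average of its own department.
    return [
        (name, emp_id)
        for name, emp_id, dept_id, salary in rows
        if salary > dept_avg[dept_id]
    ]


print(above_dept_average(employees))
```

Precomputing one average per department is just one way to evaluate the correlation; it avoids rescanning the table for every outer row.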
  • 15. Analytic Functions (a.k.a. Window Functions)
      • supported in 2.0 and later
      • supported functions
      • RANK() / DENSE_RANK()
      • FIRST_VALUE() / LAST_VALUE()
      • LAG() / LEAD()
      • ROW_NUMBER()
      • Aggregate functions are already implemented
      • MAX(), MIN(), AVG(), SUM(), etc.
  • 16. Analytic Functions Example
      For each day, the query prints the closing price alongside the previous day's closing price:

        select stock_symbol, closing_date, closing_price,
          lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
          from stock_ticker
          order by closing_date;
        +--------------+---------------------+---------------+-------------------+
        | stock_symbol | closing_date        | closing_price | yesterday closing |
        +--------------+---------------------+---------------+-------------------+
        | JDR          | 2014-09-13 00:00:00 | 12.86         | NULL              |
        | JDR          | 2014-09-14 00:00:00 | 12.89         | 12.86             |
        | JDR          | 2014-09-15 00:00:00 | 12.94         | 12.89             |
        | JDR          | 2014-09-16 00:00:00 | 12.55         | 12.94             |
        | JDR          | 2014-09-17 00:00:00 | 14.03         | 12.55             |
        | JDR          | 2014-09-18 00:00:00 | 14.75         | 14.03             |
        | JDR          | 2014-09-19 00:00:00 | 13.98         | 14.75             |
        +--------------+---------------------+---------------+-------------------+
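The behavior of LAG() over a partition can be sketched in plain Python as follows. This is only an illustration of the semantics shown in the result table above, not Impala's implementation; the function name `lag_by_partition` is hypothetical, and dates are assumed to be ISO-formatted strings so that string order equals date order.

```python
from itertools import groupby
from operator import itemgetter


def lag_by_partition(rows, offset=1):
    """rows are (stock_symbol, closing_date, closing_price) tuples.

    Returns each row with the closing price from `offset` rows earlier
    in the same symbol appended, or None when no such row exists.
    """
    result = []
    # PARTITION BY stock_symbol ORDER BY closing_date
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        partition = list(group)
        for i, (symbol, date, price) in enumerate(partition):
            previous = partition[i - offset][2] if i >= offset else None
            result.append((symbol, date, price, previous))
    # Final ORDER BY closing_date of the outer query
    result.sort(key=itemgetter(1))
    return result


ticker = [
    ("JDR", "2014-09-15", 12.94),
    ("JDR", "2014-09-13", 12.86),
    ("JDR", "2014-09-14", 12.89),
]
for row in lag_by_partition(ticker):
    print(row)
```

The first row of each partition has no predecessor, which is why the query output shows NULL for 2014-09-13.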
  • 17. Approximation features
      • APPX_COUNT_DISTINCT query option
      • rewrites COUNT(DISTINCT) calls to use NDV()
      • speeds up the operation
      • allows multiple COUNT(DISTINCT) in a single query
      • APPX_MEDIAN()
      • returns a value that is approximately the median (midpoint) of values in the set of input values
  • 18. Approx. functions example

        [localhost:21000] > select min(x), max(x), avg(x) from million_numbers;
        +-------------------+-------------------+-------------------+
        | min(x)            | max(x)            | avg(x)            |
        +-------------------+-------------------+-------------------+
        | 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
        +-------------------+-------------------+-------------------+
        [localhost:21000] > select appx_median(x) from million_numbers;
        +----------------+
        | appx_median(x) |
        +----------------+
        | 24721.6        |
        +----------------+
  • 19. CREATE TABLE … LIKE PARQUET
      • CREATE TABLE ... LIKE PARQUET 'hdfs_path_of_parquet_file'
      • The column names and data types are automatically configured based on the Parquet data file
  • 20. ORDER BY without LIMIT
      • LIMIT clause is now optional for queries that use the ORDER BY clause
      • Impala automatically uses a temporary disk work area to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular data node.
  • 21. DECODE()

        SELECT event, DECODE(day_of_week, 1, "Monday", 2, "Tuesday", 3, "Wednesday",
          4, "Thursday", 5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day")
          FROM calendar;
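As a rough illustration of how the DECODE() call above pairs search values with results, here is a small Python sketch. The helper `decode` is hypothetical and only approximates the SQL function: the first argument is compared against each search value in turn, a trailing unpaired argument acts as the default, and None stands in for NULL.

```python
def decode(expr, *args):
    """Return the result paired with the first search value equal to expr.

    args is search1, result1, search2, result2, ..., optionally followed by
    one default value. Without a default, a miss returns None.
    """
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


day_names = (1, "Monday", 2, "Tuesday", 3, "Wednesday", 4, "Thursday",
             5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day")
print(decode(3, *day_names))
print(decode(9, *day_names))
```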
  • 22. ANTI JOIN
      LEFT ANTI JOIN / RIGHT ANTI JOIN are supported in Impala 2.0

        [localhost:21000] > create table t1 (x int);
        [localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);

        [localhost:21000] > create table t2 (y int);
        [localhost:21000] > insert into t2 values (2), (4), (6);

        [localhost:21000] > select x from t1 left anti join t2 on (t1.x = t2.y);
        +---+
        | x |
        +---+
        | 1 |
        | 3 |
        | 5 |
        +---+
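The result above follows from the definition of a left anti join: keep the left-hand rows that have no match on the right. A minimal Python sketch of that rule, using the same values as the slide, is shown below. The function name `left_anti_join` is hypothetical, and the sketch assumes no NULL keys, which SQL would treat differently.

```python
def left_anti_join(left, right):
    """Return the values of `left` that have no equal value in `right`."""
    right_keys = set(right)  # build side, analogous to hashing t2.y
    return [x for x in left if x not in right_keys]


t1 = [1, 2, 3, 4, 5, 6]
t2 = [2, 4, 6]
print(left_anti_join(t1, t2))
```

A RIGHT ANTI JOIN is the same operation with the roles of the two tables swapped.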
  • 23. new data types
      • DECIMAL (Impala 1.4)
      • column_name DECIMAL[(precision[,scale])]
      • DECIMAL with no precision or scale values is equivalent to DECIMAL(9,0)
      • VARCHAR (Impala 2.0)
      • STRING with a max length
      • CHAR (Impala 2.0)
      • STRING with a precise length
  • 24. new built-in functions
      • EXTRACT(): returns one date or time field from a TIMESTAMP value
      • TRUNC(): truncates date/time values to year, month, etc.
      • ADD_MONTHS(): alias for MONTHS_ADD()
      • ROUND(): rounds DECIMAL values
      • for computing properties for statistical distributions
      • STDDEV()
      • STDDEV_SAMP() / STDDEV_POP()
      • VARIANCE()
      • VARIANCE_SAMP() / VARIANCE_POP()
      • MAX_INT() / MIN_SMALLINT()
      • IS_INF() / IS_NAN()
  • 25. SHOW PARTITIONS

        [localhost:21000] > show partitions census;
        +-------+-------+--------+------+---------+
        | year  | #Rows | #Files | Size | Format  |
        +-------+-------+--------+------+---------+
        | 2000  | -1    | 0      | 0B   | TEXT    |
        | 2004  | -1    | 0      | 0B   | TEXT    |
        | 2008  | -1    | 0      | 0B   | TEXT    |
        | 2010  | -1    | 0      | 0B   | TEXT    |
        | 2011  | 4     | 1      | 22B  | TEXT    |
        | 2012  | 4     | 1      | 22B  | TEXT    |
        | 2013  | 1     | 1      | 231B | PARQUET |
        | Total | 9     | 3      | 275B |         |
        +-------+-------+--------+------+---------+
  • 26. SUMMARY
      • impala-shell command
      • easy-to-digest overview of the timings for the different phases of execution for a query

        [localhost:21000] > select avg(ss_sales_price) from store_sales where ss_coupon_amt = 0;
        +---------------------+
        | avg(ss_sales_price) |
        +---------------------+
        | 37.80770926328327   |
        +---------------------+
        [localhost:21000] > summary;
        +--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
        | Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail          |
        +--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
        | 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1     | 1          | 48.00 KB | -1 B          | MERGE FINALIZE  |
        | 02:EXCHANGE  | 1      | 0ns      | 0ns      | 1     | 1          | 0 B      | -1 B          | UNPARTITIONED   |
        | 01:AGGREGATE | 1      | 30.79ms  | 30.79ms  | 1     | 1          | 80.00 KB | 10.00 MB      |                 |
        | 00:SCAN HDFS | 1      | 5.45s    | 5.45s    | 2.21M | -1         | 64.05 MB | 432.00 MB     | tpc.store_sales |
        +--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
  • 27. SET statement
      • Before Impala 2.0, SET could be used only in impala-shell
      • In Impala 2.0, you can use SET in client applications through the JDBC / ODBC APIs.
  • 28. Resource Management and Security
  • 29. Admission Control (Impala 1.3)
      • Fast and lightweight resource management mechanism
      • avoids oversubscription of resources for concurrent workloads
      • queries are queued when reaching configurable limits
      • Runs on every impalad
      • no SPOF
  • 30. YARN and Llama
      • Llama: Low Latency Application MAster
      • Subdivides coarse-grain YARN scheduling into finer granularity for low-latency and short-lived queries
      • Llama registers one long-lived AM per YARN pool
      • Llama caches resources allocated by YARN for a short time, so that they can be quickly re-allocated to Impala queries
      • much faster than waiting for YARN
      • Impala 1.4: GA. Llama HA support
  • 31. Query Timeout
      • A new query option, QUERY_TIMEOUT_S, lets you specify a timeout period in seconds for individual queries
      • Note: The timeout clock for queries and sessions only starts ticking when the query or session is idle
  • 32. Security
      • Impala 2.0 can accept either kind of authentication request
      • e.g. host A with Kerberos, and host B with LDAP
      • Security-related statements
      • GRANT
      • REVOKE
      • CREATE ROLE
      • DROP ROLE
      • SHOW ROLES
      • SHOW ROLE GRANT
      • --disk_spill_encryption option
  • 34. Text + gzip, bzip2, and Snappy
      • In Impala 2.0 and later, Impala supports using text data files that employ gzip, bzip2, or Snappy compression
      • use ROW FORMAT with delimiter and escape character to create the table

        CREATE TABLE csv_compressed (a STRING, b STRING, c STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
  • 35. impala-shell
      • UTF-8 support (1.4)
      • .impalarc file (2.0)

        [impala]
        verbose=true
        default_db=tpc_benchmarking
        write_delimited=true
        output_delimiter=,
        output_file=/home/tester1/benchmark_results.csv
        show_profiles=true
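The .impalarc example is an INI-style file with a single [impala] section, so its contents can be inspected with Python's standard configparser module. The sketch below is only a way to check such a file by hand; the function name `read_impalarc` is hypothetical and this is not a description of how impala-shell itself loads the file.

```python
import configparser

# The sample settings from the slide, inlined so the sketch is self-contained.
IMPALARC_TEXT = """\
[impala]
verbose=true
default_db=tpc_benchmarking
write_delimited=true
output_delimiter=,
output_file=/home/tester1/benchmark_results.csv
show_profiles=true
"""


def read_impalarc(text):
    """Parse INI-style text and return the [impala] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["impala"])


settings = read_impalarc(IMPALARC_TEXT)
for key, value in settings.items():
    print(f"{key} = {value}")
```

All values come back as strings, so a flag such as verbose would still need to be interpreted as a boolean by whatever consumes it.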
  • 36. Documentation
      • Cluster Sizing Guidelines for Impala
      • http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_cluster_sizing.html