SlideShare a Scribd company logo
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
1	
1	
Evaluation  of  Cloudera  impala  1.1
Aug  7,  2013
CELLANT  Corp.  R&D  Strategy  Division
Yukinori  SUDA
@sudabon
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v  Sentry  support:
l  Fine-‐‑‒grained  authorization
l  Role-‐‑‒based  authorization
v  Support  for  views
v  Performance  improvements
l  Parquet  columnar  performance
l  More  efficient  metadata  refresh  for  larger  installations
v  Additional  SQL
l  SQL-‐‑‒89  joins  (in  addition  to  existing  SQL-‐‑‒92)
l  LOAD  function
l  REFRESH  command  for  JDBC/ODBC
v  Improved  Hbase  support:
l  Binary  types
l  Caching  configuration
v  Fixed  many  bugs
Cloudera  Impala  1.1  was  released  !!
2
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v Hive  ⇒  Impala
l On  Impala  shell,  can  read  data  in  “VIEW”  that  was  
created  via  Hive  command  ?
v Impala  ⇒  Hive
l On  Hive  shell,  can  read  data  in  “VIEW”  that  was  
created  via  Impala  command  ?
v Result
Two  “VIEW”s  have  compatibility
Check  compatibility  of  “VIEW”
3
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Hive  on  Cluster1)
4
0 50 100 150 200 250
No  Comp.
Gzip
Snappy
Gzip
Snappy
TextFileSequenceFileRCFile
222.039
244.67
239.182
228.801
230.327
Avg.  Job  Latency  [sec]
This result will be invalid as performance evaluation cause some data may be read remotely.
See the slide of “Check performance (Hive on Cluster2)”.
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Impala  on  Cluster1)
5
0 50 100 150 200 250
No  Comp.
Gzip
Snappy
Gzip
Snappy
Snappy
Text
File
Sequence
FileRCFile
Parquet
File
23.518
32.155
28.617
20.774
12.654
13.146
Avg.  Job  Latency  [sec]
This result will be invalid as performance evaluation
cause some data may be read remotely.
See the slide of “Check performance (Impala on Cluster2)”.
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Hive  on  Cluster2)
6
0 50 100 150 200 250 300
No  Comp.
Gzip
Snappy
Gzip
Snappy
TextFileSequenceFileRCFile
272.176
249.531
245.009
230.034
216.802
Avg.  Job  Latency  [sec]
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Check  performance  (Impala  on  Cluster2)
7
0 50 100 150 200 250 300
No  Comp.
Gzip
Snappy
Gzip
Snappy
Snappy
Text
File
Sequence
FileRCFile
Parquet
File
32.528
28.73
21.173
24.794
14.308
19.814
Avg.  Job  Latency  [sec]
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v IMPALA-‐‑‒357
l Insert  into  Parquet  exceed  mem-‐‑‒limit
v Problem
l Even  if  set  mem_̲limit  setting,  when  create  ParquetFile  
table  with  partitions,  consumed  memory  isnʼ’t  limited.  
l At  last,  Impalad  crashes  due  to  memory  shortage
v Result
CREATE  command  failed  due  to  memory  limit
Check  fixed  bug
8
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v Thanks  to  dev.  team,  Impala  is  also  going  
from  “Good  to  Great”
v Both  “VIEW”  and  “Parquet”  are  already  ready
v Performance
v RCFile+Snappy  is  the  fastest  on  both  Cluster1  and  
Cluster2
v If  use  larger  size  table,  Parquet+Snappy  may  be  the  
fastest
v Hope  for  future  extension
l Support  Structure  Types
l Support  UDF/UDTF,  etc
Summary
9
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
10
Appendix.  Benchmark  Details
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Our  System  Environment(Cluster1)
11
v  Install  using  Cloudera  Manager  Free  Edition  4.6.0
Master Slave
14  Servers
All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
Active
NameNode
DataNode
TaskTracker
Impalad
Stand-‐‑‒by
NameNode
JobTracker
statestored
3  Servers
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
DataNode
DataNode
DataNode
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
Our  System  Environment(Cluster2)
12
v  Install  using  Cloudera  Manager  Free  Edition  4.6.0
Master Slave
10  Servers
All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch
Active
NameNode
DataNode
TaskTracker
Impalad
Stand-‐‑‒by
NameNode
JobTracker
statestored
3  Servers
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
TaskTracker
Impalad
DataNode
DataNode
DataNode
DataNode
Decommissioned
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v CPU
l Intel  Core  2  Duo  2.13  GHz  with  Hyper  Threading
v Memory
l 8GB  :  Namenodes  only
l 4GB  :  Others
v Disk
l 7,200  rpm  SATA  mechanical  Hard  Disk  Drive  *  1
v OS
l Cent  OS  6.3
Our  Server  Specification
13
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
v  Use  CDH4.3.0  +  Impala  1.1
v  Use  hivebench  in  open-‐‑‒sourced  benchmark  tool  “HiBench”
l  https://github.com/hibench
v  Modified  datasets  to  1/10  scale
l  Default  configuration  generates  table  with  1  billion  rows
v  Modified  query  sentence
l  Deleted  “INSERT  INTO  TABLE  …”  to  evaluate  read-‐‑‒only  performance
v  Combines  a  few  storage  format  with  a  few  compression  method
l  TextFile,  SequenceFile,  RCFile,  ParquestFile
l  No  compression,  Gzip,  Snappy
v  Comparison  with  job  query  latency
v  Average  job  latency  over  5  measurements
v  Benchmark  on  both  Cluster1  and  Cluster2
Benchmark
14
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
•  Uservisits  table
–  100  million  rows
–  16,895  MB  as  TextFile
–  Table  Definitions
•  sourceIP   string
•  destURL   string
•  visitDate   string
•  adRevenue   double
•  userAgent   string
•  countryCode   string
•  languageCode  string
•  searchWord   string
•  duration   int
•  Rankings  table
–  12  million  rows
–  744  MB  as  TextFile
–  Table  Definitions
•  pageURL string
•  pageRank int
•  avgDuration int
Modified  Datasets
15
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
SELECT
  sourceIP,
  sum(adRevenue)  as  totalRevenue,
  avg(pageRank)  
FROM
  rankings_̲t  R
JOIN  [BROADCAST]  (
  SELECT
    sourceIP,
    destURL,
    adRevenue
  FROM
    uservisits_̲t  UV
  WHERE
    (datediff(UV.visitDate,  '1999-‐‑‒01-‐‑‒01')>=0
    AND
    datediff(UV.visitDate,  '2000-‐‑‒01-‐‑‒01')<=0)
  )  NUV
ON
  (R.pageURL  =  NUV.destURL)
group  by  sourceIP
order  by  totalRevenue  DESC
limit  1;
Modified  Query
16
Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p /
17
Thanks!
I  want  to  use  TPC  in  next  evaluation…

More Related Content

What's hot

Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
InMobi Technology
 
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro YamadaPGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
Equnix Business Solutions
 
RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?Kristofferson A
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
Continuent
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goKristofferson A
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
Jeff Larkin
 
Case Studies on PostgreSQL
Case Studies on PostgreSQLCase Studies on PostgreSQL
Case Studies on PostgreSQL
InMobi Technology
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
Equnix Business Solutions
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
DataStax
 
NVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPUNVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPU
Can Ozdoruk
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
Linaro
 
HCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerHCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality Checker
Linaro
 
The Database Sizing Workflow
The Database Sizing WorkflowThe Database Sizing Workflow
The Database Sizing WorkflowKristofferson A
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
Equnix Business Solutions
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
Rakuten Group, Inc.
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
Equnix Business Solutions
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...Kristofferson A
 
Replicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analyticsReplicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analytics
Continuent
 
Whitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryWhitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryKristofferson A
 

What's hot (20)

Building Spark as Service in Cloud
Building Spark as Service in CloudBuilding Spark as Service in Cloud
Building Spark as Service in Cloud
 
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro YamadaPGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
PGConf.ASIA 2019 Bali - Foreign Data Wrappers - Etsuro Fujita & Tatsuro Yamada
 
RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?RedGateWebinar - Where did my CPU go?
RedGateWebinar - Where did my CPU go?
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU go
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
 
Case Studies on PostgreSQL
Case Studies on PostgreSQLCase Studies on PostgreSQL
Case Studies on PostgreSQL
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
 
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
Cassandra @ Yahoo Japan (Satoshi Konno, Yahoo) | Cassandra Summit 2016
 
NVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPUNVIDIA Tesla K40 GPU
NVIDIA Tesla K40 GPU
 
An Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating SystemAn Overview of the IHK/McKernel Multi-kernel Operating System
An Overview of the IHK/McKernel Multi-kernel Operating System
 
HCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality CheckerHCQC : HPC Compiler Quality Checker
HCQC : HPC Compiler Quality Checker
 
The Database Sizing Workflow
The Database Sizing WorkflowThe Database Sizing Workflow
The Database Sizing Workflow
 
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedPGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar Ahmed
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
OakTableWorld 2013: Ultimate Exadata IO monitoring – Flash, HardDisk , & Writ...
 
Replicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analyticsReplicate from Oracle to data warehouses and analytics
Replicate from Oracle to data warehouses and analytics
 
Whitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success StoryWhitepaper: Exadata Consolidation Success Story
Whitepaper: Exadata Consolidation Success Story
 

Viewers also liked

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
Data Con LA
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impalaNAVER D2
 
ImpalaToGo introduction
ImpalaToGo introductionImpalaToGo introduction
ImpalaToGo introduction
David Groozman
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Gregg Barrett
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Yukinori Suda
 

Viewers also liked (6)

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
ImpalaToGo introduction
ImpalaToGo introductionImpalaToGo introduction
ImpalaToGo introduction
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
Quick Introduction: To run a SQL query on the Chicago Employee Data, using Cl...
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
 

Similar to Evaluation of cloudera impala 1.1

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HiveYukinori Suda
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
ScyllaDB
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with Devstack
Sean Dague
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
Cumulus Networks
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
Stanislav Osipov
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Wangda Tan
 
Production Grade Kubernetes Applications
Production Grade Kubernetes ApplicationsProduction Grade Kubernetes Applications
Production Grade Kubernetes Applications
Narayanan Krishnamurthy
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014
Puppet
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
Dave Holland
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataDataWorks Summit
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBox
lzap
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
DataWorks Summit
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Chris Fregly
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
Chris Fregly
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practice
Docker, Inc.
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
Yugabyte
 
ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300
Timothy Spann
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Vietnam Open Infrastructure User Group
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
DataWorks Summit
 

Similar to Evaluation of cloudera impala 1.1 (20)

Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with Devstack
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NYApache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
Apache hadoop 3.x state of the union and upgrade guidance - Strata 2019 NY
 
Production Grade Kubernetes Applications
Production Grade Kubernetes ApplicationsProduction Grade Kubernetes Applications
Production Grade Kubernetes Applications
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Pig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big DataPig on Tez - Low Latency ETL with Big Data
Pig on Tez - Low Latency ETL with Big Data
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBox
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
Container orchestration from theory to practice
Container orchestration from theory to practiceContainer orchestration from theory to practice
Container orchestration from theory to practice
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
 
ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
 

More from Yukinori Suda

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4
Yukinori Suda
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話
Yukinori Suda
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Yukinori Suda
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜Yukinori Suda
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)Yukinori Suda
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取り
Yukinori Suda
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)
Yukinori Suda
 

More from Yukinori Suda (7)

Hadoop operation chaper 4
Hadoop operation chaper 4Hadoop operation chaper 4
Hadoop operation chaper 4
 
Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話Cloudera Impalaをサービスに組み込むときに苦労した話
Cloudera Impalaをサービスに組み込むときに苦労した話
 
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービスHadoopエコシステムを駆使したこれからのWebアクセス解析サービス
Hadoopエコシステムを駆使したこれからのWebアクセス解析サービス
 
自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜自宅でHive愛を育む方法 〜Raspberry Pi編〜
自宅でHive愛を育む方法 〜Raspberry Pi編〜
 
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
⾃宅で Hive 愛を育むための⼿順(Raspberry Pi 編)
 
HiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取りHiveとImpalaのおいしいとこ取り
HiveとImpalaのおいしいとこ取り
 
Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)Cloudera impalaの性能評価(Hiveとの比較)
Cloudera impalaの性能評価(Hiveとの比較)
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Evaluation of cloudera impala 1.1

  • 1. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 1 1 Evaluation  of  Cloudera  impala  1.1 Aug  7,  2013 CELLANT  Corp.  R&D  Strategy  Division Yukinori  SUDA @sudabon
  • 2. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v  Sentry  support: l  Fine-‐‑‒grained  authorization l  Role-‐‑‒based  authorization v  Support  for  views v  Performance  improvements l  Parquet  columnar  performance l  More  efficient  metadata  refresh  for  larger  installations v  Additional  SQL l  SQL-‐‑‒89  joins  (in  addition  to  existing  SQL-‐‑‒92) l  LOAD  function l  REFRESH  command  for  JDBC/ODBC v  Improved  Hbase  support: l  Binary  types l  Caching  configuration v  Fixed  many  bugs Cloudera  Impala  1.1  was  released  !! 2
  • 3. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v Hive  ⇒  Impala l On  Impala  shell,  can  read  data  in  “VIEW”  that  was   created  via  Hive  command  ? v Impala  ⇒  Hive l On  Hive  shell,  can  read  data  in  “VIEW”  that  was   created  via  Impala  command  ? v Result Two  “VIEW”s  have  compatibility Check  compatibility  of  “VIEW” 3
  • 4. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Hive  on  Cluster1) 4 0 50 100 150 200 250 No  Comp. Gzip Snappy Gzip Snappy TextFileSequenceFileRCFile 222.039 244.67 239.182 228.801 230.327 Avg.  Job  Latency  [sec] This result will be invalid as performance evaluation cause some data may be read remotely. See the slide of “Check performance (Hive on Cluster2)”.
  • 5. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Impala  on  Cluster1) 5 0 50 100 150 200 250 No  Comp. Gzip Snappy Gzip Snappy Snappy Text File Sequence FileRCFile Parquet File 23.518 32.155 28.617 20.774 12.654 13.146 Avg.  Job  Latency  [sec] This result will be invalid as performance evaluation cause some data may be read remotely. See the slide of “Check performance (Impala on Cluster2)”.
  • 6. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Hive  on  Cluster2) 6 0 50 100 150 200 250 300 No  Comp. Gzip Snappy Gzip Snappy TextFileSequenceFileRCFile 272.176 249.531 245.009 230.034 216.802 Avg.  Job  Latency  [sec]
  • 7. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Check  performance  (Impala  on  Cluster2) 7 0 50 100 150 200 250 300 No  Comp. Gzip Snappy Gzip Snappy Snappy Text File Sequence FileRCFile Parquet File 32.528 28.73 21.173 24.794 14.308 19.814 Avg.  Job  Latency  [sec]
  • 8. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v IMPALA-‐‑‒357 l Insert  into  Parquet  exceed  mem-‐‑‒limit v Problem l Even  if  set  mem_̲limit  setting,  when  create  ParquetFile   table  with  partitions,  consumed  memory  isnʼ’t  limited.   l At  last,  Impalad  crashes  due  to  memory  shortage v Result CREATE  command  failed  due  to  memory  limit Check  fixed  bug 8
  • 9. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v Thanks  to  dev.  team,  Impala  is  also  going   from  “Good  to  Great” v Both  “VIEW”  and  “Parquet”  are  already  ready v Performance v RCFile+Snappy  is  the  fastest  on  both  Cluster1  and   Cluster2 v If  use  larger  size  table,  Parquet+Snappy  may  be  the   fastest v Hope  for  future  extension l Support  Structure  Types l Support  UDF/UDTF,  etc Summary 9
  • 10. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 10 Appendix.  Benchmark  Details
  • 11. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Our  System  Environment(Cluster1) 11 v  Install  using  Cloudera  Manager  Free  Edition  4.6.0 Master Slave 14  Servers All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch Active NameNode DataNode TaskTracker Impalad Stand-‐‑‒by NameNode JobTracker statestored 3  Servers DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode DataNode DataNode DataNode
  • 12. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / Our  System  Environment(Cluster2) 12 v  Install  using  Cloudera  Manager  Free  Edition  4.6.0 Master Slave 10  Servers All  servers  are  connected  with  1Gbps  Ethernet  through  an  L2  switch Active NameNode DataNode TaskTracker Impalad Stand-‐‑‒by NameNode JobTracker statestored 3  Servers DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode DataNode DataNode DataNode Decommissioned
  • 13. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v CPU l Intel  Core  2  Duo  2.13  GHz  with  Hyper  Threading v Memory l 8GB  :  Namenodes  only l 4GB  :  Others v Disk l 7,200  rpm  SATA  mechanical  Hard  Disk  Drive  *  1 v OS l Cent  OS  6.3 Our  Server  Specification 13
  • 14. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / v  Use  CDH4.3.0  +  Impala  1.1 v  Use  hivebench  in  open-‐‑‒sourced  benchmark  tool  “HiBench” l  https://github.com/hibench v  Modified  datasets  to  1/10  scale l  Default  configuration  generates  table  with  1  billion  rows v  Modified  query  sentence l  Deleted  “INSERT  INTO  TABLE  …”  to  evaluate  read-‐‑‒only  performance v  Combines  a  few  storage  format  with  a  few  compression  method l  TextFile,  SequenceFile,  RCFile,  ParquestFile l  No  compression,  Gzip,  Snappy v  Comparison  with  job  query  latency v  Average  job  latency  over  5  measurements v  Benchmark  on  both  Cluster1  and  Cluster2 Benchmark 14
  • 15. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / •  Uservisits  table –  100  million  rows –  16,895  MB  as  TextFile –  Table  Definitions •  sourceIP  string •  destURL  string •  visitDate  string •  adRevenue  double •  userAgent  string •  countryCode  string •  languageCode  string •  searchWord  string •  duration  int •  Rankings  table –  12  million  rows –  744  MB  as  TextFile –  Table  Definitions •  pageURL string •  pageRank int •  avgDuration int Modified  Datasets 15
  • 16. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / SELECT   sourceIP,   sum(adRevenue)  as  totalRevenue,   avg(pageRank)   FROM   rankings_̲t  R JOIN  [BROADCAST]  (   SELECT     sourceIP,     destURL,     adRevenue   FROM     uservisits_̲t  UV   WHERE     (datediff(UV.visitDate,  '1999-‐‑‒01-‐‑‒01')>=0     AND     datediff(UV.visitDate,  '2000-‐‑‒01-‐‑‒01')<=0)   )  NUV ON   (R.pageURL  =  NUV.destURL) group  by  sourceIP order  by  totalRevenue  DESC limit  1; Modified  Query 16
  • 17. Copyright © CELLANT Corp. All Rights Reserved. h t t p : / / w w w . c e l l a n t . j p / 17 Thanks! I  want  to  use  TPC  in  next  evaluation…