Performance Evaluation of Cloudera Impala (with Comparison to Hive)

Transcript

  • 1. Cloudera Impala Performance Evaluation (with Comparison to Hive)
       Dec. 8, 2012
       CELLANT Corp. R&D Strategy Division
       Yukinori SUDA @sudabon
  • 2. About Cloudera Impala
       •  The latest version is 0.3 beta
       •  Open-sourced implementation inspired by Google Dremel and F1
       •  Developed by the well-known Hadoop distributor Cloudera
       •  Brings real-time, ad-hoc query capability to Apache Hadoop
       •  Queries data stored in HDFS or Apache HBase
       •  Uses the same metadata and SQL syntax (HiveQL) as Apache Hive (see the sketch below)
       •  Supports TextFile and SequenceFile as Hive storage formats
       •  Also supports SequenceFile compressed with Snappy, Gzip, and Bzip2
       •  Accesses the data directly through a specialized distributed query engine
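Because Impala shares the Hive metastore and HiveQL syntax, a table defined through Hive can be queried from Impala with the identical statement. A minimal sketch, assuming a hypothetical table named web_logs that is not part of the benchmark:

       -- Defined once through Hive; the definition lives in the shared metastore
       -- (web_logs is a hypothetical example table)
       CREATE TABLE web_logs (
         ip   STRING,
         url  STRING,
         hits INT
       )
       ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
       STORED AS TEXTFILE;

       -- The same HiveQL statement can then be issued against Impala
       -- (early 0.x releases may require the table metadata to be refreshed first)
       SELECT url, sum(hits) AS total_hits
       FROM web_logs
       GROUP BY url
       ORDER BY total_hits DESC
       LIMIT 10;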
  • 3. Architecture
       •  The State Store works as an impala-state-store (statestored) daemon
       •  The Query Planner, Query Coordinator, and Query Exec Engine work as an impalad daemon
  • 4. System Environment
       •  Installed via Cloudera Manager Free Edition
       •  Master (1 server)
          o  HDFS: NameNode, SecondaryNameNode
          o  MapReduceV1: JobTracker
          o  Impala: impala-state-store (statestored), impalad
       •  Slave (13 servers)
          o  HDFS: DataNode
          o  MapReduceV1: TaskTracker
          o  Impala: impalad
       •  All servers are connected with 1 Gbps Ethernet through an L2 switch
  • 5. Server Specification
       •  CPU
          o  Intel Core 2 Duo 2.13 GHz with Hyper-Threading
       •  Memory
          o  4 GB
       •  Disk
          o  7,200 rpm SATA mechanical hard disk drive
       •  OS
          o  CentOS 6.2
  • 6. Benchmark
       •  Uses CDH4.1 with Impala versions 0.2 and 0.3
       •  Uses hivebench from the open-sourced benchmark suite "HiBench"
          o  https://github.com/hibench
       •  Modified the datasets to 1/10 scale
          o  The default configuration generates a table with 1 billion rows
       •  Modified the query
          o  Deleted "INSERT INTO TABLE …" to evaluate read-only performance
          o  Deleted the "datediff" function (I mistakenly assumed it was not supported)
       •  Combined several Hive storage formats with several compression methods (a sketch of the settings follows after this list)
          o  TextFile, SequenceFile, RCFile
          o  No compression, Gzip, Snappy
       •  Compared job query latency
          o  Average job latency over 5 measurements
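The deck does not show how each storage format/codec combination was produced. A minimal sketch of one combination (SequenceFile compressed with Snappy) using standard Hive session settings; the table name uservisits_seq_snappy is a hypothetical example, not taken from the deck:

       -- Write the output of a CTAS as a Snappy-compressed SequenceFile
       SET hive.exec.compress.output=true;
       SET mapred.output.compression.type=BLOCK;
       SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

       CREATE TABLE uservisits_seq_snappy
       STORED AS SEQUENCEFILE
       AS SELECT * FROM uservisits;

The other combinations would follow the same pattern, swapping the STORED AS clause and the compression codec.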
  • 7. Modified Datasets
       •  Uservisits table
          o  100 million rows
          o  Schema
             •  sourceIP string
             •  destURL string
             •  visitDate string
             •  adRevenue double
             •  userAgent string
             •  countryCode string
             •  languageCode string
             •  searchWord string
             •  duration int
       •  Rankings table
          o  12 million rows
          o  Schema
             •  pageURL string
             •  pageRank int
             •  avgDuration int
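The slide lists only column names and types. A hedged sketch of what the corresponding Hive table definitions could look like; the column lists follow the slide, while the row delimiter and exact table names are assumptions:

       -- Sketch only: columns from the slide, delimiter and table names assumed
       CREATE TABLE rankings (
         pageURL     STRING,
         pageRank    INT,
         avgDuration INT
       )
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

       CREATE TABLE uservisits (
         sourceIP     STRING,
         destURL      STRING,
         visitDate    STRING,
         adRevenue    DOUBLE,
         userAgent    STRING,
         countryCode  STRING,
         languageCode STRING,
         searchWord   STRING,
         duration     INT
       )
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';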
  • 8. Modified Query

       SELECT
         sourceIP,
         sum(adRevenue) AS totalRevenue,
         avg(pageRank)
       FROM rankings R
       JOIN (
         SELECT sourceIP, destURL, adRevenue
         FROM uservisits UV
         WHERE UV.visitDate >= '1999-01-01'
           AND UV.visitDate <= '2001-01-01'
       ) NUV
       ON (R.pageURL = NUV.destURL)
       GROUP BY sourceIP
       ORDER BY totalRevenue DESC
       LIMIT 1
  • 9. Benchmark Result (Hive)
  • 10. Benchmark Result (Impala 0.2)
  • 11. Benchmark Result (Impala 0.3)
  • 12. Conclusion
       •  Impala is over 10 times faster than MR + Hive
          o  Impala 0.3
             •  SequenceFile compressed with Snappy: 14.337 seconds
          o  Impala 0.2
             •  SequenceFile compressed with Gzip: 19.733 seconds
          o  Hive
             •  RCFile compressed with Snappy: 164.161 seconds
       •  Hope that Impala version 1.0, to be included in CDH5, will be even faster
          o  Support for the RCFile and Trevni columnar formats
  • 13. Thank you