Performance evaluation of cloudera impala 0.6 beta with comparison to Hive

  1. Cloudera Impala 0.6 beta Performance Evaluation (with Comparison to Hive). Mar. 6, 2013. CELLANT Corp. R&D Strategy Division, Yukinori SUDA (@sudabon). Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/
  2. Cloudera Impala 0.6 beta: changelog from 0.5 beta
     - Cloudera Manager 4.5 and CDH 4.2 support Impala 0.6.
     - Support for the RCFile file format.
     - Added support for Impala on SUSE and Debian/Ubuntu. Supported platforms:
       - RHEL 5.7/6.2 and CentOS 5.7/6.2
       - SUSE 11 with Service Pack 1 or later
       - Ubuntu 10.04/12.04 and Debian 6.03
  3. System Environment
     - Installed via Cloudera Manager Free Edition 4.5.0.
     - Master: 3 servers
       - Active NameNode
       - Stand-by NameNode
       - JobTracker and statestored
     - Slave: 11 servers, each running DataNode, TaskTracker, and Impalad
     - All servers are connected with 1 Gbps Ethernet through an L2 switch.
  4. Server Specification
     - CPU: Intel Core 2 Duo 2.13 GHz with Hyper Threading
     - Memory: 4 GB
     - Disk: 7,200 rpm SATA mechanical hard disk drive
     - OS: CentOS 6.2
  5. Benchmark
     - Used CDH 4.2.0 + Impala 0.6 beta.
     - Used hivebench from the open-source benchmark tool “HiBench” (https://github.com/hibench).
     - Scaled the datasets down to 1/10 (the default configuration generates a table with 1 billion rows).
     - Modified the query: deleted “INSERT INTO TABLE …” to evaluate read-only performance.
     - Combined several Hive storage formats with several compression methods:
       - Storage formats: TextFile, SequenceFile, RCFile
       - Compression: none, Gzip, Snappy
     - Compared configurations by job (query) latency.
     - Reported the average job latency over 5 measurements.
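The "average job latency over 5 measurements" procedure can be sketched as a small timing harness. This is only an illustration, not the tooling used for the slides: `average_latency` and `dummy_job` are hypothetical names, and the dummy job stands in for whatever command actually submits the query to Hive or Impala.

```python
import statistics
import time


def average_latency(run_job, runs=5):
    """Run `run_job` `runs` times; return (mean latency, per-run latencies) in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_job()  # hypothetical: submit the query and block until it finishes
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), latencies


def dummy_job():
    # Placeholder workload; a real run would invoke the query client here.
    time.sleep(0.01)


mean_latency, samples = average_latency(dummy_job)
print(f"average over {len(samples)} runs: {mean_latency:.3f} sec")
```

Keeping the per-run samples alongside the mean matters for this benchmark, because the first run can differ noticeably from the later ones.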
  6. Modified Datasets
     - Uservisits table: 100 million rows. Table definition:
       - sourceIP string
       - destURL string
       - visitDate string
       - adRevenue double
       - userAgent string
       - countryCode string
       - languageCode string
       - searchWord string
       - duration int
     - Rankings table: 12 million rows. Table definition:
       - pageURL string
       - pageRank int
       - avgDuration int
  7. Modified Query

         SELECT
           sourceIP,
           sum(adRevenue) as totalRevenue,
           avg(pageRank)
         FROM
           rankings_t R
         JOIN (
           SELECT
             sourceIP,
             destURL,
             adRevenue
           FROM
             uservisits_t UV
           WHERE
             (datediff(UV.visitDate, '1999-01-01')>=0
             AND
             datediff(UV.visitDate, '2000-01-01')<=0)
         ) NUV
         ON
           (R.pageURL = NUV.destURL)
         group by sourceIP
         order by totalRevenue DESC
         limit 1;
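To make concrete what the modified query computes, here is a minimal sketch that runs an equivalent statement on a few rows in an in-memory SQLite database. Several things are assumptions for illustration only: the sample rows are invented, only the columns the query touches are created, and because SQLite has no `datediff`, a difference of `julianday()` values is swapped in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rankings_t (pageURL TEXT, pageRank INTEGER, avgDuration INTEGER);
CREATE TABLE uservisits_t (sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL);
-- Hypothetical sample rows, not benchmark data.
INSERT INTO rankings_t VALUES ('a.html', 10, 1), ('b.html', 30, 1);
INSERT INTO uservisits_t VALUES
  ('1.1.1.1', 'a.html', '1999-06-01', 1.5),
  ('1.1.1.1', 'b.html', '1999-07-01', 2.0),
  ('2.2.2.2', 'a.html', '1999-08-01', 3.0),
  ('2.2.2.2', 'b.html', '2001-01-01', 9.0);
""")

# Same shape as the slide's query; julianday differences replace datediff.
query = """
SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank)
FROM rankings_t R
JOIN (
  SELECT sourceIP, destURL, adRevenue
  FROM uservisits_t UV
  WHERE julianday(UV.visitDate) - julianday('1999-01-01') >= 0
    AND julianday(UV.visitDate) - julianday('2000-01-01') <= 0
) NUV
ON R.pageURL = NUV.destURL
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 1
"""
top = conn.execute(query).fetchone()
print(top)
```

The 2001 visit falls outside the date window and is filtered out before the join, so the source IP with the largest in-window ad revenue is returned together with the average page rank of the pages it visited.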
  8. Benchmark Result (Hive): average job latency [sec]
     - RCFile, Snappy: 197.894
     - RCFile, Gzip: 234.289
     - SequenceFile, Snappy: 213.616
     - SequenceFile, Gzip: 227.883
     - TextFile, no compression: 235.843
  9. Benchmark Result (Impala): average job latency [sec]
     - RCFile, Snappy: 16.059
     - RCFile, Gzip: 17.030
     - SequenceFile, Snappy: 17.725
     - SequenceFile, Gzip: 21.250
     - TextFile, no compression: 32.776
  10. Block Location Cache effect? Impala job latency [sec] per run:

      | Job | TextFile, No Comp. | SequenceFile, Gzip | SequenceFile, Snappy | RCFile, Gzip | RCFile, Snappy |
      |---|---|---|---|---|---|
      | 1st | 50.256 | 23.692 | 22.085 | 18.475 | 20.042 |
      | 2nd | 34.905 | 20.710 | 19.733 | 16.690 | 18.859 |
      | 3rd | 30.752 | 20.604 | 15.608 | 16.620 | 16.642 |
      | 4th | 26.848 | 20.625 | 15.602 | 16.617 | 12.148 |
      | 5th | 21.121 | 20.620 | 15.597 | 16.747 | 12.606 |
      | Average | 32.776 | 21.250 | 17.725 | 17.030 | 16.059 |

      - The 1st job is always the slowest, and the fastest job is one of the later ones. Is this due to the block location cache?
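The per-run figures on slide 10 can be checked against the reported averages with a few lines of Python. The column-to-format mapping used as dictionary keys is an assumption read off the slide's flattened header (TextFile uncompressed, then Gzip and Snappy for SequenceFile and for RCFile); the numbers themselves are copied from the slide.

```python
# Impala job latency in seconds for runs 1..5, copied from slide 10.
runs = {
    ("TextFile", "No Comp."): [50.256, 34.905, 30.752, 26.848, 21.121],
    ("SequenceFile", "Gzip"): [23.692, 20.710, 20.604, 20.625, 20.620],
    ("SequenceFile", "Snappy"): [22.085, 19.733, 15.608, 15.602, 15.597],
    ("RCFile", "Gzip"): [18.475, 16.690, 16.620, 16.617, 16.747],
    ("RCFile", "Snappy"): [20.042, 18.859, 16.642, 12.148, 12.606],
}

# Mean of the five runs, rounded to the slide's three decimals.
averages = {key: round(sum(vals) / len(vals), 3) for key, vals in runs.items()}

# True if the first run is the slowest one in every configuration.
first_is_slowest = all(vals[0] == max(vals) for vals in runs.values())

# 1-based index of the fastest run per configuration.
fastest_run = {key: vals.index(min(vals)) + 1 for key, vals in runs.items()}

for key in runs:
    print(key, averages[key], "fastest run:", fastest_run[key])
print("first run always slowest:", first_is_slowest)
```

A warm-up effect of this kind is one reason to report per-run samples as well as the mean: the first run pulls each average up.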
  11. Conclusion
      - Impala is over 10 times faster than MRv1 + Hive (about 12 times in the case below).
      - Specifically, for RCFile compressed with Snappy:
        - Impala 0.6 beta: 16.059 sec
        - MRv1 + Hive 0.10: 197.894 sec
      - We hope the Impala GA release included in CDH5 will be faster still, with:
        - Support for the Trevni columnar format
        - An optimized query planner
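The speedup behind the conclusion is simply the Hive latency divided by the Impala latency for the same configuration. The sketch below computes it for each bar of the two result charts; the pairing of values with storage format and compression is an assumption reconstructed from the chart labels, with only the RCFile + Snappy pair confirmed by the conclusion slide.

```python
# Average job latency in seconds, from the Hive and Impala result slides.
hive = {
    ("RCFile", "Snappy"): 197.894,
    ("RCFile", "Gzip"): 234.289,
    ("SequenceFile", "Snappy"): 213.616,
    ("SequenceFile", "Gzip"): 227.883,
    ("TextFile", "No Comp."): 235.843,
}
impala = {
    ("RCFile", "Snappy"): 16.059,
    ("RCFile", "Gzip"): 17.030,
    ("SequenceFile", "Snappy"): 17.725,
    ("SequenceFile", "Gzip"): 21.250,
    ("TextFile", "No Comp."): 32.776,
}

# Speedup = Hive latency / Impala latency for the same format and codec.
speedup = {key: hive[key] / impala[key] for key in hive}

for key, ratio in sorted(speedup.items(), key=lambda item: -item[1]):
    print(f"{key[0]:>12} {key[1]:<9} {ratio:5.1f}x")
```

Under this pairing the ratio is not uniform across configurations. The uncompressed TextFile case comes out lower than the compressed SequenceFile and RCFile cases, so the "over 10 times" figure is best read as describing the compressed formats.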
  12. Thanks.
