Performance Evaluation of Cloudera Impala GA
 

Comments
  • Thanks for your comment.
    I captured vmstat output, so I'm posting it below.
    This is from a node that the coordinator does not run on.

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    2 0 235704 2707672 8252 688024 2 3 38 40 2 2 3 0 96 1 0
    0 0 235704 2707672 8260 688028 0 0 0 34 222 356 0 0 99 0 0
    0 0 235704 2707672 8260 688028 0 0 0 0 228 345 0 0 100 0 0
    0 0 235704 2706800 8260 688028 0 0 0 0 237 576 1 0 99 0 0
    0 0 235704 2706800 8268 688028 0 0 0 10 213 354 0 0 99 0 0
    0 0 235704 2706800 8268 688028 0 0 0 0 211 342 0 0 100 0 0
    0 0 235704 2706800 8268 688028 0 0 0 0 235 355 0 0 100 0 0
    13 0 235704 2553448 8276 688760 0 0 368 60 9266 13693 39 18 42 1 0
    0 0 235704 2336192 8276 688760 0 0 0 54 15222 24274 36 29 35 0 0
    2 0 235704 2167308 8276 688760 0 0 0 2 13604 21715 37 25 38 0 0
    9 0 235704 1946424 8284 688756 0 0 0 8 15724 25903 32 28 39 0 0
    4 0 235704 1906328 8284 688760 0 0 0 10 10295 16678 40 17 43 0 0
    2 0 235704 1588900 8284 688760 0 0 0 0 15357 25161 32 30 38 0 0
    2 0 235704 1550012 8292 688752 0 0 0 8 15750 27304 31 27 42 0 0
    2 0 235704 1253764 8292 688760 0 0 20 0 13792 21643 34 29 36 0 0
    1 0 235704 956628 8292 688804 0 0 0 0 4265 4563 44 6 50 0 0
    2 0 235704 930324 8300 688800 0 0 0 18 3051 3387 51 5 44 1 0
    0 0 235704 927192 8300 688804 0 0 0 2 2740 2930 68 2 30 0 0
    0 0 235704 926436 8300 688804 0 0 0 16 1081 1365 3 1 97 0 0
    0 0 235704 924352 8308 688804 0 0 0 20 899 1118 4 1 94 1 0
    0 0 235704 924368 8308 688804 0 0 0 0 211 348 0 0 100 0 0
  • Thanks. With only 4 GB of memory, what would 'vmstat 2 100 >>fred.txt &' look like on a typical box if you started it 10 seconds before you launch the query?
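The vmstat trace in the first comment is easier to compare against the second comment's `vmstat 2 100 >>fred.txt &` suggestion once each sample is split into named columns. The following is a minimal Python sketch and is not part of the original deck. `parse_vmstat` and `busy_samples` are hypothetical helper names. The 80 % idle threshold is an arbitrary assumption, and the parser assumes vmstat's default (non-wide) layout.

```python
from statistics import mean

# A few rows copied from the vmstat trace in the first comment (idle, then busy).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 235704 2707672 8260 688028 0 0 0 0 228 345 0 0 100 0 0
13 0 235704 2553448 8276 688760 0 0 368 60 9266 13693 39 18 42 1 0
0 0 235704 2336192 8276 688760 0 0 0 54 15222 24274 36 29 35 0 0
"""


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Split default-layout vmstat output into one {column: value} dict per sample."""
    columns: list[str] = []
    samples: list[dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # skip blank lines and the group-title line
        if fields[0] == "r":
            columns = fields  # column-name line (vmstat may repeat it)
            continue
        if len(fields) == len(columns) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def busy_samples(samples: list[dict[str, int]], max_idle: int = 80) -> list[dict[str, int]]:
    """Keep samples whose CPU idle percentage (the 'id' column) is at most max_idle."""
    return [s for s in samples if s["id"] <= max_idle]


if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    busy = busy_samples(rows)
    print(f"{len(busy)} of {len(rows)} samples busy")
    print("mean us+sy while busy:", mean(s["us"] + s["sy"] for s in busy))
    print("lowest free memory (KiB):", min(s["free"] for s in rows))
```

On this excerpt, the two rows below the threshold average 61 % user plus system CPU. Feeding in the whole `fred.txt` file would give the same kind of summary for a full query run.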

    Presentation Transcript

    • Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/
      Performance Evaluation of Cloudera Impala 1.0
      May 1, 2013
      CELLANT Corp. R&D Strategy Division
      Yukinori SUDA (@sudabon)
    • Cloudera Impala GA was released!!
      – Support for a subset of ANSI-92 SQL
      – CREATE, ALTER, SELECT, INSERT, JOIN, and subqueries
      – Support for partitioned joins, fully distributed aggregations, and fully distributed top-n queries
      – Support for a variety of data formats:
        – Hadoop native (Apache Avro, SequenceFile, RCFile with Snappy, GZIP, BZIP, or uncompressed)
        – text (uncompressed or LZO-compressed)
        – Parquet (Snappy or uncompressed)
      – Support for all CDH4 64-bit packages:
        – RHEL 6.2/5.7, Ubuntu, Debian, SLES
      – Connectivity via JDBC, ODBC, Hue GUI, or command-line shell
      – Kerberos authentication and MR/Impala resource isolation
      – etc.
    • Our System Environment
      – Installed with Cloudera Manager Free Edition 4.5.2
      – Master (3 servers): Active NameNode, Stand-by NameNode, JobTracker, statestored
      – Slave (11 servers): DataNode, TaskTracker, and Impalad on each server
      – All servers are connected with 1 Gbps Ethernet through an L2 switch
    • Our "wimpy" Server Specification
      – CPU: Intel Core 2 Duo 2.13 GHz with Hyper-Threading
      – Memory: 4 GB
      – Disk: 7,200 rpm SATA mechanical hard disk drive × 1
      – OS: CentOS 6.2
    • Benchmark
      – Use CDH 4.2.1 + Impala version 1.0
      – Use hivebench from the open-source benchmark tool "HiBench"
        – https://github.com/hibench
      – Scaled the datasets down to 1/10
        – The default configuration generates a table with 1 billion rows
      – Modified the query
        – Deleted "INSERT INTO TABLE …" to evaluate read-only performance
      – Combine a few storage formats with a few compression methods
        – TextFile, SequenceFile, RCFile, ParquetFile
        – No compression, Gzip, Snappy
      – Compare job query latency
      – Average job latency over 5 measurements
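The slide reports each result as the average over 5 runs. The following is a minimal Python sketch of such a measurement loop; it is not the author's actual harness. It assumes latency is measured as client-side wall-clock time. `average_latency` and `impala_shell_runner` are hypothetical names. The runner calls `impala-shell` with its `-i` (impalad host:port) and `-q` (query) options, and `impalad-host:21000` in the usage comment is a placeholder.

```python
import statistics
import subprocess
import time
from typing import Callable


def average_latency(run_query: Callable[[], None], repeats: int = 5,
                    clock: Callable[[], float] = time.perf_counter) -> float:
    """Run `run_query` `repeats` times; return the mean wall-clock latency in seconds."""
    samples = []
    for _ in range(repeats):
        start = clock()
        run_query()
        samples.append(clock() - start)
    return statistics.mean(samples)


def impala_shell_runner(host: str, sql: str) -> Callable[[], None]:
    """Hypothetical helper: a callable that executes `sql` once via impala-shell."""
    cmd = ["impala-shell", "-i", host, "-q", sql]

    def run() -> None:
        # check=True turns a failed query into an exception instead of a fast "result".
        subprocess.run(cmd, check=True, capture_output=True)

    return run


# Usage (placeholder host; needs impala-shell on PATH and a running impalad):
#   latency = average_latency(impala_shell_runner("impalad-host:21000", QUERY))
```

The clock is injectable so the loop can be checked without a cluster. In a real run, the first execution may also pay for metadata loading or a cold cache. Whether the original measurements discarded such a warm-up run is not stated.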
    • Modified Datasets
      – Uservisits table
        – 100 million rows
        – 16,895 MB as TextFile
        – Table definitions:
            sourceIP      string
            destURL       string
            visitDate     string
            adRevenue     double
            userAgent     string
            countryCode   string
            languageCode  string
            searchWord    string
            duration      int
      – Rankings table
        – 12 million rows
        – 744 MB as TextFile
        – Table definitions:
            pageURL       string
            pageRank      int
            avgDuration   int
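The row counts and TextFile sizes on the datasets slide give a rough sense of how wide and how large each table is. This back-of-the-envelope Python sketch rests on three assumptions. The slide's "MB" is taken as 2^20 bytes. The 1-billion-row default is taken to refer to uservisits. The constant names are our own.

```python
MIB = 1024 * 1024  # assumption: the slide's "MB" means 2**20 bytes

# (row count, TextFile size in MB) as listed on the "Modified Datasets" slide.
TABLES = {
    "uservisits": (100_000_000, 16_895),
    "rankings": (12_000_000, 744),
}

DEFAULT_USERVISITS_ROWS = 1_000_000_000  # assumed to be the HiBench default for uservisits
SCALE = 10  # datasets were cut to 1/10


def bytes_per_row(rows: int, size_mb: float, unit: int = MIB) -> float:
    """Average on-disk bytes per row of a TextFile table."""
    return size_mb * unit / rows


if __name__ == "__main__":
    for name, (rows, size_mb) in TABLES.items():
        print(f"{name}: {bytes_per_row(rows, size_mb):.1f} bytes/row")
    print(f"size ratio: {TABLES['uservisits'][1] / TABLES['rankings'][1]:.1f}x")
```

Under these assumptions, uservisits rows average about 177 bytes and rankings rows about 65 bytes. The uservisits table is roughly 22.7 times larger on disk.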
    • Modified Query
        SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank)
        FROM rankings_t R
        JOIN (
          SELECT sourceIP, destURL, adRevenue
          FROM uservisits_t UV
          WHERE (datediff(UV.visitDate, '1999-01-01') >= 0
             AND datediff(UV.visitDate, '2000-01-01') <= 0)
        ) NUV
        ON (R.pageURL = NUV.destURL)
        GROUP BY sourceIP
        ORDER BY totalRevenue DESC
        LIMIT 1;
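To make the modified query's semantics concrete, the following is a small pure-Python model of what it computes on in-memory lists. It shows the meaning of the SQL only, not how Impala plans or executes it. The function name and the tuple layouts are hypothetical, and `visitDate` is assumed to be already parsed into a `datetime.date`. Note that the two `datediff` conditions together select an inclusive range, 1999-01-01 through 2000-01-01.

```python
from collections import defaultdict
from datetime import date


def top_revenue_source(uservisits, rankings,
                       start=date(1999, 1, 1), end=date(2000, 1, 1)):
    """In-memory model of the benchmark query (semantics only, not Impala's plan).

    uservisits: iterable of (sourceIP, destURL, visitDate as datetime.date, adRevenue)
    rankings:   iterable of (pageURL, pageRank)
    Returns (sourceIP, totalRevenue, avg pageRank) for the top source, or None.
    """
    # WHERE datediff(visitDate, start) >= 0 AND datediff(visitDate, end) <= 0,
    # i.e. start <= visitDate <= end with both ends inclusive.
    nuv = [(ip, url, rev) for ip, url, day, rev in uservisits if start <= day <= end]

    # JOIN ... ON (R.pageURL = NUV.destURL): index rankings by URL, probe with NUV.
    ranks_by_url = defaultdict(list)
    for url, rank in rankings:
        ranks_by_url[url].append(rank)

    # GROUP BY sourceIP with sum(adRevenue) and avg(pageRank) over joined rows.
    revenue = defaultdict(float)
    rank_sum = defaultdict(int)
    rank_cnt = defaultdict(int)
    for ip, url, rev in nuv:
        for rank in ranks_by_url.get(url, []):  # inner join: unmatched visits drop out
            revenue[ip] += rev
            rank_sum[ip] += rank
            rank_cnt[ip] += 1

    if not revenue:
        return None
    # ORDER BY totalRevenue DESC LIMIT 1 (ties are broken arbitrarily, as in SQL).
    best = max(revenue, key=revenue.get)
    return best, revenue[best], rank_sum[best] / rank_cnt[best]
```

Because `avg(pageRank)` is taken over joined rows, a source that visits the same page twice counts that page's rank twice.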
    • Benchmark Result (Hive)
      Cited from "Performance evaluation of Cloudera impala 0.6 beta..."
        Storage format   Compression   Avg. job latency [sec]
        TextFile         No comp.      235.843
        SequenceFile     Gzip          227.883
        SequenceFile     Snappy        213.616
        RCFile           Gzip          234.289
        RCFile           Snappy        197.894
    • Benchmark Result (Impala)
        Storage format   Compression   Avg. job latency [sec]
        TextFile         No comp.      36.61
        SequenceFile     Gzip          29.736
        SequenceFile     Snappy        24.024
        RCFile           Gzip          26.083
        RCFile           Snappy        19.586
        ParquetFile      Snappy        16.2
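The two result slides can be compared by dividing each Hive latency by the Impala latency for the same format and codec. This Python snippet is only a rough cross-check. The Hive figures are cited from the earlier 0.6 beta evaluation, so the snippet assumes both runs used comparable clusters and data. The dictionary keys are our own labels.

```python
# Average job latency in seconds, copied from the two result slides.
HIVE = {
    ("TextFile", "none"): 235.843,
    ("SequenceFile", "gzip"): 227.883,
    ("SequenceFile", "snappy"): 213.616,
    ("RCFile", "gzip"): 234.289,
    ("RCFile", "snappy"): 197.894,
}
IMPALA = {
    ("TextFile", "none"): 36.61,
    ("SequenceFile", "gzip"): 29.736,
    ("SequenceFile", "snappy"): 24.024,
    ("RCFile", "gzip"): 26.083,
    ("RCFile", "snappy"): 19.586,
    ("ParquetFile", "snappy"): 16.2,
}


def speedups(hive: dict, impala: dict) -> dict:
    """Hive latency divided by Impala latency for every format+codec measured on both."""
    return {key: hive[key] / impala[key] for key in hive if key in impala}


if __name__ == "__main__":
    for (fmt, codec), ratio in speedups(HIVE, IMPALA).items():
        print(f"{fmt:<12} {codec:<6} {ratio:5.2f}x")
    fastest = min(IMPALA, key=IMPALA.get)
    print("fastest on Impala:", fastest, IMPALA[fastest], "sec")
```

The ratios range from about 6.4x (uncompressed TextFile) to about 10.1x (RCFile with Snappy). ParquetFile has no Hive counterpart in the cited figures, so it is excluded from the ratios.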
    • Additional Experiments
      – Swap the order of the JOINed tables as below:
        SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank)
        FROM (
          SELECT sourceIP, destURL, adRevenue
          FROM uservisits_ps UV
          WHERE (datediff(UV.visitDate, '1999-01-01') >= 0
             AND datediff(UV.visitDate, '2000-01-01') <= 0)
        ) NUV
        JOIN rankings_ps R
        ON (R.pageURL = NUV.destURL)
        GROUP BY sourceIP
        ORDER BY totalRevenue DESC
        LIMIT 1;
      – Result
        – ParquetFile compressed with Snappy: 34.374 sec (vs. 16.2 sec with the original join order)
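The slides do not say why swapping the tables matters. Our understanding, which is not stated in the deck, is that Impala 1.0 did not reorder joins. It ran them in the written order and loaded the right-hand input of each join into an in-memory hash table, which a broadcast join also sends to every node. The Python sketch below illustrates that assumption with a hypothetical `hash_join` that always builds on its right input and reports how many rows it had to load. Whether this explains the 16.2 s vs. 34.374 s gap depends on how many uservisits rows survive the date filter, which the slides do not report.

```python
from typing import Callable, Hashable, Iterable, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def hash_join(left: Iterable[L], right: Iterable[R],
              left_key: Callable[[L], Hashable],
              right_key: Callable[[R], Hashable]) -> tuple[list[tuple[L, R]], int]:
    """Inner hash join that always builds on the RIGHT input and probes with LEFT.

    Returns the joined pairs plus the number of rows loaded into the hash table,
    i.e. the cost that a fixed-order plan leaves to whoever writes the query.
    """
    table: dict[Hashable, list[R]] = {}
    build_rows = 0
    for rrow in right:
        table.setdefault(right_key(rrow), []).append(rrow)
        build_rows += 1
    joined = [(lrow, rrow) for lrow in left for rrow in table.get(left_key(lrow), [])]
    return joined, build_rows


if __name__ == "__main__":
    rankings = [("a", 4), ("b", 2), ("c", 7)]  # (pageURL, pageRank)
    nuv = [("1.1.1.1", "a", 10.0)]             # (sourceIP, destURL, adRevenue)
    # Original order, "rankings R JOIN (...) NUV": NUV is the build side.
    _, built = hash_join(rankings, nuv, lambda r: r[0], lambda v: v[1])
    print("original order builds", built, "rows")  # 1
    # Swapped order, "(...) NUV JOIN rankings R": rankings is the build side.
    _, built = hash_join(nuv, rankings, lambda v: v[1], lambda r: r[0])
    print("swapped order builds", built, "rows")   # 3
```

Both orders return the same matches. Only the amount of data held in memory, and shipped between nodes, changes.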
    • Conclusion
      – Parquet + Snappy is the fastest
        – Specifically, ParquetFile compressed with Snappy: 16.2 sec
      – Take care with the order of JOINed tables
      – Hoping for future extensions
        – UDF support
        – Window functions
        – etc.
    • Let's try it out in your environment!!
      Thanks!