Presentations from the Cloudera Impala meetup on Aug 20 2013


Presentations from the Cloudera Impala meetup on Aug 20 2013:

- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala


1. Parquet Update/UDFs in Impala (Nong Li, Software Engineer, Cloudera)
2. Agenda
   • Parquet
     • File format description
     • Benchmark results in Impala
     • Parquet 2.0
   • UDFs/UDAs
3. Parquet
4–7. (Parquet file format diagrams: image-only slides; no text extracted)
8. Data Pages
   • Values are stored in data pages as a triple: definition level, repetition level, and value.
   • These are stored contiguously on disk => 1 seek to read a column regardless of nesting.
   • Data pages are stored with different encodings:
     • Bit packing and run-length encoding (RLE)
     • Dictionary for strings
       • Extended to all types in Parquet 1.1
     • Plain (little-endian encoding) for native types
9. Parquet 2.0
   • Additional encodings
     • Group varint (for small ints)
     • Improved string storage format
     • Delta encoding (for strings and ints)
   • Additional metadata
     • Sorted files
     • Page/column/file statistics
   • Expected to further reduce on-disk size and allow skipping values on the read path
10. Hardware Setup
   • 10 nodes
     • 16-core Xeon
     • 48 GB RAM
     • 12 disks
   • CDH 4.3
   • Impala 1.1
11. TPC-H lineitem table @ 1 TB scale factor (bar chart: on-disk size in GB, y-axis 0–800, for Text, Text w/ LZO, Seq w/ Snappy, Avro w/ Snappy, RCFile w/ Snappy, Parquet w/ Snappy, and Seq w/ Gzip)
12. Query times on the TPC-H lineitem table (bar chart: times for reading 1, 3, 5, and all 16 columns, 5 columns with 3 clients, and TPC-H Q1 (7 columns), plus bytes read for Q1 in GB, comparing Text, Seq w/ Snappy, Avro w/ Snappy, RCFile w/ Snappy, and Parquet w/ Snappy)
13. Query times on TPC-DS queries (bar chart: seconds for Q27, Q34, Q42, Q43, Q46, Q52, Q55, Q59, Q65, Q73, Q79, and Q96, comparing Text, Seq w/ Snappy, RC w/ Snappy, and Parquet w/ Snappy)
   Average times (geometric mean):
   • Text: 224 seconds
   • Seq Snappy: 257 seconds
   • RC Snappy: 150 seconds
   • Parquet: 61 seconds
14. Agenda
   • Parquet
     • File format description
     • Benchmark results in Impala
     • What's next
   • UDFs/UDAs (work in progress)
15. Terminology
   • UDF: tuple -> scalar; user-defined function
     • E.g. substring
   • UDA/UDAF: {tuple} -> scalar; user-defined aggregate function
     • E.g. min
   • UDTF: {tuple} -> {tuple}; user-defined table function
16. Impala 1.2
   • Support for Hive UDFs (Java)
     • Existing Hive jars will run without a recompile
   • Adds Impala (native) UDFs and UDAs
     • New interface designed to execute as efficiently as possible in Impala
     • Similar interface to Postgres UDFs/UDAs
   • UDFs/UDAs are registered for the Impala service in the metadata catalog
     • i.e. CREATE FUNCTION / CREATE AGGREGATE
17. Example UDF

    // This UDF adds two ints and returns an int.
    // NULL inputs propagate: if either argument is NULL, the result is NULL.
    IntVal AddUdf(UdfContext* context,
                  const IntVal& arg1,
                  const IntVal& arg2) {
      if (arg1.is_null || arg2.is_null) return IntVal::null();
      return IntVal(arg1.val + arg2.val);
    }
18. DDL
   The CREATE statement specifies the UDF/UDA signature, the location of the binary, and the symbol for the UDF function.

    CREATE FUNCTION substring(string, int, int)
    RETURNS string LOCATION "hdfs://path"
    "com.me.Substring"

    CREATE FUNCTION log(anytype) RETURNS anytype
    LOCATION "hdfs://path2" "Log"
19. UDFs
   • Support for variadic arguments
   • Support for polymorphic types
20. UDAs
   • A UDA must implement the typical state machine (see the example on the next slide):
     • Init()
     • Update()
     • Serialize()
     • Merge()
     • Finalize()
   • Data movement is handled by Impala
21. UDA Example

    // This is a sample implementation of the COUNT aggregate function.
    void Init(UdfContext* context, BigIntVal* val) {
      val->is_null = false;
      val->val = 0;
    }

    void Update(UdfContext* context, const AnyVal& input, BigIntVal* val) {
      if (input.is_null) return;
      ++val->val;
    }

    void Merge(UdfContext* context, const BigIntVal& src, BigIntVal* dst) {
      dst->val += src.val;
    }

    // Serialize() is omitted here: a fixed-size intermediate value like
    // BigIntVal needs no extra serialization before being shipped between nodes.
    BigIntVal Finalize(UdfContext* context, const BigIntVal& val) {
      return val;
    }
22. Runtime Code Generation
   • Impala uses LLVM to generate code for the query at runtime
     • Takes into account constants that are only known after query analysis
     • Greatly improves CPU efficiency
   • Native UDFs/UDAs can benefit from this as well
     • Instead of providing the UDF/UDA as a shared object, compile it (with Clang) with an additional flag to LLVM IR
     • The IR will be integrated with the query execution
     • No function-call overhead for UDFs/UDAs
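   As a hypothetical sketch of what this could look like, reusing the CREATE FUNCTION syntax from slide 18: LOCATION points at the LLVM IR module emitted by Clang rather than a shared object (the .ll path and the add_udf/AddUdf names are assumptions for illustration, not confirmed by the deck):

    -- Register the IR-compiled version of the AddUdf example from slide 17
    -- (hypothetical: assumes LOCATION accepts an IR module this way).
    CREATE FUNCTION add_udf(int, int) RETURNS int
    LOCATION "hdfs://path/add_udf.ll" "AddUdf";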
23. Limitations
   • Hive UDAs/UDTFs are not supported
   • No UDTFs in the native interface
   • Can't run out of process
     • The native interface is designed to support this; it will be able to run without a recompile
   • We're planning to address these in Impala 1.3
24. Thanks!
   • We'd love your feedback on UDFs/UDAs
   • Questions?
25. Performance Considerations for Cloudera Impala (Henry Robinson, henry@cloudera.com / @henryr; Impala Meetup, 2013-08-20)
26. Agenda
   ● The basics: performance checklist
   ● Review: how does Impala execute queries?
   ● What makes queries fast (or slow)?
   ● How can I debug my queries?
27. Impala Performance Checklist
   ● Verify: run a simple COUNT(*) query on a relatively big table and check:
     ○ Data locality, block locality, and NO checksumming (see "Testing Impala Performance")
     ○ Optimal I/O throughput for HDFS scans (typically ~100 MB/s per disk)
   ● Stats: gather BOTH table and column stats (see the sketch after this slide), especially for:
     ○ Joins of two large tables
     ○ INSERT INTO ... SELECT statements through Impala
   ● Join table ordering: will be automatic in the Impala 2.0 wave. Until then:
     ○ Largest table first
     ○ Then most selective to least selective
   ● Monitor: watch Impala queries to pinpoint slow ones and drill into potential issues
     ○ CM 4.6 adds query monitoring
     ○ CM 5.0 will have the next big enhancements
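   A minimal sketch of the verify and stats steps, assuming an illustrative "lineitem" table. At the time of this deck, statistics were typically gathered through Hive's ANALYZE TABLE; the single COMPUTE STATS statement shown last arrived in later Impala releases:

    -- Verify: a full-table scan whose profile should show good data/block
    -- locality and close to raw-disk scan throughput.
    SELECT COUNT(*) FROM lineitem;

    -- Gather table and column stats via Hive (the workflow current at the time):
    ANALYZE TABLE lineitem COMPUTE STATISTICS;
    ANALYZE TABLE lineitem COMPUTE STATISTICS FOR COLUMNS l_orderkey, l_quantity;

    -- Later Impala releases collapse both into a single statement:
    COMPUTE STATS lineitem;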
28. Part 1: How does Impala execute queries?
29. The basic idea
   ● Every Impala query runs across a cluster of multiple nodes, with lots of available CPU cores, memory, and disk
   ● The best query speeds usually come when every node in the cluster has something to do
   ● Impala solves two basic problems:
     ○ Figure out what every node should do (compilation)
     ○ Make them do it really quickly! (execution)
30. Query compilation
   ● a.k.a. 'figuring out what every node should do'
   ● Impala compiles a SQL query into a plan describing what to execute, and where
   ● A plan is shaped like a tree; data flows up from the leaves of the tree to the root
   ● Each node in the tree is a query operator
   ● Impala chops this tree up into plan fragments
   ● Each node gets one or more plan fragments (see the EXPLAIN sketch below)
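   You can inspect the compiled plan yourself with EXPLAIN; a minimal sketch, assuming illustrative TPC-H-style table and column names:

    -- Print the plan tree (operators and plan fragments) without executing.
    EXPLAIN
    SELECT c.c_name, SUM(o.o_totalprice)
    FROM orders o JOIN customer c ON o.o_custkey = c.c_custkey
    GROUP BY c.c_name;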
31. Query execution
   ● Once started, each query operator can run independently of any other operator
   ● Every operator can be doing something at the same time
   ● This is the not-so-secret sauce behind all massively parallel query execution engines
32. Part 2: What makes queries fast (or... slow)?
33. What determines performance?
   ● Data size
   ● Per-operator execution efficiency
   ● Available parallelism
   ● Available concurrency
   ● Hardware
   ● Schema design and file format
34. Data size
   ● More data means more work
   ● Not just the size of the disk-based data at plan leaves, but the size of internal data flowing into any operator
   ● How can you help? (see the sketch after this slide)
     ○ Partition your data
     ○ SELECT with LIMIT in subqueries
     ○ Push predicates down
     ○ Use the correct JOIN order
       ■ Gather table statistics
     ○ Use the right file format
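   A sketch of shrinking intermediate data, with illustrative table and column names: the predicate and LIMIT are applied inside the subquery, so far fewer rows flow into the join.

    SELECT t.item_id, i.item_name, t.total
    FROM (
      SELECT item_id, SUM(amount) AS total
      FROM sales
      WHERE sale_date >= '2013-01-01'   -- predicate applied at the scan
      GROUP BY item_id
      ORDER BY total DESC
      LIMIT 100                         -- only 100 rows reach the join
    ) t
    JOIN items i ON t.item_id = i.item_id;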
35. Table ordering
   ● Tables are joined in the order listed in the FROM clause
   ● Impala uses left-deep trees for nested joins
   ● The "largest" table should be listed first
     ○ largest = returning the most rows before join filtering
     ○ In a star schema, this is often the fact table
   ● Then list tables in order of most selective join filter to least selective
     ○ Filter out the most rows as early as possible (see the example below)
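   For example, in a star schema (illustrative tables): the fact table goes first, then the dimension with the most selective filter, then the rest.

    SELECT s.sale_id, st.store_name, d.cal_date
    FROM sales s                                   -- largest (fact) table first
    JOIN stores st ON s.store_id = st.store_id     -- most selective filter
    JOIN dates  d  ON s.date_id  = d.date_id       -- least selective
    WHERE st.region = 'CA'
      AND d.year = 2013;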
36. Join types
   ● Two join strategies are supported:
     ○ Broadcast
     ○ Shuffle/partitioned
   ● Broadcast
     ○ Each node receives a full copy of the right table
     ○ Per-node memory usage = size of the right table
   ● Shuffle
     ○ Both sides of the join are partitioned
     ○ Matching partitions are sent to the same node
     ○ Per-node memory usage = 1/nodes x size of the right table
   ● Without column statistics, all joins are broadcast (the hints below can override the choice)
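   When statistics are missing or misleading, the strategy can be pinned with Impala's join hints; a sketch with illustrative table names:

    -- Force a partitioned join: no node holds a full copy of big_b.
    SELECT a.id, b.val
    FROM big_a a JOIN [SHUFFLE] big_b b ON a.id = b.id;

    -- Force a broadcast join when the right table is known to be small.
    SELECT a.id, d.name
    FROM big_a a JOIN [BROADCAST] small_dim d ON a.dim_id = d.id;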
37. Per-operator execution efficiency
   ● Impala is fast, and getting faster
     ○ LLVM-based improvements
     ○ More efficient disk scanners
     ○ More modern algorithms from the DB literature
   ● How can you help?
     ○ Upgrade to the latest version
38. Available parallelism
   ● Parallelism: the number of resources available to use at once
   ● More hardware means more parallelism
   ● Impala will take advantage of more cores, disks, and memory where possible
   ● The easiest (but most expensive!) way to improve performance for a large class of queries
   ● You can scale up incrementally
39. Available concurrency
   ● Concurrency: how well can a query take advantage of available parallelism?
   ● Impala mostly takes care of this for you
   ● But some operators naturally don't parallelise well in certain conditions
   ● For example: joining two huge tables together
     ○ The hash-join operators have to wait for one side to be read completely before reading much of the other side
   ● How you can help:
     ○ Read the profiles, look for obvious bottlenecks, and rephrase the query if possible
40. Hardware
   ● Impala is designed for modern hardware
     ○ Leverages SSE 4.2 (Intel Nehalem or newer)
     ○ LLVM compiler infrastructure
     ○ Runtime code generation
     ○ In-memory execution pipelines
   ● Today's hardware:
     ○ 2 x Xeon E5 6-core CPUs
     ○ 12 x 3 TB HDD
     ○ 128 GB RAM
   ● How you can help:
     ○ Use the supported platforms, with Cloudera's packages
41. Schema design
   ● PARTITION BY is an easy win
   ● In general, string is slower than fixed-width types (particularly for aggregations etc.)
   ● File formats are crucial (see the sketch below)
     ○ Experiment with Parquet for performance
     ○ Avoid text
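   A minimal sketch combining both recommendations, assuming illustrative table and column names (in Impala 1.x the Parquet storage clause is spelled PARQUETFILE):

    -- Partition on a low-cardinality column that queries filter on,
    -- and store the data in a columnar format instead of text.
    CREATE TABLE events (
      event_id BIGINT,
      user_id  BIGINT,
      payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUETFILE;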
42. Supported file formats
   ● Various HDFS file formats:
     ○ Text file (read/write)
     ○ Avro (read)
     ○ SequenceFile (read)
     ○ RCFile (read)
     ○ ParquetFile (read/write)
   ● Various compression codecs:
     ○ Snappy (ParquetFile, RCFile, SequenceFile, Avro)
     ○ LZO (text)
     ○ Bzip (ParquetFile, RCFile, SequenceFile, Avro)
     ○ Gzip (ParquetFile, RCFile, SequenceFile, Avro)
   ● HBase is also supported
43. Partitioning considerations
   ● The single largest performance feature
     ○ Skips unnecessary data
     ○ Requires that queries contain partition keys as filters (see the example below)
   ● Choose a reasonable number of partitions
     ○ Lots of small files becomes an issue
     ○ Metadata overhead on the NameNode
     ○ Metadata overhead for the Hive Metastore
     ○ Impala caches this, but the first load may take a long time
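   Partition pruning only applies when the filter mentions the partition key; a sketch against the illustrative events table from slide 41:

    -- Prunes to one partition: only event_date='2013-08-20' is scanned.
    SELECT COUNT(*) FROM events WHERE event_date = '2013-08-20';

    -- No partition-key filter: every partition must be scanned.
    SELECT COUNT(*) FROM events WHERE user_id = 42;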
44. Part 3: Debugging queries
45. The debug pages
   ● Every impalad exports a lot of useful information on http://<impalad>:25000 (by default), including:
     ○ The last 25 queries
     ○ Active sessions
     ○ Known tables
     ○ The last 1 MB of the log
     ○ System metrics
     ○ Query profiles (see the sketch below)
   ● Information-dense; not for the faint of heart!
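   Profiles can also be pulled straight from impala-shell; a sketch, assuming the illustrative lineitem table (PROFILE prints the runtime profile of the most recently run query):

    SELECT COUNT(*) FROM lineitem;   -- run the query of interest
    PROFILE;                         -- then dump its full runtime profile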
46. Thanks! Questions? Try it out!
   ● Apache-licensed open source
     ○ Impala 1.1 released 7/24/2013
     ○ Impala 1.0 GA released 4/30/2013
   ● Questions/comments?
     ○ Download: cloudera.com/impala
     ○ Email: impala-user@cloudera.org
     ○ Join: groups.cloudera.org
     ○ Meetup: meetup.com/Bay-Area-Impala-Users-Group/
