Presentations from the Cloudera Impala meetup on Aug 20 2013
 


- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala

Presentation Transcript

  • Parquet Update / UDFs in Impala
    Nong Li, Software Engineer, Cloudera
  • Agenda
    • Parquet
      • File format description
      • Benchmark results in Impala
    • Parquet 2.0
    • UDF/UDAs
  • Parquet
  • Data Pages
    • Values are stored in data pages as a triple: Definition Level, Repetition Level and Value.
      • These are stored contiguously on disk => 1 seek to read a column regardless of nesting.
    • Data pages are stored with different encodings:
      • Bit packing and Run-Length Encoding (RLE)
      • Dictionary for strings
        • Extended to all types in Parquet 1.1
      • Plain (little-endian encoding) for native types
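To make the encoding bullet concrete, here is a minimal sketch of run-length encoding in the spirit of what the slide describes. This is purely illustrative toy code, not Parquet's actual RLE/bit-packed hybrid format (which packs run lengths and bit widths into a byte stream); each run is simply stored as a (value, count) pair.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Encode a column of values as (value, run_length) pairs.
std::vector<std::pair<int32_t, uint32_t>> RleEncode(const std::vector<int32_t>& in) {
  std::vector<std::pair<int32_t, uint32_t>> runs;
  for (int32_t v : in) {
    if (!runs.empty() && runs.back().first == v) {
      ++runs.back().second;  // extend the current run
    } else {
      runs.push_back({v, 1});  // start a new run
    }
  }
  return runs;
}

// Expand the runs back into the original column.
std::vector<int32_t> RleDecode(const std::vector<std::pair<int32_t, uint32_t>>& runs) {
  std::vector<int32_t> out;
  for (const auto& r : runs) out.insert(out.end(), r.second, r.first);
  return out;
}
```

Repetitive columns (e.g. low-cardinality dictionary codes or definition levels) collapse into a handful of runs, which is why RLE pairs so well with columnar storage.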
  • Parquet 2.0
    • Additional Encodings
      • Group VarInt (for small ints)
      • Improved string storage format
      • Delta Encoding (for strings and ints)
    • Additional Metadata
      • Sorted files
      • Page/Column/File Statistics
    • Expected to further reduce on-disk size and allow for skipping values on the read path
  • Hardware Setup
    • 10 nodes
      • 16-core Xeon
      • 48 GB RAM
      • 12 disks
    • CDH 4.3
    • Impala 1.1
  • TPC-H lineitem table @ 1 TB scale factor
    [Chart: on-disk size in GB (0–800) for Text, Text w/ LZO, Seq w/ Snappy, Avro w/ Snappy, RCFile w/ Snappy, Parquet w/ Snappy, and Seq w/ Gzip]
  • Query Times on TPC-H lineitem table
    [Chart: query times for 1, 3, 5, and all 16 columns, 5 columns with 3 clients, TPC-H Q1 (7 columns), and bytes read for Q1 (GB), comparing Text, Seq w/ Snappy, Avro w/ Snappy, RCFile w/ Snappy, and Parquet w/ Snappy]
  • Query Times on TPC-DS Queries
    [Chart: per-query times in seconds for Q27, Q34, Q42, Q43, Q46, Q52, Q55, Q59, Q65, Q73, Q79, and Q96, comparing Text, Seq w/ Snappy, RC w/ Snappy, and Parquet w/ Snappy]
    Average times (geometric mean):
    • Text: 224 seconds
    • Seq Snappy: 257 seconds
    • RC Snappy: 150 seconds
    • Parquet: 61 seconds
  • Agenda
    • Parquet
      • File format description
      • Benchmark results in Impala
    • What’s Next
    • UDF/UDAs (Work in Progress)
  • Terminology
    • UDF: Tuple -> Scalar
      user-defined function
      • E.g. substring
    • UDA/UDAF: {Tuple} -> Scalar
      user-defined aggregate function
      • E.g. min
    • UDTF: {Tuple} -> {Tuple}
      user-defined table function
  • Impala 1.2
    • Support Hive UDFs (Java)
      • Existing Hive jars will run without a recompile
    • Add Impala (native) UDFs and UDAs
      • New interface designed to execute as efficiently as possible for Impala
      • Similar interface to Postgres UDFs/UDAs
    • UDF/UDA registered for the Impala service in the metadata catalog
      • i.e. CREATE FUNCTION / CREATE AGGREGATE
  • Example UDF

```cpp
// This UDF adds two ints and returns an int.
IntVal AddUdf(UdfContext* context,
              const IntVal& arg1,
              const IntVal& arg2) {
  if (arg1.is_null || arg2.is_null) return IntVal::null();
  return IntVal(arg1.val + arg2.val);
}
```
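For readers who want to exercise the null-propagation convention from the slide, here is a self-contained sketch. Note that UdfContext and IntVal below are simplified stand-ins invented for this example, not the real definitions from Impala's UDF headers; only the AddUdf body is taken from the slide.

```cpp
#include <cstdint>

// Simplified stand-ins for Impala's UDF types (assumption: the real headers
// ship with Impala; only the field names here mirror the slide's example).
struct UdfContext {};  // opaque to the UDF in this sketch

struct IntVal {
  bool is_null = false;
  int32_t val = 0;
  IntVal() = default;
  explicit IntVal(int32_t v) : val(v) {}
  static IntVal null() { IntVal v; v.is_null = true; return v; }
};

// The slide's UDF: NULL if either input is NULL, else the sum.
IntVal AddUdf(UdfContext* context, const IntVal& arg1, const IntVal& arg2) {
  if (arg1.is_null || arg2.is_null) return IntVal::null();
  return IntVal(arg1.val + arg2.val);
}
```

The key point of the interface is that SQL NULL handling is explicit: the engine never calls the UDF body around NULLs for you, so the UDF checks `is_null` itself.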
  • DDL
    The CREATE statement will need to specify the UDF/UDA signature, the location of the binary, and the symbol for the UDF function.

```sql
CREATE FUNCTION substring(string, int, int)
RETURNS string LOCATION "hdfs://path"
"com.me.Substring"

CREATE FUNCTION log(anytype) RETURNS anytype
LOCATION "hdfs://path2" "Log"
```
  • UDFs
    • Support for variadic args
    • Support for polymorphic types
  • UDAs
    • UDA must implement the typical state machine:
      • Init()
      • Update()
      • Serialize()
      • Merge()
      • Finalize()
    • Data movement handled by Impala
  • UDA Example

```cpp
// This is a sample of implementing the COUNT aggregate function.
void Init(UdfContext* context, BigIntVal* val) {
  val->is_null = false;
  val->val = 0;
}

void Update(UdfContext* context, const AnyVal& input, BigIntVal* val) {
  if (input.is_null) return;
  ++val->val;
}

void Merge(UdfContext* context, const BigIntVal& src, BigIntVal* dst) {
  dst->val += src.val;
}

BigIntVal Finalize(UdfContext* context, const BigIntVal& val) {
  return val;
}
```
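To show how "data movement handled by Impala" plays out, here is a toy driver around the slide's COUNT functions: each node builds a local state with Init/Update, then a coordinator Merges the partial states and Finalizes. The types and the DistributedCount driver are simplified stand-ins written for this sketch; only the four UDA functions follow the slide.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for Impala's UDA types (assumption, not the real API).
struct UdfContext {};
struct BigIntVal { bool is_null = true; int64_t val = 0; };
struct AnyVal { bool is_null = false; };

void Init(UdfContext*, BigIntVal* val) { val->is_null = false; val->val = 0; }
void Update(UdfContext*, const AnyVal& input, BigIntVal* val) {
  if (input.is_null) return;  // COUNT ignores NULLs
  ++val->val;
}
void Merge(UdfContext*, const BigIntVal& src, BigIntVal* dst) { dst->val += src.val; }
BigIntVal Finalize(UdfContext*, const BigIntVal& val) { return val; }

// Toy simulation of the distributed flow: one local state per node, merged
// at the coordinator. Impala handles this orchestration for real UDAs.
int64_t DistributedCount(const std::vector<std::vector<AnyVal>>& per_node_rows) {
  UdfContext ctx;
  BigIntVal merged;
  Init(&ctx, &merged);
  for (const auto& rows : per_node_rows) {
    BigIntVal local;
    Init(&ctx, &local);
    for (const AnyVal& r : rows) Update(&ctx, r, &local);
    Merge(&ctx, local, &merged);
  }
  return Finalize(&ctx, merged).val;
}
```

Because Merge only sees the small partial states, the network cost of the aggregate is independent of the number of input rows.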
  • Runtime Code-Generation
    • Impala uses LLVM to generate code at runtime to run the query.
      • Takes into account constants that are only known after query analysis.
      • Greatly improves CPU efficiency.
    • Native UDFs/UDAs can benefit from this as well.
      • Instead of providing the UDF/UDA as a shared object, compile it (with Clang) to LLVM IR with an additional flag.
      • The IR will be integrated with the query execution.
      • No function-call overhead for UDF/UDAs.
  • Limitations
    • Hive UDAs/UDTFs not supported
    • No UDTFs in the native interface
    • Can’t run out of process
      • The native interface is designed to support this, and will be able to run without a recompile
      • We’re planning to address this in Impala 1.3
  • Thanks!
    • We’d love your feedback on UDFs/UDAs
    • Questions?
  • Performance Considerations for Cloudera Impala
    Henry Robinson
    henry@cloudera.com / @henryr
    Impala Meetup 2013-08-20
  • Agenda
    ● The basics: Performance Checklist
    ● Review: How does Impala execute queries?
    ● What makes queries fast (or slow)?
    ● How can I debug my queries?
  • Impala Performance Checklist
    ● Verify – run a simple count(*) query on a relatively big table and verify:
      ○ Data locality, block locality, and NO check-summing (“Testing Impala Performance”)
      ○ Optimal IO throughput of HDFS scans (typically ~100 MB/s per disk)
    ● Stats – BOTH table and column stats, especially for:
      ○ Joining two large tables
      ○ Insert-as-select through Impala
    ● Join table ordering – will be automatic in the Impala 2.0 wave. Until then:
      ○ Largest table first
      ○ Then most selective to least selective
    ● Monitor – monitor Impala queries to pinpoint slow queries and drill into potential issues
      ○ CM 4.6 adds query monitoring
      ○ CM 5.0 will have the next big enhancements
  • Part 1: How does Impala execute queries?
  • The basic idea
    ● Every Impala query runs across a cluster of multiple nodes, with lots of available CPU cores, memory and disk
    ● Best query speeds usually come when every node in the cluster has something to do
    ● Impala solves two basic problems:
      ○ Figure out what every node should do (compilation)
      ○ Make them do it really quickly! (execution)
  • Query compilation
    ● a.k.a. ‘figuring out what every node should do’
    ● Impala compiles a SQL query into a plan describing what to execute, and where
    ● A plan is shaped like a tree. Data flows up from the leaves of the tree to the root.
    ● Each node in the tree is a query operator
    ● Impala chops this tree up into plan fragments
    ● Each node gets one or more plan fragments
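The "tree of query operators" idea can be sketched with a tiny pull-based model, where each operator asks its child for rows, so data flows from the leaf (a scan) up toward the root. This is a generic illustration invented for this transcript, not Impala's actual operator interface (Impala runs batched, pipelined fragments).

```cpp
#include <memory>
#include <vector>

// A query operator: produces rows, possibly by consuming a child's rows.
struct Operator {
  virtual ~Operator() = default;
  virtual std::vector<int> GetRows() = 0;  // toy single-column "row batch"
};

// Leaf of the plan tree: reads stored data.
struct Scan : Operator {
  std::vector<int> data;
  explicit Scan(std::vector<int> d) : data(std::move(d)) {}
  std::vector<int> GetRows() override { return data; }
};

// Interior node: keeps only rows greater than a threshold (a toy predicate).
struct Filter : Operator {
  std::unique_ptr<Operator> child;
  int threshold;
  Filter(std::unique_ptr<Operator> c, int t) : child(std::move(c)), threshold(t) {}
  std::vector<int> GetRows() override {
    std::vector<int> out;
    for (int r : child->GetRows())
      if (r > threshold) out.push_back(r);
    return out;
  }
};
```

In a real engine the planner would then cut this tree into fragments and hand each node of the cluster one or more of them, as the slide describes.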
  • Query execution
    ● Once started, each query operator can run independently of any other operator
    ● Every operator can be doing something at the same time
    ● This is the not-so-secret sauce for all massively parallel query execution engines
  • Part 2: What makes queries fast (or... slow)?
  • What determines performance?
    ● Data size
    ● Per-operator execution efficiency
    ● Available parallelism
    ● Available concurrency
    ● Hardware
    ● Schema design and file format
  • Data size
    ● More data means more work
    ● Not just the size of the disk-based data at plan leaves, but the size of internal data flowing in to any operator
    ● How can you help?
      ○ Partition your data
      ○ SELECT with LIMIT in subqueries
      ○ Push predicates down
      ○ Use the correct JOIN order
        ■ Gather table statistics
      ○ Use the right file format
  • Table Ordering
    ● Tables are joined in the order listed in the FROM clause
    ● Impala uses left-deep trees for nested joins
    ● “Largest” table should be listed first
      ○ largest = returning most rows before join filtering
      ○ In a star schema, this is often the fact table
    ● Then list tables in order of most selective join filter to least selective
      ○ Filter the most rows as early as possible
  • Join Types
    ● Two types of join strategy are supported
      ○ Broadcast
      ○ Shuffle/Partitioned
    ● Broadcast
      ○ Each node receives a full copy of the right table
      ○ Per-node memory usage = size of right table
    ● Shuffle
      ○ Both sides of the join are partitioned
      ○ Matching partitions sent to the same node
      ○ Per-node memory usage = 1/nodes x size of right table
    ● Without column statistics, all joins are broadcast
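The two memory formulas on the slide are simple enough to write down as code. This is toy arithmetic that restates the slide's estimates, not Impala's actual planner cost model:

```cpp
#include <cstdint>

// Broadcast join: every node holds a full copy of the right table.
int64_t BroadcastMemPerNode(int64_t right_table_bytes, int /*num_nodes*/) {
  return right_table_bytes;
}

// Shuffle join: the right table is partitioned across the nodes.
int64_t ShuffleMemPerNode(int64_t right_table_bytes, int num_nodes) {
  return right_table_bytes / num_nodes;
}
```

The practical takeaway: a broadcast join of a large right table costs the same on every node regardless of cluster size, which is why missing column statistics (forcing broadcast) can blow out memory on big joins.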
  • Per-operator execution efficiency
    ● Impala is fast, and getting faster
    ● LLVM-based improvements
    ● More efficient disk scanners
    ● More modern algorithms from the DB literature
    ● How can you help?
      ○ Upgrade to the latest version
  • Available parallelism
    ● Parallelism: the number of resources available to use at once
    ● More hardware means more parallelism
    ● Impala will take advantage of more cores, disks and memory where possible
    ● Easiest (but most expensive!) way to improve performance of a large class of queries
    ● You can scale up incrementally
  • Available concurrency
    ● Concurrency: how well can a query take advantage of available parallelism?
    ● Impala will take care of this mostly for you
    ● But some operators naturally don’t parallelise well in certain conditions
    ● For example: joining two huge tables together
      ○ The hash-node operators have to wait for one side to be read completely before reading much of the other side
    ● How you can help:
      ○ Read the profiles, look for obvious bottlenecks, rephrase if possible
  • Hardware
    ● Designed for modern hardware
      ○ Leverages SSE 4.2 (Intel Nehalem or newer)
      ○ LLVM Compiler Infrastructure
      ○ Runtime Code Generation
      ○ In-memory execution pipelines
    ● Today’s hardware
      ○ 2 x Xeon E5 6-core CPUs
      ○ 12 x 3 TB HDD
      ○ 128 GB RAM
    ● How you can help:
      ○ Use the supported platforms, with Cloudera’s packages
  • Schema design
    ● PARTITION BY is an easy win
    ● In general, string is slower than fixed-width types (particularly for aggregations etc.)
    ● File formats are crucial
      ○ Experiment with Parquet for performance
      ○ Avoid text
  • Supported File Formats
    ● Various HDFS file formats
      ○ Text File (read/write)
      ○ Avro (read)
      ○ SequenceFile (read)
      ○ RCFile (read)
      ○ ParquetFile (read/write)
    ● Various compression codecs
      ○ Snappy (ParquetFile, RCFile, SequenceFile, Avro)
      ○ LZO (Text)
      ○ Bzip (ParquetFile, RCFile, SequenceFile, Avro)
      ○ Gzip (ParquetFile, RCFile, SequenceFile, Avro)
    ● HBase also supported
  • Partitioning Considerations
    ● Single largest performance feature
      ○ Skips unnecessary data
      ○ Requires that queries contain partition keys as filters
    ● Choose a reasonable number of partitions
      ○ Lots of small files becomes an issue
      ○ Metadata overhead on the NameNode
      ○ Metadata overhead for the Hive Metastore
      ○ Impala caches this, but the first load may take a long time
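The "skips unnecessary data" point can be illustrated with a toy pruning function: given the key value of each partition (think one value per partition directory) and an equality filter from the query, only matching partitions are scanned and the rest are never touched. This sketch is invented for illustration and has nothing to do with Impala's internals.

```cpp
#include <string>
#include <vector>

// Return only the partitions whose key matches the query's filter value;
// everything else is skipped without reading any of its data files.
std::vector<std::string> PrunePartitions(const std::vector<std::string>& partition_keys,
                                         const std::string& filter_value) {
  std::vector<std::string> to_scan;
  for (const auto& k : partition_keys) {
    if (k == filter_value) to_scan.push_back(k);
  }
  return to_scan;
}
```

This is also why the slide insists that queries contain the partition key as a filter: without that predicate, no partition can be ruled out and every one must be scanned.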
  • Part 3: Debugging queries
  • The Debug Pages
    ● Every impalad exports a lot of useful information on http://<impalad>:25000 (by default), including:
      ○ Last 25 queries
      ○ Active sessions
      ○ Known tables
      ○ Last 1 MB of the log
      ○ System metrics
      ○ Query profiles
    ● Information-dense – not for the faint of heart!
  • Thanks! Questions? Try It Out!
    ● Apache-licensed open source
      ○ Impala 1.1 released 7/24/2013
      ○ Impala 1.0 GA released 4/30/2013
    ● Questions/comments?
      ○ Download: cloudera.com/impala
      ○ Email: impala-user@cloudera.org
      ○ Join: groups.cloudera.org
      ○ MeetUp: meetup.com/Bay-Area-Impala-Users-Group/