
Strata Stinger Talk October 2013


Slides from the talk Apache Hive and Stinger, Petabyte scale SQL in Hadoop presented by Arun Murthy, Alan Gates, and Owen O'Malley



  1. Apache Hive and Stinger: SQL in Hadoop. Arun Murthy (@acmurthy), Alan Gates (@alanfgates), Owen O'Malley (@owen_omalley), @hortonworks. © Hortonworks Inc. 2013.
  2. YARN: Taking Hadoop Beyond Batch. Store ALL DATA in one place, then interact with that data in MULTIPLE WAYS with predictable performance and quality of service. Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (Storm, S4, ...), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), and OTHER (Search, Weave, ...), all on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).
  3. Hadoop Beyond Batch with YARN. A shift from the old to the new: from a single-use system (batch apps) to a multi-use data platform (batch, interactive, online, streaming, ...). HADOOP 1: MapReduce (batch; cluster resource management & data processing) on HDFS (redundant, reliable storage). HADOOP 2: MapReduce (batch), Tez (interactive), and others (varied) on YARN (operating system: cluster resource management) over HDFS2 (redundant, reliable storage).
  4. Apache Tez ("Speed"). Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc.: lower latency for interactive queries, higher throughput for batch queries. 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft. A Tez Task is a <Input, Processor, Output> triple with pluggable Input, Processor, and Output; a YARN ApplicationMaster runs a DAG of Tez Tasks.
  5. Tez: Building blocks for scalable data processing. Classical 'Map': HDFS Input → Map Processor → Sorted Output. Classical 'Reduce': Shuffle Input → Reduce Processor → HDFS Output. Intermediate 'Reduce' for Map-Reduce-Reduce: Shuffle Input → Reduce Processor → Sorted Output.
  6. Hive-on-MR vs. Hive-on-Tez: Tez avoids unneeded writes to HDFS. Example query: SELECT a.x, AVERAGE(b.y) AS avg FROM a JOIN b ON (...) GROUP BY a UNION SELECT x, AVERAGE(y) AS avg FROM c GROUP BY x ORDER BY avg; Hive-MR plan: a chain of Map/Reduce jobs for JOIN(a, b), JOIN(a, c), and GROUP BY a.state with COUNT(*) and AVERAGE(c.price), each stage writing intermediate results to HDFS. Hive-Tez plan: the same operators (SELECT a.state, SELECT c.itemId, SELECT c.price, JOIN(a, b), JOIN(a, c), GROUP BY a.state, COUNT(*), AVERAGE(c.price)) run as a single DAG with no intermediate HDFS writes.
  7. Tez Sessions: because Map/Reduce query startup is expensive. Tez Sessions: hot containers ready for immediate use; removes task and job launch overhead (~5s to 30s). Hive: session launch/shutdown in the background (seamless, the user is not aware); submits the query plan directly to the Tez Session. A native Hadoop service, not ad hoc.
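The economics of hot containers can be sketched with a toy pool (all names here are hypothetical illustrations of the idea, not the Tez API): only the first request pays the launch cost, every later query reuses a warm container.

```python
class ContainerPool:
    """Toy model of Tez-style hot containers: pay the launch cost once, reuse after."""

    def __init__(self):
        self.hot = []       # containers kept warm, ready for immediate use
        self.launches = 0   # how many cold launches we paid for (~5s-30s each)

    def acquire(self):
        if self.hot:
            return self.hot.pop()   # hot container: no launch overhead
        self.launches += 1          # cold start: this is the expensive path
        return object()

    def release(self, container):
        self.hot.append(container)  # keep it warm for the next query

pool = ContainerPool()
for _ in range(10):                 # ten queries back to back
    c = pool.acquire()
    pool.release(c)

print(pool.launches)  # 1 -> only the first query pays the launch cost
```

The same amortization argument is why the slide calls sessions a "native Hadoop service": the pool outlives any single query.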
  8. Tez Delivers Interactive Query, Out of the Box! Feature / description / benefit:
     • Tez Session: overcomes Map-Reduce job-launch latency by pre-launching the Tez AppMaster. Benefit: latency.
     • Tez Container Pre-Launch: overcomes Map-Reduce latency by pre-launching hot containers ready to serve queries. Benefit: latency.
     • Tez Container Re-Use: finished maps and reduces pick up more work rather than exiting; reduces latency and eliminates difficult split-size tuning. Out-of-box performance! Benefit: latency.
     • Runtime re-configuration of DAG: runtime query tuning by picking aggregation parallelism using online query statistics. Benefit: throughput.
     • Tez In-Memory Cache: hot data kept in RAM for fast access. Benefit: latency.
     • Complex DAGs: the Tez Broadcast Edge and the Map-Reduce-Reduce pattern improve query scale and throughput. Benefit: throughput.
  9. Batch AND Interactive SQL-IN-Hadoop: the Stinger Initiative, a broad, community-based effort to drive the next generation of Hive (announced February 2013).
     • Hive 0.11, May 2013: base optimizations; SQL analytic functions; ORCFile, a modern file format.
     • Hive 0.12, October 2013: VARCHAR and DATE types; ORCFile predicate pushdown; advanced optimizations; performance boosts via YARN.
     • Coming soon: Hive on Apache Tez; Query Service; Buffer Cache; Cost-Based Optimizer (Optiq); Vectorized Processing.
     Goals: Speed (improve Hive query performance by 100x to allow for interactive query times, in seconds); SQL (support the broadest range of SQL semantics for analytic applications running against Hadoop); Scale (the only SQL interface to Hadoop designed for queries that scale from TB to PB); all IN Hadoop.
  10. Hive 0.12. Release theme: speed, scale and SQL. Specific features: 10x faster query launch when using a large number (500+) of partitions; ORCFile predicate pushdown speeds queries; LIMIT evaluated on the map side; parallel ORDER BY; a new query optimizer; VARCHAR and DATE datatypes; GROUP BY on structs or unions. Included components: Apache Hive 0.12.
  11. SPEED: Increasing Hive Performance. Interactive query times across ALL use cases: simple and advanced queries in seconds; integrates seamlessly with existing tools; currently a >100x improvement in just nine months. Performance improvements included in Hive 12: base & advanced query optimization; startup time improvement; join optimizations.
  12. Stinger Phase 3: Interactive Query in Hadoop. TPC-DS Query 27 (pricing analytics using a star schema join): 1400s on Hive 0.10, 65s on Hive 0.11 (Phase 1), 7.2s on trunk (Phase 3), a ~190x improvement. TPC-DS Query 82 (inventory analytics joining 2 large fact tables): 3200s, 39s, and 14.9s respectively, a ~200x improvement. All results at scale factor 200 (approximately 200 GB of data).
  13. Speed: Delivering Interactive Query (query time in seconds). TPC-DS Query 52 (star schema join): 41.1s on Hive 0.12, 4.2s on trunk (Phase 3). TPC-DS Query 55 (star schema join): 39.8s and 4.1s respectively. Test cluster: 200 GB of data (Impala: Parquet; Hive: ORCFile); 20 nodes, 24 GB RAM each, 6x disk each.
  14. Speed: Delivering Interactive Query (query time in seconds). TPC-DS Query 28 (vectorization): 31s on Hive 0.12, 9.8s on trunk (Phase 3). TPC-DS Query 12 (complex join, M-R-R pattern): 22s and 6.7s respectively. Test cluster: 200 GB of data (Impala: Parquet; Hive: ORCFile); 20 nodes, 24 GB RAM each, 6x disk each.
  15. AMPLab Big Data Benchmark, Query 1: simple filter query (query time in seconds, lower is better). Query 1a: 63s on Hive 0.10 (5-node EC2), 1.6s on trunk (Phase 3). Query 1b: 63s and 2.3s. Query 1c: 45s and 9.4s. Stinger Phase 3 cluster configuration: AMPLab data set (~135 GB); 20 nodes, 24 GB RAM each, 6x disk each.
  16. AMPLab Big Data Benchmark, Query 2: group by IP block and aggregate (query time in seconds, lower is better). Query 2a: 552s on Hive 0.10 (5-node EC2), 104.3s on trunk (Phase 3). Query 2b: 466s and 118.3s. Query 2c: 490s and 172.7s. Stinger Phase 3 cluster configuration: AMPLab data set (~135 GB); 20 nodes, 24 GB RAM each, 6x disk each.
  17. AMPLab Big Data Benchmark, Query 3: correlate page rankings and revenues across time (query time in seconds, lower is better). Query 3a: 490s on Hive 0.10 (5-node EC2), 40s on trunk (Phase 3). Query 3b: 466s and 145s. Stinger Phase 3 cluster configuration: AMPLab data set (~135 GB); 20 nodes, 24 GB RAM each, 6x disk each.
  18. How Stinger Phase 3 Delivers Interactive Query. Feature / description / benefit:
     • Tez Integration: Tez is a significantly better engine than MapReduce. Benefit: latency.
     • Vectorized Query: takes advantage of modern hardware by processing thousand-row blocks rather than row-at-a-time. Benefit: throughput.
     • Query Planner: uses the extensive statistics now available in the Metastore to better plan and optimize queries, including predicate pushdown during compilation to eliminate portions of the input (beyond partition pruning). Benefit: latency.
     • Cost-Based Optimizer (Optiq): join re-ordering and other optimizations based on column statistics, including histograms, etc. Benefit: latency.
  19. SQL: Enhancing SQL Semantics. Hive 12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop.
     • Hive SQL datatypes. Available: INT, TINYINT/SMALLINT/BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP, BINARY, DECIMAL, ARRAY, MAP, STRUCT, UNION. New in Hive 0.12: DATE, VARCHAR. Roadmap: CHAR.
     • Hive SQL semantics. Available: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on an explicit join key; inner, outer, cross and semi joins; sub-queries in the FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL). New in Hive 0.12: sub-queries in WHERE and HAVING; expanded JOIN syntax. Roadmap: SQL-compliant security (GRANT, etc.); INSERT/UPDATE/DELETE (ACID).
  20. ORC File Format. A columnar format for complex data types; built into Hive from 0.11; support for Pig and MapReduce via HCatalog. Two levels of compression: lightweight type-specific encoding plus generic compression. Built-in indexes: every 10,000 rows with position information; min, max, sum and count of each column; supports seeking to a row number.
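Those per-row-group min/max statistics are what make predicate pushdown cheap: a reader can compare the predicate against the stored bounds and skip whole 10,000-row groups without decoding them. A minimal sketch of the idea (hypothetical structures, not the ORC reader API):

```python
# Each "row group" keeps min/max for a column, as ORC does every 10,000 rows.
row_groups = [
    {"min": 0,     "max": 9999,  "rows": range(0, 10000)},
    {"min": 10000, "max": 19999, "rows": range(10000, 20000)},
    {"min": 20000, "max": 29999, "rows": range(20000, 30000)},
]

def scan_where_greater_than(groups, threshold):
    """Evaluate `col > threshold`, skipping groups the index rules out."""
    scanned, hits = 0, []
    for g in groups:
        if g["max"] <= threshold:
            continue                  # whole group eliminated by the index
        scanned += 1                  # only now do we pay to decode rows
        hits.extend(v for v in g["rows"] if v > threshold)
    return scanned, hits

scanned, hits = scan_where_greater_than(row_groups, 25000)
print(scanned, len(hits))  # 1 group scanned, 4999 matching rows
```

The position information serves the same end from the other direction: once a group survives the filter, the reader can seek straight to it rather than streaming the stripe from the start.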
  21. SCALE: Interactive Query at Petabyte Scale. Sustained query times: Apache Hive 0.12 provides sustained, acceptable query times even at petabyte scale. Smaller footprint: better encoding with ORC in Apache Hive 0.12 reduces the resource requirements of your cluster. File size comparison across encoding methods (dataset: TPC-DS scale 500): Text 585 GB (original size); RCFile 505 GB (14% smaller); Parquet (Impala) 221 GB (62% smaller); ORCFile (Hive 12) 131 GB (78% smaller). Larger block sizes; the columnar format arranges columns adjacent within the file for compression & fast access.
  22. ORC File Format. Hive 0.12: predicate push down; improved run-length encoding; adaptive string dictionaries; padding stripes to HDFS block boundaries. Trunk: stripe-based input splits; input split elimination; vectorized reader; customized Pig Load and Store functions.
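Run-length encoding, one of ORC's lightweight type-specific compressions, collapses a run of repeated values into a (value, count) pair, which is why low-cardinality and sorted columns shrink so well. A minimal sketch (illustrative only; ORC's actual integer RLE is more elaborate, with delta encoding and bit packing):

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

data = [7] * 1000 + [8] * 500         # e.g. a low-cardinality column
encoded = rle_encode(data)
assert rle_decode(encoded) == data    # lossless round trip
print(encoded)  # [[7, 1000], [8, 500]]
```

1,500 values become two pairs; the adaptive string dictionaries on the slide apply the same "store each distinct value once" instinct to strings.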
  23. Vectorized Query Execution. Designed for modern processor architectures: avoid branching in the inner loop; make the most of the L1 and L2 caches. How it works: process records in batches of 1,000 rows; generate code from templates to minimize branching. What it gives: a 30x improvement in rows processed per second; the initial prototype reached 100M rows/sec on a laptop.
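The batching idea can be sketched as follows: rather than paying full per-row dispatch overhead, the engine applies one tight loop to a 1,000-row block of column values at a time (a plain-Python illustration of the control flow, not Hive's generated code):

```python
BATCH_SIZE = 1000  # Hive's vectorized row batch size

def multiply_block(column_block, factor):
    """One tight, branch-free loop over a whole block of column values."""
    return [v * factor for v in column_block]

def process(column, factor):
    out = []
    # Iterate block by block; per-call overhead is amortized over 1,000 rows,
    # and each block fits comfortably in L1/L2 cache.
    for start in range(0, len(column), BATCH_SIZE):
        out.extend(multiply_block(column[start:start + BATCH_SIZE], factor))
    return out

column = list(range(2500))            # 2.5 batches worth of rows
result = process(column, 3)
print(len(result), result[42])  # 2500 126
```

In the real engine the inner loop is template-generated per type, which is what removes the branching that a generic row-at-a-time interpreter cannot avoid.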
  24. HDFS Buffer Cache. Use memory-mapped buffers for zero copy: avoids the overhead of going through the DataNode; can mlock the block files into RAM. ORC Reader enhanced for zero-copy reads via new compression interfaces in Hadoop. Vectorization-specific reader: reads 1,000 rows at a time, directly into Hive's internal representation.
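Memory mapping is the mechanism that removes the copy: the reader sees file bytes straight from the page cache instead of through a read() into a user-space buffer. A small stdlib Python illustration of the idea (the file is a made-up stand-in for a local HDFS block file; mlock-ing those pages into RAM is the step the slide adds on top):

```python
import mmap
import os
import tempfile

# Write a stand-in for a local HDFS block file.
path = os.path.join(tempfile.mkdtemp(), "blk_demo")
with open(path, "wb") as f:
    f.write(b"ORC" + bytes(1021))     # 1 KiB of fake block data

with open(path, "rb") as f:
    # Map the file; accesses go directly to the page cache rather than
    # being copied through a read() buffer.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header = mm[:3]               # slicing reads straight from the mapping
        print(header)                 # b'ORC'
```

HDFS exposes the same trick for readers co-located with the block file, which is how the ORC reader sidesteps the DataNode on the hot path.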
  25. Next Steps. Blog; Stinger Initiative; Stinger Beta: HDP 2.1 Beta, December 2013.
  26. Thank You! @acmurthy @alanfgates @owen_omalley @hortonworks