Apache Hive and Stinger:
SQL in Hadoop
Arun Murthy (@acmurthy)
Alan Gates (@alanfgates)
Owen O’Malley (@owen_omalley)
@hortonworks
YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS

with Predictable Performance and Quality of Service

Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), and OTHER (Search, Weave, …), all on YARN (cluster resource management) over HDFS2 (redundant, reliable storage).
Hadoop Beyond Batch with YARN
A shift from the old to the new…
•  HADOOP 1 (Single Use System): Batch Apps. MapReduce (cluster resource management & data processing) on HDFS (redundant, reliable storage).
•  HADOOP 2 (Multi Use Data Platform): Batch, Interactive, Online, Streaming, … MapReduce (batch), Tez (interactive), and others (varied) on YARN (operating system: cluster resource management) over HDFS2 (redundant, reliable storage).
Apache Tez (“Speed”)
•  Replaces MapReduce as primitive for Pig, Hive, Cascading etc.
– Lower latency for interactive queries
– Higher throughput for batch queries
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
•  Task with pluggable Input, Processor and Output
•  Tez Task = <Input, Processor, Output>
•  YARN ApplicationMaster to run a DAG of Tez Tasks
Tez: Building blocks for scalable data processing
Classical ‘Map’: HDFS Input → Map Processor → Sorted Output
Classical ‘Reduce’: Shuffle Input → Reduce Processor → HDFS Output
Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input → Reduce Processor → Sorted Output
Hive-on-MR vs. Hive-on-Tez
Tez avoids unneeded writes to HDFS: where Hive-on-MR chains several MapReduce jobs and lands intermediate results in HDFS between them, Hive-on-Tez runs the whole plan (joins, group-bys, ordering) as a single DAG.

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a
UNION
SELECT x, AVERAGE(y) AS AVG
FROM c
GROUP BY x
ORDER BY AVG;
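As a minimal sketch (not from the slides) of trying this on your own cluster, assuming a Hive build that includes the Tez engine and stand-in tables a and b; AVERAGE is written as the standard AVG so the statement actually runs:

-- Compare the plans the two engines produce for the same query.
SET hive.execution.engine=mr;
EXPLAIN SELECT a.x, AVG(b.y) AS avg_y FROM a JOIN b ON (a.id = b.id) GROUP BY a.x;

-- Switch to Tez: the same plan becomes one DAG with no intermediate HDFS writes.
SET hive.execution.engine=tez;
EXPLAIN SELECT a.x, AVG(b.y) AS avg_y FROM a JOIN b ON (a.id = b.id) GROUP BY a.x;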
Tez Sessions
… because Map/Reduce query startup is expensive
• Tez Sessions
– Hot containers ready for immediate use
– Removes task and job launch overhead (~5s – 30s)
• Hive
– Session launch/shutdown in background (seamless, user not aware)
– Submits query plan directly to the Tez Session
Native Hadoop service, not ad-hoc
Tez Delivers Interactive Query - Out of the Box!
•  Tez Session: Overcomes MapReduce job-launch latency by pre-launching the Tez AppMaster. Benefit: latency.
•  Tez Container Pre-Launch: Overcomes MapReduce latency by pre-launching hot containers ready to serve queries. Benefit: latency.
•  Tez Container Re-Use: Finished maps and reduces pick up more work rather than exiting; reduces latency and eliminates difficult split-size tuning. Out-of-box performance! Benefit: latency.
•  Runtime re-configuration of DAG: Runtime query tuning by picking aggregation parallelism using online query statistics. Benefit: throughput.
•  Tez In-Memory Cache: Hot data kept in RAM for fast access. Benefit: latency.
•  Complex DAGs: The Tez Broadcast Edge and Map-Reduce-Reduce pattern improve query scale and throughput. Benefit: throughput.
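A small hedged sketch of the knob behind container re-use; tez.am.container.reuse.enabled is a Tez configuration key, set here from the Hive CLI on the assumption that hive.execution.engine is already set to tez:

-- Keep finished containers alive so later tasks reuse them instead of relaunching.
SET tez.am.container.reuse.enabled=true;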
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative

A broad, community-based effort to drive the next generation of HIVE.

Stinger Project (announced February 2013):
•  Hive 0.11, May 2013: Base Optimizations; SQL Analytic Functions; ORCFile, a modern file format
•  Hive 0.12, October 2013: VARCHAR and DATE types; ORCFile predicate pushdown; Advanced Optimizations; Performance Boosts via YARN
•  Coming soon: Hive on Apache Tez; Query Service; Buffer Cache; Cost Based Optimizer (Optiq); Vectorized Processing

Goals:
•  Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
•  Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
•  SQL: Support the broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Hive 0.12
Release Theme

Speed, Scale and SQL

Specific Features

•  10x faster query launch when using a large number (500+) of partitions
•  ORCFile predicate pushdown speeds queries
•  Evaluate LIMIT on the map side
•  Parallel ORDER BY
•  New query optimizer
•  Introduces VARCHAR and DATE datatypes
•  GROUP BY on structs or unions

Included Components: Apache Hive 0.12
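For illustration, a hedged sketch using the new 0.12 datatypes; the table and column names are invented for this example:

-- VARCHAR and DATE are new in Hive 0.12.
CREATE TABLE web_orders (
  order_id      BIGINT,
  customer_name VARCHAR(64),
  order_date    DATE,
  total         DOUBLE
) STORED AS ORC;

-- Parallel ORDER BY and map-side LIMIT evaluation help queries like this one.
SELECT order_date, SUM(total) AS daily_total
FROM web_orders
GROUP BY order_date
ORDER BY daily_total DESC
LIMIT 10;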
SPEED: Increasing Hive Performance
Interactive Query Times across ALL use cases
•  Simple and advanced queries in seconds
•  Integrates seamlessly with existing tools
•  Currently a >100x improvement in just nine months

Performance improvements included in Hive 0.12:
–  Base & advanced query optimization
–  Startup time improvement
–  Join optimizations
Stinger Phase 3: Interactive Query In Hadoop
TPC-DS Query 27 (Pricing Analytics using Star Schema Join) and Query 82 (Inventory Analytics Joining 2 Large Fact Tables) drop from 1400s and 3200s on Hive 0.10, to 65s and 39s on Hive 0.11 (Phase 1), to 14.9s and 7.2s on Trunk (Phase 3): improvements of roughly 190x and 200x.
All results at Scale Factor 200 (approximately 200 GB data).
Speed: Delivering Interactive Query
Query time in seconds, Hive 0.12 vs. Trunk (Phase 3):
•  TPC-DS Query 52 (Star Schema Join): 41.1s vs. 4.2s
•  TPC-DS Query 55 (Star Schema Join): 39.8s vs. 4.1s
Test cluster: 200 GB data (Impala: Parquet, Hive: ORCFile); 20 nodes, 24 GB RAM each, 6x disk each.
Speed: Delivering Interactive Query
Query time in seconds, Hive 0.12 vs. Trunk (Phase 3):
•  TPC-DS Query 28 (Vectorization): 31s vs. 9.8s
•  TPC-DS Query 12 (Complex join, Map-Reduce-Reduce pattern): 22s vs. 6.7s
Same test cluster as above.
AMPLab Big Data Benchmark
AMPLab Query 1 (Simple Filter Query), query time in seconds (lower is better), Hive 0.10 on a 5-node EC2 cluster vs. Trunk (Phase 3):
•  Query 1a: 63s vs. 1.6s
•  Query 1b: 63s vs. 2.3s
•  Query 1c: 45s vs. 9.4s
Stinger Phase 3 cluster configuration: AMPLab data set (~135 GB); 20 nodes, 24 GB RAM each, 6x disk each.
AMPLab Big Data Benchmark
AMPLab Query 2 (Group By IP Block and Aggregate), query time in seconds (lower is better), Hive 0.10 on a 5-node EC2 cluster vs. Trunk (Phase 3):
•  Query 2a: 552s vs. 104.3s
•  Query 2b: 466s vs. 118.3s
•  Query 2c: 490s vs. 172.7s
Same Stinger Phase 3 cluster configuration as above.
AMPLab Big Data Benchmark
AMPLab Query 3 (Correlate Page Rankings and Revenues Across Time), query time in seconds (lower is better), Hive 0.10 on a 5-node EC2 cluster vs. Trunk (Phase 3):
•  Query 3a: 490s vs. 40s
•  Query 3b: 466s vs. 145s
Same Stinger Phase 3 cluster configuration as above.
How Stinger Phase 3 Delivers Interactive Query

•  Tez Integration: Tez is a significantly better engine than MapReduce. Benefit: latency.
•  Vectorized Query: Takes advantage of modern hardware by processing thousand-row blocks rather than a row at a time. Benefit: throughput.
•  Query Planner: Uses the extensive statistics now available in the Metastore to better plan and optimize queries, including predicate pushdown during compilation to eliminate portions of the input (beyond partition pruning). Benefit: latency.
•  Cost Based Optimizer (Optiq): Join re-ordering and other optimizations based on column statistics, including histograms. Benefit: latency.
SQL: Enhancing SQL Semantics
SQL Compliance: Hive 0.12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop. The slide marks each item as Available, Hive 0.12, or Roadmap.

Hive SQL Datatypes: INT; TINYINT/SMALLINT/BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR

Hive SQL Semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; inner, outer, cross and semi joins; sub-queries in the FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL); sub-queries in WHERE and HAVING; expanded JOIN syntax; SQL-compliant security (GRANT, etc.); INSERT/UPDATE/DELETE (ACID)
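To make the semantics list concrete, a brief hedged sketch combining a FROM-clause sub-query, standard aggregation, and a windowing function; the sales table is hypothetical:

-- Rank customers by revenue within each region.
SELECT region, customer_id, total_revenue,
       RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
FROM (
  SELECT region, customer_id, SUM(revenue) AS total_revenue
  FROM sales
  GROUP BY region, customer_id
) t;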
ORC File Format
• Columnar format for complex data types
• Built into Hive from 0.11
• Support for Pig and MapReduce via HCat
• Two levels of compression
– Lightweight type-specific and generic
• Built-in indexes
– Every 10,000 rows, with position information
– Min, Max, Sum, Count of each column
– Supports seek to row number
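A minimal sketch of putting data into ORC from Hive (0.11 or later); the table names are illustrative:

-- Declare the table as ORC; compression and indexes come built in.
CREATE TABLE page_views_orc (
  user_id   BIGINT,
  page_url  STRING,
  view_time TIMESTAMP
) STORED AS ORC;

-- Load it from an existing table in any other format.
INSERT OVERWRITE TABLE page_views_orc
SELECT user_id, page_url, view_time FROM page_views;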
SCALE: Interactive Query at Petabyte Scale
Sustained Query Times: Apache Hive 0.12 provides sustained, acceptable query times even at petabyte scale.

Smaller Footprint: Better encoding with ORC in Apache Hive 0.12 reduces resource requirements for your cluster.

File size comparison across encoding methods (dataset: TPC-DS Scale 500):
•  Text: 585 GB (original size)
•  RCFile: 505 GB (14% smaller)
•  Parquet (Impala): 221 GB (62% smaller)
•  ORCFile (Hive 0.12): 131 GB (78% smaller)

•  Larger block sizes
•  Columnar format arranges columns adjacent within the file for compression & fast access
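A hedged sketch of reproducing this kind of footprint comparison yourself; the CTAS statement is standard HiveQL, and the warehouse path in the dfs commands is an assumption that depends on your configuration:

-- Re-encode an existing text-format table as ORC and compare on-disk size.
CREATE TABLE store_sales_orc STORED AS ORC AS
SELECT * FROM store_sales_text;

dfs -du -h /apps/hive/warehouse/store_sales_text;  -- original text encoding
dfs -du -h /apps/hive/warehouse/store_sales_orc;   -- ORC encoding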
ORC File Format
• Hive 0.12
– Predicate Push Down
– Improved run length encoding
– Adaptive string dictionaries
– Padding stripes to HDFS block boundaries
• Trunk
– Stripe-based input splits
– Input split elimination
– Vectorized reader
– Customized Pig Load and Store functions
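A hedged sketch of exercising the 0.12 ORC features; orc.compress is an ORC table property and hive.optimize.index.filter turns on predicate push down into ORC's row-group indexes, while the table itself is hypothetical:

SET hive.optimize.index.filter=true;   -- read-time predicate push down

CREATE TABLE events_orc (
  event_time TIMESTAMP,
  event_type STRING,
  payload    STRING
) STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Only stripes and row groups whose min/max statistics can match are read.
SELECT COUNT(*) FROM events_orc WHERE event_type = 'click';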
Vectorized Query Execution
• Designed for Modern Processor Architectures
– Avoid branching in the inner loop.
– Make the most use of L1 and L2 cache.
• How It Works
– Process records in batches of 1,000 rows
– Generate code from templates to minimize branching.
• What It Gives
– 30x improvement in rows processed per second.
– Initial prototype: 100M rows/sec on a laptop
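A minimal sketch of turning vectorization on; hive.vectorized.execution.enabled is the switch for this code path (on trunk at the time of this deck), and it requires ORC-backed input such as the hypothetical events_orc table above:

SET hive.vectorized.execution.enabled=true;

-- Scans, filters, and aggregations now operate on 1,000-row batches.
SELECT event_type, COUNT(*) AS events, AVG(LENGTH(payload)) AS avg_payload_len
FROM events_orc
GROUP BY event_type;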
HDFS Buffer Cache
• Use memory mapped buffers for zero copy
– Avoid overhead of going through DataNode
– Can mlock the block files into RAM
• ORC Reader enhanced for zero-copy reads
– New compression interfaces in Hadoop
• Vectorization-specific reader
– Reads 1,000 rows at a time
– Reads into Hive's internal representation
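As a heavily hedged sketch of the Hive-side switch, assuming a build where the ORC zero-copy read path has landed (the hive.exec.orc.zerocopy flag from later releases):

-- Let the ORC reader use Hadoop's zero-copy read API, so blocks the
-- DataNode has mlock'ed into RAM are read without extra copies.
SET hive.exec.orc.zerocopy=true;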
Next Steps
• Blog
http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/

• Stinger Initiative
http://hortonworks.com/labs/stinger/

• Stinger Beta: HDP 2.1 Beta, December 2013
Thank You!

@acmurthy
@alanfgates
@owen_omalley
@hortonworks
© Hortonworks Inc. 2013. Confidential and Proprietary.
Slide notes:
• Query 52: star join followed by group/order (different keys) with a selective filter; query 55: same.
• Query 28: four-subquery join; query 12: star join over a range of dates.
• AMPLab query 1: SELECT pageURL, pageRank FROM rankings WHERE pageRank > X
• AMPLab query 2: SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, X)
• AMPLab query 3: SELECT sourceIP, totalRevenue, avgPageRank FROM (SELECT sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue FROM Rankings AS R, UserVisits AS UV WHERE R.pageURL = UV.destURL AND UV.visitDate BETWEEN Date('1980-01-01') AND Date('X') GROUP BY UV.sourceIP) ORDER BY totalRevenue DESC LIMIT 1
• With Hive and Stinger we are focused on enabling the SQL ecosystem, and to do that we've put Hive on a clear roadmap to SQL compliance. That includes adding critical datatypes like character and date types, as well as implementing common SQL semantics seen in most databases.