Innovations in Apache Hadoop MapReduce Pig Hive for Improving Query Performance

Apache Hadoop and its ecosystem projects Hive and Pig support interactions with data sets of enormous sizes. Hadoop always excelled at large-scale data processing; however, running smaller queries has been problematic due to the batch-oriented nature of the system. This talk will cover the enhancements we have made to YARN, MapReduce, Pig and Hive. We will also walk through the future enhancements we have planned.

Published in: Technology
  • Since we started this work, we have seen multiple people benchmark Hive by comparing its text-format processors against alternatives
  • Not MapReduce, not HDFS, just plain Hive
  • Layers of inspectors identify the column type, deserialize the data and determine the appropriate expression routines in the inner loop
  • I wrote all of the code and Jitendra was just consulting :P

    1. Innovations in Apache Hadoop MapReduce, Pig and Hive for improving query performance
        • gopalv@apache.org
        • vinodkv@apache.org
    2. © Hortonworks Inc. 2013
    3. Operation Stinger
    4. Performance at any cost
    5. Performance at any cost
        • Scalability
            – Already works great, just don’t break it for performance gains
        • Isolation + Security
            – Queries between different users run as different users
        • Fault tolerance
            – Keep all of MR’s safety nets to work around bad nodes in clusters
        • UDFs
            – Make sure they are “User” defined and not “Admin” defined
    6. First things first
        • How far can we push Hive as it exists today?
    7. Benchmark spec
        • The TPC-DS benchmark data+query set
        • Query 27 (big joins small)
            – For all items sold in stores located in specified states during a given year, find the average quantity, average list price, average list sales price, average coupon amount for a given gender, marital status, education and customer demographic.
        • Query 82 (big joins big)
            – List all items and current prices sold through the store channel from certain manufacturers in a given price range that consistently had a quantity between 100 and 500 on hand in a 60-day period.
    8. TL;DR - II
        • TPC-DS Query 82, Scale=200, 10 EC2 nodes (40 disks)
        • Bar chart for Query 82, values in legend order:
            – Text: 3257.692
            – RCFile: 2862.669
            – Partitioned RCFile: 255.641
            – Partitioned RCFile + Optimizations: 71.114
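To put the Query 82 chart in relative terms, the sketch below computes each configuration's speedup over the Text baseline. It assumes the four values on the slide map to the four legend entries in the order they appear; the chart does not label its units, so only ratios are computed.

```python
# Values read off the Query 82 chart, in the chart's own (unlabelled) units.
# Assumption: the four values map to the legend entries in order.
timings = {
    "Text": 3257.692,
    "RCFile": 2862.669,
    "Partitioned RCFile": 255.641,
    "Partitioned RCFile + Optimizations": 71.114,
}

baseline = timings["Text"]
speedups = {name: baseline / value for name, value in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x relative to Text")
```

Under that mapping, most of the gain comes from partitioning, with the remaining optimizations adding a further factor of roughly three to four on top.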
    9. Forget the actual benchmark
        • First of all, YMMV
            – Software
            – Hardware
            – Setup
            – Tuning
        • Text formats seem to be the staple of all comparisons
            – Really?
            – Everybody’s using it but only for benchmarks!
    10. What did the trick?
        • MapReduce?
        • HDFS?
        • Or is it just Hive?
    11. Optional Advice
    12. RCFile
        • Binary RCFiles
        • Hive pushes down column projections
        • Less I/O, Less CPU
        • Smaller files
    13. Data organization
        • No data system at scale is loaded once & left alone
        • Partitions are essential
        • Data flows into new partitions every day
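A toy illustration of why column projection and partitioning reduce I/O. This is not Hive or RCFile code: the table layout, column names and the `scan` helper are hypothetical, and "cost" is simply the number of values touched.

```python
# Hypothetical toy table: one entry per daily partition, each stored column-wise.
table = {
    "2013-01-01": {"item": ["a", "b"], "qty": [1, 2], "price": [9.5, 3.0]},
    "2013-01-02": {"item": ["c", "d"], "qty": [5, 1], "price": [2.0, 7.5]},
    "2013-01-03": {"item": ["e", "f"], "qty": [4, 4], "price": [1.0, 6.0]},
}


def scan(table, columns, partitions=None):
    """Return (rows, values_read) for the requested columns and partitions."""
    rows, values_read = [], 0
    for day, cols in table.items():
        if partitions is not None and day not in partitions:
            continue  # partition pruning: the whole partition is skipped
        selected = [cols[c] for c in columns]  # column projection
        values_read += sum(len(col) for col in selected)
        rows.extend(zip(*selected))
    return rows, values_read


# Full scan: every column of every partition.
full_rows, full_cost = scan(table, ["item", "qty", "price"])

# One column from one day's partition.
pruned_rows, pruned_cost = scan(table, ["qty"], partitions={"2013-01-02"})

print(full_cost, pruned_cost)
```

The second scan touches one column in one partition, so the work shrinks along both axes at once. That is the effect the "Partitioned RCFile" configuration relies on.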
    14. A closer look
        • Now revisiting the benchmark and its results
    15. Query 27 - Before (per-stage chart)
        • Stage-3: 16
        • Stage-2: 17
        • Stage-1: 49
        • Stage-6: 355
        • Stage-5: 512
        • Stage-4: 553
    16. Before
    17. Query 27 - After (per-stage chart; legend: Time)
        • Stage-9: 33
        • Stage-10: 5
    18. After
    19. Query 82 - Before (per-stage chart; legend: Start Time)
        • Stage-3: 17
        • Stage-2: 17
        • Stage-1: 2199
        • Stage-4: 1025
    20. Query 82 - After (per-stage chart)
        • Stage-1: 71
    21. What changed?
        • Job Count/Correct plan
        • Correct data formats
        • Correct data organization
        • Correct configuration
    23. Is that all?
        • NO!
        • In Hive
            – Metastore
            – RCFile issues
            – CPU intensive code
        • In YARN+MR
            – Parallelism
            – Spin-up times
            – Data locality
        • In HDFS
            – Bad disks/deteriorating nodes
    24. In Hive
    25. In Hive
    26. Hive Metastore
        • 1+N Select problem
            – SELECT partitions FROM tables;
            – /* for each needed partition */ SELECT * FROM Partition ..
            – For query 27, generates > 5000 queries! 4-5 seconds lost on each call!
            – Lazy loading or Include/Join are general solutions
        • DataNucleus/ORM issues
            – 100K NPEs try.. Catch.. Ignore..
        • Metastore DB Schema revisit
            – Denormalize some/all of it?
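The 1+N select pattern can be reproduced in miniature with an in-memory SQLite database. This is only a sketch: the `tbls` and `partitions` tables, their columns and the warehouse paths are hypothetical stand-ins, not the actual metastore schema or the DataNucleus-generated queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE partitions (part_id INTEGER PRIMARY KEY, tbl_id INTEGER, location TEXT);
""")
conn.execute("INSERT INTO tbls VALUES (1, 'store_sales')")
conn.executemany(
    "INSERT INTO partitions VALUES (?, 1, ?)",
    [(i, f"/warehouse/store_sales/part={i}") for i in range(100)],
)


def load_one_plus_n(conn):
    """One query for the partition ids, then one more query per partition."""
    queries = 1
    ids = [row[0] for row in conn.execute(
        "SELECT part_id FROM partitions WHERE tbl_id = 1 ORDER BY part_id")]
    locations = []
    for part_id in ids:
        queries += 1
        locations.append(conn.execute(
            "SELECT location FROM partitions WHERE part_id = ?", (part_id,)
        ).fetchone()[0])
    return locations, queries


def load_joined(conn):
    """A single join fetches every partition of the table in one round trip."""
    rows = conn.execute(
        "SELECT p.location FROM tbls t JOIN partitions p ON p.tbl_id = t.tbl_id "
        "WHERE t.name = 'store_sales' ORDER BY p.part_id"
    ).fetchall()
    return [row[0] for row in rows], 1


slow_result, slow_queries = load_one_plus_n(conn)
fast_result, fast_queries = load_joined(conn)
print(slow_queries, fast_queries)
```

Both functions return the same locations; only the number of round trips differs. Against a remote metastore database each round trip adds latency, which is why the per-partition pattern becomes expensive at thousands of partitions.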
    27. In Hive
    28. RCFile issues
        • RCFiles do not split well
            – Row groups and row group boundaries
        • Small row groups vs big row groups
            – Sync() vs min split
            – Storage packing
        • Run-length information is lost
            – Unnecessary deserialization costs
    29. ORC file format
        • A single file as output of each task.
            – Dramatically simplifies integration with Hive
            – Lowers pressure on the NameNode
        • Support for the Hive type model
            – Complex types (struct, list, map, union)
            – New types (datetime, decimal)
            – Encoding specific to the column type
        • Split files without scanning for markers
        • Bound the amount of memory required for reading or writing.
    30. In Hive
    31. CPU intensive code
    32. CPU intensive code
        • Hive query engine processes one row at a time
            – Very inefficient in terms of CPU usage
        • Lazy deserialization: layers
        • Object inspector calls
        • Lots of virtual method calls
    33. Tighten your loops
    34. Vectorization to the rescue
        • Process a row batch at a time instead of a single row
        • Row batch to consist of column vectors
            – The column vector will consist of array(s) of primitive types as far as possible
        • Each operator will process the whole column vector at a time
        • File formats to give out vectorized batches for processing
        • Underlying research promises
            – Better instruction pipelines and cache usage
            – Mechanical sympathy
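The difference between row-at-a-time and batch-at-a-time processing can be sketched as follows. This is an illustration of the data layout only, not Hive's vectorized operators: the function names, column names and batch size are hypothetical, and plain Python will not show the CPU gains that tight loops over primitive arrays give on the JVM.

```python
from array import array


def filter_sum_row_at_a_time(rows, threshold):
    """Row-wise: every row is a generic record, inspected field by field."""
    total = 0
    for row in rows:
        qty = row["qty"]
        # Per-row type inspection stands in for object-inspector overhead.
        if isinstance(qty, int) and qty > threshold:
            total += qty * row["price_cents"]
    return total


def filter_sum_vectorized(qty, price_cents, threshold, batch_size=1024):
    """Batch-wise: each operator loops over primitive column vectors."""
    total = 0
    for start in range(0, len(qty), batch_size):
        q = qty[start:start + batch_size]
        p = price_cents[start:start + batch_size]
        # Filter operator: produce the selected positions within the batch.
        selected = [i for i in range(len(q)) if q[i] > threshold]
        # Multiply-and-sum operator: touch only the selected positions.
        total += sum(q[i] * p[i] for i in selected)
    return total


# Column vectors of 64-bit integers, plus the same data as row records.
qty = array("q", [i % 7 for i in range(5000)])
price_cents = array("q", [100 + (i % 13) for i in range(5000)])
rows = [{"qty": q, "price_cents": p} for q, p in zip(qty, price_cents)]

row_total = filter_sum_row_at_a_time(rows, threshold=3)
vec_total = filter_sum_vectorized(qty, price_cents, threshold=3)
print(row_total == vec_total)
```

Both versions compute the same answer. The batch version does its type handling once per column rather than once per row, which is where the reduction in virtual calls and the better cache behaviour are expected to come from.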
    35. Vectorization: Prelim results
        • Functionality
            – Some arithmetic operators and filters using primitive type columns
            – Have a basic integration benchmark to prove that the whole setup works
        • Performance
            – Micro benchmark
            – More than 30x improvement in the CPU time
            – Disclaimer:
                – Micro benchmark!
                – Include I/O or deserialization costs or complex and string datatypes
    36. In YARN+MR
    37. In YARN+MR
    38. Data Locality
        • CombineInputFormat
        • AM interaction with locality
        • Short-circuit reads!
        • Delay scheduling
            – Good for throughput
            – Bad for latency
    39. In YARN+MR
    40. Parallelism
        • Can tune it (to some extent)
            – Controlling splits/reducer count
        • Hive doesn’t know dynamic cluster status
            – Benchmarks max out clusters, real jobs may or may not
        • Hive does not let you control parallelism
            – particularly in case of multiple jobs in a query
    41. In YARN+MR
    42. Spin up times
        • AM startup costs
        • Task startup costs
        • Multiple waves of map tasks
    43. Apache Tez
        • Generic DAG workflow
        • Container re-use
        • AM pool service
    44. AM Pool Service
        • Pre-launches a pool of AMs
        • Jobs submitted to these pre-launched AMs
            – Saves 3-5 seconds
        • Pre-launched AMs can pre-allocate containers
        • Tasks can be started as soon as the job is submitted
            – Saves 2-3 seconds
    45. Container reuse
        • Tez MapReduce AM supports Container reuse
        • Launched JVMs are re-used between tasks
            – about 4-5 seconds saved in case of multiple waves
        • Allows future enhancements
            – re-using task data structures across splits
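A back-of-the-envelope model of how spin-up costs add up across waves of tasks. Everything here is an assumption for illustration: the `startup_overhead` function, the wave model itself (one JVM launch on the critical path per wave without reuse), and the 4-second figures, which are simply picked from within the ranges quoted on the slides.

```python
import math


def startup_overhead(num_tasks, slots, am_start, jvm_start,
                     reuse_containers, pooled_am):
    """Seconds of startup on the critical path under a simple wave model."""
    waves = math.ceil(num_tasks / slots)
    # A pre-launched AM from the pool removes the AM startup from the job.
    am_cost = 0 if pooled_am else am_start
    # Without reuse every wave pays a JVM launch; with reuse only the first does.
    jvm_launches = 1 if reuse_containers else waves
    return am_cost + jvm_launches * jvm_start


# Hypothetical job: 100 map tasks on 40 slots gives 3 waves.
baseline = startup_overhead(100, 40, am_start=4, jvm_start=4,
                            reuse_containers=False, pooled_am=False)
optimized = startup_overhead(100, 40, am_start=4, jvm_start=4,
                             reuse_containers=True, pooled_am=True)
print(baseline, optimized)
```

For a query that runs for an hour these seconds are noise. For a query that should finish in tens of seconds, and that may launch several jobs, they are a large share of the total, which is the case the AM pool and container reuse are aimed at.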
    46. In HDFS
    47. Speculation/bad disks
        • No cluster remains at 100% forever
        • Bad disks cause latency issues
            – Speculation is one defense, but it is not enough
            – Fault tolerance is a safety net
        • Possible solutions:
            – More feedback from HDFS about stale nodes, bad/slow disks
            – Volume scheduling
    48. General guidelines
        • Benchmarking
            – Be wary of benchmarks! Including ours!
            – Algebra with X
    49. General guidelines contd.
        • Benchmarks: To repeat, YMMV.
        • Benchmark *your* use-case.
        • Decide your problem size
            – If (smallData) { MySQL/Postgres/Your smart phone } else {
                – Make it work
                – Make it scale
                – Make it faster
              }
        • If it is (seems to be) slow, file a bug, spend a little time!
        • Replacing systems without understanding them
            – Is an easy way to have an illusion of progress
    50. Related talks
        • “Optimizing Hive Queries” by Owen O’Malley
        • “What’s New and What’s Next in Apache Hive” by Gunther Hagleitner
    51. Credits
        • Arun C Murthy
        • Bikas Saha
        • Gopal Vijayaraghavan
        • Hitesh Shah
        • Siddharth Seth
        • Vinod Kumar Vavilapalli
        • Alan Gates
        • Ashutosh Chauhan
        • Vikram Dixit
        • Gunther Hagleitner
        • Owen O’Malley
        • Jitendranath Pandey
        • Yahoo!, Facebook, Twitter, SAP and Microsoft all contributing.
    52. Q&A
        • Thanks!
