June 2014 HUG : Hive On Tez - Benchmarked at Yahoo Scale
Speaker Notes
  • Gopal was supposed to be presenting this with me, to talk about Tez. Point to Gopal/Jitendra's talk on Hive/Tez for details on things I'll have to skim over. Also, acknowledge Thomas Graves, who's talking today about the excellent work he's doing on driving Spark on YARN.
  • There are several sides to query latency:
    › Query execution time: addressed in the physical query-execution layer.
    › Query optimizations: the first step in optimizing the query plan seems to be querying for all partition instances, which is very expensive for "Project Benzene".
    › Bad queries: Tableau, I'm looking at you.
  • The Transaction Processing Performance Council (inexplicably abbreviated to TPC) publishes a set of benchmarks for query processing. Many have adopted TPC-DS to showcase performance. We chose TPC-h to complement that. (Also, 22 is a much smaller number of queries to deal with than… 90?) Transliteration: Evita and Kylie Minogue.
  • Lineitem and Orders are extremely large fact tables. Nation and Region are the smallest dimension tables.
  • Tangent, funny story about hard drives: you can set up MR intermediate directories and HDFS data-node directories on different disks, so that traffic from one doesn't affect the other. But on the other hand, total read bandwidth might be reduced.
  • Partition keys: Line-item on ship-date; Orders on order-date; Customers by market-segment; Suppliers on their region-key.
  • Q5 and Q21 are anomalous:
    › Q21: hit a trailing reducer across all versions of Hive tested. Perhaps this can be improved with a better plan.
    › Q5: a slow reducer that hit only Hive 13. Could be a bad plan, or a difference in data distribution when the data was regenerated for the Hadoop 2 cluster.
  • Tez: scheduling. Playing the gaps, like Beethoven's Fifth.
  • Vectorization: on average, a 1.2x speedup.
  • Except for a few outliers, ZLIB compression actually reduced performance on the 1TB dataset: uncompressed was 1.3x faster than compressed.
  • The situation reverses at the 10 TB level: the cost of decompression is offset by the savings in disk-read time. The long tail in 10TB/Q21 threw off the scale of the graph, so I've excluded it from the results.
  • Talk about file-coalescing, small-file generation, NameNode pressure and parallelism.
    › You don't want to read an ORC stripe from a different node.
    › Talk about distcp -pgrub, for ORC files.
    › Mention that SNAPPY's license is not Apache.
    › Also, Yoda.
  • At 100 nodes, it performs at 0.9x the 350-node performance.
  • We’ve seen Hive and Tez scale down for latency, scale up for data-size, and scale out across larger clusters.
  • Familiarity: We have an existing ecosystem with Hive, HCatalog, Pig and Oozie that delivers revenue to Yahoo today. It's hard to rock the boat.
    Community: The Apache Hive community is large, active and thriving. They've been solving query-latency issues for ages now. The switch to the Tez execution engine was a solution within the Apache Hive project. This wasn't a fork of Hive. This is Hive, proper.
    Scale: We've seen Hive and Tez perform at scale. Heck, we've seen Pig perform on Tez.
    Multitenancy: Yahoo's use-case is unique, and not just because of data-scale. There are hundreds of active users, and genuine multitenancy and security concerns.
    Design: We think the Hive community has tackled the right problems first, rather than throwing RAM at the problem.
  • Bucky Lasek at the X-Games in 2001. Notice where he’s looking… Not at the camera, but setting up his next trick.
  • Security: Kerberos support was patched in after the benchmarks were run.
    Multi-tenancy: Data needs to be explicitly pinned into memory as RDDs. In a multi-tenant system, how would pinning work? What would the eviction policy for data be?
    Compatibility: Needs to work with Metastore versions 12 and 13. Shark's gone to 0.11 just recently.
    Integration with the rest of the stack: Oozie and Pig.
    Overall, we wanted a solution that works with high dynamic range, i.e. one that works well with small datasets (100s of GBs) and scales to multi-terabyte datasets. We have a familiar system that seems to fit that bill. It doesn't quite rock the boat. It's not perfect yet; there are bugs that we're working on, and we still haven't solved the data-volume/BI problem. By the way, I really like the idea of BlinkDB. I saw the JIRA.

Presentation Transcript

  • Benchmarking Hive at Yahoo Scale. Presented by Mithun Radhakrishnan | June 18, 2014, Hadoop User Group.
  • About myself
    ▪ HCatalog Committer, Hive contributor
       › Metastore, Notifications, HCatalog APIs
       › Integration with Oozie, data ingestion
    ▪ Other odds and ends
       › DistCp
    ▪ mithun@apache.org
  • About this talk
    ▪ Introduction to "Yahoo Scale"
    ▪ The use-case in Yahoo
    ▪ The Benchmark
    ▪ The Setup
    ▪ The Observations (and, possibly, lessons)
    ▪ Fisticuffs
  • The Y!Grid
    ▪ 16 Hadoop clusters in YGrid
       › 32,500 nodes
       › 750K jobs a day
    ▪ Hadoop 0.23.10.x, 2.4.x
    ▪ Large datasets
       › Daily, hourly, minute-level frequencies
       › Terabytes of data, 1000s of files, per dataset instance
    ▪ Pig 0.11
    ▪ Hive 0.10 / HCatalog 0.5 => Hive 0.12
  • Data Processing Use Cases
    ▪ Pig for data pipelines
       › Imperative paradigm
       › ~45% of Hadoop jobs on production clusters (M/R + Oozie = 41%)
    ▪ Hive for ad hoc queries
       › SQL
       › Relatively smaller number of jobs, but a *major* uptick
    ▪ HCatalog for inter-op
  • Hive is Currently the Fastest Growing Product on the Grid
    [Chart] Hive jobs as a percentage of all Grid jobs (0–10%, left axis) plotted against all Grid jobs in millions (0–30, right axis), monthly from Mar-13 to May-14, reaching 2.4 million Hive jobs.
  • Business Intelligence Tools
    ▪ {Tableau, MicroStrategy, Excel, …}
    ▪ Challenges:
       › Security: ACLs, authentication, encryption over the wire, full-disk encryption
       › Bandwidth: transporting results over ODBC
       › Query latency: query execution time, the cost of query "optimizations", "bad" queries
  • The Benchmark
    ▪ TPC-h
       › Industry standard (tpc.org/tpch)
       › 22 queries
       › dbgen -s 1000 -S 3 (parallelizable)
    ▪ Reynold Xin's excellent work:
       › https://github.com/rxin
       › Transliterated the queries to suit Hive 0.9
  • Relational Diagram
    [Diagram] The TPC-h relational schema, with row counts per table: PART (SF × 200,000), PARTSUPP (SF × 800,000), LINEITEM (SF × 6,000,000), ORDERS (SF × 1,500,000), CUSTOMER (SF × 150,000), SUPPLIER (SF × 10,000), NATION (25), REGION (5).
  • The Setup
    ▪ 350-node cluster
       › Xeon boxen: 2 slots with E5530s => 16 CPUs
       › 24GB memory, NUMA enabled
       › 6 SATA drives, 2TB, 7200 RPM Seagates
       › RHEL 6.4
       › JRE 1.7 (-d64)
       › Hadoop 0.23.7+/2.3+, security turned off
       › Tez 0.3.x
       › 128MB HDFS block-size
    ▪ Downscale tests: 100-node cluster
       › hdfs-balancer.sh
  • The Prep
    ▪ Data generation:
       › Text data: dbgen on MapReduce
       › Transcode to RCFile and ORC: Hive on MR
         insert overwrite table orc_table partition( … ) select * from text_table;
       › Partitioning: only for the 1TB and 10TB cases; beware the perils of dynamic partitioning (see the sketch below)
    ▪ ORC file: 64MB stripes, ZLIB compression
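The "perils of dynamic partitioning" deserve a concrete sketch. Assuming a text-format source table (lineitem_text) and an ORC target partitioned on ship-date (lineitem_orc), both names illustrative rather than from the deck, the transcode needs a few guard rails:

    -- Enable dynamic partitioning for the INSERT below.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    -- One statement can try to create thousands of partitions (and small
    -- files); Hive caps this, so raise the limits deliberately if needed.
    SET hive.exec.max.dynamic.partitions = 10000;
    SET hive.exec.max.dynamic.partitions.pernode = 1000;

    -- The partition column must be the last column the SELECT produces;
    -- here we assume l_shipdate is the final column of lineitem_text.
    INSERT OVERWRITE TABLE lineitem_orc PARTITION (l_shipdate)
    SELECT * FROM lineitem_text;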
  • Observations
  • [Chart] TPC-h 100GB: time in seconds (0–2500) for each of the 22 queries (q1_pricing_summary_report through q22_global_sales_opportunity), comparing Hive 0.10 (Text), Hive 0.10 RCFile, Hive 0.11 ORC, Hive 0.13 ORC MR and Hive 0.13 ORC Tez.
  • 100 GB
    ▪ 18x speedup over Hive 0.10 (Textfile), ranging from 6x to 50x per query
    ▪ 11.8x speedup over Hive 0.10 (RCFile), ranging from 5x to 30x
    ▪ Average query time: 28 seconds, down from 530 seconds (Hive 0.10 Text)
    ▪ 85% of queries completed in under a minute
  • [Chart] TPC-h 1TB: time in seconds (0–2500) for each of the 22 queries, comparing Hive 0.10 RCFile, Hive 0.11 ORC, Hive 0.12 ORC, Hive 0.13 ORC MR and Hive 0.13 ORC Tez.
  • 1 TB
    ▪ 6.2x speedup over Hive 0.10 (RCFile), between 2.5x and 17x per query
    ▪ Average query time: 172 seconds (range: 5–947 seconds), down from 729 seconds (Hive 0.10 RCFile)
    ▪ 61% of queries completed in under 2 minutes
    ▪ 81% of queries completed in under 4 minutes
  • [Chart] TPC-h 10TB: time in seconds (0–12000) for each of the 22 queries, comparing Hive 0.10 RCFile, Hive 0.11 ORC, Hive 0.12 ORC, Hive 0.13 ORC MR and Hive 0.13 ORC Tez.
  • 10 TB
    ▪ 6.2x speedup over Hive 0.10 (RCFile), between 1.6x and 10x per query
    ▪ Average query time: 908 seconds (426 seconds excluding outliers), down from 2129 seconds with Hive 0.10 RCFile (1712 seconds excluding outliers)
    ▪ 61% of queries completed in under 5 minutes
    ▪ 71% of queries completed in under 10 minutes
    ▪ Q6 still completes in 12 seconds! (See the query sketch below.)
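For context on why Q6 stays fast at 10TB: it scans a single table with tight range predicates and no joins, which is exactly what ORC predicate push-down and vectorization reward. It is approximately this shape as transliterated for Hive, with dates compared as strings (the exact scripts are in the repos referenced at the end):

    -- TPC-h Q6 ("forecast revenue change"): one pass over lineitem, with
    -- range predicates ORC can test against stride-level min/max indexes
    -- before decoding any rows.
    SELECT SUM(l_extendedprice * l_discount) AS revenue
    FROM lineitem
    WHERE l_shipdate >= '1994-01-01'
      AND l_shipdate < '1995-01-01'
      AND l_discount BETWEEN 0.05 AND 0.07
      AND l_quantity < 24;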
  • Explaining the speed-ups
    ▪ Hadoop 2.x, et al.
    ▪ Tez
       › (Arbitrary DAG)-based execution engine
       › "Playing the gaps" between M&R: temporary data and HDFS
       › Feedback loop
       › Smart scheduling
       › Container re-use
       › Pipelined job start-up
    ▪ Hive
       › Statistics
       › "Vector-ized" execution
    ▪ ORC
       › PPD (predicate push-down)
    (A sketch of the session settings behind these features follows below.)
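A hedged sketch of the Hive 0.13-era session settings that enable the features named above. The setting names were current as of Hive 0.13; knobs and defaults have moved between releases, so treat this as illustrative rather than a recipe:

    -- Run queries on Tez rather than MapReduce.
    SET hive.execution.engine = tez;
    -- Vectorized execution: operators process batches of rows at a time.
    SET hive.vectorized.execution.enabled = true;
    -- Push predicates down into the ORC reader, so whole strides can be skipped.
    SET hive.optimize.index.filter = true;
    -- Answer trivial aggregates (e.g. count, min, max) from table statistics.
    SET hive.compute.query.using.stats = true;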
  • [Chart] Vectorization: time in seconds (0–1000) for each of the 22 queries, Hive 0.13 Tez ORC vs. Hive 0.13 Tez ORC vectorized.
  • ORC File Layout
    ▪ Data is composed of multiple streams per column
    ▪ The index allows skipping rows (defaults to every 10,000 rows), keeps a position in each stream, and records min/max for each column
    ▪ The footer contains a directory of stream locations, and the encoding for each column
    ▪ Integer columns are serialized using run-length encoding
    ▪ String columns are serialized using a dictionary for column values, plus the same run-length encoding
    ▪ The stripe footer is used to find the requested column's data streams, and adjacent stream reads are merged
    [Diagram] File layout: a sequence of 256MB stripes, each holding index data, row data organized into per-column streams, and a stripe footer, followed by the file footer and postscript.
  • ORC Usage

    CREATE TABLE addresses (
      name string,
      street string,
      city string,
      state string,
      zip int
    )
    STORED AS orc
    LOCATION '/path/to/addresses'
    TBLPROPERTIES ("orc.compress" = "ZLIB");

    ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT orc;

    SET hive.default.fileformat = orc;
    SET hive.exec.orc.memory.pool = 0.50;  -- the ORC writer is allowed 50% of the JVM heap by default

    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

    Table properties:
    ▪ orc.compress (default: ZLIB): high-level compression; one of NONE, ZLIB, SNAPPY
    ▪ orc.compress.size (default: 262,144 = 256 KB): number of bytes in each compression chunk
    ▪ orc.stripe.size (default: 67,108,864 = 64 MB): number of bytes in each stripe. Each ORC stripe is processed in one map task (try 32 MB to cut down on disk I/O)
    ▪ orc.row.index.stride (default: 10,000): number of rows between index entries (must be >= 1,000). A larger stride increases the probability that a stride cannot be skipped for a given predicate
    ▪ orc.create.index (default: true): whether to create row indexes, used for predicate push-down. If data is frequently accessed or filtered on a certain column, sorting on that column and using index filters makes the filters work faster (see the sketch below)
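To illustrate the last row of that table: sorting on a frequently filtered column while loading keeps each stride's min/max range narrow, so index filters can skip most of the file for a selective predicate. A sketch, with the staging-table name assumed:

    -- Load the addresses table sorted on zip, so ORC's row-group
    -- indexes become selective for predicates on zip.
    SET hive.optimize.index.filter = true;   -- use ORC indexes for push-down

    INSERT OVERWRITE TABLE addresses
    SELECT name, street, city, state, zip
    FROM addresses_staging   -- hypothetical staging table
    SORT BY zip;             -- narrow per-stride min/max => more skipping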
  • [Chart] Effects of compression (1TB): time in seconds (0–1000) for each of the 22 queries, Hive 0.13 uncompressed ORC vs. Hive 0.13 ZLIB-compressed.
  • [Chart] Effects of compression (10TB): time in seconds (0–3000) for each of the 22 queries, Hive 0.13 uncompressed vs. Hive 0.13 compressed.
  • Configuring ORC
    ▪ set hive.merge.mapredfiles=true
    ▪ set hive.merge.mapfiles=true
    ▪ set orc.stripe.size=67108864
       › Half the HDFS block-size
       › Tangent: nStripes vs. nBlocks
       › Tangent: DistCp
    ▪ set orc.compress=???
       › Depends on size and distribution
       › Snappy compression hasn't been explored
    ▪ YMMV: experiment
    (A per-table sketch of these settings follows below.)
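The same knobs can also be pinned per table via TBLPROPERTIES, so they travel with the table rather than the session. A sketch (the table and its columns are illustrative; note that the byte values take no thousands separators):

    CREATE TABLE orders_orc (
      o_orderkey   BIGINT,
      o_totalprice DOUBLE
    )
    STORED AS orc
    TBLPROPERTIES (
      "orc.compress"    = "ZLIB",     -- or NONE/SNAPPY, per the tradeoffs above
      "orc.stripe.size" = "67108864"  -- 64 MB: half a 128 MB HDFS block
    );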
  • [Chart] 100 vs. 350 nodes: time in seconds (0–1000) for each of the 22 queries, Hive 0.13 on 100 nodes vs. Hive 0.13 on 350 nodes.
  • Conclusions
  • Y!Grid sticking with Hive
    ▪ Familiarity: existing ecosystem
    ▪ Community
    ▪ Scale
    ▪ Multitenant
    ▪ Coming down the pike:
       › CBO
       › In-memory caching solutions atop HDFS (RAMfs a la Tachyon?)
  • We're not done yet
    ▪ SQL compliance
    ▪ Scaling up metastore performance
    ▪ Better BI tool integration
    ▪ Faster transport: HiveServer2 result-sets
  • References
    ▪ The YDN blog post:
       › http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
    ▪ Code:
       › https://github.com/mythrocks/hivebench (TPC-h scripts, datagen, transcode utils)
       › https://github.com/t3rmin4t0r/tpch-gen (parallel TPC-h gen)
       › https://github.com/rxin/TPC-H-Hive (TPC-h scripts for Hive)
       › https://issues.apache.org/jira/browse/HIVE-600 (Yuntao's initial TPC-h JIRA)
  • Thank You! @mithunrk | mithun@apache.org. We are hiring! Reach out to us at bigdata@yahoo-inc.com.
  • I’m glad you asked.
  • Sharky comments
    ▪ Testing with Shark 0.7.x and Shark 0.8
       › Compatible with Hive Metastore 0.9
       › 100GB datasets: admirable performance
       › 1TB/10TB: tests did not run completely
         - Failures, especially in the 10TB cases
         - Hangs while shuffling data
         - Scaled back to 100 nodes: more tests ran through, but still not completely
       › nReducers: not inferred
    ▪ Miscellany
       › Security
       › Multi-tenancy
       › Compatibility