Low Latency SQL on Hadoop - What's best for your cluster


  1. Low Latency SQL on Hadoop: What's best for your cluster? Prepared by Alan Gardner, June 2014
  2. Alan Gardner (@alanctgardner, gardner@pythian.com) © 2013 Pythian
  3. [image-only slide]
  4. [image-only slide]
  5. Overview • Performance • Architecture • Features • Vendor Support • Conclusions
  6. Performance
  7. Berkeley Big Data Benchmark • Hive, Hive-on-Tez, Redshift, Shark, Impala • Tested on five m2.4xlarge EC2 instances • Uses Intel’s Hadoop Benchmark, not TPC • ~150GB of data
  8. Berkeley Big Data Benchmark • Finds Shark fastest at straight scans, and tied with Impala for aggregation and joining • Hive-on-Tez is a distant third • Does not use the optimized, columnar formats
  9. Cloudera SQL Benchmark • Impala, Hive-on-Tez, Shark and Presto • Uses high-end hardware with relatively large memory, and the fastest data types for each engine • 15TB scale factor for a TPC-DS-based test
  10. Cloudera SQL Benchmark • Finds Impala to be significantly faster across all data sizes • Shark and Tez outperform Presto 0.60, with Tez performing better for larger result sets • It’s unclear if table …
  11. Our Configuration • 9-node cluster of m2.2xlarge instances • 4 cores, 34GB RAM • 850GB of instance storage • 100GB scale factor, read only from disk, no RDDs • Impala 1.3.1 on CDH 5.0.1 • Hive 0.13 from the …
  12. File Formats • Hive, Shark: ORC (ZLIB) • Presto: ORC (ZLIB) – RCFile (LazyBinarySerDe) was slower – RCFile (ColumnarSerDe) may be better • Impala: Parquet (no compression)
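The format choices above correspond to DDL along these lines. This is a sketch, not the deck's actual scripts: the staging table `lineitem_text` is a hypothetical name, and exact syntax varies slightly by engine version.

```sql
-- Hive/Shark (ORC also readable by Presto): ORC with ZLIB compression
CREATE TABLE lineitem_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB")
AS SELECT * FROM lineitem_text;

-- Impala: Parquet with compression disabled, as in this benchmark
SET COMPRESSION_CODEC=NONE;
CREATE TABLE lineitem_parquet
STORED AS PARQUET
AS SELECT * FROM lineitem_text;
```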
  13. [image-only slide]
  14. TPC-H Queries • Query 1 – filtering and aggregation on a single table • Query 8 – selects two columns from joins across many-to-many relationships • Query 10 – selects and aggregates on eight …
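For reference, Query 1 is essentially a single-table scan plus aggregation; abridged from the TPC-H specification (some aggregate columns omitted):

```sql
-- TPC-H Query 1 (abridged): scan and aggregate over lineitem
SELECT l_returnflag,
       l_linestatus,
       SUM(l_quantity)      AS sum_qty,
       SUM(l_extendedprice) AS sum_base_price,
       AVG(l_discount)      AS avg_disc,
       COUNT(*)             AS count_order
FROM   lineitem
WHERE  l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
```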
  15. [image-only slide]
  16. Architecture
  17. • Hive 0.13 runs on Tez, which executes queries as DAGs • DAGs are more efficient than MRv1 query plans • Runs on YARN, so resources are shared between all jobs • Individual node failures are tolerated and retried automatically
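Switching Hive between MapReduce and Tez is a per-session setting in Hive 0.13; a minimal sketch (the query is an illustrative example, not from the deck):

```sql
-- Execute queries on Tez rather than MRv1 (Hive 0.13+)
SET hive.execution.engine=tez;

-- Subsequent queries in this session run as Tez DAGs
SELECT COUNT(*) FROM lineitem;
```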
  18. • HiveServer creates a DAG from HQL submitted over JDBC • HiveServer requests or reuses a Tez AM (ApplicationMaster) to run the query • Tez handles placement of query fragments based on locality and resources
  19. • Shark uses the same core as Hive: the HQL parser and the file and UDF interfaces are compatible • DAGs produced by Shark are optimized for Spark, rather than Tez • Spark can run on YARN for resource sharing, as well as on Mesos or standalone
  20. • Spark is more mature and currently offers a wider range of optimizations • Shark also supports storing results as an RDD within Spark
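In Shark, in-memory tables are created by naming convention or by a table property; a sketch, assuming a hypothetical source table `logs` (both forms are from Shark's documented caching conventions):

```sql
-- Shark: a table whose name ends in _cached is stored as an in-memory RDD
CREATE TABLE logs_cached AS SELECT * FROM logs;

-- Equivalent explicit form, using the shark.cache table property
CREATE TABLE logs_mem TBLPROPERTIES ("shark.cache" = "true")
AS SELECT * FROM logs;
```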
  21. • Impala runs as an engine ‘next to’ YARN, not on top of it • To reduce resource contention and allow scheduling to be centralized in YARN, Llama was created • Llama creates “fake” applications on YARN as placeholders for Impala
  22. • Impalad receives queries, then plans and executes them • The statestore broadcasts metadata updates and node status • The catalog service caches block metadata and Hive table metadata
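Because the catalog service caches metadata, changes made outside Impala (e.g. via Hive) must be picked up explicitly; a sketch, assuming a hypothetical `lineitem` table:

```sql
-- After adding data files to an existing table outside Impala:
REFRESH lineitem;

-- After creating new tables or changing schemas outside Impala:
INVALIDATE METADATA;
```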
  23. • Presto doesn’t interact with YARN at all • cgroups are the only way to share resources between YARN jobs and Presto • Presto also handles all scheduling and job placement by itself
  24. • Presto has a single coordinator, which plans and distributes query fragments • Workers are still co-located with DataNodes for locality • A discovery service manages worker status
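The coordinator/discovery split shows up directly in Presto's deployment configuration; a minimal sketch of a combined coordinator-plus-discovery node's `etc/config.properties`, with `coord.example.com` as a hypothetical hostname:

```properties
# Coordinator node: plans queries and distributes fragments to workers
coordinator=true
# Run the discovery service in-process on the coordinator
discovery-server.enabled=true
discovery.uri=http://coord.example.com:8080
http-server.http.port=8080

# Worker nodes instead set:
# coordinator=false
# discovery.uri=http://coord.example.com:8080
```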
  25. Functionality
  26. [image-only slide]
  27. [image-only slide]
  28. File Formats (R = read, W = write):

                    Text  RCFile  Parquet  ORCFile  Avro  SequenceFile
      Presto        R     R       R        R        R     R
      Impala        R/W   R       R/W      -        R     R
      Hive/Shark    R/W   R/W     R/W      R/W      R/W   R/W

      Flexibility:

                    SerDes  Complex Data   UDFs  Spill to Disk  JOIN Optimization
      Presto        Yes     Yes, but slow  No    No             None
      Impala        No      No             Yes   No             Cost-based
      Hive/Shark    Yes     Yes            Yes   Yes            Cardinality
  31. Vendor Support
  32. Official Support:

                  Cloudera  MapR    HortonWorks
      Presto      No        No      No
      Impala      Yes       Yes     No
      Hive        No Tez    No Tez  Yes
      Shark       Spark     Yes     Spark

      Note: based on vendor documentation as of 31/05/2014
  35. Conclusions
  36. A giant, indecipherable flowchart [image-only slide]
  37. Conclusions • Shark provides a faster alternative to Hive 0.13 for ETL and analytics, but support is lacking and tuning is difficult • Presto is still nascent – deployment is easy, but querying is not so simple
  38. Thank you – Q&A • Contact: gardner@pythian.com • 1-877-PYTHIAN • @pythian • @alanctgardner
