Low Latency SQL on Hadoop - What's best for your cluster

Transcript

  • 1. Low Latency SQL on Hadoop: What’s best for your cluster? Prepared by Alan Gardner, June 2014
  • 2. Alan Gardner, @alanctgardner, gardner@pythian.com © 2013 Pythian
  • 5. Overview
      • Performance
      • Architecture
      • Features
      • Vendor Support
      • Conclusions
  • 6. Performance
  • 7. Berkeley Big Data Benchmark
      • Hive, Hive-on-Tez, Redshift, Shark, Impala
      • Tested on five m2.4xlarge EC2 instances
      • Uses Intel’s Hadoop Benchmark, not TPC
      • ~150GB of
  • 8. Berkeley Big Data Benchmark
      • Finds Shark fastest at straight scans, and tied with Impala for aggregation and joining
      • Hive-on-Tez is a distant third
      • Not using the optimized, columnar formats
  • 9. Cloudera SQL Benchmark
      • Impala, Hive-on-Tez, Shark and Presto
      • Uses high-end hardware with relatively large memory, and the fastest data types for each engine
      • 15TB scale factor for a TPC-DS based test
  • 10. Cloudera SQL Benchmark
      • Finds Impala to be significantly faster across all data sizes
      • Shark and Tez outperform Presto 0.60, with Tez performing better for larger result sets
      • It’s unclear if table
  • 11. Our Configuration
      • 9-node cluster of m2.2xlarge instances
      • 4 cores, 34GB RAM
      • 850GB of instance storage
      • 100GB scale factor – only from disk, no RDDs
      • Impala 1.3.1 on CDH 5.0.1
      • Hive 0.13 from the
  • 12. File Formats
      • Hive, Shark: ORC (ZLIB)
      • Presto: ORC (ZLIB)
          – RCFile (LazyBinarySerDe) was slower
          – RCFile (ColumnarSerDe) may be better
      • Impala: Parquet (no compression)
  • 14. TPC-H Queries
      • Query 1 – filtering and aggregation on a single table
      • Query 8 – select two columns from joins across many-to-many relationships
      • Query 10 – select and aggregate on eight
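The Query 1 pattern named on this slide (filter one table, then group and aggregate) can be illustrated with a small sketch. This is not the benchmark itself: it uses Python's built-in `sqlite3` as a stand-in engine, a cut-down `lineitem` schema, and four made-up rows, all of which are assumptions for illustration only.

```python
import sqlite3

# Tiny, made-up stand-in for the TPC-H lineitem table (not benchmark data).
rows = [
    # returnflag, linestatus, quantity, extendedprice, discount, shipdate
    ("A", "F", 10, 100.0, 0.10, "1995-01-10"),
    ("A", "F", 20, 200.0, 0.00, "1996-03-01"),
    ("N", "O", 5, 50.0, 0.20, "1998-08-15"),
    ("N", "O", 7, 70.0, 0.00, "1998-12-25"),  # after the cutoff, filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, "
    "l_quantity REAL, l_extendedprice REAL, l_discount REAL, l_shipdate TEXT)"
)
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?)", rows)

# Filter a single table on a date, then group and aggregate.
# This is a simplified form of Query 1 with fewer aggregate columns.
query = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the query touches one table and needs no shuffle for a join, it mostly exercises scan and aggregation speed, which is where columnar formats such as ORC and Parquet tend to matter.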
  • 16. Architecture
  • 17.
      • Hive 0.13 runs on Tez, which executes queries as DAGs
      • DAGs are more efficient than MRv1 query plans
      • Runs on YARN, resources are shared between all jobs
      • Individual node failures are tolerated and retried automatically
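To make the DAG idea concrete, here is a minimal sketch of dependency-ordered execution. It is not how Tez is implemented; the stage names and the `run_in_waves` helper are hypothetical, and the scheduling uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stages for a join-then-aggregate query. Each stage maps to
# the set of stages whose output it consumes.
plan = {
    "scan_orders": set(),
    "scan_lineitem": set(),
    "join": {"scan_orders", "scan_lineitem"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}

def run_in_waves(dag):
    """Return lists of stages; stages in the same list have no unmet dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark finished, unlocking successors
    return waves

waves = run_in_waves(plan)
print(waves)
```

The whole query is one graph, so both scans can start together and each later stage begins as soon as its inputs finish. A chain of separate MapReduce jobs, by contrast, writes intermediate results to HDFS between jobs.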
  • 18.
      • HiveServer creates a DAG from HQL submitted over JDBC
      • HiveServer requests or reuses a Tez AM to run the query
      • Tez handles placement of query fragments based on locality and resources
  • 19.
      • Shark uses the same core as Hive: the HQL parser and the file and UDF interfaces are compatible
      • DAGs produced by Shark are optimized for Spark, rather than Tez
      • Spark can be run on YARN for resource sharing, as well as on Mesos or standalone
  • 20.
      • Spark is more mature and offers a wider range of optimizations right now
      • Shark also supports storing results as an RDD within Spark
  • 21.
      • Impala runs as an engine ‘next to’ YARN, not on top of it
      • To reduce resource contention and allow scheduling to be centralized in YARN, Llama was created
      • Llama creates “fake” applications on YARN as placeholders for Impala
  • 22.
      • Impalad receives queries, plans and executes them
      • Statestore broadcasts metadata updates and node status
      • Catalog caches block metadata and Hive table metadata
  • 23.
      • Presto doesn’t interact with YARN at all
      • cgroups are the only way to share resources between YARN jobs and Presto
      • Presto also handles all scheduling and job placement by itself
  • 24.
      • Presto has a single coordinator which plans and distributes query fragments
      • Workers are still co-located with DataNodes for locality
      • Discovery service manages worker status
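The coordinator, worker, and discovery roles described above can be sketched as a toy placement routine. This is an illustration under stated assumptions, not Presto's actual scheduler: `assign_splits`, the host names, and the least-loaded tie-break are all hypothetical, and the set of live workers stands in for what a discovery service would report.

```python
from collections import defaultdict

def assign_splits(splits, live_workers):
    """Assign each split to a live worker, preferring one that holds the data.

    splits: list of (split_id, hosts_holding_a_replica)
    live_workers: non-empty set of hosts currently reported alive
    """
    load = defaultdict(int)
    assignment = {}
    for split_id, replica_hosts in splits:
        # Prefer workers co-located with a replica; otherwise read remotely.
        local = [h for h in replica_hosts if h in live_workers]
        candidates = local or sorted(live_workers)
        # Pick the least-loaded candidate; ties break by host name.
        chosen = min(candidates, key=lambda h: (load[h], h))
        assignment[split_id] = chosen
        load[chosen] += 1
    return assignment

workers = {"node1", "node2", "node3"}
splits = [
    ("s0", ["node1", "node2"]),
    ("s1", ["node1"]),
    ("s2", ["node4"]),           # replica host is not a live worker
    ("s3", ["node2", "node3"]),
]
assignment = assign_splits(splits, workers)
print(assignment)
```

Only `s2` falls back to a remote read here, because no live worker holds its data. The sketch assumes at least one live worker; with an empty set, `min` raises `ValueError`.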
  • 25. Functionality
  • 28. File Formats
                     Text   RCFile   Parquet   ORCFile   Avro   SequenceFile
        Presto       R      R        R         R         R      R
        Impala       R/W    R        R/W       -         R      R
        Hive/Shark   R/W    R/W      R/W       R/W       R/W    R/W
    Flexibility
                     SerDes   Complex Data    UDFs   Spill to Disk   JOIN Reordering
        Presto       Yes      Yes, but slow   No     No              None
        Impala       No       No              Yes    No              Cost-based
        Hive/Shark   Yes      Yes             Yes    Yes             Cardinality
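When choosing an engine for an existing dataset, the file-format matrix from the slide can be treated as a lookup table. The sketch below transcribes the matrix as given on the slide (so it reflects mid-2014 support, not current releases); the `engines_that_can` helper is a hypothetical name introduced for illustration.

```python
# File-format support as transcribed from the slide:
# "R" = read, "R/W" = read and write, "-" = not supported.
FORMATS = ["Text", "RCFile", "Parquet", "ORCFile", "Avro", "SequenceFile"]
SUPPORT = {
    "Presto": ["R", "R", "R", "R", "R", "R"],
    "Impala": ["R/W", "R", "R/W", "-", "R", "R"],
    "Hive/Shark": ["R/W", "R/W", "R/W", "R/W", "R/W", "R/W"],
}

def engines_that_can(mode, fmt):
    """List engines whose entry for fmt includes mode ('R' or 'W')."""
    col = FORMATS.index(fmt)
    return [e for e, row in SUPPORT.items() if mode in row[col].split("/")]

writers = engines_that_can("W", "Parquet")
orc_readers = engines_that_can("R", "ORCFile")
print(writers)
print(orc_readers)
```

A lookup like this makes one constraint from the slide easy to see: data already stored as ORC rules out Impala, while Parquet written by Impala remains readable by all three.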
  • 31. Vendor Support
  • 32. Official Support
                     Cloudera   MapR     Hortonworks
        Presto       No         No       No
        Impala       Yes        Yes      No
        Hive         No Tez     No Tez   Yes
        Shark        Spark      Yes      Spark
    Note: based on vendor documentation as of 31/05/2014
  • 35. Conclusions
  • 36. A giant, indecipherable flowchart
  • 37. Conclusions
      • Shark provides a faster alternative to Hive 0.13 for ETL and analytics, but support is lacking and tuning is difficult
      • Presto is still nascent – deployment is easy, but querying is not so simple
  • 38. Thank you – Q&A. To contact us: gardner@pythian.com, 1-877-PYTHIAN, @pythian, @alanctgardner