
SQL on Hadoop in Taiwan


Published on http://2014.hadoopcon.org/wp/?p=10


  1. 1. SQL on Hadoop a Perspective of a Cloud-based, Managed Service Provider Masahiro Nakagawa Sep 13, 2014 Hadoop Meetup in Taiwan
  2. 2. Today’s agenda > Self introduction > Why SQL? > Hive > Presto > Conclusion
  3. 3. Who are you? > Masahiro Nakagawa > github/twitter: @repeatedly > Treasure Data, Inc. > Senior Software Engineer > Fluentd / td-agent developer > I love OSS :) > D language - Phobos committer > Fluentd - Main maintainer > MessagePack / RPC- D and Python (only RPC) > The organizer of Presto Source Code Reading > etc…
  4. 4. Do you love SQL?
  5. 5. Why do we love SQL? > Easy to understand what we are doing > declarative language > common interface for data manipulation > There are many users > SQL is not the best, but it is better than uncommon interfaces
  6. 6. We want to use SQL in the Hadoop world
  7. 7. SQL Players on Hadoop (a color in the original slide marks commercial products) > Batch (latency: minutes - hours): Hive, Spark SQL > Short Batch / Low latency (latency: seconds - minutes): Presto, Impala, Drill, HAWQ, Actian, etc. > Stream (latency: immediate): Norikra, StreamSQL
  8. 8. SQL Players on Hadoop (a color in the original slide marks commercial products) > Batch and Short Batch (Hive, Spark SQL, Presto, Impala, Drill, HAWQ, Actian, etc.): Red Ocean > Stream (Norikra, StreamSQL): Blue Ocean?
  9. 9. 3 query engines on Treasure Data > Hive (batch) > for ETL and scheduled reporting > Presto (short batch / low latency) > for Ad hoc queries > Pig > Not SQL > There aren’t as many users… ;( Today’s talk
  10. 10. Hive https://hive.apache.org/
  11. 11. What’s Hive? > Needs no explanation ;) > Most popular project in the ecosystem > HiveQL and MapReduce > Writing MapReduce code is hard > Hive is growing rapidly through the Stinger initiative > Vectorized processing > Query optimization with statistics > Tez instead of MapReduce > etc…
  12. 12. Apache Tez > Low-level framework for YARN applications > Next-generation query engine > Provides a good IR for Hive, Pig, and more > Task- and DAG-based pipelining (Input → Processor → Output tasks composed into a DAG) > Spark uses a similar DAG model http://tez.apache.org/
  13. 13. Hive on MR vs. Hive on Tez SELECT g1.x, g1.avg, g2.cnt FROM (SELECT a.x, AVG(a.y) AS avg FROM a GROUP BY a.x) g1 JOIN (SELECT b.x, COUNT(b.y) AS cnt FROM b GROUP BY b.x) g2 ON (g1.x = g2.x) ORDER BY avg; [Diagram: MapReduce runs this as multiple MR jobs, writing intermediate results to HDFS between the GROUP BY, JOIN, and ORDER BY stages; Tez runs GROUP BY a.x, GROUP BY b.x, JOIN (a, b), and ORDER BY as a single DAG and can avoid the unnecessary HDFS writes.] http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/9
  14. 14. Why still use MapReduce? > The emphasis is on stability / reliability > Speed is important but not the most important > Can use an MPP query engine for short batch > Tez/Spark are immature > Hard to manage in a multi-tenant environment > Different failure models > We are now testing Tez for Hive • No code change needed for Hive; Spark is hard… • Disabling Tez is easy: just remove ‘set hive.execution.engine=tez;’
  15. 15. Presto http://prestodb.io/
  16. 16. What’s Presto? A distributed SQL query engine for interactive data analysis against GBs to PBs of data.
  17. 17. Presto’s history > 2012 Fall: Project started at Facebook > Designed for interactive queries with the speed of a commercial data warehouse > and scalability to the size of Facebook > 2013 Winter: Open sourced! > 30+ contributors in 6 months > including people outside of Facebook
  18. 18. What problems does it solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results in an interactive DB (PostgreSQL, Redshift, etc.) for quick responses > interactive DBs cost more and are less scalable > Some data is not stored in HDFS > we need to copy it into HDFS to analyze it
  22. 22. HDFS Hive PostgreSQL, etc. Daily/Hourly Batch Interactive query Dashboard Commercial BI Tools Batch analysis platform Visualization platform
  23. 23. HDFS Daily/Hourly Batch Hive Interactive query PostgreSQL, etc. ✓ Less scalable ✓ Extra cost Dashboard Commercial BI Tools ✓ Can’t query against “live” data directly Batch analysis platform Visualization platform ✓ More work to manage 2 platforms
  24. 24. HDFS Hive Dashboard Presto PostgreSQL, etc. Daily/Hourly Batch HDFS Hive Dashboard Daily/Hourly Batch Interactive query Interactive query
  25. 25. Presto HDFS Hive Dashboard Daily/Hourly Batch Interactive query SQL on any data sets Cassandra MySQL Commercial DBs
  26. 26. Presto HDFS Hive Dashboard Daily/Hourly Batch Interactive query SQL on any data sets Cassandra MySQL Commercial DBs Commercial BI Tools ✓ IBM Cognos ✓ Tableau ✓ ... Data analysis platform
  27. 27. dashboard on chart.io: https://chartio.com/
  28. 28. What can Presto do? > Query interactively (in milliseconds to minutes) > MapReduce and Hive are still necessary for ETL > Query using commercial BI tools or dashboards > Reliable ODBC/JDBC connectivity > Query across multiple data sources such as Hive, HBase, Cassandra, or even commercial DBs > Plugin mechanism > Integrate batch analysis + visualization into a single data analysis platform
  29. 29. Presto’s deployment > Facebook > Multiple geographical regions > scaled to 1,000 nodes > actively used by 1,000+ employees > processing 1PB/day > Netflix, Dropbox, Treasure Data, Airbnb, Qubole, LINE, GREE, Scaleout, etc > Presto as a Service > Treasure Data, Qubole
  30. 30. Distributed architecture
  31. 31. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service
  32. 32. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 1. Client sends a query using HTTP
  33. 33. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 2. Coordinator builds a query plan Connector plugin provides metadata (table schema, etc.)
  34. 34. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 3. Coordinator sends tasks to workers
  35. 35. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 4. Workers read data through connector plugin
  36. 36. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 5. Workers run tasks in memory and in parallel
  37. 37. Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service Client 6. Client gets the result from a worker
  38. 38. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service
  39. 39. What are Connectors? > Access to storage and metadata > provide table schemas to coordinators > provide table rows to workers > Connectors are pluggable into Presto > written in Java > Implementations: > Hive connector > Cassandra connector > MySQL connector through JDBC (prerelease) > or your own connector
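The connector contract above can be sketched conceptually in Python. This is only an illustration of the idea (schema to the coordinator, rows to the workers); Presto's real connector SPI is in Java, and the class and method names here are hypothetical.

```python
# Conceptual sketch of the connector contract (Presto's real SPI is Java).
# A connector hands table schemas to the coordinator for planning,
# and table rows to workers for scanning.

class DictConnector:
    """Hypothetical connector backed by in-memory dictionaries."""

    def __init__(self, tables):
        # name -> {"schema": [(column, type), ...], "rows": [tuple, ...]}
        self._tables = tables

    def get_table_schema(self, name):   # used by the coordinator
        return self._tables[name]["schema"]

    def get_table_rows(self, name):     # used by workers
        yield from self._tables[name]["rows"]

conn = DictConnector({
    "impressions": {"schema": [("name", "varchar"), ("time", "bigint")],
                    "rows": [("alice", 1), ("bob", 2)]},
})
print(conn.get_table_schema("impressions"))
print(list(conn.get_table_rows("impressions")))
```

The same two-method shape is what lets one query span several connectors: the planner only needs schemas, and each worker only needs a row stream.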
  40. 40. Hive connector Client Coordinator Hive Connector Worker Worker Worker HDFS, Hive Metastore Discovery Service find servers in a cluster
  41. 41. Cassandra connector Client Coordinator Cassandra Connector Worker Worker Worker Cassandra Discovery Service find servers in a cluster
  42. 42. Client Coordinator other connectors ... Worker Worker Worker Cassandra Discovery Service find servers in a cluster Hive Connector HDFS / Metastore Multiple connectors in a query Cassandra Connector Other data sources...
  43. 43. Distributed architecture > 3 types of servers: > coordinator, worker, discovery service > Gets data/metadata through connector plugins > Presto is NOT a database > Presto provides SQL on top of existing data stores > Client protocol is HTTP + JSON > Language bindings: Ruby, Python, PHP, Java (JDBC), R, Node.js...
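The HTTP + JSON client protocol can be sketched as follows: a client submits SQL to the coordinator, then follows `nextUri` links in each JSON response, accumulating `data` rows until no more pages remain. The page fields (`nextUri`, `columns`, `data`) follow Presto's statement responses, but `fetch_page` here is a stand-in for a real HTTP GET, and the URIs are invented for the example.

```python
# Minimal sketch of following Presto's paginated JSON responses.
# `fetch_page` is a hypothetical stand-in for an HTTP GET on nextUri.

def collect_results(first_page, fetch_page):
    """Follow nextUri links and gather all rows of a query result."""
    columns, rows = None, []
    page = first_page
    while page is not None:
        if columns is None and "columns" in page:
            columns = [c["name"] for c in page["columns"]]
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        page = fetch_page(next_uri) if next_uri else None
    return columns, rows

# Simulated server responses (what a GET on each nextUri would return).
pages = {
    "/v1/statement/q1/1": {"columns": [{"name": "name"}, {"name": "c"}],
                           "data": [["alice", 3]],
                           "nextUri": "/v1/statement/q1/2"},
    "/v1/statement/q1/2": {"data": [["bob", 5]]},  # last page: no nextUri
}
first = {"nextUri": "/v1/statement/q1/1"}
cols, rows = collect_results(first, pages.get)
print(cols, rows)
```

Because the protocol is plain HTTP + JSON, bindings like the Ruby, Python, and R clients mentioned above are thin wrappers around exactly this loop.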
  44. 44. Query Execution
  45. 45. Presto’s execution model > Presto is NOT MapReduce > it uses its own execution engine > Presto’s query plan is based on a DAG > more like Apache Tez / Spark or traditional MPP databases > Impala and Drill use a similar model
  46. 46. How a query runs > Coordinator > SQL parser > query planner > execution planner > Workers > task execution scheduler
  47. 47. SQL → SQL Parser → AST → Logical Planner → Logical Query Plan → Optimizer → Distributed Planner → Distributed Query Plan → Execution Planner → Execution Plan (the planner gets ✓ table schema from the Connector, and the Execution Planner gets ✓ the node list from the Discovery Service via NodeManager)
  48. 48. The same pipeline, highlighting the Query Planner (today’s talk): SQL → SQL Parser → AST → Query Planner → Distributed Query Plan → Execution Planner → Execution Plan
  49. 49. Query Planner. SQL: SELECT name, count(*) AS c FROM impressions GROUP BY name. Table schema: impressions (name varchar, time bigint). Logical query plan: Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c). Distributed query plan: Table scan → Partial aggr → Sink → Exchange → Final aggr → Sink → Exchange → Output
  50. 50. Query Planner - Stages. Stage-2: Table scan → Partial aggr → Sink (pipelined aggregation). Stage-1: Exchange → Final aggr → Sink. Stage-0: Exchange → Output. Inter-worker data transfer happens at each Exchange.
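The partial/final aggregation split in this distributed plan can be illustrated with a small sketch: each worker computes a partial aggregate over its own rows (Stage-2), the partials are exchanged and merged (Stage-1), and the merged counts are output (Stage-0). This shows the plan shape only, not Presto's implementation; the worker row sets are invented.

```python
from collections import Counter

# Sketch of the distributed plan for:
#   SELECT name, count(*) AS c FROM impressions GROUP BY name

def partial_aggr(rows):
    # Stage-2: each worker counts names over its own splits.
    return Counter(name for (name, _time) in rows)

def final_aggr(partials):
    # Stage-1: partial aggregates arrive via Exchange and are merged.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

worker1_rows = [("alice", 1), ("bob", 2), ("alice", 3)]
worker2_rows = [("alice", 4), ("carol", 5)]

partials = [partial_aggr(worker1_rows), partial_aggr(worker2_rows)]
result = final_aggr(partials)   # Stage-0: output the merged counts
print(result)
```

Splitting the aggregation this way is what lets the scan stage run on every worker in parallel while only the (much smaller) partial counts cross the network.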
  51. 51. Execution Planner + node list (✓ 2 workers): each stage is instantiated on the workers; Worker 1 and Worker 2 each run Table scan → Partial aggr → Sink and Exchange → Final aggr → Sink, feeding the top-level Exchange → Output.
  52. 52. Execution Planner - Tasks: 1 task / worker / stage (✓ all tasks run in parallel); Worker 1 and Worker 2 each run a Table scan → Partial aggr → Sink task and an Exchange → Final aggr → Sink task, feeding the final Exchange → Output.
  53. 53. Execution Planner - Splits: 1 split / task = 1 thread / worker. A table-scan task gets many splits, so many splits / task = many threads / worker. Downstream tasks (e.g. Final aggr) get 1 split / worker = 1 thread / worker.
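The split-to-thread mapping can be sketched like this: the scan task's many splits fan out across a thread pool on one worker, while the downstream task consumes the partial results as a single unit. The split contents and the `scan` function are invented stand-ins for reading real table data.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: a table-scan task receives many splits, so one worker runs
# many scan threads in parallel; the downstream task gets one split.
splits = [range(0, 5), range(5, 10), range(10, 15)]  # hypothetical scan splits

def scan(split):
    # Stand-in for reading the rows of one split.
    return sum(split)

with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    partials = list(pool.map(scan, splits))   # many splits = many threads

total = sum(partials)   # the single-split downstream task
print(partials, total)
```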
  54. 54. MapReduce vs. Presto. MapReduce: map → disk → reduce → disk → map → disk → reduce; waits between stages and writes data to disk. Presto: all stages are pipelined tasks (✓ no wait time ✓ no fault-tolerance) with memory-to-memory data transfer (✓ no disk IO ✓ data chunk must fit in memory).
  55. 55. Query Execution > SQL is converted into stages, tasks and splits > All tasks run in parallel > No wait time between stages (pipelined) > If one task fails, all tasks fail at once (query fails) > Memory-to-memory data transfer > No disk IO > If aggregated data doesn’t fit in memory, query fails •Note: query dies but worker doesn’t die. Memory consumption of all queries is fully managed
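The pipelined, memory-to-memory execution contrasted with MapReduce's staged disk writes can be sketched with generators: each downstream stage pulls rows as the upstream produces them, so nothing waits for a whole stage to finish and nothing is spilled to disk. Aggregation must hold its state in memory, which mirrors why a query fails when aggregated data does not fit. A toy illustration, not Presto's engine:

```python
# Pipelined stages as generators: rows flow memory-to-memory.

def table_scan():
    for value in [1, 2, 3, 4, 5, 6]:
        yield value

def filter_stage(rows):
    for value in rows:
        if value % 2 == 0:    # streaming: emits each row immediately
            yield value

def aggregate(rows):
    return sum(rows)          # keeps only running state in memory

result = aggregate(filter_stage(table_scan()))
print(result)
```

Note the trade-off stated above: because intermediate results exist only in memory, a failed task cannot be replayed from disk, so the whole query fails at once.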
  56. 56. Why choose Presto? > The ease of operations > easy to deploy: just drop in a jar > easy to extend its functionality • pluggable, DI-based loose coupling > doesn’t crash when a query fails > Standard SQL syntax > important for existing DB/DWH users > HiveQL is for MapReduce, not an MPP DB
  57. 57. Our customer use cases. Online Ad: Hive for scheduled reporting for customers (once every hour); Presto for checking ad-network performance and optimizing delivery logic in realtime. Web/Social: Hive for scheduled reporting for management and computing KPIs; Presto for aggregation for user support and measuring the effect of user campaigns. Retail: Hive for scheduled reporting on website, PoS, and touch panel data (hard deadlines!); Presto for ad-hoc basket-analysis queries and aggregating data for product development.
  58. 58. Conclusion
  59. 59. Batch summary > MapReduce-based Hive is still the default choice > stable, with lots of shared experience and knowledge > Hive with Tez is for Hadoop users > no code change needed > HDP includes Tez by default > Spark and Spark SQL are a good alternative > can’t reuse Hadoop knowledge > mainly for in-memory processing for now
  60. 60. Short batch summary > Presto is a good default choice > easy to manage and has useful features > Need faster queries? Try Impala > for HDFS and HBase > CDH includes Impala by default > If you are a challenger, check out Drill > the project’s goal is ambitious > its status is developer preview
  61. 61. Stream summary > Fluentd and Norikra > Fluentd is for robust log collection > Norikra is for SQL-based CEP > StreamSQL > for Spark users > current status is PoC
  62. 62. Lastly… > Use different engines for different requirements > Hadoop/Spark for batch jobs > MapReduce won’t die for the time being > an MPP query engine for interactive queries > These engines may be integrated into one system in the future > batch engines now use DAG pipelines > short-batch engines will support task recovery > the differences will become minimal
  63. 63. Enjoy SQL!
  64. 64. Cloud service for the entire data pipeline, including Presto Check: treasuredata.com
