Presto @ Facebook: Past, Present and Future

  1. Presto: Past, Present and Future. Martin Traverso, June 5, 2014
  2. Why build Presto?
  3. “A good day is when I can run 6 Hive queries” (a Facebook data scientist)
  4. What is Presto?
     • Distributed SQL analytics engine
     • Optimized for low-latency, interactive analysis
     • ANSI SQL
     • Extensible
  5. Architecture
  6. Architecture (diagram)
     • Client
     • Coordinator: Parser/Analyzer, Planner, Scheduler
     • Metadata API
     • Data Location API
     • Workers (three shown), each with a Data Stream API
  7. Connectors (diagram)
     • Coordinator: Parser/Analyzer, Planner, Scheduler
     • Worker
     • Metadata API: Cassandra, Internal, MySQL, JMX, Hive
     • Data Location API: Cassandra, Internal, MySQL, JMX, Hive
     • Data Stream API: Cassandra, Internal, MySQL, JMX, Hive
  8. Connectors
     • Hadoop 1.x
     • Hadoop 2.x
     • CDH 4
     • CDH 5
     • Custom S3 integration for Hadoop
     • Cassandra
     • TPC-H
  9. Other extension points
     • Types
     • Functions
     • Operators
  10. What makes Presto fast?
     • Data in memory during execution
     • Pipelining and streaming
     • Very careful coding of inner loops
     • Efficient flat-memory data structures
     • Bytecode generation
  11. What’s next?
  12. More SQL features
     • Structs, Maps and Lists
     • Views
     • Scalar subqueries
     • Features required to run all TPC-DS
  13. Execution engine
     • Huge joins and aggregations
        • Hash distributed
        • Co-distributed and co-partitioned
        • Spill to disk (flash)
     • Work stealing
     • Basic task recovery
  14. ODBC driver
     • Targeting major BI tools
        • Tableau, MicroStrategy and Excel
     • Support for Windows, Mac and Linux
     • Entirely open source (ASL2)
  15. Native store
     • Stores data directly on worker nodes
     • Custom data format
     • Initial use cases
        • ‘Hot’ data
        • ‘Live’ data
  16. Open source
     • Apache License 2.0
     • Open development
     • Releases every 1-2 weeks
     • External contributions welcome!
  17. Presto
     • http://prestodb.io
     • github.com/facebook/presto
     • Martin Traverso, @mtraverso, github.com/martint
  18. Bytecode generation

      while (in.advanceNextPosition()) {
          if (in.getLong(3) >= 100 &&
                  in.getLong(3) <= 200 &&
                  in.getLong(4) < in.getLong(5)) {

              out.advance();
              in.appendStringTo(0, out);
              out.appendLong(in.getLong(1) * in.getLong(2) / 10);
          }
      }

      SELECT
          k AS c1,
          (a * b) / 10 AS c2
      FROM T
      WHERE
          c BETWEEN 100 AND 200
          AND d < e

      T:
          k varchar,
          a bigint,
          b bigint,
          c bigint,
          d bigint,
          e bigint
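
The bytecode generation slide pairs a query with the tight loop that could be produced for it. As a rough illustration of why that helps, the sketch below runs the slide's filter and projection two ways: once through composed predicate and projection objects invoked per row, and once through a loop written for this one query. This is not Presto's code and it emits no bytecode; the class `BytecodeGenSketch`, the `Row` type, and the sample data are hypothetical, and the hand-written `specialized` method only stands in for what a code generator might emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

public class BytecodeGenSketch {
    /** Hypothetical row of table T from the slide: k varchar, a..e bigint. */
    static final class Row {
        final String k;
        final long a, b, c, d, e;

        Row(String k, long a, long b, long c, long d, long e) {
            this.k = k; this.a = a; this.b = b; this.c = c; this.d = d; this.e = e;
        }
    }

    /** Generic path: filter and projection are composed objects called for every row. */
    static List<String> interpreted(List<Row> rows) {
        Predicate<Row> between = r -> r.c >= 100 && r.c <= 200; // c BETWEEN 100 AND 200
        Predicate<Row> less = r -> r.d < r.e;                   // d < e
        Predicate<Row> filter = between.and(less);
        ToLongFunction<Row> project = r -> r.a * r.b / 10;      // (a * b) / 10
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            if (filter.test(r)) {
                out.add(r.k + "," + project.applyAsLong(r));
            }
        }
        return out;
    }

    /** Specialized path: one flat loop with the whole expression inlined. */
    static List<String> specialized(List<Row> rows) {
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.c >= 100 && r.c <= 200 && r.d < r.e) {
                out.add(r.k + "," + (r.a * r.b / 10));
            }
        }
        return out;
    }

    /** Made-up sample data covering passing rows and both failing predicates. */
    static List<Row> sample() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("x", 4, 5, 150, 1, 2));  // passes both predicates
        rows.add(new Row("y", 4, 5, 250, 1, 2));  // c is outside 100..200
        rows.add(new Row("z", 30, 7, 100, 3, 3)); // d < e is false
        rows.add(new Row("w", 30, 7, 200, 0, 9)); // passes; upper bound of BETWEEN
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(interpreted(sample()));
        System.out.println(specialized(sample()));
    }
}
```

Both methods return the same rows; the difference is that the specialized loop has no per-row indirection through generic expression objects, which is the kind of overhead the slide's generated loop avoids.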
