Presto @ Facebook: Past, Present and Future

Published in: Technology, Education

  1. Presto: Past, Present and Future
     Martin Traverso, June 5, 2014
  2. Why build Presto?
  3. “A good day is when I can run 6 Hive queries” (a Facebook data scientist)
  4. What is Presto?
     • Distributed SQL analytics engine
     • Optimized for low-latency, interactive analysis
     • ANSI SQL
     • Extensible
  5. Architecture
  6. Architecture (diagram labels)
     • Client
     • Coordinator: Parser/Analyzer, Planner, Scheduler
     • Metadata API
     • Data Location API
     • Worker (three shown)
     • Data Stream API
  7. Connectors (diagram labels)
     • Coordinator: Parser/Analyzer, Planner, Scheduler
     • Worker
     • Metadata API: Cassandra, Internal, MySQL, JMX, Hive
     • Data Location API: Cassandra, Internal, MySQL, JMX, Hive
     • Data Stream API: Cassandra, Internal, MySQL, JMX, Hive
  8. Connectors
     • Hadoop 1.x
     • Hadoop 2.x
     • CDH 4
     • CDH 5
     • Custom S3 integration for Hadoop
     • Cassandra
     • TPC-H
  9. Other extension points
     • Types
     • Functions
     • Operators
  10. What makes Presto fast?
     • Data in memory during execution
     • Pipelining and streaming
     • Very careful coding of inner loops
     • Efficient flat-memory data structures
     • Bytecode generation
  11. What’s next?
  12. More SQL features
     • Structs, Maps and Lists
     • Views
     • Scalar subqueries
     • Features required to run all TPC-DS
  13. Execution engine
     • Huge joins and aggregations
        • Hash distributed
        • Co-distributed and co-partitioned
        • Spill to disk (flash)
     • Work stealing
     • Basic task recovery
  14. ODBC driver
     • Targeting major BI tools
        • Tableau, MicroStrategy and Excel
     • Support for Windows, Mac and Linux
     • Entirely open source (ASL2)
  15. Native store
     • Stores data directly on worker nodes
     • Custom data format
     • Initial use cases
        • ‘Hot’ data
        • ‘Live’ data
  16. Open source
     • Apache License 2.0
     • Open development
     • Releases every 1-2 weeks
     • External contributions welcome!
  17. Presto
     • http://prestodb.io
     • github.com/facebook/presto
     • Martin Traverso, @mtraverso, github.com/martint
  18. Bytecode generation

     Table T:
        k varchar,
        a bigint,
        b bigint,
        c bigint,
        d bigint,
        e bigint

     Query:
        SELECT
            k AS c1,
            (a * b) / 10 AS c2
        FROM T
        WHERE
            c BETWEEN 100 AND 200
            AND d < e

     Equivalent loop:
        while (in.advanceNextPosition()) {
            if (in.getLong(3) >= 100 &&
                    in.getLong(3) <= 200 &&
                    in.getLong(4) < in.getLong(5)) {

                out.advance();
                in.appendStringTo(0, out);
                out.appendLong(in.getLong(1) * in.getLong(2) / 10);
            }
        }
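
The loop on slide 18 can be tried out with a small stand-in for the input cursor. The following is only a sketch under stated assumptions: `RowCursor`, `FilterProjectSketch`, `getString` and the list-of-strings output are hypothetical names invented for illustration, not Presto's API, and the loop is hand-written Java rather than generated bytecode. Field positions follow the column order of T on the slide (0 = k, 1 = a, 2 = b, 3 = c, 4 = d, 5 = e).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slide's input cursor: rows held in memory.
final class RowCursor {
    private final Object[][] rows;
    private int position = -1;

    RowCursor(Object[][] rows) {
        this.rows = rows;
    }

    // Moves to the next row; returns false once the input is exhausted.
    boolean advanceNextPosition() {
        return ++position < rows.length;
    }

    long getLong(int field) {
        return (Long) rows[position][field];
    }

    String getString(int field) {
        return (String) rows[position][field];
    }
}

final class FilterProjectSketch {
    // Hand-written equivalent of the slide's loop for:
    //   SELECT k AS c1, (a * b) / 10 AS c2 FROM T
    //   WHERE c BETWEEN 100 AND 200 AND d < e
    // Output rows are rendered as "c1,c2" strings (an assumption for brevity).
    static List<String> run(RowCursor in) {
        List<String> out = new ArrayList<>();
        while (in.advanceNextPosition()) {
            long c = in.getLong(3);
            if (c >= 100 && c <= 200 && in.getLong(4) < in.getLong(5)) {
                long c2 = in.getLong(1) * in.getLong(2) / 10;
                out.add(in.getString(0) + "," + c2);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Object[][] rows = {
            {"x", 6L, 5L, 150L, 1L, 2L}, // passes both predicates
            {"y", 6L, 5L, 250L, 1L, 2L}, // c is outside [100, 200]
            {"z", 6L, 5L, 100L, 3L, 2L}, // d is not less than e
        };
        System.out.println(run(new RowCursor(rows))); // prints [x,3]
    }
}
```

The filter and the projection are fused into a single pass with column positions and constants written directly into the loop body, which is the shape of code the slide presents for this query.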
