Presto @ Facebook: Past, Present and Future
Published in: Technology, Education
Transcript

  • 1. Presto: Past, Present and Future. Martin Traverso, June 5, 2014
  • 2. Why build Presto?
  • 3. “A good day is when I can run 6 Hive queries” — a Facebook data scientist
  • 4. What is Presto? Distributed SQL analytics engine; optimized for low-latency, interactive analysis; ANSI SQL; extensible
  • 5. Architecture
  • 6. Architecture (diagram). Components shown: Client; Coordinator (Parser/Analyzer, Planner, Scheduler); Metadata API; Data Location API; Workers, each with a Data Stream API
  • 7. Connectors (diagram). Components shown: Coordinator (Parser/Analyzer, Planner, Scheduler) and Worker; the Metadata API, Data Location API and Data Stream API, each listed with Hive, Cassandra, MySQL, JMX and Internal
  • 8. Connectors: Hadoop 1.x; Hadoop 2.x; CDH 4; CDH 5; custom S3 integration for Hadoop; Cassandra; TPC-H
  • 9. Other extension points: types; functions; operators
  • 10. What makes Presto fast? Data in memory during execution; pipelining and streaming; very careful coding of inner loops; efficient flat-memory data structures; bytecode generation
  • 11. What’s next?
  • 12. More SQL features: structs, maps and lists; views; scalar subqueries; features required to run all TPC-DS
  • 13. Execution engine. Huge joins and aggregations: hash distributed, co-distributed and co-partitioned, spill to disk (flash). Work stealing. Basic task recovery
  • 14. ODBC driver. Targeting major BI tools: Tableau, MicroStrategy and Excel. Support for Windows, Mac and Linux. Entirely open source (ASL2)
  • 15. Native store. Stores data directly on worker nodes. Custom data format. Initial use cases: ‘hot’ data, ‘live’ data
  • 16. Open source: Apache License 2.0; open development; releases every 1-2 weeks; external contributions welcome!
  • 17. Presto: http://prestodb.io, github.com/facebook/presto. Martin Traverso: @mtraverso, github.com/martint
  • 18. Bytecode generation

      Table T: k varchar, a bigint, b bigint, c bigint, d bigint, e bigint

      Query:

          SELECT
            k AS c1,
            (a * b) / 10 AS c2
          FROM T
          WHERE
            c BETWEEN 100 AND 200
            AND d < e

      Loop:

          while (in.advanceNextPosition()) {
            if (in.getLong(3) >= 100 &&
                in.getLong(3) <= 200 &&
                in.getLong(4) < in.getLong(5)) {

              out.advance();
              in.appendStringTo(0, out);
              out.appendLong(in.getLong(1) * in.getLong(2) / 10);
            }
          }
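
Slide 18 pairs a query with a loop specialized for that query. As a rough illustration only, the same filter and projection can be written by hand over plain Java arrays. This is a sketch of what the loop computes, not Presto's generated bytecode or its API: the class name `SpecializedFilterProject`, the `run` method and the column-array layout are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slide's table T, stored column by column.
// None of these names come from Presto; they exist only for this sketch.
final class SpecializedFilterProject {

    // Evaluates
    //   SELECT k AS c1, (a * b) / 10 AS c2 FROM T
    //   WHERE c BETWEEN 100 AND 200 AND d < e
    // as one straight-line loop with the predicate and projection inlined,
    // which is the shape of the loop shown on the slide.
    static List<String> run(String[] k, long[] a, long[] b,
                            long[] c, long[] d, long[] e) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < k.length; i++) {
            // WHERE c BETWEEN 100 AND 200 AND d < e
            if (c[i] >= 100 && c[i] <= 200 && d[i] < e[i]) {
                // Output row rendered as "c1,c2" to keep the sketch small.
                out.add(k[i] + "," + (a[i] * b[i] / 10));
            }
        }
        return out;
    }
}
```

The point of the comparison is that nothing in the loop body is looked up at run time: column positions, comparison constants and the arithmetic are fixed in the code itself, which is what lets the JIT compiler optimize the inner loop.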

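
Slide 13 lists hash-distributed joins among the planned execution engine features. The general technique can be sketched as follows: rows from both sides are routed to a worker chosen by hashing the join key, so matching keys meet on the same worker, and each worker then runs an ordinary local hash join. This is a textbook illustration under assumed names (`HashDistributedJoinSketch`, `workerFor`, rows as `{key, value}` arrays), not a description of how Presto implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process simulation; "workers" are just list indexes.
final class HashDistributedJoinSketch {

    // Picks the worker for a join key; equal keys always map to the same worker.
    static int workerFor(long key, int workers) {
        return Math.floorMod(Long.hashCode(key), workers);
    }

    // Splits rows of the form {key, value} into one list per worker.
    static List<List<long[]>> distribute(List<long[]> rows, int workers) {
        List<List<long[]>> parts = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            parts.add(new ArrayList<>());
        }
        for (long[] row : rows) {
            parts.get(workerFor(row[0], workers)).add(row);
        }
        return parts;
    }

    // Local hash join on one worker: build a table from the left side,
    // probe it with the right side, and emit {key, leftValue, rightValue}.
    static List<long[]> localJoin(List<long[]> left, List<long[]> right) {
        Map<Long, List<long[]>> build = new HashMap<>();
        for (long[] l : left) {
            build.computeIfAbsent(l[0], key -> new ArrayList<>()).add(l);
        }
        List<long[]> out = new ArrayList<>();
        for (long[] r : right) {
            List<long[]> matches = build.getOrDefault(r[0], Collections.emptyList());
            for (long[] l : matches) {
                out.add(new long[] {l[0], l[1], r[1]});
            }
        }
        return out;
    }

    // Distributes both inputs by key hash, then joins each worker's share.
    static List<long[]> join(List<long[]> left, List<long[]> right, int workers) {
        List<List<long[]>> leftParts = distribute(left, workers);
        List<List<long[]>> rightParts = distribute(right, workers);
        List<long[]> out = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            out.addAll(localJoin(leftParts.get(w), rightParts.get(w)));
        }
        return out;
    }
}
```

Because each worker only holds the build rows for its own share of the keys, the memory needed per worker shrinks roughly in proportion to the number of workers, which is why this layout suits joins too large for a single node.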