Presto @ Facebook: Past, Present and Future

    Presentation Transcript

    • Presto: Past, Present and Future. Martin Traverso, June 5, 2014
    • Why build Presto?
    • “A good day is when I can run 6 Hive queries” — a Facebook data scientist
    • What is Presto? Distributed SQL analytics engine. Optimized for low-latency, interactive analysis. ANSI SQL. Extensible.
    • Architecture
    • Architecture (diagram labels): Client; Coordinator (Parser/Analyzer, Planner, Scheduler); Metadata API; Data Location API; three Workers; Data Stream API
    • Connectors (diagram labels): Coordinator (Parser/Analyzer, Planner, Scheduler); Worker; Metadata API, Data Location API and Data Stream API, each shown with the connectors Cassandra, Internal, MySQL, JMX and Hive
    • Connectors: Hadoop 1.x, Hadoop 2.x, CDH 4, CDH 5, custom S3 integration for Hadoop, Cassandra, TPC-H
    • Other extension points: types, functions, operators
    • What makes Presto fast? Data in memory during execution. Pipelining and streaming. Very careful coding of inner loops. Efficient flat-memory data structures. Bytecode generation.
    • What’s next?
    • More SQL features: structs, maps and lists; views; scalar subqueries; features required to run all TPC-DS
    • Execution engine. Huge joins and aggregations: hash distributed, co-distributed and co-partitioned, spill to disk (flash). Work stealing. Basic task recovery.
    • ODBC driver. Targeting major BI tools: Tableau, MicroStrategy and Excel. Support for Windows, Mac and Linux. Entirely open source (ASL2).
    • Native store. Stores data directly on worker nodes. Custom data format. Initial use cases: ‘hot’ data, ‘live’ data.
    • Open source. Apache License 2.0. Open development. Releases every 1-2 weeks. External contributions welcome!
    • Presto: http://prestodb.io, github.com/facebook/presto; Martin Traverso: @mtraverso, github.com/martint
    • Bytecode generation

        while (in.advanceNextPosition()) {
            if (in.getLong(3) >= 100 &&
                    in.getLong(3) <= 200 &&
                    in.getLong(4) < in.getLong(5)) {

                out.advance();
                in.appendStringTo(0, out);
                out.appendLong(in.getLong(1) * in.getLong(2) / 10);
            }
        }

        SELECT
            k AS c1,
            (a * b) / 10 AS c2
        FROM T
        WHERE
            c BETWEEN 100 AND 200
            AND d < e

        T:
            k varchar,
            a bigint,
            b bigint,
            c bigint,
            d bigint,
            e bigint
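
The loop on the bytecode generation slide can be run against a toy in-memory table to check that it does the same work as the SQL query. What follows is only a minimal sketch under stated assumptions: `RowCursor` and `RowOutput` are hypothetical stand-ins invented here for the slide's `in` and `out` objects, and `sampleRows` is made-up data. Only the method names `advanceNextPosition`, `getLong`, `appendStringTo`, `advance` and `appendLong` come from the slide. It does not show how Presto itself generates bytecode or lays out data in memory.

```java
import java.util.ArrayList;
import java.util.List;

public class BytecodeLoopSketch {
    // Hypothetical stand-in for the slide's `in`: a cursor over rows of table T,
    // with columns (k, a, b, c, d, e) at field indexes 0..5.
    static final class RowCursor {
        private final Object[][] rows;
        private int position = -1;

        RowCursor(Object[][] rows) { this.rows = rows; }

        boolean advanceNextPosition() { return ++position < rows.length; }

        long getLong(int field) { return (Long) rows[position][field]; }

        void appendStringTo(int field, RowOutput out) {
            out.appendString((String) rows[position][field]);
        }
    }

    // Hypothetical stand-in for the slide's `out`: collects output rows (c1, c2).
    static final class RowOutput {
        final List<List<Object>> rows = new ArrayList<>();

        void advance() { rows.add(new ArrayList<>()); }
        void appendString(String value) { rows.get(rows.size() - 1).add(value); }
        void appendLong(long value) { rows.get(rows.size() - 1).add(value); }
    }

    // The loop from the slide: the WHERE filter and the SELECT projection
    // are fused into a single pass over the input.
    static RowOutput run(RowCursor in) {
        RowOutput out = new RowOutput();
        while (in.advanceNextPosition()) {
            if (in.getLong(3) >= 100 &&
                    in.getLong(3) <= 200 &&
                    in.getLong(4) < in.getLong(5)) {
                out.advance();
                in.appendStringTo(0, out);
                out.appendLong(in.getLong(1) * in.getLong(2) / 10);
            }
        }
        return out;
    }

    // Made-up sample data for table T (k, a, b, c, d, e).
    static Object[][] sampleRows() {
        return new Object[][] {
            {"x", 4L, 5L, 150L, 1L, 2L},   // passes both predicates
            {"y", 4L, 5L, 250L, 1L, 2L},   // c is outside [100, 200]
            {"z", 6L, 5L, 100L, 3L, 3L},   // d < e is false
            {"w", 7L, 10L, 200L, 0L, 9L},  // passes; c sits on the upper bound
        };
    }

    public static void main(String[] args) {
        System.out.println(run(new RowCursor(sampleRows())).rows);
    }
}
```

With the sample rows above, only `x` and `w` satisfy `c BETWEEN 100 AND 200 AND d < e`, so the program prints `[[x, 2], [w, 7]]`, where the second value in each row is `(a * b) / 10`.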