​Presto SQL Engine: what’s new?
​Strata Hadoop 2016 San Jose, CA

Presto Strata Hadoop SJ 2016 short talk

A short talk on Presto at Strata Hadoop 2016, San Jose, CA.

Published in: Technology

  1. Presto SQL Engine: what’s new? Strata Hadoop 2016, San Jose, CA
  2. What is Presto?
     • 100% open source distributed SQL query engine
     • Originally developed by Facebook
     • Key differentiators:
       – Performance & scale
       – Cross-platform query capability, not only SQL on Hadoop
       – Apache licensed, hosted on GitHub
       – Certified distro & support from Teradata
  3. Brief history of Presto
     • Fall 2008: Facebook open sources Hive
     • Fall 2012: 6 developers start Presto development
     • Spring 2013: Presto rolled out within Facebook
     • Fall 2013: Facebook open sources Presto
     • Fall 2014: 88 releases, 41 contributors, 3943 commits
     • Spring 2016: 141 releases, 116 contributors, 6879 commits
  4. Presto in Production
     • Facebook
       – Multiple production clusters (100s of nodes total)
         - Massive 300 PB Hadoop data warehouse
         - Very large sharded MySQL installation
         - Growing usage of Raptor SSD-based storage
       – 1000s of internal daily active users
       – 10s to 100s of concurrent queries
     • Netflix
       – Over 200-node production cluster on EC2
       – Over 25 PB in S3 (Parquet format)
       – Over 350 active users and 3K queries daily
  5. Presto Architecture (diagram)
     • Client
     • Coordinator: Parser/analyzer, Planner, Scheduler
     • Workers
     • Pluggable: Metadata API, Data location API, Data stream API
  6. Presto Extensibility – connectors (diagram)
     • Coordinator: Parser/analyzer, Planner, Scheduler
     • Worker
     • Metadata API: HDFS/S3, NoSQL, DBMS, Custom, …
     • Data location API: HDFS/S3, NoSQL, DBMS, Custom, …
     • Data stream API: HDFS/S3, NoSQL, DBMS, Custom, …
  7. Supported data sources & file formats
     • Hadoop/Hive connector & file formats:
       – HDFS & S3 + HCatalog
       – ORC, RCFile, Parquet, SequenceFile, Text
     • Open source data stores:
       – MySQL & PostgreSQL (non-parallel)
       – Cassandra
       – Kafka
       – Redis
     • In development by community:
       – MongoDB
       – ElasticSearch
       – HBase
  8. Presto – Query Execution Performance
     • In-memory processing
     • Pipelined execution across nodes, MPP-style
     • Vectorized columnar processing
     • Multithreaded execution keeps all CPU cores busy
     • Presto is written in highly tuned Java
       – Efficient flat-memory data structures (minimizes GC)
       – Very careful coding of inner loops
       – Runtime bytecode generation
     • Optimized ORC & Parquet readers
     • Excellent performance with interactive SQL analytics
  9. ANSI SQL Support
       [ WITH with_query [, ...] ]
       SELECT [ ALL | DISTINCT ] select_expr [, ...]
       [ FROM table1 [ [ INNER | OUTER ] JOIN table2 ON (…) ] ]
       [ WHERE condition ]
       [ GROUP BY expression [, ...] ]
       [ HAVING condition ]
       [ UNION [ ALL | DISTINCT ] select ]
       [ ORDER BY expression [ ASC | DESC ] [, ...] ]
       [ LIMIT [ count | ALL ] ]
     In addition:
     • Windowing functions
     • Statistical and approximate aggregate functions
     • UNNEST, TABLESAMPLE
     In development:
     • Complex subqueries
     • EXISTS, INTERSECT, EXCEPT
     • ROLLUP, CUBE
  10. Deployment models
     • Cluster deployment models for Presto:
       – on premise (appliance or commodity clusters)
       – VM (OpenStack, etc.)
       – cloud (Amazon, etc.)
     • Types of Hadoop deployments:
       – on Hadoop/YARN cluster (all or subset of nodes)
       – on a dedicated cluster
       – mixed
  11. Open source initiative
     • Announced in June 2015 at Hadoop Summit
       – Growing interest and adoption
     • Collaboration with Facebook and Presto community
       – Joint development, conference talks, meetups and webinars
     • Major commitment from Teradata Labs:
       – 20 full-time engineers
       – Free and open source contributions
       – Enterprise-ready distribution
     "A special shout out goes to Teradata, which joined the Presto community this year with a focus on enhancing enterprise features and providing support, for having seven of our top 10 external contributors." (Facebook)
  12. Teradata Contributions to Presto
     • Phase 1: Implement (June 8, 2015)
     • Phase 2: Integrate (Q4 2015)
     • Phase 3: Proliferate (2016)
     Contributions listed across the phases, in slide order:
     • Installer
     • Documentation
     • Monitoring & Support Tools
     • Management Tool Integration
     • YARN Integration
     • ODBC Driver
     • JDBC Driver
     • BI Certification
     • Security
     • Cloud features
     Also listed: Commercial Support; Expanding ANSI SQL Coverage
  13. Recent developments and roadmap
     • Q1 release:
       – Fully-featured ODBC & JDBC drivers
       – Kerberos support
       – DECIMAL support
     • Later 2016:
       – BI tools certification
       – TPC-H and TPC-DS unmodified
       – Spill to disk
  14. BI Tools certifications
  15. Teradata QueryGrid™ - Multi-System Analytics (diagram)
     • Sections: Entry Points; Targets
     • Systems shown: Teradata Database, Aster Analytics, Presto, Hadoop, Hive / HDFS, other databases, NoSQL databases
     • Categories: Integrated Data Warehouses; Multi-Genre Advanced Analytics™; Multiple Hadoop SQL Query Engines and Distributions; 3rd Party Relational DBs; Non-Relational DBs
     • Presto connectors (legend: Teradata Certified, Community Supported): Apache Kafka, Apache Cassandra, MySQL, PostgreSQL, Redis, Presto API
  16. More information
     • Certified Distro: www.teradata.com/presto
     • Website: www.prestodb.io
     • Presto Users Group: www.groups.google.com/group/presto-users
     • GitHub:
       – www.github.com/prestodb/presto
       – www.github.com/Teradata/presto
       – www.github.com/prestodb
  17. www.teradata.com/presto
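
Slides 5 and 6 name three pluggable APIs (metadata, data location, data stream) that sit between the coordinator, the workers, and each data source. The following is a minimal sketch of that separation in Python, not Presto's actual connector interface (which is Java). Every name here (`Split`, `Connector`, `InMemoryConnector`, `schedule`, `run_scan`) is hypothetical, and the round-robin assignment is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Tuple[object, ...]


@dataclass(frozen=True)
class Split:
    """A chunk of a table that one worker can read on its own (hypothetical)."""
    table: str
    part: int


class Connector(Protocol):
    """The three pluggable APIs from the slides, reduced to one method each."""

    def columns(self, table: str) -> List[str]: ...     # metadata API
    def splits(self, table: str) -> List[Split]: ...    # data location API
    def read(self, split: Split) -> Iterator[Row]: ...  # data stream API


class InMemoryConnector:
    """Toy connector over Python lists, standing in for HDFS/S3, NoSQL, or a DBMS."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[List[Row]]]]):
        # table name -> (column names, list of partitions, each a list of rows)
        self._tables = tables

    def columns(self, table: str) -> List[str]:
        return self._tables[table][0]

    def splits(self, table: str) -> List[Split]:
        return [Split(table, i) for i in range(len(self._tables[table][1]))]

    def read(self, split: Split) -> Iterator[Row]:
        return iter(self._tables[split.table][1][split.part])


def schedule(splits: List[Split], workers: List[str]) -> Dict[str, List[Split]]:
    """Coordinator side: assign splits to workers round-robin (an assumption)."""
    assignment: Dict[str, List[Split]] = {w: [] for w in workers}
    for i, split in enumerate(splits):
        assignment[workers[i % len(workers)]].append(split)
    return assignment


def run_scan(connector: Connector, table: str, workers: List[str]) -> List[Row]:
    """Ask the connector where data lives, then let each worker stream its splits."""
    assignment = schedule(connector.splits(table), workers)
    rows: List[Row] = []
    for worker in workers:  # sequential here; real workers would run in parallel
        for split in assignment[worker]:
            rows.extend(connector.read(split))
    return rows
```

Under this sketch, supporting a new data source means writing another class with the same three methods; the scheduling and scan code does not change.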
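
Slide 8 lists pipelined execution and vectorized columnar processing over flat-memory data structures. The sketch below illustrates the general idea only, assuming a two-column table of prices and quantities: operators pass blocks of column values to each other instead of individual row objects. It is plain Python using the standard `array` module, so it shows the data layout and the pipeline shape, not Presto's Java implementation or its bytecode generation. The function names and the `block_size` parameter are hypothetical.

```python
from array import array
from typing import Iterable, Iterator, List, Tuple

# One block holds a batch of rows as two flat, typed columns.
Block = Tuple[array, array]  # (price column, quantity column)


def scan(prices: List[float], quantities: List[int], block_size: int) -> Iterator[Block]:
    """Source operator: emit the table as fixed-size columnar blocks."""
    for start in range(0, len(prices), block_size):
        end = start + block_size
        yield array("d", prices[start:end]), array("q", quantities[start:end])


def filter_blocks(blocks: Iterable[Block], min_quantity: int) -> Iterator[Block]:
    """Filter operator: compute the surviving positions once per block."""
    for price, qty in blocks:
        keep = [i for i, q in enumerate(qty) if q >= min_quantity]
        yield (array("d", (price[i] for i in keep)),
               array("q", (qty[i] for i in keep)))


def total_revenue(blocks: Iterable[Block]) -> float:
    """Aggregate operator: consume blocks as they arrive from upstream."""
    total = 0.0
    for price, qty in blocks:
        total += sum(p * q for p, q in zip(price, qty))
    return total


def revenue_for_large_orders(prices: List[float], quantities: List[int],
                             min_quantity: int, block_size: int = 1024) -> float:
    """Chain the operators; generators keep only one block in flight per stage."""
    return total_revenue(filter_blocks(scan(prices, quantities, block_size), min_quantity))
```

Because each stage is a generator, the aggregate starts consuming blocks before the scan has finished, which is the pipelining the slide refers to, here within a single process rather than across nodes.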
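
Slide 9 gives the SELECT grammar as a template. As a concrete instance of several of its clauses (WITH, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT), here is a small query run through Python's built-in `sqlite3` module. SQLite is only a stand-in so the example is runnable without a cluster; it says nothing about Presto's behavior or performance. The `orders` table, its columns, and the `top_customers` helper are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

# Exercises WITH, WHERE, GROUP BY, HAVING, ORDER BY and LIMIT from the slide's grammar.
QUERY = """
WITH big_orders AS (
    SELECT customer, amount
    FROM orders
    WHERE amount >= ?
)
SELECT customer, SUM(amount) AS total
FROM big_orders
GROUP BY customer
HAVING SUM(amount) > ?
ORDER BY total DESC
LIMIT ?
"""


def top_customers(orders: List[Tuple[str, int]], min_amount: int,
                  min_total: int, limit: int) -> List[Tuple[str, int]]:
    """Load rows into an in-memory SQLite table (hypothetical schema) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        return conn.execute(QUERY, (min_amount, min_total, limit)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("a", 5), ("a", 15), ("a", 20), ("b", 30),
              ("c", 12), ("d", 11), ("d", 40)]
    for customer, total in top_customers(sample, min_amount=10, min_total=20, limit=2):
        print(customer, total)
```

The same statement shape fits the slide's template; sending it to Presto would need a Presto client or the ODBC/JDBC drivers mentioned on slide 13, which are not shown here.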
