1
Warsaw Hadoop User Group
Wojciech Biela
Łukasz Osipiuk
www.teradata.com/presto
2
➔ History of Teradata Center for Hadoop
◆ Formerly Hadapt, founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
◆ Pioneered SQL-on-Hadoop market
◆ Based on work done by database research group in Yale Computer Science Department
◆ Hybrid of Hadoop scalability and DBMS performance
➔ Today
◆ Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
◆ 20+ developers with deep Hadoop and database expertise
◆ Headquarters in Boston, MA
◆ Teams in US (MA, CA) and Poland (Warsaw)
◆ Contributors to open source project Presto
Who are we? - Teradata Center for Hadoop!
3
➔ What is Presto?
➔ What is Teradata doing?
➔ Can I see a Demo?
➔ How can I contribute?
Talk Agenda
4
➔ 100% open source distributed ANSI SQL engine for Big Data
◆ Modern code base
◆ Proven scalability
➔ Optimized for low-latency, interactive querying
◆ Cross platform query capability, not only SQL on Hadoop
◆ Distributed under the Apache license, now supported by Teradata
◆ Used by a community of well known, well respected technology companies
What is Presto?
5
History of Presto
➔ Fall 2008: Facebook open sources Hive
➔ Fall 2012: 6 developers start Presto development
➔ Spring 2013: Presto rolled out within Facebook
➔ Fall 2013: Facebook open sources Presto
➔ Fall 2014: 88 releases, 41 contributors, 3943 commits
➔ Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins Presto community & offers support
6
Query Execution
[Diagram: Client; Coordinator (Parser/analyzer, Planner, Scheduler); three Workers; pluggable Metadata API, Data location API, and Data stream API]
10
select
shipdate,
count(*) count,
cast(sum(extendedprice)
as bigint) price
from
h_lineitem
where
returnflag = 'R'
group by shipdate
order by count
limit 20
Logical and fragmented plan
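One way to picture a fragmented plan for the query above is as a two-stage aggregation: each worker filters and partially aggregates its own rows, and a final stage merges the partial results, sorts them, and applies the limit. The Java sketch below illustrates only that idea. All names in it (`FragmentedAggregationSketch`, `LineItem`, `Acc`, `partialAggregate`, `finalAggregate`) and the sample rows are hypothetical; they are not Presto's operators or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FragmentedAggregationSketch {
    /** Hypothetical row of h_lineitem with only the columns the query reads. */
    static final class LineItem {
        final String shipdate;
        final String returnflag;
        final double extendedprice;

        LineItem(String shipdate, String returnflag, double extendedprice) {
            this.shipdate = shipdate;
            this.returnflag = returnflag;
            this.extendedprice = extendedprice;
        }
    }

    /** Running state per shipdate: row count and price sum. */
    static final class Acc {
        long count;
        double price;
    }

    /** Worker-side fragment: filter returnflag = 'R', then aggregate locally. */
    static Map<String, Acc> partialAggregate(List<LineItem> split) {
        Map<String, Acc> groups = new HashMap<>();
        for (LineItem row : split) {
            if (!"R".equals(row.returnflag)) {
                continue;
            }
            Acc acc = groups.computeIfAbsent(row.shipdate, k -> new Acc());
            acc.count++;
            acc.price += row.extendedprice;
        }
        return groups;
    }

    /** Final fragment: merge partials, order by count, keep the first `limit` groups. */
    static List<String> finalAggregate(List<Map<String, Acc>> partials, int limit) {
        Map<String, Acc> merged = new HashMap<>();
        for (Map<String, Acc> partial : partials) {
            for (Map.Entry<String, Acc> e : partial.entrySet()) {
                Acc acc = merged.computeIfAbsent(e.getKey(), k -> new Acc());
                acc.count += e.getValue().count;
                acc.price += e.getValue().price;
            }
        }
        List<Map.Entry<String, Acc>> rows = new ArrayList<>(merged.entrySet());
        // Ascending by count, as in the query; shipdate breaks ties so output is deterministic.
        rows.sort((a, b) -> {
            int byCount = Long.compare(a.getValue().count, b.getValue().count);
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Acc> e : rows.subList(0, Math.min(limit, rows.size()))) {
            // cast(sum(extendedprice) as bigint) is approximated here with Math.round.
            out.add(e.getKey() + "|" + e.getValue().count + "|" + Math.round(e.getValue().price));
        }
        return out;
    }

    /** Runs the two fragments over two made-up splits. */
    static List<String> demo() {
        List<LineItem> split1 = Arrays.asList(
                new LineItem("1995-01-01", "R", 100.4),
                new LineItem("1995-01-01", "N", 50.0),
                new LineItem("1995-01-02", "R", 10.0));
        List<LineItem> split2 = Arrays.asList(
                new LineItem("1995-01-01", "R", 200.4),
                new LineItem("1995-01-03", "A", 5.0));
        List<Map<String, Acc>> partials = new ArrayList<>();
        partials.add(partialAggregate(split1));
        partials.add(partialAggregate(split2));
        return finalAggregate(partials, 20);
    }

    public static void main(String[] args) {
        for (String row : demo()) {
            System.out.println(row);
        }
    }
}
```

Because counts and sums can be merged, only one small map per worker has to cross the network, rather than every matching row.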
11
select
*
from
hive.default.h_nation,
psql.public.p_region
where
h_nation.regionkey = p_region.regionkey;
Logical and fragmented plan
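The query above joins a table from the `hive` catalog with a table from a second catalog named `psql`. A minimal sketch of what such a join amounts to, assuming each side has already been scanned into rows, is a hash join keyed on `regionkey`. The class, the two `scan...` methods, and the sample rows are made up for illustration and say nothing about how Presto actually plans or executes the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CrossCatalogJoinSketch {
    /** Stand-in for rows of hive.default.h_nation: {name, regionkey}. */
    static List<Object[]> scanNation() {
        return Arrays.asList(
                new Object[] {"GERMANY", 3L},
                new Object[] {"JAPAN", 2L},
                new Object[] {"FRANCE", 3L});
    }

    /** Stand-in for rows of psql.public.p_region: {regionkey, name}. */
    static List<Object[]> scanRegion() {
        return Arrays.asList(
                new Object[] {2L, "ASIA"},
                new Object[] {3L, "EUROPE"});
    }

    /** Hash join on regionkey: build a table from region, then probe it with nation. */
    static List<String> join(List<Object[]> nation, List<Object[]> region) {
        Map<Long, String> build = new HashMap<>();
        for (Object[] r : region) {
            build.put((Long) r[0], (String) r[1]);
        }
        List<String> out = new ArrayList<>();
        for (Object[] n : nation) {
            String regionName = build.get((Long) n[1]);
            if (regionName != null) {
                out.add(n[0] + " -> " + regionName);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : join(scanNation(), scanRegion())) {
            System.out.println(row);
        }
    }
}
```

The join logic never needs to know which storage system produced either side; it only sees rows, which is the point of querying across catalogs.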
14
Query Execution
[Diagram: same components as the earlier Query Execution slide, with data shown as a sequence of pages, each made of blocks]
page 1: blockA, blockB
page: blockA, blockB
...
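The page and block labels suggest that data moves through the engine as pages, each holding one block per column. The sketch below shows one way to lay that out in Java and why it suits column-at-a-time loops. `PageSketch` and its nested `Page` class are illustrative stand-ins written for this example, not Presto's own classes.

```java
class PageSketch {
    /** A page holds one array ("block") per column; all blocks have the same row count. */
    static final class Page {
        final long[] blockA;    // e.g. a bigint column
        final String[] blockB;  // e.g. a varchar column

        Page(long[] blockA, String[] blockB) {
            if (blockA.length != blockB.length) {
                throw new IllegalArgumentException("blocks must have equal row counts");
            }
            this.blockA = blockA;
            this.blockB = blockB;
        }

        int rowCount() {
            return blockA.length;
        }
    }

    /** Column-at-a-time processing: sum blockA across a stream of pages. */
    static long sumBlockA(Page... pages) {
        long total = 0;
        for (Page page : pages) {
            for (long v : page.blockA) { // tight loop over one primitive array
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Page page1 = new Page(new long[] {1, 2, 3}, new String[] {"a", "b", "c"});
        Page page2 = new Page(new long[] {10, 20}, new String[] {"d", "e"});
        System.out.println(sumBlockA(page1, page2)); // prints 36
    }
}
```

An aggregate over one column touches only that column's array and never materializes whole rows.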
16
Plan execution
[Diagram comparing Hive and Presto, labels only. Hive: map and reduce stages with repeated I/O steps. Presto: tasks with I/O.]
17
Presto Extensibility – plugins
➔ Connectors
➔ Data types
➔ Extra functions
➔ (new) Security providers
18
Presto Extensibility – connector interfaces
[Diagram: Coordinator (Parser/analyzer, Planner, Scheduler) and Worker, with three connector APIs]
➔ Metadata API: Hive, Cassandra, Kafka, MySQL, …
➔ Data location API: Hive, Cassandra, Kafka, MySQL, …
➔ Data stream API: Hive, Cassandra, Kafka, MySQL, …
19
Presto Extensibility – connector interfaces
// Entry point of a connector; each getter exposes one service to the engine.
public interface Connector
{
    ConnectorHandleResolver getHandleResolver();
    ConnectorMetadata getMetadata();
    ConnectorSplitManager getSplitManager();
    ConnectorPageSourceProvider getPageSourceProvider();
    ConnectorRecordSetProvider getRecordSetProvider();
    ConnectorPageSinkProvider getPageSinkProvider();
    ConnectorRecordSinkProvider getRecordSinkProvider();
    ConnectorIndexResolver getIndexResolver();
    Set<SystemTable> getSystemTables();
    List<PropertyMetadata<?>> getSessionProperties();
    List<PropertyMetadata<?>> getTableProperties();
    ConnectorAccessControl getAccessControl();
    // A method with a body in an interface must be declared default (Java 8+).
    default void shutdown() {}
}
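To make the three connector roles concrete (metadata, data location, data stream), here is a toy in-memory connector. It deliberately does not use the real Presto SPI: `ToyMetadata`, `ToySplitManager`, `ToyRecordSetProvider`, and `InMemoryConnector` are simplified, hypothetical stand-ins invented for this sketch, and the real interfaces listed above have different signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToyConnectorSketch {
    /** Simplified stand-ins for the three connector roles; NOT the real Presto SPI types. */
    interface ToyMetadata {
        List<String> listTables();
        List<String> columnsOf(String table);
    }

    interface ToySplitManager {
        List<int[]> splitsOf(String table, int splitSize);
    }

    interface ToyRecordSetProvider {
        List<Object[]> read(String table, int[] split);
    }

    /** An in-memory "connector" exposing one table: numbers(n, label). */
    static final class InMemoryConnector
            implements ToyMetadata, ToySplitManager, ToyRecordSetProvider {
        private final List<Object[]> rows = new ArrayList<>();

        InMemoryConnector(int rowCount) {
            for (int i = 0; i < rowCount; i++) {
                rows.add(new Object[] {(long) i, "row-" + i});
            }
        }

        @Override
        public List<String> listTables() {
            return Arrays.asList("numbers");
        }

        @Override
        public List<String> columnsOf(String table) {
            return Arrays.asList("n", "label");
        }

        /** Data location: describe the table as [start, end) row ranges. */
        @Override
        public List<int[]> splitsOf(String table, int splitSize) {
            List<int[]> splits = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += splitSize) {
                splits.add(new int[] {start, Math.min(start + splitSize, rows.size())});
            }
            return splits;
        }

        /** Data stream: return the rows that belong to one split. */
        @Override
        public List<Object[]> read(String table, int[] split) {
            return rows.subList(split[0], split[1]);
        }
    }

    /** What an engine might do: ask for splits, then read every split (here, sequentially). */
    static long countRows(InMemoryConnector connector, String table, int splitSize) {
        long count = 0;
        for (int[] split : connector.splitsOf(table, splitSize)) {
            count += connector.read(table, split).size();
        }
        return count;
    }

    public static void main(String[] args) {
        InMemoryConnector connector = new InMemoryConnector(10);
        System.out.println(connector.listTables() + " " + connector.columnsOf("numbers"));
        System.out.println(countRows(connector, "numbers", 4)); // prints 10
    }
}
```

In a distributed engine the splits would be handed to different workers and read in parallel; the sequential loop in `countRows` only keeps the sketch short.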
20
➔ Data stays in memory during execution and is pipelined across nodes MPP-style
➔ Vectorized columnar processing
➔ Presto is written in highly tuned Java
◆ Efficient in-memory data structures
◆ Very careful coding of inner loops
◆ Bytecode generation
➔ Optimized ORC reader
➔ Predicate push-down
➔ Query optimizer
Presto = Performance
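One common form of predicate push-down, shown here as a general technique and not as a description of Presto's ORC reader, is to keep min/max statistics per chunk of a column and skip any chunk that cannot contain a matching value. In the sketch below, `Chunk`, `ScanResult`, and `scan` are hypothetical names and the data is made up.

```java
import java.util.Arrays;
import java.util.List;

class PredicatePushDownSketch {
    /** Hypothetical chunk of one bigint column plus its min/max statistics. */
    static final class Chunk {
        final long min;
        final long max;
        final long[] values;

        Chunk(long... values) {
            this.values = values;
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    /** Result of a scan: matching rows and how many chunks were actually read. */
    static final class ScanResult {
        final long matches;
        final int chunksRead;

        ScanResult(long matches, int chunksRead) {
            this.matches = matches;
            this.chunksRead = chunksRead;
        }
    }

    /** Evaluates `value >= threshold`, skipping chunks whose max is below the threshold. */
    static ScanResult scan(List<Chunk> chunks, long threshold) {
        long matches = 0;
        int chunksRead = 0;
        for (Chunk chunk : chunks) {
            if (chunk.max < threshold) {
                continue; // statistics prove no row can match, so the values are never touched
            }
            chunksRead++;
            for (long v : chunk.values) {
                if (v >= threshold) {
                    matches++;
                }
            }
        }
        return new ScanResult(matches, chunksRead);
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk(1, 5, 9),
                new Chunk(10, 12, 20),
                new Chunk(30, 31));
        ScanResult result = scan(chunks, 15);
        System.out.println(result.matches + " matches, " + result.chunksRead + " of 3 chunks read");
    }
}
```

With a threshold of 15 the first chunk is skipped on statistics alone, so the filter costs nothing for data that cannot match.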
21
➔ Facebook
◆ Multiple production clusters (100s of nodes total)
● Including 300PB Hadoop data warehouse
◆ 1000s of internal daily active users
◆ Millions of queries each month
◆ Multiple PBs scanned every day
◆ Trillions of rows a day
➔ Netflix
◆ Over 200-node production cluster on EC2
◆ Over 15 PB in S3 (Parquet format)
◆ Over 300 users and 2.5K queries daily
Presto in Production
22
➔ 100% open source contributions to Presto to increase adoption in the enterprise
➔ A multi-year roadmap commitment to phased enhancements of the open source code
➔ The first ever commercial support offering for Presto
What is Teradata Doing?
Teradata Certified Presto
www.teradata.com/presto
23
➔ Hadoop Distro Agnostic
➔ Modern Code Base
◆ Presto is well-designed open source software with proper database architecture
➔ Strong Like-Minded Community
➔ Push down processing across multiple data platforms
➔ Leverage Teradata expertise to make SQL for Hadoop viable
Why is Teradata Contributing to Presto?
24
Teradata Contributions to Presto
➔ Phase 1 (June 8, 2015): Implement
➔ Phase 2 (Q4 2015): Integrate
➔ Phase 3 (2016): Proliferate
Roadmap items shown on the slide:
◆ Installer
◆ Documentation
◆ Monitoring & Support Tools
◆ Management Tools Integration
◆ YARN Integration
◆ ODBC / JDBC Drivers
◆ BI Certification
◆ Security
◆ Connectors
◆ Commercial Support
◆ Expanding ANSI SQL Coverage
25
➔ Ease of install and management via Presto-Admin tool
◆ www.github.com/prestodb/presto-admin
◆ Packaging Presto as an RPM
➔ Testing Framework for Presto
◆ www.github.com/prestodb/tempto
◆ Added large number of tests
➔ JDBC driver for Java 6
➔ Various SQL improvements
Teradata’s Contributions
26
➔ Continued SQL Improvements
➔ Security – Authentication & Authorization
➔ More Connectors – e.g. HBase
➔ ODBC & JDBC Drivers that actually work
➔ BI tool certifications – e.g. Tableau
➔ YARN Integration
➔ Ambari Integration
➔ Open source our Docker-based dev environment (WIP)
➔ Open our Continuous Integration platform to the community
Teradata’s Contribution Product Roadmap
27
Teradata Engineers Dedicated to Presto
28
“Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions.”
- James Mayfield, product lead, Airbnb
“We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole.”
- Steve Deasy, vice president of Engineering, Groupon
Early Feedback is Extremely Positive
29
Demo Time!
30
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto Users Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb
How can I contribute?
31
We’re hiring!
➔ Warsaw
➔ Boston
Job Offer: bit.do/presto
Contact: Wojciech.Biela@teradata.com
Join us!
32
Available for Download
➔ Presto 101t Server, CLI, JDBC
➔ Presto-Admin 0.1
➔ Documentation
➔ HDP w/ Presto VM Sandbox
➔ CDH w/ Presto VM Sandbox
www.teradata.com/presto
Presto 101t certified by Teradata
33
Wojciech.Biela@teradata.com
Lukasz.Osipiuk@teradata.com

More Related Content

What's hot

Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.
Wojciech Biela
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Spark Summit
 
Presto
PrestoPresto
Presto
Knoldus Inc.
 
Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
kbajda
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
Taro L. Saito
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
kiran palaka
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
Treasure Data, Inc.
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
Sadayuki Furuhashi
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
Treasure Data, Inc.
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Databricks
 
Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
Databricks
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
Zhenxiao Luo
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Databricks
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
hadooparchbook
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Databricks
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
DataWorks Summit
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
Cyanny LIANG
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
Databricks
 

What's hot (20)

Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.Presto - Analytical Database. Overview and use cases.
Presto - Analytical Database. Overview and use cases.
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Presto
PrestoPresto
Presto
 
Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 
Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
 

Viewers also liked

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Matt Fuller
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
Grzegorz Kokosiński
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
Hiroshi Toyama
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Matt Fuller
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWS
Chris Riddell
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
Sadayuki Furuhashi
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Cloudera, Inc.
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
ScyllaDB
 
Big Data: SQL query federation for Hadoop and RDBMS data
Big Data:  SQL query federation for Hadoop and RDBMS dataBig Data:  SQL query federation for Hadoop and RDBMS data
Big Data: SQL query federation for Hadoop and RDBMS data
Cynthia Saracco
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
DataWorks Summit
 
Presto changes
Presto changesPresto changes
Presto changes
N Masahiro
 
Presto in my_use_case2
Presto in my_use_case2Presto in my_use_case2
Presto in my_use_case2
wyukawa
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
Hortonworks
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
Nicolas Morales
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
Daniel Abadi
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 
Data virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss TeiidData virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss Teiid
Anil Allewar
 
An introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligenceAn introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligence
David Walker
 

Viewers also liked (20)

Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
Hello, Enterprise! Meet Presto. (Presto Boston Meetup 10062015)
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Presto Meetup 2016 Small Start
Presto Meetup 2016 Small StartPresto Meetup 2016 Small Start
Presto Meetup 2016 Small Start
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
AWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWSAWS Meet-up: Logging At Scale on AWS
AWS Meet-up: Logging At Scale on AWS
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Big Data: SQL query federation for Hadoop and RDBMS data
Big Data:  SQL query federation for Hadoop and RDBMS dataBig Data:  SQL query federation for Hadoop and RDBMS data
Big Data: SQL query federation for Hadoop and RDBMS data
 
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data WarehouseHybrid Data Architecture: Integrating Hadoop with a Data Warehouse
Hybrid Data Architecture: Integrating Hadoop with a Data Warehouse
 
Presto changes
Presto changesPresto changes
Presto changes
 
Presto in my_use_case2
Presto in my_use_case2Presto in my_use_case2
Presto in my_use_case2
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Data virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss TeiidData virtualization, Data Federation & IaaS with Jboss Teiid
Data virtualization, Data Federation & IaaS with Jboss Teiid
 
An introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligenceAn introduction to data virtualization in business intelligence
An introduction to data virtualization in business intelligence
 

Similar to Presto for the Enterprise @ Hadoop Meetup

Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Wes McKinney
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
DataWorks Summit
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020
Piotr Findeisen
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
DATAVERSITY
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
Rikin Tanna
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
kbajda
 
Connecting your .Net Applications to NoSQL Databases - MongoDB & Cassandra
Connecting your .Net Applications to NoSQL Databases - MongoDB & CassandraConnecting your .Net Applications to NoSQL Databases - MongoDB & Cassandra
Connecting your .Net Applications to NoSQL Databases - MongoDB & Cassandra
Lohith Goudagere Nagaraj
 
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Que hay de nuevo en Visual Studio 2013 y ASP.NET 5.1
Rodolfo Finochietti
 
Simplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptxSimplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptx
ssuser5faa791
 
SharePoint 2016 Is Coming! Are You Ready?
SharePoint 2016 Is Coming! Are You Ready?SharePoint 2016 Is Coming! Are You Ready?
SharePoint 2016 Is Coming! Are You Ready?
Richard Harbridge
 
Sap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSerge Pagop
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
DataKitchen
 
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
 Apache AGE and the synergy effect in the combination of Postgres and NoSQL Apache AGE and the synergy effect in the combination of Postgres and NoSQL
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
EDB
 
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of ChoicePaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
PaaS Anywhere - Deploying an OpenShift PaaS into your Cloud Provider of Choice
Isaac Christoffersen
 
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDesigning the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Databricks
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
ScyllaDB
 
Savanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStackSavanna - Elastic Hadoop on OpenStack
Savanna - Elastic Hadoop on OpenStack
Sergey Lukjanov
 
The Decoupled CMS in Financial Services
The Decoupled CMS in Financial ServicesThe Decoupled CMS in Financial Services
The Decoupled CMS in Financial Services
Open Source Strategy Forum
 

Similar to Presto for the Enterprise @ Hadoop Meetup (20)

Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 

Presto for the Enterprise @ Hadoop Meetup

  • 1. Warsaw Hadoop User Group
    Wojciech Biela, Łukasz Osipiuk
    www.teradata.com/presto
  • 2. Who are we? - Teradata Center for Hadoop!
    ➔ History of Teradata Center for Hadoop
      ◆ Formerly Hadapt, founded in July 2010 by Justin Borgman, Kamil Bajda-Pawlikowski, and Daniel Abadi
      ◆ Pioneered SQL-on-Hadoop market
      ◆ Based on work done by database research group in Yale Computer Science Department
      ◆ Hybrid of Hadoop scalability and DBMS performance
    ➔ Today
      ◆ Acquired by Teradata in July 2014, renamed Teradata Center for Hadoop
      ◆ 20+ developers with deep Hadoop and database expertise
      ◆ Headquarters in Boston, MA
      ◆ Teams in US (MA, CA) and Poland (Warsaw)
      ◆ Contributors to open source project Presto
  • 3. Talk Agenda
    ➔ What is Presto?
    ➔ What is Teradata doing?
    ➔ Can I see a demo?
    ➔ How can I contribute?
  • 4. What is Presto?
    ➔ 100% open source distributed ANSI SQL engine for Big Data
      ◆ Modern code base
      ◆ Proven scalability
    ➔ Optimized for low-latency, interactive querying
      ◆ Cross-platform query capability, not only SQL on Hadoop
      ◆ Distributed under the Apache license, now supported by Teradata
      ◆ Used by a community of well-known, well-respected technology companies
  • 5. History of Presto
    ➔ Fall 2008: Facebook open sources Hive
    ➔ Fall 2012: 6 developers start Presto development
    ➔ Spring 2013: Presto rolled out within Facebook
    ➔ Fall 2013: Facebook open sources Presto
    ➔ Fall 2014: 88 releases, 41 contributors, 3943 commits
    ➔ Spring 2015: 98 releases, 65 contributors, 4587 commits; Teradata joins Presto community & offers support
  • 6-9. Query Execution (architecture diagram, repeated across four slides)
    Components shown: Client; Coordinator (Parser/analyzer, Planner, Scheduler); Workers.
    Pluggable interfaces: Metadata API, Data location API, Data stream API.
  • 10. Logical and fragmented plan
    select shipdate, count(*) count, cast(sum(extendedprice) as bigint) price
    from h_lineitem
    where returnflag = 'R'
    group by shipdate
    order by count
    limit 20
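
The slide text does not include the plan itself, so the following is only a rough sketch of what "fragmented" can mean for this query: a leaf fragment that filters and partially aggregates each split, and a downstream fragment that merges the partial results, sorts, and applies the limit. All class and method names are hypothetical and are not taken from Presto.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Presto itself.
final class FragmentedPlanSketch {
    // One h_lineitem row, reduced to the three columns the query touches.
    static final class LineItem {
        final String shipdate;
        final String returnflag;
        final double extendedprice;

        LineItem(String shipdate, String returnflag, double extendedprice) {
            this.shipdate = shipdate;
            this.returnflag = returnflag;
            this.extendedprice = extendedprice;
        }
    }

    // Aggregation state for one shipdate group.
    static final class Acc {
        long count;
        double price;
    }

    // Leaf fragment: each worker filters its own split and aggregates it partially.
    static Map<String, Acc> partialAggregate(List<LineItem> split) {
        Map<String, Acc> groups = new HashMap<>();
        for (LineItem row : split) {
            if (!"R".equals(row.returnflag)) {
                continue; // where returnflag = 'R'
            }
            Acc acc = groups.computeIfAbsent(row.shipdate, k -> new Acc());
            acc.count++;
            acc.price += row.extendedprice;
        }
        return groups;
    }

    // Downstream fragment: merge partial states, then order by count and apply the limit.
    static List<String> finalAggregate(List<Map<String, Acc>> partials, int limit) {
        Map<String, Acc> merged = new HashMap<>();
        for (Map<String, Acc> partial : partials) {
            for (Map.Entry<String, Acc> e : partial.entrySet()) {
                Acc total = merged.computeIfAbsent(e.getKey(), k -> new Acc());
                total.count += e.getValue().count;
                total.price += e.getValue().price;
            }
        }
        List<Map.Entry<String, Acc>> rows = new ArrayList<>(merged.entrySet());
        // Ties on count are broken by shipdate only to keep this sketch deterministic.
        rows.sort((a, b) -> {
            int byCount = Long.compare(a.getValue().count, b.getValue().count);
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Acc> e : rows.subList(0, Math.min(limit, rows.size()))) {
            // Math.round stands in for the cast to bigint (an assumption of this sketch).
            result.add(e.getKey() + "|" + e.getValue().count + "|" + Math.round(e.getValue().price));
        }
        return result;
    }
}
```

Splitting the aggregation this way means only one small state per group leaves each worker, rather than every matching row.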
  • 12-13. Query Execution (same architecture diagram as slide 6)
  • 14. Query Execution (same architecture diagram as slide 6, annotated with the data layout: page 1: blockA, blockB; page ...: blockA, blockB)
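
The page and block labels on this slide suggest a columnar batch layout: a page carries a batch of rows, and each block inside it holds one column of that batch. A minimal sketch of that idea follows, assuming long-only columns; `LongBlock`, `Page`, and `sumChannel` are illustrative names, not the Presto SPI.

```java
import java.util.List;

// Hypothetical sketch of a page/block layout; not the Presto SPI.
final class PageSketch {
    // A block holds the values of one column for a batch of rows.
    static final class LongBlock {
        private final long[] values;

        LongBlock(long... values) {
            this.values = values.clone();
        }

        int positionCount() {
            return values.length;
        }

        long get(int position) {
            return values[position];
        }
    }

    // A page groups blocks of equal length, one block per column.
    static final class Page {
        private final LongBlock[] blocks;

        Page(LongBlock... blocks) {
            for (LongBlock block : blocks) {
                if (block.positionCount() != blocks[0].positionCount()) {
                    throw new IllegalArgumentException("all blocks in a page need the same row count");
                }
            }
            this.blocks = blocks.clone();
        }

        int positionCount() {
            return blocks.length == 0 ? 0 : blocks[0].positionCount();
        }

        LongBlock block(int channel) {
            return blocks[channel];
        }
    }

    // Consumes a sequence of pages and sums a single column, touching only that column's blocks.
    static long sumChannel(List<Page> pages, int channel) {
        long sum = 0;
        for (Page page : pages) {
            LongBlock block = page.block(channel);
            for (int position = 0; position < page.positionCount(); position++) {
                sum += block.get(position);
            }
        }
        return sum;
    }
}
```

Because each column lives in its own block, an operator that needs one column never reads the others.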
  • 15. Query Execution (same architecture diagram as slide 6)
  • 17. Presto Extensibility – plugins
    ➔ Connectors
    ➔ Data types
    ➔ Extra functions
    ➔ (new) Security providers
  • 18. Presto Extensibility – connector interfaces (diagram)
    Coordinator: Parser/analyzer, Planner, Scheduler. Worker.
    Each connector interface has per-source implementations:
    ➔ Metadata API: Hive, Cassandra, Kafka, MySQL, …
    ➔ Data location API: Hive, Cassandra, Kafka, MySQL, …
    ➔ Data stream API: Hive, Cassandra, Kafka, MySQL, …
  • 19. Presto Extensibility – connector interfaces
    public interface Connector {
        ConnectorHandleResolver getHandleResolver();
        // Metadata API: schemas, tables, columns
        ConnectorMetadata getMetadata();
        // Data location API: splits of a table
        ConnectorSplitManager getSplitManager();
        // Data stream API: reading and writing data
        ConnectorPageSourceProvider getPageSourceProvider();
        ConnectorRecordSetProvider getRecordSetProvider();
        ConnectorPageSinkProvider getPageSinkProvider();
        ConnectorRecordSinkProvider getRecordSinkProvider();
        ConnectorIndexResolver getIndexResolver();
        Set<SystemTable> getSystemTables();
        List<PropertyMetadata<?>> getSessionProperties();
        List<PropertyMetadata<?>> getTableProperties();
        ConnectorAccessControl getAccessControl();
        default void shutdown() {}
    }
  • 20. Presto = Performance
    ➔ Data stays in memory during execution and is pipelined across nodes MPP-style
    ➔ Vectorized columnar processing
    ➔ Presto is written in highly tuned Java
      ◆ Efficient in-memory data structures
      ◆ Very careful coding of inner loops
      ◆ Bytecode generation
    ➔ Optimized ORC reader
    ➔ Predicate push-down
    ➔ Query optimizer
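
One common way a columnar reader benefits from predicate push-down is by keeping min/max statistics per chunk of a column and skipping chunks that cannot match. The slides do not say how Presto's ORC reader does this, so the sketch below only illustrates the general idea; `Chunk` and `countBetween` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of statistics-based chunk skipping; not Presto's ORC reader.
final class PushdownSketch {
    // A chunk of one column, with min/max statistics computed up front.
    static final class Chunk {
        final long min;
        final long max;
        final long[] values;

        Chunk(long... values) {
            if (values.length == 0) {
                throw new IllegalArgumentException("empty chunk");
            }
            long lo = values[0];
            long hi = values[0];
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
            this.values = values.clone();
        }
    }

    // Counts values in [lo, hi]. Returns {matching rows, chunks actually scanned}.
    static long[] countBetween(List<Chunk> chunks, long lo, long hi) {
        long matches = 0;
        long scanned = 0;
        for (Chunk chunk : chunks) {
            if (chunk.max < lo || chunk.min > hi) {
                continue; // statistics prove no row can match, so the chunk is never read
            }
            scanned++;
            // Tight loop over a primitive array: the columnar inner loop.
            for (long v : chunk.values) {
                if (v >= lo && v <= hi) {
                    matches++;
                }
            }
        }
        return new long[] {matches, scanned};
    }
}
```

The second element of the result makes the saving visible: chunks ruled out by their statistics never reach the inner loop.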
  • 21. Presto in Production
    ➔ Facebook
      ◆ Multiple production clusters (100s of nodes total)
        ● Including 300 PB Hadoop data warehouse
      ◆ 1000s of internal daily active users
      ◆ Millions of queries each month
      ◆ Multiple PBs scanned every day
      ◆ Trillions of rows a day
    ➔ Netflix
      ◆ Over 200-node production cluster on EC2
      ◆ Over 15 PB in S3 (Parquet format)
      ◆ Over 300 users and 2.5K queries daily
  • 22. What is Teradata Doing?
    ➔ 100% open source contributions to Presto to increase adoption in the enterprise
    ➔ A multi-year roadmap commitment to phased enhancements of the open source code
    ➔ The first ever commercial support offering for Presto
    Teradata Certified Presto: www.teradata.com/presto
  • 23. Why is Teradata Contributing to Presto?
    ➔ Hadoop distro agnostic
    ➔ Modern code base
      ◆ Presto is well-designed open source software with proper database architecture
    ➔ Strong like-minded community
    ➔ Push-down processing across multiple data platforms
    ➔ Leverage Teradata expertise to make SQL for Hadoop viable
  • 24. Teradata Contributions to Presto (roadmap diagram)
    ➔ Phase 1: Implement (June 8, 2015)
    ➔ Phase 2: Integrate (Q4 2015)
    ➔ Phase 3: Proliferate (2016)
    Items shown: Installer; Documentation; Monitoring & Support Tools; ODBC / JDBC Drivers; BI Certification; Security; Connectors; Commercial Support; Expanding ANSI SQL Coverage; Management Tools Integration; YARN Integration
  • 25. Teradata’s Contributions
    ➔ Ease of install and management via Presto-Admin tool
      ◆ www.github.com/prestodb/presto-admin
      ◆ Packaging Presto as an RPM
    ➔ Testing framework for Presto
      ◆ www.github.com/prestodb/tempto
      ◆ Added large number of tests
    ➔ JDBC driver for Java 6
    ➔ Various SQL improvements
  • 26. Teradata’s Contribution Product Roadmap
    ➔ Continued SQL improvements
    ➔ Security – Authentication & Authorization
    ➔ More connectors, e.g. HBase
    ➔ ODBC & JDBC drivers that actually work
    ➔ BI tool certifications, e.g. Tableau
    ➔ YARN integration
    ➔ Ambari integration
    ➔ Open source our Docker-based dev environment (WIP)
    ➔ Open our Continuous Integration platform to the community
  • 28. Early Feedback is Extremely Positive
    “Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions.”
      - James Mayfield, product lead, Airbnb
    “We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole.”
      - Steve Deasy, vice president of Engineering, Groupon
  • 30. How can I contribute?
    ➔ www.github.com/facebook/presto
    ➔ www.github.com/prestodb
    ➔ Certified Distro: www.teradata.com/presto
    ➔ Website: www.prestodb.io
    ➔ Presto User’s Group: www.groups.google.com/group/presto-users
    ➔ Facebook Page: www.facebook.com/prestodb
    ➔ Twitter: #prestodb
  • 31. Join us!
    We’re hiring!
    ➔ Warsaw
    ➔ Boston
    Job Offer: bit.do/presto
    Contact: Wojciech.Biela@teradata.com
  • 32. Presto 101t certified by Teradata
    Available for Download
    ➔ Presto 101t Server, CLI, JDBC
    ➔ Presto-Admin 0.1
    ➔ Documentation
    ➔ HDP w/ Presto VM Sandbox
    ➔ CDH w/ Presto VM Sandbox
    www.teradata.com/presto