SlideShare a Scribd company logo
1
Boston Hadoop User Group Meetup, July 7, 2015
Kamil Bajda-Pawlikowski
Matt Fuller
2
•  History of Teradata Center for Hadoop
–  Formerly Hadapt Founded in July, 2010 by Borgman, Bajda-Pawlikowski, and
Abadi
–  Pioneered SQL-on-Hadoop market
–  Based on work done by database research group in Yale Computer Science
Department
–  Hybrid of Hadoop scalability and DBMS performance
•  Today
–  Acquired by Teradata in July, 2014, renamed Teradata Center for Hadoop
–  30 developers with deep Hadoop and database expertise
–  Headquarters in Boston, MA
–  Contributors to open source project Presto
Who are we? - Teradata Center for Hadoop!
3
•  What is Presto?
•  What is Teradata doing?
•  Can I see a Demo?
•  How can I contribute?
Talk Agenda
4
•  100% open source distributed ANSI SQL engine for Big Data
–  Modern code base
–  Proven scalability
–  Optimized for low latency, Interactive querying
•  Cross platform query capability, not only SQL on Hadoop
•  Distributed under the Apache license, now supported by Teradata
•  Used by a community of well known, well respected technology companies
What is Presto?
5
History of Presto
FALL 2012
4 developers
start Presto
development
FALL 2014
88 Releases
41 Contributors
3943 Commits
SPRING 2015
98 Releases
65 Contributors
4587 Commits
---------
Teradata joins
Presto community
& offers support
SPRING 2013
Presto rolled out
within Facebook
FALL 2013
Facebook open
sources Presto
FALL 2008
Facebook
open sources
Hive
Timeline image courtesy of Facebook
6
Presto Architecture
Data stream API
Worker
Data stream API
Worker
Coordinator
Metadata
API
Parser/
analyzer
Planner Scheduler
Worker
Client
Data location
API
Pluggable
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
7
Presto Extensibility – connectors
Parser/
analyzer
Planner
Worker
Data location API
Hive
Cassandra
Kafka
MySQL
…
Metadata API
Hive
Cassandra
Kafka
MySQL
…
Data stream API
Hive
Cassandra
Kafka
MySQL
…
Scheduler
Coordinator
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
8
•  Data stays in memory during execution and is pipelined across nodes MPP-style
•  Vectorized columnar processing
•  Presto is written in highly tuned Java
–  Efficient in-memory data structures
–  Very careful coding of inner loops
–  Bytecode generation
•  Optimized ORC reader
Presto = Performance
9
•  Facebook
–  Multiple production clusters (100s of nodes total)
-  Including 300PB Hadoop data warehouse
–  1000s of internal daily active users
–  Millions of queries each month
–  Multiple PBs scanned every day
–  Trillions of rows a day
•  Netflix
–  Over 200-node production cluster on EC2
–  Over 15 PB in S3 (Parquet format)
–  Over 300 users and 2.5K queries daily
Presto in Production
10
•  100% open source contributions to Presto to
increase adoption in the enterprise
•  A multi-year roadmap commitment to
phased enhancements of the open source
code
•  The first ever commercial support offering for
Presto
What is Teradata Doing?
Teradata Certified Presto
www.teradata.com/presto
11
•  Hadoop Distro Agnostic
•  Modern Code Base
–  Presto is well-designed open source software with proper database
architecture
•  Strong Like-Minded Community
•  Push down processing across multiple data platforms
•  Leverage Teradata expertise to make SQL for Hadoop viable
Why is Teradata Contributing to Presto?
12
Demo Time!
13
Implement Integrate Proliferate
•  Installer
•  Documentation
•  Monitoring & Support
Tools
•  Management Tool
Integration
•  YARN Integration
•  ODBC / JDBC Drivers
•  BI Certification
•  Security
•  Connectors
Commercial Support
Phase 1 Phase 2 Phase 3
June 8, 2015 Q4 2015 2016
Expanding ANSI SQL Coverage
Teradata Contributions to Presto
14
•  Ease of install and management via Presto-Admin tool
–  www.github.com/prestodb/presto-admin
–  Packaging Presto as an RPM
•  Testing Framework for Presto
–  www.github.com/prestodb/tempto
–  Added large number of tests
•  Improvements to JDBC driver
–  To be open sourced on www.github.com/prestodb soon!
•  Various SQL improvements
Teradata’s Contributions
15
•  YARN Integration
•  Ambari Integration
•  ODBC & JDBC Drivers that actually work
•  Security – Authentication & Authorization
•  Continued SQL Improvements
•  BI tool certifications – e.g. Tableau
•  More Connectors – e.g. Hbase
•  Open Source our Docker based Dev Env
•  Open our Continuous Integration platform to the community
Teradata’s Contribution Product Roadmap
16
www.github.com/facebook/presto
www.github.com/prestodb
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto User’s Group: www.groups.google.com/group/presto-users
Facebook Page: www.facebook.com/prestodb
Twitter: #prestodb
How can I contribute?
17
Available for Download
–  Presto 101t Server, CLI, JDBC
–  Presto-Admin 0.1
–  Documentation
–  HDP w/ Presto VM Sandbox
–  CDH w/ Presto VM Sandbox
www.teradata.com/presto
Presto 101t certified by Teradata
18

More Related Content

What's hot

Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
kbajda
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Martin Traverso
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
Taro L. Saito
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
Bill Graham
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
wyukawa
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
Treasure Data, Inc.
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
Cyanny LIANG
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
Sadayuki Furuhashi
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
Sadayuki Furuhashi
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
DataWorks Summit
 
Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
Databricks
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
Databricks
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
Kai Sasaki
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
Michael Stack
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Databricks
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Spark Summit
 
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Databricks
 

What's hot (20)

Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
 
Presto
PrestoPresto
Presto
 
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Continuous Processing in Structured Streaming with Jose Torres
 Continuous Processing in Structured Streaming with Jose Torres Continuous Processing in Structured Streaming with Jose Torres
Continuous Processing in Structured Streaming with Jose Torres
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
 

Viewers also liked

What is the best Healthcare Data Warehouse Model for Your Organization?
What is the best Healthcare Data Warehouse Model for Your Organization?What is the best Healthcare Data Warehouse Model for Your Organization?
What is the best Healthcare Data Warehouse Model for Your Organization?
Health Catalyst
 
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOsA New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
Health Catalyst
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
Health Catalyst
 
Is Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
Is Value-Based Healthcare Here to Stay? Looking for Answers in New PoliciesIs Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
Is Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
Health Catalyst
 
INIA- CISA: Análisis de las amenazas en la fauna silvestre
INIA- CISA: Análisis de las amenazas en la fauna silvestreINIA- CISA: Análisis de las amenazas en la fauna silvestre
INIA- CISA: Análisis de las amenazas en la fauna silvestre
Esri
 
Getting The Most Out of Your Data Analyst - HAS Session 9
Getting The Most Out of Your Data Analyst - HAS Session 9Getting The Most Out of Your Data Analyst - HAS Session 9
Getting The Most Out of Your Data Analyst - HAS Session 9
Health Catalyst
 
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
Health Catalyst
 
Five Strategies for Easing the Burden of Clinical Quality Measures
Five Strategies for Easing the Burden of Clinical Quality MeasuresFive Strategies for Easing the Burden of Clinical Quality Measures
Five Strategies for Easing the Burden of Clinical Quality Measures
Health Catalyst
 
7 Features of Highly Effective Outcomes Improvement Projects
7 Features of Highly Effective Outcomes Improvement Projects7 Features of Highly Effective Outcomes Improvement Projects
7 Features of Highly Effective Outcomes Improvement Projects
Health Catalyst
 
The Who, What, and How of Health Outcome Measures
The Who, What, and How of Health Outcome MeasuresThe Who, What, and How of Health Outcome Measures
The Who, What, and How of Health Outcome Measures
Health Catalyst
 
The Top Five Recommendations for Improving the Patient Experience
The Top Five Recommendations for Improving the Patient ExperienceThe Top Five Recommendations for Improving the Patient Experience
The Top Five Recommendations for Improving the Patient Experience
Health Catalyst
 
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
Health Catalyst
 
Healthcare Interoperability: New Tactics and Technology
Healthcare Interoperability: New Tactics and TechnologyHealthcare Interoperability: New Tactics and Technology
Healthcare Interoperability: New Tactics and Technology
Health Catalyst
 
Why You Need to Understand Value-Based Reimbursement and How to Survive It
Why You Need to Understand Value-Based Reimbursement and How to Survive ItWhy You Need to Understand Value-Based Reimbursement and How to Survive It
Why You Need to Understand Value-Based Reimbursement and How to Survive It
Health Catalyst
 
The Top 7 Outcomes Measures and 3 Measurement Essentials
The Top 7 Outcomes Measures and 3 Measurement EssentialsThe Top 7 Outcomes Measures and 3 Measurement Essentials
The Top 7 Outcomes Measures and 3 Measurement Essentials
Health Catalyst
 
Outcomes improvement: what you get when you mix good data with physician enga...
Outcomes improvement: what you get when you mix good data with physician enga...Outcomes improvement: what you get when you mix good data with physician enga...
Outcomes improvement: what you get when you mix good data with physician enga...
Health Catalyst
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Health Catalyst
 
Why We Need to Shift Healthcare Quality Measures from Volume to Value
Why We Need to Shift Healthcare Quality Measures from Volume to ValueWhy We Need to Shift Healthcare Quality Measures from Volume to Value
Why We Need to Shift Healthcare Quality Measures from Volume to Value
Health Catalyst
 
Demystifying Healthcare Data Governance
Demystifying Healthcare Data GovernanceDemystifying Healthcare Data Governance
Demystifying Healthcare Data Governance
Health Catalyst
 
Understanding Risk Stratification, Comorbidities, and the Future of Healthcare
Understanding Risk Stratification, Comorbidities, and the Future of HealthcareUnderstanding Risk Stratification, Comorbidities, and the Future of Healthcare
Understanding Risk Stratification, Comorbidities, and the Future of Healthcare
Health Catalyst
 

Viewers also liked (20)

What is the best Healthcare Data Warehouse Model for Your Organization?
What is the best Healthcare Data Warehouse Model for Your Organization?What is the best Healthcare Data Warehouse Model for Your Organization?
What is the best Healthcare Data Warehouse Model for Your Organization?
 
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOsA New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
A New GIS-driven Approach to Optimize Service Area Boundaries for ACOs
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
 
Is Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
Is Value-Based Healthcare Here to Stay? Looking for Answers in New PoliciesIs Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
Is Value-Based Healthcare Here to Stay? Looking for Answers in New Policies
 
INIA- CISA: Análisis de las amenazas en la fauna silvestre
INIA- CISA: Análisis de las amenazas en la fauna silvestreINIA- CISA: Análisis de las amenazas en la fauna silvestre
INIA- CISA: Análisis de las amenazas en la fauna silvestre
 
Getting The Most Out of Your Data Analyst - HAS Session 9
Getting The Most Out of Your Data Analyst - HAS Session 9Getting The Most Out of Your Data Analyst - HAS Session 9
Getting The Most Out of Your Data Analyst - HAS Session 9
 
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
Improving Patient Safety and Quality Through Culture, Clinical Analytics, Evi...
 
Five Strategies for Easing the Burden of Clinical Quality Measures
Five Strategies for Easing the Burden of Clinical Quality MeasuresFive Strategies for Easing the Burden of Clinical Quality Measures
Five Strategies for Easing the Burden of Clinical Quality Measures
 
7 Features of Highly Effective Outcomes Improvement Projects
7 Features of Highly Effective Outcomes Improvement Projects7 Features of Highly Effective Outcomes Improvement Projects
7 Features of Highly Effective Outcomes Improvement Projects
 
The Who, What, and How of Health Outcome Measures
The Who, What, and How of Health Outcome MeasuresThe Who, What, and How of Health Outcome Measures
The Who, What, and How of Health Outcome Measures
 
The Top Five Recommendations for Improving the Patient Experience
The Top Five Recommendations for Improving the Patient ExperienceThe Top Five Recommendations for Improving the Patient Experience
The Top Five Recommendations for Improving the Patient Experience
 
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
Why Most Analytic Applications Will Never Be Able to Significantly Improve He...
 
Healthcare Interoperability: New Tactics and Technology
Healthcare Interoperability: New Tactics and TechnologyHealthcare Interoperability: New Tactics and Technology
Healthcare Interoperability: New Tactics and Technology
 
Why You Need to Understand Value-Based Reimbursement and How to Survive It
Why You Need to Understand Value-Based Reimbursement and How to Survive ItWhy You Need to Understand Value-Based Reimbursement and How to Survive It
Why You Need to Understand Value-Based Reimbursement and How to Survive It
 
The Top 7 Outcomes Measures and 3 Measurement Essentials
The Top 7 Outcomes Measures and 3 Measurement EssentialsThe Top 7 Outcomes Measures and 3 Measurement Essentials
The Top 7 Outcomes Measures and 3 Measurement Essentials
 
Outcomes improvement: what you get when you mix good data with physician enga...
Outcomes improvement: what you get when you mix good data with physician enga...Outcomes improvement: what you get when you mix good data with physician enga...
Outcomes improvement: what you get when you mix good data with physician enga...
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
 
Why We Need to Shift Healthcare Quality Measures from Volume to Value
Why We Need to Shift Healthcare Quality Measures from Volume to ValueWhy We Need to Shift Healthcare Quality Measures from Volume to Value
Why We Need to Shift Healthcare Quality Measures from Volume to Value
 
Demystifying Healthcare Data Governance
Demystifying Healthcare Data GovernanceDemystifying Healthcare Data Governance
Demystifying Healthcare Data Governance
 
Understanding Risk Stratification, Comorbidities, and the Future of Healthcare
Understanding Risk Stratification, Comorbidities, and the Future of HealthcareUnderstanding Risk Stratification, Comorbidities, and the Future of Healthcare
Understanding Risk Stratification, Comorbidities, and the Future of Healthcare
 

Similar to Boston Hadoop Meetup: Presto for the Enterprise

Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
viirya
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
Nicolas Morales
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
Wilfried Hoge
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
Gwen (Chen) Shapira
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Data Con LA
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
Cask Data
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClassECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
European Collaboration Summit
 
Interoperability Ms Sap
Interoperability Ms SapInteroperability Ms Sap
Interoperability Ms Sap
richaroy
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Mats Uddenfeldt
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
MapR Technologies
 
LeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration ServicesLeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration Services
Michael Stephenson
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
Spark Summit
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
Durga Gadiraju
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
Rikin Tanna
 
London Oracle Developer Meetup April 18
London Oracle Developer Meetup April 18London Oracle Developer Meetup April 18
London Oracle Developer Meetup April 18
Phil Wilkins
 

Similar to Boston Hadoop Meetup: Presto for the Enterprise (20)

Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Apache drill
Apache drillApache drill
Apache drill
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClassECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
ECS19 - Vesa Juvonen - SharePoint and Office 365 Development PowerClass
 
Interoperability Ms Sap
Interoperability Ms SapInteroperability Ms Sap
Interoperability Ms Sap
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
LeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration ServicesLeedsSharp May 2023 - Azure Integration Services
LeedsSharp May 2023 - Azure Integration Services
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
 
London Oracle Developer Meetup April 18
London Oracle Developer Meetup April 18London Oracle Developer Meetup April 18
London Oracle Developer Meetup April 18
 

Recently uploaded

Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 

Recently uploaded (20)

Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 

Boston Hadoop Meetup: Presto for the Enterprise

  • 1. 1 Boston Hadoop User Group Meetup, July 7, 2015 Kamil Bajda-Pawlikowski Matt Fuller
  • 2. 2 •  History of Teradata Center for Hadoop –  Formerly Hadapt Founded in July, 2010 by Borgman, Bajda-Pawlikowski, and Abadi –  Pioneered SQL-on-Hadoop market –  Based on work done by database research group in Yale Computer Science Department –  Hybrid of Hadoop scalability and DBMS performance •  Today –  Acquired by Teradata in July, 2014, renamed Teradata Center for Hadoop –  30 developers with deep Hadoop and database expertise –  Headquarters in Boston, MA –  Contributors to open source project Presto Who are we? - Teradata Center for Hadoop!
  • 3. 3 •  What is Presto? •  What is Teradata doing? •  Can I see a Demo? •  How can I contribute? Talk Agenda
  • 4. 4 •  100% open source distributed ANSI SQL engine for Big Data –  Modern code base –  Proven scalability –  Optimized for low latency, Interactive querying •  Cross platform query capability, not only SQL on Hadoop •  Distributed under the Apache license, now supported by Teradata •  Used by a community of well known, well respected technology companies What is Presto?
  • 5. 5 History of Presto FALL 2012 4 developers start Presto development FALL 2014 88 Releases 41 Contributors 3943 Commits SPRING 2015 98 Releases 65 Contributors 4587 Commits --------- Teradata joins Presto community & offers support SPRING 2013 Presto rolled out within Facebook FALL 2013 Facebook open sources Presto FALL 2008 Facebook open sources Hive Timeline image courtesy of Facebook
  • 6. 6 Presto Architecture Data stream API Worker Data stream API Worker Coordinator Metadata API Parser/ analyzer Planner Scheduler Worker Client Data location API Pluggable https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
  • 7. 7 Presto Extensibility – connectors Parser/ analyzer Planner Worker Data location API Hive Cassandra Kafka MySQL … Metadata API Hive Cassandra Kafka MySQL … Data stream API Hive Cassandra Kafka MySQL … Scheduler Coordinator https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
  • 8. 8 •  Data stays in memory during execution and is pipelined across nodes MPP-style •  Vectorized columnar processing •  Presto is written in highly tuned Java –  Efficient in-memory data structures –  Very careful coding of inner loops –  Bytecode generation •  Optimized ORC reader Presto = Performance
  • 9. 9 •  Facebook –  Multiple production clusters (100s of nodes total) -  Including 300PB Hadoop data warehouse –  1000s of internal daily active users –  Millions of queries each month –  Multiple PBs scanned every day –  Trillions of rows a day •  Netflix –  Over 200-node production cluster on EC2 –  Over 15 PB in S3 (Parquet format) –  Over 300 users and 2.5K queries daily Presto in Production
  • 10. 10 •  100% open source contributions to Presto to increase adoption in the enterprise •  A multi-year roadmap commitment to phased enhancements of the open source code •  The first ever commercial support offering for Presto What is Teradata Doing? Teradata Certified Presto www.teradata.com/presto
  • 11. 11 •  Hadoop Distro Agnostic •  Modern Code Base –  Presto is well-designed open source software with proper database architecture •  Strong Like-Minded Community •  Push down processing across multiple data platforms •  Leverage Teradata expertise to make SQL for Hadoop viable Why is Teradata Contributing to Presto?
  • 13. 13 Implement Integrate Proliferate •  Installer •  Documentation •  Monitoring & Support Tools •  Management Tool Integration •  YARN Integration •  ODBC / JDBC Drivers •  BI Certification •  Security •  Connectors Commercial Support Phase 1 Phase 2 Phase 3 June 8, 2015 Q4 2015 2016 Expanding ANSI SQL Coverage Teradata Contributions to Presto
  • 14. 14 •  Ease of install and management via Presto-Admin tool –  www.github.com/prestodb/presto-admin –  Packaging Presto as an RPM •  Testing Framework for Presto –  www.github.com/prestodb/tempto –  Added large number of tests •  Improvements to JDBC driver –  To be open sourced on www.github.com/prestodb soon! •  Various SQL improvements Teradata’s Contributions
  • 15. 15 •  YARN Integration •  Ambari Integration •  ODBC & JDBC Drivers that actually work •  Security – Authentication & Authorization •  Continued SQL Improvements •  BI tool certifications – e.g. Tableau •  More Connectors – e.g. Hbase •  Open Source our Docker based Dev Env •  Open our Continuous Integration platform to the community Teradata’s Contribution Product Roadmap
  • 16. 16 www.github.com/facebook/presto www.github.com/prestodb Certified Distro: www.teradata.com/presto Website: www.prestodb.io Presto User’s Group: www.groups.google.com/group/presto-users Facebook Page: www.facebook.com/prestodb Twitter: #prestodb How can I contribute?
  • 17. 17 Available for Download –  Presto 101t Server, CLI, JDBC –  Presto-Admin 0.1 –  Documentation –  HDP w/ Presto VM Sandbox –  CDH w/ Presto VM Sandbox www.teradata.com/presto Presto 101t certified by Teradata
  • 18. 18