Open Source SQL for Hadoop
Where are we now and where are we going
• History of Hadapt
– Founded in July, 2010 by Borgman, Bajda-Pawlikowski, and Abadi
– Pioneered SQL-on-Hadoop market
– Based on work done by database research group in Yale Computer Science Department
– Hybrid of Hadoop scalability and DBMS performance
• Today
– Hadapt acquired by Teradata in July, 2014, renamed Teradata Center for Hadoop
– 40 developers with deep Hadoop and database expertise
– Headquarters in Boston, MA
Teradata Center for Hadoop
Justin Borgman
VP & GM, Teradata Center for Hadoop
Former Founder & CEO, Hadapt
Presto
Bringing open-source Presto to the Enterprise
What is Teradata Announcing?
• 100% open source contributions to Presto to increase adoption in the enterprise
• A multi-year roadmap commitment to phased enhancements of the open source code
• The first ever commercial support offering for Presto
Available starting June 8, 2015
#presto
Presto
100% open source SQL query engine
– Modern code base
– Proven scalability
– Interactive querying
Cross platform query capability, not only SQL on Hadoop
Licensed under the Apache License 2.0
Not supported by a major vendor
Used by a community of well known, well respected technology companies
Why is Teradata Contributing to Presto?
Why Contribute to Presto?
• Portable and Open
– Facebook is vendor neutral
– Not tied to a Distro
• Modern Code Base
– Presto is a well-designed blend of open source Hadoop thinking and basic database software principles
• Existing Community
• Push down processing across multiple data platforms
Why Contribute to SQL-on-Hadoop?
• Advancing the UDA requires better SQL-on-Hadoop capabilities
– Teradata is aligned around a logical data warehouse vision
– SQL is the interaction protocol across UDA
– SQL-on-Hadoop engine is needed to extend QueryGrid to Hadoop query point
• Opportunities are Teradata Strengths in this still immature space
• Up-the-stack capability versus core Hadoop
© 2014 Teradata
What is Presto?
• Distributed SQL analytics engine
• Optimized for low-latency, interactive analysis
• ANSI SQL
• Extensible
• Fall 2008: Facebook open sources Hive
• August 2012: 4 developers start Presto development
• December 2012: Presto rolled out within Facebook
• November 2013: Facebook open sources Presto
• June 2014: 68 releases, 30 contributors, 2796 commits
• March 2015: 98 releases, 65 contributors, 4587 commits
Presto @ Facebook
• 1000s of internal daily active users
• Millions of queries each month
• Multiple PBs scanned every day
• Trillions of rows a day
Architecture
[Figure: architecture diagram. Labels: Client; Coordinator (Parser/Analyzer, Planner, Scheduler); Metadata API; Data Location API; Worker (×3); Data Stream API.]
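The coordinator and worker roles named in the architecture diagram can be illustrated with a toy sketch. This is not Presto code: `make_splits`, `worker_partial_sum`, and `coordinator_sum` are hypothetical names, and a thread pool stands in for a cluster of worker nodes. It only shows the general shape of a coordinator cutting work into splits, handing them to workers, and merging partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, split_size):
    """Coordinator side (hypothetical): cut the input into independent splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def worker_partial_sum(split):
    """Worker side (hypothetical): process one split and return a partial result."""
    return sum(split)


def coordinator_sum(rows, workers=3, split_size=4):
    """Schedule splits across a pool of workers, then merge the partial results."""
    splits = make_splits(rows, split_size)
    # Assumption: threads stand in for separate worker processes or nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    return sum(partials)


total = coordinator_sum(list(range(1, 11)))
print(total)
```

In a real engine the splits would describe where data lives rather than carry the rows themselves; the sketch keeps rows inline so it stays self-contained.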
[Figure: connectors diagram. Labels: Coordinator (Parser/Analyzer, Planner, Scheduler) and Worker; Metadata API, Data Location API, and Data Stream API, each shown with Hive, Cassandra, MySQL, Kafka, and Internal implementations.]
Connectors
• Hadoop (1.x, 2.x, CDH4, CDH5)
• Cassandra
• MySQL
• PostgreSQL
• Kafka
• Amazon Kinesis (soon)
• Redis (soon)
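To make the connector idea concrete, here is a minimal sketch of a data source exposed through the three APIs named on the slides (Metadata, Data Location, Data Stream). Everything in it is an assumption for illustration: the class and method names (`Connector`, `get_columns`, `get_splits`, `read_split`, `InMemoryConnector`, `scan`) are hypothetical and are not Presto's actual connector interface, which is written in Java.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical interface mirroring the three APIs named on the slide."""

    @abstractmethod
    def get_columns(self, table):
        """Metadata API: describe the table's columns."""

    @abstractmethod
    def get_splits(self, table):
        """Data Location API: say where the table's data chunks live."""

    @abstractmethod
    def read_split(self, table, split):
        """Data Stream API: stream the rows of one chunk."""


class InMemoryConnector(Connector):
    """Toy connector over Python lists; a split is a (start, end) row range."""

    def __init__(self, tables, rows_per_split=2):
        self._tables = tables  # {table name: (column names, rows)}
        self._rows_per_split = rows_per_split

    def get_columns(self, table):
        return self._tables[table][0]

    def get_splits(self, table):
        n = len(self._tables[table][1])
        step = self._rows_per_split
        return [(start, min(start + step, n)) for start in range(0, n, step)]

    def read_split(self, table, split):
        start, end = split
        yield from self._tables[table][1][start:end]


def scan(connector, table):
    """Engine side: read a table split by split without knowing the data source."""
    for split in connector.get_splits(table):
        yield from connector.read_split(table, split)


users = InMemoryConnector(
    {"users": (["id", "name"], [(1, "ann"), (2, "bob"), (3, "cy")])}
)
print(list(scan(users, "users")))
```

The point of the split between the three methods is that the engine code in `scan` never changes when a new data source is added; only a new `Connector` subclass is needed.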
Other extension points
• Types
• Functions
• Operators
What makes Presto fast?
• Data in memory during execution
• Pipelining and streaming
• Very careful coding of inner loops
• Efficient flat-memory data structures
• Bytecode generation
• Custom ORC reader
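Two of the points above, pipelining and flat-memory data structures, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Presto's implementation: the operator names (`scan_pages`, `filter_even`, `sum_pages`) are hypothetical, Python generators stand in for streaming operators, and `array("q", ...)` stands in for a flat column of 64-bit integers.

```python
from array import array


def scan_pages(values, page_size):
    """Source operator: emit fixed-size pages backed by flat arrays of 64-bit ints."""
    for i in range(0, len(values), page_size):
        yield array("q", values[i:i + page_size])


def filter_even(pages):
    """Filter operator: consume one page at a time and emit the filtered page."""
    for page in pages:
        yield array("q", (v for v in page if v % 2 == 0))


def sum_pages(pages):
    """Final aggregation: fold pages as they arrive, never holding the full input."""
    total = 0
    for page in pages:
        total += sum(page)
    return total


# Each page flows through scan -> filter -> sum before the next page is produced.
result = sum_pages(filter_even(scan_pages(list(range(10)), page_size=4)))
print(result)
```

Because the operators are chained generators, no intermediate result is materialized in full, which is the essence of pipelined execution; a production engine would additionally compile the inner loops rather than interpret them.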
Next
• More SQL features
• Planner/execution engine improvements
• Native columnar store
• Security
Open source
• Apache License 2.0
• Open development
• Releases every 1-2 weeks
Contributors welcome!
Teradata Contributions to Presto
Phase 1: Implement (June 8, 2015)
Phase 2: Integrate (Q4 2015)
Phase 3: Proliferate (2016)
• Installer
• Documentation
• Monitoring & Support Tools
• Management Tool Integration
• YARN Integration
• ODBC / JDBC Drivers
• BI Certification
• Security
• Connectors
Commercial Support
Expanding ANSI SQL Coverage
Teradata Engineers Dedicated to Presto
Early Feedback is Extremely Positive
“Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions.”
- James Mayfield, product lead, Airbnb
“We are excited to see Teradata's commitment to Presto and adding capabilities in the open source domain. This will create interesting opportunities within our technical and business teams to open up more access options to our critical data. We think this is a positive for Teradata and for the community as a whole.”
- Steve Deasy, vice president of Engineering, Groupon
Download Presto version 101t today!
www.teradata.com/presto
Contribute to Presto today!
www.github.com/facebook/presto
www.prestodb.io
#presto


Editor's Notes

  • #6
    – Interactive performance of execution engine
    – Code generation for operators (similarly to Impala)
    – Data is pipelined MPP-style
    – Runs at Facebook scale
    – *Capable of querying other non-HDFS data stores as well*