Boston Hadoop Meetup: Presto for the Enterprise

Presentation Title: Presto for the Enterprise
Presenter(s): Matt Fuller and Kamil Bajda-Pawlikowski
Company: Teradata Center for Hadoop
Short Description:
Teradata will provide a technical overview and demo of Presto, focusing on Presto's architecture and on Teradata's contributions to the project and community. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources ranging in size from gigabytes to petabytes. Presto was originally developed by Facebook, and Teradata now joins Facebook as the second-largest contributor to the open source project. Come join us to learn more about Presto and how you can join the Presto community.

  1. Boston Hadoop User Group Meetup, July 7, 2015. Kamil Bajda-Pawlikowski, Matt Fuller
  2. Who are we? Teradata Center for Hadoop!
     • History of Teradata Center for Hadoop
       – Formerly Hadapt, founded in July 2010 by Borgman, Bajda-Pawlikowski, and Abadi
       – Pioneered the SQL-on-Hadoop market
       – Based on work done by the database research group in the Yale Computer Science Department
       – Hybrid of Hadoop scalability and DBMS performance
     • Today
       – Acquired by Teradata in July 2014 and renamed Teradata Center for Hadoop
       – 30 developers with deep Hadoop and database expertise
       – Headquartered in Boston, MA
       – Contributors to the open source Presto project
  3. Talk Agenda
     • What is Presto?
     • What is Teradata doing?
     • Can I see a demo?
     • How can I contribute?
  4. What is Presto?
     • 100% open source distributed ANSI SQL engine for Big Data
       – Modern code base
       – Proven scalability
       – Optimized for low-latency, interactive querying
     • Cross-platform query capability, not only SQL on Hadoop
     • Distributed under the Apache license, now supported by Teradata
     • Used by a community of well-known, well-respected technology companies
  5. History of Presto
     • Fall 2008: Facebook open sources Hive
     • Fall 2012: 4 developers start Presto development
     • Spring 2013: Presto rolled out within Facebook
     • Fall 2013: Facebook open sources Presto
     • Fall 2014: 88 releases, 41 contributors, 3,943 commits
     • Spring 2015: 98 releases, 65 contributors, 4,587 commits; Teradata joins the Presto community and offers support
     Timeline image courtesy of Facebook
  6. Presto Architecture
     [Diagram: a Client sends queries to the Coordinator (Parser/analyzer, Planner, Scheduler); the Coordinator uses the pluggable Metadata API and Data location API, and multiple Workers read data through the pluggable Data stream API.]
     Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
  7. Presto Extensibility: Connectors
     [Diagram: the Coordinator (Parser/analyzer, Planner, Scheduler) and the Workers reach each data source through connector implementations of the Metadata API, Data location API, and Data stream API; example connectors: Hive, Cassandra, Kafka, MySQL, …]
     Source: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
  8. Presto = Performance
     • Data stays in memory during execution and is pipelined across nodes, MPP-style
     • Vectorized columnar processing
     • Presto is written in highly tuned Java
       – Efficient in-memory data structures
       – Very careful coding of inner loops
       – Bytecode generation
     • Optimized ORC reader
  9. Presto in Production
     • Facebook
       – Multiple production clusters (100s of nodes total), including a 300 PB Hadoop data warehouse
       – 1000s of internal daily active users
       – Millions of queries each month
       – Multiple PBs scanned every day
       – Trillions of rows a day
     • Netflix
       – Over 200-node production cluster on EC2
       – Over 15 PB in S3 (Parquet format)
       – Over 300 users and 2.5K queries daily
  10. What is Teradata Doing? (Teradata Certified Presto)
     • 100% open source contributions to Presto to increase adoption in the enterprise
     • A multi-year roadmap commitment to phased enhancements of the open source code
     • The first-ever commercial support offering for Presto
     www.teradata.com/presto
  11. Why is Teradata Contributing to Presto?
     • Hadoop distro agnostic
     • Modern code base
       – Presto is well-designed open source software with proper database architecture
     • Strong, like-minded community
     • Push-down processing across multiple data platforms
     • Leverage Teradata expertise to make SQL for Hadoop viable
  12. Demo Time!
  13. Teradata Contributions to Presto
     • Phase 1, Implement (June 8, 2015): Installer; Documentation; Monitoring & Support Tools
     • Phase 2, Integrate (Q4 2015): Management Tool Integration; YARN Integration; ODBC / JDBC Drivers
     • Phase 3, Proliferate (2016): BI Certification; Security; Connectors
     • Across all phases: Commercial Support; Expanding ANSI SQL Coverage
  14. Teradata's Contributions
     • Ease of install and management via the Presto-Admin tool
       – www.github.com/prestodb/presto-admin
       – Packaging Presto as an RPM
     • Testing framework for Presto
       – www.github.com/prestodb/tempto
       – Added a large number of tests
     • Improvements to the JDBC driver
       – To be open sourced on www.github.com/prestodb soon!
     • Various SQL improvements
  15. Teradata's Contribution Product Roadmap
     • YARN integration
     • Ambari integration
     • ODBC & JDBC drivers that actually work
     • Security: authentication & authorization
     • Continued SQL improvements
     • BI tool certifications, e.g. Tableau
     • More connectors, e.g. HBase
     • Open source our Docker-based dev environment
     • Open our continuous integration platform to the community
  16. How can I contribute?
     • www.github.com/facebook/presto
     • www.github.com/prestodb
     • Certified distro: www.teradata.com/presto
     • Website: www.prestodb.io
     • Presto User's Group: www.groups.google.com/group/presto-users
     • Facebook page: www.facebook.com/prestodb
     • Twitter: #prestodb
  17. Presto 101t certified by Teradata
     • Available for download
       – Presto 101t Server, CLI, JDBC
       – Presto-Admin 0.1
       – Documentation
       – HDP w/ Presto VM Sandbox
       – CDH w/ Presto VM Sandbox
     • www.teradata.com/presto
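
Slides 6 and 7 describe a coordinator that parses, plans, and schedules a query, workers that execute it, and connectors that plug into three APIs (metadata, data location, data stream). The Python sketch below is a toy, single-process model of that division of labor, assuming a simple `SELECT sum(column) FROM table` query. All class and method names (`Connector`, `Split`, `InMemoryConnector`, `Worker`, `Coordinator`, `select_sum`) are hypothetical and are not Presto's actual connector SPI, which is written in Java; the workers here also run sequentially rather than in parallel on separate nodes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Row = Tuple  # hypothetical: a row is a plain tuple of column values


@dataclass(frozen=True)
class Split:
    """A unit of work: one chunk of a table that a single worker reads."""
    table: str
    chunk: int


class Connector(ABC):
    """Hypothetical stand-in for the three pluggable APIs on the slides."""

    @abstractmethod
    def columns(self, table: str) -> List[str]:  # Metadata API
        ...

    @abstractmethod
    def splits(self, table: str) -> List[Split]:  # Data location API
        ...

    @abstractmethod
    def read(self, split: Split) -> Iterator[Row]:  # Data stream API
        ...


class InMemoryConnector(Connector):
    """Toy connector serving tables from a dict, cut into fixed-size splits."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Row]]], chunk_size: int = 2):
        self.tables = tables
        self.chunk_size = chunk_size

    def columns(self, table):
        return self.tables[table][0]

    def splits(self, table):
        rows = self.tables[table][1]
        count = (len(rows) + self.chunk_size - 1) // self.chunk_size  # ceiling division
        return [Split(table, i) for i in range(count)]

    def read(self, split):
        rows = self.tables[split.table][1]
        start = split.chunk * self.chunk_size
        yield from rows[start:start + self.chunk_size]


class Worker:
    def __init__(self, connector: Connector):
        self.connector = connector

    def partial_sum(self, split: Split, col: int) -> int:
        # Partial aggregation over one split, streamed through the Data stream API.
        return sum(row[col] for row in self.connector.read(split))


class Coordinator:
    def __init__(self, connector: Connector, workers: List[Worker]):
        self.connector = connector
        self.workers = workers

    def select_sum(self, table: str, column: str) -> int:
        # Analyze: resolve the column name through the Metadata API.
        col = self.connector.columns(table).index(column)
        # Schedule: hand out splits round-robin, as found via the Data location API.
        splits = self.connector.splits(table)
        partials = [
            self.workers[i % len(self.workers)].partial_sum(s, col)
            for i, s in enumerate(splits)
        ]
        # Final aggregation of the workers' partial results.
        return sum(partials)


conn = InMemoryConnector({"orders": (["id", "amount"], [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])})
coord = Coordinator(conn, [Worker(conn), Worker(conn)])
print(coord.select_sum("orders", "amount"))  # prints 150
```

Swapping in a different `Connector` subclass (for example, one backed by files or a remote database) would leave `Coordinator` and `Worker` unchanged, which is the idea behind the pluggable connector design on slide 7.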

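Slide 8 attributes Presto's speed partly to in-memory, pipelined execution over vectorized columnar data. The Python sketch below illustrates only the shape of that idea, under simplifying assumptions: data moves between operators in column-oriented batches (called pages here), and operators are chained as generators so each batch flows through scan, filter, and aggregation without materializing intermediate results. The function names and the `Page` layout are hypothetical; Python list comprehensions are not true vectorized (SIMD) execution, and Presto's real operators are tuned Java with bytecode generation.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical page layout: a batch of rows stored column by column.
Page = Dict[str, List[int]]


def scan(rows: List[Dict[str, int]], page_size: int) -> Iterator[Page]:
    """Turn row-oriented input into column-oriented pages of page_size rows.
    (Columnar formats such as ORC already store data this way.)"""
    for start in range(0, len(rows), page_size):
        batch = rows[start:start + page_size]
        yield {name: [r[name] for r in batch] for name in batch[0]}


def filter_gt(pages: Iterable[Page], column: str, threshold: int) -> Iterator[Page]:
    """Evaluate the predicate over one whole column, then keep matching positions."""
    for page in pages:
        keep = [i for i, v in enumerate(page[column]) if v > threshold]
        if keep:  # drop pages with no surviving rows
            yield {name: [vals[i] for i in keep] for name, vals in page.items()}


def sum_column(pages: Iterable[Page], column: str) -> int:
    """Aggregate pages one at a time as they arrive from upstream."""
    total = 0
    for page in pages:
        total += sum(page[column])
    return total


rows = [{"id": i, "amount": i * 10} for i in range(1, 11)]
# The generators are chained lazily: each page flows scan -> filter -> aggregate.
result = sum_column(filter_gt(scan(rows, page_size=4), "amount", 45), "amount")
print(result)  # prints 450
```

Processing a column at a time keeps the inner loop over a single list of values, which is the access pattern that tight, JIT-friendly loops in a real engine exploit.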