Trino: A ludicrously fast query engine
Brian Olsen
Developer Advocate @
Trino Contributor
@bitsondatadev
Overview
● Pulsar SQL
● How Pulsar uses Presto
● Trino Overview
○ Hive: The perfect SQL HiveQL solution
○ Presto! Your queries are fast!
○ Trino (formerly known as PrestoSQL)
● Query Federation with Pulsar
● Demo
● Looking Forward
Pulsar SQL
● Is Pulsar a messaging/streaming system or
an event store?
● Segment-centric storage in Apache
BookKeeper
● Historical data (old segments)
can live in bookies or in S3
● Jerry Peng added PIP-19 in Aug 2018
● Adds Pulsar SQL
● https://github.com/apache/pulsar/pull/2265
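As a rough illustration of how a topic shows up in Pulsar SQL, the sketch below maps a topic name to a table reference. It assumes the catalog is named pulsar and that the schema is the quoted "tenant/namespace" pair; the helper topic_to_table is hypothetical and not part of any Pulsar or Trino API.

```python
def topic_to_table(topic: str, catalog: str = "pulsar") -> str:
    """Map a Pulsar topic name to a fully qualified SQL table reference.

    Hypothetical helper for illustration; the catalog name is an assumption.
    """
    # Drop the optional scheme prefix, e.g. "persistent://".
    _, sep, rest = topic.partition("://")
    path = rest if sep else topic

    parts = path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {topic!r}")
    tenant, namespace, name = parts

    # The schema contains a slash, so it has to be double-quoted in SQL.
    return f'{catalog}."{tenant}/{namespace}"."{name}"'


# Build a query against a topic in the public/default namespace.
table = topic_to_table("persistent://public/default/generator_test")
query = f"SELECT * FROM {table} LIMIT 10"
```

The resulting query string can be pasted into any SQL client connected to the cluster.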
How Pulsar uses Presto
● Pulsar stores schemas in a registry
backed by Apache BookKeeper
● Uses Avro definitions for schemas
● Provides structured streams that
allow data to be represented in
tabular format (ANSI compliant)
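To make the "schema to table" idea concrete, here is a minimal sketch that turns an Avro record schema into a list of SQL columns. The type mapping in AVRO_TO_SQL and the function avro_columns are assumptions for illustration; the real connector's mapping may differ and covers more types.

```python
import json
from typing import List, Tuple

# Assumed mapping from Avro primitive types to SQL column types (illustrative only).
AVRO_TO_SQL = {
    "boolean": "boolean",
    "int": "integer",
    "long": "bigint",
    "float": "real",
    "double": "double",
    "string": "varchar",
    "bytes": "varbinary",
}


def avro_columns(schema_json: str) -> List[Tuple[str, str]]:
    """Return (column name, SQL type) pairs for an Avro record schema."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("only Avro record schemas map to a table in this sketch")

    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # A union such as ["null", "string"] marks a nullable field;
        # use the single non-null branch as the column type.
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            avro_type = non_null[0] if len(non_null) == 1 else None
        sql_type = AVRO_TO_SQL.get(avro_type) if isinstance(avro_type, str) else None
        if sql_type is None:
            raise ValueError(f"unsupported type for field {field['name']!r}")
        columns.append((field["name"], sql_type))
    return columns


# Example schema with one required and one nullable field.
example_schema = json.dumps({
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "page", "type": ["null", "string"]},
    ],
})
columns = avro_columns(example_schema)
```

Nested records, arrays, and maps are left out on purpose to keep the sketch short.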
How Pulsar uses Presto...or is it Trino?
● Uses embedded Presto and
its own scripts to manage the
cluster
● March 2020: Issue 6605
switches from prestodb to prestosql
● July 2020: Yuya Ebihara
updates to prestosql v332
● Dec 2020: Trino rebrand from
prestosql
● June 2021: Pulsar community
donates Trino-Pulsar
connector (Marvin Cai) to
Trino
Presto vs Trino
● Wait, isn’t today’s talk about Trino?
● Why are we talking about Presto?
Trino is a fast distributed SQL query engine designed to query large data sets
distributed over one or more heterogeneous data sources.
Trino
Trino TL;DL
Trino Clients Data Sources
Hive: The perfect SQL HiveQL solution
Developers at Facebook created Hive, a SQL-on-Hadoop solution that takes a
SQL-like syntax, HiveQL, and transforms it into MapReduce operations on data in
Hadoop.
● Simplified development process
● Queries still took a long time
● Established a base for how to model
SQL tables of generic data stores
Presto! Your queries are fast!
Martin Traverso, Dain Sundstrom, and David Phillips created Presto in 2012. It
aimed to solve the slow queries of Hive at Facebook, and eventually much more.
Development Philosophies:
● Open source with neutral governance model
● It just works (Netezza was a commercial inspiration)
● Fast, interactive analytics
● Correct results
● Standards-based (ANSI SQL, JDBC, etc.)
Facebook management unilaterally rewrites
rules around committership in late 2018
Trino (formerly known as PrestoSQL)
● If you know Presto or are using PrestoSQL…
○ same software
○ same people
○ same community
○ under a shiny new name
○ and a cute bunny
● Companies don’t run open source projects,
people do.
● More details:
https://trino.io/blog/2020/12/27/announcing-trino.html
Query Federation with Pulsar
Facebook specific queries Open source queries
Query Federation with Pulsar
SPI
Facebook specific connectors Open source connectors
Query Federation with Pulsar (Pulsar Centric)
SPI
Query Federation with Pulsar (Trino Centric)
SPI
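The federation idea behind the SPI can be sketched as a toy model: each data source sits behind the same small interface, and the engine joins rows across sources. This is not Trino's actual SPI; Connector, InMemoryConnector, and federated_join are hypothetical names, and the data is made up for illustration.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Toy stand-in for a connector interface: it only knows how to scan a table."""

    def scan(self, table: str) -> Iterable[Row]: ...


class InMemoryConnector:
    """Serves rows from a dict, standing in for Pulsar, MySQL, or any other source."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, table: str) -> Iterable[Row]:
        return iter(self._tables[table])


def federated_join(left: Connector, left_table: str,
                   right: Connector, right_table: str, key: str) -> List[Row]:
    """Inner hash join across two connectors on a shared key column."""
    # Build a hash index over the right side.
    index: Dict[object, List[Row]] = {}
    for row in right.scan(right_table):
        index.setdefault(row[key], []).append(row)

    # Probe the index with each left row and merge matching rows.
    joined: List[Row] = []
    for row in left.scan(left_table):
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Streaming events from one source joined with reference data from another.
pulsar = InMemoryConnector({"clicks": [
    {"user_id": 1, "page": "/home"},
    {"user_id": 2, "page": "/docs"},
    {"user_id": 3, "page": "/blog"},
]})
mysql = InMemoryConnector({"users": [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Lin"},
]})
result = federated_join(pulsar, "clicks", mysql, "users", "user_id")
```

The engine code never needs to know which system a row came from, which is the point of putting every source behind one interface.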
Demo
Looking Forward
Community
● Trino Slack
○ https://trino.io/slack.html
● Trino Community Broadcast
○ https://trino.io/broadcast/
● Trino Virtual Meetups
○ https://www.meetup.com/trino-americas
○ https://www.meetup.com/trino-emea
○ https://www.meetup.com/trino-apac
● Contribute to the project
○ https://trino.io/development/
● Write blogs or docs
○ https://trino.io/blog/
○ https://github.com/trinodb/trino/tree/master/docs
Thank you
Please give us a ⭐ on
github.com/trinodb/trino
@bitsondatadev

Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
