Presto: Tuning performance of SQL-on-anything analytics
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto has experienced an unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, Presto’s recently introduced cost-based optimizer must account for heterogeneous inputs with differing and often incomplete data statistics. Kamil Bajda-Pawlikowski and Martin Traverso explore this topic and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward.