Reading The Source Code of Presto

Presentation at Presto Conference Tokyo 2019

  1. Reading Source Code of Presto ● Taro L. Saito, Dongmin Yu (Arm Treasure Data) ● Presto Conference Tokyo 2019, June 11th, 2019 ● Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
  2. About Me: Taro L. Saito (Leo) ● Principal Software Engineer at Arm Treasure Data ● Building a distributed query engine service ● Living in the US for 4 years ● DBMS & Data Science Background ● Ph.D. in Computer Science ● OSS Projects around DBMS ● snappy-java: a compression library used in Spark, Parquet, etc. ● sqlite-jdbc ● msgpack-java ■ MessagePack implementation for Java
  3. New Release from O’Reilly Japan ● “Designing Data-Intensive Applications” ● By Martin Kleppmann ● Techniques and concepts around distributed data processing systems ● A Japanese translation will be available on July 18, 2019 ● Pre-order at: ■ Amazon.co.jp ■ O’Reilly Japan ● The translation of the definitive introduction to distributed data systems goes on sale next month

  4. Today’s Goals ● Learn How To Start Reading Presto’s Source Code ● GitHub ■ prestosql: https://github.com/prestosql/presto ● Note: prestodb is an old repo maintained by Facebook ● Find Your Own Interests And Learn Where To Look: ● SQL on Everything ■ Using Presto as an SQL interface to your own data sources (connectors) ● Query Engine Core ■ Learn how to implement query engines ● Distributed Systems ■ Learn how to implement HTTP-based distributed systems ● Using Presto ■ Presto clients, Presto’s REST protocol ● Extending Presto ■ e.g., Adding new UDFs
  5. Presto: SQL On Everything ● ICDE 2019 Paper ● Architecture overview and the details of the system design
  6. Navigating Code
  7. Setting Up IntelliJ IDEA ● Learn Useful Shortcuts ● Source Code Navigation ● Shift x 2 ■ Search everything ● Go to declaration ■ Ctrl + Click ● Quick definition ■ Ctrl + Shift + I ● Find usages of functions and classes ● Type Hierarchies ■ Ctrl + H ● Bookmarks
  8. Type Hierarchy (Ctrl + H)
  9. Bookmarks
  10. Connector: SQL on Everything
  11. Connector: SQL on Everything ● Presto Connectors (plug-ins) ● Enable processing SQL queries for various data sources ● Implement presto-spi interfaces ● Connector interface ● presto-hive ● A full-fledged connector using almost all SPI features ● Difficult to understand for beginners ● presto-base-jdbc ● A relatively easy connector to read ● Base of various DBMS adapters ■ presto-postgresql, presto-mysql, etc.
  12. presto-base-jdbc connector
  13. Google Guice: Dependency Injection Library ● xxxModule classes define the bindings that are injected into constructors annotated with @Inject
  14. Presto Coordinator Module ● You can learn what classes are used for the coordinator
  15. Reading Data From Data Sources ● Record/Page based readers ● RecordCursor interface ● isNull ● getType(field) ● getXXX(field) ● Mapping to Presto Data Types ● boolean ● long ● double ● Slice (UTF-8 string) ● Object ■ array, map, etc.
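The cursor contract on this slide can be sketched in plain Java. This is a minimal illustration, not the presto-spi interface: `SimpleCursor`, `InMemoryCursor`, and `sumLongColumn` are hypothetical names, `String` stands in for `Slice`, and the row-advancing method is an assumption about how such a cursor is driven.

```java
import java.util.Arrays;
import java.util.List;

class CursorSketch {
    // Hypothetical, trimmed-down cursor; the real RecordCursor has more methods.
    interface SimpleCursor {
        boolean advanceNextPosition();   // move to the next row; false when exhausted
        boolean isNull(int field);
        long getLong(int field);
        String getString(int field);     // Presto uses Slice for UTF-8 strings
    }

    // Cursor over rows held in memory, one Object[] per row.
    static final class InMemoryCursor implements SimpleCursor {
        private final List<Object[]> rows;
        private int position = -1;

        InMemoryCursor(List<Object[]> rows) {
            this.rows = rows;
        }

        @Override public boolean advanceNextPosition() {
            position++;
            return position < rows.size();
        }

        @Override public boolean isNull(int field) {
            return rows.get(position)[field] == null;
        }

        @Override public long getLong(int field) {
            return ((Number) rows.get(position)[field]).longValue();
        }

        @Override public String getString(int field) {
            return (String) rows.get(position)[field];
        }
    }

    // Drains the cursor and sums one long column, skipping NULL values.
    static long sumLongColumn(SimpleCursor cursor, int field) {
        long sum = 0;
        while (cursor.advanceNextPosition()) {
            if (!cursor.isNull(field)) {
                sum += cursor.getLong(field);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {1L, "a"},
                new Object[] {null, "b"},
                new Object[] {41L, null});
        System.out.println(sumLongColumn(new InMemoryCursor(rows), 0)); // prints 42
    }
}
```

The caller checks `isNull` before calling a typed getter, which is why the primitive getters never have to encode NULL themselves.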
  16. Example ● JdbcRecordCursor ● Steps ● Connect to JDBC ● Prepare Column Readers ● Build SQL to run with JDBC ● Read JDBC ResultSets
  17. TupleDomain ● Build SELECT statements for JDBC queries ● Presto provides: ● Projection ■ columns to select ● TupleDomain ● ColumnDomain ■ predicates ○ == ○ <, <=, >=, > ○ in (....) ○ null / not null ○ all
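A rough sketch of how a projection plus per-column predicates could turn into a parameterized SELECT statement. Everything here is hypothetical and simplified: `Constraint` and `buildSql` are illustrative names rather than Presto classes, each column carries a single predicate, identifiers are not quoted, and a column with no constraint ("all") is simply absent from the map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WhereClauseSketch {
    // Hypothetical per-column constraint: comparison, IN list, or null check.
    static final class Constraint {
        final String op;            // "=", "<", "<=", ">", ">=", "IN", "IS NULL", "IS NOT NULL"
        final List<Object> values;  // bound later through JDBC placeholders

        Constraint(String op, Object... values) {
            this.op = op;
            this.values = Arrays.asList(values);
        }
    }

    // Builds "SELECT cols FROM table [WHERE ...]" and appends bind values in placeholder order.
    static String buildSql(String table, List<String> projection,
                           Map<String, Constraint> domains, List<Object> bindValues) {
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", projection))
                .append(" FROM ").append(table);
        List<String> conjuncts = new ArrayList<>();
        for (Map.Entry<String, Constraint> entry : domains.entrySet()) {
            String column = entry.getKey();
            Constraint c = entry.getValue();
            if (c.op.startsWith("IS")) {
                conjuncts.add(column + " " + c.op);           // null / not null: no bind value
            } else if (c.op.equals("IN")) {
                String marks = String.join(", ", Collections.nCopies(c.values.size(), "?"));
                conjuncts.add(column + " IN (" + marks + ")");
                bindValues.addAll(c.values);
            } else {
                conjuncts.add(column + " " + c.op + " ?");    // =, <, <=, >=, >
                bindValues.add(c.values.get(0));
            }
        }
        if (!conjuncts.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conjuncts));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, Constraint> domains = new LinkedHashMap<>();
        domains.put("age", new Constraint(">=", 20));
        domains.put("country", new Constraint("IN", "JP", "US"));
        List<Object> binds = new ArrayList<>();
        System.out.println(buildSql("users", Arrays.asList("id", "name"), domains, binds));
        System.out.println(binds);
    }
}
```

Using `?` placeholders keeps the values out of the SQL text, so they can be bound through a JDBC `PreparedStatement` instead of being concatenated.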
  18. Reading Column Data ● Convert External JDBC Results into Presto Column Data
  19. Writing Column Data (PageSink) ● Page ● Presto’s internal in-memory data format ● Used for sending intermediate query results (table structure = relation) ● Page has multiple Blocks ■ columnar format ● Block ● column data of the same type ● indexed by position, from 0 until the position count ● PageSink ● Receives Page ● appendPage(page) ● presto-base-jdbc ● Page -> INSERT INTO SQL statements
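The relationship between a columnar page and row-wise INSERT statements can be sketched as follows. `MiniPage` and `InsertCollectingSink` are hypothetical stand-ins, not Presto's `Page` or `PageSink`; the sink only collects the row values it would bind, rather than talking to a database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PageSinkSketch {
    // Hypothetical columnar page: blocks[channel][position], one block per column.
    static final class MiniPage {
        final Object[][] blocks;
        final int positionCount;

        MiniPage(int positionCount, Object[]... blocks) {
            for (Object[] block : blocks) {
                if (block.length != positionCount) {
                    throw new IllegalArgumentException("All blocks must have positionCount entries");
                }
            }
            this.positionCount = positionCount;
            this.blocks = blocks;
        }
    }

    // Turns each position of a page into one row for a parameterized INSERT.
    static final class InsertCollectingSink {
        final String insertSql;
        final List<List<Object>> rows = new ArrayList<>();

        InsertCollectingSink(String table, int columnCount) {
            String marks = String.join(", ", Collections.nCopies(columnCount, "?"));
            this.insertSql = "INSERT INTO " + table + " VALUES (" + marks + ")";
        }

        void appendPage(MiniPage page) {
            // Pivot from columnar to row-wise: read the same position from every block.
            for (int position = 0; position < page.positionCount; position++) {
                List<Object> row = new ArrayList<>();
                for (Object[] block : page.blocks) {
                    row.add(block[position]);
                }
                rows.add(row);
            }
        }
    }

    public static void main(String[] args) {
        MiniPage page = new MiniPage(2, new Object[] {1L, 2L}, new Object[] {"a", "b"});
        InsertCollectingSink sink = new InsertCollectingSink("target_table", 2);
        sink.appendPage(page);
        System.out.println(sink.insertSql);
        System.out.println(sink.rows);
    }
}
```

In a JDBC-backed sink, each collected row would typically be bound to a `PreparedStatement` and added to a batch; that part is left out here.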
  21. Query Engine Core
  22. Query Engine Core: Query Execution Flow
  23. Query Engine Core: Parsing SQL ● ANTLR4 Grammar (SqlBase.g4) ● SQL-92 syntax ● Also used in Spark SQL ● SqlBaseLexer/Parser: ● Generated by ANTLR4 ● SQL -> ANTLR parse tree ● SqlParser ● AstBuilder ■ Visitor pattern for ANTLR parse tree ■ Generates SQL tree for Presto: Statement
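The visitor pattern mentioned on this slide can be illustrated on a tiny hand-written tree. This sketch does not use ANTLR or any Presto class: `Node`, `Visitor`, `Query`, `Table`, and `TableNameCollector` are hypothetical names chosen only to show the double-dispatch shape that a tree-walking builder relies on.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class VisitorSketch {
    // Every node dispatches to the visitor method that matches its own type.
    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    interface Visitor<R> {
        R visitQuery(Query query);
        R visitTable(Table table);
    }

    static final class Table implements Node {
        final String name;

        Table(String name) {
            this.name = name;
        }

        @Override public <R> R accept(Visitor<R> visitor) {
            return visitor.visitTable(this);
        }
    }

    // A query selects columns from either a table or a nested query.
    static final class Query implements Node {
        final List<String> columns;
        final Node from;

        Query(List<String> columns, Node from) {
            this.columns = columns;
            this.from = from;
        }

        @Override public <R> R accept(Visitor<R> visitor) {
            return visitor.visitQuery(this);
        }
    }

    // Example visitor: walks down the FROM chain and returns the base table name.
    static final class TableNameCollector implements Visitor<List<String>> {
        @Override public List<String> visitQuery(Query query) {
            return query.from.accept(this);
        }

        @Override public List<String> visitTable(Table table) {
            return Collections.singletonList(table.name);
        }
    }

    public static void main(String[] args) {
        Node ast = new Query(Arrays.asList("id"),
                new Query(Arrays.asList("id", "name"), new Table("orders")));
        System.out.println(ast.accept(new TableNameCollector())); // prints [orders]
    }
}
```

Adding a new traversal means writing another `Visitor` implementation; the node classes stay unchanged.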
  24. Analyzer ● Traverse Statement structure ● Resolve actual column names and types in SQL ● Using Metadata (table schema provider) ● e.g., find actual column names accessed in SELECT *
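As a small illustration of the `SELECT *` example, the sketch below expands select items against a table schema. It assumes a hypothetical metadata map from table name to ordered column names; `resolveSelectItems` is an illustrative helper, and the real Analyzer also resolves types, scopes, and much more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnalyzerSketch {
    // Resolves select items to concrete column names using the table's schema.
    static List<String> resolveSelectItems(List<String> items, String table,
                                           Map<String, List<String>> metadata) {
        List<String> columns = metadata.get(table);
        if (columns == null) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        List<String> resolved = new ArrayList<>();
        for (String item : items) {
            if (item.equals("*")) {
                resolved.addAll(columns);          // SELECT * expands to every column
            } else if (columns.contains(item)) {
                resolved.add(item);
            } else {
                throw new IllegalArgumentException("Column not found: " + item);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("users", Arrays.asList("id", "name", "age"));
        System.out.println(resolveSelectItems(Arrays.asList("*"), "users", metadata)); // prints [id, name, age]
    }
}
```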
  25. SqlQueryExecution ● Analyze ● Generates a logical SQL plan (Plan) ● Apply logical plan optimizers ● DistributedPlan ● Split query stages into multiple tasks ● Assign worker nodes to use
  26. LocalExecutionPlanner ● Running at worker nodes ● Optimization ● Create a compiled operator (Java bytecode) ● Example: ● Generates predicate/projection evaluation code during table scan
  27. Further Reading: Anatomy of Presto ● By Dongmin Yu (Arm Treasure Data) ● https://www.slideshare.net/dongminyu/presto-anatomy ● How Presto generates bytecode for query processing
  28. Using Presto
  29. Using Presto: Presto REST Protocol (v1) ● POST /v1/statement ● body: SQL query text ● receive: QueryResults data with nextUri ● Headers ■ X-Presto-User, X-Presto-Schema, X-Presto-Session, X-Presto-Client-Tags ● GET /v1/statement/(query_id)/(page token) ● nextUri, table data, query stage stats ● Keep reading until nextUri becomes null ● QueryResults model class ● Represented in JSON ■ Jackson JSON object mapper ● Error Handling ● Standard errors (e.g., SQL syntax errors) ■ 200: Error Response ■ 503: (Server slowdown), retry in 50-100 ms
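The polling loop described on this slide (POST once, then follow nextUri until it is null) can be sketched without a network. `Results`, `Transport`, `replay`, and the URIs below are hypothetical; a real client would perform HTTP calls, send headers such as X-Presto-User, parse JSON, and retry on 503, all of which is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class StatementClientSketch {
    // Hypothetical, trimmed-down view of one QueryResults response.
    static final class Results {
        final String nextUri;           // null once the query has finished
        final List<List<Object>> data;  // may be null while the query is still starting

        Results(String nextUri, List<List<Object>> data) {
            this.nextUri = nextUri;
            this.data = data;
        }
    }

    // Abstracts the two HTTP calls so the loop can run against a fake server.
    interface Transport {
        Results post(String uri, String sql);
        Results get(String uri);
    }

    // Submits the query, then keeps fetching nextUri until it becomes null.
    static List<List<Object>> run(Transport http, String server, String sql) {
        List<List<Object>> rows = new ArrayList<>();
        Results results = http.post(server + "/v1/statement", sql);
        while (true) {
            if (results.data != null) {
                rows.addAll(results.data);
            }
            if (results.nextUri == null) {
                return rows;
            }
            results = http.get(results.nextUri);
        }
    }

    // Fake transport that replays canned responses in order.
    static Transport replay(Results... pages) {
        final Iterator<Results> it = Arrays.asList(pages).iterator();
        return new Transport() {
            @Override public Results post(String uri, String sql) { return it.next(); }
            @Override public Results get(String uri) { return it.next(); }
        };
    }

    public static void main(String[] args) {
        List<List<Object>> first = new ArrayList<>();
        first.add(Arrays.<Object>asList(1, "a"));
        List<List<Object>> last = new ArrayList<>();
        last.add(Arrays.<Object>asList(2, "b"));
        Transport http = replay(
                new Results("http://localhost:8080/v1/statement/q1/1", null),
                new Results("http://localhost:8080/v1/statement/q1/2", first),
                new Results(null, last));
        System.out.println(run(http, "http://localhost:8080", "SELECT 1"));
    }
}
```

Keeping the transport behind an interface makes the nextUri loop easy to exercise in isolation before wiring in a real HTTP client.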
  30. Presto UDFs ● User-Defined Functions ● Mapping Java functions to SQL functions ● FunctionRegistry
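The idea of mapping Java functions to SQL function names can be sketched with a small annotation-driven registry. The `@SqlFunction` annotation, `UdfRegistrySketch`, and `MyFunctions` are all hypothetical and defined only for this example; they are not Presto's annotations or its FunctionRegistry, which also handle types, overloads, and code generation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class UdfRegistrySketch {
    // Hypothetical marker: exposes a static Java method under an SQL function name.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SqlFunction {
        String value();
    }

    // Example function holder; only annotated methods are registered.
    static final class MyFunctions {
        @SqlFunction("double_it")
        public static long doubleIt(long x) {
            return x * 2;
        }
    }

    private final Map<String, Method> functions = new HashMap<>();

    // Scans a class and registers every method carrying @SqlFunction.
    void register(Class<?> holder) {
        for (Method method : holder.getDeclaredMethods()) {
            SqlFunction annotation = method.getAnnotation(SqlFunction.class);
            if (annotation != null) {
                functions.put(annotation.value(), method);
            }
        }
    }

    // Looks up a function by its SQL name and invokes the static Java method.
    Object call(String sqlName, Object... args) {
        Method method = functions.get(sqlName);
        if (method == null) {
            throw new IllegalArgumentException("Function not registered: " + sqlName);
        }
        try {
            return method.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to invoke " + sqlName, e);
        }
    }

    public static void main(String[] args) {
        UdfRegistrySketch registry = new UdfRegistrySketch();
        registry.register(MyFunctions.class);
        System.out.println(registry.call("double_it", 21L)); // prints 42
    }
}
```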
  31. Presto As A Distributed System
  32. Presto As A Distributed System Implementation ● Airlift ● Presto’s internal framework for building REST services ● https://github.com/airlift/airlift ● REST API definitions ● xxxResource classes ● JAX-RS annotations ■ @Path, @GET, @POST ● JSON protocol (Jackson) ● HTTP Services ● coordinator/worker ● discovery service ● JMX - JSON server ● Utilities ● Guice extension ■ bootstrap, configuration ● logger, units
  33. Summary ● Learned various flavors of Presto and the corresponding code locations ● SQL on Everything ■ Presto connectors ● Query Engine Core ■ presto-main ● Distributed Systems ■ airlift modules ● Presto as a REST service (Presto client) ■ query protocol ● Extending Presto ■ e.g., Adding new UDFs ● Enjoy Reading Presto’s Code For Your Own Interest!
  34. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
