Presto changes
July 19, 2016: Presto meetup
Masahiro Nakagawa
Ecosystem
• Teradata starts enterprise support
• http://www.teradata.jp/products-and-services/presto-download/
• Airpal
• http://airbnb.io/airpal/
• Presto Admin
• https://github.com/prestodb/presto-admin
• Users: https://github.com/prestodb/presto/wiki/Presto-Users
New data types
• DECIMAL (v0.145)
• VARCHAR(N) (v0.137)
• INTEGER (v0.145)
• Uses only 4 bytes, unlike the 8-byte BIGINT
• SMALLINT, TINYINT
• FLOAT (v0.149)
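The 4-byte versus 8-byte point can be made concrete with a small Python sketch. It only models raw fixed-width storage; the helper names `column_bytes` and `fits_in_integer` are hypothetical, and real Presto memory use also depends on null handling and block encoding, which are ignored here.

```python
import struct

# Standard-size signed integer layouts: "i" is 4 bytes (like INTEGER),
# "q" is 8 bytes (like BIGINT). These are stand-ins, not Presto internals.
INTEGER_FORMAT = "<i"
BIGINT_FORMAT = "<q"


def column_bytes(row_count: int, fmt: str) -> int:
    """Raw size of a column of fixed-width values, ignoring nulls and encoding."""
    return row_count * struct.calcsize(fmt)


def fits_in_integer(value: int) -> bool:
    """True if the value is inside the signed 32-bit range."""
    return -2**31 <= value <= 2**31 - 1


rows = 1_000_000
print(column_bytes(rows, INTEGER_FORMAT))  # 4000000
print(column_bytes(rows, BIGINT_FORMAT))   # 8000000
print(fits_in_integer(2**31 - 1))          # True
print(fits_in_integer(2**31))              # False
```

The second pair of calls shows the trade-off: values beyond the signed 32-bit range still need BIGINT.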
New Connectors
• System Connector (v0.100)
• BlackHoleConnector (v0.108)
• Redis Connector (v0.119)
• Tableau Web Connector (v0.126)
• New Parquet Reader for Hive (v0.138)
• MongoDBConnector (v0.146)
• LocalFileConnector (v0.147)
New syntax
• CREATE TABLE (v0.101)
• DELETE (v0.107)
• TRY (v0.140)
• CUBE, ROLLUP, GROUPING SETS (v0.142)
• https://prestodb.io/docs/current/sql/select.html#complex-grouping-operations
• EXPLAIN ANALYZE (v0.144)
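As a rough illustration of the complex grouping operations, the Python sketch below expands ROLLUP and CUBE into the equivalent grouping sets and sums a value once per set. It emulates the standard SQL semantics in plain Python and does not call Presto; the function names and the sample rows are hypothetical.

```python
from itertools import combinations


def rollup(columns):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube(columns):
    """CUBE(a, b) expands to every subset: (a, b), (a), (b), ()."""
    return [s for n in range(len(columns), -1, -1) for s in combinations(columns, n)]


def grouped_sums(rows, columns, grouping_sets, value_key):
    """Sum value_key once per grouping set.

    Columns that are not part of the current grouping set are reported
    as None, mirroring the NULLs a SQL engine would return.
    """
    results = []
    for grouping in grouping_sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] if c in grouping else None for c in columns)
            totals[key] = totals.get(key, 0) + row[value_key]
        results.extend(totals.items())
    return results


# Hypothetical sample data.
SALES = [
    {"region": "east", "device": "web", "n": 3},
    {"region": "east", "device": "app", "n": 2},
    {"region": "west", "device": "web", "n": 5},
]

print(rollup(["region", "device"]))
# [('region', 'device'), ('region',), ()]
for key, total in grouped_sums(SALES, ["region", "device"], rollup(["region", "device"]), "n"):
    print(key, total)
```

With ROLLUP the output contains the per-pair sums, one subtotal per region, and a grand total of 10; swapping in `cube(...)` adds the per-device subtotals.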
New features
• String functions assume input is UTF-8 (v0.102)
• Transaction support in SQL / Connector (v0.132)
• Dynamic split concurrency (v0.139)
• non-correlated subquery in aggregation query (v0.142)
• colocated join (v0.147)
• non-equi outer joins (v0.148)
Execution Planner - Split
• Figure: a query plan running on Worker 1 and Worker 2. On each worker: Table scan → Partial aggr → Sink, then Exchange → Final aggr → Sink. Both workers feed an Exchange → Output.
• Table scan: many splits / task = many threads / worker
• 1 split / task = 1 thread / worker
• 1 split / worker = 1 thread / worker
Configuration
• node-scheduler.network-topology (v0.129)
• legacy or flat; flat considers the work queue when scheduling splits
• query.max-cpu-time (v0.143)
• Limit CPU usage of a query
Operation
• Worker: Graceful shutdown (v0.128)
• resource groups (v0.147)
• Isolates resources and query queues for different workloads
• https://prestodb.io/docs/current/admin/resource-groups.html
• G1GC is recommended (v0.148)
New functions (1)
• String
• normalize, lpad, rpad, split_to_map, substring, from_utf8, to_utf8, split
• Math
• truncate, width_bucket, sign, to_base, from_base, degrees, radians
• Array
• array_distinct, array_intersect, array_position, array_join, array_min,
array_max, slice, element_at, flatten, sequence, array_remove
• Map
• element_at, map_concat
New functions (2)
• Aggregation
• map_agg, bool_and, bool_or, array_agg, geometric_mean, map_union,
multimap_agg, checksum, histogram, covar_pop, covar_samp, corr,
regr_slope, regr_intercept
• Binary
• md5, sha1, sha256, sha512
• Bitwise
• bit_count, bitwise_not, bitwise_and, bitwise_or, bitwise_xor
• URL
• url_encode, url_decode
New functions (3)
• Datetime
• date_format, date_parse, from_iso8601_timestamp,
from_iso8601_date, to_iso8601, year_of_week
• Regexp
• regexp_split, regexp_extract_all
Next features
• https://github.com/prestodb/presto/wiki/Roadmap
• Prepared statements
• New optimizer
• Materialized query tables
• Spill to disk
• HTTP/2
• etc…
