Presto At LINE
Presto Conference Tokyo 2019
2019/07/11
Wataru Yukawa & Yuya Ebihara
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Data flow
Hadoop
RDBMS
Log
Hadoop
Ingest Data Report
Ranger
Tableau
OASIS
Yanagishima
LINE Analytics
Aquarium
OASIS
● Web-based data analysis platform,
like Apache Zeppelin or Jupyter
● Primarily uses Spark, but can
also use Presto
OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
https://www.slideshare.net/linecorp/oasis-data-analysis-platform-for-multitenant-hadoop-cluster
904 UU
67k PV
LINE Analytics
● Analysis tool similar to Google Analytics
○ Dashboard
○ Basic Summary
○ Realtime
○ Page Contents
○ Event Tracking
○ User Environment
○ Tools
● Backend is Presto
Why LINE's Front-end Development Team Built the Web Tracking System
https://www.slideshare.net/linecorp/why-lines-frontend-development-team-built-the-web-tracking-system
433 UU
7k PV
Aquarium
● Metadata catalog tool
○ Contacts
○ Note
○ Columns
○ Location
○ HDFS
○ Relationship
○ Reference
○ DDL
Efficient And Invincible Big Data Platform In LINE
https://www.slideshare.net/linecorp/efficient-and-invincible-big-data-platform-in-line/25
481 UU
10k PV
Yanagishima
1.3k UU
11k PV
● Web UI for
○ Presto
○ Hive
○ Spark SQL
○ Elasticsearch
● Started in 2015
● There are many similar tools like Hue, Airpal, and Shib,
but I wanted to build my own
https://github.com/yanagishima/yanagishima
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Yanagishima features
● Share query with permanent link
● Handle multiple Presto clusters
● Input parameters
● Pretty print for json & map
● Chart
● Pivot table
● Show EXPLAIN result as Text and Graphviz
● Desktop notification
Input parameters
Pretty print
Chart
● Supported chart types
○ Line
○ Stacked Area
○ Full-Stacked Area
○ Column
○ Stacked Column
Pivot table
Explain
● Text
● Graphviz
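For reference, the two renderings correspond to Presto's EXPLAIN output formats. A minimal sketch (test_table is a hypothetical name):
presto> EXPLAIN SELECT count(*) FROM test_table;                    -- Text
presto> EXPLAIN (FORMAT GRAPHVIZ) SELECT count(*) FROM test_table;  -- Graphviz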
Desktop notification
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Yanagishima Components
API server
○ Written in Java
○ Stores queries in SQLite3 or MySQL
○ Stores query results on the filesystem
○ No built-in authentication; an in-house auth system in a proxy server handles it
○ No built-in authorization; Apache Ranger handles it
SPA
○ Initially written in jQuery
○ Frontend engineers in our department rewrote it in Vue.js in 2017
○ The code is clean and modern thanks to their major refactoring
How to process query
● Asynchronous processing flow
○ User submits query
○ Receives a query id
○ Tracks progress with the query id via client-side polling
○ Users can see progress and kill the query
● Easy to implement thanks to Presto REST API
● Not easy to implement for Hive and Spark due to the lack of such an API
e.g., it's hard to get the YARN application id
Dependency
● Depends on presto-cli rather than JDBC for performance and feature reasons
● Yanagishima needs not only the query result but also the column names in a single Presto request
● DatabaseMetaData#getColumns is slow (more than 10s) due to a system.jdbc.columns table scan
● Presto didn't support JDBC cancel in 2015, but does now
● We chose presto-cli, but it has drawbacks
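For reference, the JDBC driver answers DatabaseMetaData#getColumns with a query against the system.jdbc.columns table, so the slowness can be observed directly in SQL. A minimal sketch (the hive/default/test_part filter values are illustrative):
presto> SELECT table_cat, table_schem, table_name, column_name, type_name
        FROM system.jdbc.columns
        WHERE table_cat = 'hive' AND table_schem = 'default' AND table_name = 'test_part';
-- even a filtered lookup can take >10s because the scan enumerates all column metadata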
Compatibility issue
● Unfortunately, presto-cli >= 0.205 can't connect to old Presto servers because of the ROW type #224
→ Bundled the new & old presto-cli without shading, which works because the package names differ:
io.prestosql & com.facebook.presto
● We may switch to JDBC, since the PSF mentioned in the above issue that it's better not to use presto-cli
Cluster     Version   Workers   Auth
Analysis    315       76        -
Datachain   314       100       LDAP
Shonan      306       36        -
Dataopen    0.197     9         LDAP
Datalake2   0.188     200       LDAP
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Presto error query analysis
SemanticErrorName is available since 313 #790
[Flow diagram: query → Syntactic Analysis (Fail → Syntax Error) → Semantic Analysis (Fail → Semantic Error) → Pass]
Thank you for kind code review 🙏🏻
Classification of USER_ERROR
● Many syntax errors
● A typical semantic error is a user accessing a nonexistent column/schema
SYNTAX_ERROR
● mismatched input ... expecting
● Hive views are not supported
● ...
Semantic Error Name Count
null 743
MISSING_ATTRIBUTE 273
MISSING_SCHEMA 169
MUST_BE_AGGREGATE_OR_GROUP_BY 111
TYPE_MISMATCH 87
FUNCTION_NOT_FOUND 72
MISSING_TABLE 53
MISSING_CATALOG 23
INVALID_LITERAL 4
AMBIGUOUS_ATTRIBUTE 4
CANNOT_HAVE_AGGREGATIONS_WINDOWS_OR_GROUPING 3
INVALID_ORDINAL 2
NOT_SUPPORTED 2
ORDER_BY_MUST_BE_IN_SELECT 2
NESTED_AGGREGATION 1
WINDOW_REQUIRES_OVER 1
REFERENCE_TO_OUTPUT_ATTRIBUTE_WITHIN_ORDER_BY_AGGREGATION 1
Error Name Count
SYNTAX_ERROR 635
NOT_FOUND 47
HIVE_EXCEEDED_PARTITION_LIMIT 26
INVALID_FUNCTION_ARGUMENT 19
INVALID_CAST_ARGUMENT 6
PERMISSION_DENIED 3
SUBQUERY_MULTIPLE_ROWS 3
ADMINISTRATIVELY_KILLED 2
INVALID_SESSION_PROPERTY 1
DIVISION_BY_ZERO 1
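For illustration, minimal queries that fall into the most common buckets above (a sketch, assuming a hypothetical table t(a, b)):
presto> SELECT a FROM t WHEER b = 1;       -- SYNTAX_ERROR (mismatched input)
presto> SELECT no_such_column FROM t;      -- MISSING_ATTRIBUTE
presto> SELECT * FROM no_such_schema.t;    -- MISSING_SCHEMA
presto> SELECT a, count(*) FROM t;         -- MUST_BE_AGGREGATE_OR_GROUP_BY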
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Typical use case in Yanagishima & OASIS
● Execute queries with Presto because of its speed and rich UDFs
○ Users want to run a query quickly and roughly check the data
● Implement batches with Spark SQL in OASIS or with Hive in the console
○ Users want to build stable batches
● Yanagishima is like Gist; OASIS is like GitHub
Why we don’t use Presto in batch
● Lack of Hive metastore impersonation
○ Support Impersonation in Metastore communication #43
● Less stable than Hive or Spark
○ We want to prioritize stability over latency in batches
○ Batches need to handle huge data volumes
Impersonation
● Presto does not support impersonating the end user when accessing the Hive metastore
● SELECT queries are no problem, but CREATE/DROP/ALTER/INSERT/DELETE queries can be
● If the Presto process runs as the presto user and yukawa creates a table, Presto accesses the Hive metastore as the
presto user, not yukawa. This means another user can drop the table if the presto user has write permission
● We use Apache Ranger to deny the presto user write access to HDFS
● HMS impersonation is available in Starburst Distribution of Presto
● Support for impersonation will be a game changer
[Diagram: yukawa submits CREATE TABLE line.ad and ebihara submits DROP TABLE line.ad; both reach the Hive Metastore as the presto user, which holds WRITE permission; Ranger mediates access to Hadoop]
hadoop.proxyuser.presto.groups=*
hadoop.proxyuser.presto.hosts=*
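The diagram's scenario in plain SQL (a sketch; the column definition is hypothetical):
presto> CREATE TABLE line.ad (campaign_id bigint);  -- submitted by yukawa, recorded under the presto user
presto> DROP TABLE line.ad;                         -- submitted by ebihara, also arrives as the presto user,
                                                    -- so it succeeds if the presto user has write permission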
Less stable than Hive/Spark
● A single query/worker crash can be a bottleneck
● An automatic worker-restart mechanism may be necessary
● Presto workers, data nodes, and node managers are deployed on the same machines
● Enabling cgroups may be necessary because, e.g., PySpark Python processes have high CPU usage
● For example, yarn.nodemanager.resource.percentage-physical-cpu-limit: 70%
Hive/Spark batches are more stable, but converting from Presto to Hive/Spark isn't easy due to differences in
date functions, syntax, …
Hard to convert query
Presto                                       Hive / Spark SQL
json_extract_scalar                          get_json_object
date_format(now(), '%Y%m%d')                 date_format(current_timestamp(), 'yyyyMMdd')
cross join unnest () as t ()                 lateral view explode() t
url functions like url_extract_parameter     -
try                                          -
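As a concrete sketch of the mappings above (access_log, payload, tags, and dt are hypothetical names):
-- Presto
SELECT json_extract_scalar(payload, '$.uid') AS uid, tag
FROM access_log
CROSS JOIN UNNEST(tags) AS t (tag)
WHERE dt = date_format(now(), '%Y%m%d');
-- Hive / Spark SQL
SELECT get_json_object(payload, '$.uid') AS uid, tag
FROM access_log
LATERAL VIEW explode(tags) t AS tag
WHERE dt = date_format(current_timestamp(), 'yyyyMMdd');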
Confusing Spark SQL error message
Spark allows “DISTINCT” as a column name (SPARK-27170)
The error message is difficult to understand
It will be improved in Spark 3.0 (SPARK-27901)
cannot resolve '`distinct`' given input columns: [...]; line 1 pos 7;
'GlobalLimit 100
+- 'LocalLimit 100
   +- 'Project ['distinct, ...]
      +- Filter (...)
         +- SubqueryAlias ...
            +- HiveTableRelation ...
SELECT distinct
,a
,b
,c
FROM test_table LIMIT 100
Confusing...💨
DEBUG
Recent Issues
More than 100,000 partitions error occurred in 307 #619
● The fixed version was released within one day
Partition location does not exist in hive external table #620
● Caused by the Hadoop library upgrade from 2.7.7 to 3.2.0
● Ongoing: https://issues.apache.org/jira/browse/HDFS-14466
Create table failed when using viewfs #10099
● It's a known issue and not fatal for now because we use Presto in read-only mode
Handle repeated predicate pushdown into Hive connector #984
● Performance regression; already fixed in 315
Great!
Schema mismatch of parquet file #9156
● Our old cluster hit this issue recently; already fixed in 0.203
Scale
Table # Partitions
Table A 1,588,031
Table B 1,429,047
Table C 1,429,046
Table D 1,116,130
Table E 772,725
● Daily queries: ~20K
● Daily processed data: 330TB
● Daily processed rows: 4 trillion
● Partitions
hive.max-partitions-per-scan (default: 100,000)
Maximum number of partitions for a single table scan
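How the limit surfaces in practice, as a hedged sketch (table_a and its partition column dt are hypothetical):
presto> SELECT count(*) FROM table_a;
-- fails: Query over table 'table_a' can potentially read more than 100000 partitions
presto> SELECT count(*) FROM table_a WHERE dt >= '20190701';
-- a predicate on the partition column prunes the scan below the limit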
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Already fixed in 308
More than 100,000 partitions error
More than 100,000 partitions error occurred in 307 #619
→ Query over table 'default.test_left' can potentially read more than 100000 partitions
at io.prestosql.plugin.hive.HiveMetadata.getPartitionsAsList(HiveMetadata.java:601)
at io.prestosql.plugin.hive.HiveMetadata.getTableLayouts(HiveMetadata.java:1645)
....
Steps to Reproduce
● Start hadoop-master docker image
$ presto-product-tests/conf/docker/singlenode/compose.sh up -d hadoop-master
$ presto-product-tests/conf/docker/singlenode/compose.sh up -d
● Create a table and populate rows
presto> CREATE TABLE test_part (col int, part_col int) with (partitioned_by = ARRAY['part_col']);
presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(1, 100)) AS u(id);
presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(101, 150)) AS u(id);
hive.max-partitions-per-scan=100 in product test
hive.max-partitions-per-writers=100 (default)
● Execute the reproducing query (TestHivePartitionsTable.java)
presto> SELECT a.part_col FROM
(SELECT * FROM test_part WHERE part_col = 1) a, (SELECT * FROM test_part WHERE part_col = 1) b
WHERE a.col = b.col
Frames and Variables
● Migration to remove table layout was ongoing
● “TupleDomain” is one of the key classes involved in predicate pushdown
Fix
● Fixed the EffectivePredicateExtractor.visitTableScan method
Actually, it's a workaround until the migration is completed
● Timeline
○ Created Issue April 11, 4PM
○ Merged commit April 12, 7AM
○ Released 308 April 12, 3PM
Released within one day 🎉
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Webhdfs partition location does not exist
Partitioned webhdfs table throws “Partition location does not exist” error #620
● Webhdfs isn’t supported (at least not tested) due to missing classes #957
→ Add the missing jar to the plugin directory
● Create a table with a webhdfs location in Hive
hive> CREATE TABLE test_part_webhdfs (col1 int) PARTITIONED BY (dt int)
LOCATION 'webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs';
hive> INSERT INTO test_part_webhdfs PARTITION(dt=1) VALUES (1);
presto> SELECT * FROM test_part_webhdfs;
→ Partition location does not exist:
webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs/dt=1
Remote Debugger
● Edit jvm.config
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
● Remote debugger configuration in IntelliJ IDEA
Run→Edit Configurations...→+→Remote
Step into Hadoop library
● The arguments passed to the Hadoop library are the same
○ We can also step into dependent libraries as if they were local code
Different internal call
● Hadoop 2.7.7 (Presto 306)
http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt=1?op=LISTSTATUS&user.name=x
● Hadoop 3.2.0 (Presto 307)
http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt%253D1?op=LISTSTATUS&user.name=x
{
"RemoteException":{
"exception":"FileNotFoundException",
"javaClassName":"java.io.FileNotFoundException",
"message":"File /user/hive/warehouse/test_part/dt%3D1 does not exist."
}
}
HDFS-14466
FileSystem.listLocatedStatus for path including '=' encodes it and returns FileNotFoundException
The equals sign is double-encoded:
dt=1 → dt%3D1 → dt%253D1
HADOOP-16258.001.patch by Iwasaki-san
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Already fixed in 0.203
Schema mismatch of parquet
● Failed to access table created by Spark
presto> SELECT * FROM default.test_parquet WHERE dt='20190101'
Error opening Hive split hdfs://cluster/apps/hive/warehouse/test_parquet/dt=20190101/20190101.snappy.parquet
(offset=503316480, length=33554432):
Schema mismatch, metastore schema for row column col1.element has 13 fields but parquet schema has 12 fields
The problematic column type is ARRAY<STRUCT<...>>
● Hive metastore returns 13 fields
● Parquet schema returns 12 fields
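One plausible way to reach this state, sketched in Hive DDL (field names and the evolution path are assumptions, not taken from the incident): older parquet files are written while the struct has 12 fields, then the metastore schema gains a 13th field, so the old files no longer match the metastore.
hive> CREATE TABLE test_parquet (col1 ARRAY<STRUCT<f01:STRING /* ... f12 */>>)
      PARTITIONED BY (dt STRING) STORED AS PARQUET;
-- data for dt=20190101 is written with 12 struct fields
hive> ALTER TABLE test_parquet CHANGE col1 col1 ARRAY<STRUCT<f01:STRING /* ... f12 */, f13:STRING>>;
-- metastore now declares 13 fields; existing files still contain 12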
parquet-tools
● Supported options
○ cat
○ head
○ schema
○ meta
○ dump
○ merge
○ rowcount
○ size
https://github.com/apache/parquet-mr/tree/master/parquet-tools
● Inspect schema of parquet file
$ parquet-tools schema sample.parquet
message spark_schema {
optional group @metadata {
optional binary beat (UTF8);
optional binary topic (UTF8);
optional binary type (UTF8);
optional binary version (UTF8);
}
optional binary @timestamp (UTF8);
…
Even multi-GB files can be analyzed within a few seconds
Agenda
● USE
○ Our on-premises log analysis platform and tools
○ Yanagishima features
○ Yanagishima internals
○ Presto error query analysis
○ Our Presto/Spark use case with Yanagishima/OASIS
● DEBUG
○ More than 100,000 partitions error
○ Webhdfs partition location does not exist
○ Schema mismatch of parquet
○ Contributions
Contributions
● General
○ Retrieve semantic error name
○ Fix partition pruning regression on 307
○ COMMENT ON TABLE
○ DROP COLUMN
○ Column alias in CTAS
○ Non-ASCII date_format function argument
● Cassandra connector
○ INSERT statement
○ Auto-discover protocol version
○ Materialized View
○ Smallint, tinyint, date types
○ Nested collection type
● Hive connector
○ CREATE TABLE properties
■ textfile_skip_header_line_count
■ textfile_skip_footer_line_count
● MySQL connector
○ Map JSON to Presto JSON
● CLI
○ Output formats
■ JSON
■ CSV_UNQUOTED
■ CSV_HEADER_UNQUOTED
Thanks to PSF, Starburst, and ex-Teradata ✨
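A few of the listed features, sketched as SQL (table and column names are hypothetical; only the property names shown on this slide are assumed):
presto> COMMENT ON TABLE access_log IS 'raw web access log';
presto> CREATE TABLE log_copy (renamed_dt, renamed_path) AS SELECT dt, path FROM access_log;  -- column aliases in CTAS
presto> CREATE TABLE csv_with_header (line varchar)
        WITH (format = 'TEXTFILE', textfile_skip_header_line_count = 1);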
THANK YOU