Presto At LINE (Presto Conference Tokyo 2019)

1. Presto At LINE, Presto Conference Tokyo 2019, 2019/07/11, Wataru Yukawa & Yuya Ebihara
2. Agenda
   ● USE
     ○ Our on-premises log analysis platform and tools
     ○ Yanagishima features
     ○ Yanagishima internals
     ○ Presto error query analysis
     ○ Our Presto/Spark use case with Yanagishima/OASIS
   ● DEBUG
     ○ More than 100,000 partitions error
     ○ Webhdfs partition location does not exist
     ○ Schema mismatch of parquet
     ○ Contributions
3. Data flow (diagram): Log and RDBMS data are ingested into Hadoop; Tableau, OASIS, Yanagishima, LINE Analytics, and Aquarium report on it, with access control by Ranger
4. OASIS (904 UU, 67k PV)
   ● Web-based data analysis platform, like Apache Zeppelin or Jupyter
   ● Primarily uses Spark, but can also use Presto
   OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
   https://www.slideshare.net/linecorp/oasis-data-analysis-platform-for-multitenant-hadoop-cluster
5. LINE Analytics (433 UU, 7k PV)
   ● Analysis tool similar to Google Analytics
     ○ Dashboard, Basic Summary, Realtime, Page Contents, Event Tracking, User Environment, Tools
   ● Backend is Presto
   Why LINE's Front-end Development Team Built the Web Tracking System
   https://www.slideshare.net/linecorp/why-lines-frontend-development-team-built-the-web-tracking-system
6. Aquarium (481 UU, 10k PV)
   ● Metadata catalog tool
     ○ Contacts, Note, Columns, Location, HDFS, Relationship, Reference, DDL
   Efficient And Invincible Big Data Platform In LINE
   https://www.slideshare.net/linecorp/efficient-and-invincible-big-data-platform-in-line/25
7. Yanagishima (1.3k UU, 11k PV)
   ● Web UI for Presto, Hive, Spark SQL, and Elasticsearch
   ● Started in 2015
   ● Many similar tools exist (Hue, Airpal, Shib), but I wanted to build my own
   https://github.com/yanagishima/yanagishima
8. Agenda
9. Yanagishima features
   ● Share query with a permanent link
   ● Handle multiple Presto clusters
   ● Input parameters
   ● Pretty print for JSON & map
   ● Chart
   ● Pivot table
   ● Show EXPLAIN result as text and Graphviz
   ● Desktop notification
10. Input parameters (screenshot)
11. Pretty print (screenshot)
12. Chart
   ● Supported chart types: Line, Stacked Area, Full-Stacked Area, Column, Stacked Column
13. Pivot table (screenshot)
14. Explain (screenshots)
   ● Text
   ● Graphviz
15. Desktop notification (screenshot)
16. Agenda
17. Yanagishima components
   ● API server
     ○ Written in Java
     ○ Stores queries in SQLite3 or MySQL
     ○ Stores query results on the filesystem
     ○ No built-in authentication; an in-house auth system runs in a proxy server
     ○ No built-in authorization; we use Apache Ranger
   ● SPA
     ○ Written in jQuery at first
     ○ Frontend engineers in our department replaced it with Vue.js in 2017
     ○ Clean and modern code thanks to their major refactoring
18. How to process a query
   ● Asynchronous processing flow
     ○ User submits a query
     ○ Client gets a query id
     ○ Client tracks the query id by polling
     ○ User can see progress and kill the query
   ● Easy to implement thanks to the Presto REST API
   ● Not easy to implement in Hive and Spark due to the lack of such an API
     (e.g., it is not easy to get the YARN application id)
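   As a rough illustration of this polling flow, here is a minimal sketch against the Presto REST API; the coordinator URL and user header value are assumptions for illustration, and JSON parsing is elided.

   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;

   // Minimal sketch of the asynchronous flow Yanagishima relies on:
   // submit a statement, then poll nextUri for progress.
   public class PrestoRestSketch {
       public static void main(String[] args) throws Exception {
           HttpClient client = HttpClient.newHttpClient();

           // 1. Submit the query. The response JSON carries "id" (the query id),
           //    "nextUri" for polling, and "stats" for progress.
           HttpRequest submit = HttpRequest.newBuilder()
                   .uri(URI.create("http://presto-coordinator:8080/v1/statement")) // assumed host
                   .header("X-Presto-User", "yanagishima")                         // assumed user
                   .POST(HttpRequest.BodyPublishers.ofString("SELECT 1"))
                   .build();
           String body = client.send(submit, HttpResponse.BodyHandlers.ofString()).body();

           // 2. A client keeps GET-ing "nextUri" until it disappears; each response
           //    reports progress, and a DELETE on the query URI kills the query.
           System.out.println(body);
       }
   }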
19. Dependency
   ● Depends on presto-cli, not JDBC, for performance and feature reasons
   ● Yanagishima needs not only the query result but also the column names in one Presto request
   ● DatabaseMetaData#getColumns is slow (more than 10s) due to the system.jdbc.columns table scan
   ● Presto didn't support JDBC cancel in 2015, but supports it now
   ● We chose presto-cli, but it has drawbacks
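   For reference, the slow metadata path looks roughly like this with the Presto JDBC driver; the host, schema, and table names are placeholders.

   import java.sql.Connection;
   import java.sql.DatabaseMetaData;
   import java.sql.DriverManager;
   import java.sql.ResultSet;

   public class ColumnMetadataSketch {
       public static void main(String[] args) throws Exception {
           // Placeholder coordinator and catalog; any Presto JDBC URL behaves the same way.
           Connection conn = DriverManager.getConnection(
                   "jdbc:presto://presto-coordinator:8080/hive", "yanagishima", null);
           DatabaseMetaData meta = conn.getMetaData();
           // This scans system.jdbc.columns under the hood,
           // which is why it can take 10s+ on a large catalog.
           try (ResultSet rs = meta.getColumns(null, "default", "access_log", null)) {
               while (rs.next()) {
                   System.out.println(rs.getString("COLUMN_NAME"));
               }
           }
       }
   }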
20. Compatibility issue
   ● Unfortunately, presto-cli >= 0.205 can't connect to old Presto servers because of the ROW type (#224)
     → Bundled new & old presto-cli without shading, since the package names differ:
       io.prestosql & com.facebook.presto
   ● May switch to JDBC, because PSF advises against using presto-cli this way in the issue above

   Cluster    Version  Workers  Auth
   Analysis   315      76       -
   Datachain  314      100      LDAP
   Shonan     306      36       -
   Dataopen   0.197    9        LDAP
   Datalake2  0.188    200      LDAP
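   A hypothetical sketch of how such dual bundling can be dispatched at runtime; the class names below are assumptions for illustration, not Yanagishima's actual code.

   // Hypothetical sketch: picking between the two bundled presto-cli packages
   // by server version. They can coexist on the classpath without shading
   // because the package names differ.
   public class CliDispatchSketch {
       static String cliMainClassFor(int serverVersion) {
           return serverVersion >= 205
                   ? "io.prestosql.cli.Presto"          // new package (assumed entry point)
                   : "com.facebook.presto.cli.Presto";  // old package (assumed entry point)
       }

       public static void main(String[] args) {
           System.out.println(cliMainClassFor(315)); // io.prestosql.cli.Presto
           System.out.println(cliMainClassFor(197)); // com.facebook.presto.cli.Presto
       }
   }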
21. Agenda
22. Analyzing Presto error queries
   ● SemanticErrorName is available since 313 (#790)
   ● Flow (diagram): Syntactic Analysis (fail → Syntax Error) → Semantic Analysis (fail → Semantic Error) → Pass
   Thank you for the kind code review 🙏🏻
23. Classification of USER_ERROR
   ● Many syntax errors
   ● A typical semantic error is a user accessing a nonexistent column/schema

   Error Name                      Count
   SYNTAX_ERROR                    635
   NOT_FOUND                       47
   HIVE_EXCEEDED_PARTITION_LIMIT   26
   INVALID_FUNCTION_ARGUMENT       19
   INVALID_CAST_ARGUMENT           6
   PERMISSION_DENIED               3
   SUBQUERY_MULTIPLE_ROWS          3
   ADMINISTRATIVELY_KILLED         2
   INVALID_SESSION_PROPERTY        1
   DIVISION_BY_ZERO                1

   SYNTAX_ERROR examples:
   ● mismatched input ... expecting
   ● Hive views are not supported
   ● ...

   Semantic Error Name                                        Count
   null                                                       743
   MISSING_ATTRIBUTE                                          273
   MISSING_SCHEMA                                             169
   MUST_BE_AGGREGATE_OR_GROUP_BY                              111
   TYPE_MISMATCH                                              87
   FUNCTION_NOT_FOUND                                         72
   MISSING_TABLE                                              53
   MISSING_CATALOG                                            23
   INVALID_LITERAL                                            4
   AMBIGUOUS_ATTRIBUTE                                        4
   CANNOT_HAVE_AGGREGATIONS_WINDOWS_OR_GROUPING               3
   INVALID_ORDINAL                                            2
   NOT_SUPPORTED                                              2
   ORDER_BY_MUST_BE_IN_SELECT                                 2
   NESTED_AGGREGATION                                         1
   WINDOW_REQUIRES_OVER                                       1
   REFERENCE_TO_OUTPUT_ATTRIBUTE_WITHIN_ORDER_BY_AGGREGATION  1
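   For instance (illustrative queries, not from the deck), a malformed statement yields SYNTAX_ERROR, while referencing a nonexistent column yields MISSING_ATTRIBUTE:

   presto> SELECT FROM access_log;
     -- SYNTAX_ERROR: mismatched input 'FROM' expecting ...
   presto> SELECT no_such_col FROM access_log;
     -- MISSING_ATTRIBUTE: Column 'no_such_col' cannot be resolved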
24. Agenda
25. Typical use case in Yanagishima & OASIS
   ● Execute queries with Presto because of its speed and rich UDFs
     ○ Users want to run a query quickly and check data roughly
   ● Implement batches with Spark SQL in OASIS or with Hive in the console
     ○ Users want to create stable batches
   ● Yanagishima is like Gist; OASIS is like GitHub
26. Why we don't use Presto in batch
   ● Lack of Hive metastore impersonation
     ○ Support Impersonation in Metastore communication #43
   ● Less stable than Hive or Spark
     ○ We prioritize stability over latency in batch
     ○ We need to handle huge data
27. Impersonation
   ● Presto does not support impersonating the end user when accessing the Hive metastore
   ● SELECT queries are fine, but CREATE/DROP/ALTER/INSERT/DELETE queries can be a problem
   ● If the Presto process runs as the presto user and yukawa creates a table, Presto accesses the
     Hive metastore as the presto user, not as yukawa. That means another user (e.g., ebihara) can
     drop the table if the presto user has write permission
   ● We therefore deny the presto user write access to HDFS with Apache Ranger
   ● HMS impersonation is available in the Starburst Distribution of Presto
   ● Support for impersonation will be a game changer

   (diagram: yukawa issues CREATE TABLE line.ad and ebihara issues DROP TABLE line.ad; Presto talks
   to the Hive metastore as the presto user; Ranger guards WRITE permission on Hadoop)
   hadoop.proxyuser.presto.groups=*
   hadoop.proxyuser.presto.hosts=*
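   The proxy-user entries above live in Hadoop's core-site.xml; a minimal sketch with the values shown on the slide:

   <!-- core-site.xml: allow the presto service user to proxy any user from any host -->
   <property>
     <name>hadoop.proxyuser.presto.groups</name>
     <value>*</value>
   </property>
   <property>
     <name>hadoop.proxyuser.presto.hosts</name>
     <value>*</value>
   </property>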
28. Less stable than Hive/Spark
   ● A single query or worker crash can become a bottleneck
   ● An auto-restart mechanism for workers may be necessary
   ● Presto worker, data node, and node manager are deployed on the same machines
   ● Enabling CGroups may be necessary because, e.g., the pyspark Python process's CPU usage is high
     ○ For example, yarn.nodemanager.resource.percentage-physical-cpu-limit: 70% (see the sketch below)
   ● Hive/Spark batches are more stable, but converting from Presto to Hive/Spark is not easy
     due to date functions, syntax, ...
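   As a sketch, the CPU cap mentioned above is configured in yarn-site.xml; it only takes effect with CGroups-based resource enforcement enabled.

   <!-- yarn-site.xml: cap NodeManager containers at 70% of physical CPU
        (requires CGroups-based resource enforcement to be enabled) -->
   <property>
     <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
     <value>70</value>
   </property>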
29. Hard to convert queries

   Presto                                     Hive / Spark SQL
   json_extract_scalar                        get_json_object
   date_format(now(), '%Y%m%d')               date_format(current_timestamp(), 'yyyyMMdd')
   cross join unnest () as t ()               lateral view explode() t
   url functions like url_extract_parameter   -
   try                                        -
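   A concrete example of one such rewrite (table and column names are illustrative):

   -- Presto
   SELECT json_extract_scalar(payload, '$.user.id')
   FROM events
   WHERE dt = date_format(now(), '%Y%m%d');

   -- Spark SQL equivalent
   SELECT get_json_object(payload, '$.user.id')
   FROM events
   WHERE dt = date_format(current_timestamp(), 'yyyyMMdd');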
30. Confusing Spark SQL error message
   ● Spark can use "DISTINCT" as a column name (SPARK-27170)
   ● The error message is difficult to understand; it will be improved in Spark 3.0 (SPARK-27901)

   SELECT distinct
       ,a
       ,b
       ,c
   FROM test_table
   LIMIT 100

   cannot resolve '`distinct`' given input columns: [...]; line 1 pos 7;
   'GlobalLimit 100
   +- 'LocalLimit 100
      +- 'Project ['distinct, ...]
         +- Filter (...)
            +- SubqueryAlias ...
               +- HiveTableRelation ...

   Confusing... 💨
31. DEBUG
32. Recent issues
   ● More than 100,000 partitions error occurred in 307 (#619)
     ○ The fixed version was released within one day
   ● Partition location does not exist in Hive external table (#620)
     ○ Caused by upgrading the bundled Hadoop library from 2.7.7 to 3.2.0
     ○ Ongoing: https://issues.apache.org/jira/browse/HDFS-14466
   ● Create table failed when using viewfs (#10099)
     ○ A known issue, and not fatal for now because we use Presto in read-only mode
   ● Handle repeated predicate pushdown into Hive connector (#984)
     ○ Performance regression, already fixed in 315. Great!
   ● Schema mismatch of parquet file (#9156)
     ○ Our old cluster hit this issue recently; already fixed in 0.203
33. Scale
   ● Daily queries: ~20K
   ● Daily processed data: 330TB
   ● Daily processed rows: 4 trillion
   ● Partitions
     ○ hive.max-partitions-per-scan (default: 100,000): maximum number of partitions for a single table scan

   Table    # Partitions
   Table A  1,588,031
   Table B  1,429,047
   Table C  1,429,046
   Table D  1,116,130
   Table E  772,725
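   The limit is a per-catalog setting; a minimal sketch of etc/catalog/hive.properties, where the raised value is an illustrative assumption:

   # etc/catalog/hive.properties
   connector.name=hive-hadoop2
   hive.metastore.uri=thrift://metastore-host:9083
   # Maximum number of partitions a single table scan may read (default: 100,000)
   hive.max-partitions-per-scan=500000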
34. Agenda
35. More than 100,000 partitions error (already fixed in 308)
   More than 100,000 partitions error occurred in 307 (#619)

   Query over table 'default.test_left' can potentially read more than 100000 partitions
       at io.prestosql.plugin.hive.HiveMetadata.getPartitionsAsList(HiveMetadata.java:601)
       at io.prestosql.plugin.hive.HiveMetadata.getTableLayouts(HiveMetadata.java:1645)
       ...
36. Steps to reproduce
   ● Start the hadoop-master docker image
     $ presto-product-tests/conf/docker/singlenode/compose.sh up -d hadoop-master
     $ presto-product-tests/conf/docker/singlenode/compose.sh up -d
   ● Create a table and populate rows
     presto> CREATE TABLE test_part (col int, part_col int) WITH (partitioned_by = ARRAY['part_col']);
     presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(1, 100)) AS u(id);
     presto> INSERT INTO test_part (col, part_col) SELECT 0, CAST(id AS int) FROM UNNEST (sequence(101, 150)) AS u(id);
     (hive.max-partitions-per-scan=100 in the product tests; hive.max-partitions-per-writers=100 by default)
   ● Execute the reproducing query (TestHivePartitionsTable.java)
     presto> SELECT a.part_col
             FROM (SELECT * FROM test_part WHERE part_col = 1) a,
                  (SELECT * FROM test_part WHERE part_col = 1) b
             WHERE a.col = b.col
37. Frames and variables
   ● The migration to remove table layouts was ongoing
   ● "TupleDomain" is one of the key classes involved in predicate pushdown
38. Fix
   ● Fixed the EffectivePredicateExtractor.visitTableScan method
     ○ Actually a workaround until the migration is completed
   ● Timeline
     ○ Issue created: April 11, 4PM
     ○ Commit merged: April 12, 7AM
     ○ 308 released: April 12, 3PM
   Released within one day 🎉
39. Agenda
40. Webhdfs partition location does not exist
   Partitioned webhdfs table throws "Partition location does not exist" error (#620)
   ● Webhdfs isn't supported (at least not tested) due to missing classes (#957)
     → Add the missing jar to the plugin directory
   ● Create a table with a webhdfs location in Hive
     hive> CREATE TABLE test_part_webhdfs (col1 int) PARTITIONED BY (dt int)
           LOCATION 'webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs';
     hive> INSERT INTO test_part_webhdfs PARTITION(dt=1) VALUES (1);
     presto> SELECT * FROM test_part_webhdfs;
     → Partition location does not exist:
       webhdfs://hadoop-master:50070/user/hive/warehouse/test_part_webhdfs/dt=1
41. Remote debugger
   ● Edit jvm.config
     -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
   ● Remote debugger configuration in IntelliJ IDEA
     Run → Edit Configurations... → + → Remote
42. Stepping into the Hadoop library
   ● The arguments passed to the Hadoop library are the same
     ○ We can step into dependent libraries as if they were local code
   ● The internal calls differ:
     ○ Hadoop 2.7.7 (Presto 306)
       http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt=1?op=LISTSTATUS&user.name=x
     ○ Hadoop 3.2.0 (Presto 307)
       http://hadoop-master:50070/webhdfs/v1/user/hive/warehouse/test_part/dt%253D1?op=LISTSTATUS&user.name=x

   {
     "RemoteException": {
       "exception": "FileNotFoundException",
       "javaClassName": "java.io.FileNotFoundException",
       "message": "File /user/hive/warehouse/test_part/dt%3D1 does not exist."
     }
   }
43. HDFS-14466
   ● FileSystem.listLocatedStatus for a path including '=' encodes it and returns FileNotFoundException
   ● The equals sign is double-encoded: dt=1 → dt%3D1 → dt%253D1
   ● HADOOP-16258.001.patch by Iwasaki-san
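   The double encoding is easy to reproduce on its own; a minimal sketch:

   import java.net.URLEncoder;
   import java.nio.charset.StandardCharsets;

   public class DoubleEncodeSketch {
       public static void main(String[] args) {
           // Encoding once turns '=' into '%3D'; encoding the result again
           // encodes the '%' itself, yielding '%253D', which no longer matches
           // the actual partition directory name on HDFS.
           String once = URLEncoder.encode("dt=1", StandardCharsets.UTF_8);
           String twice = URLEncoder.encode(once, StandardCharsets.UTF_8);
           System.out.println(once);  // dt%3D1
           System.out.println(twice); // dt%253D1
       }
   }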
44. Agenda
45. Schema mismatch of parquet (already fixed in 0.203)
   ● Failed to access a table created by Spark
     presto> SELECT * FROM default.test_parquet WHERE dt='20190101'
     Error opening Hive split hdfs://cluster/apps/hive/warehouse/test_parquet/dt=20190101/20190101.snappy.parquet
     (offset=503316480, length=33554432): Schema mismatch, metastore schema for row column col1.element
     has 13 fields but parquet schema has 12 fields
   ● Issue: the column type is ARRAY<STRUCT<...>>
     ○ The Hive metastore returns 13 fields
     ○ The parquet schema returns 12 fields
46. parquet-tools
   https://github.com/apache/parquet-mr/tree/master/parquet-tools
   ● Supported options: cat, head, schema, meta, dump, merge, rowcount, size
   ● Inspect the schema of a parquet file
     $ parquet-tools schema sample.parquet
     message spark_schema {
       optional group @metadata {
         optional binary beat (UTF8);
         optional binary topic (UTF8);
         optional binary type (UTF8);
         optional binary version (UTF8);
       }
       optional binary @timestamp (UTF8);
       …
   ● Even files over a GB can be analyzed within a few seconds
47. Agenda
48. Contributions
   ● General
     ○ Retrieve semantic error name
     ○ Fix partition pruning regression in 307
     ○ COMMENT ON TABLE
     ○ DROP COLUMN
     ○ Column alias in CTAS
     ○ Non-ASCII date_format function argument
   ● Cassandra connector
     ○ INSERT statement
     ○ Auto-discover protocol version
     ○ Materialized view
     ○ Smallint, tinyint, date types
     ○ Nested collection types
   ● Hive connector
     ○ CREATE TABLE properties
       ■ textfile_skip_header_line_count
       ■ textfile_skip_footer_line_count
   ● MySQL connector
     ○ Map MySQL JSON to Presto JSON
   ● CLI
     ○ Output formats
       ■ JSON
       ■ CSV_UNQUOTED
       ■ CSV_HEADER_UNQUOTED
   Thanks PSF, Starburst and ex-Teradata ✨
49. THANK YOU
