Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Apache Kylin and Use Cases - 2018 Big Data Spain


Published on

Apache Kylin is rapidly being adopted over the world as the leading open source OLAP for Big Data. In this topic, Luke Han, creator and PMC chair of Apache Kylin, will introduce the motivation when build this project and technical highlights, alwo will explore how various industries use Apache Kylin, and the resulting business impact.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Apache Kylin and Use Cases - 2018 Big Data Spain

  1. 1. Apache Kylin & Use Cases Luke Han | 2018 Big Data Spain
  2. 2. Luke Han • Co-founder & CEO at Kyligence • Co-creator and PMC Chairof Apache Kylin • Apache Software FoundationMember • Microsoft RegionalDirector & MVP • Former eBay Big Data Product Manager Lead © Kyligence Inc. 2018. About Luke Han
  3. 3. Kyligence = Kylin + Intelligence - Kyligence is formed bythe team who created ApacheKylin, leading opensource OLAP for Big Data. Kyligence provides an intelligent data warehouse built fordata cognitive analytics at web scale. - Funding by leading VCs: - Redpoint Ventures, Cisco, - CBC Capital and Shunwei Capital, - Eight Roads Ventures (Fidelity International Arm) - CRN Top 10 Big Data Startups 2018 © Kyligence Inc. 2018. About Kyligence
  4. 4. Apache Kylin
  5. 5. © Kyligence Inc. 2018. About Apache Kylin • Leading Open Source OLAP for Big Data • Open sourced by eBay in 2014 • Graduated to Apache Top Project in 2015 • 1000+ Adoptions world wild • 2015 InfoWorld Bossie Awards • 2016 InfoWorld Bossie Awards
  6. 6. © Kyligence Inc. 2018. 1000+ Global Users Apache Kylin - Leading Open Source OLAP for Big Data
  7. 7. © Kyligence Inc. 2018. Presentation Visualization Data Lake Data Source o Too many options o Low performance o Long learning curve o Compatibility issue o Technology vs Data OLAP: The Missing Part of Big Data Hive Impala Spark SQL Drill MapReduce …Spark
  8. 8. © Kyligence Inc. 2018. Presentation Visualization Data Lake Data Source o SQL Acceleration for Big Data o Semantic Layer o Speed up Analytics o ANSI SQL Interface o High Performance and High Concurrency Apache Kylin: Bring OLAP back to Big Data OLAP Data Mart Hive Impala Spark SQL Drill MapReduce …Spark
  9. 9. Apache Kylin Technical Highlights
  10. 10. © Kyligence Inc. 2018. OLAP and OLAP Cube Online analytical processing, or OLAP, is an approach to answering multi- dimensional analytical (MDA) queries swiftly in computing. – Wikipedia Basic operations – Roll-up – Drill-down – Slice and dice – Pivot OLAP cube is a data structure optimized for very quick data analysis.
  11. 11. © Kyligence Inc. 2018. Cube: balance between space and time OLAP Cube --Key-Value Multiple Dimensional Model --Relational Classification, aggregation, and sorting
  12. 12. © Kyligence Inc. 2018. Apache Kylin Architecture Overview Apache Kylin Data Analyst, BI Tools, Web App… SQL Online calculation Offline calculation Scan & filter Extract Compute Load Optimize & Rewrite
  13. 13. © Kyligence Inc. 2018. SQL execution plan without Cube select l_returnflag, o_orderstatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price from v_lineitem inner join v_orders on l_orderkey = o_orderkey where l_shipdate <= '1998-09-16' group by l_returnflag, o_orderstatus order by l_returnflag, o_orderstatus; Sample:Check the order return and order status relationship in a time range Sort Aggr. Filter Tables O(N) Join No cube, all need online calculations, CPU and IO intensive, latency is remarkable.
  14. 14. © Kyligence Inc. 2018. SQL execution plan with Cube Cube technology speed up query performance with pre-calculation Sort Cube Filter Sort Aggr. Filter Tables O(N) Join O(flag x status x days) = O(1) Aggregated data The table join and aggregation are completed offline. Directly from aggregated data (cube) with index; Much less CPU and IO. Latency is small.
  15. 15. © Kyligence Inc. 2018. ORDERS CUSTOMER SUPPLIER PART LINEITEM PARTSUPP NATION REGION Join Join Join Join Join ORDERS CUSTOMER PART LINEITEM PARTSUPP Join Join Join All rights reserved ©Kyligence Inc. Multidimensional Schema Apach Kylin supports Star-Schema, Snowflake-Schema
  16. 16. © Kyligence Inc. 2018. Persistent the cube in HBase Relational to Key Value store
  17. 17. © Kyligence Inc. 2018. How to query the cube Translate cube query into HBase table scan – Columns, Group by → Cuboid ID – Filters -> Scan Range (Row Key) – Aggregations -> Measure Columns (Row Values) Scan HBase table and translate HBase result into cube result – HBase Result (key + value) -> Cube Result (dimensions + measures) No Hive touch, no MapReduce job in the query time
  18. 18. © Kyligence Inc. 2018. High performance & High concurrency together Sub-second latency on PB scale dataset Star schema benchmark: SQL Latency Lower is better Data VolumeScale Lower is better
  19. 19. © Kyligence Inc. 2018. Seamless integration with BI tools From open source to commercial BI
  20. 20. Apache Kylin Use Cases
  21. 21. © Kyligence Inc. 2018. Apache Kylin Use Cases Solution • Behavior Analytics • LogAnalysis • Data Mart/DW • Self-service Data Service • Retail Analytics • Financial Asset • Advertising Analytics • Real-time Analytics • Gaming Analysis Apache Kylin fits various scenarios 1000+ adoptions all of the world
  22. 22. © Kyligence Inc. 2018. Use Case – Insight on Trillion Data Top1 news feed app in China
  23. 23. © Kyligence Inc. 2018. Use Case: PB-level Analytics Platform Cube Storage: 971TB (almost PB) Cube numbers: 973 Cube Data Records: 8.9 Trillion rows 90%ile latency: <1.2s Frequency: 3.8 million queries / day Top O2O services provider in China Supporting all critical business lines including E-Takeaways, Hotel, Movie, LBS, Tickets… Latest updated -201808
  24. 24. © Kyligence Inc. 2018. Use Case: Online Shopping Reporting ▪ Our reporting system used Impala as a backend database previously. - It took a long time (about 60 sec) to show Web UI. ▪ In order to lower the latency, we moved to Apache Kylin. - Average latency < 1sec for most cases ▪ Thanks to low latency with Kylin, we become possible to focus on adding functions for users. ▪ We provide a reporting system that show statistics for store owners. - e. g. impressions, clicks and sales. The most visited website in Japan Yahoo! Japan
  25. 25. © Kyligence Inc. 2018. Use Case: Data Factory for Business • Serving 18 business lines as the engine for mi’s“data factory” • Daily incremental 17 billion • 95% queries < 500ms. Leading smart phone and smart device manufacture
  26. 26. © Kyligence Inc. 2018. The data platform based on Apache Kylin solved the problem of massive user queries excellently. -- Chase Zhang, Data Platform Engineer of Strikingly Performance • Use Apache Kylin to speedup analytics with, and support high concurrency Containerizing • Apache Kylin runs on AWS ECS Integration • Developed a scheduler systemto manage all kinds of jobs Use Case – Website traffic Analytics A company to provide convenient and one stop website building solutions.
  27. 27. Apache Kylin Roadmap
  28. 28. © Kyligence Inc. 2018. Apache Kylin Roadmap • New storage support –Parquet • Real-time support • Containerization From the community
  29. 29. About Kyligence
  30. 30. Traditional Data Warehousing EnormousManual Effortsand Repeated Work © Kyligence Inc. 2018, Confidential.
  31. 31. Human Intelligence Intelligence and Automation The future of DataAnalytics Artificial IntelligenceVS
  32. 32. Augmented Analytics Platform SQL Query Log Analytic Behavior Data Schema Data Profile ML-based Discoveryof Analytic Pattern ProprietaryData Modeling Automation Self-directed Storage Layer Optimization Intelligent QueryPush- down &Routing BI Real-time Analysis Data-as-a- Service Local Deployment Cloud Platform Container Data Services © Kyligence Inc. 2018, Confidential.
  33. 33. Machine Learning Augmented Analytics Available from Kyligence 3.x
  34. 34. © Kyligence Inc. 2018. Kyligence Cloud Transforming Big Data Analytics to Cloud Kyligence Cloud ANSI SQL Dashboard OLAP Hadoop Customer Cloud Account client cloud Kyligence Enterprise Platform streaming Cluster Deploy Account Management Diagnosis & Optimization Queries & Reporting cloud storage tables, logs, files RDBMS (metadata) ANSI SQL Cloud Data Warehouse Cluster Management
  35. 35. © Kyligence Inc. 2018. Kyligence Cloud Available: AWS, Azure, Google Cloud, Alibaba Cloud , Huawei Cloud One-click provisioning Auto Scaling High Performance Seamless Integration Intelligent Ops Deploy globally in 30 minutes Scale cluster automatically for different workloads Powered by Kyligence Analytics Platform Connect to cloud data sources Enterprise ODBC driver for BI Online diagnosis and continuous optimization Speed Upmission-critical analytics in the cloud
  36. 36. © Kyligence Inc. 2018. Use Case : Replaced IBM Cognos 1 Kyligence cube replaced 800+ IBM Cognos cubes PB level (300B records) big data warehouse of both self-service aggregation query and raw data query by business analysts Self-Service Big Data Warehouse Efficient IT Operation Significantly increase IT operation efficiency as 1 Kyligence cube replacing 800 Cognos cubes with unified data access management Kyligence scale-out architecture provide best flexibility for IT infrastructure when faced with increasing analytics and concurrency demands Better flexibility of Architecture Support analysis on high granularity dimensions such as Merchant (10M cardinality) and Card (10B cardinality) Merchant or Card Multi-dimensional Analytics
  37. 37. © Kyligence Inc. 2018. Use Case: Customer 360 for FMCG Azure + Kyligence ➢ 360 degree view of user profile. ➢ Powering analysts insight into data without IT ➢ HDInsight + Kyligence + Power BI
  38. 38. © Kyligence Inc. 2018. Global Partners Kyligence Open Ecosystem Microsoft Azure Partner AWS Technology Partner Tableau Technology Partner Cloudera Sliver Partner MapR Converge Partner Hortonworks Community Partner Huawei Solution Partner
  39. 39. Q & A