
Augmented OLAP for Big Data

With an explosion of data, today’s emerging needs are not being met by existing technologies, which require rich skill sets and expertise. Companies that want to lead changes in highly competitive markets must optimize their storage, speed, and spending. The key is for them to augment their data management and analytics platforms with artificial intelligence and machine learning for analysts, engineers, and other users.

Published in: Software

  1. 1. Augmented OLAP for Big Data Luke Han | luke.han@Kyligence.io Co-founder & CEO of Kyligence Apache Kylin PMC Chair Microsoft Regional Director & MVP Strata Global Sponsor BOOTH #410
  2. 2. © Kyligence Inc. 2019. About Luke Han • Luke Han • Co-founder & CEO at Kyligence • Co-creator and PMC Chair of Apache Kylin • Apache Software Foundation Member • Microsoft Regional Director & MVP • Former eBay Big Data Product Manager Lead
  3. 3. © Kyligence Inc. 2019. About Apache Kylin • Leading Open Source OLAP for Big Data • Ranked #1 when googling “big data OLAP” • Ranked #1 when googling “hadoop OLAP” • Open sourced by eBay in 2014 • Graduated to Apache Top-Level Project in 2015 • 1,000+ adoptions worldwide • 2015 InfoWorld Bossie Awards • 2016 InfoWorld Bossie Awards
  4. 4. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Video Demo • Benchmark • Use Cases
  5. 5. © Kyligence Inc. 2019. Kyligence = Kylin + Intelligence • Founded in 2016 by the original creators of Apache Kylin • CRN Top 10 Big Data Startups 2018 • Backed by leading VCs: • Redpoint Ventures • Cisco • CBC Capital • Shunwei Capital • Eight Roads Ventures (Fidelity International Arm) • Coatue • Global Offices: • Shanghai • Beijing • Shenzhen • San Jose • New York • Seattle • …
  6. 6. © Kyligence Inc. 2019. Telecom Finance Manufacturing Trusted by Global Leaders Retail & Others Most of them are Global Fortune 500
  7. 7. © Kyligence Inc. 2019. Global Partners
  8. 8. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  9. 9. © Kyligence Inc. 2019. Let’s talk about a photography story… https://technave.com/data/files/mall/article/201812271418327393.jpg
  10. 10. © Kyligence Inc. 2019. Let’s talk about a photography story… How many people really know how to set those up?
  11. 11. © Kyligence Inc. 2019. Let’s talk about a photography story… https://technave.com/data/files/mall/article/201812271437395703.jpg
  12. 12. © Kyligence Inc. 2019. Let’s talk about a photography story… Google Photos
  13. 13. © Kyligence Inc. 2019. Let’s talk about a photography story… How do you manage your 100,000+ photos? Google Photos
  14. 14. © Kyligence Inc. 2019, Confidential. Then… how about your enterprise data?
  15. 15. © Kyligence Inc. 2019, Confidential. https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1 https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
  16. 16. © Kyligence Inc. 2019, Confidential. Fast and Changing Analysis Demand Slow and Heavy Big Data Operations vs
  17. 17. © Kyligence Inc. 2019. The Typical “Throw in some People” Approach Business Users Analysts Data Engineers Business Analysis Data Modeling Lake → Warehouse → Mart Reporting $$$High Cost : Administrators Slow Time-to-Insight:
  18. 18. © Kyligence Inc. 2019, Confidential. Pains in the “Throw in some People” Approach Presentation Visualization Impala Data Lake Hive Spark SQL Drill MapReduce Spark ……. • Time-to-value Pain: Weeks of waiting break the “online” promise. • Collaboration Pain: Hard to reuse assets across teams. Each team fights its own way. • Resource Pain: Hard to scale. Where to find so many skilled big data engineers?
  19. 19. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  20. 20. © Kyligence Inc. 2019, Confidential. Throw in some Intelligence! Let a system replace the people. o Transparent SQL Acceleration o On-demand Data Preparation o Interactive Query Performance o High Concurrency o Centralized Semantic Layer Faster time to market. Stay “online”. Augmented OLAP Data Mart Presentation Visualization Impala Data Lake Hive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance
  21. 21. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Business User Analyst Data Engineer vs
  22. 22. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Pattern Detection Auto Modeling Data Preparing Raw Data Prepared Data Augmented OLAP Engine (Background Learning)
  23. 23. © Kyligence Inc. 2019. Demo Setup Tableau SparkSQL 2.4 Kyligence Enterprise Analyze 1 billion rows of sales records (TPC-H) Business User
  24. 24. © Kyligence Inc. 2019. (Embed the Demo Video)
  25. 25. © Kyligence Inc. 2019. Demo FAQ Business User Analyst reuse How to improve the first slow exploration? What if the analyst operates differently the second time? More comprehensive performance benchmark? Prepared Data
  26. 26. © Kyligence Inc. 2019. TPC-H Decision Support Benchmark TPC-H Benchmark • Examine large volumes of data • High complexity queries • Answers critical business questions • 22 decision making queries E.g. The Shipping Priority Query retrieves the shipping priority and potential revenue of the orders having the largest revenue among those that had not been shipped as of a given date. Top 10 orders are listed in decreasing order of revenue.
  27. 27. © Kyligence Inc. 2019. Kyligence Enterprise 4 Beta vs SparkSQL 2.4 To see the trend as data grows • 3 datasets • Scale Factor = 20, 35, 50 • TPCH_SF1 (baseline, for reference): the base row count (several million rows). • TPCH_SF20: the base row count x 20. • TPCH_SF35: the base row count x 35. • TPCH_SF50: the base row count x 50 (several hundred million rows).
  28. 28. © Kyligence Inc. 2019. Hardware Configurations Same 4 physical nodes - Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz × 2 - 86 vCores and 188 GB memory in total Same Spark configuration for both KE 4 Beta and SparkSQL 2.4 - spark.driver.memory=16g - spark.executor.memory=8g - spark.yarn.executor.memoryOverhead=2g - spark.yarn.am.memory=1024m - spark.executor.cores=5 - spark.executor.instances=17
  29. 29. © Kyligence Inc. 2019. Query Response Time | KE 4 Beta vs. SparkSQL 2.4 [Chart: response time in milliseconds for the 22 TPC-H queries at SF=50; lower is better.] For each dataset: • Run each query 3 times • Record the average time • No warm-up
  30. 30. © Kyligence Inc. 2019. Total Response Time | KE 4 Beta vs. SparkSQL 2.4 [Chart; axis labels: Billion, Seconds.] Total response time is the sum of the 22 queries’ response times. Compare across dataset sizes to see the trend. Scale out for the future.
  31. 31. © Kyligence Inc. 2019. Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4 Acceleration Rate = SparkSQL time / KE time. Take the average over the 22 queries and compare across dataset sizes.
  32. 32. © Kyligence Inc. 2019. AI-Augmented Analytics Platform SQL Query Log Analytic Behavior Data Schema Data Profile Machine Learning Engine Data Modeling Automation Kylin Cube Learnt Index Smart Pushdown BI Real-time Analysis Data-as-a-Service Local Deployment Cloud Platform Container Data Services
  33. 33. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  34. 34. © Kyligence Inc. 2019. Use Case: IBM Cognos Replacement One Kyligence Cube for 800+ Cognos Cubes Org. Daily Cube Merch. Daily Cube Channel Daily Cube Region Daily Cube Org. Monthly Cube Merch. Monthly Cube Channel Monthly Cube Region Monthly Cube Shanghai Merchants Zhejiang Merchants Anhui Merchants Guangdong Merchants Card Transaction Dimensions: 167 Measures: 20 800+ Cognos Cubes, 1,000+ ETL jobs Functional Scene Time Scene Geo Scene ┄ ┄ Data: 300+ B Records Merchants: 10+m Cards: 10+B
  35. 35. © Kyligence Inc. 2019. Use Case: Data-as-a-Service Platform “In the past, due to the limitations of our previous multi-dimensional analytic tool, we faced challenges of constrained time range in queries……We are considering leveraging multi-dimensional data cubes to replace a number of fragmented legacy tabular reports in more business units, so that we can provide better analytic services to our business users.” -- Laments Wu Ying, VP of CMBs Development Center. Offline Platform (Hadoop) Online Platform (Hadoop) EDW (Teradata) Kyligence Data-as-a-Service Platform Tenancy 1 Tenancy 2 Tenancy 3 Tenancy N…… Business Intelligence Cognos Tableau MicroStrategy Superset API Smart Routine Intelligent Modeling Multi-tenancy Applications Security Job MIP MDS CRM DWS
  36. 36. © Kyligence Inc. 2019, Confidential. Takeaway: Augmented OLAP, the future for analytics Impala Hive Spark SQL Drill MapReduce Spark ……. AI-Augmented OLAP Impala Hive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance
  37. 37. Thanks luke.han@Kyligence.io | @lukehq Homepage: http://kyligence.io Twitter: @kyligence Booth: #410
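
The benchmark arithmetic described on slides 29 to 31 (run each query three times and record the average, sum the per-query averages into a total response time, and compute Acceleration Rate = SparkSQL time / KE time averaged over the queries) can be sketched in Python. This is a minimal illustration under stated assumptions, not the harness used for the published numbers: the function names are hypothetical, and the timings in the usage example are made up for two queries only.

```python
from statistics import mean
from typing import Dict, List


def average_runs(runs_ms: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the recorded runs of each query (the slides use 3 runs, no warm-up)."""
    return {query: mean(times) for query, times in runs_ms.items()}


def total_response_time(avg_ms: Dict[str, float]) -> float:
    """Total response time: the sum of the per-query average times."""
    return sum(avg_ms.values())


def average_acceleration(spark_ms: Dict[str, float], ke_ms: Dict[str, float]) -> float:
    """Per-query rate = SparkSQL time / KE time, then the mean over all queries."""
    rates = [spark_ms[query] / ke_ms[query] for query in spark_ms]
    return mean(rates)


# Hypothetical timings in milliseconds, invented for illustration only.
spark_runs = {"Q1": [9000.0, 9300.0, 8700.0], "Q3": [12000.0, 12600.0, 11400.0]}
ke_runs = {"Q1": [900.0, 1000.0, 800.0], "Q3": [600.0, 500.0, 700.0]}

spark_avg = average_runs(spark_runs)
ke_avg = average_runs(ke_runs)

print("SparkSQL total (ms):", total_response_time(spark_avg))
print("KE total (ms):", total_response_time(ke_avg))
print("Average acceleration rate:", average_acceleration(spark_avg, ke_avg))
```

Averaging the per-query rates, as the slide describes, weights every query equally; dividing the two totals instead would let the slowest queries dominate the figure.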
