Apache Kylin 101 - Get Sub-Second Analytics on Massive Datasets

Learn how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started with it yourself.

This presentation will provide you with everything you need to understand the basics of Apache Kylin, as well as clear steps for deploying it in your organization. Learn more here: https://kyligence.io/apache-kylin-overview/

Published in: Data & Analytics


  1. Apache Kylin 101. Kaige Liu, Senior Solution Architect, Kyligence; Apache Kylin Committer. 2020.4
  2. © Kyligence Inc. 2019, Confidential. Agenda • OLAP Overview • Apache Kylin Introduction • Apache Kylin Demo • Q&A
  3. Questions OLAP Can Help Us Answer • What are our top 5 best-selling products in each state/city? • Which products should be put together? • Do you have enough toilet paper prepared for coronavirus? • Who owns this supermarket? (Figure labels: Boss, Theodore, Analyst)
  4. Why OLAP (Online Analytical Processing)? Good at: • Designed for analysis (BI reporting, data discovery, etc.) • Quick insight • Multidimensional data model • Complex business calculations. Not good at: • Frequent updates/deletes • Transactional data
  5. OLAP Cubes. Cube axes: month (April, May, June), city (New York, Los Angeles, San Francisco), product (Beer, Milk, Juice). Cell values by city for April, May, June: New York 120, 80, 60; Los Angeles 50, 130, 90; San Francisco 70, 50, 100. Q: How many beers were sold in Los Angeles in June? A: 90
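As a rough illustration of the lookup on this slide, the cube can be sketched as a mapping from one value per dimension to a measure. This is a minimal sketch: the names `cube` and `units_sold` are hypothetical, and treating the nine listed values as the beer cells is an assumption based on the slide's own question and answer.

```python
# Toy OLAP cube: each cell is addressed by one value per dimension.
# Assumption: the nine values on the slide are the beer cells;
# milk and juice cells are not given, so they are left out.
MONTHS = ["April", "May", "June"]
BEER_BY_CITY = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

cube = {
    ("Beer", city, month): units
    for city, row in BEER_BY_CITY.items()
    for month, units in zip(MONTHS, row)
}


def units_sold(product, city, month):
    """Return the measure stored in the cell for the given dimension values."""
    return cube[(product, city, month)]


print(units_sold("Beer", "Los Angeles", "June"))  # 90
```

Answering the question is a single key lookup rather than a scan over raw sales rows, which is the property the cube is meant to provide.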
  6. From Tables to OLAP Cubes. Dimensions are the context that helps the consumer of measures understand the meaning of those measures. Measures contain numeric, quantitative values that you can measure. Fact table F_SALES: REVENUE, SALES AMOUNT, TAX, SUPPLY COST. Dimension tables: DIM_DATE (DATE, YEAR, QUARTER, MONTH, WEEK); DIM_CUSTOMER (CUSTOMER_ID, NAME, EMAIL, CITY, ADDRESS); DIM_SHOP (SHOP_ID, CITY, STATUS)
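The split between a fact table of measures and dimension tables of context can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names come from the slide, but the `SHOP_ID` join key in `F_SALES` and all row values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_SHOP (SHOP_ID INTEGER PRIMARY KEY, CITY TEXT, STATUS TEXT);
    -- SHOP_ID in F_SALES is an assumed join key; the slide lists measures only.
    CREATE TABLE F_SALES (SHOP_ID INTEGER, REVENUE REAL, TAX REAL);
    -- Made-up rows, for illustration only.
    INSERT INTO DIM_SHOP VALUES (1, 'New York', 'OPEN'), (2, 'Los Angeles', 'OPEN');
    INSERT INTO F_SALES VALUES (1, 100.0, 8.0), (1, 50.0, 4.0), (2, 70.0, 5.0);
""")

# The dimension (CITY) gives context; the measure (REVENUE) is aggregated.
revenue_by_city = conn.execute("""
    SELECT d.CITY, SUM(f.REVENUE)
    FROM F_SALES AS f
    JOIN DIM_SHOP AS d ON d.SHOP_ID = f.SHOP_ID
    GROUP BY d.CITY
    ORDER BY d.CITY
""").fetchall()

print(revenue_by_city)  # [('Los Angeles', 70.0), ('New York', 150.0)]
```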
  7. Dimensions and Measures in OLAP Cubes. In the sales cube, month (April, May, June), city (New York, Los Angeles, San Francisco), and product (Beer, Milk, Juice) are the dimensions (D); the cell values (120, 80, 60, 50, 130, 90, 70, 50, 100) are the measure (M). Q: How many beers were sold in Los Angeles in June?
  8. OLAP Operations: Roll Up. Monthly values for April, May, June by city (New York 120, 80, 60; Los Angeles 50, 130, 90; San Francisco 70, 50, 100) roll up to Q2 totals of 260, 270, and 220.
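The roll-up on this slide amounts to summing cells along a coarser level of the time dimension. A minimal sketch, using the slide's numbers; the `roll_up` helper and the month-to-quarter mapping are hypothetical names introduced for the example:

```python
from collections import defaultdict

# Month-to-quarter hierarchy for the time dimension.
MONTH_TO_QUARTER = {"April": "Q2", "May": "Q2", "June": "Q2"}

# Monthly cells from the slide, keyed by (city, month).
monthly = {
    ("New York", "April"): 120, ("New York", "May"): 80, ("New York", "June"): 60,
    ("Los Angeles", "April"): 50, ("Los Angeles", "May"): 130, ("Los Angeles", "June"): 90,
    ("San Francisco", "April"): 70, ("San Francisco", "May"): 50, ("San Francisco", "June"): 100,
}


def roll_up(cells, month_to_quarter):
    """Aggregate monthly cells into quarterly cells by summing the measure."""
    totals = defaultdict(int)
    for (city, month), units in cells.items():
        totals[(city, month_to_quarter[month])] += units
    return dict(totals)


quarterly = roll_up(monthly, MONTH_TO_QUARTER)
print(quarterly)
# {('New York', 'Q2'): 260, ('Los Angeles', 'Q2'): 270, ('San Francisco', 'Q2'): 220}
```

Drill down is the reverse direction: it needs the finer-grained cells to be stored, since a total cannot be split back into its parts.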
  9. OLAP Operations: Drill Down. [Figure: the monthly cells (April, May, June) of the sales cube for New York, Los Angeles, and San Francisco (Beer, Milk, Juice) broken down into weekly cells: Week13, Week14, Week15, … Week22, Week23, Week24.]
  10. Traditional OLAP Tools
  11. Challenges in the Big Data Era. Traditional OLAP Tools Are Great but… • Difficult to handle massive data volumes • Cube size limited by a single machine • Have to maintain lots of cubes • Hard to scale • Takes a long time to build cubes • Number of dimensions is limited
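One plausible reading of the last two bullets, which the slide does not spell out, is that a fully precomputed cube stores one aggregate table (a cuboid) per combination of dimensions, so N dimensions give 2^N cuboids. The sketch below enumerates them for the three dimensions used earlier; the `cuboids` helper is a hypothetical name.

```python
from itertools import combinations


def cuboids(dimensions):
    """List every subset of the dimensions; each subset is one cuboid to precompute."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


all_cuboids = cuboids(["month", "city", "product"])
print(len(all_cuboids))  # 8

# The count doubles with every added dimension.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

With 30 dimensions the full cube would already need over a billion cuboids, which is why build time and storage on a single machine become limiting.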
  12. Modern OLAP • Cubes in a single machine • Cubes distributed in cluster • One logical cube • Processed by distributed framework
  13. Journey of Apache Kylin • Sept 2013: Project Initiated • Oct 2014: Officially Open Source • Nov 2014: Apache Incubator Project • Sept 2015: InfoWorld Best Open Source Big Data Tool Award • Nov 2015: Apache Top-Level Project • Mar 2016: Kyligence Inc. Founded
  14. Apache Kylin: Extreme OLAP Engine for Big Data • High performance at massive scale: more than 900 billion rows of data, 99% of queries < 1.3 seconds (from Meituan.com, the #1 O2O company in China) • ANSI SQL: SQL on Hadoop, supports ANSI SQL • JDBC/ODBC/RESTful API • Hadoop native: compatible with the Hadoop ecosystem, fully scalable architecture • MOLAP cube: multidimensional model for billions of rows of data
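To make the RESTful API bullet concrete, here is a hedged sketch of how a SQL query might be submitted over HTTP. It only builds the request and does not send it. The host, port, credentials, project name, and table are assumptions based on Kylin's sample setup, and the endpoint path and payload fields should be checked against the REST API documentation for the Kylin version in use.

```python
import base64
import json
import urllib.request

# Assumptions: a local Kylin instance with the sample defaults.
KYLIN_URL = "http://localhost:7070/kylin/api/query"
USER, PASSWORD = "ADMIN", "KYLIN"


def build_query_request(sql, project):
    """Build (but do not send) an HTTP POST carrying a SQL query as JSON."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    body = json.dumps({"sql": sql, "project": project, "limit": 50}).encode()
    return urllib.request.Request(
        KYLIN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# kylin_sales and learn_kylin are assumed names from the sample dataset.
request = build_query_request(
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    "learn_kylin",
)
print(request.get_method(), request.full_url)

# To actually run the query against a live server:
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
```

BI tools would normally go through the JDBC or ODBC drivers instead; the HTTP route is convenient for scripts and web applications.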
  15. Apache Kylin Architecture • BI Tools, Web App… • ANSI SQL • OLAP Cube
  16. Performance Benchmark
  17. Apache Kylin Users: 1,000+ Global Users
  18. Demo: 4 Steps to Build Your First Apache Kylin Cube. (1) Connect to Data Source (2) Create Model and Cube (3) Build Cube (4) Go and Query
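The idea behind the four steps can be imitated in a few lines of plain Python: take source rows, declare dimensions and a measure, precompute every aggregate, then answer queries by lookup. This is a toy sketch of the precomputation idea only, not Kylin's implementation or API; the rows, the `build_cube` and `query` helpers, and the `units` measure are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations

# Steps 1 and 2 (data source and model): made-up rows, three dimensions, one measure.
DIMENSIONS = ("city", "month", "product")
ROWS = [
    {"city": "Los Angeles", "month": "June", "product": "Beer", "units": 40},
    {"city": "Los Angeles", "month": "June", "product": "Beer", "units": 50},
    {"city": "Los Angeles", "month": "May", "product": "Milk", "units": 30},
    {"city": "New York", "month": "June", "product": "Beer", "units": 60},
]


def build_cube(rows, dimensions):
    """Step 3: precompute SUM(units) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                cuboid[tuple(row[d] for d in dims)] += row["units"]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, **filters):
    """Step 4: answer SUM(units) for equality filters with a single lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube[dims].get(tuple(filters[d] for d in dims), 0)


cube = build_cube(ROWS, DIMENSIONS)
print(query(cube, city="Los Angeles", month="June", product="Beer"))  # 90
print(query(cube, product="Beer"))  # 150
print(query(cube))  # 180
```

The build step does all the scanning once, so query time depends on the lookup and not on the number of source rows. That is the trade-off a MOLAP engine makes: more build time and storage in exchange for fast queries.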
  19. Roadmap • Fully on Spark • New Parquet storage (replacing HBase) • Dockerize • Kubernetes integration • Cloud ready • From OLAP to data warehouse. Visit http://kylin.apache.org/ for more information
  20. Join the Community • https://github.com/apache/kylin • apache-kylin.slack.com • user@kylin.apache.org
  21. THANK YOU
