Apache Kylin 101
Kaige Liu
Senior Solution Architect, Kyligence
Apache Kylin Committer
2020.4
© Kyligence Inc. 2019, Confidential.
Agenda
• OLAP Overview
• Apache Kylin Introduction
• Apache Kylin Demo
• Q&A
Questions OLAP Can Help Us Answer
What are our top 5 best selling products in each state/city?
Which products should be put together?
Do you have enough toilet paper prepared for coronavirus?
Who owns this supermarket?
Boss Theodore
Analyst
Why OLAP? (Online Analytical Processing)
Good at:
• Designed for analysis: BI reporting, data discovery, etc.
• Quick insight
• Multidimensional data model
• Complex business calculations
Not good at:
• Frequent updates/deletes
• Transactional data
OLAP Cubes
Beer slice of the cube (Milk and Juice are the other product slices):
                April  May  June
New York          120   80    60
Los Angeles        50  130    90
San Francisco      70   50   100
Q: How many beers were sold in Los Angeles in June?
A: 90
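As a rough illustration, the cube on this slide can be modeled as a lookup keyed by its three dimensions, so the question becomes a single cell read. This is a minimal Python sketch, not how any OLAP engine stores data; `cube` and `units_sold` are hypothetical names, and only the Beer slice shown on the slide is filled in.

```python
# Beer slice from the slide: one row of monthly values per city.
MONTHS = ["April", "May", "June"]
BEER_BY_CITY = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

# The cube maps (product, city, month) -> units sold.
cube = {
    ("Beer", city, month): units
    for city, row in BEER_BY_CITY.items()
    for month, units in zip(MONTHS, row)
}


def units_sold(product, city, month):
    """Answer a point query by addressing one cell of the cube."""
    return cube[(product, city, month)]


print(units_sold("Beer", "Los Angeles", "June"))  # 90
```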
From Tables to OLAP Cubes
Dimensions are the context that helps the consumer of measures understand the meaning of those measures.
F_SALES (fact table): REVENUE, SALES AMOUNT, TAX, SUPPLY COST
DIM_DATE: DATE, YEAR, QUARTER, MONTH, WEEK
DIM_CUSTOMER: CUSTOMER_ID, NAME, EMAIL, CITY, ADDRESS
DIM_SHOP: SHOP_ID, CITY, STATUS
Measures contain numeric, quantitative values that you can measure.
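To make the dimension/measure split concrete, here is a small sketch using Python's built-in sqlite3 module: a measure (REVENUE) from the fact table is summed and grouped by a dimension (CITY) from DIM_SHOP. The sample rows are made up for illustration, and the SHOP_ID column in F_SALES is an assumption, since the slide lists only the measure columns of the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_SHOP (SHOP_ID INTEGER PRIMARY KEY, CITY TEXT, STATUS TEXT);
    -- Assumption: F_SALES references DIM_SHOP through SHOP_ID.
    CREATE TABLE F_SALES (SHOP_ID INTEGER, REVENUE REAL, TAX REAL);
    -- Hypothetical sample data.
    INSERT INTO DIM_SHOP VALUES (1, 'New York', 'OPEN'), (2, 'Los Angeles', 'OPEN');
    INSERT INTO F_SALES VALUES (1, 100.0, 8.0), (1, 50.0, 4.0), (2, 70.0, 5.0);
""")

# Measure: SUM(REVENUE). Dimension: CITY.
rows = conn.execute("""
    SELECT s.CITY, SUM(f.REVENUE) AS TOTAL_REVENUE
    FROM F_SALES AS f
    JOIN DIM_SHOP AS s ON s.SHOP_ID = f.SHOP_ID
    GROUP BY s.CITY
    ORDER BY s.CITY
""").fetchall()

print(rows)  # [('Los Angeles', 70.0), ('New York', 150.0)]
```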
Dimensions and Measures in OLAP Cubes
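Beer slice of the cube:
                April  May  June
New York          120   80    60
Los Angeles        50  130    90
San Francisco      70   50   100
D (dimension): month (April, May, June)
D (dimension): city (New York, Los Angeles, San Francisco)
D (dimension): product (Beer, Milk, Juice)
M (measure): units sold (the value in each cell)
Q: How many beers were sold in Los Angeles in June?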
OLAP Operations
Roll Up: month -> quarter (Beer slice)
                April  May  June   Q2
New York          120   80    60  260
Los Angeles        50  130    90  270
San Francisco      70   50   100  220
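The roll-up on this slide amounts to summing the month-level cells into one quarter-level cell per city. A minimal Python sketch using the slide's numbers (the variable names are illustrative only):

```python
# Monthly beer sales per city (April, May, June), as shown on the slide.
beer_sales = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

# Roll up: collapse the month level into a single Q2 value per city.
q2_totals = {city: sum(monthly) for city, monthly in beer_sales.items()}

print(q2_totals)
# {'New York': 260, 'Los Angeles': 270, 'San Francisco': 220}
```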
OLAP Operations
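Drill Down: month -> week (Beer slice)
Monthly view:
                April  May  June
New York          120   80    60
Los Angeles        50  130    90
San Francisco      70   50   100
Weekly view (each month's four weekly values sum to its monthly total):
April           Week13  Week14  Week15  Week16
New York            20      40      30      30
Los Angeles         10      15      15      10
San Francisco       25      15      20      10
May             Week17  Week18  Week19  Week20
New York            20      30      20      10
Los Angeles         50      25      25      30
San Francisco       15      10      10      15
June            Week21  Week22  Week23  Week24
New York            15      10      25      10
Los Angeles         35       5      30      20
San Francisco       25      25      25      25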
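Drill-down is the inverse of roll-up: the weekly cells are the finer-grained data, and summing them recovers the monthly view. The sketch below uses the weekly values read off the slide for Week13 through Week24; the mapping of four weeks to each month is an assumption based on the slide layout, and the names are illustrative.

```python
# Weekly beer sales per city for Week13..Week24, read from the slide.
weekly_sales = {
    "New York": [20, 40, 30, 30, 20, 30, 20, 10, 15, 10, 25, 10],
    "Los Angeles": [10, 15, 15, 10, 50, 25, 25, 30, 35, 5, 30, 20],
    "San Francisco": [25, 15, 20, 10, 15, 10, 10, 15, 25, 25, 25, 25],
}

WEEKS_PER_MONTH = 4  # assumption: April, May and June each cover four weeks


def roll_up_to_months(weekly):
    """Sum consecutive groups of four weeks into monthly totals."""
    return [
        sum(weekly[i:i + WEEKS_PER_MONTH])
        for i in range(0, len(weekly), WEEKS_PER_MONTH)
    ]


monthly_sales = {city: roll_up_to_months(w) for city, w in weekly_sales.items()}

print(monthly_sales["Los Angeles"])  # [50, 130, 90]
```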
Traditional OLAP Tools
Challenges in the Big Data Era
Traditional OLAP Tools Are Great but…
• Difficult to handle massive data volumes
• Cube size limited by a single machine
• Have to maintain lots of cubes
• Hard to scale
• Takes a long time to build cubes
• Number of dimensions is limited
Modern OLAP
Cubes in a single machine
Cubes distributed in a cluster
One logical cube
Processed by a distributed framework
Journey of Apache Kylin
Sept 2013: Project initiated
Oct 2014: Officially open source
Nov 2014: Apache Incubator project
Sept 2015: InfoWorld Best Open Source Big Data Tool Award
Nov 2015: Apache Top-Level Project
Mar 2016: Kyligence Inc. founded
Apache Kylin
Extreme OLAP Engine for Big Data
High performance at massive scale
More than 900 billion rows of data, with 99% of queries under 1.3 seconds,
at Meituan.com, the #1 O2O company in China
ANSI-SQL
SQL on Hadoop; supports ANSI SQL through JDBC, ODBC, and RESTful API
Hadoop Native
Compatible with Hadoop ecosystem, fully scalable architecture
MOLAP Cube
Multidimensional model for billions of rows of data
Apache Kylin Architecture
BI Tools, Web App…
ANSI SQL
OLAP Cube
Performance Benchmark
Apache Kylin Users
1,000+ Global Users
Demo
4 Steps to Build Your First Apache Kylin Cube
1. Connect to Data Source
2. Create Model and Cube
3. Build Cube
4. Go and Query
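For step 4, one way to query a built cube outside the web UI is Kylin's REST query endpoint. The sketch below only builds the HTTP request with the Python standard library; it does not contact a server. The host and port (`localhost:7070`), the `ADMIN`/`KYLIN` credentials, the `learn_kylin` project, and the `KYLIN_SALES` sample table are assumptions based on a default sample installation, and `build_query_request` is a hypothetical helper name. Adjust all of them to your environment.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password):
    """Build a POST request for Kylin's /kylin/api/query endpoint."""
    payload = json.dumps({"sql": sql, "project": project}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=payload,
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are assumptions for a local sample setup.
request = build_query_request(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT PART_DT, SUM(PRICE) FROM KYLIN_SALES GROUP BY PART_DT",
    user="ADMIN",
    password="KYLIN",
)

print(request.get_method(), request.full_url)

# Sending the request needs a running Kylin instance, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same SQL could equally be sent through the JDBC or ODBC drivers mentioned earlier in the deck; the REST call is used here only because it needs no extra driver.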
Roadmap
• Fully on Spark
• New Parquet storage (replacing HBase)
• Dockerize
• Kubernetes integration
• Cloud ready
• From OLAP to data warehouse
Visit http://kylin.apache.org/ for more information
Join the Community
https://github.com/apache/kylin
apache-kylin.slack.com
user@kylin.apache.org
THANK YOU


Editor's Notes

  • #4 Add trans to page 4
  • #16 Mention HBase will be removed in next release
  • #20 Mention blog in website