See how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started using it in your organization.
https://www.brighttalk.com/webcast/18317/413952
Apache Kylin 2.0: From Classic OLAP to Real-Time Data Warehouse - Yang Li
Apache Kylin, which started as a big data OLAP engine, is reaching its v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.
This talk covers best practices and patterns for designing an efficient cube in Kylin, including concepts like mandatory dimensions, hierarchy dimensions, derived dimensions, incremental builds, and aggregation groups.
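The combinatorics behind these cube-design choices can be sketched in a few lines of Python (illustrative only; Kylin's actual cuboid pruning is more involved): with n dimensions a full cube has 2^n cuboids, a mandatory dimension halves that, and splitting dimensions into independent aggregation groups shrinks it further.

```python
# Illustrative cuboid counting for Kylin-style cube design
# (a sketch, not Kylin's real cube planner).

def full_cuboids(n_dims: int) -> int:
    # Every subset of dimensions is a cuboid: 2^n in total.
    return 2 ** n_dims

def with_mandatory(n_dims: int, n_mandatory: int) -> int:
    # Mandatory dimensions appear in every cuboid, so only the
    # remaining dimensions vary: 2^(n - m) cuboids.
    return 2 ** (n_dims - n_mandatory)

def with_aggregation_groups(group_sizes: list[int]) -> int:
    # Each group is cubed independently; the total is roughly the
    # sum of each group's combinations instead of one giant product.
    return sum(2 ** size for size in group_sizes)

print(full_cuboids(10))                 # 1024
print(with_mandatory(10, 2))            # 256
print(with_aggregation_groups([5, 5]))  # 64
```

The numbers show why these settings matter: two mandatory dimensions cut 1024 cuboids to 256, and two aggregation groups of five dimensions cut them to 64.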
Virtual Flink Forward 2020: A Deep Dive into Flink SQL - Jark Wu, Flink Forward
During the last two major versions (1.9 and 1.10), the Apache Flink community spent a great deal of effort improving the architecture toward further unified batch and streaming processing. One example is that Flink SQL added the ability to support multiple SQL planners under the same API. This talk first discusses the motivation behind these changes, then takes a deep dive into Flink SQL. The presentation shows the unified architecture for handling streaming and batch queries and explains how Flink translates queries into relational expressions, leverages Apache Calcite to optimize them, and generates efficient runtime code for execution. It also describes the lifetime of a query in detail: how the optimizer improves the plan based on relational node patterns, how Flink leverages a binary data format for its basic data structures, and how certain operators work. This gives the audience a better understanding of Flink SQL internals.
Accelerating Big Data Analytics with Apache Kylin - Tyler Wishnoff
Learn about the latest advancements in Apache Kylin and how its OLAP technology is making analytics faster and insights more actionable.
Learn more about Apache Kylin: https://kyligence.io/apache-kylin-overview/
Learn more about Apache Kylin's enterprise version Kyligence: https://kyligence.io/
New Approaches for Fraud Detection on Apache Kafka and KSQL - Confluent
Speakers: Dale Kim, Sr. Director, Products/Solutions, Arcadia Data + Chong Yan, Solutions Architect, Confluent
When it comes to corporate fraud, early detection is integral to mitigating and preventing drastic damage.
Modern streaming data technologies like Apache Kafka® and Confluent KSQL, the streaming SQL engine for Apache Kafka, can help companies detect fraud in real time instead of after the fact. Kafka is ideal for managing fast, incoming data points, and KSQL provides the de facto standard for reading that data. Combine this with Arcadia Data visualizations designed for modern data types, and you have a powerful foundation for combating fraud.
You will learn:
-Why traditional batch-driven approaches to fraud detection are insufficient today
-Why Apache Kafka is widely used for real-time fraud detection
-How KSQL and real-time visualizations open more opportunities for searching for fraud
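The streaming-versus-batch contrast above can be pictured without Kafka at all. The toy Python below (all names and thresholds are invented for illustration; this is not KSQL) flags an account the moment its transaction count inside a sliding window exceeds a threshold, rather than in a nightly batch job.

```python
from collections import deque, defaultdict

# Toy sliding-window fraud check: flag an account as soon as it makes
# more than `max_txns` transactions within `window_secs` seconds.
class FraudDetector:
    def __init__(self, window_secs: int = 60, max_txns: int = 3):
        self.window_secs = window_secs
        self.max_txns = max_txns
        self.events = defaultdict(deque)  # account -> timestamps in window

    def observe(self, account: str, ts: float) -> bool:
        q = self.events[account]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > self.window_secs:
            q.popleft()
        return len(q) > self.max_txns  # True => suspicious

detector = FraudDetector(window_secs=60, max_txns=3)
alerts = [detector.observe("acct-42", t) for t in (0, 10, 20, 30, 200)]
print(alerts)  # [False, False, False, True, False]
```

The fourth transaction inside the 60-second window trips the alert immediately; a batch job would only have surfaced it hours later.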
Big data is a huge world. There are lots of technologies, old and new, and all these options can be overwhelming for beginners who want to start working on Big Data projects.
In this session, we are going to talk about the basics of Big Data - what it is, and what it is not. We will focus on Hadoop, Hive, Spark, Kafka, and their use cases.
Deep Learning at Extreme Scale (in the Cloud) with the Apache Kafka Open Sou... - Kai Wähner
How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc…
This talk shows how to build Machine Learning models at extreme scale and how to productionize the built models in mission-critical real time applications by leveraging open source components in the public cloud. The session discusses the relation between TensorFlow and the Apache Kafka ecosystem - and why this is a great fit for machine learning at extreme scale.
The Machine Learning architecture includes: Kafka Connect for continuous high volume data ingestion into the public cloud, TensorFlow leveraging Deep Learning algorithms to build an analytic model on powerful GPUs, Kafka Streams for model deployment and inference in real time, and KSQL for real time analytics of predictions, alerts and model accuracy.
Sensor analytics for predictive alerting in real time is used as a real-world example from Internet of Things scenarios. A live demo shows the out-of-the-box integration and dynamic scalability of these components on Google Cloud.
Key takeaways for the audience
• Learn how to build a Machine Learning infrastructure at extreme scale and how to productionize the built models in mission-critical real time applications
• Understand the benefits of a machine learning platform on the public cloud
• Learn about an extreme scale Machine Learning architecture around the Apache Kafka open source ecosystem including Kafka Connect, Kafka Streams and KSQL
• See a live demo for an Internet of Things use case: Sensor analytics for predictive alerting in real time
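The ingest-train-serve split described above can be mimicked in plain Python (a hedged sketch; the real architecture uses Kafka Connect, TensorFlow, and Kafka Streams, none of which appear here): a model is fit offline on historical data, then applied record-by-record as events stream past.

```python
# Minimal offline-train / online-score split, standing in for the
# TensorFlow + Kafka Streams pipeline described in the talk.

def train_threshold_model(history: list[float]) -> float:
    # "Training": pick an anomaly threshold at mean + 3 sigma.
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    return mean + 3 * var ** 0.5

def score_stream(threshold: float, events):
    # "Inference": applied per event, as a stream processor would.
    for reading in events:
        yield (reading, reading > threshold)

model = train_threshold_model([10.0, 11.0, 9.0, 10.0])
results = list(score_stream(model, [10.5, 30.0]))
print(results)  # [(10.5, False), (30.0, True)]
```

The key property this preserves is that scoring is stateless per event, which is what lets the real deployment scale out the inference side independently of training.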
How Apache Drives Music Recommendations At Spotify - Josh Baer
The slides go through the high-level process of generating personalized playlists for all Spotify's users, using Apache big data products extensively.
Presentation given at Apache: Big Data Europe conference on September 29th, 2015 in Budapest.
URP? Excuse You! The Three Kafka Metrics You Need to Know - Todd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
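The headline metric of this talk is conceptually simple to derive from partition metadata. A hedged Python sketch (the field names are illustrative, not Kafka's actual metadata schema or AdminClient API): a partition is under-replicated whenever its in-sync replica set is smaller than its assigned replica set.

```python
# Count under-replicated partitions: any partition whose in-sync
# replica set (ISR) is smaller than its assigned replica set.
# (Field names are illustrative, not Kafka's real metadata schema.)

def under_replicated(partitions: list[dict]) -> int:
    return sum(
        1 for p in partitions
        if len(p["isr"]) < len(p["replicas"])
    )

metadata = [
    {"topic": "orders", "partition": 0, "replicas": [1, 2, 3], "isr": [1, 2, 3]},
    {"topic": "orders", "partition": 1, "replicas": [1, 2, 3], "isr": [1, 3]},
]
print(under_replicated(metadata))  # 1
```

A nonzero count means some broker has fallen behind on replication, which is why the talk calls it "the mother of all metrics."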
Apache Hive is an Enterprise Data Warehouse built on top of Hadoop. Hive supports Insert/Update/Delete SQL statements with transactional semantics and read operations that run at Snapshot Isolation. This talk will describe the intended use cases, the architecture of the implementation, new features such as the SQL MERGE statement, and recent improvements. The talk will also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm, and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
Druid is a high-performance, column-oriented distributed data store that is widely used at Oath for big data analysis. Druid has a JSON schema as its query language, making it difficult for new users unfamiliar with the schema to start querying Druid quickly. The JSON schema is designed to work with Druid's data ingestion methods, so it can provide high-performance features such as data aggregations in JSON, but many users are unable to utilize such features because they are not familiar with the specifics of how to optimize Druid queries. However, most new Druid users at Yahoo are already very familiar with SQL, and the queries they want to write for Druid can be converted to concise SQL.
We found that our data analysts wanted an easy way to issue ad-hoc Druid queries and view the results in a BI tool in a way that's presentable to nontechnical stakeholders. In order to achieve this, we had to bridge the gap between Druid, SQL, and our BI tools such as Apache Superset. In this talk, we will explore different ways to query a Druid datasource in SQL and discuss which methods were most appropriate for our use cases. We will also discuss our open source contributions so others can utilize our work. GURUGANESH KOTTA, Software Dev Eng, Oath and JUNXIAN WU, Software Engineer, Oath Inc.
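The gap the speakers describe can be pictured as a translation step. The sketch below is a toy mapping from a simple aggregation spec to a Druid-style groupBy query body; it is not Druid's real SQL layer (which is built on Apache Calcite), and the helper name is invented.

```python
import json

# Toy translation from a simple aggregation spec to a Druid-style
# groupBy query body. Illustrative only -- real Druid SQL is far richer.
def to_druid_groupby(datasource, dimension, metric, interval):
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "dimensions": [dimension],
        "aggregations": [
            {"type": "doubleSum", "name": metric, "fieldName": metric}
        ],
        "intervals": [interval],
        "granularity": "all",
    }

query = to_druid_groupby("pageviews", "country", "views",
                         "2018-01-01/2018-01-02")
print(json.dumps(query, indent=2))
```

Even this tiny example shows why analysts preferred SQL: `SELECT country, SUM(views) FROM pageviews` is one line, while the equivalent JSON body takes a dozen.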
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ... - Hosted by Confluent
Apache Kafka is used as the primary message bus for propagating events and logs across Uber. In particular, it pairs with Apache Pinot, a real-time distributed OLAP datastore, to deliver real-time insights seconds after the messages are produced to Kafka.
One challenge we faced was updating existing data in Pinot with the changelog in Kafka while delivering an accurate view in the real-time analytical results. For example, the financial dashboard can report gross bookings with corrected ride fares, and restaurant owners can analyze UberEats orders with their latest delivery status.
Implementing upserts in an immutable real-time OLAP store like Pinot is nontrivial. We needed to make architectural changes in how data is distributed via Kafka amongst the server nodes and how it's indexed and queried in a distributed fashion. In this talk I will discuss how we leveraged Kafka's partition-by-key feature to this end and how we added this ability in Pinot without any performance degradation.
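Why partition-by-key matters for upserts: if every record for a given primary key hashes to the same partition, the node owning that partition can resolve updates locally, with no cross-node coordination. A toy Python sketch of that invariant (invented data; not Pinot's implementation):

```python
# Toy partition-by-key upsert: records for the same key always land on
# the same partition, so each partition can keep the latest value locally.

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash so the same key always maps to the same partition.
    return sum(key.encode()) % NUM_PARTITIONS

partitions = [{} for _ in range(NUM_PARTITIONS)]  # per-partition latest state

def upsert(key: str, value: dict) -> None:
    partitions[partition_for(key)][key] = value  # newest record wins

upsert("ride-7", {"fare": 12.50, "status": "ongoing"})
upsert("ride-7", {"fare": 14.00, "status": "completed"})  # fare correction

latest = partitions[partition_for("ride-7")]["ride-7"]
print(latest)  # {'fare': 14.0, 'status': 'completed'}
```

Queries then see only the corrected record, which is exactly the "accurate view" the abstract describes for ride fares and delivery statuses.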
Apache Ranger’s pluggable architecture allows centralized authoring of authorization policies and access audits for Hadoop and non-Hadoop components alike. The authorization policy model is designed to capture and express the complex authorization needs of each component.
In this session, we will present two more key enhancements made to the policy model in the next release to make it richer and support advanced authorization needs of contemporary enterprise security infrastructure.
•Ranger service definition is enhanced to support specification of allowed accesses on a given resource. This specification is then utilized to present only valid accesses when authoring a policy targeted at the resource.
•Ranger policy model is enhanced to support time-based policies that temporarily grant or deny access to a resource during a specified time window. The time specification supports a time zone, which is enforced relative to the time zone of the component where the Ranger plugin runs.
We will conclude by a demonstration of these new capabilities. ABHAY KULKARNI, Engineer, Hortonworks and RAMESH MANI, Staff Software Engineer, Hortonworks
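Evaluating such a time-bounded policy comes down to a timezone-aware window check. A hedged Python sketch (the policy shape below is invented for illustration, not Ranger's actual policy model or JSON):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Toy time-based policy check: access is allowed only inside a window,
# evaluated in the policy's own time zone. (Policy shape is invented,
# not Ranger's real policy model.)
def is_allowed(policy: dict, now_utc: datetime) -> bool:
    local = now_utc.astimezone(ZoneInfo(policy["tz"]))
    return policy["start_hour"] <= local.hour < policy["end_hour"]

policy = {"tz": "America/Los_Angeles", "start_hour": 9, "end_hour": 17}
now = datetime(2023, 6, 1, 20, 0, tzinfo=ZoneInfo("UTC"))  # 13:00 in LA (PDT)
print(is_allowed(policy, now))  # True
```

Converting to the policy's zone before comparing is the important step: the same UTC instant can fall inside the window for one component and outside it for another.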
Open Source Technologies in the Analytics Revolution - Samantha Berlant
One of the hallmarks of modern analytics is that data pipelines are largely built upon open source software (OSS). It is entirely possible to create cutting edge data science, machine learning, data engineering, ETL processing, and predictive analytics pipelines without using any commercial software. Of course, OSS does not necessarily mean “free,” but as a thought experiment, the first part of this session will explore the role of OSS in your data analytics stacks and data pipelines.
For the second half of this presentation, we will examine how OSS tools and platforms can be used to learn and create your own Machine Learning and Data Analytics projects without breaking the bank.
View the presentation: https://youtu.be/JbNuikWKC1Q
Building Enterprise OLAP on Hadoop for FSI - Luke Han
Building Enterprise OLAP on Hadoop for the Financial Services Industry, with a use case from CPIC (a Fortune 500 insurance company) on replacing legacy IBM Cognos OLAP with the Kyligence platform.
Take the Bias out of Big Data Insights With Augmented Analytics - Tyler Wishnoff
Is bias impacting your Big Data insights? Learn how augmented analytics and the latest advancements in OLAP technology are making analytics (including on cloud) from business intelligence, data science, and machine learning more accurate and impactful. Learn more at https://kyligence.io
Integrating and fully utilizing data is a critical prerequisite for ensuring the success of data-driven operations and decision making. This is especially true as more and more corporations begin transforming legacy data warehouses and transitioning to the Cloud. See how Augmented OLAP technology is leading the way in streamlining Big Data analytics on the Cloud with this presentation by Kyligence CEO Luke Han at Big Things Conference 2019. Learn more here: https://kyligence.io
Apache Kylin and Use Cases - 2018 Big Data Spain - Luke Han
Apache Kylin is rapidly being adopted around the world as the leading open source OLAP engine for Big Data. In this talk, Luke Han, creator and PMC chair of Apache Kylin, introduces the motivation behind the project and its technical highlights, and explores how various industries use Apache Kylin and the resulting business impact.
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat... - Tyler Wishnoff
See how extreme query speeds and ultra-high concurrency on MicroStrategy, and any other business intelligence (BI) tool, on Big Data is possible through the Kyligence platform. Learn more here: https://kyligence.io/
With an explosion of data, today’s emerging needs are not being met by existing technologies, which require rich skill sets and expertise. Companies that want to lead changes in highly competitive markets must optimize their storage, speed, and spending. The key is for them to augment their data management and analytics platforms with artificial intelligence and machine learning for analysts, engineers, and other users.
Big Data, Machine Learning, and AI have created new opportunities for organizations worldwide, but this has also put tremendous pressure on IT and data engineering teams to scale and maintain analytics performance on massive datasets. This presentation at Strata Data Conference London 2019 by Luke Han explains how Augmented OLAP technology solves the challenges of analytics on massive datasets while reducing IT costs. Learn more about this powerful approach to Big Data here: https://kyligence.io/
A lot has changed with OLAP in the last few years and this presentation offers a great overview of how OLAP has evolved with the help of Augmented Analytics. See why Augmented OLAP is proving to be the best way to ensure high-performance analytics at any scale, and learn which large enterprises have already adopted this approach and how it's helping them. Learn more about Augmented OLAP and what it can do at: https://kyligence.io/
Cloud-native Semantic Layer on Data LakeDatabricks
With larger volumes of more real-time data stored in the data lake, it becomes more complex to manage that data and serve analytics and applications. With differing service interfaces, data definitions, and performance characteristics across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from their data.
Batched To Perfection: Modeling & Solving Business Problems With Apache Spark - Eliav Lavi
As data gets bigger, the applications we're maintaining as developers are becoming increasingly data-hungry. There comes a point where simply querying all the raw data and crunching it into some meaningful piece of information at request time is just not giving us good enough performance. Maybe our application starts lagging. This is when pre-calculating our aggregations becomes crucial.
In this talk, we will examine how can Apache Spark help in building elegant and accurate batch processing data pipelines. This will allow us to maintain our pre-calculated aggregations and make our web applications run blazingly fast again. Along the way we will make use of some other cool technologies, such as Databricks' Delta Lake, and throw some functional programming goodness into the process.
Eliav Lavi is a technical lead @ Riskified, where he's been working for the past 7 years. Previously a classically trained musician, he shifted to a developer position in order to pursue his long-time passion for tech. Today he is part of Riskified's Account Protection team, preventing bad actors from taking over customers' accounts.
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of... - Tyler Wishnoff
Learn how to empower your analysts with easier access to all the data they need, exactly when they need it - all while reducing workloads for IT and data engineering.
This presentation will walk you through those challenges, what modern options are available for solving them, and how taking an AI-powered approach to self-service analytics may yield the greatest level of data access along with the best possible performance. Learn more here: https://kyligence.io/
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist... - Data Con LA
Leading entrepreneurial outfits are disrupting traditional companies by rapidly building data-driven apps. They employ top software talent and effectively use storage, analytics and app-dev tools from various open source ecosystems. We show how companies of all sizes are now transforming into data-driven enterprises using their existing software skill sets by leveraging a single platform that combines flexible data storage systems, advanced analytics and agile app-dev PaaS frameworks, all available now in open source forums.
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing - Samantha Berlant
You’ve moved your data to the cloud, awesome. Now you’re running into issues of concurrency, scale, and cost overruns. But there’s a better way to run your cloud analytics if you think of cloud resources as commodities to conserve and maximize. Sure, you could run the same query from start to finish every time - or you could speed things up, and save some cash in the process, by precomputing those queries and storing the responses for fast retrieval at any time, by any number of analysts.
Kyligence Cloud 4’s Spark-Powered Cubing and Indexing feature provides just that - intelligent precomputation, which fundamentally boils down to low-cost, high-performance analytics. Join us for the fourth part of this series exploring the key features of Kyligence Cloud 4.
In this webinar you will learn:
-About modern, cloud era OLAP and cubing theory
-Performance gains you’ll get from intelligent precomputation
-How to apply cloud computing and distributed processing
-Precomputation strategies and tactics
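The precomputation idea behind cubing boils down to a tiny sketch (pure Python with invented data; real systems like Kylin build these rollups with Spark): aggregate once at build time for every combination of dimensions, then answer repeated queries with a lookup instead of a scan.

```python
from collections import defaultdict
from itertools import combinations

rows = [
    {"country": "US", "device": "mobile", "views": 3},
    {"country": "US", "device": "desktop", "views": 5},
    {"country": "DE", "device": "mobile", "views": 2},
]

# Build time: precompute SUM(views) for every subset of dimensions
# (every "cuboid"), so queries become lookups instead of scans.
dims = ("country", "device")
cube = defaultdict(int)
for row in rows:
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            key = (group, tuple(row[d] for d in group))
            cube[key] += row["views"]

# Query time: constant-time lookups, shared by any number of analysts.
print(cube[(("country",), ("US",))])                    # 8
print(cube[((), ())])                                   # 10
print(cube[(("country", "device"), ("US", "mobile"))])  # 3
```

The build cost is paid once; every subsequent query over any dimension combination is a dictionary lookup, which is the "low-cost, high-performance" trade the webinar describes.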
Smashing Through Big Data Barriers with Tableau and Snowflake - Samantha Berlant
Your analysts are working with more data than ever before in Tableau. Chances are, as the data volumes grow, your teams are experiencing some slowdowns. While it may be tempting to blame Tableau, the most likely explanation for performance and scalability pains lies in your data service layer. What if you could transform the way you do analytics without having to retrain your Tableau users? What if you could get more critical business value out of Tableau, and your data, without disrupting the way your business operates?
Join us for this session to learn how Tableau could be the ultimate window into ALL of your valuable data, no matter how large. Learn how precomputation technology and AI-augmented query optimization can help you break free of the downward performance spiral of legacy analytics approaches.
In this presentation, you will learn:
-How to get the fastest big data analytics experience on Tableau
-How a unified semantic layer can ensure that your current Tableau users are not disrupted by big data
-How to improve your analytics operations with automation and machine intelligence
Watch the webinar to see this technology in action during the live Snowflake demo. Enter the onramp to unmatched performance with big data analytics on Tableau.
If you have big data, more and more of your analytics stack needs to be intelligent. Your tools need to be able to anticipate the needs of your analysts, customers, and your business. With the AI-Augmented Engine, this learning process is automated and predictive. It intelligently adapts to user behavior and query patterns and learns to anticipate each user's needs. Join us for the third installment of this series diving into the core features of Kyligence Cloud 4.
In this presentation you will learn:
-How the Kyligence Cloud 4 AI-Augmented Engine works
-How the AI-Augmented Engine gives optimal efficiency for cube building
-How the AI-Augmented Engine greatly simplifies data modeling
Watch the webinar here: https://www.brighttalk.com/webcast/18317/480320
Precomputation or Data Virtualization, which one is right for you? - SamanthaBerlant
In the world of cloud analytics, what role do precomputation and distributed OLAP play compared with a data virtualization approach? Which should you choose? Do they compete or complement each other? This webinar will address these questions and provide some guidance for how to choose the right approach for your circumstances.
Both technologies are trying to address a similar challenge: make analytics easily accessible to a wider audience in a modern big data environment. Precomputation focuses on performance, response time, and concurrency in the production environment. Data Virtualization technologies focus on making analysis easily available to users by reducing or eliminating ETL and data warehouses.
In this presentation we will cover:
-The key differences between precomputation and data virtualization
-How your choice between the two affects data quality, security, governance, and TCO
-The financial impact each of these technologies has on your analytics program
Architecting Snowflake for High Concurrency and High Performance - SamanthaBerlant
Cloud Data Warehousing juggernaut Snowflake has raced out ahead of the pack to deliver a data management platform from which a wealth of new analytics can be run. Using Snowflake as a traditional data warehouse has some obvious cost advantages over a hardware solution. But the real value of Snowflake as a data platform lies in its ability to support a high-concurrency analytics platform using Kyligence Cloud, powered by Apache Kylin.
In this presentation, Senior Solutions Architect Robert Hardaway will describe a modern data service architecture using precomputation and distributed indexes to provide interactive analytics to hundreds or even thousands of users running against very large Snowflake datasets (TBs to PBs).
In January of this year, Kyligence announced the immediate availability of Kyligence Cloud 4, the first fully cloud-native, distributed OLAP platform. During our announcement, EMA analyst John Santaferraro said:
“As the race for unified analytics heats up, Kyligence offers a solution that overcomes the challenges of querying data in both data lakes and data warehouses located both in the cloud and on premises.”
Join Li Kang - VP of North America at Kyligence - as he provides an overview of the Kyligence Cloud 4 release that will show:
-The new cloud native architecture that employs Apache Kylin, Apache Spark, and Apache Parquet to ensure optimal performance.
-How KC4 delivers sub-second query responses on very large datasets using precomputed aggregate indexes (hyper-cubes) and table indexes.
-The AI-Augmented engine that intelligently organizes your data and reduces data modeling time from days/weeks to minutes.
In this presentation, we will present the Kyligence Cloud 4 story - high-speed analytics with unprecedented sub-second query response times against petabyte datasets.
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar... - SamanthaBerlant
People have been using Excel for 35 years. There are over 750 million Excel users. People are making magic with Excel every day. With the surging interest in big data, advanced analytics, and the cloud, how does Excel stay relevant and how extreme can Excel get? In this presentation, we will examine:
o Traditional limits of Excel performance, scale, dataset sizes
o Cloud technologies that make Excel better
o Defining the new extremes for Excel power users
Speaker Bio:
Rachel Beddor is a Solutions Engineer for Kyligence where she creates technical content to enhance the learning experience for new Apache Kylin and Kyligence users. She has dedicated her career to making technology more accessible, fun, and inviting to people of all backgrounds.
Addressing the systemic shortcomings of cloud analytics - SamanthaBerlant
Learn how existing open source technologies like Apache Kylin, Spark, and Mondrian can be used to increase the value of your analytics investment.
As we enter what some have called The Golden Age of Analytics, there are still some fundamental challenges that plague even the largest and most sophisticated cloud analytics adopters. Chief among these is the challenge of scale, often reflected in limitations of concurrency, multi-tenancy, distributed query performance, and all manner of latencies.
Other less obvious, but equally crucial, challenges of scale and performance have to do with IT and end-user productivity. In other words, there have been few technological advances that enable the quick deployment of big data analytics and the rapid creation of business value from the data being analyzed.
This presentation will consider a few of these systemic challenges and suggest some ways that they can be addressed with available open source technology such as Apache Kylin, Apache Spark, and Mondrian.
Presenter:
Kaige Liu is a Senior Solutions Architect at Kyligence, where he works on building the next-generation big data analytics platform. Previously, he worked on the OpenStack and Bluemix team at IBM, focusing on cloud computing and virtualization technology. Kaige loves the open source community and is an active Apache Kylin committer.
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin - SamanthaBerlant
With over 450 million customers, Didi (world’s largest rideshare company) conducts complex user behavior analysis on huge datasets daily. Exact Count Distinct is one of Didi’s most critical metrics, but it is known for being computationally heavy and notoriously slow. The difference between exact Count Distinct and approximate Count Distinct can cost Didi millions of dollars. In this talk, Kaige Liu of the Apache Kylin project will explain how Didi uses Apache Kylin to return exact Distinct Count on billions of rows of data with sub-second latency to generate the most accurate picture of its business.
You will also learn about the latest development in modern OLAP technologies. Kaige will share how Didi and Truck Alliance (a truck-hailing company that processes $100 billion worth of goods yearly) use Apache Kylin to power their analytics platforms that allow 100s of analysts to achieve sub-second latency on petabyte-scale data.
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it.
https://www.brighttalk.com/webcast/18317/422499
Enhance Data Governance with Kyligence Unified Semantic Layer - SamanthaBerlant
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance.
https://www.brighttalk.com/webcast/18317/414017
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass... - SamanthaBerlant
See how to consistently deliver accurate COUNT DISTINCT queries in under a second, even on petabyte-scale datasets. This presentation will share Apache Kylin’s approach to COUNT DISTINCT queries for user behavior analysis.
https://www.brighttalk.com/webcast/18317/414006
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated catalog of AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
Key findings include:
-Increased frequency and complexity of cyber threats.
-Escalation of state-sponsored and criminally motivated cyber operations.
-Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes below benchmark the vector primitives these algorithms build on.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).