Apache Kylin 101
Kaige Liu
Senior Solution Architect, Kyligence
Apache Kylin Committer
2020.4
© Kyligence Inc. 2019, Confidential.
Agenda
• OLAP Overview
• Apache Kylin Introduction
• Apache Kylin Demo
• Q&A
Questions OLAP Can Help Us Answer
What are our top 5 best selling products in each state/city?
Which products should be put together?
Do you have enough toilet paper prepared for coronavirus?
Who owns this supermarket?
Boss Theodore
Analyst
Why OLAP?
Online Analytical Processing
Good at:
• Designed for analysis: BI reporting, data discovery, etc.
• Quick insight
• Multidimensional data model
• Complex business calculations
Not good at:
• Frequent updates and deletes
• Transactional data
OLAP Cubes
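Cube axes: month (April, May, June), city (New York, Los Angeles, San Francisco), product (Beer, Milk, Juice)
Beer slice, units sold:
                April   May   June
New York          120    80     60
Los Angeles        50   130     90
San Francisco      70    50    100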
Q: How many beers were sold in Los Angeles in June?
A: 90
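The lookup above can be sketched in a few lines of Python. This is a minimal illustration, assuming the numbers on the slide are the Beer slice of the cube with cities as rows and months as columns (which is consistent with the answer of 90). The names `cube` and `units_sold` are hypothetical and not part of any OLAP tool.

```python
# Beer slice of the slide's cube, keyed by (product, city, month).
# Assumption: rows are cities and columns are months, as the Q&A implies.
MONTHS = ["April", "May", "June"]
BEER_ROWS = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

# Each cell of the cube is addressed by one value per dimension.
cube = {
    ("Beer", city, month): value
    for city, row in BEER_ROWS.items()
    for month, value in zip(MONTHS, row)
}


def units_sold(product, city, month):
    """Return the measure stored at one cell of the cube."""
    return cube[(product, city, month)]


print(units_sold("Beer", "Los Angeles", "June"))  # 90
```

Answering the question is a single key lookup because the value is already stored per combination of dimension values, which is the main idea behind a precomputed cube.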
From Tables to OLAP Cubes
Dimensions are the context that helps the consumer of measures understand the meaning of those measures.
Measures contain numeric, quantitative values that you can measure.
Fact table (measures):
• F_SALES: REVENUE, SALES AMOUNT, TAX, SUPPLY COST
Dimension tables:
• DIM_DATE: DATE, YEAR, QUARTER, MONTH, WEEK
• DIM_CUSTOMER: CUSTOMER_ID, NAME, EMAIL, CITY, ADDRESS
• DIM_SHOP: SHOP_ID, CITY, STATUS
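To make the split between dimensions and measures concrete, here is a small sketch using Python's built-in `sqlite3` module. It is only an illustration: the `SHOP_ID` and `DATE` join keys on F_SALES are assumptions (the slide does not show them), only a subset of the columns is created, and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_SHOP (SHOP_ID INTEGER PRIMARY KEY, CITY TEXT, STATUS TEXT);
    CREATE TABLE DIM_DATE (DATE TEXT PRIMARY KEY, YEAR INTEGER, MONTH TEXT);
    -- SHOP_ID and DATE on F_SALES are assumed join keys (not shown on the slide).
    CREATE TABLE F_SALES (SHOP_ID INTEGER, DATE TEXT, REVENUE REAL);

    -- Made-up sample rows, for illustration only.
    INSERT INTO DIM_SHOP VALUES (1, 'New York', 'OPEN'), (2, 'Los Angeles', 'OPEN');
    INSERT INTO DIM_DATE VALUES ('2020-04-01', 2020, 'April'), ('2020-05-01', 2020, 'May');
    INSERT INTO F_SALES VALUES
        (1, '2020-04-01', 100.0), (1, '2020-05-01', 50.0),
        (2, '2020-04-01', 70.0),  (2, '2020-04-01', 30.0);
""")

# Dimensions (CITY, MONTH) go in GROUP BY; the measure (REVENUE) is aggregated.
rows = conn.execute("""
    SELECT s.CITY, d.MONTH, SUM(f.REVENUE)
    FROM F_SALES f
    JOIN DIM_SHOP s ON f.SHOP_ID = s.SHOP_ID
    JOIN DIM_DATE d ON f.DATE = d.DATE
    GROUP BY s.CITY, d.MONTH
    ORDER BY s.CITY, d.MONTH
""").fetchall()

for row in rows:
    print(row)
```

Each output row corresponds to one cell of a two-dimensional cube: the city and month identify the cell, and the summed revenue is the value stored in it.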
Dimensions and Measures in OLAP Cubes
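D (dimension): month (April, May, June)
D (dimension): city (New York, Los Angeles, San Francisco)
D (dimension): product (Beer, Milk, Juice)
M (measure): units sold (the cell values)
                April   May   June
New York          120    80     60
Los Angeles        50   130     90
San Francisco      70    50    100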
Q: How many beers were sold in Los Angeles in June?
OLAP Operations
Roll Up (month → quarter)
Products: Beer, Milk, Juice
                April   May   June      Q2
New York          120    80     60     260
Los Angeles        50   130     90     270
San Francisco      70    50    100     220
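The roll-up on this slide is an aggregation from the month level to the quarter level. A minimal Python sketch, using the slide's numbers and assuming rows are cities and columns are months (the helper name `roll_up` and the `QUARTER_OF` mapping are illustrative):

```python
MONTHS = ["April", "May", "June"]

# Monthly values from the slide, one row per city.
monthly = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}

# The time hierarchy: each month belongs to a quarter.
QUARTER_OF = {"April": "Q2", "May": "Q2", "June": "Q2"}


def roll_up(monthly_sales):
    """Aggregate month-level values up to (city, quarter)."""
    quarterly = {}
    for city, row in monthly_sales.items():
        for month, value in zip(MONTHS, row):
            key = (city, QUARTER_OF[month])
            quarterly[key] = quarterly.get(key, 0) + value
    return quarterly


q2 = roll_up(monthly)
for (city, quarter), total in q2.items():
    print(city, quarter, total)
```

The totals match the Q2 figures on the slide: 260 for New York, 270 for Los Angeles, and 220 for San Francisco.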
OLAP Operations
Drill Down (month → week)
Products: Beer, Milk, Juice
Monthly values:
                April   May   June
New York          120    80     60
Los Angeles        50   130     90
San Francisco      70    50    100
Weekly values (Week13, Week14, Week15, …, Week22, Week23, Week24; four weekly values per month):
                April          May            June
New York        20 40 30 30    20 30 20 10    15 10 25 10
Los Angeles     10 15 15 10    50 25 25 30    35  5 30 20
San Francisco   25 15 20 10    15 10 10 15    25 25 25 25
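Drill-down is the inverse of roll-up: a monthly cell is expanded into its weekly values. The sketch below reads the slide's digits as twelve weekly values per city (Week13 to Week24), grouped four per month. That grouping is an assumption, but it is supported by the weekly values summing to the monthly figures. The helper name `drill_down` is illustrative.

```python
MONTHS = ["April", "May", "June"]
WEEKS_PER_MONTH = 4  # assumption: four weekly values per month on the slide

# Weekly values per city, in week order (Week13 first).
weekly = {
    "New York":      [20, 40, 30, 30, 20, 30, 20, 10, 15, 10, 25, 10],
    "Los Angeles":   [10, 15, 15, 10, 50, 25, 25, 30, 35, 5, 30, 20],
    "San Francisco": [25, 15, 20, 10, 15, 10, 10, 15, 25, 25, 25, 25],
}


def drill_down(city, month):
    """Return the weekly values behind one (city, month) cell."""
    start = MONTHS.index(month) * WEEKS_PER_MONTH
    return weekly[city][start:start + WEEKS_PER_MONTH]


june_la = drill_down("Los Angeles", "June")
print(june_la, sum(june_la))  # [35, 5, 30, 20] 90
```

Summing the weekly values returns the monthly cell (90 for Los Angeles in June), so the two levels of the time hierarchy stay consistent.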
Traditional OLAP Tools
Challenges in the Big Data Era
Traditional OLAP Tools Are Great but…
• Difficult to handle massive data volumes
• Cube size limited by a single machine
• Have to maintain lots of cubes
• Hard to scale
• Takes a long time to build cubes
• Number of dimensions is limited
Modern OLAP
Cubes in a single machine
Cubes distributed in cluster
One logical cube
Processed by distributed framework
Journey of Apache Kylin
• Sept 2013: Project initiated
• Oct 2014: Officially open source
• Nov 2014: Apache Incubator project
• Sept 2015: InfoWorld Best Open Source Big Data Tool Award
• Nov 2015: Apache Top-Level Project
• Mar 2016: Kyligence Inc. founded
Apache Kylin
Extreme OLAP Engine for Big Data
High performance at massive scale
More than 900 billion rows of data, with 99% of queries under 1.3 seconds
(reported by Meituan.com, the #1 O2O company in China)
ANSI-SQL
SQL on Hadoop; supports ANSI SQL through JDBC, ODBC, and RESTful API
Hadoop Native
Compatible with Hadoop ecosystem, fully scalable architecture
MOLAP Cube
Multidimensional model for billions of rows of data
Apache Kylin Architecture
BI Tools, Web App…
ANSI SQL
OLAP Cube
Performance Benchmark
Apache Kylin Users
1,000+ Global Users
Demo
4 Steps to Build Your First Apache Kylin Cube
1. Connect to Data Source
2. Create Model and Cube
3. Build Cube
4. Go and Query
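For step 4, a query can also be sent over Kylin's REST interface instead of the web UI. The sketch below only builds the HTTP request and does not send it. The endpoint path `/kylin/api/query`, port 7070, the `ADMIN`/`KYLIN` credentials, and the `learn_kylin` project with its `kylin_sales` table are assumptions based on Kylin's sample setup. Verify them against the documentation for your Kylin version before use. `build_kylin_query` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_kylin_query(host, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for a SQL query."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"sql": sql, "project": project, "offset": 0, "limit": limit})
    return urllib.request.Request(
        url=f"http://{host}/kylin/api/query",  # assumed endpoint path
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_kylin_query(
    host="localhost:7070",   # assumed default host and port
    project="learn_kylin",   # assumed sample project name
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",            # assumed default credentials
    password="KYLIN",
)
print(request.get_method(), request.full_url)
# To run against a live server: urllib.request.urlopen(request)
```

Because the query is plain SQL, the same statement could be issued through the JDBC or ODBC drivers mentioned earlier in the deck.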
Roadmap
• Fully on Spark
• New Parquet storage (replacing HBase)
• Dockerize
• Kubernetes integration
• Cloud ready
• From OLAP to data warehouse
Visit http://kylin.apache.org/ for more information
Join the Community
https://github.com/apache/kylin
apache-kylin.slack.com
user@kylin.apache.org
THANK YOU

More Related Content

What's hot

Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and Kafka
Edelweiss Kammermann
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Kai Wähner
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
Josh Baer
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
Masayuki Tanaka
 
K means
K meansK means
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
inovex GmbH
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
Todd Palino
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Oracle API Gateway
Oracle API GatewayOracle API Gateway
Oracle API Gateway
Rakesh Gujjarlapudi
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
DataWorks Summit
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
DataWorks Summit
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
HostedbyConfluent
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
DataWorks Summit
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
busbey
 

What's hot (20)

Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and Kafka
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
 
Kmeans
KmeansKmeans
Kmeans
 
Knn
KnnKnn
Knn
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
 
K means
K meansK means
K means
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Oracle API Gateway
Oracle API GatewayOracle API Gateway
Oracle API Gateway
 
Transactional SQL in Apache Hive
Transactional SQL in Apache HiveTransactional SQL in Apache Hive
Transactional SQL in Apache Hive
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 

Similar to Apache Kylin 101

Open Source Technologies in the Analytics Revolution
Open Source Technologies in the Analytics RevolutionOpen Source Technologies in the Analytics Revolution
Open Source Technologies in the Analytics Revolution
SamanthaBerlant
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
Luke Han
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
Luke Han
 
Take the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTake the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented Analytics
Tyler Wishnoff
 
Simplify Data Analytics Over the Cloud
Simplify Data Analytics Over the CloudSimplify Data Analytics Over the Cloud
Simplify Data Analytics Over the Cloud
Tyler Wishnoff
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
Luke Han
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Tyler Wishnoff
 
Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
Luke Han
 
Augmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big DataAugmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big Data
Tyler Wishnoff
 
Augmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsAugmented OLAP for Big Data Analytics
Augmented OLAP for Big Data Analytics
Tyler Wishnoff
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
Databricks
 
Apache kylin boost your sqls on extremely large dataset
Apache kylin boost your sqls on extremely large datasetApache kylin boost your sqls on extremely large dataset
Apache kylin boost your sqls on extremely large dataset
ssuser931288
 
Apache kylin boost your SQLs on extremely large dataset
Apache kylin boost your SQLs on extremely large datasetApache kylin boost your SQLs on extremely large dataset
Apache kylin boost your SQLs on extremely large dataset
Chun'en Ni
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
IBMInfoSphereUGFR
 
Batched To Perfection: Modeling & Solving Business Problems With Apache Spark
Batched To Perfection: Modeling & Solving Business Problems  With Apache SparkBatched To Perfection: Modeling & Solving Business Problems  With Apache Spark
Batched To Perfection: Modeling & Solving Business Problems With Apache Spark
Eliav Lavi
 
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
Tyler Wishnoff
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
Luke Han
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Data Con LA
 
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Romeo Kienzler
 

Similar to Apache Kylin 101 (20)

Open Source Technologies in the Analytics Revolution
Open Source Technologies in the Analytics RevolutionOpen Source Technologies in the Analytics Revolution
Open Source Technologies in the Analytics Revolution
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Take the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTake the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented Analytics
 
Simplify Data Analytics Over the Cloud
Simplify Data Analytics Over the CloudSimplify Data Analytics Over the Cloud
Simplify Data Analytics Over the Cloud
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
 
Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
 
Augmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big DataAugmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big Data
 
Augmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsAugmented OLAP for Big Data Analytics
Augmented OLAP for Big Data Analytics
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
Apache kylin boost your sqls on extremely large dataset
Apache kylin boost your sqls on extremely large datasetApache kylin boost your sqls on extremely large dataset
Apache kylin boost your sqls on extremely large dataset
 
Apache kylin boost your SQLs on extremely large dataset
Apache kylin boost your SQLs on extremely large datasetApache kylin boost your SQLs on extremely large dataset
Apache kylin boost your SQLs on extremely large dataset
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
 
Batched To Perfection: Modeling & Solving Business Problems With Apache Spark
Batched To Perfection: Modeling & Solving Business Problems  With Apache SparkBatched To Perfection: Modeling & Solving Business Problems  With Apache Spark
Batched To Perfection: Modeling & Solving Business Problems With Apache Spark
 
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of...
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
 
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
 

More from SamanthaBerlant

Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and IndexingKyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
SamanthaBerlant
 
Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and Snowflake
SamanthaBerlant
 
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented EngineKyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
SamanthaBerlant
 
Precomputation or Data Virtualization, which one is right for you?
Precomputation or Data Virtualization, which one is right for you?Precomputation or Data Virtualization, which one is right for you?
Precomputation or Data Virtualization, which one is right for you?
SamanthaBerlant
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
SamanthaBerlant
 
Kyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An OverviewKyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An Overview
SamanthaBerlant
 
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
SamanthaBerlant
 
Addressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analyticsAddressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analytics
SamanthaBerlant
 
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache KylinSF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SamanthaBerlant
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
SamanthaBerlant
 
Enhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic LayerEnhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic Layer
SamanthaBerlant
 
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
SamanthaBerlant
 

More from SamanthaBerlant (12)

Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and IndexingKyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing
 
Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and Snowflake
 
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented EngineKyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
 
Precomputation or Data Virtualization, which one is right for you?
Precomputation or Data Virtualization, which one is right for you?Precomputation or Data Virtualization, which one is right for you?
Precomputation or Data Virtualization, which one is right for you?
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
Kyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An OverviewKyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An Overview
 
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
 
Addressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analyticsAddressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analytics
 
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache KylinSF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
 
Enhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic LayerEnhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic Layer
 
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
 

Recently uploaded

Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES

Apache Kylin 101

  • 1. Apache Kylin 101. Kaige Liu, Senior Solution Architect, Kyligence; Apache Kylin Committer. 2020.4
  • 2. © Kyligence Inc. 2019, Confidential. Agenda: • OLAP Overview • Apache Kylin Introduction • Apache Kylin Demo • Q&A
  • 3. Questions OLAP Can Help Us Answer: What are our top 5 best-selling products in each state/city? Which products should be put together? Do you have enough toilet paper prepared for coronavirus? Who owns this supermarket? (Figure labels: Boss Theodore Analyst)
  • 4. Why OLAP? (Online Analytical Processing) Good at: • Designed for analysis (BI reporting, data discovery, etc.) • Quick insight • Multidimensional data model • Complex business calculations. Not good at: • Frequent updates/deletes • Transactional data
  • 5. OLAP Cubes. Beer sales by city (April / May / June): New York 120 / 80 / 60; Los Angeles 50 / 130 / 90; San Francisco 70 / 50 / 100. Other product layers: Milk, Juice. Q: How many beers were sold in Los Angeles in June? A: 90
  • 6. From Tables to OLAP Cubes. Dimensions are the context that helps the consumer of measures understand the meaning of those measures. Measures contain numeric, quantitative values that you can measure. F_SALES: REVENUE, SALES AMOUNT, TAX, SUPPLY COST. DIM_DATE: DATE, YEAR, QUARTER, MONTH, WEEK. DIM_CUSTOMER: CUSTOMER_ID, NAME, EMAIL, CITY, ADDRESS. DIM_SHOP: SHOP_ID, CITY, STATUS.
  • 7. Dimensions and Measures in OLAP Cubes. The same cube (months April, May, June; cities New York, Los Angeles, San Francisco; products Beer, Milk, Juice), with the three axes labeled D (dimension) and the cell values labeled M (measure). Q: How many beers were sold in Los Angeles in June?
  • 8. OLAP Operations: Roll Up. The monthly values (April, May, June) roll up to Q2 totals: New York 260; Los Angeles 270; San Francisco 220.
  • 9. OLAP Operations: Drill Down. (Figure: the monthly values for April, May, and June drilled down into weekly values, Week13, Week14, Week15, … Week22, Week23, Week24.)
  • 10. Traditional OLAP Tools
  • 11. Challenges in the Big Data Era. Traditional OLAP Tools Are Great but… • Difficult to handle massive data volumes • Cube size limited by a single machine • Have to maintain lots of cubes • Hard to scale • Takes a long time to build cubes • Number of dimensions is limited
  • 12. Modern OLAP. Cubes in a single machine → Cubes distributed in cluster. One logical cube. Processed by distributed framework.
  • 13. Journey of Apache Kylin. Sept 2013: Project Initiated. Oct 2014: Officially Open Source. Nov 2014: Apache Incubator Project. Sept 2015: InfoWorld Best Open Source Big Data Tool Award. Nov 2015: Apache Top-Level Project. Mar 2016: Kyligence Inc. Founded.
  • 14. Apache Kylin: Extreme OLAP Engine for Big Data. High performance at massive scale: more than 900 billion rows of data, 99% of queries < 1.3 seconds (from Meituan.com, #1 O2O company in China). ANSI-SQL: SQL on Hadoop, supports ANSI SQL; JDBC/ODBC/RESTful API. Hadoop Native: compatible with Hadoop ecosystem, fully scalable architecture. MOLAP Cube: multidimensional model for billions of rows of data.
  • 15. Apache Kylin Architecture. (Figure labels: BI Tools, Web App…; ANSI SQL; OLAP Cube)
  • 16. Performance Benchmark
  • 17. Apache Kylin Users: 1,000+ Global Users
  • 18. Demo: 4 Steps to Build Your First Apache Kylin Cube. 1. Connect to Data Source 2. Create Model and Cube 3. Build Cube 4. Go and Query
  • 19. Roadmap: • Fully on Spark • New Parquet storage (replace HBase) • Dockerize • Kubernetes integration • Cloud ready • From OLAP to data warehouse. Visit http://kylin.apache.org/ for more information
  • 20. Join the Community: https://github.com/apache/kylin · apache-kylin.slack.com · user@kylin.apache.org
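
The cell lookup on slide 5 and the roll-up on slide 8 can be reproduced with a few lines of Python. This is only an illustrative sketch: it assumes the visible face of the cube holds beer sales with cities as rows and months as columns (which matches the slide's answer of 90 for Los Angeles in June), and the names `MONTHS`, `SALES`, `lookup`, and `roll_up_to_quarter` are made up for this example. It is not how Apache Kylin stores or aggregates cubes.

```python
# Beer sales from the "OLAP Cubes" slide (assumed layout):
# each city maps to its values for April, May, June.
MONTHS = ["April", "May", "June"]
SALES = {
    "New York": [120, 80, 60],
    "Los Angeles": [50, 130, 90],
    "San Francisco": [70, 50, 100],
}


def lookup(city, month):
    """Return the measure stored in the cell addressed by (city, month)."""
    return SALES[city][MONTHS.index(month)]


def roll_up_to_quarter(sales):
    """Roll up the month dimension: sum the three months into one Q2 value per city."""
    return {city: sum(values) for city, values in sales.items()}


if __name__ == "__main__":
    print(lookup("Los Angeles", "June"))
    print(roll_up_to_quarter(SALES))
```

Drill down is the reverse direction: each monthly cell would be split into finer-grained weekly cells whose values sum back to the monthly figure.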

Editor's Notes

  1. Add trans to page 4
  2. Mention HBase will be removed in next release
  3. Mention blog in website