SlideShare a Scribd company logo
Cloud-native Semantic Layer on
Data Lake
Dong Li
PMC @ Apache Kylin, Founding Member @ Kyligence
About Me
Dong Li is a Founding Member and Head of Product and Innovation
at Kyligence, an Apache Kylin Core Developer (Committer) and member of
the Project Management Committee (PMC) where he focuses on big data
technology development.
Previously, he was a Senior Engineer in eBay’s Global Analytics
Infrastructure Department, a Software Development Engineer for
Microsoft Cloud Computing and Enterprise Products.
Agenda
▪ Start from a real challenge
▪ The Solution
▪ Q&A
Customer Background
• A fast-growing SaaS company in US
• 1800 customers in 40+ countries
• 1/3 Fortune 500 use
• 8 Billion transactions per year
• Dashboards for end users
Landscape & Challenges
• Source Data in AWS RDS
• Materialized views used for dashboards
• Slow queries cost 5+ seconds
• 4+ hours to refresh materialized views every day
• Bottleneck at ~10 concurrent users
• Couldn’t provide flexible dashboards
• Number of views keeps increasing
OLTP
(RDS)
OLAP
(RDS)
Materialized
View
Dashboard
Export ETL SQL
Expectation for the future data platform
• Flexible dashboards for end users
• High performance (< 2s), high concurrency (> 100 users)
• Easy to scale
• Low data preparation latency (< 1 hour)
• Flexible for new requirements
• Enterprise-grade security: data recovery, row/column level access etc.
• Totally on AWS
• Low TCO
• Open Platform for Machine learning, Internal Analytics etc.
Agenda
▪ Start from a real challenge
▪ The Solution
▪ Q&A
Apache Kylin: Open Source Distributed Analytical Data Warehouse
Apache Kylin: Managing Your Most Valuable Data
• OLAP Data Modelling
• Speed Up Analytics Using Pre-Calculation
• ANSI SQL Interface
• High Concurrency and High Performance
• Batch & Streaming Together
Presentation
Visualization
Big Data
Platform
Data
Source
Data Mart
Hive Impala Spark SQL Kafka
MapReduce …Spark
Apache Kylin Community & Adoptions
1000+ Global Adoptions
Leading Open Source OLAP
Github Stars
JIRA Issues
Star Schema Benchmark
Star schema benchmark:
http://www.cs.umb.edu/~poneil/StarSchemaB.PDF
0
2
4
6
8
10
12
1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3 3.4 4.1 4.2 4.3
Latency(s)
SSB Queries
条SQL响
Kylin SQL on Hadoop
SQL Latency
Lower is better
0
10
20
30
40
50
60
70
80
90
0 10 20 30 40 50
Latency(s)
Data Scale
不同数据量性能 化
Kylin SQL on Hadoop
Data Volume Scale
Lower is better
Interactive Analysis with BI for Petabyte-Scale Datasets
select
l_returnflag,
o_orderstatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price
from
v_lineitem
inner join v_orders on l_orderkey = o_orderkey
where
l_shipdate <= '1998-09-16'
group by
l_returnflag,
o_orderstatus
order by
l_returnflag,
o_orderstatus;
Sort
Aggr
Filter
TablesO(N)
Join
Parse SQL to
an execution
plan
How Does Kylin Accelerate Queries?
• Kylin uses Apache Calcite as the SQL parser and optimizer
How Does Kylin Accelerate Queries?
• Kylin optimizes and adapts the plan to an OLAP cube.
• With less processing, Kylin can return the result instantly.
Aggr
Filter
Tables
Join
Sort
Sort
Cube
Filter
Pick the best
matched cube
Rewrite toThese steps have
already been completed
in the cube build.
O(1)
Apache Kylin
BI Tools Apps Machine Learning
SQL
Runtime Workload
Offload Workload
Scan & filter
Extract
Load
Architecture
The architecture of Apache Kylin v4.0.0-alpha
Use Case: Online Shopping Reporting
The most visited website in Japan
https://techblog.yahoo.co.jp/oss/apache-kylin/
§ Our reporting system used Impala as a backend
database previously.
- It took a long time (about 60 sec) to show Web
UI.
§ In order to lower the latency, we moved to Apache
Kylin.
- Average latency < 1sec for most cases
Thanks to low latency with Kylin, we become possible to focus on
adding functions for users.
§ We provide a reporting system that show
statistics for store owners.
- e. g. impressions, clicks and sales.
Apache Kylin 4.0 Roadmap: Cloud Native
Data analytics
Apache Kylin
Container Service (K8S, Docker)
Interactive Reporting Dashboard
OLAP / Data mart
Resource
Orchestration
Data Lake Source file, Streams, Parquet on Object Storage (S3, ADSL)
Metadata
Security
• Less Dependency, More
Lightweight
• Automated Scaling
• Less Computing and
Storage Cost
• Automated DevOps
Data is next Oil
The world’s most valuable resource is no
longer oil, but data. —“The Economist”
China’s Datasphere is expected to grow 30% on average over the next 7
years and will be the largest Datasphere of all regions by 2025 --IDC
175 Zettabytes By 2025 -- IDC
and, Data is moving to Cloud
But, the Chaos Happens Again!!!
What is missing here?
? ? ?
Reporting Dashboard Ad-Hoc Data-as-a-Services Machine Learning
EDW Datasets Data Lake Datasets Cloud Datasets
Data Lake
Application
SQL / MDX
Data Analysts Marketing User Operation Analysts
Unified Semantic Layer
Govern
Data Platform
Reporting Dashboard Ad-Hoc Data-as-a-Services Machine Learning
Managed Datasets
Managed KPIs
CUSTOMER
CUSTOMER NUMBER
CUSTOMER NAME
CUSTOMER CITY
CUSTOMER POST
CUSTOMER ST
CUSTOMER ADDR
CUSTOMER PHONE
CUSTOMER FAX
ORDER
ORDER NUMBER
ORDER DATE
STATUS
ORDER ITEM BACKORDERED
QUANTITY
ITEM
ITEM NUMBER
QUANTITY
DESCRIPTION
ORDER ITEM SHIPPED
QUANTITY
SHIP DATE
Finance KPI ERP KPI
Accounting KPI ……
Marketing KPI
Sales KPI
EDW Datasets Data Lake Datasets Cloud Datasets
Data Lake
Application
SQL / MDX
One-stop Governed Platform
• Data as a service
• Single source of truth
• Managed golden data
Intelligent Data Platform
• Machine Learning recommendation
from SQL history
• Optimizaed for PB data at scale
• High performance and High
Concurrency
Analysts Delighted Platform
• Supports most favorite BI tools
• Support standard SQL/MDX
• Reduce engineering efforts
Data Analysts Marketing User Operation Analysts
Intelligent Cubing
managed data at scale
Unified Data-as-a-Service Layer
Unified Semantic Layer
Use Case: Less Cubes for More
2 Cubes vs 1200 IBM Cognos Cubes
Challenge
• 1200+ existing cubes to manage
• 1000+ jobs to maintain
• Time-to-insight over 4 days
Job
Job
Job
Job
Job
Job
Job
…
…
Job
Job
Job
1200+ IBM Cognos Cubes, 1000+ ETL Jobs
with dependencies
Merchant Topic
Daily Cube
Region Topic
Daily Cube
Merchant Topic
Monthly Cube
2 Cubes, 1 ETL Job
┄ ┄
Agency Topic
Daily Cube
Agency Topic
Monthly Cube
Channel Topic
Monthly Cube
Region Topic
Monthly Cube
Merchant Topic
Shanghai Cube
Merchant Topic
Beijing Cube
Merchant Topic
Zhejiang Cube
Merchant Topic
Guangdong Cube
Sub
scenarios
Channel Topic
Daily Cube
Sub
scenarios
Sub
scenarios
Solution
• Using Kylin replaced IBM Cognos backend but
continue keep Cognos Reporting to interactive
with Kyligence
Result
• 1000x improved maintenance efficiency
• 10x faster and more stable analytics performance
• Time-to-Insight less then 4 hours
Cloud-native Semantic Layer on Data Lake
FinanceMarketingSales
Index
More…
Data Lake Aggregation & Index ApplicationsSource
Azure Blob Storage
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

More Related Content

What's hot

[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at Nubank
Databricks
 
Data mesh
Data meshData mesh
Data mesh
ManojKumarR41
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
elephantscale
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Sergey Karayev
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Yohei Onishi
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 

What's hot (20)

[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Data Democratization at Nubank
 Data Democratization at Nubank Data Democratization at Nubank
Data Democratization at Nubank
 
Data mesh
Data meshData mesh
Data mesh
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 

Similar to Cloud-native Semantic Layer on Data Lake

Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
Luke Han
 
Develop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBayDevelop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBay
Amazon Web Services
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
Tyler Wishnoff
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Qubole
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
DATAVERSITY
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
Luke Han
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
Amazon Web Services
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
SamanthaBerlant
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
Luke Han
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
Stratebi
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
Software Guru
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
Torsten Steinbach
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
martinbpeters
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
Amazon Web Services
 

Similar to Cloud-native Semantic Layer on Data Lake (20)

Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Develop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBayDevelop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBay
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
Data Warehouse Modernization - Big Data in the Cloud Success with Qubole on O...
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 

Cloud-native Semantic Layer on Data Lake

  • 1. Cloud-native Semantic Layer on Data Lake Dong Li PMC @ Apache Kylin, Founding Member @ Kyligence
  • 2. About Me Dong Li is a Founding Member and Head of Product and Innovation at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data technology development. Previously, he was a Senior Engineer in eBay’s Global Analytics Infrastructure Department, a Software Development Engineer for Microsoft Cloud Computing and Enterprise Products.
  • 3. Agenda ▪ Start from a real challenge ▪ The Solution ▪ Q&A
  • 4. Customer Background • A fast-growing SaaS company in US • 1800 customers in 40+ countries • 1/3 Fortune 500 use • 8 Billion transactions per year • Dashboards for end users
  • 5. Landscape & Challenges • Source Data in AWS RDS • Materialized views used for dashboards • Slow queries cost 5+ seconds • 4+ hours to refresh materialized views every day • Bottleneck at ~10 concurrent users • Couldn’t provide flexible dashboards • Number of views keeps increasing OLTP (RDS) OLAP (RDS) Materialized View Dashboard Export ETL SQL
  • 6. Expectation for the future data platform • Flexible dashboards for end users • High performance (< 2s), high concurrency (> 100 users) • Easy to scale • Low data preparation latency (< 1 hour) • Flexible for new requirements • Enterprise-grade security: data recovery, row/column level access etc. • Totally on AWS • Low TCO • Open Platform for Machine learning, Internal Analytics etc.
  • 7. Agenda ▪ Start from a real challenge ▪ The Solution ▪ Q&A
  • 8. Apache Kylin: Open Source Distributed Analytical Data Warehouse
  • 9. Apache Kylin: Managing Your Most Valuable Data • OLAP Data Modelling • Speed Up Analytics Using Pre-Calculation • ANSI SQL Interface • High Concurrency and High Performance • Batch & Streaming Together Presentation Visualization Big Data Platform Data Source Data Mart Hive Impala Spark SQL Kafka MapReduce …Spark
  • 10. Apache Kylin Community & Adoptions 1000+ Global Adoptions Leading Open Source OLAP Github Stars JIRA Issues
  • 11. Star Schema Benchmark Star schema benchmark: http://www.cs.umb.edu/~poneil/StarSchemaB.PDF 0 2 4 6 8 10 12 1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3 3.4 4.1 4.2 4.3 Latency(s) SSB Queries 条SQL响 Kylin SQL on Hadoop SQL Latency Lower is better 0 10 20 30 40 50 60 70 80 90 0 10 20 30 40 50 Latency(s) Data Scale 不同数据量性能 化 Kylin SQL on Hadoop Data Volume Scale Lower is better
  • 12. Interactive Analysis with BI for Petabyte-Scale Datasets
  • 13. select l_returnflag, o_orderstatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price from v_lineitem inner join v_orders on l_orderkey = o_orderkey where l_shipdate <= '1998-09-16' group by l_returnflag, o_orderstatus order by l_returnflag, o_orderstatus; Sort Aggr Filter TablesO(N) Join Parse SQL to an execution plan How Does Kylin Accelerate Queries? • Kylin uses Apache Calcite as the SQL parser and optimizer
  • 14. How Does Kylin Accelerate Queries? • Kylin optimizes and adapts the plan to an OLAP cube. • With less processing, Kylin can return the result instantly. Aggr Filter Tables Join Sort Sort Cube Filter Pick the best matched cube Rewrite toThese steps have already been completed in the cube build. O(1)
  • 15. Apache Kylin BI Tools Apps Machine Learning SQL Runtime Workload Offload Workload Scan & filter Extract Load Architecture The architecture of Apache Kylin v4.0.0-alpha
  • 16. Use Case: Online Shopping Reporting The most visited website in Japan https://techblog.yahoo.co.jp/oss/apache-kylin/ § Our reporting system used Impala as a backend database previously. - It took a long time (about 60 sec) to show Web UI. § In order to lower the latency, we moved to Apache Kylin. - Average latency < 1sec for most cases Thanks to low latency with Kylin, we become possible to focus on adding functions for users. § We provide a reporting system that show statistics for store owners. - e. g. impressions, clicks and sales.
  • 17. Apache Kylin 4.0 Roadmap: Cloud Native Data analytics Apache Kylin Container Service (K8S, Docker) Interactive Reporting Dashboard OLAP / Data mart Resource Orchestration Data Lake Source file, Streams, Parquet on Object Storage (S3, ADSL) Metadata Security • Less Dependency, More Lightweight • Automated Scaling • Less Computing and Storage Cost • Automated DevOps
  • 18. Data is next Oil The world’s most valuable resource is no longer oil, but data. —“The Economist” China’s Datasphere is expected to grow 30% on average over the next 7 years and will be the largest Datasphere of all regions by 2025 --IDC 175 Zettabytes By 2025 -- IDC
  • 19. and, Data is moving to Cloud
  • 20. But, the Chaos Happens Again!!!
  • 21. What is missing here? ? ? ? Reporting Dashboard Ad-Hoc Data-as-a-Services Machine Learning EDW Datasets Data Lake Datasets Cloud Datasets Data Lake Application SQL / MDX Data Analysts Marketing User Operation Analysts
  • 22. Unified Semantic Layer Govern Data Platform Reporting Dashboard Ad-Hoc Data-as-a-Services Machine Learning Managed Datasets Managed KPIs CUSTOMER CUSTOMER NUMBER CUSTOMER NAME CUSTOMER CITY CUSTOMER POST CUSTOMER ST CUSTOMER ADDR CUSTOMER PHONE CUSTOMER FAX ORDER ORDER NUMBER ORDER DATE STATUS ORDER ITEM BACKORDERED QUANTITY ITEM ITEM NUMBER QUANTITY DESCRIPTION ORDER ITEM SHIPPED QUANTITY SHIP DATE Finance KPI ERP KPI Accounting KPI …… Marketing KPI Sales KPI EDW Datasets Data Lake Datasets Cloud Datasets Data Lake Application SQL / MDX One-stop Governed Platform • Data as a service • Single source of truth • Managed golden data Intelligent Data Platform • Machine Learning recommendation from SQL history • Optimizaed for PB data at scale • High performance and High Concurrency Analysts Delighted Platform • Supports most favorite BI tools • Support standard SQL/MDX • Reduce engineering efforts Data Analysts Marketing User Operation Analysts Intelligent Cubing managed data at scale
  • 24. Use Case: Less Cubes for More 2 Cubes vs 1200 IBM Cognos Cubes Challenge • 1200+ existing cubes to manage • 1000+ jobs to maintain • Time-to-insight over 4 days Job Job Job Job Job Job Job … … Job Job Job 1200+ IBM Cognos Cubes, 1000+ ETL Jobs with dependencies Merchant Topic Daily Cube Region Topic Daily Cube Merchant Topic Monthly Cube 2 Cubes, 1 ETL Job ┄ ┄ Agency Topic Daily Cube Agency Topic Monthly Cube Channel Topic Monthly Cube Region Topic Monthly Cube Merchant Topic Shanghai Cube Merchant Topic Beijing Cube Merchant Topic Zhejiang Cube Merchant Topic Guangdong Cube Sub scenarios Channel Topic Daily Cube Sub scenarios Sub scenarios Solution • Using Kylin replaced IBM Cognos backend but continue keep Cognos Reporting to interactive with Kyligence Result • 1000x improved maintenance efficiency • 10x faster and more stable analytics performance • Time-to-Insight less then 4 hours
  • 25. Cloud-native Semantic Layer on Data Lake FinanceMarketingSales Index More… Data Lake Aggregation & Index ApplicationsSource Azure Blob Storage
  • 26. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.