SlideShare a Scribd company logo
Augmented OLAP
for Big Data
Luke Han | luke.han@Kyligence.io
Co-founder & CEO of Kyligence
Apache Kylin PMC Chair
Microsoft Reginal Director & MVP
Strata Global Sponsor
BOOTH #410
© Kyligence Inc. 2019.
About Luke Han
• Luke Han
• Co-founder & CEO at Kyligence
• Co-creator and PMC Chair of Apache Kylin
• Apache Software Foundation Member
• Microsoft Regional Director & MVP
• Former eBay Big Data Product Manager Lead
© Kyligence Inc. 2019.
About Apache Kylin
• Leading Open Source OLAP for Big Data
• Rank 1 from googling “big data OLAP”
• Rank 1 from googling “hadoop OLAP”
• Open sourced by eBay in 2014
• Graduated to Apache Top Project in 2015
• 1000+ Adoptions world wild
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Video Demo
• Benchmark
• Use Cases
© Kyligence Inc. 2019.
Kyligence = Kylin + Intelligence
• Founded in 2016 by the original creators of Apache Kylin
• CRN Top 10 Big Data Startups 2018
• Backing by leading VCs:
• Redpoint Ventures
• Cisco
• CBC Capital
• Shunwei Capital
• Eight Roads Ventures (Fidelity International Arm)
• Coatue
• Global Offices:
• Shanghai
• Beijing
• Shenzhen
• San Jose
• New York
• Seattle
• …
© Kyligence Inc. 2019.
Telecom
Finance
Manufacturing
Trusted by Global Leaders
Retail &
Others
Most of them are Global Fortune 500
© Kyligence Inc. 2019.
Global Partners
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019.
Let’s talk about photography story…
https://technave.com/data/files/mall/article/201812271418327393.jpg
© Kyligence Inc. 2019.
Let’s talk about photography story…
How many people
really know how to
setup those?
© Kyligence Inc. 2019.
Let’s talk about photography story…
https://technave.com/data/files/mall/article/201812271437395703.jpg
© Kyligence Inc. 2019.
Let’s talk about photography story…
Google Photos
© Kyligence Inc. 2019.
Let’s talk about photography story…
How do you manage
your 100,000+ photos?
Google Photos
© Kyligence Inc. 2019, Confidential.
Then…
how about your enterprise
data?
© Kyligence Inc. 2019, Confidential.
https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1
https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
© Kyligence Inc. 2019, Confidential.
Fast and Changing
Analysis Demand
Slow and Heavy
Big Data Operations
vs
© Kyligence Inc. 2019.
The Typical “Throw in some People” Approach
Business Users Analysts Data Engineers
Business Analysis Data Modeling Lake → Warehouse → Mart Reporting
$$$High Cost :
Administrators
Slow Time-to-Insight:
© Kyligence Inc. 2019, Confidential.
Presentation
Visualization
Impala
Data Lake
Hive Spark SQL Drill
MapReduce Spark …….
Time-to-value Pain
Weeks of waiting breaks the
“online” promise.
Collaboration Pain
Hard to reuse asset across teams.
Each team fights their own path.
Resource Pain
Hard to scale. Where to find so
many skilled big data engineers?
Pains in the “Throw in some People” Approach
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019, Confidential.
Throw in some Intelligence!
Let a system replace the people.
o Transparent SQL Acceleration
o On-demand Data Preparation
o Interactive Query Performance
o High Concurrency
o Centralized Semantic Layer
Faster time to market. Stay “online”.
Augmented OLAP
Data Mart
Presentation
Visualization
Impala
Data Lake
Hive Drill
MapReduce Spark …….
Spark SQL
Semantic Automation Acceleration Governance
© Kyligence Inc. 2019.
A Learning OLAP System
Business User Insights
Business User Analyst Data Engineer
vs
© Kyligence Inc. 2019.
A Learning OLAP System
Business User Insights
Pattern Detection
Auto Modeling
Data Preparing
Raw Data
Prepared Data
Augmented
OLAP Engine
(Background Learning)
© Kyligence Inc. 2019.
Demo Setup
Tableau
SparkSQL 2.4
Kyligence
Enterprise
Analyze 1 billion rows of
sales records (TPC-H)
Business User
© Kyligence Inc. 2019.
(Embed the Demo Video)
© Kyligence Inc. 2019.
Demo FAQ
Business User
Analyst
reuse
How to improve the first slow
exploration?
What if the analyst operates differently
the second time?
More comprehensive performance
benchmark?
Prepared Data
© Kyligence Inc. 2019.
TPC-H Decision Support Benchmark
TPC-H Benchmark
• Examine large volumes of data
• High complexity queries
• Answers critical business questions
• 22 decision making queries
E.g. The Shipping Priority Query
retrieves the shipping priority and potential
revenue of the orders having the largest revenue
among those that had not been shipped as of a
given date. Top 10 orders are listed in
decreasing order of revenue.
© Kyligence Inc. 2019.
Kyligence Enterprise 4 Beta vs SparkSQL 2.4
To see the trend as data grows
• 3 datasets
• Scale Factor = 20, 35, 50
• TPCH_SF1: Consists of the base row size (several million
elements).
• TPCH_SF20: Consists of the base row size x 20.
• TPCH_SF35: Consists of the base row size x 35.
• TPCH_SF50: Consists of the base row size x 50 (several
hundred million elements).
Billion
© Kyligence Inc. 2019.
Hardware Configurations
Same 4 physical nodes
- Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz * 2
- Totally 86 vCores, 188 GB mem
Same Spark configuration for both KE 4 Beta and SparkSQL 2.4
- spark.driver.memory=16g
- spark.executor.memory=8g
- spark.yarn.executor.memoryOverhead=2g
- spark.yarn.am.memory=1024m
- spark.executor.cores=5
- spark.executor.instances=17
© Kyligence Inc. 2019.
Query Response Time | KE 4 Beta vs. SparkSQL 2.4
Milliseconds
TPC-H 22 queries
For each dataset
- Run each query 3 times
- Record the average time
- No warm up
Lower is better.
SF=50
© Kyligence Inc. 2019.
Total Response Time | KE 4 Beta vs. SparkSQL 2.4
Billion Seconds
Total response time is the sum
of 22 queries’ response time.
Compare over the size of
datasets and feel the trend.
Scale out for the future.
© Kyligence Inc. 2019.
Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4
Acceleration Rate
= SparkSQL time / KE time
Take average of the 22 and
compare over size of datasets.
© Kyligence Inc. 2019.
SQL
Query Log
Analytic Behavior
Data
Schema
Data
Profile
Machine Learning
Engine
Data Modeling
Automation
Kylin Cube
Learnt Index
Smart Pushdown
BI
Real-time
Analysis
Data-as-a-
Service
Local
Deployment
Cloud
Platform
Container Data Services
© Kyligence Inc. 2018, Confidential.
AI-Augmented Analytics Platform
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019.
Use Case: IBM Cognos Replacement
One Kyligence Cube for 800+ Cognos Cubes
Org. Daily
Cube
Merch. Daily
Cube
Channel Daily
Cube
Region Daily
Cube
Org. Monthly
Cube
Merch.
Monthly Cube
Channel
Monthly Cube
Region
Monthly Cube
Shanghai
Merchants
Zhejiang
Merchants
Anhui
Merchants
Guangdong
Merchants
Card Transaction
Dimensions: 167
Measures: 20
800+ Cognos Cube, 1000+ ETL jobs
Functional
Scene
Time
Scene
Geo
Scene
┄ ┄
Data: 300+ B Records
Merchants: 10+m
Cards: 10+B
© Kyligence Inc. 2019.
Use Case: Data as a Services Platform
In the past, due to the limitations of our previous
multi-dimensional analytic tool, we faced challenges
of constrained time range in queries……We are
considering leveraging multi-dimensional data
cubes to replace a number of fragmented legacy
tabular reports in more business units, so that we
can provide better analytic services to our business
users.”
-- Laments Wu Ying, VP of CMBs Development
Center,
Offline Platform
(Hadoop)
Online Platform
(Hadoop)
EDW
(Teradata)
Kyligence Data-as-a-Service Platform
Tenancy 1 Tenancy 2 Tenancy 3 Tenancy N……
Business Intelligence
Cognos Tableau MicroStrategy Superset API
Smart Routine
Intelligent
Modeling
Multi-tenancy
Applications
SecurityJob
MIP MDS CRM DWS
© Kyligence Inc. 2019, Confidential.
Take away: Augmented OLAP, the future for analytics
ImpalaHive Spark SQL Drill
MapReduce Spark …….
AI-Augmented
OLAP
ImpalaHive Drill
MapReduce Spark …….
Spark SQL
Semantic Automation Acceleration Governance
Thanks
luke.han@Kyligence.io |
@lukehq
Homepage: http://kyligence.io
Twitter: @kyligence
Booth: #410

More Related Content

What's hot

Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
VMware Tanzu
 
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
VMware Tanzu
 
Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Amazon Web Services
 
ThoughtWorks Approach 2009
ThoughtWorks Approach 2009ThoughtWorks Approach 2009
ThoughtWorks Approach 2009
ThoughtWorks Studios
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
Kai Wähner
 
Pivotal corporate story by CS Park
Pivotal corporate story by CS ParkPivotal corporate story by CS Park
Pivotal corporate story by CS Park
VMware Tanzu Korea
 
Using AI-powered Automation for High Performance Data Pipelines in the Cloud
Using AI-powered Automation for High Performance Data Pipelines in the CloudUsing AI-powered Automation for High Performance Data Pipelines in the Cloud
Using AI-powered Automation for High Performance Data Pipelines in the Cloud
DevOps.com
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Data Con LA
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from Pivotal
VMware Tanzu Korea
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 Minutes
Caserta
 
Well Architected Framework - Data
Well Architected Framework - Data Well Architected Framework - Data
Well Architected Framework - Data
Craig Milroy
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance
Talend
 
BLD() Tech Conference — Data exploration with KSQL
BLD() Tech Conference — Data exploration with KSQLBLD() Tech Conference — Data exploration with KSQL
BLD() Tech Conference — Data exploration with KSQL
Gillis J. de Nijs
 
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPAlan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
MIT Enterprise Forum Cambridge
 
Why Are Digital Disruptors Successful And How Can You Become One?
Why Are Digital Disruptors Successful And How Can You Become One? Why Are Digital Disruptors Successful And How Can You Become One?
Why Are Digital Disruptors Successful And How Can You Become One?
VMware Tanzu
 
Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at NetflixTableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Blake Irvine
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Matt Stubbs
 
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Sri Ambati
 
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
Amazon Web Services
 
Unleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine LearningUnleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine Learning
Talend
 

What's hot (20)

Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
Pivotal Digital Transformation Forum: Requirements to Become a Data-Driven En...
 
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Ana...
 
Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018Data Engineering the Startup Way - AWS Startup Day Chicago 2018
Data Engineering the Startup Way - AWS Startup Day Chicago 2018
 
ThoughtWorks Approach 2009
ThoughtWorks Approach 2009ThoughtWorks Approach 2009
ThoughtWorks Approach 2009
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
 
Pivotal corporate story by CS Park
Pivotal corporate story by CS ParkPivotal corporate story by CS Park
Pivotal corporate story by CS Park
 
Using AI-powered Automation for High Performance Data Pipelines in the Cloud
Using AI-powered Automation for High Performance Data Pipelines in the CloudUsing AI-powered Automation for High Performance Data Pipelines in the Cloud
Using AI-powered Automation for High Performance Data Pipelines in the Cloud
 
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
Big Data Day LA 2015 - Transforming into a data driven enterprise using exist...
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from Pivotal
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 Minutes
 
Well Architected Framework - Data
Well Architected Framework - Data Well Architected Framework - Data
Well Architected Framework - Data
 
Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance Embracing Cloud Agility to Maximize Flexibility & Performance
Embracing Cloud Agility to Maximize Flexibility & Performance
 
BLD() Tech Conference — Data exploration with KSQL
BLD() Tech Conference — Data exploration with KSQLBLD() Tech Conference — Data exploration with KSQL
BLD() Tech Conference — Data exploration with KSQL
 
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPAlan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
 
Why Are Digital Disruptors Successful And How Can You Become One?
Why Are Digital Disruptors Successful And How Can You Become One? Why Are Digital Disruptors Successful And How Can You Become One?
Why Are Digital Disruptors Successful And How Can You Become One?
 
Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at NetflixTableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
Meg Mude, Intel - Data Engineering Lifecycle Optimized on Intel - H2O World S...
 
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
AWS Summit Sydney 2014 | How Companies are Using Cloud-Based Data Visualizati...
 
Unleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine LearningUnleash the Power of Big Data and Machine Learning
Unleash the Power of Big Data and Machine Learning
 

Similar to Augmented OLAP Analytics for Big Data

Simplify Data Analytics Over the Cloud
Simplify Data Analytics Over the CloudSimplify Data Analytics Over the Cloud
Simplify Data Analytics Over the Cloud
Tyler Wishnoff
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
Luke Han
 
Kyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An OverviewKyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An Overview
SamanthaBerlant
 
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
NuoDB
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best Practices
Capgemini
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
Holden Ackerman
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization
Magnus Backman
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Turning Big Data into Better Business Outcomes
Turning Big Data into Better Business OutcomesTurning Big Data into Better Business Outcomes
Turning Big Data into Better Business Outcomes
Cisco Canada
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
SamanthaBerlant
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Matt Stubbs
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
tsigitnist02
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
SamanthaBerlant
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
SnapLogic
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
Attunity
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Inside Analysis
 
Drive Business Outcomes for Big Data Environments
Drive Business Outcomes for Big Data EnvironmentsDrive Business Outcomes for Big Data Environments
Drive Business Outcomes for Big Data Environments
Cisco Services
 
BIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in FinanceBIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in Finance
Skillspeed
 
Building a hybrid, dynamic cloud on an open architecture
Building a hybrid, dynamic cloud on an open architectureBuilding a hybrid, dynamic cloud on an open architecture
Building a hybrid, dynamic cloud on an open architecture
Daniel Krook
 

Similar to Augmented OLAP Analytics for Big Data (20)

Simplify Data Analytics Over the Cloud
Simplify Data Analytics Over the CloudSimplify Data Analytics Over the Cloud
Simplify Data Analytics Over the Cloud
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Kyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An OverviewKyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An Overview
 
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best Practices
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Turning Big Data into Better Business Outcomes
Turning Big Data into Better Business OutcomesTurning Big Data into Better Business Outcomes
Turning Big Data into Better Business Outcomes
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
Seeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing ForeverSeeing Redshift: How Amazon Changed Data Warehousing Forever
Seeing Redshift: How Amazon Changed Data Warehousing Forever
 
Drive Business Outcomes for Big Data Environments
Drive Business Outcomes for Big Data EnvironmentsDrive Business Outcomes for Big Data Environments
Drive Business Outcomes for Big Data Environments
 
BIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in FinanceBIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in Finance
 
Building a hybrid, dynamic cloud on an open architecture
Building a hybrid, dynamic cloud on an open architectureBuilding a hybrid, dynamic cloud on an open architecture
Building a hybrid, dynamic cloud on an open architecture
 

More from Tyler Wishnoff

Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Tyler Wishnoff
 
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
Tyler Wishnoff
 
Providing Interactive Analytics on Excel with Billions of Rows
Providing Interactive Analytics on Excel with Billions of RowsProviding Interactive Analytics on Excel with Billions of Rows
Providing Interactive Analytics on Excel with Billions of Rows
Tyler Wishnoff
 
Analysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
Analysis of the Pressure Placed on Medical Systems during the COVID-19 PandemicAnalysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
Analysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
Tyler Wishnoff
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX Group
Tyler Wishnoff
 
Apache Kylin Data Summit 2019: Kyligence Presentation
Apache Kylin Data Summit 2019: Kyligence PresentationApache Kylin Data Summit 2019: Kyligence Presentation
Apache Kylin Data Summit 2019: Kyligence Presentation
Tyler Wishnoff
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
Tyler Wishnoff
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
Tyler Wishnoff
 

More from Tyler Wishnoff (8)

Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
 
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
How to Guarantee Exact COUNT DISTINCT Queries with Sub-Second Latency on Mass...
 
Providing Interactive Analytics on Excel with Billions of Rows
Providing Interactive Analytics on Excel with Billions of RowsProviding Interactive Analytics on Excel with Billions of Rows
Providing Interactive Analytics on Excel with Billions of Rows
 
Analysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
Analysis of the Pressure Placed on Medical Systems during the COVID-19 PandemicAnalysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
Analysis of the Pressure Placed on Medical Systems during the COVID-19 Pandemic
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX Group
 
Apache Kylin Data Summit 2019: Kyligence Presentation
Apache Kylin Data Summit 2019: Kyligence PresentationApache Kylin Data Summit 2019: Kyligence Presentation
Apache Kylin Data Summit 2019: Kyligence Presentation
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 

Recently uploaded

End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
bmucuha
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 

Recently uploaded (20)

End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 

Augmented OLAP Analytics for Big Data

  • 1. Augmented OLAP for Big Data Luke Han | luke.han@Kyligence.io Co-founder & CEO of Kyligence Apache Kylin PMC Chair Microsoft Reginal Director & MVP Strata Global Sponsor BOOTH #410
  • 2. © Kyligence Inc. 2019. About Luke Han • Luke Han • Co-founder & CEO at Kyligence • Co-creator and PMC Chair of Apache Kylin • Apache Software Foundation Member • Microsoft Regional Director & MVP • Former eBay Big Data Product Manager Lead
  • 3. © Kyligence Inc. 2019. About Apache Kylin • Leading Open Source OLAP for Big Data • Rank 1 from googling “big data OLAP” • Rank 1 from googling “hadoop OLAP” • Open sourced by eBay in 2014 • Graduated to Apache Top Project in 2015 • 1000+ Adoptions world wild • 2015 InfoWorld Bossie Awards • 2016 InfoWorld Bossie Awards
  • 4. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Video Demo • Benchmark • Use Cases
  • 5. © Kyligence Inc. 2019. Kyligence = Kylin + Intelligence • Founded in 2016 by the original creators of Apache Kylin • CRN Top 10 Big Data Startups 2018 • Backing by leading VCs: • Redpoint Ventures • Cisco • CBC Capital • Shunwei Capital • Eight Roads Ventures (Fidelity International Arm) • Coatue • Global Offices: • Shanghai • Beijing • Shenzhen • San Jose • New York • Seattle • …
  • 6. © Kyligence Inc. 2019. Telecom Finance Manufacturing Trusted by Global Leaders Retail & Others Most of them are Global Fortune 500
  • 7. © Kyligence Inc. 2019. Global Partners
  • 8. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 9. © Kyligence Inc. 2019. Let’s talk about photography story… https://technave.com/data/files/mall/article/201812271418327393.jpg
  • 10. © Kyligence Inc. 2019. Let’s talk about photography story… How many people really know how to setup those?
  • 11. © Kyligence Inc. 2019. Let’s talk about photography story… https://technave.com/data/files/mall/article/201812271437395703.jpg
  • 12. © Kyligence Inc. 2019. Let’s talk about photography story… Google Photos
  • 13. © Kyligence Inc. 2019. Let’s talk about photography story… How do you manage your 100,000+ photos? Google Photos
  • 14. © Kyligence Inc. 2019, Confidential. Then… how about your enterprise data?
  • 15. © Kyligence Inc. 2019, Confidential. https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1 https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
  • 16. © Kyligence Inc. 2019, Confidential. Fast and Changing Analysis Demand Slow and Heavy Big Data Operations vs
  • 17. © Kyligence Inc. 2019. The Typical “Throw in some People” Approach Business Users Analysts Data Engineers Business Analysis Data Modeling Lake → Warehouse → Mart Reporting $$$High Cost : Administrators Slow Time-to-Insight:
  • 18. © Kyligence Inc. 2019, Confidential. Presentation Visualization Impala Data Lake Hive Spark SQL Drill MapReduce Spark ……. Time-to-value Pain Weeks of waiting breaks the “online” promise. Collaboration Pain Hard to reuse asset across teams. Each team fights their own path. Resource Pain Hard to scale. Where to find so many skilled big data engineers? Pains in the “Throw in some People” Approach
  • 19. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 20. © Kyligence Inc. 2019, Confidential. Throw in some Intelligence! Let a system replace the people. o Transparent SQL Acceleration o On-demand Data Preparation o Interactive Query Performance o High Concurrency o Centralized Semantic Layer Faster time to market. Stay “online”. Augmented OLAP Data Mart Presentation Visualization Impala Data Lake Hive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance
  • 21. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Business User Analyst Data Engineer vs
  • 22. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Pattern Detection Auto Modeling Data Preparing Raw Data Prepared Data Augmented OLAP Engine (Background Learning)
  • 23. © Kyligence Inc. 2019. Demo Setup Tableau SparkSQL 2.4 Kyligence Enterprise Analyze 1 billion rows of sales records (TPC-H) Business User
  • 24. © Kyligence Inc. 2019. (Embed the Demo Video)
  • 25. © Kyligence Inc. 2019. Demo FAQ Business User Analyst reuse How to improve the first slow exploration? What if the analyst operates differently the second time? More comprehensive performance benchmark? Prepared Data
  • 26. © Kyligence Inc. 2019. TPC-H Decision Support Benchmark TPC-H Benchmark • Examine large volumes of data • High complexity queries • Answers critical business questions • 22 decision making queries E.g. The Shipping Priority Query retrieves the shipping priority and potential revenue of the orders having the largest revenue among those that had not been shipped as of a given date. Top 10 orders are listed in decreasing order of revenue.
  • 27. © Kyligence Inc. 2019. Kyligence Enterprise 4 Beta vs SparkSQL 2.4 To see the trend as data grows • 3 datasets • Scale Factor = 20, 35, 50 • TPCH_SF1: Consists of the base row size (several million elements). • TPCH_SF20: Consists of the base row size x 20. • TPCH_SF35: Consists of the base row size x 35. • TPCH_SF50: Consists of the base row size x 50 (several hundred million elements). Billion
  • 28. © Kyligence Inc. 2019. Hardware Configurations Same 4 physical nodes - Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz * 2 - Totally 86 vCores, 188 GB mem Same Spark configuration for both KE 4 Beta and SparkSQL 2.4 - spark.driver.memory=16g - spark.executor.memory=8g - spark.yarn.executor.memoryOverhead=2g - spark.yarn.am.memory=1024m - spark.executor.cores=5 - spark.executor.instances=17
  • 29. © Kyligence Inc. 2019. Query Response Time | KE 4 Beta vs. SparkSQL 2.4 Milliseconds TPC-H 22 queries For each dataset - Run each query 3 times - Record the average time - No warm up Lower is better. SF=50
  • 30. © Kyligence Inc. 2019. Total Response Time | KE 4 Beta vs. SparkSQL 2.4 Billion Seconds Total response time is the sum of 22 queries’ response time. Compare over the size of datasets and feel the trend. Scale out for the future.
  • 31. © Kyligence Inc. 2019. Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4 Acceleration Rate = SparkSQL time / KE time Take average of the 22 and compare over size of datasets.
  • 32. © Kyligence Inc. 2019. SQL Query Log Analytic Behavior Data Schema Data Profile Machine Learning Engine Data Modeling Automation Kylin Cube Learnt Index Smart Pushdown BI Real-time Analysis Data-as-a- Service Local Deployment Cloud Platform Container Data Services © Kyligence Inc. 2018, Confidential. AI-Augmented Analytics Platform
  • 33. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 34. © Kyligence Inc. 2019. Use Case: IBM Cognos Replacement One Kyligence Cube for 800+ Cognos Cubes Org. Daily Cube Merch. Daily Cube Channel Daily Cube Region Daily Cube Org. Monthly Cube Merch. Monthly Cube Channel Monthly Cube Region Monthly Cube Shanghai Merchants Zhejiang Merchants Anhui Merchants Guangdong Merchants Card Transaction Dimensions: 167 Measures: 20 800+ Cognos Cube, 1000+ ETL jobs Functional Scene Time Scene Geo Scene ┄ ┄ Data: 300+ B Records Merchants: 10+m Cards: 10+B
  • 35. © Kyligence Inc. 2019. Use Case: Data as a Services Platform In the past, due to the limitations of our previous multi-dimensional analytic tool, we faced challenges of constrained time range in queries……We are considering leveraging multi-dimensional data cubes to replace a number of fragmented legacy tabular reports in more business units, so that we can provide better analytic services to our business users.” -- Laments Wu Ying, VP of CMBs Development Center, Offline Platform (Hadoop) Online Platform (Hadoop) EDW (Teradata) Kyligence Data-as-a-Service Platform Tenancy 1 Tenancy 2 Tenancy 3 Tenancy N…… Business Intelligence Cognos Tableau MicroStrategy Superset API Smart Routine Intelligent Modeling Multi-tenancy Applications SecurityJob MIP MDS CRM DWS
  • 36. © Kyligence Inc. 2019, Confidential. Take away: Augmented OLAP, the future for analytics ImpalaHive Spark SQL Drill MapReduce Spark ……. AI-Augmented OLAP ImpalaHive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance