Kyligence Cloud 4 Feature Focus: Spark-Powered Cubing and Indexing
Mike Shen
Senior Solutions Architect
© Kyligence Inc. 2021
Three Pillars of Modern Cloud Analytics
• Distributed Compute
• Columnar Storage
• Precomputation
MPP Queries – Slow and Costly in the Cloud
Common to Greenplum, Snowflake, Synapse, Redshift
[Figure: MPP Engines: Runtime Computation. Labels: Data, Concurrency, Data Volume, Resource Consumption ($$$), Runtime Computation.]
Precomputation - Faster and Cheaper in the Cloud
An Aggregate Index (a.k.a. Cube)
Precomputation – Compute Once, Query Many Times
[Figure: Precomputation. Labels: Data, Concurrency, Data Volume, Resource Consumption ($$$), Runtime Computation, Precomputation.]
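The "compute once, query many times" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the `region` dimension, and the function names are hypothetical, and a plain dictionary stands in for an aggregate index. It is not how Kyligence stores or builds indexes.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales_amount).
ROWS = [
    ("east", "widget", 10.0),
    ("east", "gadget", 5.0),
    ("west", "widget", 7.5),
    ("west", "widget", 2.5),
]


def runtime_sum(rows, region):
    """Runtime computation: every query scans every row."""
    return sum(amount for r, _, amount in rows if r == region)


def build_aggregate_index(rows):
    """Precomputation: aggregate sales once, keyed by the region dimension."""
    index = defaultdict(float)
    for region, _, amount in rows:
        index[region] += amount
    return dict(index)


def precomputed_sum(index, region):
    """Query time: a lookup in the prebuilt index instead of a scan."""
    return index.get(region, 0.0)


# Build once (the expensive step), then reuse the index for many queries.
INDEX = build_aggregate_index(ROWS)
```

Both paths return the same totals; the difference is that the scan cost in `runtime_sum` is paid on every query, while `build_aggregate_index` pays it once.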
Explain Precomputation with an Analogy
Meal Prep
• Busy families:
  - Full-time working parents with work commutes
  - Drive kids to sport or music practices after school
• Minimize outside food like pizza delivery
• Help with weekday dinner preparation
• Objective: fast, reliable, and healthy dinners
• Popular practice in the US
How Does Meal Prep Work?
• Devote a portion of the weekend to prepare
• Plan the meals for an entire week
• Shop for groceries
• Wash and chop the ingredients
• Re-package ingredients into meal-sized portions
• On weekdays, cooking is quick and easy
• Repeat every weekend
Key Factors
• Predictable Demand
  - Family members will come home hungry
  - Very little time to prepare dinner
• Resource Management
  - Have time on weekends
  - More people available to help on weekends
  - Net labor saving with consolidated clean up
• Clear Benefits
  - Well organized weekday evenings
  - Consistent healthy eating
  - Economic savings
Analogous with Precomputation
• Predictable Demand
  - Know the OLAP queries: filters, joins, and aggregations
  - Know when these queries will show up, such as scheduled reports
• Resource Management
  - Control when to process the source data
  - Design optimized indexes
  - Control hardware footprint: scale a cluster to suit the workload
• Clear Benefits
  - Queries fulfilled at lightning speeds
  - Scheduled reports run quickly
  - Analysts issue more queries and derive more business insights
  - Increased query load does not lead to increased cost
Direct Comparisons

Meal Prep                        | Precomputation                 | Technology
Weekend prepping (heavy lifting) | Index building (heavy lifting) | Spark build* cluster
Re-packaged ingredients          | Index files                    | Parquet columnar format
Weekday cooking (payoff)         | Serve user queries (payoff)    | Spark query* cluster

* Kyligence separates build and query activities to avoid resource contention, hence different Spark clusters
Kyligence Cloud 4
The First Cloud-Native OLAP Platform
• Cloud-native architecture: elasticity, separate compute from storage
• Unified Semantic Model: MDX, SQL, REST access transparent to BI users
• Cloud immediacy: deploys in minutes, on AWS/Azure marketplace
• Enhanced AI engine: auto-modeling, self-tuning cubes/indexes
Kyligence Architecture
[Architecture diagram. Components shown:]
• Consumers: Any BI Tool, Machine Learning, SaaS & Apps (CRM, HCM, SCM)
• Access interfaces: SQL, MDX, REST
• Unified Semantic Layer: Multi-Dimensional Modeling, Security & Governance
• Acceleration Layer: Aggregate & Table Indexes
• AI-Augmented Engine
• Any Data Platform: Data Warehouse, Data Lake, Streaming Data
• Any Cloud
Cloud Native Architecture
Separation of compute and storage
• Spark compute cluster
• Shared cloud object storage
• Scale and optimize separately for best results and price performance
• Continuity: build cluster separate from (and doesn’t disrupt) query cluster
Removed legacy Hadoop requirement
• MapReduce, HDFS, HBase
High performance components
• Highly parallel processing for large workloads
• All tasks are executed in memory with Spark: cubing, querying, modeling, aggregation
• Cubes and indexes saved in Parquet
[Figure: Spark Cubing Persisted in Parquet format, Optimized for Cloud Storage]
Spark Parallel Processing and Cloud Columnar Storage
Spark Is Best Suited for Parallel Cubing/Indexing Operations
Precomputing Everything Isn’t Practical
Precomputation
Total Indexes = 2^N - 1, where N is the number of dimensions
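A short Python sketch can make the growth concrete. It reads the formula above as 2^N - 1, the count of non-empty dimension combinations (an assumption about the intended grouping), and the function names `total_indexes` and `enumerate_indexes` are illustrative only.

```python
from itertools import combinations


def total_indexes(num_dimensions: int) -> int:
    """Number of non-empty dimension combinations: 2^N - 1."""
    if num_dimensions < 0:
        raise ValueError("num_dimensions must be non-negative")
    return 2 ** num_dimensions - 1


def enumerate_indexes(dimensions):
    """List every non-empty combination of the given dimensions."""
    return [
        combo
        for size in range(1, len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


if __name__ == "__main__":
    # Three dimensions are small enough to list in full.
    print(enumerate_indexes(["A", "B", "C"]))
    # Larger dimension counts are only practical to count, not to list.
    for n in (10, 20, 50):
        print(n, total_indexes(n))
```

Enumerating is feasible only for a handful of dimensions; for larger counts the closed-form count is the practical way to see why precomputing every combination does not scale.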
Precomputing Everything Isn’t Practical
Challenge
• 50 dimensions => about 1.1 quadrillion indexes (2^50 - 1 ≈ 1.13 × 10^15)
• Today, 100+ dimensions are possible
• Cost is always critical
Solution
• AI-brokered compromise
Intelligent Optimization: Automatic Index Tuning
User-Guided Intelligent Operations
• User sets rules; the system makes recommendations only for critical business scenarios that comply
• System continuously learns
Benefits
• Improve build, query, and storage efficiency
• Lower the barrier to tuning: optimize indexes without specialist coaching or training
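One way to picture rule-guided recommendation is a simple frequency heuristic over past queries. The sketch below is an assumption made for illustration, not Kyligence's AI engine: the function `recommend_indexes`, the shape of `query_history`, and the rule (a set of allowed dimensions plus a cap on index count) are all hypothetical.

```python
from collections import Counter


def recommend_indexes(query_history, allowed_dimensions, max_indexes):
    """Recommend the most frequently queried dimension sets that obey the rule.

    query_history: iterable of dimension lists, one per past query.
    allowed_dimensions: user rule; queries using other dimensions are ignored.
    max_indexes: user rule; upper bound on the number of recommendations.
    """
    counts = Counter(
        frozenset(dims)
        for dims in query_history
        if set(dims) <= allowed_dimensions
    )
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [sorted(dims) for dims, _ in ranked[:max_indexes]]


if __name__ == "__main__":
    history = [["A", "E"], ["E"], ["A", "E"], ["A", "B", "E"], ["E", "A"], ["Z"]]
    print(recommend_indexes(history, {"A", "B", "C", "D", "E"}, 2))
```

Rerunning the function as the query history grows is a crude stand-in for the "continuously learns" bullet; a real system would weigh far more than raw frequency.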
[Figure: Index Auto-Tuning. Labels: Custom Rules, Critical Business Query. Example index dimension sets: E; A, E; A, B, E; A, B, C, E; A, B, C, D, E.]
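With a set of indexes like the one listed above, a query can be answered from any index whose dimensions cover the ones it asks for. The following Python sketch picks the smallest such index, using dimension count as a stand-in for index size. That selection rule and the name `pick_index` are assumptions for illustration, not a description of the Kyligence query planner.

```python
# Dimension sets taken from the slide's Index Auto-Tuning example.
INDEXES = [
    ("E",),
    ("A", "E"),
    ("A", "B", "E"),
    ("A", "B", "C", "E"),
    ("A", "B", "C", "D", "E"),
]


def pick_index(query_dims, indexes):
    """Return the index with the fewest dimensions that covers the query.

    Returns None when no index contains every requested dimension.
    """
    needed = set(query_dims)
    candidates = [ix for ix in indexes if needed <= set(ix)]
    return min(candidates, key=len) if candidates else None


if __name__ == "__main__":
    print(pick_index({"E"}, INDEXES))
    print(pick_index({"B"}, INDEXES))
    print(pick_index({"Z"}, INDEXES))
```

A query that no index covers returns `None`, which in a real system would mean falling back to computing the answer at query time.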
Contact Us
Kyligence Inc
• http://kyligence.io
• info@kyligence.io
• Twitter: @Kyligence
Apache Kylin
• http://kylin.apache.org
• dev@kylin.apache.org
• Twitter: @ApacheKylin
© Kyligence Inc. 2020, Confidential.

More Related Content

What's hot

Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneAngel Abundez
 
Key Architecture and Performance Principles to Optimize Data Management
Key Architecture and Performance Principles to Optimize Data ManagementKey Architecture and Performance Principles to Optimize Data Management
Key Architecture and Performance Principles to Optimize Data ManagementJana Lass
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?Jeraldine Phneah
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
 
StackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinStackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinBoyd Hemphill
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake PracticeSamanthaSwain7
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Introducing Direct Database Access with Snowflake + Intrinio
Introducing Direct Database Access with Snowflake + IntrinioIntroducing Direct Database Access with Snowflake + Intrinio
Introducing Direct Database Access with Snowflake + IntrinioIntrinio
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaEdureka!
 
Webinar: SnapLogic Winter 2015
Webinar: SnapLogic Winter 2015Webinar: SnapLogic Winter 2015
Webinar: SnapLogic Winter 2015SnapLogic
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data UnlimitedAudrey Huvet
 
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...Matillion
 
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks NetApp
 
Google cloud session 1
Google cloud session 1Google cloud session 1
Google cloud session 1Vijay Ojha
 
Google Cloud Platform
Google Cloud PlatformGoogle Cloud Platform
Google Cloud PlatformGeneXus
 
Get Savvy with Snowflake
Get Savvy with SnowflakeGet Savvy with Snowflake
Get Savvy with SnowflakeMatillion
 
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...Stephen Darlington
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 

What's hot (20)

Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
 
Key Architecture and Performance Principles to Optimize Data Management
Key Architecture and Performance Principles to Optimize Data ManagementKey Architecture and Performance Principles to Optimize Data Management
Key Architecture and Performance Principles to Optimize Data Management
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
StackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinStackEngine Demo - Docker Austin
StackEngine Demo - Docker Austin
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake Practice
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
Introducing Direct Database Access with Snowflake + Intrinio
Introducing Direct Database Access with Snowflake + IntrinioIntroducing Direct Database Access with Snowflake + Intrinio
Introducing Direct Database Access with Snowflake + Intrinio
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
 
Webinar: SnapLogic Winter 2015
Webinar: SnapLogic Winter 2015Webinar: SnapLogic Winter 2015
Webinar: SnapLogic Winter 2015
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited
 
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
 
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks
NetApp IT Uses NetApp Manageability SDK to do More Than Configuration Tasks
 
Google cloud session 1
Google cloud session 1Google cloud session 1
Google cloud session 1
 
Google Cloud Platform
Google Cloud PlatformGoogle Cloud Platform
Google Cloud Platform
 
Enterprise_scale_data_blending
Enterprise_scale_data_blendingEnterprise_scale_data_blending
Enterprise_scale_data_blending
 
Get Savvy with Snowflake
Get Savvy with SnowflakeGet Savvy with Snowflake
Get Savvy with Snowflake
 
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...On Cloud Nine: How to be happy migrating your in-memory computing platform to...
On Cloud Nine: How to be happy migrating your in-memory computing platform to...
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 

Similar to Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing

Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceSamanthaBerlant
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformDATAVERSITY
 
The Need For Speed - Strategies to Modernize Your Data Center
The Need For Speed - Strategies to Modernize Your Data CenterThe Need For Speed - Strategies to Modernize Your Data Center
The Need For Speed - Strategies to Modernize Your Data CenterEDB
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Augmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsAugmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsTyler Wishnoff
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Tyler Wishnoff
 
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...InfluxData
 
4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific ResearchAvere Systems
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneDataWorks Summit
 
AWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data AnalyticsAWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data AnalyticsAmazon Web Services
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudTyler Wishnoff
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Oracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryOracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryPerficient, Inc.
 

Similar to Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing (20)

Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
The Need For Speed - Strategies to Modernize Your Data Center
The Need For Speed - Strategies to Modernize Your Data CenterThe Need For Speed - Strategies to Modernize Your Data Center
The Need For Speed - Strategies to Modernize Your Data Center
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Augmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsAugmented OLAP for Big Data Analytics
Augmented OLAP for Big Data Analytics
 
AbhishekDullu_Res
AbhishekDullu_ResAbhishekDullu_Res
AbhishekDullu_Res
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
 
AWS for HPC in Drug Discovery
AWS for HPC in Drug DiscoveryAWS for HPC in Drug Discovery
AWS for HPC in Drug Discovery
 
S&OP as a Service.pdf
S&OP as a Service.pdfS&OP as a Service.pdf
S&OP as a Service.pdf
 
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage...
 
4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research4 C’s for Using Cloud to Support Scientific Research
4 C’s for Using Cloud to Support Scientific Research
 
ESGYN Overview
ESGYN OverviewESGYN Overview
ESGYN Overview
 
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindDeliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
AWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data AnalyticsAWS Sydney Summit 2013 - Big Data Analytics
AWS Sydney Summit 2013 - Big Data Analytics
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Oracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success StoryOracle Essbase in the Cloud A Mercer Advisors Success Story
Oracle Essbase in the Cloud A Mercer Advisors Success Story
 

More from SamanthaBerlant

Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSamanthaBerlant
 
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...SamanthaBerlant
 
Addressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analyticsAddressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analyticsSamanthaBerlant
 
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache KylinSF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache KylinSamanthaBerlant
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySamanthaBerlant
 
Enhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic LayerEnhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic LayerSamanthaBerlant
 
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...SamanthaBerlant
 

More from SamanthaBerlant (8)

Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and Snowflake
 
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar...
 
Addressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analyticsAddressing the systemic shortcomings of cloud analytics
Addressing the systemic shortcomings of cloud analytics
 
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache KylinSF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
 
Enhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic LayerEnhance Data Governance with Kyligence Unified Semantic Layer
Enhance Data Governance with Kyligence Unified Semantic Layer
 
Apache Kylin 101
Apache Kylin 101Apache Kylin 101
Apache Kylin 101
 
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass...
 

Recently uploaded

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...

Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing

  • 1. Kyligence Cloud 4 Feature Focus: Spark-Powered Cubing and Indexing. Mike Shen, Senior Solutions Architect
  • 2. © Kyligence Inc. 2021. Three Pillars of Modern Cloud Analytics: Distributed Compute, Columnar Storage, Precomputation
  • 3. MPP Queries: Slow and Costly in the Cloud. Common to Greenplum, Snowflake, Synapse, Redshift. MPP engines rely on runtime computation. [Chart labels: Data Volume, Concurrency, Resource Consumption ($$$), Runtime Computation]
  • 4. Precomputation: Faster and Cheaper in the Cloud. An Aggregate Index (a.k.a. Cube). Precomputation: Compute Once, Query Many Times. [Chart labels: Data Volume, Concurrency, Resource Consumption ($$$), Runtime Computation, Precomputation]
  • 5. Explain Precomputation with an Analogy
  • 6. Meal Prep. Busy families: full-time working parents with work commutes; drive kids to sport or music practices after school; minimize outside food like pizza delivery. Help with weekday dinner preparation. Objective: fast, reliable, and healthy dinners. Popular practice in the US.
  • 7. How Does Meal Prep Work? Devote a portion of the weekend to prepare: plan the meals for an entire week; shop for groceries; wash and chop the ingredients; re-package ingredients into meal-sized portions. On weekdays, cooking is quick and easy. Repeat every weekend.
  • 8. Key Factors. Predictable demand: family members will come home hungry; very little time to prepare dinner. Resource management: have time on weekends; more people available to help on weekends; net labor saving with consolidated clean-up. Clear benefits: well-organized weekday evenings; consistent healthy eating; economic savings.
  • 9. Analogous with Precomputation. Predictable demand: know the OLAP queries (filters, joins, and aggregations); know when these queries will show up, such as scheduled reports. Resource management: control when to process the source data; design optimized indexes; control hardware footprint by scaling a cluster to suit the workload. Clear benefits: queries fulfilled at lightning speeds; scheduled reports run quickly; analysts issue more queries and derive more business insights; increased query load does not lead to increased cost.
  • 10. Direct Comparisons (Meal Prep / Precomputation / Technology): weekend prepping (heavy lifting) / index building (heavy lifting) / Spark build* cluster; re-packaged ingredients / index files / Parquet columnar format; weekday cooking (payoff) / serve user queries (payoff) / Spark query* cluster. * Kyligence separates build and query activities to avoid resource contention, hence different Spark clusters.
  • 11. Kyligence Cloud 4: The First Cloud-Native OLAP Platform. Cloud-native architecture: elasticity, separate compute from storage. Unified Semantic Model: MDX, SQL, REST access transparent to BI users. Cloud immediacy: deploys in minutes, on AWS/Azure marketplace. Enhanced AI engine: auto-modeling, self-tuning cubes/indexes.
  • 12. Kyligence Architecture. Unified Semantic Layer: Multi-Dimensional Modeling, Security & Governance. Acceleration Layer: Aggregate & Table Indexes, AI-Augmented Engine. [Diagram labels: Any Cloud; Any Data Platform (Data Warehouse, Data Lake, Streaming Data); Any BI Tool; Machine Learning; SaaS & Apps (CRM, HCM, SCM); access via SQL, MDX, REST]
  • 13. Cloud Native Architecture. Separation of compute and storage: Spark compute cluster; shared cloud object storage; scale and optimize separately for best results and price performance; continuity: the build cluster is separate from, and doesn't disrupt, the query cluster. Removed legacy Hadoop requirement: MapReduce, HDFS, HBase. Spark cubing persisted in Parquet format, optimized for cloud storage. High performance components: highly parallel processing for large workloads; all tasks are executed in memory with Spark (cubing, querying, modeling, aggregation); cubes and indexes saved in Parquet.
  • 14. Spark Parallel Processing and Cloud Columnar Storage
  • 15. Spark Is Best Suited for Parallel Cubing/Indexing Operations
  • 16. Precomputing Everything Isn’t Practical. Precomputation: Total Indexes = 2^(Number of dimensions) - 1
  • 17. Precomputing Everything Isn’t Practical. Challenge: 50 dimensions => about 1.1 quadrillion indexes (2^50 - 1); today, 100+ dimensions are possible; cost is always critical. Solution: AI-brokered compromise.
  • 18. Intelligent Optimization: Automatic Index Tuning. User-guided intelligent operations: user sets rules; system recommends only the critical business scenarios that comply; system continuously learns. Benefits: improve build, query, and storage efficiency; lower the tuning threshold and optimize the index without professional coaching and training. [Diagram labels: Custom Rules, Critical Business Query, Index Auto-Tuning; example indexes: E; A, E; A, B, E; A, B, C, E; A, B, C, D, E]
  • 20. Contact Us. Kyligence Inc: http://kyligence.io, info@kyligence.io, Twitter: @Kyligence. Apache Kylin: http://kylin.apache.org, dev@kylin.apache.org, Twitter: @ApacheKylin
  • 21. © Kyligence Inc. 2020, Confidential.
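
The index-count formula from slide 16 and the "compute once, query many times" idea from slide 4 can be illustrated with a small sketch. This is plain Python on toy data, not Kyligence's or Spark's implementation; the function names (`total_indexes`, `build_indexes`), the sample rows, and the column names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def total_indexes(num_dimensions: int) -> int:
    """Number of non-empty dimension combinations: 2^n - 1."""
    return 2 ** num_dimensions - 1


def build_indexes(rows, dimensions, measure):
    """Precompute SUM(measure) for every non-empty combination of dimensions.

    Returns a dict mapping each dimension combination (a tuple of names)
    to a dict of {dimension values: aggregated measure}.
    """
    indexes = {}
    for size in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in combo)
                totals[key] += row[measure]
            indexes[combo] = dict(totals)
    return indexes


# Hypothetical toy fact table with three dimensions and one measure.
rows = [
    {"region": "east", "product": "a", "year": 2020, "sales": 10},
    {"region": "east", "product": "b", "year": 2021, "sales": 5},
    {"region": "west", "product": "a", "year": 2021, "sales": 7},
]
dimensions = ("region", "product", "year")

# "Build" step: done once, ahead of query time.
indexes = build_indexes(rows, dimensions, "sales")

# "Query" step: each lookup reads a precomputed aggregate instead of
# rescanning the rows.
sales_east = indexes[("region",)][("east",)]
sales_west_2021 = indexes[("region", "year")][("west", 2021)]

print(len(indexes), total_indexes(len(dimensions)))
print(sales_east, sales_west_2021)
```

With three dimensions the build produces seven indexes, matching the formula. Because the count doubles with every added dimension, evaluating `total_indexes` for larger inputs shows why building every combination stops being practical and some selection of indexes is needed.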