Cloud Data Warehousing juggernaut Snowflake has raced out ahead of the pack to deliver a data management platform from which a wealth of new analytics can be run. Using Snowflake as a traditional data warehouse has some obvious cost advantages over a hardware solution. But the real value of Snowflake as a data platform lies in its ability to support a high-concurrency analytics platform using Kyligence Cloud, powered by Apache Kylin.
In this presentation, Senior Solutions Architect Robert Hardaway will describe a modern data service architecture using precomputation and distributed indexes to provide interactive analytics to hundreds or even thousands of users running against very large Snowflake datasets (TBs to PBs).
Snowflake concepts and hands-on expertise to help you get started implementing data warehouses using Snowflake, along with the information and skills you need to master Snowflake essentials.
Delta Lake OSS: Create reliable and performant Data Lake, by Quentin Ambard (Paris Data Engineers!)
Delta Lake is an open-source framework that lives on top of Parquet in your data lake to provide reliability and performance. It was open-sourced by Databricks this year and is gaining traction to become the de facto data lake table format.
We’ll see everything Delta Lake can do for your data: ACID transactions, DDL operations, schema enforcement, batch and stream support, and more!
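As a taste of those features, here is a minimal sketch of an ACID write and schema enforcement in action, assuming the delta-spark PyPI package is installed; the path and data are illustrative.

```python
# Minimal Delta Lake sketch: ACID write plus schema enforcement.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# ACID write: the commit either fully succeeds or is never visible.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.write.format("delta").mode("overwrite").save("/tmp/users_delta")

# Schema enforcement: appending a mismatched schema raises an error
# instead of silently corrupting the table.
bad = spark.createDataFrame([(3, "carol", 9.99)], ["id", "name", "score"])
try:
    bad.write.format("delta").mode("append").save("/tmp/users_delta")
except Exception as e:
    print("rejected by schema enforcement:", type(e).__name__)
```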
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
Snowflake: The Good, the Bad, and the Ugly (Tyler Wishnoff)
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
A Thorough Comparison of Delta Lake, Iceberg and Hudi (Databricks)
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with the Hive Metastore, these table formats are trying to solve problems that have stood in traditional data lakes for a long time, with declared features like ACID transactions, schema evolution, upserts, time travel, and incremental consumption.
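To make the shared upsert feature concrete, here is a minimal sketch using Delta Lake's MERGE INTO syntax as one example (Iceberg and Hudi offer equivalents); it reuses the SparkSession and table path from the Delta sketch above, and the data is illustrative.

```python
# Upsert via MERGE INTO: update matching rows, insert new ones.
updates = spark.createDataFrame([(2, "bobby"), (4, "dave")], ["id", "name"])
updates.createOrReplaceTempView("updates")

spark.sql("""
    MERGE INTO delta.`/tmp/users_delta` AS target
    USING updates AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET target.name = source.name
    WHEN NOT MATCHED THEN INSERT (id, name) VALUES (source.id, source.name)
""")
```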
Making Apache Spark Better with Delta Lake (Databricks)
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
In this talk, we will cover:
* What data quality problems Delta helps address
* How to convert your existing application to Delta Lake (a sketch follows this list)
* How the Delta Lake transaction protocol works internally
* The Delta Lake roadmap for the next few releases
* How to get involved!
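A minimal sketch of the conversion bullet above, reusing the SparkSession from the earlier Delta sketch; the Parquet path is an illustrative assumption.

```python
# In-place conversion of an existing Parquet dataset to Delta Lake.
from delta.tables import DeltaTable

# Writes a Delta transaction log alongside the existing Parquet files.
DeltaTable.convertToDelta(spark, "parquet.`/data/events_parquet`")

# Existing readers switch over by changing the format string.
df = spark.read.format("delta").load("/data/events_parquet")
```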
As cloud computing continues to gather speed, organizations with years’ worth of data stored on legacy on-premise technologies are facing issues with scale, speed, and complexity. Your customers and business partners are likely eager to get data from you, especially if you can make the process easy and secure.
Challenges with performance are not uncommon and ongoing interventions are required just to “keep the lights on”.
Discover how Snowflake empowers you to meet your analytics needs by unlocking the potential of your data.
Webinar agenda:
~Understand Snowflake and its Architecture
~Quickly load data into Snowflake (a sketch follows this list)
~Leverage the latest in Snowflake’s unlimited performance and scale to make the data ready for analytics
~Deliver secure and governed access to all data – no more silos
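As one concrete reading of the "quickly load data" item, here is a sketch using the snowflake-connector-python package; the account, credentials, file, and table names are illustrative placeholders.

```python
# Bulk-loading a local file into Snowflake via a table stage.
import snowflake.connector

# Connect (values below are placeholders, not real credentials).
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Stage the file into the table's stage, then COPY it in.
cur.execute("CREATE TABLE IF NOT EXISTS events (id INT, name STRING)")
cur.execute("PUT file:///tmp/events.csv @%events")
cur.execute("COPY INTO events FROM @%events FILE_FORMAT = (TYPE = CSV)")
```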
What is elastic data warehousing, and how does Snowflake uniquely enable it? Learn about the requirements needed to support flexible, elastic data warehousing using cloud infrastructure.
Definitive Guide to Select Right Data Warehouse (2020) (Sprinkle Data Inc)
Choosing the right data warehouse is a big challenge for organisations. In this doc, we have made an end-to-end comparison of leading data warehouses: Snowflake vs Redshift vs BigQuery vs Hive vs Athena.
Sprinkledata.com
Apache Iceberg - A Table Format for Huge Analytic Datasets (Alluxio, Inc.)
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Delta Lake is an open-source innovation which brings new capabilities for transactions, version control, and indexing to your data lakes. We uncover how Delta Lake benefits you and why it matters. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which helps with concurrent read/write operations and enables efficient inserts, updates, deletes, and rollbacks. It allows background file optimization through compaction and Z-order partitioning, achieving better performance. In this presentation, we will learn about the Delta Lake benefits, how it solves common data lake challenges, and, most importantly, the new Delta Time Travel capability.
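A minimal sketch of the Time Travel and file-optimization features mentioned above, reusing the table from the earlier Delta sketch; the version, timestamp, and column are illustrative, and OPTIMIZE ... ZORDER BY assumes a recent OSS Delta release.

```python
# Time travel: read the table as of an earlier commit.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/users_delta")

# Or as of a timestamp.
snap = (spark.read.format("delta")
        .option("timestampAsOf", "2021-01-01")
        .load("/tmp/users_delta"))

# Compact small files and co-locate rows by a column (Z-ordering).
spark.sql("OPTIMIZE delta.`/tmp/users_delta` ZORDER BY (id)")
```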
Master the Multi-Clustered Data Warehouse - Snowflake (Matillion)
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
Building a data lake is a daunting task. The promise of a virtual data lake is to provide the advantages of a data lake without consolidating all data into a single repository. With Apache Arrow and Dremio, companies can, for the first time, build virtual data lakes that provide full access to data no matter where it is stored and no matter what size it is.
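One building block of that idea can be sketched with Apache Arrow's Python dataset API, which queries files where they already live; the path and columns are illustrative, and Dremio itself is not required for this snippet.

```python
# Querying Parquet files in place with Arrow, no consolidation step.
import pyarrow.dataset as ds

# Treat files where they already live as one logical table.
lake = ds.dataset("/mnt/warehouse/events", format="parquet")

# Push down column pruning and row filtering; no up-front copy or load.
table = lake.to_table(
    columns=["user_id", "amount"],
    filter=ds.field("amount") > 100,
)
print(table.num_rows)
```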
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)
You’ve heard the marketing buzz, and maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together. Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
SF Big Analytics 2020-07-28
An anecdotal history of the data lake and various popular implementation frameworks: why certain tradeoffs were made to solve problems such as cloud storage, incremental processing, streaming and batch unification, mutable tables, ...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit... (Amazon Web Services)
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
OSA Con 2022 - Apache Iceberg: An Architectural Look Under the Covers - Alex ... (Altinity Ltd)
OSA Con 2022: Apache Iceberg: An Architectural Look Under the Covers
Alex Merced - Dremio
The data lakehouse is one of the most exciting trends in the data space, promising to merge the best aspects of data lakes and data warehouses without either of their problems. Open source tech is making this promise a reality, and in this talk Dremio Developer Advocate Alex Merced explores these technologies.
In this talk Alex Merced will cover:
- What is a Data Lakehouse?
- Why open matters in preserving the promise of lakehouses (better costs, vendor freedom, data freedom)
- What are the technologies that enable lakehouses, like Apache Iceberg, Apache Parquet, Apache Arrow and Project Nessie (a sketch follows this list)
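As a taste of one enabling technology named above, here is a sketch of creating and querying an Apache Iceberg table from PySpark; the catalog name, warehouse path, and iceberg-spark runtime version are illustrative assumptions.

```python
# Creating an Iceberg table and inspecting its snapshot history.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-sketch")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg_warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS local.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO local.db.events VALUES (1, current_timestamp())")

# Iceberg exposes table history as metadata tables, enabling time travel.
spark.sql("SELECT snapshot_id, committed_at FROM local.db.events.snapshots").show()
```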
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga... (DataScienceConferenc1)
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat... (Tyler Wishnoff)
See how extreme query speeds and ultra-high concurrency on MicroStrategy, and any other business intelligence (BI) tool, on Big Data is possible through the Kyligence platform. Learn more here: https://kyligence.io/
Addressing the systemic shortcomings of cloud analytics (SamanthaBerlant)
Learn how existing open source technologies like Apache Kylin, Spark, and Mondrian can be used to increase the value of your analytics investment.
As we enter what some have called The Golden Age of Analytics, there are still some fundamental challenges that plague even the largest and most sophisticated cloud analytics adopters. Chief among these is the challenge of scale, often reflected in limitations of concurrency, multi-tenancy, distributed query performance, and all manner of latencies.
Other less obvious, but equally crucial, challenges of scale and performance have to do with IT and end-user productivity. In other words, there have been few technological advances that enable the quick deployment of big data analytics and the rapid creation of business value from the data being analyzed.
This presentation will consider a few of these systemic challenges and suggest some ways that they can be addressed with available open source technology such as Apache Kylin, Apache Spark, and Apache Mondrian.
Presenter:
Kaige Liu is a Senior Solutions Architect at Kyligence, where he works on building the next-generation big data analytics platform. Previously, he worked on the OpenStack and Bluemix team at IBM, focusing on cloud computing and virtualization technology. Kaige loves the open source community and is an active Apache Kylin committer.
In January of this year, Kyligence announced the immediate availability of Kyligence Cloud 4, the first fully cloud-native, distributed OLAP platform. During our announcement, EMA analyst John Santaferraro said:
“As the race for unified analytics heats up, Kyligence offers a solution that overcomes the challenges of querying data in both data lakes and data warehouses located both in the cloud and on premises.”
Join Li Kang - VP of North America at Kyligence - as he provides an overview of the Kyligence Cloud 4 release that will show:
--The new cloud native architecture that employs Apache Kylin, Apache Spark, and Apache Parquet to ensure optimal performance.
--How KC4 delivers sub-second query responses on very large datasets using precomputed aggregate indexes (hyper-cubes) and table indexes.
--The AI-Augmented engine that intelligently organizes your data and reduces data modeling time from days/weeks to minutes.
In this presentation, we will present the Kyligence Cloud 4 story - high-speed analytics with unprecedented sub-second query response times against petabyte datasets.
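To make the precomputation idea concrete, here is a conceptual sketch in plain Python (not Kyligence's actual engine, which builds these indexes with Spark at scale): aggregates for every dimension combination are computed once, so a query becomes a lookup instead of a scan.

```python
# Conceptual aggregate-index ("cube") precomputation.
import itertools
from collections import defaultdict

rows = [
    {"region": "US", "product": "A", "sales": 100},
    {"region": "US", "product": "B", "sales": 50},
    {"region": "EU", "product": "A", "sales": 70},
]
dims = ["region", "product"]

# Build one aggregate per dimension combination (a tiny cube).
cube = {}
for r in range(len(dims) + 1):
    for combo in itertools.combinations(dims, r):
        agg = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in combo)
            agg[key] += row["sales"]
        cube[combo] = dict(agg)

# Query time: "total sales by region" is now a dictionary lookup.
print(cube[("region",)])   # {('US',): 150, ('EU',): 70}
```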
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud (Tyler Wishnoff)
You’ve been using SQL Server Analytics Services and you love it.
Its multidimensional analysis enables your team to slice and dice your data any way they want and get the results back easily.
The only problem is:
• it doesn’t work very well with the latest technologies
• It wasn’t built to handle Big Data or to serve large analytics teams
• and most frustrating of all, it still isn’t available in the Cloud
The good news is that there’s a way to unburden yourself from the limitations of SSAS, without losing the capabilities you rely on.
If you’re ready to modernize the way your team does analytics, this presentation will provide you the tools and ideas you need to do so.
This presentation will show you how to:
• Efficiently migrate your SSAS-based workload to the Cloud
• Super-charge your SSAS applications, seamlessly, easily and cost-effectively
• Provide unlimited scale of data, concurrency and deliver sub-second response
• Make your SQL/MDX queries scale and perform beyond your imagination
• Ensure Enterprise-grade Security
• Plus, get actionable examples and stories, from organizations who have successfully overcome these challenges
For more information, visit www.Kyligence.io
If you have big data, more and more of your analytics stack needs to be intelligent. Your tools need to be able to anticipate the needs of your analysts, customers, and your business. With the AI-Augmented Engine, this learning process is automated and predictive. It intelligently adapts to user behavior and query patterns and learns to anticipate each user’s needs. Join us for the third installment of this series diving into the core features of Kyligence Cloud 4.
In this presentation you will learn:
-How the Kyligence Cloud 4 AI-Augmented Engine works
-How the AI-Augmented Engine gives optimal efficiency for cube building
-How the AI-Augmented Engine greatly simplifies data modeling
Watch the webinar here: https://www.brighttalk.com/webcast/18317/480320
Smashing Through Big Data Barriers with Tableau and Snowflake (SamanthaBerlant)
Your analysts are working with more data than ever before in Tableau. Chances are, as the data volumes grow, your teams are experiencing some slowdowns. While it may be tempting to blame Tableau, the most likely explanation for performance and scalability pains lies in your data service layer. What if you could transform the way you do analytics without having to retrain your Tableau users? What if you could get more critical business value out of Tableau, and your data, without disrupting the way your business operates?
Join us for this session to learn how Tableau could be the ultimate window into ALL of your valuable data, no matter how large. Learn how precomputation technology and AI-augmented query optimization can help you break free of the downward performance spiral of legacy analytics approaches.
In this presentation, you will learn:
-How to get the fastest big data analytics experience on Tableau
-How a unified semantic layer can ensure that your current Tableau users are not disrupted by big data
-How to improve your analytics operations with automation and machine intelligence
Watch the webinar to see this technology in action during the live Snowflake demo. Enter the onramp to unmatched performance with big data analytics on Tableau.
Take the Bias out of Big Data Insights With Augmented Analytics (Tyler Wishnoff)
Is bias impacting your Big Data insights? Learn how augmented analytics and the latest advancements in OLAP technology are making analytics (including on cloud) from business intelligence, data science, and machine learning more accurate and impactful. Learn more at https://kyligence.io
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
AI-Powered Analytics: What It Is and How It’s Powering the Next Generation of... (Tyler Wishnoff)
Learn how to empower your analysts with easier access to all the data they need, exactly when they need it - all while reducing workloads for IT and data engineering.
This presentation will walk you through those challenges, what modern options are available for solving them, and how taking an AI-powered approach to self-service analytics may yield the greatest level of data access along with the best possible performance. Learn more here: https://kyligence.io/
MongoDB World 2019: re:Innovate from Siloed to Deep Insights on Your Data (MongoDB)
Are you tired of a tedious, drawn-out data-to-insights journey, siloed data, and unleveraged data? Would you like existing demographic data to help you drive business outcomes? Would you like to skip building a data lake and get direct insights on your data with a pre-fabricated data structure, without extra effort?
Top Trends in Building Data Lakes for Machine Learning and AI (Holden Ackerman)
Presentation by Ashish Thusoo, Co-Founder & CEO at Qubole, exploring big data industry trends in moving from data warehouses to cloud-based data lakes. This presentation will cover how companies today are seeing a significant rise in the success of their big data projects by moving to the cloud to iteratively build more cost-effective data pipelines and new products with ML and AI.
It will also uncover how services like AWS, Google, Oracle, and Microsoft Azure provide the storage and compute infrastructure to build self-service data platforms that enable all teams and new products to scale iteratively.
Drug discovery at 2x speed. Faster, more comprehensive testing approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered, and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.
Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing
Precomputation or Data Virtualization, which one is right for you? (SamanthaBerlant)
In the world of cloud analytics, what role do precomputation and distributed OLAP play compared with a data virtualization approach? Which should you choose? Do they compete or complement each other? This webinar will address these questions and provide some guidance for how to choose the right approach for your circumstances.
Both technologies are trying to address a similar challenge: make analytics easily accessible to a wider audience in a modern big data environment. Precomputation focuses on performance, response time, and concurrency in the production environment. Data Virtualization technologies focus on making analysis easily available to users by reducing or eliminating ETL and data warehouses.
In this presentation we will cover:
-The key differences between precomputation and data virtualization
-How your choice between the two affects data quality, security, governance, and TCO
-The financial impact each of these technologies have on your analytics program
Extreme Excel: How a 35-Year-Old Desktop App Smashed Through the Big Data Bar... (SamanthaBerlant)
People have been using Excel for 35 years. There are over 750 million Excel users. People are making magic with Excel every day. With the surging interest in big data, advanced analytics, and the cloud, how does Excel stay relevant and how extreme can Excel get? In this presentation, we will examine:
o Traditional limits of Excel performance, scale, dataset sizes
o Cloud technologies that make Excel better
o Defining the new extremes for Excel power users
Speaker Bio:
Rachel Beddor is a Solutions Engineer for Kyligence where she creates technical content to enhance the learning experience for new Apache Kylin and Kyligence users. She has dedicated her career to making technology more accessible, fun, and inviting to people of all backgrounds.
Cloud-native Semantic Layer on Data Lake (Databricks)
With larger volumes and more real-time data stored in the data lake, it becomes more complex to manage this data and serve analytics and applications. With differing service interfaces, data calibers, and performance biases across scenarios, business users begin to lose confidence in the quality and efficiency of getting insight from their data.
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L... (Tyler Wishnoff)
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance. Learn more at: https://kyligence.io/
Enhance Data Governance with Kyligence Unified Semantic Layer (SamanthaBerlant)
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance.
https://www.brighttalk.com/webcast/18317/414017
Similar to Architecting Snowflake for High Concurrency and High Performance (20)
Kyligence Cloud 4 - Feature Focus: Spark-Powered Cubing and Indexing (SamanthaBerlant)
You’ve moved your data to the cloud, awesome. Now you’re running into issues of concurrency, scale, and cost overruns. But there’s a better way to run your cloud analytics if you think of cloud resources as commodities to conserve and maximize. Sure, you could run the same query from start to finish every time, or you could speed up this process, and save some cash in the process, by precomputing those queries and storing the response for fast retrieval any time, by any number of analysts.
Kyligence Cloud 4’s Spark-Powered Cubing and Indexing feature provides just that - intelligent precomputation, which fundamentally boils down to low-cost, high-performance analytics. Join us for the fourth part of this series exploring the key features of Kyligence Cloud 4.
In this webinar you will learn:
-About modern, cloud era OLAP and cubing theory
-Performance gains you’ll get from intelligent precomputation
-How to apply cloud computing and distributed processing
-Precomputation strategies and tactics (see the sketch after this list)
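In the generic Spark sense (a sketch, not Kyligence's proprietary engine), cubing looks like the following: PySpark's cube() computes aggregates for every dimension combination in one pass, and the result can be persisted for fast retrieval; the column names and output path are illustrative.

```python
# Spark-powered cubing: precompute every grouping-set aggregate once.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cube-sketch").getOrCreate()
df = spark.createDataFrame(
    [("US", "A", 100), ("US", "B", 50), ("EU", "A", 70)],
    ["region", "product", "sales"],
)

# One pass computes all 2^2 grouping sets: (), (region), (product), (region, product).
cube = df.cube("region", "product").agg(F.sum("sales").alias("total"))

# Persist the precomputed result; queries now read this small table
# instead of rescanning the raw data.
cube.write.mode("overwrite").parquet("/tmp/sales_cube")
```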
Open Source Technologies in the Analytics Revolution (SamanthaBerlant)
One of the hallmarks of modern analytics is that data pipelines are largely built upon open source software (OSS). It is entirely possible to create cutting edge data science, machine learning, data engineering, ETL processing, and predictive analytics pipelines without using any commercial software. Of course, OSS does not necessarily mean “free,” but as a thought experiment, the first part of this session will explore the role of OSS in your data analytics stacks and data pipelines.
For the second half of this presentation, we will examine how OSS tools and platforms can be used to learn and create your own Machine Learning and Data Analytics projects without breaking the bank.
View the presentation: https://youtu.be/JbNuikWKC1Q
SF Big Analytics Meetup - Exact Count Distinct with Apache Kylin (SamanthaBerlant)
With over 450 million customers, Didi (world’s largest rideshare company) conducts complex user behavior analysis on huge datasets daily. Exact Count Distinct is one of Didi’s most critical metrics, but it is known for being computationally heavy and notoriously slow. The difference between exact Count Distinct and approximate Count Distinct can cost Didi millions of dollars. In this talk, Kaige Liu of the Apache Kylin project will explain how Didi uses Apache Kylin to return exact Distinct Count on billions of rows of data with sub-second latency to generate the most accurate picture of its business.
You will also learn about the latest development in modern OLAP technologies. Kaige will share how Didi and Truck Alliance (a truck-hailing company that processes $100 billion worth of goods yearly) use Apache Kylin to power their analytics platforms that allow 100s of analysts to achieve sub-second latency on petabyte-scale data.
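A conceptual sketch of why this works (Kylin itself uses compressed bitmaps over integer-encoded IDs rather than Python sets): per-segment sets of user IDs can be merged losslessly at query time, whereas precomputed counts cannot.

```python
# Why exact count distinct needs mergeable structures, not plain counts.
daily_users = {
    "2020-07-01": {101, 102, 103},
    "2020-07-02": {102, 104},
    "2020-07-03": {101, 105},
}

# Counts alone cannot be combined: 3 + 2 + 2 != distinct users that week.
naive = sum(len(s) for s in daily_users.values())   # 7 (wrong)

# Merging the precomputed sets gives the exact answer at query time.
exact = len(set().union(*daily_users.values()))     # 5 (correct)
print(naive, exact)
```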
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it.
https://www.brighttalk.com/webcast/18317/422499
See how the world’s leading open source solution for query acceleration on massive datasets is revolutionizing analytics for enterprises across every industry, and how you can get started using it in your organization.
https://www.brighttalk.com/webcast/18317/413952
How to Guarantee Exact Count Distinct Queries with Sub-Second Latency on Mass... (SamanthaBerlant)
See how to consistently deliver accurate COUNT DISTINCT queries in under a second, even on petabyte-scale datasets. This presentation will share Apache Kylin’s approach to COUNT DISTINCT queries for user behavior analysis.
https://www.brighttalk.com/webcast/18317/414006
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
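A sketch of the levelwise idea on a single machine, using networkx for the SCC decomposition; the damping factor and tolerance are conventional defaults, and the input graph is assumed to have no dead ends, per the precondition above.

```python
import networkx as nx

def levelwise_pagerank(G, d=0.85, tol=1e-10):
    """PageRank computed one strongly connected component at a time.

    Assumes no dead ends: every vertex has at least one out-edge.
    """
    N = G.number_of_nodes()
    rank = {}                      # finalized ranks, component by component
    cond = nx.condensation(G)      # DAG whose nodes are SCCs of G
    for scc in nx.topological_sort(cond):
        comp = cond.nodes[scc]["members"]
        r = {v: 1.0 / N for v in comp}
        while True:                # iterate only within this component
            delta = 0.0
            for v in comp:
                s = 0.0
                for u in G.predecessors(v):
                    # Earlier components contribute final constants;
                    # vertices in this component are still iterating.
                    ru = rank[u] if u in rank else r[u]
                    s += ru / G.out_degree(u)
                new = (1 - d) / N + d * s
                delta, r[v] = delta + abs(new - r[v]), new
            if delta < tol:
                break
        rank.update(r)             # ranks for this level are now final
    return rank

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 4)])  # no dead ends
print(levelwise_pagerank(G))
```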
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools, letting you effortlessly explore, discover, and access the data you need so you can focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be calculated directly: for example, a chain node v whose only in-neighbor u has out-degree 1 gets r(v) = (1-d)/N + d*r(u) once r(u) is known. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues (a minimal sketch follows this list).
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
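A minimal sketch of the automated-validation item above, using plain pandas; the column rules are illustrative assumptions, not a specific product's API.

```python
# Rule-based data quality check: flag rows that violate any rule.
import pandas as pd

RULES = {
    "user_id": lambda s: s.notna() & (s > 0),
    "email":   lambda s: s.str.contains("@", na=False),
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that violate any rule, for rectification at the source."""
    bad = pd.Series(False, index=df.index)
    for col, rule in RULES.items():
        bad |= ~rule(df[col])
    return df[bad]

df = pd.DataFrame({"user_id": [1, None, 3], "email": ["a@x.io", "b@x.io", "oops"]})
print(validate(df))  # flags the null user_id and the malformed email
```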
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.