Snowflake MasterClass
What will you learn?
Getting Started
Architecture
Unstructured Data
Performance
Load from AWS
Snowpipe
Time Travel
Fail Safe
Zero-Copy Cloning
Table Types
Data Sharing
Data Sampling
Scheduling Tasks
Streams
Materialized views
Data Masking
Visualizations
Partner Connect
Best practices
Loading Data
Copy Options
Access Management
Load from Azure
Load from GCP
Contents
Getting Started
Snowflake Architecture
Multi-Clustering
Data Warehousing
Cloud Computing
Snowflake Editions
Snowflake Pricing
Snowflake Roles
Loading Data
Performance Optimization
Snowpipe
Fail Safe and Time Travel
Table Types
Zero Copy Cloning
Swapping
Data Sharing
Data Sampling
Tasks & Streams
Materialized Views
Data Masking
Access Control
Snowflake & Other Tools
Best Practices
Getting Started
How can you get the most out of this course?
Make use of the Udemy tools
Own pace
Practice
Help others
Resources, quizzes & assignments
Ask questions
Enjoy learning!
✓ Pay only for what you use
Snowflake Architecture
CLOUD SERVICES
- Brain of the system -
Managing infrastructure, access control, security, optimizer, metadata etc.
QUERY PROCESSING (virtual warehouses)
- Muscle of the system -
Performs MPP (Massively Parallel Processing)
STORAGE
- Hybrid columnar storage -
Saved in blobs
Virtual Warehouse Sizes
XS: 1
S: 2
M: 4
L: 8
XL: 16
…
4XL: 128
Multi-Clustering
[Diagram: as more queries arrive, additional size-S clusters are started automatically (auto-scaling); without them, queries wait in a queue]
Auto-Scaling: When to start an additional cluster?
Scaling policy
Standard: favors starting additional clusters
Economy: favors conserving credits rather than starting additional clusters
Standard (default)
Description: Prevents/minimizes queuing by favoring starting additional clusters over conserving credits.
Cluster starts: Immediately when either a query is queued or the system detects that there are more queries than can be executed by the currently available clusters.
Cluster shuts down: After 2 to 3 consecutive successful checks (performed at 1-minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters.
Economy
Description: Conserves credits by favoring keeping running clusters fully loaded rather than starting additional clusters. Result: queries may be queued and take longer to complete.
Cluster starts: Only if the system estimates there's enough query load to keep the cluster busy for at least 6 minutes.
Cluster shuts down: After 5 to 6 consecutive successful checks …
Data Warehousing
What is a data warehouse?
(Skip the lecture if you are already familiar with data warehouses)
What is the purpose of a data warehouse?
= Database that is used for reporting and data analysis
HR data, sales data → ETL → data warehouse
ETL = Extract, Transform & Load
Different layers
Data sources (e.g. HR data, sales data)
Raw data (staging area)
Data integration (data transformation)
Access layer (production)
Consumers: Data Science, Reporting, Other Apps
Cloud Computing
Why Cloud Computing?
Running your own data center means managing:
• Infrastructure
• Security
• Electricity
• Software/Hardware upgrades
With Software-as-a-Service, all of this is MANAGED for you
Software-as-a-Service: who manages what?
Stack: Application, Software, Data, Operating System, Virtual machines, Physical servers, Physical storage
Customer: creating databases, tables etc.
Snowflake: managing data storage, virtual warehouses, upgrades/metadata etc.
Cloud provider: AWS, Azure, GCP
Snowflake Editions
Standard: introductory level
Enterprise: additional features for the needs of large-scale enterprises
Business Critical: even higher levels of data protection for organizations with extremely sensitive data
Virtual Private: highest level of security
Standard
✓ Complete DWH
✓ Automatic data encryption
✓ Time travel up to 1 day
✓ Disaster recovery for 7 days beyond time travel
✓ Secure data share
✓ Premier support 24/7
Enterprise
✓ All Standard features
✓ Multi-cluster warehouse
✓ Time travel up to 90 days
✓ Materialized views
✓ Search Optimization
✓ Column-level security
Business Critical
✓ All Enterprise features
✓ Additional security features such as data encryption everywhere
✓ Extended support
✓ Database failover and disaster recovery
Virtual Private
✓ All Business Critical features
✓ Dedicated virtual servers and completely separate Snowflake environment
Snowflake Pricing
Compute
✓ Charged for active warehouses per hour
✓ Depending on the size of the warehouse
✓ Billed by second (minimum of 1 min)
✓ Charged in Snowflake credits
Storage
✓ Monthly storage fees
✓ Based on average storage used per month
✓ Cost calculated after compression
✓ Cloud Providers
Credits consumed are converted into $/€
Price per credit (Region: EU (Frankfurt), Platform: AWS)
Standard: $2.70 / Credit
Enterprise: $4 / Credit
Business Critical: $5.40 / Credit
Virtual Private: Contact Snowflake
Virtual Warehouse Sizes
XS: 1
S: 2
M: 4
L: 8
XL: 16
…
4XL: 128
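The compute rules above (size-dependent rate, per-second billing, one-minute minimum, price per credit) can be combined into a small calculation. This is only a sketch: it assumes the numbers in the size table are credits consumed per hour, and the function name `compute_cost` is made up for illustration.

```python
# Credits per hour by warehouse size, taken from the size table above
# (assumption: the numbers are credits consumed per hour).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "4XL": 128}

MIN_BILLED_SECONDS = 60  # billed by second, with a minimum of 1 minute


def compute_cost(size: str, runtime_seconds: int, price_per_credit: float) -> float:
    """Cost of one continuous warehouse run (hypothetical helper)."""
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return round(credits * price_per_credit, 4)


# Medium warehouse, 30 minutes, Enterprise price of $4 per credit:
# 4 credits/h * 0.5 h * $4 = $8
print(compute_cost("M", 30 * 60, 4.00))

# A 10-second run on XS is billed like a 60-second run.
print(compute_cost("XS", 10, 2.70) == compute_cost("XS", 60, 2.70))
```

The per-credit prices plugged in here are the EU (Frankfurt) / AWS figures from the slide; other regions and platforms differ.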
On Demand Storage vs. Capacity Storage
✓ We think we need 1 TB of storage
Region: US East (Northern Virginia)
Platform: AWS
❖ Scenario 1: 100GB of storage used
On Demand: 0.1 TB x $40 = $4
Capacity: 1 TB x $23 = $23
❖ Scenario 2: 800GB of storage used
On Demand: 0.8 TB x $40 = $32
Capacity: 1 TB x $23 = $23
✓ Start with On Demand Storage
✓ Once you are sure about your usage, use Capacity Storage
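The two scenarios can be reproduced with a few lines of arithmetic. A minimal sketch, assuming the slide's rates ($40 per TB on demand, $23 per TB pre-purchased) and ignoring what happens when usage exceeds the purchased capacity; the function name is hypothetical.

```python
ON_DEMAND_PER_TB = 40.0  # $ per TB per month (slide value, AWS US East)
CAPACITY_PER_TB = 23.0   # $ per TB per month, paid for the committed amount


def monthly_storage_cost(used_tb: float, committed_tb: float) -> dict:
    """Compare both storage models for one month (overage is not modeled)."""
    on_demand = round(used_tb * ON_DEMAND_PER_TB, 2)
    capacity = round(committed_tb * CAPACITY_PER_TB, 2)
    cheaper = "on_demand" if on_demand <= capacity else "capacity"
    return {"on_demand": on_demand, "capacity": capacity, "cheaper": cheaper}


# Scenario 1: 100 GB used, 1 TB committed -> on demand is cheaper
print(monthly_storage_cost(0.1, 1.0))
# Scenario 2: 800 GB used, 1 TB committed -> capacity is cheaper
print(monthly_storage_cost(0.8, 1.0))
```

With these rates the break-even point is at 23 / 40 = 0.575 TB of the committed terabyte, which is why the recommendation is to start on demand and switch once usage is predictable.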
Snowflake Roles
Snowflake Roles
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
USERADMIN
PUBLIC
Custom Role 1 Custom Role 2
Custom Role 3
Snowflake Roles
ACCOUNTADMIN
✓ SYSADMIN and SECURITYADMIN
✓ top-level role in the system
✓ should be granted only to a limited number of users
SECURITYADMIN
✓ USERADMIN role is granted to SECURITYADMIN
✓ Can manage users and roles
✓ Can manage any object grant globally
SYSADMIN
✓ Create warehouses and databases (and more objects)
✓ Recommended that all custom roles are assigned
USERADMIN
✓ Dedicated to user and role management only
✓ Can create users and roles
PUBLIC
✓ Automatically granted to every user
✓ Can create own objects like every other role (available to every other user/role)
Loading Data
BULK LOADING
✓ Most frequent method
✓ Uses warehouses
✓ Loading from stages
✓ COPY command
✓ Transformations possible
CONTINUOUS LOADING
✓ Designed to load small volumes of data
✓ Automatically once they are added to stages
✓ Latest results for analysis
✓ Snowpipe (Serverless feature)
Understanding Stages
✓ Not to be confused with data warehouse stages
✓ Location of data files where data can be loaded from
External Stage
✓ External cloud provider
▪ S3
▪ Google Cloud Platform
▪ Microsoft Azure
✓ Database object created in Schema
✓ CREATE STAGE (URL, access settings)
Note: Additional costs may apply if region/platform differs
Internal Stage
✓ Local storage maintained by Snowflake
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
copyOptions
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
ON_ERROR = CONTINUE
Copy Options
✓ Validate the data files instead of loading them
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS
RETURN_n_ROWS (e.g. RETURN_10_ROWS) Validates & returns the specified number of rows;
fails at the first error encountered
RETURN_ERRORS Returns all errors in Copy Command
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
RETURN_FAILED_ONLY = TRUE | FALSE
✓ Specifies whether to return only files that have failed to load in the statement result
✓ DEFAULT = FALSE
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
FORCE = TRUE | FALSE
✓ Specifies to load all files, regardless of whether they’ve been loaded previously and
have not changed since they were loaded
✓ Note that this option reloads files, potentially
duplicating data in a table
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
TRUNCATECOLUMNS = TRUE | FALSE
✓ Specifies whether to truncate text strings that exceed the target column length
✓ TRUE = strings are automatically truncated to the target column length
✓ FALSE = COPY produces an error if a loaded string exceeds the target column length
✓ DEFAULT = FALSE
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
SIZE_LIMIT = num
✓ Specify maximum size (in bytes) of data loaded in that command (at least one file)
✓ When the threshold is exceeded, the COPY operation stops
loading
✓ Threshold for each file
✓ DEFAULT: null (no size limit)
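One way to read the SIZE_LIMIT bullets is as a running total: files are loaded one after another, and loading stops once the total loaded so far has exceeded the limit, so at least one file is always loaded. The sketch below is a toy model of that reading, not Snowflake code; the function name and the exact boundary behavior are assumptions.

```python
from typing import List, Optional


def files_loaded(file_sizes: List[int], size_limit: Optional[int] = None) -> List[int]:
    """Toy model of SIZE_LIMIT: return the sizes of the files that get loaded."""
    loaded: List[int] = []
    total = 0
    for size in file_sizes:
        # Stop only after the running total has already exceeded the limit.
        if size_limit is not None and total > size_limit:
            break
        loaded.append(size)
        total += size
    return loaded


MB = 1_000_000
stage = [10 * MB] * 5  # five files of 10 MB each

print(len(files_loaded(stage, 25 * MB)))  # third file pushes the total past 25 MB
print(len(files_loaded(stage, 1)))        # tiny limit: one file is still loaded
print(len(files_loaded(stage)))           # default (no limit): all files
```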
Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
PURGE = TRUE | FALSE
✓ DEFAULT: FALSE
✓ specifies whether to remove the data files from the stage
automatically after the data is loaded successfully
Load unstructured data
1. Create Stage
2. Load raw data (type VARIANT)
3. Analyse & Parse
4. Flatten & Load
Performance Optimization
✓ Add indexes, primary keys
✓ Create table partitions
✓ Analyze the query execution plan
✓ Remove unnecessary full table scans
How does it work in Snowflake?
✓ Automatically managed micro-partitions
What is our job?
✓ Assigning appropriate data types
✓ Sizing virtual warehouses
✓ Cluster keys
Performance aspects
Dedicated virtual warehouses
✓ Separated according to different workloads
Scaling Up
✓ For known patterns of high work load
Scaling Out
✓ Dynamically for unknown patterns of work load
Maximize Cache Usage
✓ Automatic caching can be maximized
Cluster Keys
✓ For large tables
Dedicated virtual warehouse
[Diagram: workloads sharing Snowflake: data sources (ETL/ELT), database administrators, reporting, marketing, data science, BI]
Identify & Classify
✓ Identify & classify groups of workload/users
✓ BI Team, Data Science Team, Marketing department
Create dedicated virtual warehouses
✓ For every class of workload & assign users
Considerations
Not too many VW
✓ Avoid underutilization
Refine classifications
✓ Work patterns can change
✓ If you use at least Enterprise Edition all warehouses should be Multi-Cluster
✓ Minimum: Default should be 1
✓ Maximum: Can be very high
Let's practice!
Scaling Up/Down
✓ Changing the size of the virtual warehouse depending on different work loads in different periods
Use cases
✓ ETL at certain times (for example between 4pm and 8pm)
✓ Special business event with more work load
✓ NOTE: Common scenario is increased query complexity, NOT more users (then scaling out would be better)
Scaling Up vs. Scaling Out
Scaling Up
✓ Increasing the size of virtual warehouses
✓ More complex queries
Scaling Out
✓ Using additional warehouses / multi-cluster warehouses
✓ More concurrent users/queries
✓ Handling performance related to large numbers of concurrent users
✓ Automate the process if you have a fluctuating number of users
Caching
✓ Automatic process to speed up the queries
✓ If a query is executed twice, results are cached and can be re-used
✓ Results are cached for 24 hours or until the underlying data has changed
What can we do?
✓ Ensure that similar queries go on the same warehouse
✓ Example: A team of data scientists runs similar queries, so they should all use the same warehouse
Clustering in Snowflake
✓ Snowflake automatically maintains these cluster keys
✓ In general Snowflake produces well-clustered tables
✓ Cluster keys are not always ideal and can change over time
✓ Manually customize these cluster keys
What is a cluster key?
✓ Subset of columns used to co-locate the data in micro-partitions
✓ For large tables this improves the scan efficiency in our queries
Event Date Event ID Customers City
2021-03-12 134584 … …
2021-12-04 134586 … …
2021-11-04 134588 … …
2021-04-05 134589 … …
2021-06-07 134594 … …
2021-07-03 134597 … …
2021-03-04 134598 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
[The rows are spread across three micro-partitions: 1, 2, 3]
SELECT COUNT(*)
FROM <table_name>
WHERE Event_Date > '2021-07-01'
AND Event_Date < '2021-08-01'
All partitions need to be scanned!
After clustering by Event Date:
Event Date Event ID Customers City
2021-03-04 134598 … …
2021-03-12 134584 … …
2021-04-05 134589 … …
2021-06-07 134594 … …
2021-07-03 134597 … …
2021-08-03 134599 … …
2021-08-04 134601 … …
2021-11-04 134588 … …
2021-12-04 134586 … …
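Why the sorted table helps can be shown with a toy model of partition pruning. The sketch assumes the nine rows are split into three micro-partitions of three rows each and that a partition can be skipped when its min/max Event Date range cannot contain a matching row; the helper names are hypothetical and real micro-partitions are far larger.

```python
from datetime import date

# (event_date, event_id) in load order, taken from the table above
ROWS = [
    (date(2021, 3, 12), 134584), (date(2021, 12, 4), 134586),
    (date(2021, 11, 4), 134588), (date(2021, 4, 5), 134589),
    (date(2021, 6, 7), 134594), (date(2021, 7, 3), 134597),
    (date(2021, 3, 4), 134598), (date(2021, 8, 3), 134599),
    (date(2021, 8, 4), 134601),
]


def partition(rows, size=3):
    """Split rows into consecutive partitions (assumed size: 3 rows)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partitions_to_scan(partitions, lo, hi):
    """1-based numbers of partitions whose min/max range overlaps lo < d < hi."""
    hits = []
    for number, part in enumerate(partitions, start=1):
        dates = [d for d, _ in part]
        if min(dates) < hi and max(dates) > lo:
            hits.append(number)
    return hits


LO, HI = date(2021, 7, 1), date(2021, 8, 1)

# Load order: every partition's date range overlaps July.
print(partitions_to_scan(partition(ROWS), LO, HI))          # [1, 2, 3]
# Clustered (sorted) by Event Date: only the middle partition qualifies.
print(partitions_to_scan(partition(sorted(ROWS)), LO, HI))  # [2]
```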
When to cluster?
✓ Clustering is not for all tables
✓ Mainly very large tables of multiple terabytes can benefit
How to cluster?
✓ Columns that are used most frequently in WHERE-clauses (often date columns for event tables)
✓ If you typically use filters on two columns then the table can also benefit from two cluster keys
✓ Column that is frequently used in Joins
✓ Large enough number of distinct values to enable effective pruning
✓ Small enough number of distinct values to allow effective grouping
Clustering in Snowflake
CREATE TABLE <name> ... CLUSTER BY ( <column1> [ , <column2> ... ] )
CREATE TABLE <name> ... CLUSTER BY ( <expression> )
ALTER TABLE <name> CLUSTER BY ( <expr1> [ , <expr2> ... ] )
ALTER TABLE <name> DROP CLUSTERING KEY
Snowpipe
What is Snowpipe?
✓ Enables loading once a file appears in a bucket
✓ If data needs to be available immediately for analysis
✓ Snowpipe uses serverless features instead of warehouses
[Diagram: S3 bucket → S3 notification → serverless load (COPY) → Snowflake DB]
Setting up Snowpipe
1. Create Stage
✓ To have the connection
2. Test COPY COMMAND
✓ To make sure it works
3. Create Pipe
✓ Create pipe as object with COPY COMMAND
4. S3 Notification
✓ To trigger Snowpipe
Fail Safe and Time Travel
Time Travel
Standard: ✓ Time travel up to 1 day
Enterprise: ✓ Time travel up to 90 days
Business Critical: ✓ Time travel up to 90 days
Virtual Private: ✓ Time travel up to 90 days
RETENTION PERIOD
DEFAULT = 1
Fail Safe
✓ Protection of historical data in case of disaster
✓ Non-configurable 7-day period for permanent tables
✓ No user interaction & recoverable only by Snowflake
✓ Period starts immediately after Time Travel period ends
✓ Contributes to storage cost
Continuous Data Protection Lifecycle
Current Data Storage
✓ Access and query data etc.
Time Travel (1 – 90 days)
✓ SELECT … AT | BEFORE
✓ UNDROP
Fail Safe (transient: 0 days, permanent: 7 days)
✓ No user operations/queries
✓ Recovery beyond Time Travel
✓ Restoring only by Snowflake support
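The lifecycle can be turned into a small date calculation: a changed or dropped version of the data stays in Time Travel for the retention period and then, for permanent tables, in Fail Safe for another 7 days. A sketch under those assumptions; `protection_windows` is a made-up helper, not a Snowflake function.

```python
from datetime import datetime, timedelta

# Fail-safe days per table type, as stated on the slides
FAIL_SAFE_DAYS = {"permanent": 7, "transient": 0, "temporary": 0}


def protection_windows(changed_at, retention_days, table_type="permanent"):
    """Return (end of Time Travel, end of Fail Safe) for one data version."""
    time_travel_end = changed_at + timedelta(days=retention_days)
    fail_safe_end = time_travel_end + timedelta(days=FAIL_SAFE_DAYS[table_type])
    return time_travel_end, fail_safe_end


changed = datetime(2021, 3, 1)

# Permanent table with the default retention of 1 day:
# queryable via Time Travel until 2021-03-02, recoverable by Snowflake until 2021-03-09
print(protection_windows(changed, 1))

# Transient table: no Fail Safe, so both windows end at the same time
print(protection_windows(changed, 1, "transient"))
```

Since both windows contribute to storage cost, the retention period directly controls how long old versions are billed.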
Table Types
Permanent (CREATE TABLE)
✓ Time Travel Retention Period 0 – 90 days
✓ Fail Safe
Until dropped
Permanent data
Transient (CREATE TRANSIENT TABLE)
✓ Time Travel Retention Period 0 – 1 day
× Fail Safe
Until dropped
Only for data that does not need to be protected
Temporary (CREATE TEMPORARY TABLE)
✓ Time Travel Retention Period 0 – 1 day
× Fail Safe
Only in session
Non-permanent data
Managing Storage Cost
Table types notes
✓ Types are also available for other database objects (database, schema etc.)
✓ A temporary table can have the same name as a permanent/transient table without a naming conflict! The other table will be effectively hidden for the session!
Zero Copy Cloning
✓ Create copies of a database, a schema or a table
✓ Cloned object is independent from original table
✓ Easy to copy all meta data & improved storage management
✓ Creating backups for development purposes
✓ Works with time travel also
CREATE TABLE <table_name> ...
CLONE <source_table_name>
BEFORE ( TIMESTAMP => <timestamp> )
Swapping
Swapping Tables
✓ Use-case: Development table into production table
[Diagram: Development and Production tables swap their meta data]
ALTER TABLE <table_name> ...
SWAP WITH <target_table_name>
ALTER SCHEMA <schema_name> ...
SWAP WITH <target_schema_name>
[Diagram: Original and Copy; meta data operation]
Data Sharing
✓ Usually, sharing data can be a rather complicated process
✓ Data sharing without an actual copy of the data & always up to date
✓ Shared data can be consumed with the consumer's own compute resources
✓ Non-Snowflake users can also access the data through a reader account
[Diagram: Account 1 (producer) shares data read-only with Account 2 (consumer), which queries it with its own compute resources]
[Diagram: Account 1 shares data with a reader account, which has its own compute resources]
Sharing with Non Snowflake users
1. New Reader Account
✓ Independent instance with own URL & own compute resources
2. Share data
✓ Share database & table
3. Create database
✓ In reader account create database from share
4. Create Users
✓ As administrator create user & roles
Data Sampling
Why Sampling?
[Diagram: 10 TB → SAMPLE → 500 GB]
- Use-cases: Query development, data analysis etc.
- Faster & more cost efficient (less compute resources)
Data Sampling Methods
ROW or BERNOULLI method
✓ Every row is chosen with percentage p
✓ More "randomness"
✓ Smaller tables
BLOCK or SYSTEM method
✓ Every block is chosen with percentage p
✓ More effective processing
✓ Larger tables
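The difference between the two methods can be illustrated outside Snowflake. The sketch below mimics them on a plain Python list: the row method flips a coin per row, the block method flips a coin per block of rows and keeps or drops the whole block. Function names and the block size are assumptions for illustration only.

```python
import random


def sample_rows(rows, p, seed=None):
    """ROW / BERNOULLI idea: keep each row independently with probability p %."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p / 100]


def sample_blocks(rows, p, block_size, seed=None):
    """BLOCK / SYSTEM idea: keep each block of rows with probability p %."""
    rng = random.Random(seed)
    sampled = []
    for start in range(0, len(rows), block_size):
        if rng.random() < p / 100:
            sampled.extend(rows[start:start + block_size])
    return sampled


rows = list(range(1000))

by_row = sample_rows(rows, 10, seed=42)
by_block = sample_blocks(rows, 10, block_size=100, seed=42)

# Row sampling makes 1000 random decisions, block sampling only 10,
# which is cheaper but gives a "lumpier" sample.
print(len(by_row), len(by_block))
```

The same seed always returns the same sample, which is handy when a query is developed against a sample and rerun several times.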
Tasks & Streams
Scheduling Tasks
✓ Tasks can be used to schedule SQL statements
✓ Standalone tasks and trees of tasks
Agenda: Understand tasks, Create tasks, Schedule tasks, Tree of tasks, Check task history
Tree of Tasks
Root task
Task A Task B
Task C Task D Task E Task F
✓ Every task (except the root task) has one parent
CREATE TASK ...
AFTER <parent task>
AS …
ALTER TASK ...
ADD AFTER <parent task>
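Because every child task has exactly one parent, a tree of tasks can be described by a child-to-parent mapping, and a valid run order is any order in which a task comes after its parent. The sketch below derives one such order breadth-first. The assignment of Tasks C/D to Task A and Tasks E/F to Task B is an assumption about the diagram, and `run_order` is a hypothetical helper.

```python
from collections import deque

# child -> parent (assumed layout of the tree shown above)
PARENT = {
    "Task A": "Root task", "Task B": "Root task",
    "Task C": "Task A", "Task D": "Task A",
    "Task E": "Task B", "Task F": "Task B",
}


def run_order(parent_of, root):
    """Return an order in which every task comes after its parent."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    order, queue = [], deque([root])
    while queue:
        task = queue.popleft()
        order.append(task)
        # Children become runnable only once their parent has run.
        queue.extend(children.get(task, []))
    return order


print(run_order(PARENT, "Root task"))
```

Sibling tasks have no dependency on each other; the sketch simply lists them in the order they were defined.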
Streams
Object that records (DML-)changes made to a table
This process is called change data capture (CDC)
[Diagram: DELETE, INSERT and UPDATE on a table are recorded in a stream object]
Additional columns in a stream:
METADATA$ACTION
METADATA$ISUPDATE
METADATA$ROW_ID
CREATE STREAM <stream name>
ON TABLE <table name>
SELECT * FROM <stream name>
[Diagram: Data sources → ETL → Raw data → Data integration → Access layer → Data Science, Reporting, Other Apps]
Types of streams
STANDARD
✓ INSERT
✓ UPDATE
✓ DELETE
APPEND-ONLY
✓ INSERT
Syntax
CREATE STREAM <stream name>
ON TABLE <table name>
APPEND_ONLY = TRUE
Materialized Views
Materialized views
✓ We have a view that is queried frequently and that takes a long time to be processed
× Bad user experience
× More compute consumption
✓ We can create a materialized view to solve that problem
What is a materialized view?
✓ Use any SELECT-statement to create this MV
✓ Results will be stored in a separate table and this will be updated automatically based on the base table
When to use MV?
✓ Benefits
✓ Maintenance costs
✓ View would take a long time to be processed and is used frequently
✓ Underlying data changes infrequently and on a rather irregular basis
If the data is updated on a very regular basis…
✓ Using tasks & streams could be a better alternative
Alternative – streams & tasks
Underlying Table → Stream object → TASK with MERGE → VIEW / TABLE
✓ Don't use materialized view if data changes are very frequent
✓ Keep maintenance cost in mind
✓ Consider leveraging tasks (& streams) instead
Limitations
Only available for Enterprise edition or higher
× Joins (including self-joins) are not supported
× Limited amount of aggregation functions. Supported:
▪ APPROX_COUNT_DISTINCT (HLL)
▪ AVG (except when used in PIVOT)
▪ BITAND_AGG
▪ BITOR_AGG
▪ BITXOR_AGG
▪ COUNT
▪ MIN
▪ MAX
▪ STDDEV
▪ STDDEV_POP
▪ STDDEV_SAMP
▪ SUM
▪ VARIANCE (VARIANCE_SAMP, VAR_SAMP)
▪ VARIANCE_POP (VAR_POP)
× UDFs
× HAVING clauses
× ORDER BY clause
× LIMIT clause
Data Masking
Column-level Security
Access Control
✓ Who can access and perform operations on objects in Snowflake
✓ Two aspects of access control combined
Discretionary Access Control (DAC)
✓ Each object has an owner who can grant access to that object
Role-based Access Control (RBAC)
✓ Access privileges are assigned to roles, which are in turn assigned to users
[Diagram: Role 1 creates a table and owns it; privileges on the table are granted to Role 2 and Role 3, which are granted to User 1, User 2 and User 3]
GRANT <privilege>
ON <object>
TO <role>
GRANT ROLE <role>
TO USER <user>
Securable objects
Account
User, Role, Database, Warehouse, Other Account objects
Schema
Table, View, Stage, Integration, Other Schema objects
Access Control
✓ Every object owned by a single role (multiple users)
✓ Owner (role) has all privileges per default
Key concepts
USER
✓ People or systems
ROLE
✓ Entity to which privileges are granted (role hierarchy)
PRIVILEGE
✓ Level of access to an object (SELECT, DROP, CREATE etc.)
SECURABLE OBJECT
✓ Objects to which privileges can be granted (Database, Table, Warehouse etc.)
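How the four concepts interact can be sketched with a toy model: privileges on objects are granted to roles, roles are granted to users or to other roles, and a user's effective privileges are everything reachable through that chain. The class, role and object names below are hypothetical and the model ignores ownership, so it is an illustration of the idea rather than Snowflake's implementation.

```python
from collections import defaultdict


class AccessControl:
    """Toy RBAC model: users and roles share one namespace for simplicity."""

    def __init__(self):
        self.privileges = defaultdict(set)     # role -> {(privilege, object)}
        self.granted_roles = defaultdict(set)  # user or role -> roles granted to it

    def grant_privilege(self, privilege, obj, role):
        self.privileges[role].add((privilege, obj))

    def grant_role(self, role, grantee):
        self.granted_roles[grantee].add(role)

    def effective_privileges(self, grantee):
        """Collect privileges of the grantee and of every role below it."""
        seen, stack, result = set(), [grantee], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            result |= self.privileges.get(current, set())
            stack.extend(self.granted_roles.get(current, set()))
        return result


ac = AccessControl()
ac.grant_privilege("SELECT", "sales_table", "SALES_ROLE")
ac.grant_role("SALES_ROLE", "SALES_ADMIN_ROLE")  # role hierarchy
ac.grant_role("SALES_ADMIN_ROLE", "SYSADMIN")    # custom roles lead up to SYSADMIN
ac.grant_role("SALES_ROLE", "user_1")

print(ac.effective_privileges("user_1"))    # privilege via SALES_ROLE
print(ac.effective_privileges("SYSADMIN"))  # inherited through the hierarchy
print(ac.effective_privileges("user_2"))    # no roles granted: empty set
```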
Snowflake Roles
ACCOUNTADMIN
SECURITYADMIN SYSADMIN
USERADMIN
PUBLIC
Custom Role 1 Custom Role 2
Custom Role 3
ACCOUNTADMIN
Top-Level-Role
✓ Manage & view all objects
✓ All configurations on account level
✓ Account operations (create reader account, billing etc.)
✓ First user will have this role assigned
✓ Initial setup & managing account level objects
Best practices
✓ Very controlled assignment strongly recommended!
✓ Multi-factor authentication
✓ At least two users should be assigned to that role
✓ Avoid creating objects with that role unless you have to
ACCOUNTADMIN
✓ Account admin tab
✓ Billing & Usage
✓ Reader Account
✓ Multi-Factor Authentication
✓ Create other users
SECURITYADMIN
✓ USERADMIN role is granted to SECURITYADMIN
✓ Can manage users and roles
✓ Can manage any object grant globally
[Diagram: custom role hierarchy: Sales Admin Role above Sales Role, HR Admin Role above HR Role]
SYSADMIN
✓ Create & manage objects
✓ Create & manage warehouses, databases, tables etc.
✓ Custom roles should be assigned to the SYSADMIN role as the parent
Then this role also has the ability to grant privileges on warehouses, databases, and other objects to the custom roles.
✓ Create a virtual warehouse & assign it to the custom roles
✓ Create a database and table & assign it to the custom roles
Custom roles
✓ Customize roles to our needs & create own hierarchies
✓ Custom roles are usually created by SECURITYADMIN
✓ Should be leading up to the SYSADMIN role
USERADMIN
✓ Create Users & Roles (User & Role Management)
✓ Not for granting privileges (only on the objects that it owns)
PUBLIC
✓ Least privileged role (bottom of hierarchy)
✓ Every user is automatically assigned to this role
✓ Can own objects
✓ These objects are then available to everyone
Snowflake & Other Tools
✓ Easily create trial accounts with Snowflake partners
✓ Convenient option for trying 3rd-party tools
✓ ETL/data integration tools: moving & transforming data
✓ Machine Learning & Data Science tools
✓ Security & Governance
Best Practices
✓ Virtual warehouses
✓ Table design
✓ Monitoring
✓ Retention period
Virtual warehouse
✓ Best Practice #1 – Enable Auto-Suspend
✓ Best Practice #2 – Enable Auto-Resume
✓ Best Practice #3 – Set appropriate timeouts
Timeout
▪ ETL / Data Loading: Immediately
▪ BI / SELECT queries: 10 min
▪ DevOps / Data Science: 5 min
Table design
✓ Best Practice #1 – Appropriate table type
▪ Productive tables – Permanent
▪ Development tables – Transient
▪ Staging tables – Transient
✓ Best Practice #2 – Appropriate data type
✓ Best Practice #3 – Set cluster keys only if necessary
▪ Large table
▪ Most query time for table scan
▪ Dimensions
Retention period
✓ Best Practice #1 – Staging database – 0 days (transient)
✓ Best Practice #2 – Production – 4-7 days (1 day min)
✓ Best Practice #3 – Large high-churn tables – 0 days (transient)
Storage example: Active 20GB, Time Travel 400GB, Fail Safe 2.8TB

All course slides.pdf

  • 2. What will you learn? Getting Started Architecture Unstructured Data Performance Load from AWS Snowpipe Time Travel Fail Safe Zero-Copy Cloning Table Types Data Sharing Data Sampling Scheduling Tasks Streams Materialized views Data Masking Visualizations Partner Connect Best practices Loading Data Copy Options Access Management Load from Azure Load from GCP
  • 3. Contents Getting Started 4 Loading Data 44 Data Sampling 117 Snowflake Architecture 7 Performance Optimization 60 Tasks & Streams 124 Multi-Clustering 11 Snowpipe 88 Materialized Views 142 Data Warehousing 17 Fail Safe and Time Travel 92 Data Masking 155 Cloud Computing 26 Table Types 97 Access Control 159 Snowflake Editions 31 Zero Copy Cloning 101 Snowflake & Other Tools 182 Snowflake Pricing 34 Swapping 104 Best Practices 186 Snowflake Roles 41 Data Sharing 109
  • 5. How can you get the most out of this course? Make use of the Udemy tools Own pace Practice Help others Resources, quizzes & assignments Ask questions Enjoy learning!
  • 6. Best practices Make use of the Udemy tools Own pace Practice Help others Resources, quizzes & assignments Ask questions Enjoy learning! ✓ Pay only for what you use
  • 8. Snowflake Architecture STORAGE QUERY PROCESSING CLOUD SERVICES Virtual Warehouse Virtual Warehouse Virtual Warehouse
  • 9. Snowflake Architecture - Brain of the system - Managing infrastructure, access control, security, optimizer, metadata etc. - Muscle of the system - Performs MPP (Massively Parallel Processing) - Hybrid Columnar Storage - Saved in blobs
  • 14. Multi-Clustering S Queue Auto-Scaling: When to start an additional cluster?
  • 15. Scaling policy Standard Favors starting additional warehouses Economy Favors conserving credits rather than starting additional warehouses
  • 16. Scaling policy Standard (default) Description: Prevents/minimizes queuing by favoring starting additional clusters over conserving credits. Cluster starts: Immediately when either a query is queued or the system detects that there are more queries than can be executed by the currently available clusters. Cluster shuts down: After 2 to 3 consecutive successful checks (performed at 1 minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters. Economy Description: Conserves credits by favoring keeping running clusters fully-loaded rather than starting additional clusters. Result: May result in queries being queued and taking longer to complete. Cluster starts: Only if the system estimates there’s enough query load to keep the cluster busy for at least 6 minutes. Cluster shuts down: After 5 to 6 consecutive successful checks …
  • 18. What is a data warehouse?
  • 19. What is a data warehouse? Skip the lecture if you are already familiar with data warehouses
  • 20. What is a data warehouse? What is the purpose of a data warehouse?
  • 21. What is a data warehouse? = Database that is used for reporting and data analysis
  • 23. What is a data warehouse? HR data sales data ETL data warehouse ETL = Extract, Transform & Load
  • 24. Different layers Data sources Raw data Data integration Access layer Data Science Reporting Other Apps
  • 28. Cloud Computing Data Center • Infrastructure • Security • Electricity • Software/Hardware upgrades MANAGED Software-as-a-Service
  • 29. Cloud Computing Application Software Data Physical servers Virtual machines Physical storage Operating System Software-as-a-service Databases, tables etc. Cloud provider AWS, Azure, GCP Snowflake Managing data storage, Virtual warehouses, Upgrades/Metadata etc.
  • 30. Cloud Computing Application Software Data Physical servers Virtual machines Physical storage Operating System Software-as-a-service Customer Creating tables etc. Cloud provider AWS, Azure, GCP Snowflake Managing data storage, Virtual warehouses, Upgrades/Metadata etc.
  • 32. Snowflake Editions Standard: introductory level Enterprise: additional features for the needs of large-scale enterprises Business Critical: even higher levels of data protection for organizations with extremely sensitive data Virtual Private: highest level of security
  • 33. Snowflake Editions Standard ✓ Complete DWH ✓ Automatic data encryption ✓ Time travel up to 1 day ✓ Disaster recovery for 7 days beyond time travel ✓ Secure data share ✓ Premier support 24/7 Enterprise ✓ All Standard features ✓ Multi-cluster warehouse ✓ Time travel up to 90 days ✓ Materialized views ✓ Search Optimization ✓ Column-level security Business Critical ✓ All Enterprise features ✓ Additional security features such as data encryption everywhere ✓ Extended support ✓ Database failover and disaster recovery Virtual Private ✓ All Business Critical features ✓ Dedicated virtual servers and completely separate Snowflake environment
  • 35. Snowflake Pricing Compute ✓ Charged for active warehouses per hour ✓ Depending on the size of the warehouse ✓ Billed by second (minimum of 1min) ✓ Charged in Snowflake credits Storage ✓ Monthly storage fees ✓ Based on average storage used per month ✓ Cost calculated after compression ✓ Cloud Providers
  • 37. Snowflake Pricing Region: EU (Frankfurt) Platform: AWS Standard ✓ $2.70 / Credit Enterprise ✓ $4 / Credit Business Critical ✓ $5.40 / Credit Virtual Private ✓ Contact Snowflake
  • 39. Snowflake Pricing ✓ We think we need 1 TB of storage Region: US East (Northern Virginia) Platform: AWS On Demand Storage ❖ Scenario 1: 100GB of storage used 0.1 TB x $40 = $4 ❖ Scenario 2: 800GB of storage used 0.8 TB x $40 = $32 Capacity Storage ❖ Scenario 1: 100GB of storage used 1 TB x $23 = $23 ❖ Scenario 2: 800GB of storage used 1 TB x $23 = $23
  • 40. Snowflake Pricing On Demand Storage Capacity Storage ✓ Start with On Demand ✓ Once you are sure about your usage use Capacity storage
  • 43. Snowflake Roles ACCOUNTADMIN ✓ Encapsulates SYSADMIN and SECURITYADMIN ✓ Top-level role in the system ✓ Should be granted only to a limited number of users SECURITYADMIN ✓ USERADMIN role is granted to SECURITYADMIN ✓ Can manage users and roles ✓ Can manage any object grant globally SYSADMIN ✓ Create warehouses and databases (and more objects) ✓ Recommended that all custom roles are assigned to it USERADMIN ✓ Dedicated to user and role management only ✓ Can create users and roles PUBLIC ✓ Automatically granted to every user ✓ Can create own objects like every other role (available to every other user/role)
  • 45. Loading Data BULK LOADING ✓ Most frequent method ✓ Uses warehouses ✓ Loading from stages ✓ COPY command ✓ Transformations possible CONTINUOUS LOADING ✓ Designed to load small volumes of data ✓ Automatically once they are added to stages ✓ Latest results for analysis ✓ Snowpipe (Serverless feature)
  • 46. Understanding Stages ✓ Not to be confused with data warehouse stages ✓ Location of data files where data can be loaded from External Stage Internal Stage
  • 47. Understanding Stages External Stage ✓ External cloud provider ▪ S3 ▪ Google Cloud Platform ▪ Microsoft Azure ✓ Database object created in Schema ✓ CREATE STAGE (URL, access settings) Note: Additional costs may apply if region/platform differs Internal Stage ✓ Local storage maintained by Snowflake
  • 48. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> copyOptions
  • 49. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> ON_ERROR = CONTINUE
  • 50. Copy Options ✓ Validate the data files instead of loading them COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS
  • 51. Copy Options ✓ Validate the data files instead of loading them COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS RETURN_n_ROWS (e.g. RETURN_10_ROWS) Validates & returns the specified number of rows; fails at the first error encountered RETURN_ERRORS Returns all errors in Copy Command
  • 52. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> SIZE_LIMIT = num ✓ Specify maximum size (in bytes) of data loaded in that command (at least one file) ✓ When the threshold is exceeded, the COPY operation stops loading
  • 53. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> RETURN_FAILED_ONLY = TRUE | FALSE ✓ Specifies whether to return only files that have failed to load in the statement result ✓ DEFAULT = FALSE
  • 55. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> FORCE = TRUE | FALSE ✓ Specifies to load all files, regardless of whether they’ve been loaded previously and have not changed since they were loaded ✓ Note that this option reloads files, potentially duplicating data in a table
  • 56. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> TRUNCATECOLUMNS = TRUE | FALSE ✓ Specifies whether to truncate text strings that exceed the target column length ✓ TRUE = strings are automatically truncated to the target column length ✓ FALSE = COPY produces an error if a loaded string exceeds the target column length ✓ DEFAULT = FALSE
  • 57. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> SIZE_LIMIT = num ✓ Specify maximum size (in bytes) of data loaded in that command (at least one file is always loaded) ✓ When the threshold is exceeded, the COPY operation stops loading further files ✓ The threshold is checked file by file ✓ DEFAULT: null (no size limit)
  • 58. Copy Options COPY INTO <table_name> FROM externalStage FILES = ( '<file_name>' ,'<file_name2>') FILE_FORMAT = <file_format_name> PURGE = TRUE | FALSE ✓ Specifies whether to remove the data files from the stage automatically after the data is loaded successfully ✓ DEFAULT: FALSE
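The copy options above can be combined in one statement. The following is a sketch only: the table `orders`, the stage `my_s3_stage`, the file names and the file format `my_csv_format` are hypothetical.

```sql
-- Hypothetical named file format used by both statements below.
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Dry run: report all errors in the listed files without loading anything.
COPY INTO orders
  FROM @my_s3_stage
  FILES = ('orders_1.csv', 'orders_2.csv')
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: skip bad rows, truncate over-long strings, keep the files in the stage.
COPY INTO orders
  FROM @my_s3_stage
  FILES = ('orders_1.csv', 'orders_2.csv')
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = CONTINUE
  TRUNCATECOLUMNS = TRUE
  PURGE = FALSE;
```

Running the validation statement first is a cheap way to inspect problems before deciding on an ON_ERROR setting.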
  • 59. Load unstructured data ✓ Create Stage → Load raw data (type VARIANT) → Analyse & Parse → Flatten & Load
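The four steps above could look roughly like this for JSON files. The stage `my_json_stage`, the table names and the JSON attributes (`first_name`, `spoken_languages`, `language`) are assumptions made up for the example.

```sql
-- 1. + 2. Load the raw JSON documents into a single VARIANT column.
CREATE OR REPLACE TABLE json_raw (raw_file VARIANT);

COPY INTO json_raw
  FROM @my_json_stage
  FILE_FORMAT = (TYPE = JSON);

-- 3. Analyse & parse: address attributes with the colon notation and cast them.
SELECT raw_file:first_name::STRING AS first_name
FROM json_raw;

-- 4. Flatten a nested array into one row per element and load the result.
CREATE OR REPLACE TABLE languages AS
SELECT
  raw_file:first_name::STRING AS first_name,
  f.value:language::STRING    AS language
FROM json_raw,
     LATERAL FLATTEN(input => raw_file:spoken_languages) f;
```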
  • 62. Performance Optimization (traditional databases) ✓ Add indexes, primary keys ✓ Create table partitions ✓ Analyze the query execution plan ✓ Remove unnecessary full table scans
  • 64. How does it work in Snowflake? ✓ Automatically managed micro-partitions
  • 65. What is our job? ✓ Assigning appropriate data types ✓ Sizing virtual warehouses ✓ Cluster keys
  • 66. Performance aspects ✓ Dedicated virtual warehouses: separated according to different workloads ✓ Scaling Up: for known patterns of high workload ✓ Scaling Out: dynamically for unknown patterns of workload ✓ Maximize Cache Usage: automatic caching can be maximized ✓ Cluster Keys: for large tables
  • 68. Dedicated virtual warehouse ✓ Identify & Classify: identify and classify groups of workloads/users (e.g. BI Team, Data Science Team, Marketing department) ✓ Create dedicated virtual warehouses: one for every class of workload, then assign users
  • 69. Considerations ✓ Not too many VWs: avoid underutilization ✓ Refine classifications: work patterns can change
  • 70. Considerations ✓ If you use at least Enterprise Edition, all warehouses should be Multi-Cluster ✓ Minimum: default should be 1 ✓ Maximum: can be very high
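A sketch of one dedicated multi-cluster warehouse per workload class, following the considerations above. The warehouse and role names are hypothetical, and the cluster counts and timeout are example values, not recommendations from the slides. Multi-cluster settings need at least Enterprise Edition.

```sql
-- Dedicated warehouse for the (hypothetical) BI team.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE    = 'SMALL'
  MIN_CLUSTER_COUNT = 1          -- minimum: default should be 1
  MAX_CLUSTER_COUNT = 3          -- maximum: example value
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 600        -- seconds
  AUTO_RESUME       = TRUE;

-- Assign the workload class to its warehouse via a role (role name assumed).
GRANT USAGE ON WAREHOUSE bi_wh TO ROLE bi_team;
```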
  • 71. How does it work in Snowflake? Data sources → ETL/ELT
  • 72. Scaling Up/Down ✓ Changing the size of the virtual warehouse depending on different workloads in different periods ✓ Use cases: ETL at certain times (for example between 4pm and 8pm); special business events with more workload ✓ NOTE: The common scenario is increased query complexity, NOT more users (then Scaling Out would be better)
  • 73. Scaling Up vs. Scaling Out ✓ Scaling Up: increasing the size of virtual warehouses; for more complex queries ✓ Scaling Out: using additional warehouses / Multi-Cluster warehouses; for more concurrent users/queries
  • 74. Scaling Out ✓ Handling performance related to large numbers of concurrent users ✓ Automating the process if you have a fluctuating number of users
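To make the difference concrete, here is a sketch of both operations on hypothetical warehouses `etl_wh` and `bi_wh`. The sizes and cluster counts are illustrative assumptions.

```sql
-- Scaling up: a bigger warehouse for a period of more complex queries ...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';

-- ... and scaling back down once the heavy period is over.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Scaling out: allow additional clusters to start when many users query concurrently.
ALTER WAREHOUSE bi_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4;
```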
  • 75. Caching ✓ Automatic process to speed up queries ✓ If a query is executed twice, results are cached and can be re-used ✓ Results are cached for 24 hours or until the underlying data has changed
  • 76. What can we do? ✓ Ensure that similar queries go on the same warehouse ✓ Example: Team of Data Scientists run similar queries, so they should all use the same warehouse
  • 77. Clustering in Snowflake ✓ Snowflake automatically maintains these cluster keys ✓ In general Snowflake produces well-clustered tables ✓ Cluster keys are not always ideal and can change over time ✓ We can manually customize these cluster keys
  • 78. What is a cluster key? ✓ Subset of columns used to co-locate the data in micro-partitions ✓ For large tables this improves the scan efficiency of our queries
  • 79. What is a cluster key? Example table, in the order the rows were loaded:
      Event Date | Event ID | Customers | City
      2021-03-12 | 134584   | …         | …
      2021-12-04 | 134586   | …         | …
      2021-11-04 | 134588   | …         | …
      2021-04-05 | 134589   | …         | …
      2021-06-07 | 134594   | …         | …
      2021-07-03 | 134597   | …         | …
      2021-03-04 | 134598   | …         | …
      2021-08-03 | 134599   | …         | …
      2021-08-04 | 134601   | …         | …
  • 80. What is a cluster key? The same table is stored in micro-partitions 1, 2 and 3.
  • 81. What is a cluster key? SELECT COUNT(*) FROM <table_name> WHERE Event_Date > '2021-07-01' AND Event_Date < '2021-08-01'
  • 82. What is a cluster key? For this query on the unsorted table: All partitions need to be scanned!
  • 83. What is a cluster key? With Event Date as cluster key, the rows are sorted across micro-partitions 1, 2 and 3:
      Event Date | Event ID | Customers | City
      2021-03-04 | 134598   | …         | …
      2021-03-12 | 134584   | …         | …
      2021-04-05 | 134589   | …         | …
      2021-06-07 | 134594   | …         | …
      2021-07-03 | 134597   | …         | …
      2021-08-03 | 134599   | …         | …
      2021-08-04 | 134601   | …         | …
      2021-11-04 | 134588   | …         | …
      2021-12-04 | 134586   | …         | …
  • 84. When to cluster? ✓ Clustering is not for all tables ✓ Mainly very large tables of multiple terabytes can benefit
  • 85. How to cluster? ✓ Columns that are used most frequently in WHERE-clauses (often date columns for event tables) ✓ If you typically use filters on two columns then the table can also benefit from two cluster keys ✓ Columns that are frequently used in Joins ✓ Large enough number of distinct values to enable effective pruning ✓ Small enough number of distinct values to allow effective grouping
  • 86. Clustering in Snowflake CREATE TABLE <name> ... CLUSTER BY ( <column1> [ , <column2> ... ] ); CREATE TABLE <name> ... CLUSTER BY ( <expression> ); ALTER TABLE <name> CLUSTER BY ( <expr1> [ , <expr2> ... ] ); ALTER TABLE <name> DROP CLUSTERING KEY
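Applied to an event table like the one in the cluster key slides, the syntax could be used as follows. The table name `events` and the column `event_date` are assumptions for the sketch.

```sql
-- Cluster by the column that is filtered on most often.
ALTER TABLE events CLUSTER BY (event_date);

-- Alternatively, cluster by an expression to reduce the number of distinct values.
ALTER TABLE events CLUSTER BY (MONTH(event_date));

-- Inspect how well the table is clustered for a candidate key.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');

-- Remove the cluster key again if it does not pay off.
ALTER TABLE events DROP CLUSTERING KEY;
```

Reclustering happens in the background and consumes credits, which is one reason to reserve cluster keys for large tables.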
  • 89. What is Snowpipe? ✓ Enables loading once a file appears in a bucket ✓ If data needs to be available immediately for analysis ✓ Snowpipe uses serverless features instead of warehouses
  • 91. Setting up Snowpipe ✓ Create Stage: to have the connection ✓ Test COPY COMMAND: to make sure it works ✓ Create Pipe: create pipe as object with COPY COMMAND ✓ S3 Notification: to trigger Snowpipe
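A sketch of the pipe-related steps, assuming the stage `my_s3_stage`, the table `employees` and the file format `my_csv_format` already exist (all three names are hypothetical).

```sql
-- Test the COPY command on its own first to make sure it works.
COPY INTO employees
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Wrap the same COPY command in a pipe object.
CREATE OR REPLACE PIPE employee_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO employees
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- The notification_channel column holds the queue ARN to use
-- when configuring the S3 event notification on the bucket.
DESC PIPE employee_pipe;

-- Check whether the pipe is running and whether files are pending.
SELECT SYSTEM$PIPE_STATUS('employee_pipe');
```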
  • 92. Fail Safe and Time Travel
  • 93. Time Travel ✓ Standard: Time Travel up to 1 day ✓ Enterprise: Time Travel up to 90 days ✓ Business Critical: Time Travel up to 90 days ✓ Virtual Private: Time Travel up to 90 days ✓ Retention period default = 1 day
  • 94. Fail Safe ✓ Protection of historical data in case of disaster ✓ Non-configurable 7-day period for permanent tables ✓ No user interaction & recoverable only by Snowflake ✓ Period starts immediately after Time Travel period ends ✓ Contributes to storage cost
  • 95. Fail Safe ✓ Current Data Storage: access and query data etc.
  • 96. Continuous Data Protection Lifecycle ✓ Current Data Storage: access and query data etc. ✓ Time Travel (1 – 90 days): SELECT … AT | BEFORE, UNDROP ✓ Fail Safe (transient: 0 days, permanent: 7 days): no user operations/queries; recovery beyond Time Travel; restoring only by Snowflake support
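The Time Travel operations named in the lifecycle can be sketched as below. The table `orders`, the timestamp and the query ID placeholder are hypothetical.

```sql
-- Query the table as it was five minutes ago (offset in seconds).
SELECT * FROM orders AT (OFFSET => -60 * 5);

-- Query the table as it was at a given point in time.
SELECT * FROM orders AT (TIMESTAMP => '2021-04-15 17:47:50'::TIMESTAMP_LTZ);

-- Query the table as it was just before a specific statement ran.
SELECT * FROM orders BEFORE (STATEMENT => '<query_id>');

-- Restore a dropped table while it is still within its retention period.
DROP TABLE orders;
UNDROP TABLE orders;
```

Once the retention period has passed, none of these statements can reach the data any more; only Fail Safe recovery through Snowflake support remains.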
  • 98. Table types ✓ Permanent (CREATE TABLE): Time Travel Retention Period 0 – 90 days; ✓ Fail Safe; exists until dropped; for permanent data ✓ Transient (CREATE TRANSIENT TABLE): Time Travel Retention Period 0 – 1 day; × Fail Safe; exists until dropped; only for data that does not need to be protected ✓ Temporary (CREATE TEMPORARY TABLE): Time Travel Retention Period 0 – 1 day; × Fail Safe; exists only in the session; for non-permanent data
  • 99. Table types ✓ Choosing between permanent, transient and temporary tables is a way of managing storage cost
  • 100. Table types notes ✓ Types are also available for other database objects (database, schema etc.) ✓ A temporary table causes no naming conflict with a permanent/transient table of the same name; the other table is effectively hidden for the duration of the session
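A sketch of the three table types side by side. The table names and columns are invented for illustration.

```sql
-- Permanent: Time Travel up to 90 days (edition dependent) plus Fail Safe.
CREATE TABLE customers (id INT, name STRING);

-- Transient: no Fail Safe, Time Travel of at most 1 day.
CREATE TRANSIENT TABLE stg_customers (id INT, name STRING);

-- Temporary: exists only for the current session.
CREATE TEMPORARY TABLE tmp_customers (id INT, name STRING);

-- A temporary table may reuse the name of an existing table;
-- within this session it hides the permanent table of the same name.
CREATE TEMPORARY TABLE customers (id INT, name STRING);

-- The "kind" column of the output shows the table type.
SHOW TABLES LIKE '%customers%';
```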
  • 102. Zero-Copy Cloning ✓ Create copies of a database, a schema or a table ✓ Cloned object is independent from the original object ✓ Easy to copy all metadata & improved storage management ✓ Creating backups for development purposes ✓ Also works with Time Travel
  • 103. Zero-Copy Cloning CREATE TABLE <table_name> ... CLONE <source_table_name> BEFORE ( TIMESTAMP => <timestamp> )
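For illustration, cloning at the three object levels and in combination with Time Travel might look like this. All object names and the timestamp are hypothetical.

```sql
-- Clone a table; the clone shares storage with the source until either side changes.
CREATE TABLE orders_dev CLONE orders;

-- Clone the table as it was before a given point in time (needs Time Travel retention).
CREATE TABLE orders_restored CLONE orders
  BEFORE (TIMESTAMP => '2021-04-15 17:47:50'::TIMESTAMP_LTZ);

-- Schemas and databases can be cloned in the same way.
CREATE SCHEMA dev_schema CLONE prod_schema;
CREATE DATABASE dev_db CLONE prod_db;
```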
  • 105. Swapping Tables ✓ Use-case: Development table into production table ✓ Swap exchanges the metadata of the Development and Production tables
  • 106. Swapping ALTER TABLE <table_name> ... SWAP WITH <target_table_name>
  • 107. Swapping ALTER SCHEMA <schema_name> ... SWAP WITH <target_schema_name>
  • 108. Swapping Tables ✓ Create copies of a database, a schema or a table ✓ Original ⇄ Copy: metadata operation
  • 110. Data Sharing ✓ Usually this can be a rather complicated process ✓ Data sharing without an actual copy of the data & always up to date ✓ Shared data can be consumed with the consumer's own compute resources ✓ Non-Snowflake users can also access through a reader account
  • 112. Data Sharing ✓ Account 1 shares data read-only with Account 2 ✓ Account 2 queries it with Account 2 compute resources
  • 113. Data Sharing ✓ Account 1 → Reader Account (own compute resources)
  • 115. Data Sharing with Non-Snowflake Users ✓ Account 1 → Reader Account (own compute resources)
  • 116. Sharing with Non-Snowflake users ✓ New Reader Account: independent instance with own URL & own compute resources ✓ Share data: share database & table ✓ Create database: in reader account create database from share ✓ Create Users: as administrator create users & roles
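The steps above, sketched end to end. The share, database, table, account and user names are hypothetical; account identifiers and passwords are left as placeholders.

```sql
-- Provider account: create the share and grant access to the objects in it.
CREATE OR REPLACE SHARE orders_share;
GRANT USAGE  ON DATABASE sales_db               TO SHARE orders_share;
GRANT USAGE  ON SCHEMA   sales_db.public        TO SHARE orders_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE orders_share;

-- Provider account: create a reader account for a non-Snowflake user.
CREATE MANAGED ACCOUNT demo_reader
  ADMIN_NAME = 'reader_admin',
  ADMIN_PASSWORD = '<password>',
  TYPE = READER;

-- Provider account: make the share available to the reader account.
ALTER SHARE orders_share ADD ACCOUNTS = <reader_account_identifier>;

-- Reader account: create a database from the share, then users and roles.
CREATE DATABASE shared_sales FROM SHARE <provider_account_identifier>.orders_share;
CREATE USER analyst PASSWORD = '<password>';
GRANT IMPORTED PRIVILEGES ON DATABASE shared_sales TO ROLE PUBLIC;
```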
  • 118. Data Sampling ✓ Why Sampling? 10 TB → SAMPLE → 500 GB
  • 119. Data Sampling ✓ Why Sampling? ✓ Use-cases: query development, data analysis etc. ✓ Faster & more cost efficient (less compute resources)
  • 121. Data Sampling Methods ROW or BERNOULLI method BLOCK or SYSTEM method
  • 122. Data Sampling Methods ✓ ROW or BERNOULLI method: every row is chosen with a probability of p percent; more "randomness"; suited to smaller tables ✓ BLOCK or SYSTEM method: every block is chosen with a probability of p percent; more effective processing; suited to larger tables
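Both methods in SQL, as a sketch on a hypothetical table `customer_address`. The percentages and seeds are arbitrary example values.

```sql
-- ROW / BERNOULLI: each row is included with a probability of 1 percent.
SELECT * FROM customer_address SAMPLE ROW (1);

-- Adding a seed makes the sample repeatable as long as the table is unchanged.
SELECT * FROM customer_address SAMPLE BERNOULLI (1) SEED (27);

-- BLOCK / SYSTEM: each block (micro-partition) is included with a probability of 1 percent.
SELECT * FROM customer_address SAMPLE SYSTEM (1) SEED (23);
```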
  • 125. Scheduling Tasks ✓ Tasks can be used to schedule SQL statements ✓ Standalone tasks and trees of tasks ✓ Topics: Understand tasks, Create tasks, Schedule tasks, Tree of tasks, Check task history
  • 126. Tree of Tasks ✓ Root task with child tasks (Task A to Task F) ✓ Every task has one parent
  • 127. Tree of Tasks CREATE TASK ... AFTER <parent task> AS … ; ALTER TASK ... ADD AFTER <parent task>
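A small tree with one root and one child, as a sketch. The warehouse, task and table names are assumptions, and the schedule is an arbitrary example.

```sql
-- Root task: carries the schedule for the whole tree.
CREATE OR REPLACE TASK load_root
  WAREHOUSE = etl_wh
  SCHEDULE  = 'USING CRON 0 7 * * * UTC'   -- every day at 07:00 UTC
AS
  INSERT INTO load_log (loaded_at) VALUES (CURRENT_TIMESTAMP);

-- Child task: has no schedule of its own, runs after its parent has finished.
CREATE OR REPLACE TASK load_child
  WAREHOUSE = etl_wh
  AFTER load_root
AS
  INSERT INTO load_log_copy SELECT * FROM load_log;

-- Tasks are created suspended; resume the children first, then the root.
ALTER TASK load_child RESUME;
ALTER TASK load_root  RESUME;

-- Check the task history.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
ORDER BY scheduled_time DESC;
```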
  • 132. Streams CREATE STREAM <stream name> ON TABLE <table name>; SELECT * FROM <stream name>
  • 139. Streams ✓ Object that records (DML-)changes made to a table ✓ This process is called change data capture (CDC) ✓ Pipeline: Data sources → Raw data → Data integration → Access layer → Data Science, Reporting, Other Apps
  • 140. Types of streams ✓ STANDARD: INSERT, UPDATE, DELETE ✓ APPEND-ONLY: INSERT
  • 141. Syntax CREATE STREAM <stream name> ON TABLE <table name> APPEND_ONLY = TRUE
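Both stream types on a hypothetical source table `sales_raw`, as a sketch. The stream names are invented.

```sql
-- Standard stream: tracks inserts, updates and deletes.
CREATE OR REPLACE STREAM sales_stream ON TABLE sales_raw;

-- Append-only stream: tracks inserts only.
CREATE OR REPLACE STREAM sales_stream_append ON TABLE sales_raw
  APPEND_ONLY = TRUE;

-- Querying a stream returns the changed rows plus metadata columns
-- (METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID).
SELECT * FROM sales_stream;

-- Plain SELECTs do not consume the stream; using it in a DML statement does.
SHOW STREAMS;
```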
  • 143. Materialized views ✓ We have a view that is queried frequently and that takes a long time to be processed × Bad user experience × More compute consumption
  • 144. Materialized views ✓ We have a view that is queried frequently and that takes a long time to be processed ✓ We can create a materialized view to solve that problem
  • 145. What is a materialized view? ✓ Use any SELECT-statement to create this MV ✓ Results will be stored in a separate table and this will be updated automatically based on the base table
  • 146. When to use MV? ✓ Benefits ✓ Maintenance costs
  • 147. When to use MV? ✓ View would take a long time to be processed and is used frequently ✓ Underlying data does not change frequently and only on a rather irregular basis
  • 148. When to use MV? If the data is updated on a very regular basis… ✓ Using tasks & streams could be a better alternative
  • 149. Alternative – streams & tasks ✓ Underlying Table → Stream object → TASK with MERGE → VIEW / TABLE
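The alternative pipeline above could be wired up roughly as follows. This is a sketch under assumptions: the tables `sales_raw` and `sales_final` (with columns `id` and `amount`), the stream `sales_stream` and the warehouse `etl_wh` are all hypothetical.

```sql
-- Stream object on the underlying table.
CREATE OR REPLACE STREAM sales_stream ON TABLE sales_raw;

-- Task that merges the captured changes into the target table.
CREATE OR REPLACE TASK merge_sales
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('sales_stream')   -- skip runs when nothing changed
AS
MERGE INTO sales_final f
USING (
  -- An update appears in a standard stream as a DELETE plus an INSERT row;
  -- drop the DELETE half so every changed id occurs only once.
  SELECT *
  FROM sales_stream
  WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
) s
ON f.id = s.id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' THEN
  DELETE
WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
  UPDATE SET f.amount = s.amount
WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
  INSERT (id, amount) VALUES (s.id, s.amount);

ALTER TASK merge_sales RESUME;
```

Because the MERGE is a DML statement that reads the stream, each successful run advances the stream offset, so the next run only sees newer changes.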
  • 150. When to use MV? ✓ Don't use materialized views if data changes are very frequent ✓ Keep maintenance cost in mind ✓ Consider leveraging tasks (& streams) instead
  • 151. Limitations Only available for Enterprise edition
  • 152. Limitations × Joins (including self-joins) are not supported × Limited set of aggregate functions
  • 153. Limitations × Joins (including self-joins) are not supported × Limited set of aggregate functions ✓ Supported: APPROX_COUNT_DISTINCT (HLL), AVG (except when used in PIVOT), BITAND_AGG, BITOR_AGG, BITXOR_AGG, COUNT, MIN, MAX, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VARIANCE (VARIANCE_SAMP, VAR_SAMP), VARIANCE_POP (VAR_POP)
  • 154. Limitations × Joins (including self-joins) are not supported × Limited set of aggregate functions × UDFs × HAVING clauses × ORDER BY clause × LIMIT clause
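A materialized view that stays within the limitations listed above (single table, supported aggregates, no HAVING, ORDER BY or LIMIT). The table `orders` and its columns are hypothetical.

```sql
-- Single base table and supported aggregate functions only.
CREATE OR REPLACE MATERIALIZED VIEW orders_by_year_mv AS
SELECT
  order_year,
  COUNT(*)         AS order_count,
  SUM(total_price) AS total_revenue
FROM orders
GROUP BY order_year;

-- The view is queried like a table.
SELECT * FROM orders_by_year_mv;

-- Keep an eye on the maintenance cost of the automatic refreshes.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY());
```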
  • 160. Access Control ✓ Who can access and perform operations on objects in Snowflake ✓ Two aspects of access control combined
  • 161. Access Control ✓ Discretionary Access Control (DAC): each object has an owner who can grant access to that object ✓ Role-based Access Control (RBAC): access privileges are assigned to roles, which are in turn assigned to users
  • 162. Access Control ✓ Role 1 creates a table and owns it ✓ Privileges are granted to Role 2 and Role 3, and roles are granted to User 1, User 2, User 3 GRANT <privilege> ON <object> TO <role>; GRANT <role> TO <user>
  • 163. Securable objects Account User Role Database Warehouse Other Account objects Schema Table View Stage Integration Other Schema objects
  • 164. Access Control ✓ Every object is owned by a single role (which can have multiple users) ✓ The owner (role) has all privileges by default
  • 165. Key concepts ✓ USER: people or systems ✓ ROLE: entity to which privileges are granted (role hierarchy) ✓ PRIVILEGE: level of access to an object (SELECT, DROP, CREATE etc.) ✓ SECURABLE OBJECT: objects to which privileges can be granted (Database, Table, Warehouse etc.)
  • 168. Snowflake Roles ✓ ACCOUNTADMIN: encapsulates SYSADMIN and SECURITYADMIN; top-level role in the system; should be granted only to a limited number of users ✓ SECURITYADMIN: USERADMIN role is granted to SECURITYADMIN; can manage users and roles; can manage any object grant globally ✓ SYSADMIN: creates warehouses and databases (and more objects); recommended that all custom roles are assigned to it ✓ USERADMIN: dedicated to user and role management only; can create users and roles ✓ PUBLIC: automatically granted to every user; can create own objects like every other role (available to every other user/role)
  • 170. ACCOUNTADMIN ✓ Top-Level-Role: manage & view all objects; all configurations on account level; account operations (create reader account, billing etc.); first user will have this role assigned ✓ Best practices: very controlled assignment strongly recommended; multi-factor authentication; at least two users should be assigned to that role; avoid creating objects with that role unless you have to; use it for initial setup & managing account level objects
  • 172. ACCOUNTADMIN ✓ Account admin tab ✓ Billing & Usage ✓ Reader Account ✓ Multi-Factor Authentication ✓ Create other users
  • 173. SECURITYADMIN ✓ USERADMIN role is granted to SECURITYADMIN ✓ Can manage users and roles ✓ Can manage any object grant globally
  • 174. SECURITYADMIN Sales Admin Role Sales Role HR Admin Role HR Role
  • 176. SYSADMIN ✓ Create & manage objects ✓ Create & manage warehouses, databases, tables etc. ✓ Custom roles should be assigned to the SYSADMIN role as the parent; this role then also has the ability to grant privileges on warehouses, databases, and other objects to the custom roles
  • 177. SYSADMIN Sales Admin Role Sales Role HR Admin Role HR Role ✓ Create a virtual warehouse & assign it to the custom roles ✓ Create a database and table & assign it to the custom roles
  • 178. Custom roles Sales Admin Role Sales Role HR Admin Role HR Role ✓ Customize roles to our needs & create own hierarchies ✓ Custom roles are usually created by SECURITYADMIN ✓ Should be leading up to the SYSADMIN role
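A sketch of the custom role hierarchy described above, using the Sales roles from the slides. The user, warehouse, database and table names are hypothetical, and the last block assumes SYSADMIN owns those objects.

```sql
-- SECURITYADMIN creates the custom roles and builds the hierarchy.
USE ROLE SECURITYADMIN;
CREATE ROLE sales_admin;
CREATE ROLE sales_users;
GRANT ROLE sales_users TO ROLE sales_admin;   -- sales_admin inherits sales_users
GRANT ROLE sales_admin TO ROLE SYSADMIN;      -- hierarchy leads up to SYSADMIN

-- Create a user and assign a role to it.
CREATE USER simon_sales
  PASSWORD = '<password>'
  DEFAULT_ROLE = sales_users
  MUST_CHANGE_PASSWORD = TRUE;
GRANT ROLE sales_users TO USER simon_sales;

-- SYSADMIN grants privileges on the objects it owns to the custom roles.
USE ROLE SYSADMIN;
GRANT USAGE  ON WAREHOUSE bi_wh                 TO ROLE sales_users;
GRANT USAGE  ON DATABASE  sales_db              TO ROLE sales_users;
GRANT USAGE  ON SCHEMA    sales_db.public       TO ROLE sales_users;
GRANT SELECT ON TABLE     sales_db.public.orders TO ROLE sales_users;
```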
  • 179. USERADMIN Sales Admin Role Sales Role HR Admin Role HR Role ✓ Create Users & Roles (User & Role Management) ✓ Not for granting privileges (only the ones that it owns)
  • 180. PUBLIC Sales Admin Role Sales Role HR Admin Role HR Role
  • 181. PUBLIC ✓ Least privileged role (bottom of hierarchy) ✓ Every user is automatically assigned to this role ✓ Can own objects ✓ These objects are then available to everyone
  • 184. Snowflake & other tools ✓ Easily create trial accounts with Snowflake partners ✓ Convenient option for trying 3rd-party tools
  • 185. Snowflake & other tools ✓ ETL/data integration tools – Moving & transforming data ✓ Machine Learning & Data Science tools ✓ Security & Governance
  • 187. Best practices ✓ Virtual warehouses ✓ Table design ✓ Monitoring ✓ Retention period
  • 188. How does it work in Snowflake? Data sources → ETL/ELT
  • 189. Virtual warehouse ✓ Best Practice #1 – Enable Auto-Suspend ✓ Best Practice #2 – Enable Auto-Resume ✓ Best Practice #3 – Set appropriate timeouts: ETL / Data Loading: immediately; BI / SELECT queries: 10 min; DevOps / Data Science: 5 min
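The three practices map onto two warehouse parameters. In this sketch the warehouse names are hypothetical, AUTO_SUSPEND is given in seconds, and 60 seconds is used as an assumed stand-in for "immediately".

```sql
-- ETL / data loading: suspend (almost) immediately after the load finishes.
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;

-- BI / SELECT queries: 10 minutes, so the warehouse cache stays warm between queries.
ALTER WAREHOUSE bi_wh  SET AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;

-- DevOps / data science: 5 minutes.
ALTER WAREHOUSE ds_wh  SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
```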
  • 190. Table design ✓ Best Practice #1 – Appropriate table type ✓ Production tables – Permanent ✓ Development tables – Transient ✓ Staging tables – Transient
  • 191. Table design ✓ Best Practice #1 – Appropriate table type ✓ Best Practice #2 – Appropriate data type ✓ Best Practice #3 – Set cluster keys only if necessary: large table; most query time is spent on table scans; dimensions
  • 192. Retention period ✓ Best Practice #1 – Staging database – 0 days (transient) ✓ Best Practice #2 – Production – 4-7 days (1 day min) ✓ Best Practice #3 – Large high-churn tables – 0 days (transient) ✓ Storage example: Active 20 GB, Time Travel 400 GB, Fail Safe 2.8 TB
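Retention periods are set per object, and the resulting storage can be checked afterwards. The database and table names below are hypothetical, and reading the ACCOUNT_USAGE view requires appropriate privileges.

```sql
-- Staging database: no Time Travel.
ALTER DATABASE staging_db SET DATA_RETENTION_TIME_IN_DAYS = 0;

-- Production table: a few days of Time Travel.
ALTER TABLE prod_db.public.orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Compare active, Time Travel and Fail Safe storage per table.
SELECT
  table_catalog,
  table_schema,
  table_name,
  active_bytes,
  time_travel_bytes,
  failsafe_bytes
FROM snowflake.account_usage.table_storage_metrics
ORDER BY failsafe_bytes DESC;
```

A table whose Time Travel and Fail Safe bytes dwarf its active bytes is a typical candidate for a shorter retention period or the transient table type.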