Presented By: Sarfaraz Hussain
Sr. Software Consultant
Knoldus Inc
KSnow: Getting started with Snowflake
(A cloud data warehouse)
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
Punctuality
Respect KnolX session timings. Please
do not join a session more than
5 minutes after its start time.
Feedback
Make sure to submit constructive
feedback for all sessions, as it is
very helpful for the presenter.
Silent Mode
Stay on mute unless it is
necessary to speak.
Avoid Distraction
Stay with the presenter during
the session and enjoy.
Agenda
01 Prerequisite Knowledge
02 Snowflake and its internal Architecture
03 Snowflake vs. Big Data Tools
04 Virtual Warehouse and Staging Area
05 Deep Dive into working Architecture
06 Use-case and DEMO
Data Warehouse vs. Data Lake
- Data Warehouse: Teradata, Exadata
- Data Lake: HDFS, AWS S3
Need for a Data Warehouse
What is a Data Warehouse?
- A data warehouse (DWH) is a centralized place to store large amounts of historical data produced by a
system or organization, so that meaningful insights can be found by processing and analyzing that data.
- Traditional Data Warehouse Architecture:
What is Snowflake?
Snowflake is a modern data processing system designed to make the best
use of the elasticity of the cloud, so that it can scale with demand, practically without limit.
Features:
- Cloud-based data warehouse
- SaaS solution
- Pay-per-use model (storage + compute)
- Supports standard ANSI SQL
- Supports ODBC and JDBC connectors
- Auto-scalable and elastic (Virtual Warehouse)
- Unlimited storage of data (uses AWS S3, Azure Blob Storage, or Google Cloud Storage)
Snowflake (contd.)
Advantages:
- Easy to process huge volumes of data
- Provides ACID transactions
- No data backups required
- No need to worry about optimization
- No need to maintain indexes
- No out-of-memory issues
- Data sharing
Disadvantage:
- COST
Snowflake vs. Big Data Tools
Apache Hive
- It is a data warehouse on top of HDFS
- It has performance challenges when it uses MapReduce for processing
Apache Spark (Batch SQL processing)
- Spark SQL has limited support for advanced SQL operations
- Advanced optimizations are the developer’s responsibility
- Resource allocation is the developer’s responsibility
Snowflake Architecture
Data Storage Layer
- When we create a Snowflake account, we select the underlying cloud provider.
- The cloud provider can be AWS, Azure, or Google Cloud.
- Depending on that choice, the Data Storage Layer (DSL) is hosted on AWS S3, Azure Blob Storage,
or Google Cloud Storage.
- The DSL stores the actual data and provides unlimited space.
- Data in the DSL is stored in a compressed columnar format and encrypted with AES 256-bit encryption.
Virtual Warehouse
- A Virtual Warehouse is a cluster of nodes that processes the data.
- On AWS these nodes are EC2 instances; on Azure and Google Cloud they are the equivalent compute instances.
- All computation/processing is performed by the Virtual Warehouse, which handles loading and querying of
data.
- It does not store the data and can be suspended when not in use.
- A suspended virtual warehouse can resume automatically when a query is run.
- It can cache query results.
- The size of a virtual warehouse can be scaled up or down (manual process).
- Elastic or multi-cluster virtual warehouse: can add multiple clusters of the same
size depending upon the workload (automatic process).
- When should clusters be scaled up and down?
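The warehouse properties listed above (size, suspend and resume, multi-cluster range) correspond to options of Snowflake's CREATE WAREHOUSE statement. The following is a minimal sketch that only assembles the SQL text. The helper name `build_create_warehouse`, the warehouse name `demo_wh` and all default values are assumptions made for illustration, and multi-cluster settings apply only on accounts whose edition supports multi-cluster warehouses.

```python
def build_create_warehouse(name, size="XSMALL", auto_suspend_s=300,
                           min_clusters=1, max_clusters=1,
                           scaling_policy="STANDARD"):
    """Assemble a CREATE WAREHOUSE statement as plain text (hypothetical helper)."""
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    if scaling_policy not in ("STANDARD", "ECONOMY"):
        raise ValueError("scaling_policy must be STANDARD or ECONOMY")
    options = [
        f"WAREHOUSE_SIZE = '{size}'",              # manual scale up/down
        f"AUTO_SUSPEND = {auto_suspend_s}",        # idle seconds before suspending
        "AUTO_RESUME = TRUE",                      # resume when a query arrives
        f"MIN_CLUSTER_COUNT = {min_clusters}",     # multi-cluster lower bound
        f"MAX_CLUSTER_COUNT = {max_clusters}",     # multi-cluster upper bound
        f"SCALING_POLICY = '{scaling_policy}'",
    ]
    return f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH " + " ".join(options)


if __name__ == "__main__":
    # Illustrative values only: a small warehouse that may grow to 3 clusters.
    print(build_create_warehouse("demo_wh", max_clusters=3))
```

The statement would still have to be run through a Snowflake client or worksheet; nothing here connects to an account.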
Scaling Policy
- How many queries does Snowflake queue before it spins up an additional cluster?
- STANDARD: Immediately when a query is queued, i.e. when the system detects that there is
one more query than the currently running clusters can execute.
- ECONOMY: Only if the system estimates there is enough query load to keep the new cluster
busy for at least 6 minutes.
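The two policies above can be read as a simple decision rule. The sketch below is a simplified model of that rule, not Snowflake's actual scheduler: the function name, its parameters and the idea of passing in a ready-made load estimate are all assumptions, since how Snowflake estimates load is internal to the service.

```python
ECONOMY_MIN_BUSY_MINUTES = 6  # threshold quoted in the ECONOMY policy above


def should_start_cluster(policy, queued_queries, estimated_busy_minutes,
                         running_clusters, max_clusters):
    """Return True if this simplified model would start one more cluster."""
    if running_clusters >= max_clusters:
        return False  # already at the configured upper bound
    if policy == "STANDARD":
        # Start as soon as at least one query is waiting.
        return queued_queries > 0
    if policy == "ECONOMY":
        # Start only if the new cluster is expected to stay busy long enough.
        return estimated_busy_minutes >= ECONOMY_MIN_BUSY_MINUTES
    raise ValueError(f"unknown scaling policy: {policy}")


if __name__ == "__main__":
    print(should_start_cluster("STANDARD", 1, 0.5, 1, 3))
    print(should_start_cluster("ECONOMY", 1, 0.5, 1, 3))
```

With the same queue and load estimate, STANDARD favors starting a cluster to avoid queuing, while ECONOMY holds back to save cost.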
Virtual Warehouse Size
Size        No. of nodes
X-Small     1
Small       2
Medium      4
Large       8
X-Large     16
2X-Large    32
3X-Large    64
4X-Large    128
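The node counts in the table above double with each size step, so they can be derived from the position of the size in the list. A small sketch of that arithmetic follows; the function name `nodes_for_size` is an assumption for illustration, and the figures are the ones from this slide.

```python
# Warehouse sizes in the order given on the slide.
SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]


def nodes_for_size(size):
    """Number of nodes for a size: 1 for X-Small, doubling at each step."""
    return 2 ** SIZES.index(size)


if __name__ == "__main__":
    for size in SIZES:
        print(f"{size:<10}{nodes_for_size(size)}")
```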
Demo of Virtual Warehouse
Life without Snowflake
Life with Snowflake
Pricing
https://www.snowflake.com/pricing/
How does it work?
Deep Dive in Architecture
Staging Area
“Stages” or “Staging Areas” are places to put things temporarily before moving them to a
more stable location.
Staging Area (contd.)
- External storage from which data is loaded into Snowflake’s Data Storage Layer.
- External storage can be AWS S3, Azure Blob Storage, or Google Cloud Storage.
- It is treated as a data lake where data first lands.
- From the staging area we load data into the Snowflake database, after performing transformations if
required.
- To load batch data:
Snowflake’s COPY command, Informatica, Talend, Matillion
- To load continuous data:
Snowpipe, Kafka, Kinesis
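For the batch path above, loading from a stage into a table is expressed with Snowflake's COPY INTO statement. The sketch below only builds the statement text. The helper name `build_copy_into`, the table name `sales`, the stage name `my_s3_stage` and the path are hypothetical, and the file format options shown are one plausible choice for CSV files with a header row.

```python
def build_copy_into(table, stage, path="", file_type="CSV", skip_header=1):
    """Assemble a COPY INTO statement as plain text (hypothetical helper)."""
    # A stage is referenced with a leading '@', optionally followed by a path.
    location = f"@{stage}/{path}" if path else f"@{stage}"
    file_format = f"TYPE = {file_type}"
    if file_type == "CSV":
        # Skip header rows so they are not loaded as data.
        file_format += f" SKIP_HEADER = {skip_header}"
    return f"COPY INTO {table} FROM {location} FILE_FORMAT = ({file_format})"


if __name__ == "__main__":
    # Illustrative names only; the stage must already exist in the account.
    print(build_copy_into("sales", "my_s3_stage", "2020/"))
```

Continuous loading with Snowpipe wraps a COPY INTO statement of this kind in a pipe definition, which is not shown here.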
Snowflake in Action
Real life use-case
Demo
- Bulk data loading into Snowflake
- Continuous data loading into Snowflake (optional)
Thank You!
Contact us at:
hello@knoldus.com
Connect with me:
linkedin.com/in/sarfaraz-hussain-8123b4132
