Presented By: Sarfaraz Hussain
Sr. Software Consultant
Knoldus Inc
KSnow: Getting started with Snowflake
(A cloud data warehouse)
Lack of etiquette and manners is a huge turn-off.
KnolX Etiquettes
Punctuality
Respect KnolX session timings: please do not join a session
more than 5 minutes after its start time.
Feedback
Make sure to submit constructive feedback for every session,
as it is very helpful for the presenter.
Silent Mode
Keep yourself on mute unless it is necessary to speak.
Avoid Distraction
Stay with the presenter throughout the session and enjoy it.
Agenda
01 Prerequisite Knowledge
02 Snowflake and its internal Architecture
03 Snowflake vs. Big Data Tools
04 Virtual Warehouse and Staging Area
05 Deep Dive into the Working Architecture
06 Use-case and DEMO
Data Warehouse vs. Data Lake
Data warehouses: Teradata, Exadata
Data lakes: HDFS, AWS S3
Need for a Data Warehouse
What is a Data Warehouse?
- A data warehouse (DWH) is a centralized store for large amounts of historical data produced by a
system or organization, which is processed and analyzed to find meaningful insights.
- Traditional Data Warehouse Architecture:
What is Snowflake?
Snowflake is a modern data processing system designed to make the best use of the cloud's
elasticity, so that storage and compute can scale practically without limit.
Features:
- Cloud-based data warehouse
- SaaS solution
- Pay-per-use model (storage and compute are billed separately)
- Supports standard ANSI SQL
- Supports ODBC and JDBC connectors
- Auto-scaling and elastic compute (virtual warehouses)
- Virtually unlimited data storage (backed by AWS S3, Azure Blob Storage or Google Cloud Storage)
Snowflake (contd.)
Advantages:
- Easy to process huge volumes of data
- Provides ACID transactions
- No manual data backups required (Time Travel and Fail-safe support data recovery)
- Little need for manual performance tuning
- No indexes to maintain
- Large queries spill to local or remote storage instead of failing with out-of-memory errors
- Secure data sharing
Disadvantage:
- Cost
Snowflake vs. Big Data Tools
Apache Hive
- It is a data warehouse built on top of HDFS
- It has performance challenges when it uses MapReduce for processing
Apache Spark (Batch SQL processing)
- Spark SQL has limited support for advanced SQL operations
- Advanced optimizations are the developer’s responsibility
- Resource allocation is the developer’s responsibility
Snowflake Architecture
Snowflake Architecture (contd.)
Data Storage Layer
- When we create a Snowflake account, we select the underlying cloud provider.
- The cloud provider can be AWS, Azure or Google Cloud.
- Depending on that choice, the Data Storage Layer (DSL) is hosted on AWS S3, Azure Blob Storage
or Google Cloud Storage.
- The DSL stores the actual data and provides virtually unlimited space.
- Data in the DSL is stored in a compressed, columnar format and encrypted with AES-256.
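To illustrate why a columnar layout compresses well, here is a minimal, generic Python sketch. It is not Snowflake's actual storage format (its micro-partition layout is internal), and encryption is omitted; the helper names `to_columns`, `compress_columns` and `decompress_column` are hypothetical. It pivots row-oriented records into one list per column and compresses each column on its own with `zlib`.

```python
from __future__ import annotations

import json
import zlib
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list of values per column.

    Assumes every row has the same keys, as in a table with a fixed schema.
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def compress_columns(columns: dict[str, list[Any]]) -> dict[str, bytes]:
    """Serialize and compress each column independently.

    Values in one column share a type and often repeat, which is what
    makes per-column compression effective.
    """
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def decompress_column(blob: bytes) -> list[Any]:
    """Reverse compress_columns for a single column."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    rows = [
        {"country": "IN", "status": "ACTIVE", "amount": i % 10}
        for i in range(1_000)
    ]
    blobs = compress_columns(to_columns(rows))
    for name, blob in blobs.items():
        print(name, len(blob), "bytes compressed")
    # A query that reads only "amount" needs to decompress just that column.
    print(decompress_column(blobs["amount"])[:3])
```

The per-column split is the design point: a query touching two columns of a wide table reads and decompresses only those two.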
Virtual Warehouse
- A virtual warehouse is a cluster of compute nodes that processes the data.
- On AWS these nodes are EC2 instances; on Azure and Google Cloud they are the equivalent virtual machines.
- All computation, for both loading and querying data, is performed by virtual warehouses.
- A warehouse does not store table data and can be suspended when not in use (a suspended warehouse consumes no credits).
- A suspended warehouse can resume automatically when a query is submitted to it (if auto-resume is enabled).
- A running warehouse caches the table data it reads on local storage (this cache is lost when it is suspended); query results themselves are cached for 24 hours by Snowflake's cloud services layer.
- The size of a virtual warehouse can be scaled up or down (a manual process).
- An elastic, or multi-cluster, virtual warehouse can add and remove clusters of the same size depending on the workload (an automatic process).
- WHEN TO SCALE A CLUSTER UP OR DOWN?
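The warehouse properties above correspond to parameters of Snowflake's `CREATE WAREHOUSE` and `ALTER WAREHOUSE` statements. The Python sketch below only builds the SQL text; running it would need a Snowflake session (for example through a connector, which is assumed and not shown). The `WarehouseConfig` class, the name `demo_wh` and the default values are hypothetical, and cluster counts above 1 assume an edition that supports multi-cluster warehouses.

```python
from dataclasses import dataclass

# Size keywords for WAREHOUSE_SIZE, smallest to largest (up to 4X-Large).
VALID_SIZES = (
    "XSMALL", "SMALL", "MEDIUM", "LARGE",
    "XLARGE", "XXLARGE", "XXXLARGE", "X4LARGE",
)


@dataclass
class WarehouseConfig:
    name: str
    size: str = "XSMALL"
    auto_suspend_seconds: int = 60    # suspend after 60 s without queries
    auto_resume: bool = True          # resume when a query is submitted
    min_clusters: int = 1             # values > 1 only for multi-cluster
    max_clusters: int = 1
    scaling_policy: str = "STANDARD"  # or "ECONOMY"

    def to_sql(self) -> str:
        """Build a CREATE WAREHOUSE statement after validating inputs."""
        if self.size not in VALID_SIZES:
            raise ValueError(f"unknown warehouse size: {self.size}")
        if not 1 <= self.min_clusters <= self.max_clusters:
            raise ValueError("require 1 <= min_clusters <= max_clusters")
        if self.scaling_policy not in ("STANDARD", "ECONOMY"):
            raise ValueError(f"unknown scaling policy: {self.scaling_policy}")
        return (
            f"CREATE WAREHOUSE IF NOT EXISTS {self.name}\n"
            f"  WAREHOUSE_SIZE = {self.size}\n"
            f"  AUTO_SUSPEND = {self.auto_suspend_seconds}\n"
            f"  AUTO_RESUME = {'TRUE' if self.auto_resume else 'FALSE'}\n"
            f"  MIN_CLUSTER_COUNT = {self.min_clusters}\n"
            f"  MAX_CLUSTER_COUNT = {self.max_clusters}\n"
            f"  SCALING_POLICY = {self.scaling_policy};"
        )

    def resize_sql(self, new_size: str) -> str:
        """Manual scale up/down: change the size of an existing warehouse."""
        if new_size not in VALID_SIZES:
            raise ValueError(f"unknown warehouse size: {new_size}")
        return f"ALTER WAREHOUSE {self.name} SET WAREHOUSE_SIZE = {new_size};"


if __name__ == "__main__":
    wh = WarehouseConfig("demo_wh", max_clusters=3)
    print(wh.to_sql())
    print(wh.resize_sql("MEDIUM"))
```

Resizing (scale up/down) is the manual lever for heavier individual queries; raising `MAX_CLUSTER_COUNT` (scale out) is the automatic lever for more concurrent queries.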
Scaling Policy
- How many queries does Snowflake queue before it starts an additional cluster?
- STANDARD (default): a new cluster starts immediately when a query is queued, i.e. when the system detects that there
is one more query than the currently running clusters can execute.
- ECONOMY: a new cluster starts only if the system estimates there is enough query load to keep it
busy for at least 6 minutes.
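The two policies can be contrasted with a deliberately simplified model. This is not Snowflake's actual algorithm (its load estimation is internal). The function `should_start_cluster` and its inputs, including the number of queued queries and an estimate of how many minutes an extra cluster would stay busy, are hypothetical.

```python
from enum import Enum


class ScalingPolicy(Enum):
    STANDARD = "STANDARD"
    ECONOMY = "ECONOMY"


# From the ECONOMY description: the new cluster must be expected
# to stay busy for at least 6 minutes.
ECONOMY_MIN_BUSY_MINUTES = 6.0


def should_start_cluster(
    policy: ScalingPolicy,
    queued_queries: int,
    running_clusters: int,
    max_clusters: int,
    estimated_busy_minutes: float,
) -> bool:
    """Simplified decision: start one more cluster of a multi-cluster warehouse?"""
    if running_clusters >= max_clusters:
        return False  # already at MAX_CLUSTER_COUNT
    if queued_queries <= 0:
        return False  # nothing is waiting, so no reason to scale out
    if policy is ScalingPolicy.STANDARD:
        # STANDARD favours low latency: any queued query triggers a new cluster.
        return True
    # ECONOMY favours saving credits: only scale out for sustained load.
    return estimated_busy_minutes >= ECONOMY_MIN_BUSY_MINUTES


if __name__ == "__main__":
    # One queued query, but only about 2 minutes of extra work expected.
    for policy in ScalingPolicy:
        print(policy.value, should_start_cluster(policy, 1, 1, 3, 2.0))
```

With a short burst of work, STANDARD starts a cluster and ECONOMY keeps the query queued, trading some latency for fewer credits.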
Virtual Warehouse Size
Size           X-Small  Small  Medium  Large  X-Large  2X-Large  3X-Large  4X-Large
No. of nodes   1        2      4       8      16       32        64        128
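Each size step doubles the node count in the table above. Snowflake bills compute in credits rather than nodes, and for standard warehouses the credit rate follows the same doubling: an X-Small uses 1 credit per hour, a Small 2, and so on. Usage is billed per second, with a 60-second minimum each time a warehouse starts or resumes. The sketch below estimates credits for a single run under those rules; the price of a credit depends on edition, cloud and region (see the pricing page), so it is left out, and the function names are hypothetical.

```python
SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]

MIN_BILLED_SECONDS = 60  # minimum charge each time a warehouse starts/resumes


def nodes(size: str) -> int:
    """Node count from the table: doubles with each size step.

    Raises ValueError for an unknown size (via list.index).
    """
    return 2 ** SIZES.index(size)


def credits_per_hour(size: str) -> int:
    """Standard-warehouse credit rate: 1 for X-Small, doubling per step."""
    return 2 ** SIZES.index(size)


def credits_for_run(size: str, seconds: float) -> float:
    """Credits used by one continuous run following a start or resume."""
    billed_seconds = max(seconds, MIN_BILLED_SECONDS)
    return credits_per_hour(size) * billed_seconds / 3600


if __name__ == "__main__":
    for size in SIZES:
        print(f"{size:9} nodes={nodes(size):3}  credits/hour={credits_per_hour(size)}")
    print(credits_for_run("Medium", 1800))  # 30 minutes on a Medium
    print(credits_for_run("X-Small", 10))   # billed as 60 seconds
```

Because each step doubles the rate, a query whose runtime halves on the next size up costs about the same number of credits while finishing sooner.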
Demo of Virtual Warehouse
Life without Snowflake
Life with Snowflake
Pricing
https://www.snowflake.com/pricing/
How does it work?
Deep Dive into the Architecture
Staging Area
“Stages” or “Staging Areas” are places to put things temporarily before moving them to a
more stable location.
Staging Area (contd.)
- A stage is the storage location from which data is loaded into Snowflake’s Data Storage Layer.
- An external stage can be AWS S3, Azure Blob Storage or Google Cloud Storage (Snowflake also provides internal stages).
- It is treated as a data lake where data first lands.
- From the staging area, data is loaded into a Snowflake database, after performing transformations if
required.
- To load batch data:
Snowflake’s COPY command, Informatica, Talend, Matillion
- To load continuous data:
Snowpipe, Kafka, Kinesis
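As an illustration of batch loading with the COPY command, the Python sketch below assembles the statements for an external S3 stage and a `COPY INTO` load. It only builds SQL strings; executing them needs a Snowflake session. The bucket URL, the storage integration `my_s3_int`, the stage `raw_stage` and the table `orders` are hypothetical placeholders, and CSV files with one header row are an assumed file format.

```python
def create_stage_sql(stage: str, url: str, integration: str) -> str:
    """External stage over cloud storage, with a CSV file format attached."""
    return (
        f"CREATE STAGE IF NOT EXISTS {stage}\n"
        f"  URL = '{url}'\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);"
    )


def copy_into_sql(table: str, stage: str, pattern: str = ".*[.]csv",
                  on_error: str = "ABORT_STATEMENT") -> str:
    """Bulk-load matching files from the stage into a table.

    No FILE_FORMAT is given here, so the stage's file format applies.
    """
    return (
        f"COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  PATTERN = '{pattern}'\n"
        f"  ON_ERROR = {on_error};"
    )


if __name__ == "__main__":
    print(create_stage_sql("raw_stage", "s3://my-bucket/orders/", "my_s3_int"))
    print(copy_into_sql("orders", "raw_stage"))
```

Names and values are interpolated directly into the SQL text, so this pattern is only suitable for trusted, developer-supplied identifiers.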
Snowflake in Action
Real life use-case
Demo
- Bulk data loading into Snowflake
- Continuous data loading into Snowflake (optional)
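For the optional continuous-loading demo, Snowpipe wraps a `COPY INTO` statement in a pipe. With `AUTO_INGEST = TRUE`, it loads new files when the cloud provider sends event notifications; for S3, these arrive through SQS. The Python sketch below builds the pipe definition and a `SYSTEM$PIPE_STATUS` query for checking it. The names `orders_pipe`, `orders` and `raw_stage` are hypothetical, and the bucket's event-notification setup is assumed rather than shown.

```python
def create_pipe_sql(pipe: str, table: str, stage: str) -> str:
    """Snowpipe definition: run this COPY whenever new files are announced."""
    return (
        f"CREATE PIPE IF NOT EXISTS {pipe}\n"
        f"  AUTO_INGEST = TRUE\n"
        f"  AS\n"
        f"  COPY INTO {table}\n"
        f"  FROM @{stage};"
    )


def pipe_status_sql(pipe: str) -> str:
    """Query returning the pipe's current status as a JSON string."""
    return f"SELECT SYSTEM$PIPE_STATUS('{pipe}');"


if __name__ == "__main__":
    print(create_pipe_sql("orders_pipe", "orders", "raw_stage"))
    print(pipe_status_sql("orders_pipe"))
```

The main difference from the bulk case is who triggers the COPY. Here Snowflake runs it per batch of new files on serverless compute, so no user-managed warehouse has to be running.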
Thank You!
Contact us at:
hello@knoldus.com
Connect with me:
linkedin.com/in/sarfaraz-hussain-8123b4132