Best Practices for Cloud Data Warehousing with Snowflake and AWS
Speakers
Why AWS for Big Data?
Immediately
Available
Broad and Deep
Capabilities
Trusted and
Secure
Scalable
Big Data is for everyone
The market for Big Data technologies is growing more than six times faster than the
information technology market as a whole...
...and those companies that use their data well win.
AWS Global Infrastructure
Amazon S3
 Highly durable object storage for all types of data
Internet-scale storage
Grow without limits
Benefit from AWS’s
massive security
investments
Built-in redundancy
Designed for
99.999999999%
durability
Low price per GB
per month
No commitment
No up-front cost
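To put the durability figure in perspective, here is a small worked calculation. It is only a sketch: it treats the 99.999999999% design target as an independent annual survival probability per object, which is a simplification, and the bucket size of 10 million objects is a hypothetical example rather than a figure from the slides.

```python
from fractions import Fraction

# Design target quoted on the slide: 99.999999999% ("eleven nines").
# Assumption: read as an annual, per-object survival probability.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the design durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of stored objects
    print(float(expected_annual_losses(n)))  # expected losses per year
    print(int(years_per_lost_object(n)))     # years per expected loss
```

Under these assumptions, 10 million objects correspond to an expected loss of one object roughly every 10,000 years.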
Amazon Elastic Compute Cloud (EC2)
Virtual servers
hosted on the
Amazon Cloud
Scale up or down
quickly, as needed
Pay for what
you use
Familiar operating
systems
Secure Operating Environment
Logically isolated network
resources
Compliance
Shared Responsibility
Poll Question 2: Data sources
 What data sources would you like to analyze today (choose multiple)?
– Customer facing mobile applications (JSON/XML, semi-structured, etc.)
– Traditional applications (structured data)
– Machine data or Weblog data from your website (JSON / XML)
About IAC Publishing Labs
 Headquartered in Oakland, CA
 One of the largest collections of premium
publishers
 Growth through acquisition
BI Team
 Provides centralized analytics
 Manages 300+ million events and 50-100 million
keywords
 Manages marketing terms for bidding and
monetization
 Imports more than 1.5 terabytes of raw data daily
 Incorporate different data sources
IAC Publishing Labs (Ask.com, About.com)
IAC Publishing Labs turns data from a business barrier to a business
accelerator with Snowflake on AWS
Let’s be honest
I want to spend all my day
waiting for infrastructure,
struggling to get access to
data, and competing for
resources
I want to be a data
superhero, taking on
the toughest data
challenges to wrestle
new insights from data
The Challenge
With a query running longer than 30 minutes, there was a
70% chance the (on-premises) database would shut down
Legacy Data Warehouse
 Large MPP warehouse - 6 months of history
 Large Cloudera Hadoop cluster - longer than 6
months of data retention
Pains
 Can’t natively process JSON data
 No TEST/DEV environment
 Unstable!
A Rigid System that Could Not Keep Pace with the Growing Business
Path to Better BI
Requirements
 120+ metrics
 Need to move to ‘as-a-service’ architecture
Evaluation
 PoC vendors: Snowflake on AWS, Google BigQuery, other
cloud data warehouse alternatives
After completing
evaluations and
choosing Snowflake
Elastic Data
Warehouse on AWS,
IAC Publishing Labs
was in full
production in just
three months
Benefits
 Ability for a large number of users to query the same data
 Querying JSON data from the web logs
 Pinpoint logging in near real time
 Spin up/down warehouses of any size as required
Chosen Solution - Snowflake on AWS
IAC Publishing Labs now has a single environment for processing data
and producing results
Scalability with Greater Control on the AWS Cloud
Flexibility
Spin up/down
warehouses of any size
as required
One data
warehouse to rule
them all
Snowflake architecture
allows us to combine
data and workloads in
one environment
Thorough Testing
Instantly able to create
new dev/test
environment without
creating multiple copies
of data and impacting
production
Concurrency
Unlimited processing
with Snowflake and
Amazon EC2
Stability
Separate, controlled
access to new users
Enhanced Service Levels
Improvements
Stable systems
 30+ analysts querying concurrently
 24x7x365 loading data (1.5 TB loaded every day)
Then: Process data once a day, over 3 hours
Now: Process data every hour, under 10 min
Then: Data load every 5 hours
Now: Data load every 15 seconds
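The load-frequency figures on this slide can be turned into a rough comparison. The sketch below assumes the "Then" load interval is 5 hours and the "Now" interval is 15 seconds (the natural reading of the Then/Now pairs), that the 1.5 TB of daily data arrives evenly around the clock, and that a terabyte means 10^12 bytes. None of these assumptions is stated explicitly in the deck.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def loads_per_day(interval_seconds: int) -> float:
    """Number of load cycles that fit in one day at a fixed interval."""
    return SECONDS_PER_DAY / interval_seconds


def average_batch_mb(daily_bytes: float, interval_seconds: int) -> float:
    """Average size of one load in MB if the daily volume arrives evenly."""
    return daily_bytes / loads_per_day(interval_seconds) / 1e6


THEN_INTERVAL = 5 * SECONDS_PER_HOUR  # "Data load every 5 hours"
NOW_INTERVAL = 15                     # "Data load every 15 seconds"
DAILY_BYTES = 1.5e12                  # 1.5 TB per day, assuming decimal terabytes

if __name__ == "__main__":
    print(loads_per_day(THEN_INTERVAL))        # loads per day before
    print(loads_per_day(NOW_INTERVAL))         # loads per day after
    print(THEN_INTERVAL // NOW_INTERVAL)       # how many times more frequent
    print(round(average_batch_mb(DAILY_BYTES, NOW_INTERVAL), 1))
```

Read this way, loads went from about 5 per day to 5,760 per day, a 1,200-fold increase in frequency, with each load averaging roughly 260 MB.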
Transition from Cost Center to Value Center
Snowflake on AWS holds both internal and external data together
Single source of truth
Real time visibility
Data metrics match speed of business
No capital/
infrastructure
investments
Elastic Data Warehouse
Results
 Establishing one source of truth in a centralized data warehouse
 Consolidating technologies and eliminating legacy platforms
 Providing enhanced BI service levels through
 Changing the BI team from a cost center to a value center
 Decreasing expenses significantly (by 78%) for the data warehouse
environment
Poll Question 3: Pain points
 For your data warehouse and/or big data platforms, what are the
biggest pain-points? (you can select multiple)
– Constantly juggling performance, management and scaling users
– Difficulty in bringing in new sources of data
– Current solution costs are too high
What data analysts want
Easy access to all
relevant data
Expand direct access to
data insights
Without burdensome cost
and complexity
Common realities
Silos of Data
Difficult to bring together diverse data: application data,
machine-generated data, streaming data
Data Warehouse(s), Datamarts, Hadoop & NoSQL
Complex Data Infrastructure
Significant resources spent building
and maintaining data platforms
Frustrated Analysts
Limited by incomplete data, delays in
access, performance
Introducing Snowflake’s Elastic Data Warehouse
All-new SQL data
warehouse
No legacy code or constraints
Designed for the cloud
Running on AWS
Delivered as a service
No infrastructure, knobs or
tuning to manage
Bring together data in one place
 Any scale of data
– Transparently scale up and down, online and
on-demand
 Efficient storage, low cost
– Columnar, automatically compressed storage +
pay for only what you use
 Native support for diverse data
– Structured + semi-structured (JSON, Avro, ...)
in one system, without sacrificing performance
or flexibility
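To make "semi-structured" concrete, the sketch below shows the kind of flattening step a pipeline might need when a warehouse only accepts flat, structured rows, which is the sort of extra transformation that native JSON support is meant to avoid. The event shape and field names are hypothetical, and this is plain Python for illustration, not Snowflake functionality.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one dict with dotted keys.

    Lists are kept as JSON strings, since a flat row has no natural
    place for a variable-length array.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Hypothetical weblog event; not taken from the deck.
raw = '{"event": "search", "user": {"id": 42, "device": {"os": "iOS"}}, "terms": ["cloud", "dw"]}'
row = flatten(json.loads(raw))
print(row)
```

Every new nested field in the source would add a column to `row`, which is one reason a hand-maintained flattening step tends to be brittle.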
Accelerate analytics at a fraction of the cost
 Simpler, faster data pipeline
– Load data directly into Snowflake, at any time,
without additional systems or transformations
 Any scale of workload
– Elastic scaling to handle any size job; in-database
performance for complex queries
 Infinite concurrency scaling
– Give workloads independent processing
resources, without needing to copy or
move data
 At a fraction of the cost
– Cloud economies of scale, pay for only what
you use, and align storage & processing costs
with use
No infrastructure, knobs, or tuning
Columns: On-premises | Cloud data warehouse | Data warehouse service
Infrastructure
Datacenter: ✘ (customer) | ✔ (vendor) | ✔
Hardware & software: ✘ | ✔
Upgrades & scaling: ✘ | ✘
Database management & tuning
Index management: ✘ | ✘
Data partitioning: ✘ | ✘
Metadata & statistics maintenance: ✘ | ✘
Query optimization: ✘ | ✘
Data & service availability
Failure recovery: ✘ | ✔
Disaster recovery: ✘ | ✘
Data protection: ✘ | ✘
Service monitoring & alerting: ✘ | ✘
Security
Physical security: ✘ | ✔
Deployment security: ✘ | ✔
Security monitoring: ✘ | ✘
In summary
Fast analytics
“Snowflake gets us answers an
order of magnitude faster. As a
result we can do 100 times more
queries per day.”
Balaji Rao, Accordant Media
Without the cost
“Snowflake is extremely cost
effective—we have saved nearly 80%
by implementing Snowflake.”
Rolfe Lindberg, DoubleDown
Zero management
“Snowflake makes it possible for us
to focus on making use of our data
without the complexity and resources
required by traditional data
warehousing and big data solutions.”
Ethan Erchinger, Chime
One place for data
“I can't say enough about how
fantastic the native JSON support is.”
Josh McDonald, KIXEYE
Questions & Answers

Snowflake Best Practices for Elastic Data Warehousing

  • 1. Best Practices for Cloud Data Warehousing with Snowflake and AWS
  • 3. Why AWS for Big Data? Immediately Available Broad and Deep Capabilities Trusted and Secure Scalable
  • 4. Big Data is for everyone The market for Big Data technologies is growing more than six times faster than the information technology market as a whole…. …and those companies who use their data well win.
  • 6. Amazon S3  Highly durable object storage for all types of data Internet-scale storage Grow without limits Benefit from AWS’s massive security investments Built-in redundancy Designed for 99.999999999% durability Low price per GB per month No commitment No up-front cost
  • 7. Amazon Elastic Cloud Compute (EC2) Virtual servers hosted on the Amazon Cloud Scale up or down quickly, as needed Pay for what you use Familiar operating systems
  • 8. Secure Operating Environment Logically isolated network resources ComplianceShared Responsibility
  • 9. Poll Question 2: Data sources  What data sources would you like to analyze today (choose multiple)? – Customer facing mobile applications (JSON/XML, semi-structured, etc.) – Traditional applications (structured data) – Machine data or Weblog data from your website (JSON / XML)
  • 10. About IAC Publishing Labs  Headquartered in Oakland, CA  One of the largest collections of premium publishers  Growth through acquisition BI Team  Provides centralized analytics  Manages 300+ million events and 50-100 million keywords  Manages marketing terms for bidding and monetization  Imports more than 1.5 terabytes of raw data daily  Incorporate different data sources IAC Publishing Labs (Ask.com, About.com) IAC Publishing Labs turns data from a business barrier to a business accelerator with Snowflake on AWS
  • 11. Let’s be honest I want to spend all my day waiting for infrastructure, struggling to get access to data, and competing for resources
  • 12. I want to be a data superhero, taking on the toughest data challenges to wrestle new insights from data I want to spend all my day waiting for infrastructure, struggling to get access to data, and competing for resources Let’s be honest
  • 13. The Challenge With a query running longer than 30 minutes, there was a 70% chance the (on- premises) database would shut down Legacy Data Warehouse  Large MPP warehouse - 6 months of history  Large Cloudera Hadoop cluster - longer than 6 months of data retention Pains  Can’t natively process JSON data  No TEST/DEV environment  Unstable ! A Rigid System that Could Not Keep Pace with the Growing Business
  • 14. Path to Better BI Requirements  120+ metrics  Need to move to ‘as-a-service’ architecture Evaluation  PoC Vendors- Snowflake on AWS, Google Big Query, other cloud data warehouse alternatives After completing evaluations and choosing Snowflake Elastic Data Warehouse on AWS, IAC Publishing Labs was in full production in just three months
  • 15. Benefits  Ability for large number of users to query the same data  Querying JSON data from the web logs  Pinpoint logging in near-real time  Spin up/down warehouses of any size as required Chosen Solution - Snowflake on AWS IAC Publishing Labs now has a single environment for processing data and producing results
  • 16. Scalability with Greater Control on the AWS Cloud Flexibility Spin up/down warehouses of any size as required One data warehouse to rule them all Snowflake architecture allows us to combine data and workloads in one environment Thorough Testing Instantly able to create new dev/test environment without creating multiple copies of data and impacting production Concurrency Unlimited processing with Snowflake and Amazon EC2 Stability Separate, controlled access to new users
  • 17. Enhanced Service Levels Improvements Stable systems  30+analysts querying concurrently  24x7x365 loading data (1.5 TB loaded every day) Process data once a day, over 3 hours Data load every 15 seconds Data load every 5 hours Process data every hour under 10 min Then Now Vs. Then Now Vs.
  • 18. Transition from Cost Center to Value Center Snowflake on AWS holds both internal and external data together Single source of truth Real time visibility Data metrics matches speed of business No capital/ infrastructure investments Elastic Data Warehouse
  • 19. Results  Establishing one source of truth in a centralized data warehouse  Consolidating technologies and eliminating legacy platforms  Providing enhanced BI service levels through  Changing the BI team from a cost center to a value center  Decreasing expenses significantly (by 78%) for the data warehouse environment
  • 20. Poll Question 3: Pain points  For your data warehouse and/or big data platforms, what are the biggest pain points? (You can select multiple.) – Constantly juggling performance, management and scaling users – Difficulty in bringing in new sources of data – Current solution costs are too high
  • 21. What data analysts want Easy access to all relevant data Expand direct access to data insights Without burdensome cost and complexity
  • 22. Common realities. Silos of Data: difficult to bring together diverse data (application data, machine-generated data, streaming data).
  • 23. Common realities. Complex Data Infrastructure: significant resources spent building and maintaining data platforms. Silos of Data: difficult to bring together diverse data (application data, machine-generated data, streaming data). Data Warehouse(s), Datamarts, Hadoop & NoSQL.
  • 24. Common realities. Frustrated Analysts: limited by incomplete data, delays in access, performance. Complex Data Infrastructure: significant resources spent building and maintaining data platforms. Silos of Data: difficult to bring together diverse data (application data, machine-generated data, streaming data). Data Warehouse(s), Datamarts, Hadoop & NoSQL.
  • 25. Introducing Snowflake’s Elastic Data Warehouse. All-new SQL data warehouse: no legacy code or constraints. Designed for the cloud: running on AWS. Delivered as a service: no infrastructure, knobs or tuning to manage.
  • 26. Bring together data in one place  Any scale of data – Transparently scale up and down, online and on-demand
  • 27. Bring together data in one place  Any scale of data – Transparently scale up and down, online and on-demand  Efficient storage, low cost – Columnar, automatically compressed storage + pay for only what you use
  • 28. Bring together data in one place  Any scale of data – Transparently scale up and down, online and on-demand  Efficient storage, low cost – Columnar, automatically compressed storage + pay for only what you use  Native support for diverse data – Structured + semi-structured (JSON, Avro, ...) in one system, without sacrificing performance or flexibility
  • 29. Accelerate analytics  Simpler, faster data pipeline – Load data directly into Snowflake, at any time, without additional systems or transformations
  • 30. Accelerate analytics  Simpler, faster data pipeline – Load data directly into Snowflake, at any time, without additional systems or transformations  Any scale of workload – Elastic scaling to handle any size job, with in-database performance for complex queries
  • 32. Accelerate analytics at a fraction of the cost  Simpler, faster data pipeline – Load data directly into Snowflake, at any time, without additional systems or transformations  Any scale of workload – Elastic scaling to handle any size job, with in-database performance for complex queries  Infinite concurrency scaling – Give workloads independent processing resources, without needing to copy or move data  At a fraction of the cost – Cloud economies of scale, pay for only what you use, and align storage & processing costs with use
  • 33. No infrastructure, knobs, or tuning. Columns: On-premises | Cloud data warehouse | Data warehouse service. Infrastructure: Datacenter ✘ (customer) | ✔ (vendor) | ✔; Hardware & software ✘ | ✔; Upgrades & scaling ✘ | ✘. Database management & tuning: Index management ✘ | ✘; Data partitioning ✘ | ✘; Metadata & statistics maintenance ✘ | ✘; Query optimization ✘ | ✘. Data & service availability: Failure recovery ✘ | ✔; Disaster recovery ✘ | ✘; Data protection ✘ | ✘; Service monitoring & alerting ✘ | ✘. Security: Physical security ✘ | ✔; Deployment security ✘ | ✔; Security monitoring ✘ | ✘.
  • 34. In summary. Fast analytics: “Snowflake gets us answers an order of magnitude faster. As a result we can do 100 times more queries per day.” (Balaji Rao, Accordant Media) Without the cost: “Snowflake is extremely cost effective; we have saved nearly 80% by implementing Snowflake.” (Rolfe Lindberg, DoubleDown) Zero management: “Snowflake makes it possible for us to focus on making use of our data without the complexity and resources required by traditional data warehousing and big data solutions.” (Ethan Erchinger, Chime) One place for data: “I can't say enough about how fantastic the native JSON support is.” (Josh McDonald, KIXEYE)
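
Slide 17 quotes 1.5 TB loaded every day and load intervals of 5 hours and 15 seconds. As a rough back-of-the-envelope sketch of what those figures imply, the Python below converts them into an average ingest rate and an average load size. It assumes decimal units (1 TB = 10**12 bytes) and loads spread evenly across the day; neither assumption comes from the deck, and the function names are made up for illustration.

```python
# Back-of-the-envelope figures for slide 17 (illustrative only).
# Assumptions not stated in the deck: 1 TB = 10**12 bytes, and loads are
# spread evenly over a 24-hour day.

SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 10**6  # decimal units: 1 TB = 1,000,000 MB


def average_throughput_mb_per_s(tb_per_day: float) -> float:
    """Average sustained ingest rate, in MB/s, for a given daily volume."""
    return tb_per_day * MB_PER_TB / SECONDS_PER_DAY


def loads_per_day(interval_seconds: float) -> float:
    """Number of load cycles per day at a fixed interval."""
    return SECONDS_PER_DAY / interval_seconds


def average_batch_mb(tb_per_day: float, interval_seconds: float) -> float:
    """Average size of one load, in MB, if the volume is spread evenly."""
    return tb_per_day * MB_PER_TB / loads_per_day(interval_seconds)


if __name__ == "__main__":
    daily_tb = 1.5  # figure quoted on slide 17
    rate = average_throughput_mb_per_s(daily_tb)
    print(f"average ingest rate: {rate:.1f} MB/s")
    for label, interval in (("every 5 hours", 5 * 3600), ("every 15 seconds", 15)):
        print(
            f"{label}: {loads_per_day(interval):.1f} loads/day, "
            f"{average_batch_mb(daily_tb, interval):,.0f} MB per load"
        )
```

Under these assumptions, 1.5 TB a day averages roughly 17 MB/s, which works out to about 260 MB per load at a 15-second interval.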

Editor's Notes