Snowflakes in the Cloud
Real world experience on a new
approach for Big Data
Robert Fehrmann
Principal Architect @ Snagajob
About Me
● Master’s Degree in Computer
Science from “Technische
Universitaet Braunschweig”
● 25 years building the data tier for
applications in different verticals
● Evangelist for polyglot data
environments
● Community involvement
(MongoDB User Groups / DevOps)
Agenda
● The Snagajob Story:
○ How did we end up in the Big Data World using
Snowflake
● Gotchas
○ Interesting stories on using Snowflake
Funnel Analysis
750,000 postings every day
600,000 unique visitors
X% find the posting
interesting
Y% apply for the
posting
(candidate)
Z%
Using Analytics to understand
the funnel
- Geographical Analysis
- Customer Analysis
- Historical Analysis
- Industry Analysis
- Click through rate &
abandoning the search
- What makes a Posting
Interesting, ...
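
The funnel above comes down to stage-to-stage arithmetic. The sketch below is a minimal illustration, not Snagajob's implementation. The function name `funnel_rates`, the stage names, and every count other than the 600,000 visitors quoted on the slide are hypothetical placeholders, since the slides leave X, Y, and Z unspecified.

```python
from typing import List, Tuple


def funnel_rates(stages: List[Tuple[str, int]]) -> List[Tuple[str, float, float]]:
    """Return (stage, % of previous stage, % of first stage) for each stage."""
    if not stages:
        return []
    first = stages[0][1]
    prev = first
    rates = []
    for name, count in stages:
        # Guard against division by zero for empty stages.
        step = 100.0 * count / prev if prev else 0.0
        overall = 100.0 * count / first if first else 0.0
        rates.append((name, step, overall))
        prev = count
    return rates


# Hypothetical counts; only the 600,000 visitors figure comes from the slide.
example = [
    ("unique_visitors", 600_000),
    ("found_posting_interesting", 150_000),
    ("applied", 30_000),
]

for name, step, overall in funnel_rates(example):
    print(f"{name:28s} {step:6.1f}% of previous  {overall:6.1f}% of visitors")
```

Reporting both the step rate and the overall rate makes it easier to see which stage loses the most visitors, which is the question the analyses listed above try to answer.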
Event Collection Framework V1
Web Web Web
Message Bus
LB
Tracking Service
Tracking Service
Flume
Flume
Flume
Hadoop
Hue Impala Report
Console
SQL-DW
Looker
Vertica
Evolution
2012 2014 2016
“We want to be a cloud-based company”
Peter Harris, CEO
2015
Search continues for a true cloud solution till ….
Data warehouse & platform software (on premises)
Vertica Data
Warehouse
Hadoop
Vertica Data
Warehouse
Move to Cloud
Doesn’t solve all
problems
Hadoop
Goals for Next Generation Solution
● Horizontal Scalability
● PaaS
● Stability
● Ease of Use
● Can’t be more expensive
Architecture
Event Collection Framework V2
Web Web Web
Message Bus
LB
Tracking Service
Tracking Service
FiveTran
Salesforce
Netsuite
Kinesis Snowflake
Looker
Snowflake
Portal
AdHoc
Spark
MongoDB
Results: Performance
Results: Better Use of Resources
Gotcha #1
● Problem:
○ Funnel Analysis got slower over time
● Base Metrics
○ 15 Billion rows
○ Analysis on monthly dataset: 2 minutes per run
○ 3 medium clusters in DW during business hours
Take 1
10 min !!!
Take 1
14 billion rows
hmmm,...
Take 2
Take 2
3 min !
Take 2
Yeah ,...
Hmm ,...
Take 3
Take 3
7 secs !!!
Take 3
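
To put the three takes side by side, the quoted runtimes can be converted to seconds and compared. This is only arithmetic on the numbers shown on the slides. The slides do not say what changed between takes, and the names `timings_s` and `ratio` are illustrative.

```python
# Runtimes quoted on the slides, converted to seconds.
timings_s = {
    "Base metric (monthly dataset)": 2 * 60,
    "Take 1": 10 * 60,
    "Take 2": 3 * 60,
    "Take 3": 7,
}


def ratio(slower_s: float, faster_s: float) -> float:
    """How many times longer the slower run took than the faster one."""
    return slower_s / faster_s


take3 = timings_s["Take 3"]
for label, seconds in timings_s.items():
    if label != "Take 3":
        print(f"{label}: {seconds} s, {ratio(seconds, take3):.1f}x the Take 3 runtime")
```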
Gotcha #2
Snowflake
Continuous Data Protection
https://docs.snowflake.net/manuals/user-guide/data-failsafe.html
Updates and Fail-safe
SCD Type 2 / Mapping / View
View
Mapping Table
N-Key S-Key ..
N-Key-1 S-Key-2
N-Key-2 S-Key-5
Fact Table
S-Key N-Key ...
S-Key-1 N-Key-1
S-Key-2 N-Key-1
S-Key-3 N-Key-2
S-Key-4 N-Key-2
S-Key-5 N-Key-3
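
One possible reading of the mapping-table slide, stated here as an assumption: rather than updating fact rows in place when a dimension record changes, a small mapping table records the current surrogate key per natural key, and a view resolves it at query time. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for Snowflake. The table and column names (`fact`, `mapping`, `fact_current`, `current_s_key`) are hypothetical, the key values are copied from the slide as shown, and the fallback to the row's own surrogate key when no mapping exists is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact (s_key TEXT, n_key TEXT);
    CREATE TABLE mapping (n_key TEXT PRIMARY KEY, s_key TEXT);

    -- Values taken from the slide.
    INSERT INTO fact VALUES
        ('S-Key-1', 'N-Key-1'),
        ('S-Key-2', 'N-Key-1'),
        ('S-Key-3', 'N-Key-2'),
        ('S-Key-4', 'N-Key-2'),
        ('S-Key-5', 'N-Key-3');
    INSERT INTO mapping VALUES
        ('N-Key-1', 'S-Key-2'),
        ('N-Key-2', 'S-Key-5');

    -- The view resolves the current surrogate key without touching fact rows.
    -- Assumption: rows with no mapping entry keep their own surrogate key.
    CREATE VIEW fact_current AS
    SELECT f.s_key AS original_s_key,
           f.n_key,
           COALESCE(m.s_key, f.s_key) AS current_s_key
    FROM fact f
    LEFT JOIN mapping m ON m.n_key = f.n_key;
    """
)

rows = conn.execute(
    "SELECT original_s_key, n_key, current_s_key "
    "FROM fact_current ORDER BY original_s_key"
).fetchall()
for row in rows:
    print(row)
```

Under this reading, a change to a dimension record touches only the small mapping table, so the large fact table is never rewritten.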
What’s going on: Case 2
Snowflake
Other Features
● Undrop (DB, Table, Schema): no restore required
● Clone (DB, Table, Schema): metadata-only operation
● Native JSON Parsing (as well as CSV, Avro, XML, Parquet)
● Automatic Encryption of Data
● Automatic Query Optimization (no tuning)
● All Data in one place (single source of truth)
Build your next Gen AI Breakthrough - April 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 

Snowflakes in the Cloud Real world experience on a new approach for Big Data

  • 1. Snowflakes in the Cloud Real world experience on a new approach for Big Data Robert Fehrmann Principal Architect @ Snagajob
  • 2.
  • 3. About Me ● Master's degree in Computer Science from “Technische Universitaet Braunschweig” ● 25 years building the data tier for applications in different verticals ● Evangelist for polyglot data environments ● Community involvement (MongoDB User Groups / DevOps)
  • 4. Agenda ● The Snagajob Story: ○ How we ended up in the Big Data world using Snowflake ● Gotchas ○ Interesting stories on using Snowflake
  • 5. Funnel Analysis ● 750,000 postings every day ● 600,000 unique visitors ● X% find the posting interesting ● Y% apply for the posting (candidate) ● Z% ● Using analytics to understand the funnel: - Geographical Analysis - Customer Analysis - Historical Analysis - Industry Analysis - Click-through rate & abandoning the search - What makes a posting interesting, ...
  • 6. Event Collection Framework V1 Web Web Web Message Bus LB Tracking Service Tracking Service Flume Flume Flume Hadoop Hue Impala Report Console SQL-DW Looker Vertica
  • 7. Evolution 2012 2014 2016 “We want to be a cloud based company” Peter Harris, CEO 2015 Search Continues For a true cloud solution till …. Data warehouse & platform software (on premise) Vertica Data Warehouse Hadoop Vertica Data Warehouse Move to Cloud Doesn’t solve all problems Hadoop
  • 8. Goals for Next Generation Solution ● Horizontal Scalability ● PaaS ● Stability ● Ease of Use ● Can’t be more expensive
  • 10. Event Collection Framework V2 Web Web Web Message Bus LB Tracking Service Tracking Service FiveTran Salesforce Netsuite Kinesis Snowflake Looker Snowflake Portal AdHoc Spark MongoDB
  • 12. Results: Better Use of Resources
  • 13. Gotcha #1 ● Problem: ○ Funnel Analysis got slower over time ● Base Metrics ○ 15 Billion rows ○ Analysis on monthly dataset: 2 minutes per run ○ 3 medium clusters in DW during business hours
  • 15. Take 1 14 billion rows hmmm, ...
  • 25. SCD Type 2 / Mapping / View ● View ● Mapping Table (N-Key, S-Key, ..): (N-Key-1, S-Key-2); (N-Key-2, S-Key-5) ● Fact Table (S-Key, N-Key, ...): (S-Key-1, N-Key-1); (S-Key-2, N-Key-1); (S-Key-3, N-Key-2); (S-Key-4, N-Key-2); (S-Key-5, N-Key-3)
  • 26. What’s going on: Case 2 Snowflake
  • 27. Other Features ● Undrop (DB, Table, Schema) no restore required ● Clone (DB, Table, Schema) (metadata only operation) ● Native JSON Parsing (as well as CSV, AVRO, XML, Parquet) ● Automatic Encryption of Data ● Automatic Query Optimization (no tuning) ● All Data in one place (single source of truth)
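
The mapping-table idea on slide 25 can be sketched as follows. This is only one plausible reading of the slide, not the presenter's actual schema: it assumes the mapping table holds one current surrogate key (S-Key) per natural key (N-Key), and that a view joins the fact table to it so that all historical rows roll up under the current key. The table names, column names, and sample rows are hypothetical, and SQLite (via Python's standard library) stands in for Snowflake so that the example runs anywhere.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Fact rows keep the surrogate key that was valid when the event happened.
CREATE TABLE fact (s_key INTEGER, n_key TEXT, clicks INTEGER);

-- Hypothetical mapping table: one current surrogate key per natural key.
CREATE TABLE mapping (n_key TEXT PRIMARY KEY, current_s_key INTEGER);

INSERT INTO fact VALUES
    (1, 'job-A', 10),
    (2, 'job-A', 5),
    (3, 'job-B', 7),
    (4, 'job-B', 1);

INSERT INTO mapping VALUES ('job-A', 2), ('job-B', 4);

-- The view re-keys every fact row to the current surrogate key.
CREATE VIEW fact_current AS
SELECT m.current_s_key AS s_key, f.n_key, f.clicks
FROM fact AS f
JOIN mapping AS m ON m.n_key = f.n_key;
""")

# Aggregate through the view: history is grouped under the current key.
rows = conn.execute(
    "SELECT s_key, n_key, SUM(clicks) "
    "FROM fact_current GROUP BY s_key, n_key ORDER BY n_key"
).fetchall()

print(rows)
```

With this layout, reports query the view rather than the fact table, so a new version of a dimension row only requires updating one row in the mapping table.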