Snowflake concepts & hands on expertise to help get you started on implementing Data warehouses using Snowflake. Necessary information and skills that will help you master Snowflake essentials.
As cloud computing continues to gather speed, organizations with years’ worth of data stored on legacy on-premise technologies are facing issues with scale, speed, and complexity. Your customers and business partners are likely eager to get data from you, especially if you can make the process easy and secure.
Challenges with performance are not uncommon and ongoing interventions are required just to “keep the lights on”.
Discover how Snowflake empowers you to meet your analytics needs by unlocking the potential of your data.
Agenda of Webinar :
~Understand Snowflake and its Architecture
~Quickly load data into Snowflake
~Leverage the latest in Snowflake’s unlimited performance and scale to make the data ready for analytics
~Deliver secure and governed access to all data – no more silos
Introduction to Snowflake Datawarehouse and Architecture for Big data company. Centralized data management. Snowpipe and Copy into a command for data loading. Stream loading and Batch Processing.
What is elastic data warehousing, and how does Snowflake uniquely enable it? Learn about the requirements needed to support flexible, elastic data warehousing using cloud infrastructure.
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
As cloud computing continues to gather speed, organizations with years’ worth of data stored on legacy on-premise technologies are facing issues with scale, speed, and complexity. Your customers and business partners are likely eager to get data from you, especially if you can make the process easy and secure.
Challenges with performance are not uncommon and ongoing interventions are required just to “keep the lights on”.
Discover how Snowflake empowers you to meet your analytics needs by unlocking the potential of your data.
Agenda of Webinar :
~Understand Snowflake and its Architecture
~Quickly load data into Snowflake
~Leverage the latest in Snowflake’s unlimited performance and scale to make the data ready for analytics
~Deliver secure and governed access to all data – no more silos
Introduction to Snowflake Datawarehouse and Architecture for Big data company. Centralized data management. Snowpipe and Copy into a command for data loading. Stream loading and Batch Processing.
What is elastic data warehousing, and how does Snowflake uniquely enable it? Learn about the requirements needed to support flexible, elastic data warehousing using cloud infrastructure.
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
Snowflake is one of the most powerful, efficient data warehouses on the market today—and we joined forces with the Snowflake team to show you how it works!
In this webinar:
- Learn how to optimize Snowflake
- Hear insider tips and tricks on how to improve performance
- Get expert insights from Craig Collier, Technical Architect from Snowflake, and Kalyan Arangam, Solution Architect from Matillion
- Find out how leading brands like Converse, Duo Security, and Pets at Home use Snowflake and Matillion ETL to make data-driven decisions
- Discover how Matillion ETL and Snowflake work together to modernize your data world
- Learn how to utilize the impressive scalability of Snowflake and Matillion
A 30 day plan to start ending your data struggle with SnowflakeSnowflake Computing
Organizations everywhere are struggling to load, integrate, analyze and collaborate with data. This is largely thanks to their antiquated data platform, designed in a time when few people had the desire or need to interact with the database. Snowflake, the data warehouse built for the cloud, can help.
Snowflake's Kent Graziano talks about what makes a data warehouse as a service and some of the key features of Snowflake's data warehouse as a service.
How to Take Advantage of an Enterprise Data Warehouse in the CloudDenodo
Watch full webinar here: [https://buff.ly/2CIOtys]
As organizations collect increasing amounts of diverse data, integrating that data for analytics becomes more difficult. Technology that scales poorly and fails to support semi-structured data fails to meet the ever-increasing demands of today’s enterprise. In short, companies everywhere can’t consolidate their data into a single location for analytics.
In this Denodo DataFest 2018 session we’ll cover:
Bypassing the mandate of a single enterprise data warehouse
Modern data sharing to easily connect different data types located in multiple repositories for deeper analytics
How cloud data warehouses can scale both storage and compute, independently and elastically, to meet variable workloads
Presentation by Harsha Kapre, Snowflake
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
Data driven organizations can be challenged to deliver new and growing business intelligence requirements from existing data warehouse platforms, constrained by lack of scalability and performance. The solution for customers is a data warehouse that scales for real-time demands and uses resources in a more optimized and cost-effective manner. Join Snowflake, AWS and Ask.com to learn how Ask.com enhanced BI service levels and decreased expenses while meeting demand to collect, store and analyze over a terabyte of data per day. Snowflake Computing delivers a fast and flexible elastic data warehouse solution that reduces complexity and overhead, built on top of the elasticity, flexibility, and resiliency of AWS.
Join us to learn:
• Learn how Ask.com eliminates data redundancy, and simplifies and accelerates data load, unload, and administration
• Learn how to support new and fluid data consumption patterns with consistently high performance
• Best practices for scaling high data volume on Amazon EC2 and Amazon S3
Who should attend: CIOs, CTOs, CDOs, Directors of IT, IT Administrators, IT Architects, Data Warehouse Developers, Database Administrators, Business Analysts and Data Architects
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Organizations are struggling to make sense of their data within antiquated data platforms. Snowflake, the data warehouse built for the cloud, can help.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Snowflake: The most cost-effective agile and scalable data warehouse ever!Visual_BI
In this webinar, the presenter will take you through the most revolutionary data warehouse, Snowflake with a live demo and technical and functional discussions with a customer. Ryan Goltz from Chesapeake Energy and Tristan Handy, creator of DBT Cloud and owner of Fishtown Analytics will also be joining the webinar.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every
2-3 years, which would quickly become overwhelmed with rapidly increasing data volumes. Knowledge workers struggled to access those databases with development intensive BI tools designed for reporting, rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and to handle logarithmically larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
A 30 day plan to start ending your data struggle with SnowflakeSnowflake Computing
Organizations everywhere are struggling to load, integrate, analyze and collaborate with data. This is largely thanks to their antiquated data platform, designed in a time when few people had the desire or need to interact with the database. Snowflake, the data warehouse built for the cloud, can help.
Snowflake's Kent Graziano talks about what makes a data warehouse as a service and some of the key features of Snowflake's data warehouse as a service.
How to Take Advantage of an Enterprise Data Warehouse in the CloudDenodo
Watch full webinar here: [https://buff.ly/2CIOtys]
As organizations collect increasing amounts of diverse data, integrating that data for analytics becomes more difficult. Technology that scales poorly and fails to support semi-structured data fails to meet the ever-increasing demands of today’s enterprise. In short, companies everywhere can’t consolidate their data into a single location for analytics.
In this Denodo DataFest 2018 session we’ll cover:
Bypassing the mandate of a single enterprise data warehouse
Modern data sharing to easily connect different data types located in multiple repositories for deeper analytics
How cloud data warehouses can scale both storage and compute, independently and elastically, to meet variable workloads
Presentation by Harsha Kapre, Snowflake
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
Data driven organizations can be challenged to deliver new and growing business intelligence requirements from existing data warehouse platforms, constrained by lack of scalability and performance. The solution for customers is a data warehouse that scales for real-time demands and uses resources in a more optimized and cost-effective manner. Join Snowflake, AWS and Ask.com to learn how Ask.com enhanced BI service levels and decreased expenses while meeting demand to collect, store and analyze over a terabyte of data per day. Snowflake Computing delivers a fast and flexible elastic data warehouse solution that reduces complexity and overhead, built on top of the elasticity, flexibility, and resiliency of AWS.
Join us to learn:
• Learn how Ask.com eliminates data redundancy, and simplifies and accelerates data load, unload, and administration
• Learn how to support new and fluid data consumption patterns with consistently high performance
• Best practices for scaling high data volume on Amazon EC2 and Amazon S3
Who should attend: CIOs, CTOs, CDOs, Directors of IT, IT Administrators, IT Architects, Data Warehouse Developers, Database Administrators, Business Analysts and Data Architects
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Amazon Web Services
Struggling to keep up with an ever-increasing demand for data at your organisation? Do you spend hours tinkering with your streaming data pipelines? Does that one data scientist with direct EDW access keep you up at night? Introducing Snowflake, a brand new SQL data warehouse built for the cloud. We’ve designed and implemented a unique cloud-based architecture that addresses the most common shortcomings of existing data solutions. With Snowflake, you can unlock unlimited concurrency, enable instant scalability, and take advantage of built-in tuning and optimisation. Join us and find out what Netflix, Adobe, and Nike all have in common.
Organizations are struggling to make sense of their data within antiquated data platforms. Snowflake, the data warehouse built for the cloud, can help.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Snowflake: The most cost-effective agile and scalable data warehouse ever!Visual_BI
In this webinar, the presenter will take you through the most revolutionary data warehouse, Snowflake with a live demo and technical and functional discussions with a customer. Ryan Goltz from Chesapeake Energy and Tristan Handy, creator of DBT Cloud and owner of Fishtown Analytics will also be joining the webinar.
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Delivering rapid-fire Analytics with Snowflake and TableauHarald Erb
Until recently, advancements in data warehousing and analytics were largely incremental. Small innovations in database design would herald a new data warehouse every
2-3 years, which would quickly become overwhelmed with rapidly increasing data volumes. Knowledge workers struggled to access those databases with development intensive BI tools designed for reporting, rather than exploration and sharing. Both databases and BI tools were strained in locally hosted environments that were inflexible to growth or change.
Snowflake and Tableau represent a fundamentally different approach. Snowflake’s multi-cluster shared data architecture was designed for the cloud and to handle logarithmically larger data volumes at blazing speed. Tableau was made to foster an interactive approach to analytics, freeing knowledge workers to use the speed of Snowflake to their greatest advantage.
AUSPC 2013 - Business Continuity Management in SharePointMichael Noel
Providing for a highly available and disaster-tolerant SharePoint environment is no small task; as there are multiple components that require backup, and various architectural design options that each provide for various degrees of business continuity. Consequently, understanding how to design and implement a BCM solution for SharePoint is a must. This session covers BCM for SharePoint, including a thorough discussion of SharePoint Backup and Restore options, a discussion of various BCM-related architectural designs, and a frank look at some of the new SQL 2012 AlwaysOn options for SharePoint.
Reduce planned database down time with Oracle technologyKirill Loifman
How to design an Oracle database system to minimize planned interruptions? That depends on the requirements, goals, SLAs etc. The presentation will follow top-down approach. First we will describe major types of planned maintenance, prioritize those and then based on the system availability requirements find the best cost-effective technics to address those. A bit of planning, strategy and of course modern database and OS technics including latest Oracle 12c features.
SharePoint 2013 on Azure: Your Dedicated Farm in the CloudJamie McAllister
With the Virtual Machine and Virtual Networking services of Windows Azure, it is now possible to deploy and operate a Microsoft SharePoint 2013 Server farm on Windows Azure. In this session we will discuss the key considerations, architecture and operations required to do this successfully. At the end you be able to build your own SharePoint Farm on the Cloud!
Tomer Shiran est le fondateur et chef de produit (CPO) de Dremio. Tomer était le 4e employé et vice-président produit de MapR, un pionnier de l'analyse du Big Data. Il a également occupé de nombreux postes de gestion de produits et d'ingénierie chez IBM Research et Microsoft, et a fondé plusieurs sites Web qui ont servi des millions d'utilisateurs. Il est titulaire d'un Master en génie informatique de l'Université Carnegie Mellon et d'un Bachelor of Science en informatique du Technion - Israel Institute of Technology.
Le Modern Data Stack meetup est ravi d'accueillir Tomer Shiran. Depuis Apache Drill, Apache Arrow maintenant Apache Iceberg, il ancre avec ses équipes des choix pour Dremio avec une vision de la plateforme de données “ouverte” basée sur des technologies open source. En plus, de ces valeurs qui évitent le verrouillage de clients dans des formats propriétaires, il a aussi le souci des coûts qu’engendrent de telles plateformes. Il sait aussi proposer un certain nombre de fonctionnalités qui transforment la gestion de données grâce à des initiatives telles Nessie qui ouvre la route du Data As Code et du transactionnel multi-processus.
Le Modern Data Stack Meetup laisse “carte blanche” à Tomer Shiran afin qu’il nous partage son expérience et sa vision quant à l’Open Data Lakehouse.
C-Drive 2009 presentation by Scott DesBles about how Compellent's Data Instant Replay and Data Progression work together to create an efficient data storage system.
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
Alluxio Monthly Webinar
Apr. 23, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- ChanChan Mao (Developer Advocate, Alluxio)
- Shawn Sun (Tech Lead of Cloud Native, Alluxio)
Cloud-native model training jobs require fast data access to achieve shorter training cycles. Accessing data can be challenging when your datasets are distributed across different regions and clouds. Additionally, as GPUs remain scarce and expensive resources, it becomes more common to set up remote training clusters from where data resides. This multi-region/cloud scenario introduces the challenges of losing data locality, resulting in operational overhead, latency and expensive cloud costs.
In the third webinar of the multi-cloud webinar series, Chanchan and Shawn dive deep into:
- The data locality challenges in the multi-region/cloud ML pipeline
- Using a cloud-native distributed caching system to overcome these challenges
- The architecture and integration of PyTorch/Ray+Alluxio+S3 using POSIX or RESTful APIs
- Live demo with ResNet and BERT benchmark results showing performance gains and cost savings analysis
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
Big Data Bellevue Meetup
March 21, 2024
For more Alluxio events: https://alluxio.io/events/
Speakers:
Bin Fan (VP of Open Source, Alluxio)
In this presentation, Bin Fan (VP of Open Source @ Alluxio) will address a critical challenge of optimizing data loading for distributed Python applications within AI/ML workloads in the cloud, focusing on popular frameworks like Ray and Hugging Face. Integration of Alluxio’s distributed caching for Python applications is accomplished using the fsspec interface, thus greatly improving data access speeds. This is particularly useful in machine learning workflows, where repeated data reloading across slow, unstable or congested networks can severely affect GPU efficiency and escalate operational costs.
Attendees can look forward to practical, hands-on demonstrations showcasing the tangible benefits of Alluxio’s caching mechanism across various real-world scenarios. These demos will highlight the enhancements in data efficiency and overall performance of data-intensive Python applications. This presentation is tailored for developers and data scientists eager to optimize their AI/ML workloads. Discover strategies to accelerate your data processing tasks, making them not only faster but also more cost-efficient.
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
700 Updatable Queries Per Second: Spark as a Real-Time Web Service. Find out how to use Apache Spark with FiloDb for low-latency queries - something you never thought possible with Spark. Scale it down, not just scale it up!
Actionable Insights with AI - Snowflake for Data ScienceHarald Erb
Talk @ ScaleUp 360° AI Infrastructures DACH, 2021: Data scientists spend 80% and more of their time searching for and preparing data. This talk explains Snowflake’s Platform capabilities like near-unlimited data storage and instant and near-infinite compute resources and how the platform can be used to seamlessly integrate and support the machine learning libraries and tools data scientists rely on.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2. 11
No software, infrastructure or upgrades to manage
Software as a Service
3
2
4
Built from scratch, optimized for cloud, storage & compute is decoupled
Built for the Cloud
Pay only for used compute & storage
Storage & compute charged independently, only for use
Virtual warehouse enable compute scaling
Scalable
SaaS
Snowflake Full snowflake course at
http://bit.ly/snowflake_course
3. TRADITIONAL ARCHITECTURES
Shared Disk Shared Nothing
Storage is shared
Easier to manage storage
Performance affected by disk contention
Storage & compute are decentralized
Performance scales with storage & compute increase
Generally storage increase must be accompanied by
compute increase
4. SNOWFLAKE - HYBRID ARCHITECTURE
Shared Disk
Similar to shared disk architecture Snowflake uses
a central data repository
Shared Storage
X-Large
LargeMedium
Small
MPP Compute Clusters
Similar to shared nothing architectures Snowflake
uses massively parallel compute clusters where each
node stores and processes its part of the data
Advantages
This hybrid approach provides the benefits of a
simple single disk storage but still allows to scale
out as and when needed
5. Snowflake Pricing
Storage Cost
- Monthly fees charged for the data stored
- Cost is based on average storage used per month
- Cost is calculated after compression
Compute Costs
- Compute costs are charged per snowflake credit
- Snowflake credits are used only when virtual
warehouses are in running state
- The rate of consumption of snowflake credits is
dependent on the size of the virtual warehouse
Full snowflake course at
http://bit.ly/snowflake_course
6. Snowflake Credit
XS S M L XL 2XL 3XL 4XL
1 2 4 8 16 32 64 128
VIRTUAL WAREHOUSE SIZE
NO NODES / SNOWFLAKE
CREDITS PER HOUR
7. Snowflake Credit
XS
S
M
4 hours
2 hours
1 hours
1 SNOW FLAKE CREDIT PER HOUR x 4 HOURS = 4 SNOWFLAKE CREDITS
2 SNOW FLAKE CREDIT PER HOUR x 2 HOURS = 4 SNOWFLAKE CREDITS
4 SNOW FLAKE CREDIT PER HOUR x 1 HOUR = 4 SNOWFLAKE CREDITS
8. Time Travel
A feature unique to Snowflake
Travel back to a point in time
Travel back to a point before a query
Undrop objects
Undo accidental changes
ime Travel
Time travel for
Databases
Schemas
Tables
1 day for Standard Snowflake
Up to 90 days for Enterprise version
Configurable from zero to 90 days
Can be configured at various levels
Database / Schema / Table
User
Objects RetentionT
Full snowflake course at
http://bit.ly/snowflake_course
9. DATA SHARING IN SNOWFLAKE
Historically data sharing has been a cumbersome process
involving extracts, scheduled jobs, latency and data
becoming stale as soon as it is extracted
Snowflake’s unique architecture of decoupling data from
compute enables sharing with minimal effort
Data can be shared without requiring an extract of the data (similar to how Cloning
works).
Shared data can be used by the consumers by using their own compute resources
Non snowflake users can also consume shared data through Reader accounts
10. DATA SHARING IN SNOWFLAKE
Instantaneous
-Data available
immediately
-No need to transform,
extract or transfer data
Up-to-date
-Data is always up-to-date
-Any changes to the data
by the data provider are
reflected to the consumer
Secure & Governed
-Access governed by the
provider
11. DATA LOADING OPTIONS
BULK LOAD
- Snowflake COPY command use for batch loading
- Batch Loading of data which is already available in Cloud Storage or
an internal location
- COPY command used Virtual Warehouse compute resources which
need to be managed manually
- Allows basic Transformations such as re-ordering columns, excluding
columns, data typing, truncating strings
CONTINOUS LOAD
- Snowpipe used for loading streaming data
- Uses a serverless approach scaling up / down automatically
- Doesn’t use the virutal warehouse compute resources
Full snowflake course at
http://bit.ly/snowflake_course
12. QUERY DATA WITHOUT LOADING DATA
EXTERNAL TABLES
- Its not always necessary to load data into Snowflake before you can
access it
- In such cases you can use the external tables to access data
externally
- This is useful if there is a lot of data externally but you want to query
only a small subset of that data
- The external table performance and associate costs can be
optimised by creating metarialised views
13. 11
Make snowflake aware of the data. Internal or external staging area
Stage the data
3
2
4
If required pre-process the files into a format that is optimal for loading
Prepare your files
Execute COPY command
COPY the data into the table
Organize files & schedule your loads
Managing regular loads
LOADING BULK DATA FROM CLOUD & LOCAL STORAGE
3
14. Sign up for a complete Snowflake Essentials course
http://bit.ly/snowflake_course
Secure Data Sharing
Snowflake Concepts & Architecture
Loading & Querying data
Cloning & Time Travel