This Snowflake MasterClass document provides an overview of the topics covered in the course: getting started, architecture, loading and managing data, performance optimization, security and access control, and best practices. The contents are organized into modules covering Snowflake's architecture with its virtual warehouses and storage layer, loading and transforming data using stages and the COPY command, optimizing performance through techniques such as dedicated warehouses, scaling, and caching, and administering security using roles and access control.
2. What will you learn?
Getting Started
Architecture
Unstructured Data
Performance
Load from AWS
Snowpipe
Time Travel
Fail Safe
Zero-Copy Cloning
Table Types
Data Sharing
Data Sampling
Scheduling Tasks
Streams
Materialized views
Data Masking
Visualizations
Partner Connect
Best practices
Loading Data
Copy Options
Access Management
Load from Azure
Load from GCP
3. Contents
Getting Started 4
Snowflake Architecture 7
Multi-Clustering 11
Data Warehousing 17
Cloud Computing 26
Snowflake Editions 31
Snowflake Pricing 34
Snowflake Roles 41
Loading Data 44
Performance Optimization 60
Snowpipe 88
Fail Safe and Time Travel 92
Table Types 97
Zero Copy Cloning 101
Swapping 104
Data Sharing 109
Data Sampling 117
Tasks & Streams 124
Materialized Views 142
Data Masking 155
Access Control 159
Snowflake & Other Tools 182
Best Practices 186
5. How can you get the most out of this course?
Make use of the Udemy tools
Own pace
Practice
Help others
Resources, quizzes & assignments
Ask questions
Enjoy learning!
6. Best practices
✓ Pay only for what you use
9. Snowflake Architecture
Cloud Services - Brain of the system -
Manages infrastructure, access control, security, query optimizer, metadata, etc.
Query Processing - Muscle of the system -
Performs MPP (Massively Parallel Processing)
Storage - Hybrid Columnar Storage -
Data saved in blobs
16. Scaling policy
Standard (default)
Description: Prevents/minimizes queuing by favoring starting additional clusters over conserving credits.
Cluster starts: Immediately when either a query is queued or the system detects that there are more queries than can be executed by the currently available clusters.
Cluster shuts down: After 2 to 3 consecutive successful checks (performed at 1-minute intervals), which determine whether the load on the least-loaded cluster could be redistributed to the other clusters.
Economy
Description: Conserves credits by favoring keeping running clusters fully loaded rather than starting additional clusters. Result: may result in queries being queued and taking longer to complete.
Cluster starts: Only if the system estimates there's enough query load to keep the cluster busy for at least 6 minutes.
Cluster shuts down: After 5 to 6 consecutive successful checks …
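To make the policy concrete, here is a minimal hedged sketch of creating a multi-cluster warehouse with the Economy policy (the warehouse name and sizing are illustrative, not from the deck):
CREATE OR REPLACE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE = 'XSMALL'       -- size determines credits consumed per hour
    MIN_CLUSTER_COUNT = 1           -- multi-cluster range
    MAX_CLUSTER_COUNT = 3
    SCALING_POLICY = 'ECONOMY'      -- favor conserving credits over starting clusters
    AUTO_SUSPEND = 60               -- suspend after 60 seconds idle
    AUTO_RESUME = TRUE;
Note that multi-cluster warehouses require the Enterprise edition, as the editions overview below lists.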
32. Snowflake Editions
Standard: introductory level
Enterprise: additional features for the needs of large-scale enterprises
Business Critical: even higher levels of data protection for organizations with extremely sensitive data
Virtual Private: highest level of security
33. Snowflake Editions
Standard:
✓ Complete DWH
✓ Automatic data encryption
✓ Time travel up to 1 day
✓ Disaster recovery for 7 days beyond time travel
✓ Secure data share
✓ Premier support 24/7
Enterprise:
✓ All Standard features
✓ Multi-cluster warehouse
✓ Time travel up to 90 days
✓ Materialized views
✓ Search Optimization
✓ Column-level security
Business Critical:
✓ All Enterprise features
✓ Additional security features such as data encryption everywhere
✓ Extended support
✓ Database failover and disaster recovery
Virtual Private:
✓ All Business Critical features
✓ Dedicated virtual servers and completely separate Snowflake environment
35. Snowflake Pricing
Compute:
✓ Charged for active warehouses per hour
✓ Depending on the size of the warehouse
✓ Billed by second (minimum of 1 min)
✓ Charged in Snowflake credits
Storage:
✓ Monthly storage fees
✓ Based on average storage used per month
✓ Cost calculated after compression
✓ Cloud providers
39. Snowflake Pricing
On Demand Storage vs. Capacity Storage
✓ We think we need 1 TB of storage
Region: US East (Northern Virginia)
Platform: AWS
On Demand ($40/TB):
❖ Scenario 1: 100 GB of storage used: 0.1 TB x $40 = $4
❖ Scenario 2: 800 GB of storage used: 0.8 TB x $40 = $32
Capacity ($23/TB, paid up front for the full 1 TB):
❖ Scenario 1: 100 GB of storage used: 1 TB x $23 = $23
❖ Scenario 2: 800 GB of storage used: 1 TB x $23 = $23
43. Snowflake Roles
ACCOUNTADMIN:
✓ Encapsulates SYSADMIN and SECURITYADMIN
✓ Top-level role in the system
✓ Should be granted only to a limited number of users
SECURITYADMIN:
✓ USERADMIN role is granted to SECURITYADMIN
✓ Can manage users and roles
✓ Can manage any object grant globally
SYSADMIN:
✓ Can create warehouses and databases (and more objects)
✓ Recommended that all custom roles are assigned to it (see the sketch after this list)
USERADMIN:
✓ Dedicated to user and role management only
✓ Can create users and roles
PUBLIC:
✓ Automatically granted to every user
✓ Can create own objects like every other role (available to every other user/role)
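As a small illustration of that recommendation, a hedged sketch (role and user names are hypothetical) of creating a custom role and attaching it under SYSADMIN:
USE ROLE SECURITYADMIN;                       -- can manage users, roles and grants
CREATE ROLE sales_analyst;                    -- hypothetical custom role
GRANT ROLE sales_analyst TO ROLE SYSADMIN;    -- custom roles roll up to SYSADMIN
GRANT ROLE sales_analyst TO USER maria;       -- hypothetical user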
45. Loading Data
BULK LOADING:
✓ Most frequent method
✓ Uses warehouses
✓ Loading from stages
✓ COPY command
✓ Transformations possible
CONTINUOUS LOADING:
✓ Designed to load small volumes of data
✓ Loads automatically once files are added to stages
✓ Latest results for analysis
✓ Snowpipe (serverless feature)
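Snowpipe is only named here and is covered in its own section of the deck; for orientation, a minimal hedged sketch of a pipe (pipe, table, and stage names are hypothetical):
CREATE OR REPLACE PIPE employee_pipe
    AUTO_INGEST = TRUE                -- load as soon as new files land in the stage
    AS
    COPY INTO employees FROM @my_ext_stage;
AUTO_INGEST relies on event notifications from the external stage's cloud storage.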
46. Understanding Stages
✓ Not to be confused with data warehouse staging areas
✓ Location of data files from which data can be loaded
External Stage
Internal Stage
47. Understanding Stages
External Stage
✓ External cloud provider
▪ S3
▪ Google Cloud Platform
▪ Microsoft Azure
✓ Database object created in a schema
✓ CREATE STAGE (URL, access settings)
Note: Additional costs may apply if region/platform differs
Internal Stage
✓ Local storage maintained by Snowflake
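A minimal sketch of both stage types (bucket, credentials and object names are hypothetical placeholders):
-- External stage pointing to an S3 bucket
CREATE STAGE my_db.my_schema.my_s3_stage
  URL = 's3://my-bucket/files/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');
-- Internal stage managed by Snowflake
CREATE STAGE my_db.my_schema.my_internal_stage;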
48. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>', '<file_name2>' )
FILE_FORMAT = <file_format_name>
<copyOptions>
49. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
ON_ERROR = CONTINUE
50. Copy Options
✓ Validate the data files instead of loading them
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS
51. Copy Options
✓ Validate the data files instead of loading them
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS
RETURN_n_ROWS (e.g. RETURN_10_ROWS): validates and returns the specified number of rows; fails at the first error encountered
RETURN_ERRORS: returns all errors found by the COPY command
52. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
SIZE_LIMIT = num
✓ Specifies the maximum size (in bytes) of data loaded by the command (at least one file is always loaded)
✓ When the threshold is exceeded, the COPY operation stops loading
53. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
RETURN_FAILED_ONLY = TRUE | FALSE
✓ Specifies whether to return only files that have failed to load in the statement result
✓ DEFAULT = FALSE
54. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
TRUNCATECOLUMNS = TRUE | FALSE
✓ Specifies whether to truncate text strings that exceed the target column length
55. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
FORCE = TRUE | FALSE
✓ Specifies to load all files, regardless of whether they’ve been loaded previously and
have not changed since they were loaded
✓ Note that this option reloads files, potentially
duplicating data in a table
56. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
TRUNCATECOLUMNS = TRUE | FALSE
✓ Specifies whether to truncate text strings that exceed the target column length
✓ TRUE = strings are automatically truncated to the target column length
✓ FALSE = COPY produces an error if a loaded string exceeds the target column length
✓ DEFAULT = FALSE
57. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
SIZE_LIMIT = num
✓ Specifies the maximum size (in bytes) of data loaded by the command (at least one file is always loaded)
✓ When the threshold is exceeded, the COPY operation stops loading
✓ The threshold is checked after each file
✓ DEFAULT: null (no size limit)
58. Copy Options
COPY INTO <table_name>
FROM externalStage
FILES = ( '<file_name>' ,'<file_name2>')
FILE_FORMAT = <file_format_name>
PURGE = TRUE | FALSE
✓ DEFAULT: FALSE
✓ specifies whether to remove the data files from the stage
automatically after the data is loaded successfully
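Putting several of these options together, a hedged sketch (table, stage, file and format names are hypothetical placeholders carried over from the earlier examples):
COPY INTO my_db.my_schema.orders
  FROM @my_db.my_schema.my_s3_stage
  FILES = ('orders1.csv', 'orders2.csv')
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = CONTINUE        -- skip bad rows instead of aborting
  TRUNCATECOLUMNS = TRUE     -- truncate over-length strings
  PURGE = TRUE;              -- remove staged files after a successful load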
68. Dedicated virtual warehouses
Identify & Classify
✓ Identify & classify groups of workloads/users
✓ Example: BI team, Data Science team, Marketing department
Create dedicated virtual warehouses
✓ For every class of workload, create a warehouse & assign the users
70. Considerations
✓ If you use at least Enterprise Edition, all warehouses should be multi-cluster
✓ Minimum: default should be 1
✓ Maximum: can be very high
Let's practice!
71. How does it work in Snowflake?
(Diagram: data sources → ETL/ELT → Snowflake)
72. Scaling Up/Down
✓ Changing the size of the virtual warehouse depending on different workloads in different periods
Use cases
✓ ETL at certain times (for example between 4pm and 8pm)
✓ Special business events with more workload
✓ NOTE: the common scenario is increased query complexity, NOT more users (then scaling out would be better)
73. Scaling Out
Scaling Up: increasing the size of virtual warehouses → for more complex queries
Scaling Out: using additional warehouses / multi-cluster warehouses → for more concurrent users/queries
74. Scaling Out
✓ Handling performance related to large numbers of concurrent users
✓ Automating the process if you have a fluctuating number of users
75. Caching
✓ Automatic process to speed up queries
✓ If a query is executed twice, results are cached and can be re-used
✓ Results are cached for 24 hours or until the underlying data has changed
76. What can we do?
✓ Ensure that similar queries go on the same warehouse
✓ Example: Team of Data Scientists run similar queries, so they should all use
the same warehouse
77. Clustering in Snowflake
✓ In general Snowflake produces well-clustered tables
✓ Cluster keys are not always ideal and can change over time
✓ Snowflake automatically maintains these cluster keys
✓ Cluster keys can also be customized manually
78. What is a cluster key?
✓ Subset of columns (or expressions) used to co-locate the data in micro-partitions
✓ For large tables this improves the scan efficiency of our queries
79. What is a cluster key?
Event Date    Event ID   Customers   City
2021-03-12    134584     …           …
2021-12-04    134586     …           …
2021-11-04    134588     …           …
2021-04-05    134589     …           …
2021-06-07    134594     …           …
2021-07-03    134597     …           …
2021-03-04    134598     …           …
2021-08-03    134599     …           …
2021-08-04    134601     …           …
(Illustration: the rows are ordered by Event ID but not by Event Date.)
84. When to cluster?
✓ Clustering is not for all tables
✓ Mainly very large tables of multiple terabytes can benefit
85. How to cluster?
✓ Columns that are used most frequently in WHERE-clauses (often date columns for event tables)
✓ Columns that are frequently used in joins
✓ If you typically filter on two columns, the table can also benefit from two cluster keys
✓ Large enough number of distinct values to enable effective pruning
✓ Small enough number of distinct values to allow effective grouping
86. Clustering in Snowflake
CREATE TABLE <name> ... CLUSTER BY ( <column1> [ , <column2> ... ] )
CREATE TABLE <name> ... CLUSTER BY ( <expression> )
ALTER TABLE <name> CLUSTER BY ( <expr1> [ , <expr2> ... ] )
ALTER TABLE <name> DROP CLUSTERING KEY
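Applied to the event-table illustration above, a hedged sketch (table and column names are hypothetical):
-- Cluster a large event table by its date column
CREATE TABLE events (
  event_date DATE,
  event_id   NUMBER,
  customer   STRING,
  city       STRING )
CLUSTER BY (event_date);
-- Change or remove the clustering key later
ALTER TABLE events CLUSTER BY (event_date, event_id);
ALTER TABLE events DROP CLUSTERING KEY;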
89. What is Snowpipe?
✓ Enables loading as soon as a file appears in a bucket
✓ Useful if data needs to be available immediately for analysis
✓ Snowpipe uses serverless features instead of warehouses
91. Setting up Snowpipe
Create Stage
✓ To have the connection to the data files
Test COPY COMMAND
✓ To make sure it works
Create Pipe
✓ Create pipe as object with COPY COMMAND
S3 Notification
✓ To trigger Snowpipe
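A minimal pipe definition, reusing the hypothetical stage and table from earlier (AUTO_INGEST relies on the S3 event notification):
CREATE PIPE my_db.my_schema.my_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO my_db.my_schema.orders
  FROM @my_db.my_schema.my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
-- DESC PIPE shows the notification channel (ARN) needed for the S3 event setup
DESC PIPE my_db.my_schema.my_pipe;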
93. Time Travel
Standard: ✓ Time travel up to 1 day
Enterprise: ✓ Time travel up to 90 days
Business Critical: ✓ Time travel up to 90 days
Virtual Private: ✓ Time travel up to 90 days
RETENTION PERIOD: DEFAULT = 1
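A few hedged examples of Time Travel in action (orders is a hypothetical table; the query ID is deliberately elided):
-- Query the table as it was 10 minutes ago
SELECT * FROM orders AT (OFFSET => -60*10);
-- Query the state before a specific statement ran
SELECT * FROM orders BEFORE (STATEMENT => '<query_id>');
-- Restore a dropped table
UNDROP TABLE orders;
-- Raise the retention period (up to 90 days from Enterprise edition on)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;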
94. Fail Safe
✓ Protection of historical data in case of
disaster
✓ Non-configurable 7-day period for permanent
tables
✓ No user interaction & recoverable only by
Snowflake
✓ Period starts immediately after Time Travel period
ends
✓ Contributes to storage cost
96. Continuous Data Protection Lifecycle
Current Data Storage
✓ Access and query data etc.
Time Travel (1 – 90 days)
✓ SELECT … AT | BEFORE
✓ UNDROP
Fail Safe (transient: 0 days, permanent: 7 days)
✓ No user operations/queries
✓ Restoring only by Snowflake support
✓ Recovery beyond Time Travel
98. Table types
Permanent – CREATE TABLE
✓ Time Travel Retention Period: 0 – 90 days
✓ Fail Safe
✓ Exists until dropped
✓ For permanent data
Transient – CREATE TRANSIENT TABLE
✓ Time Travel Retention Period: 0 – 1 day
× Fail Safe
✓ Exists until dropped
✓ Only for data that does not need to be protected (non-permanent data)
Temporary – CREATE TEMPORARY TABLE
✓ Time Travel Retention Period: 0 – 1 day
× Fail Safe
✓ Exists only in the session
✓ For non-permanent data
99. Table types – Managing Storage Cost
✓ Same overview as above; since transient and temporary tables have no Fail Safe (and at most 1 day of Time Travel), they help manage storage cost
100. Table types notes
✓ These types are also available for other database objects (database, schema etc.)
✓ Temporary tables have no naming conflicts with permanent/transient tables; a temporary table with the same name will effectively hide the other table!
102. Zero-Copy Cloning
✓ Create copies of a database, a schema or a table
✓ Cloned object is independent of the original object
✓ Easy way to copy all metadata & improved storage management
✓ Useful for creating backups and for development purposes
✓ Also works together with Time Travel
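A hedged sketch (object names are hypothetical placeholders):
-- Clone a table for development; storage is shared until either side changes
CREATE TABLE orders_dev CLONE orders;
-- Combine cloning with Time Travel: clone the state from one hour ago
CREATE TABLE orders_backup CLONE orders AT (OFFSET => -60*60);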
110. Data Sharing
✓ Traditionally a rather complicated process
✓ In Snowflake: data sharing without an actual copy of the data & always up to date
✓ Shared data can be consumed with the consumer's own compute resources
✓ Non-Snowflake users can also access shares through a reader account
115. Data Sharing with Non-Snowflake Users
(Diagram: Account 1 shares data with a Reader Account, which uses its own compute resources)
116. Sharing with Non-Snowflake users
New Reader Account
✓ Independent instance with own URL & own compute resources
Share data
✓ Share database & table
Create database
✓ In the reader account, create a database from the share
Create Users
✓ As administrator, create users & roles
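A hedged sketch of the provider side (share, account and object names are hypothetical; the account locator is elided):
-- Create the share and expose a database, schema and table through it
CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE my_share;
GRANT SELECT ON TABLE my_db.my_schema.orders TO SHARE my_share;
-- Create a reader account for non-Snowflake users
CREATE MANAGED ACCOUNT my_reader
  ADMIN_NAME = 'reader_admin',
  ADMIN_PASSWORD = '<strong_password>',
  TYPE = READER;
-- Attach the reader account to the share
ALTER SHARE my_share ADD ACCOUNTS = <reader_account_locator>;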
122. Data Sampling Methods
ROW / BERNOULLI method: every row is chosen with percentage p – more "randomness", suited to smaller tables
BLOCK / SYSTEM method: every block is chosen with percentage p – more effective processing, suited to larger tables
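Both methods in a minimal sketch (orders is a hypothetical table):
-- Roughly 10% of rows, sampled row by row
SELECT * FROM orders SAMPLE ROW (10);
-- Roughly 10% of micro-partitions (blocks), cheaper on very large tables
SELECT * FROM orders SAMPLE SYSTEM (10);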
125. Scheduling Tasks
✓ Tasks can be used to schedule SQL statements
✓ Standalone tasks and trees of tasks
Module steps: understand tasks → create tasks → schedule tasks → tree of tasks → check task history
126. Tree of Tasks
(Diagram: Root task → Task A, Task B; Task A → Task C, Task D; Task B → Task E, Task F)
✓ Every task has exactly one parent
127. Tree of Tasks
CREATE TASK ... AFTER <parent task> AS …
ALTER TASK ... ADD AFTER <parent task>
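A runnable sketch of a tiny task tree (warehouse, table and task names are hypothetical):
-- Root task runs every hour
CREATE TASK root_task
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO order_summary SELECT CURRENT_TIMESTAMP, COUNT(*) FROM orders;
-- Child task runs after the root finishes
CREATE TASK child_task
  WAREHOUSE = my_wh
  AFTER root_task
AS
  DELETE FROM staging_orders;
-- Tasks are created suspended; resume children first, then the root
ALTER TASK child_task RESUME;
ALTER TASK root_task RESUME;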
143. Materialized views
✓ We have a view that is queried frequently and that takes a long time to be processed
× Bad user experience
× More compute consumption
144. Materialized views
✓ We have a view that is queried frequently and that takes a long time to be processed
✓ We can create a materialized view to solve that problem
145. What is a materialized view?
✓ Created from a SELECT-statement
✓ Results are stored in a separate table that is updated automatically based on the base table
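A minimal sketch (an Enterprise-edition feature; table and view names are hypothetical):
CREATE MATERIALIZED VIEW orders_by_city
AS
SELECT city, COUNT(*) AS order_count
FROM orders
GROUP BY city;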
146. When to use MV?
✓ Weigh the benefits against the maintenance costs
147. When to use MV?
✓ The view would take a long time to be processed and is used frequently
✓ The underlying data does not change frequently, and only on a rather irregular basis
148. When to use MV?
If the data is updated on a very regular basis…
✓ Using tasks & streams could be a better alternative
149. Alternative – streams & tasks
(Pipeline: Underlying Table → Stream object → TASK with MERGE → VIEW / TABLE)
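A hedged sketch of that pipeline (names are hypothetical; for brevity the MERGE only handles inserted rows):
-- Stream captures changes on the base table
CREATE STREAM orders_stream ON TABLE orders;
-- Task merges the captured changes into the summary table
CREATE TASK merge_orders
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
MERGE INTO order_summary t
USING (SELECT city, COUNT(*) AS cnt FROM orders_stream GROUP BY city) s
  ON t.city = s.city
WHEN MATCHED THEN UPDATE SET t.order_count = t.order_count + s.cnt
WHEN NOT MATCHED THEN INSERT (city, order_count) VALUES (s.city, s.cnt);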
150. When to use MV?
✓ Don't use a materialized view if data changes are very frequent
✓ Keep maintenance cost in mind
✓ Consider leveraging tasks (& streams) instead
153. Limitations
× Joins (including self-joins) are not supported
× Limited set of aggregation functions; only the following are supported:
APPROX_COUNT_DISTINCT (HLL).
AVG (except when used in PIVOT).
BITAND_AGG.
BITOR_AGG.
BITXOR_AGG.
COUNT.
MIN.
MAX.
STDDEV.
STDDEV_POP.
STDDEV_SAMP.
SUM.
VARIANCE (VARIANCE_SAMP, VAR_SAMP).
VARIANCE_POP (VAR_POP).
154. Limitations
× Joins (including self-joins) are not supported
× Limited set of aggregation functions
× UDFs
× HAVING clauses
× ORDER BY clause
× LIMIT clause
164. Access Control
✓ Every object is owned by a single role (which can have multiple users)
✓ The owner (role) has all privileges by default
165. Key concepts
USER: ✓ People or systems
ROLE: ✓ Entity to which privileges are granted (role hierarchy)
PRIVILEGE: ✓ Level of access to an object (SELECT, DROP, CREATE etc.)
SECURABLE OBJECT: ✓ Objects to which privileges can be granted (Database, Table, Warehouse etc.)
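These concepts combined in a hedged sketch (role, user and object names are hypothetical):
-- Privileges are granted to a role on securable objects ...
CREATE ROLE analyst;
GRANT USAGE ON WAREHOUSE my_wh TO ROLE analyst;
GRANT USAGE ON DATABASE my_db TO ROLE analyst;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE analyst;
GRANT SELECT ON TABLE my_db.my_schema.orders TO ROLE analyst;
-- ... and the role is granted to users
GRANT ROLE analyst TO USER alice;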
168. Snowflake Roles
ACCOUNTADMIN
✓ Top-level role in the system
✓ Encompasses SYSADMIN and SECURITYADMIN
✓ Should be granted only to a limited number of users
SECURITYADMIN
✓ Can manage users and roles
✓ Can manage any object grant globally
✓ USERADMIN role is granted to SECURITYADMIN
SYSADMIN
✓ Can create warehouses and databases (and more objects)
✓ Recommended that all custom roles are assigned to it
USERADMIN
✓ Dedicated to user and role management only
✓ Can create users and roles
PUBLIC
✓ Automatically granted to every user
✓ Can create its own objects like every other role (available to every other user/role)
170. ACCOUNTADMIN
Top-Level Role
✓ Manage & view all objects
✓ All configurations on account level
✓ Account operations (create reader account, billing etc.)
✓ First user will have this role assigned
✓ Used for initial setup & managing account-level objects
Best practices
✓ Very controlled assignment strongly recommended!
✓ Multi-factor authentication
✓ At least two users should be assigned to that role
✓ Avoid creating objects with that role unless you have to
176. SYSADMIN
✓ Create & manage objects
✓ Create & manage warehouses, databases, tables etc.
✓ Custom roles should be assigned to the SYSADMIN role as the parent; then this role also has the ability to grant privileges on warehouses, databases, and other objects to the custom roles
177. SYSADMIN
(Hierarchy: SYSADMIN → Sales Admin Role → Sales Role; SYSADMIN → HR Admin Role → HR Role)
✓ Create a virtual warehouse & assign it to the custom roles
✓ Create a database and table & assign them to the custom roles
178. Custom roles
(Hierarchy: SYSADMIN → Sales Admin Role → Sales Role; SYSADMIN → HR Admin Role → HR Role)
✓ Customize roles to our needs & create our own hierarchies
✓ Custom roles are usually created by SECURITYADMIN
✓ Hierarchies should lead up to the SYSADMIN role
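The hierarchy from the diagram as a hedged sketch (role names are hypothetical placeholders):
USE ROLE SECURITYADMIN;
CREATE ROLE sales_admin;
CREATE ROLE sales_role;
-- Build the hierarchy bottom-up and attach it to SYSADMIN
GRANT ROLE sales_role TO ROLE sales_admin;
GRANT ROLE sales_admin TO ROLE SYSADMIN;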
179. USERADMIN
✓ Create Users & Roles (user & role management)
✓ Not for granting privileges (only on objects it owns)
181. PUBLIC
✓ Least privileged role (bottom of hierarchy)
✓ Every user is automatically assigned to this role
✓ Can own objects
✓ These objects are then available to everyone
188. How does it work in Snowflake?
(Diagram: data sources → ETL/ELT → Snowflake)
189. Virtual warehouse
✓ Best Practice #1 – Enable Auto-Suspend
✓ Best Practice #2 – Enable Auto-Resume
✓ Best Practice #3 – Set appropriate timeouts
Timeouts: ETL / Data Loading – immediately · BI / SELECT queries – 10 min · DevOps / Data Science – 5 min
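Those timeouts in a hedged sketch (warehouse names are hypothetical; AUTO_SUSPEND is in seconds):
-- ETL warehouse: suspend as soon as possible after the load
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
-- BI warehouse: keep caches warm for 10 minutes
ALTER WAREHOUSE bi_wh SET AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;
-- Data Science warehouse: 5 minutes
ALTER WAREHOUSE ds_wh SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;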
190. Table design
✓ Best Practice #1 – Appropriate table type
✓ Production tables – Permanent
✓ Development tables – Transient
✓ Staging tables – Transient
191. Table design
✓ Best Practice #1 – Appropriate table type
✓ Best Practice #2 – Appropriate data type
✓ Best Practice #3 – Set cluster keys only if necessary
▪ Large table
▪ Most query time spent on table scans
▪ Filters on dimensions
192. Retention period
✓ Best Practice #1 – Staging database – 0 days (transient)
✓ Best Practice #2 – Production – 4-7 days (1 day minimum)
✓ Best Practice #3 – Large high-churn tables – 0 days (transient)
Example storage for a high-churn table: Active 20 GB · Time Travel 400 GB · Fail Safe 2.8 TB
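A hedged sketch of how retention is set per object (table names are hypothetical):
-- Staging: transient table, no Time Travel, no Fail Safe
CREATE TRANSIENT TABLE staging_orders (id NUMBER, payload STRING)
  DATA_RETENTION_TIME_IN_DAYS = 0;
-- Production: a week of Time Travel
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;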