Frank Bell, Data Thought Leader and Snowflake SME at Accenture - CEO at ITS
We will cover all aspects of optimizing your Snowflake Data Cloud including:
*Dive deep into how Snowflake's pay-as-you-go costs work and how, by utilizing our proven optimization tools (Snoptimizer, our SaaS Snowflake optimizer - https://snoptimizer.com/), scripts, and architecture techniques, you can typically save 10-40% or more on your existing Snowflake account costs (a minimal cost-control sketch follows this list).
*Explain how Snowflake compute works and proven techniques for architecting warehouses for both cost and performance efficiency. We cover in depth how Snowflake scales BOTH out and in as well as up and down with compute resources.
*Explain how Snowflake data storage works with replication, Time Travel, and cloning. We explain these awesome features as well as their downsides if they are used and configured incorrectly.
*Cover Snowflake cloud services costs and the features that have costs related to them, including Snowpipe, Search Optimization, materialized views, auto-clustering, and other recent cost-based features that provide value at a cost.
*Finally, we will discuss how you can ensure your Snowflake account(s) are fully optimized not just for cost but also for security and performance. We will show you security and performance best practices as well as pitfalls to avoid.
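As a rough illustration of the kinds of cost controls described above, here is a minimal Python sketch built on the snowflake-connector-python package. The account, credentials, warehouse name, credit quota, and retention values are placeholder assumptions for illustration only; they are not recommendations from the session and are not part of Snoptimizer.

```python
# Minimal Snowflake cost-control sketch. Account, credentials, warehouse
# names, and all thresholds below are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder
    user="my_user",         # placeholder
    password="********",    # placeholder
    role="ACCOUNTADMIN",
)
cur = conn.cursor()

# 1. Find the warehouses that burned the most credits over the last 30 days.
cur.execute("""
    SELECT warehouse_name, SUM(credits_used) AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits DESC
""")
for warehouse_name, credits in cur.fetchall():
    print(warehouse_name, credits)

# 2. Suspend an idle warehouse quickly and bound its multi-cluster range
#    (multi-cluster warehouses require Enterprise edition or higher).
cur.execute("""
    ALTER WAREHOUSE reporting_wh SET
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
""")

# 3. Cap monthly spend with a resource monitor that notifies, then suspends.
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR monthly_cap
      WITH CREDIT_QUOTA = 100
      FREQUENCY = MONTHLY
      START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 90 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
""")
cur.execute("ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_cap")

# 4. Zero-copy clone for dev work, with a short Time Travel retention window
#    so the clone does not quietly accumulate storage costs.
cur.execute("CREATE DATABASE IF NOT EXISTS analytics_dev CLONE analytics")
cur.execute("ALTER DATABASE analytics_dev SET DATA_RETENTION_TIME_IN_DAYS = 1")

cur.close()
conn.close()
```

The metering query points you at the warehouses worth tuning first; the auto-suspend setting, cluster bounds, resource monitor, and short retention window are the kinds of knobs the session walks through in detail.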
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf - Chris Hoyean Song
I'm posting the slide presented at the Snowflake user group meet up.
NFT Bank has introduced DBT to rebuild and operate the entire data pipeline from scratch.
Data quality control and monitoring are critical as data is at the core of the company.
You can manage numerous data validation tests in an organized way, and you can add a data validation test with just a single line of YAML.
If you implement your data pipeline on top of DBT, you can build the data catalog and data lineage docs with little extra effort.
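To make the "single line of YAML" point concrete, here is a minimal sketch of a dbt schema file and the commands that run the tests and build the docs. The project layout, model name `orders`, and column `order_id` are hypothetical placeholders, not NFT Bank's actual project.

```python
# Minimal dbt test + docs sketch, meant to be run inside an existing dbt
# project. The model and column names are hypothetical placeholders.
import subprocess
from pathlib import Path

schema_yml = """
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests: [not_null, unique]   # one short line of YAML per column's tests
"""

Path("models").mkdir(exist_ok=True)
Path("models/schema.yml").write_text(schema_yml)

# Install package dependencies, run the data validation tests, then build
# the data catalog / lineage documentation site.
subprocess.run(["dbt", "deps"], check=True)
subprocess.run(["dbt", "test"], check=True)
subprocess.run(["dbt", "docs", "generate"], check=True)
```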
---
Session 1: Data Quality & Productivity
Data Quality
Data Quality Validation
Data Catalog, Lineage Documentation
DBT Introduction
Session 2: Integrate DBT with Airflow
DBT Cloud or Airflow?
Astronomer Cosmos
dbt deps
Session 3: Cost Optimization
Query Optimization
Cost Monitoring
Mastering Cloud Data Cost Control: A FinOps Approach - Denodo
Watch full webinar here: https://buff.ly/3uu8dEy
With the rise of cloud-first initiatives and pay-per-use systems, forecasting IT costs has become a challenge. It's easy to start small, but it's equally easy to get skyrocketing bills with little warning. FinOps is a discipline that tries to tackle these issues, by providing the framework to understand and optimize cloud costs in a more controlled manner. The Denodo Platform, being a middleware layer in charge of global data delivery, sits in a privileged position not only to help us understand where costs are coming from, but also to take action, manage, and reduce them.
Attend this session to learn:
- The importance of FinOps in a cloud architecture
- How the Denodo Platform can help you collect and visualize key FinOps metrics to understand where your costs are coming from
- What actions and controls the Denodo Platform offers to keep costs at bay
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps Approach - Denodo
Watch full webinar here: https://buff.ly/4bYOOgb
With the rise of cloud-first initiatives and pay-per-use systems, forecasting IT costs has become a challenge. It's easy to start small, but it's equally easy to get skyrocketing bills with little warning. FinOps is a discipline that tries to tackle these issues, by providing the framework to understand and optimize cloud costs in a more controlled manner. The Denodo Platform, being a middleware layer in charge of global data delivery, sits in a privileged position not only to help us understand where costs are coming from, but also to take action, manage, and reduce them.
Attend this session to learn:
- The importance of FinOps in a cloud architecture.
- How the Denodo Platform can help you collect and visualize key FinOps metrics to understand where your costs are coming from.
- What actions and controls the Denodo Platform offers to keep costs at bay.
Lessons Learned Replatforming A Large Machine Learning Application To Apache ... - Databricks
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
AWS December 2015 Webinar Series - Strategies to Quantify TCO & Optimize Cost... - Amazon Web Services
AWS allows customers to save money and optimize costs in multiple ways. By adopting AWS, organizations can reduce capital expenses, shift to an operating model, improve business performance, and drive savings over time. Organizations that adopt AWS have the tools to move from forecast-based capacity planning to an on-demand model with no termination fees or complex agreements. By moving to AWS, customers can reduce total cost of ownership (TCO) and continue to see increased savings over time. In addition to reducing TCO, AWS empowers customers to optimize costs by providing tools and partner solutions that help them identify what they are consuming and right-size the services their business needs, using services only when they are necessary for production. These solutions allow customers to pay only for what they need, for the right capacity and time of consumption, reducing idle time and unnecessary sunk costs.
In this webinar, you will learn strategies directly from an AWS Product Manager and understand how a customer (FINRA) used Splunk to develop a cost optimization model that helps drive value and continued lower costs.
Learning Objectives:
Dive deeper into the economics of the cloud and understand how AWS can positively impact your organization
Learn how a customer gained real-time visibility into instance cost and usage to reduce spending
Who Should Attend:
IT managers, Sr. IT professionals, business decision makers, procurement managers, developers, sys admins, operations
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud - Amazon Web Services
FINRA’s Data Lake unlocks the value in its data to accelerate analytics and machine learning at scale. FINRA's Technology group has changed its customer's relationship with data by creating a Managed Data Lake that enables discovery on Petabytes of capital markets data, while saving time and money over traditional analytics solutions. FINRA’s Managed Data Lake includes a centralized data catalog and separates storage from compute, allowing users to query from petabytes of data in seconds. Learn how FINRA uses Spot instances and services such as Amazon S3, Amazon EMR, Amazon Redshift, and AWS Lambda to provide the 'right tool for the right job' at each step in the data processing pipeline. All of this is done while meeting FINRA’s security and compliance responsibilities as a financial regulator.
ADV Slides: Building and Growing Organizational Analytics with Data Lakes - DATAVERSITY
Data lakes are providing immense value to organizations embracing data science.
In this webinar, William will discuss the value of having broad, detailed, and seemingly obscure data available in cloud storage for purposes of expanding Data Science in the organization.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... - DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Benefits of Cloud Hosting and SaaS Solutions for IT Solution Providers and th... - Janine Soika
This presentation provides information on what Cloud, Hosting, and SaaS solutions are, and how they can be resold by IT Solution providers to build a profitable monthly recurring business, and the cost savings benefits to end customers for using these solutions.
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional is absolutely sitting on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
Data Architecture Best Practices for Advanced Analytics - DATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are so many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not worked into many enterprise data programs yet. These are keepers and will be required to move towards, by one means or another, so it’s best to mindfully work them into the environment.
Estimating the Total Costs of Your Cloud Analytics Platform - DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
The Shifting Landscape of Data Integration - DATAVERSITY
Enterprises and organizations from every industry and scale are working to leverage data to achieve their strategic objectives — whether they are to be more profitable, effective, risk-tolerant, prepared, sustainable, and/or adaptable in an ever-changing world. Data has exploded in volume during the last decade as humans and machines alike produce data at an exponential pace. Also, exciting technologies have emerged around that data to improve our abilities and capabilities around what we can do with data.
Behind this data revolution, there are forces at work, causing enterprises to shift the way they leverage data and accelerate the demand for leverageable data. Organizations (and the climates in which they operate) are becoming more and more complex. They are also becoming increasingly digital and, thus, dependent on how data informs, transforms, and automates their operations and decisions. With increased digitization comes an increased need for both scale and agility at scale.
In this session, we have undertaken an ambitious goal of evaluating the current vendor landscape and assessing which platforms have made, or are in the process of making, the leap to this new generation of Data Management and integration capabilities.
Many companies have discovered that there is “gold” in their server log files and machine data. Closely monitoring this data can improve security, help prevent costly outages and reduce the time it takes to recover from a problem. In this presentation, GTRI’s Micah Montgomery explains how operational intelligence can be gained from machine data, and how Splunk Enterprise can turn this data into actionable insights. Also presenting was NetApp’s Steve Fritzinger, who discussed how to manage the challenges of capturing and storing a flood of data without breaking the bank.
Presented at "Denver Big Data Analytics Day" on May 18, 2016 at GTRI.
Curiosity and Lemontree present - Data Breaks DevOps: Why you need automated ... - Curiosity Software Ireland
This webinar was co-hosted by Curiosity and Lemontree on April 22nd, 2021. Watch the webinar on demand - https://opentestingplatform.curiositysoftware.ie/data-breaks-devops-webinar
DevOps and continuous delivery are only as fast as their slowest part. For many organisations, testing remains the major sticking point. It’s viewed as a necessary bottleneck, at fault for delaying releases, yet still unable to catch bugs before they hit production. One persistent, yet often overlooked, barrier is commonly at fault: test data. Data is the place to improve release velocity and quality today.
For many test teams today, test data delays remain their greatest bottleneck. Many still rely on a central team for data provisioning, before spending further time finding and making the data they need for a particular test suite. This siloed “request and receive” approach to data provisioning will always be a game of catch-up. Development is constantly getting faster, releasing systems that require increasingly complex data. Manually finding, securing and copying that data will never be able to keep up.
Delivering quality systems at speed instead requires on demand access to rich and interrelated data. With today’s technologies, that means “allocating” data during CI/CD processes and automated testing, making rich and compliant data available to parallel teams and frameworks automatically.
This webinar will present a pragmatic approach for moving from current test data processes to “just in time” data allocation. Veteran test data innovator, Huw Price, will offer cutting edge techniques for allocating rich test data from a range of sources on-the-fly. This “Test Data Automation” ensures that every test and tester has the data they need, exactly when and where they need it.
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it.
https://www.brighttalk.com/webcast/18317/422499
Learn more about the tools, techniques and technologies for working productively with data at any scale. This presentation introduces the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Jon Einkauf, Senior Product Manager, Elastic MapReduce, AWS
Alan Priestley, Marketing Manager, Intel and Bob Harris, CTO, Channel 4
How Greenhouse Software Unlocked the Power of Machine Data Analytics with Sum... - Amazon Web Services
Sumo Logic offers a powerful cloud-native analytics solution that supports all types of machine data. Our platform integrates easily with your AWS infrastructure supporting fast, accurate and secure analysis and monitoring of enormous amounts of data—giving you clear and direct visibility into its operations.
In this webinar, you’ll learn how organizations such as Greenhouse Software harness cloud-native machine data analytics to optimize internal and external process lifecycles, monitor the health of all AWS applications and services, and deliver a WOW application to their end users.
Watch full webinar here: https://bit.ly/3JlhTnT
In the last few years, Data Virtualization technology has experienced tremendous growth, emerging as a key component for enabling modern data architectures such as the logical data warehouse, data fabric, and data mesh.
Gartner recently named it “a must-have data integration component” and estimated that it results in 45% cost savings in data integration, while Forrester has estimated 65% faster data delivery than ETL processes.
However, there are still misconceptions in the market about data virtualization technology, how it can be leveraged, and the real benefits that it can provide.
Catch this on-demand session where we review these misconceptions and discuss:
- What data virtualization is and what it is not
- Key capabilities of a modern data virtualization platform
- How to leverage data virtualization for faster data delivery
Understanding the Cloud and the Benefits for the Accountancy Sector - Present... - LouisaHDUK
This seminar will provide an overview of the Cloud and its application to the accountancy sector. The session is aimed at accountants in practice and business who wish to understand more about the benefits and risks associated with Cloud Computing and how they might introduce it into their business.
We will define the Cloud; introduce the options available to your practice; highlight the benefits; and address the common concerns.
This will be put into context with real life case studies from some accountancy clients that have implemented our Cloud solutions. This will provide a valuable insight into the practicalities of going down a hosted route.
By the end of the seminar you will understand what is meant by Cloud Computing; appreciate the benefits and issues; and be able to evaluate if this type of solution is applicable to your business.
Similar to Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Data super hero
Data Con LA 2022 - Using Google trends data to build product recommendations - Data Con LA
Mike Limcaco, Analytics Specialist / Customer Engineer at Google
Measure trends in a particular topic or search term on Google Search across the US, down to the city level. Integrate these data signals into analytic pipelines to drive product, retail, and media (video, audio, digital content) recommendations tailored to your audience segment. We'll discuss how Google's unique datasets can be used with Google Cloud's smart analytics services to process, enrich, and surface the most relevant product or content that matches the ever-changing interests of your local customer segment.
Melinda Thielbar, Data Science Practice Lead and Director of Data Science at Fidelity Investments
From corporations to governments to private individuals, most of the AI community has recognized the growing need to incorporate ethics into the development and maintenance of AI models. Much of the current discussion, though, is meant for leaders and managers. This talk is directed to data scientists, data engineers, ML Ops specialists, and anyone else who is responsible for the hands-on, day-to-day work of building, productionalizing, and maintaining AI models. We'll give a short overview of the business case for why technical AI expertise is critical to developing an AI Ethics strategy. Then we'll discuss the technical problems that cause AI models to behave unethically, how to detect problems at all phases of model development, and the tools and techniques that are available to support technical teams in Ethical AI development.
Data Con LA 2022 - Improving disaster response with machine learning - Data Con LA
Antje Barth, Principal Developer Advocate, AI/ML at AWS & Chris Fregly, Principal Engineer, AI & ML at AWS
The frequency and severity of natural disasters are increasing. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response. Many organizations are accelerating their efforts to make their data publicly available for others to use. Repositories such as the Registry of Open Data on AWS and Humanitarian Data Exchange contain troves of data available for use by developers, data scientists, and machine learning practitioners. In this session, see how a community of developers came together through the AWS Disaster Response hackathon to build models to support natural disaster preparedness and response.
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas - Data Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what's new in the 6.0 release and Atlas following all the recent announcements made at MongoDB World 2022. Topics will include:
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Data Con LA 2022 - Real world consumer segmentation - Data Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and Python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm (a generic sketch follows this list).
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from "segment 1", "segment 2", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did teams across Shopkick change their approach given what Analytics had discovered?
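To make point 3 concrete, here is a generic sketch of a k-means segmentation with scikit-learn. The behavioral features, the toy values, and the choice of three clusters are invented for illustration; they are not Shopkick's actual data or pipeline.

```python
# Generic k-means user segmentation sketch; features and k are illustrative.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical per-user behavioral features (gathered upstream with SQL).
users = pd.DataFrame({
    "visits_per_week":   [1.0, 7.0, 2.0, 9.0, 0.5, 6.0],
    "offers_redeemed":   [0,   12,  1,   20,  0,   9],
    "days_since_active": [40,  1,   25,  2,   90,  3],
})

# Scale features so no single column dominates the distance metric.
X = StandardScaler().fit_transform(users)

# Fit k-means and attach a segment label to each user.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
users["segment"] = kmeans.fit_predict(X)

# Summarize each segment so it can be interpreted and named for other teams.
print(users.groupby("segment").mean())
```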
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo... - Data Con LA
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the best-known consumer software brands, serving 385K+ concurrent users at its peak. In this session, we start by looking at how user behavioral data and tax domain events are captured in real time using the event bus and analyzed to drive real-time personalization with various TurboTax data pipelines. We will also look at solutions performing analytics that make use of these events, with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena, and AWS Lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model to predict if a customer is at risk or needs help.
Data Con LA 2022 - Moving Data at Scale to AWS - Data Con LA
George Mansoor, Chief Information Systems Officer at California State University
Overview of the CSU Data Architecture on moving on-prem ERP data to the AWS Cloud at scale using Delphix for Data Replication/Virtualization and AWS Data Migration Service (DMS) for data extracts
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI - Data Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is getting more and more widely used for customer support and employee support use-cases. In this session, I'm going to talk about how it can be extended for data analysis and data science use-cases ... i.e., how users can interact with a bot to ask analytical questions on data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like "How many cases of Covid were there in the last 2 months by state and gender" or "Why did the number of deaths from Covid increase in May 2022", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration, and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we can bring together various features including natural-language understanding, NL-to-SQL translation, dialog management, data story-telling, semantic modeling of data, and augmented analytics to facilitate collaborative exploration of data using conversational AI.
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ... - Data Con LA
Anil Inamdar, VP & Head of Data Solutions at Instaclustr
The most modernized enterprises utilize polyglot architecture, applying the best-suited database technologies to each of their organization's particular use cases. To successfully implement such an architecture, though, you need a thorough knowledge of the expansive NoSQL data technologies now available.
Attendees of this Data Con LA presentation will come away with:
-- A solid understanding of the decision-making process that should go into vetting NoSQL technologies and how to plan out their data modernization initiatives and migrations.
-- They will learn the types of functionality that best match the strengths of NoSQL key-value stores, graph databases, columnar databases, document-type databases, time-series databases, and more.
-- Attendees will also understand how to navigate database technology licensing concerns, and to recognize the types of vendors they'll encounter across the NoSQL ecosystem. This includes sniffing out open-core vendors that may advertise as "open source," but are driven by a business model that hinges on achieving proprietary lock-in.
-- Attendees will also learn to determine if vendors offer open-code solutions that apply restrictive licensing, or if they support true open source technologies like Hadoop, Cassandra, Kafka, OpenSearch, Redis, Spark, and many more that offer total portability and true freedom of use.
Data Con LA 2022 - Intro to Data Science - Data Con LA
Zia Khan, Computer Systems Analyst and Data Scientist at LearningFuze
This Data Science tutorial is designed for people who are new to data science. It is a beginner-level session, so no prior coding or technical knowledge is required; just bring your laptop with WiFi capability. The session starts with a review of what data science is, the amount of data we generate, and how companies are using that data to get insight. We will pick a business use case and define the data science process, followed by a hands-on lab using Python and a Jupyter notebook. During the hands-on portion we will work with the pandas, numpy, matplotlib, and sklearn modules and use a machine learning algorithm to approach the business use case.
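As a tiny, generic taste of the hands-on flow described above (load data with pandas, then fit and evaluate a model with sklearn), here is a sketch on a made-up churn-style dataset; the columns, values, and model choice are placeholders, not the session's actual lab.

```python
# Toy end-to-end sketch: load, split, train, evaluate.
# The churn use case and all values are invented placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    "monthly_spend": [20, 85, 15, 95, 40, 70, 10, 60],
    "support_calls": [0, 3, 1, 4, 1, 2, 0, 3],
    "churned":       [0, 1, 0, 1, 0, 1, 0, 1],
})

X = df[["monthly_spend", "support_calls"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```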
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment - Data Con LA
Mariana Danilovic, Managing Director at Infiom, LLC
We will address:
(1) Community creation and engagement using tokens and NFTs
(2) Organization of DAO structures and ways to incentivize Web3 communities
(3) DeFi business models applied to Web3 ventures
(4) Why Metaverse matters for new entertainment and community engagement models.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat... - Data Con LA
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test coverage rates and deliver trustworthy analytical and operational data at scale. Several real world use cases from major banks/finance, insurance, health analytics, and Snowflake examples will be presented.
Key Learning Objective
1. Data journeys are complex and you have to ensure integrity of the data end to end across this journey from source to end reporting for compliance
2. Data Management tools do not test data, they profile and monitor at best, and leave serious gaps in your data testing coverage
3. Automation with integration into DevOps and DataOps CI/CD processes is key to solving this.
4. How this approach has impact in your vertical
Data Con LA 2022 - Perfect Viral Ad prediction of Superbowl 2022 using Tease, T... - Data Con LA
Arif Ansari, Professor at University of Southern California
A Super Bowl ad costs $7 million, and each year a few Super Bowl ads go viral. Traditional A/B testing does not predict virality. Some highly shared ads reach over 60 million organic views, which can be more valuable than views on TV. Not only are these views voluntary, but they are typically without distraction and win viewer engagement in the form of likes, comments, or shares. A Super Bowl ad that wins 69 million views on YouTube (e.g., Alexa Mind Reader) costs less than 10 cents per quality view! However, the challenge is triggering virality. We developed a method to predict virality and engineer virality into ads.
1. Prof. Gerard J. Tellis and co-authors recommended that advertisers use YouTube to tease, test, and tweak (TTT) their ads to maximize sharing and viewing. 2022 saw that maxim put into practice.
2. We developed viral Ads prediction using two scientific models:
a. Prof. Gerard Tellis et al.'s model for viral prediction
b. Deep Learning viral prediction using social media effect
3. The model was able to identify all of the top 15 viral ads; it performed better than the traditional agencies.
4. The newly proposed method is Tease, Test, Tweak, Target and Spots Ad.
Data Con LA 2022 - Embedding medical journeys with machine learning to improve... - Data Con LA
Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (10K cardinality), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
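For intuition, here is a minimal sketch of the word2vec-style approach described above, using gensim on made-up sequences of claim codes; the codes, sequence data, and hyperparameters are illustrative placeholders, not Aetna's actual training setup.

```python
# Word2vec-style embedding of claim codes; all data and settings are
# illustrative placeholders, not a production configuration.
from gensim.models import Word2Vec

# Each "sentence" is one member's sequence of diagnosis/procedure/drug codes.
member_sequences = [
    ["E11.9", "83036", "RX_METFORMIN", "E11.9"],
    ["I10", "RX_LISINOPRIL", "93000", "I10"],
    ["E11.9", "RX_METFORMIN", "83036"],
    ["I10", "93000", "RX_LISINOPRIL"],
]

model = Word2Vec(
    sentences=member_sequences,
    vector_size=16,   # dense, low-dimensional representation per code
    window=5,
    min_count=1,
    sg=1,             # skip-gram
    epochs=50,
)

# Dense vector for one code, and its nearest neighbours in embedding space.
print(model.wv["E11.9"][:4])
print(model.wv.most_similar("E11.9", topn=3))
```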
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
Data Con LA 2022 - Data Streaming with Kafka - Data Con LA
Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have fragmented data in siloed lines of business. In this talk, we will focus on identifying the legacy patterns and their limitations and introducing the new patterns backed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions that let organizations overcome bottlenecks in their data pipelines and modernize their digital assets so they are ready to scale their businesses. In summary, we will walk through three use cases and offer dos and don'ts and takeaways for data engineers, data scientists, and data architects developing forefront data-oriented skills.
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see growing demand and an evolving supply picture: supply is being reshaped by institutional investment rotating out of offices and into work from home (“WFH”), while demand is driven by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. vertices with the same in-links, helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be calculated easily; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
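As a small illustration of the first technique above (skipping vertices that have already converged), here is an illustrative Python sketch. It is not the STICD implementation; it assumes a networkx-style DiGraph with no dead ends and simply keeps an "active" frontier of vertices whose ranks may still change.

def pagerank_skip_converged(G, d=0.85, tol=1e-10, max_iter=100):
    # G: a networkx-style DiGraph with no dead ends.
    N = G.number_of_nodes()
    rank = {v: 1.0 / N for v in G}
    outdeg = dict(G.out_degree())
    active = set(G)  # vertices whose rank may still change
    for _ in range(max_iter):
        if not active:
            break
        next_active = set()
        for v in active:
            new = (1 - d) / N + d * sum(rank[u] / outdeg[u] for u in G.predecessors(v))
            if abs(new - rank[v]) > tol:
                # A change here can move the ranks of v and its out-neighbours next round.
                next_active.add(v)
                next_active.update(G.successors(v))
            rank[v] = new
        active = next_active
    return rank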
Data Con LA 2022 - Supercharge your Snowflake Data Cloud from a Snowflake Data super hero
1. Snowflake Data Cloud Optimization Fun! How to Optimize your Snowflake Data Cloud – Best Practices and Tips from a Snowflake Data Superhero
2. • Intro – Speaker and the Snowflake Data Cloud
• Snowflake Evolution and Difference
• Snowflake Consumption Pricing
• Snowflake Optimization
– Cost
– Performance
– Security
• Optimization Techniques
– Hard Way
– Easy Way (Automation Tools)
• New Cost and Resource Monitoring Tools coming in 2022
– Resource Groups
– Budget Assignment
DATACONLA – 2022 - Agenda
3. Frank Bell
*Main author of the book Snowflake Essentials
*Top Snowflake Data Thought Leader at Accenture
*Started Snowflake LA Users Group 2018
*Ran top Snowflake Data Cloud Consulting Practice
[acquired by Fairway/Accenture in 2019]
*25+ years of data .. “stuff”
*Created Snowflake Solutions and Snoptimizer
https://snowflakesolutions.net – Snowflake Business User and Developer Community with Knowledge Repository and Snowflake Tools
https://snoptimizer.com – Automated Snowflake Data Cloud Optimization Service
Snowflake Data Superhero
*Creator of Snowflake Solutions and Snoptimizer
4. What is the Snowflake Data Cloud?
• A cloud-based database and interconnected data system that can handle multiple workloads
• A fully connected data cloud with “no-copy” data sharing, data cloning, and data usage enabled within a cloud provider region
KEY POINTS:
*Unified system
*Connects companies and data providers to data
*Single and seamless experience
*Runs across multiple public clouds
*Removes massive amounts of friction from data access and processing
5. Snowflake Evolution
• 2014-2018 – Snowflake Database (focused on Data Warehouse)
  • Structured and Semi-Structured Data
• 2019-2020 – Snowflake Cloud Data Platform
• 2021 – Snowflake Data Cloud
  • Workloads: Data Warehousing, Data Lakes, Data Applications, Data Science, etc.
  • Data Types: Structured, Semi-Structured, Unstructured
6. Why is the Snowflake Data Cloud so different? (Part 1)
Frank’s Snowflake Differentiators (from all other data systems):
#1: Architected on cloud providers’ separated storage and compute
#2: Micro-Partition Architecture
#3: “no-copy” data cloning – enables DataOps and true Agile Data Systems/Applications
#4: “no-copy” data sharing – “game changer”
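As a concrete illustration of differentiator #3, a zero-copy clone takes a single statement. This is a small sketch using snowflake-connector-python; the database names and connection parameters are placeholders, not anything from the talk.

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="SYSADMIN")
cur = conn.cursor()
# The clone initially shares micro-partitions with the source, so it consumes
# extra storage only as the clone and the source diverge.
cur.execute("CREATE DATABASE PROD_DEV_CLONE CLONE PROD")
cur.close()
conn.close()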
7. Why is the Snowflake Data Cloud so different? (Part 2)
Frank’s Snowflake Differentiators (continued):
#5: Capability to process Structured, Semi-Structured, and fully Unstructured data all in one system
#6: End-to-end data processing and data science workloads in one fully secure and governed system
#7: in progress
#8: in progress
8. Snowflake Consumption-Based Pricing
• Overall, Snowflake is awesome, but it is not continuously optimized for cost, security, or performance
• Snowflake continues to add excellent features; many of them improve performance but also add cost, security, and performance complexity
9. Consumption-Based Pricing is Awesome!
Unless you have:
1. no optimizations
2. resource constraints!
Then it's not …
Some Snowflake customers with a budget of $50,000/year can blow through it in 3 days without optimization.
Medium to large customers who do not optimize extensively can miss out on saving $5,000+/month.
Automated Snowflake Data Cloud optimization is what keeps consumption pricing on Snowflake awesome!
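A rough worked example of how quickly consumption can outrun a budget. The per-size credit rates are Snowflake's published hourly rates; the $3-per-credit price is an assumption and varies by edition, cloud, and region.

# Back-of-the-envelope burn-rate check in Python.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8,
                    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128}
PRICE_PER_CREDIT = 3.00  # assumed; check your own contract

def daily_cost(size, hours_per_day=24):
    return CREDITS_PER_HOUR[size] * hours_per_day * PRICE_PER_CREDIT

print(2 * daily_cost("4XL"))              # two 4XLs left running 24x7: ~$18,432/day,
                                          # so a $50,000 budget is gone in under 3 days
print(daily_cost("M", hours_per_day=6))   # a Medium with auto-suspend, ~6 busy hours/day: ~$72/day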
10. Snowflake Optimization Areas
• Cost – Resource Monitors, warehouse optimization, etc.
• Security – Roles, Warehouses, Privileges, Grants, Stale Users, Network Policies, and 30 more tests
• Performance – Queuing, Spilling, etc.
11. Snowflake Cost Optimization Best Practices
• Cost – Resource Monitors, warehouse optimization, etc.
• Security – Roles, Warehouses, Privileges, Grants, Stale Users, Network Policies, and 30 more tests
• Performance – Queuing, Spilling, etc.
12. Cost Areas on Snowflake
• Query Consumption – most costs are found here
• Automatic Clustering – typically should be low
• Storage Consumption – watch out for stages & Time Travel
• Search Optimization – need to monitor
• Materialized Views – need to monitor
• Cloud Services – typically low, but you still need to monitor
• Pipes – typically low; very efficient
• Replication – need to monitor if you have replication set up
(An example query breaking out these cost areas follows this slide.)
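One way to see where these cost areas actually show up in your account is to break out the last 30 days of credits by service type from the SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY view. The sketch below assumes a role with access to the shared SNOWFLAKE database; connection parameters are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="ACCOUNTADMIN")
cur = conn.cursor()
cur.execute("""
    SELECT service_type, ROUND(SUM(credits_used), 1) AS credits
    FROM snowflake.account_usage.metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY service_type
    ORDER BY credits DESC
""")
for service_type, credits in cur.fetchall():
    print(service_type, credits)   # e.g. warehouse compute vs. pipes vs. auto-clustering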
13. Cost Optimization – Customer A (Before / After)
Before – Customer A Profile:
• Capacity Contract: $100,000/year
• Data Platform Size: 10+ TB
• Uses Snowpipe for IoT data
• $12,000+/month consumption costs
After – Optimization Results:
• Cost optimization around load warehouses with non-optimized suspend and parameter settings
• Exact code to fix the above issues (see the sketch after this slide)
• $10,200/month consumption costs
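The load-warehouse fix described for Customer A usually comes down to a few ALTER WAREHOUSE statements like the sketch below; LOAD_WH is a placeholder name and the exact settings depend on the load pattern.

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="SYSADMIN")
cur = conn.cursor()
cur.execute("ALTER WAREHOUSE LOAD_WH SET AUTO_SUSPEND = 60")         # suspend after 60s idle instead of minutes
cur.execute("ALTER WAREHOUSE LOAD_WH SET AUTO_RESUME = TRUE")        # wake only when work arrives
cur.execute("ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'SMALL'")  # right-size for the actual load volume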
14. Cost Optimization Tip #1
Credit consumption optimization using Resource Monitors (and soon, Resource Groups)
*Put actual code here on Resource Monitor best practices (an example sketch follows below)
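A minimal sketch of the resource-monitor pattern this tip refers to, assuming a monthly credit quota of 100 and a warehouse named LOAD_WH (both placeholders):

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="ACCOUNTADMIN")
cur = conn.cursor()
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR monthly_monitor
      WITH CREDIT_QUOTA = 100
           FREQUENCY = MONTHLY
           START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 80  PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
               ON 110 PERCENT DO SUSPEND_IMMEDIATE
""")
# Attach the monitor so the warehouse suspends once the quota is exhausted.
cur.execute("ALTER WAREHOUSE LOAD_WH SET RESOURCE_MONITOR = monthly_monitor")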
18. Snowflake Optimization Techniques
Technique: Reporting on costs / manual optimizations
  Services available: *link to cost reporting
  Pros: Quick to implement.
  Cons: Not comprehensive at all; very reactive versus proactive. Major problems can occur on cost, performance, and security very quickly.
Technique: Custom-coded automated optimization system
  Services available: (build your own)
  Pros: More thorough and typically more proactive than a report-and-find-errors culture.
  Cons: Extremely expensive and time-consuming. Easily outdated unless a large staff of performance experts is continuously engaged.
Technique: Human-based consulting or Snowflake Professional Services health checks
  Services available: Snowflake
  Pros: Usually engages reasonably detailed Snowflake consulting experts.
  Cons: Expensive and not repeatable. Prone to human error. Often outdated within days of being finished.
Technique: Fully automated SaaS services that automate optimization
  Services available: Snoptimizer; Nadilytics SaaS; Security [. ]
  Pros: Automates the tremendous complexity of optimizing Snowflake for cost, performance, and security continuously.
  Cons: Medium cost (?)
19. Snowflake Performance Optimization Best Practices
• Cost – Resource Monitors, warehouse optimization, etc.
• Security – Roles, Warehouses, Privileges, Grants, Stale Users, Network Policies, and 30 more tests
• Performance – Queuing, Spilling, etc.
23. Performance Optimization – Customer C (Before / After)
Before – Customer C Profile:
• Capacity Contract: $200,000/year
• Data Platform Size: 50++ TB
• 3TB++ table with non-optimized cluster keys; many queries running over several minutes
• Slower external table queries
After – Optimization Results:
• Improved cluster keys; queries previously above 1 minute all fell under 1 minute
• Materialized Views for certain External Tables – massive query performance improvement
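A sketch of the two Customer C fixes described above. Table, column, and view names are placeholders, the clustering columns should match the predicates your slow queries actually filter on, and materialized views over external tables assume an edition that supports them.

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="SYSADMIN")
cur = conn.cursor()
# 1. Re-cluster the large table on the columns most queries filter and join on.
cur.execute("ALTER TABLE big_events CLUSTER BY (event_date, customer_id)")
cur.execute("SELECT SYSTEM$CLUSTERING_INFORMATION('big_events', '(event_date, customer_id)')")
print(cur.fetchone()[0])   # review clustering depth/overlap once auto-clustering catches up
# 2. Materialize the hot columns of an external table so repeat queries avoid re-scanning files.
#    Assumes raw.events_ext exposes these columns.
cur.execute("""
    CREATE MATERIALIZED VIEW events_ext_mv AS
    SELECT event_id, event_ts, payload
    FROM raw.events_ext
""")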
24. Snowflake Security Optimization Best Practices
• Cost – Resource Monitors, warehouse optimization, etc.
• Security – Roles, Warehouses, Privileges, Grants, Stale Users, Network Policies, and 30 more tests
• Performance – Queuing, Spilling, etc.
29. Security Optimization – Customer B (Before / After)
Before – Customer B Profile:
• Capacity Contract: $50,000/year
• Data Platform Size: 1-2 TB
• Significant cost risk due to a large number of users having CREATE WAREHOUSE (at any size) granted
• Stale objects with too many users having access
• Stages not properly secured
After – Optimization Results:
• Improved RBAC; cost risk significantly reduced
• No stale objects
• Stages properly secured
• Continuous security analysis and protection
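A sketch of checks and fixes in the spirit of the Customer B findings. The role name and the 90-day staleness threshold are placeholders, and the ACCOUNT_USAGE views require a suitably privileged role.

import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...", role="ACCOUNTADMIN")
cur = conn.cursor()
# 1. Review which roles hold account-level privileges, then pull back CREATE WAREHOUSE
#    from roles that should not be able to spin up compute of any size.
cur.execute("SHOW GRANTS ON ACCOUNT")
print(cur.fetchall())
cur.execute("REVOKE CREATE WAREHOUSE ON ACCOUNT FROM ROLE ANALYST_ROLE")  # placeholder role
# 2. Flag stale users who have never logged in or have not logged in for 90+ days.
cur.execute("""
    SELECT name, last_success_login
    FROM snowflake.account_usage.users
    WHERE deleted_on IS NULL
      AND (last_success_login IS NULL
           OR last_success_login < DATEADD('day', -90, CURRENT_TIMESTAMP()))
""")
for name, last_login in cur.fetchall():
    print(name, last_login)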
30. Optimization Results
• In our tests, on average we have seen 10-30% cost savings, 1000s of security issues fixed, and 100s of performance problems solved.
• In some implementations we have seen almost 50% cost savings.
31. Contact Us
Frank Bell, Snowflake Data Superhero [3rd year]
fbell@itstrategists.com
https://snowflakesolutions.net
https://snoptimizer.com
Snowflake Questions?