DF1 - ML - Petukhov - Azure Ml Machine Learning as a Service – MoscowDataFest
Presentation from Moscow Data Fest #1, September 12.
Moscow Data Fest is a free one-day event that brings together Data Scientists for sessions on both theory and practice.
Link: http://www.meetup.com/Moscow-Data-Fest/
Scala: the unpredicted lingua franca for data science – Andy Petrella
Talk given at Strata London with Dean Wampler (Lightbend) about Scala as the future of Data Science. The first part covers how Scala became important; the remainder of the talk is in notebooks using the Spark Notebook (http://spark-notebook.io/).
The notebooks are available on GitHub: https://github.com/data-fellas/scala-for-data-science.
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017 – Big Data Spain
All modern Big Data solutions, like Hadoop, Kafka or the rest of the ecosystem tools, are designed as distributed processes and as such include some sort of redundancy for High Availability.
https://www.bigdataspain.org/2017/talk/disaster-recovery-for-big-data
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Tech talk on what Azure Databricks is, why you should learn it and how to get started. We'll use PySpark and talk about some real-life examples from the trenches, including the pitfalls of accidentally leaving your clusters running and receiving a huge bill ;)
After this you will hopefully switch to Spark-as-a-service and get rid of your HDInsight/Hadoop clusters.
This is part 1 of an 8 part Data Science for Dummies series:
Databricks for dummies
Titanic survival prediction with Databricks + Python + Spark ML
Titanic with Azure Machine Learning Studio
Titanic with Databricks + Azure Machine Learning Service
Titanic with Databricks + MLS + AutoML
Titanic with Databricks + MLFlow
Titanic with DataRobot
Deployment, DevOps/MLops and Operationalization
Managing and Deploying High Performance Computing Clusters using Windows HPC ... – Saptak Sen
The new management features built into Windows HPC Server 2008 R2 are the foundation for deploying and managing HPC clusters of scale up to 1000 nodes. Join us for a deep dive in monitoring and diagnostic tools, a review of the updated heat-map and template-based deployment. We also cover the new PowerShell-based scripting capabilities: the basics of management shell, as well as the underlying design and key concepts, new Reporting Capabilities, and a discussion on network boot.
Adam Fuchs' presentation slides on what's next in the evolution of BigTable implementations (transactions, indexing, etc.) and what these advances could mean for the massive database that gave rise to Google.
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD – Adnan Masood
Spark is a unified framework for big data analytics. Spark provides one integrated API for use by developers, data scientists, and analysts to perform diverse tasks that would have previously required separate processing engines such as batch analytics, stream processing and statistical modeling. Spark supports a wide range of popular languages including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.
In this presentation we discuss Microsoft's HDInsight offering of Spark. Azure HDInsight is Microsoft's managed Hadoop and Spark cloud service that runs the Hortonworks Data Platform. Spark for Azure HDInsight offers customers an enterprise-ready Spark solution that is fully managed, secured, highly available, and made simpler for users with compelling and interactive experiences.
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ... – DataWorks Summit
The Census Bureau is the U.S. government's largest statistical agency with a mission to provide current facts and figures about America's people, places and economy. The Bureau operates a large number of surveys to collect this data, the most well known being the decennial population census. Data is being collected in increasing volumes and the analytics solutions must be able to scale to meet the ever increasing needs while maintaining the confidentiality of the data. Past data analytics have occurred in processing silos inhibiting the sharing of information and common reference data is replicated across multiple system. The use of the Hortonworks Data Platform, Hortonworks Data Flow and other open-source technologies is enabling the creation of a cloud-based enterprise data lake and analytics platform. Cloud object stores are used to provide scalable data storage and cloud compute supports permanent and transient clusters. Data governance tools are used to track the data lineage and to provide access controls to sensitive data.
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data – Hakka Labs
By Doug Daniels (Director of Engineering, Data Dog)
At Datadog, we collect hundreds of billions of metric data points per day from hosts, services, and customers all over the world. In addition to charting and monitoring this data in real time, we also run many large-scale offline jobs to apply algorithms and compute aggregations on the data. In the past months, we've migrated our largest data sets over to Apache Parquet, an efficient, portable columnar storage format.
Slides from the August 2021 St. Louis Big Data IDEA meeting from Sam Portillo. The presentation covers AWS EMR including comparisons to other similar projects and lessons learned. A recording is available in the comments for the meeting.
What is Splunk? At the end of this session you’ll have a high-level understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll see practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
In this knolx session, we will come to know about Delta Lake and its features. Delta Lake is one of the greatest innovations by Databricks that makes existing data lakes more scalable and reliable. Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of our existing data lake and is fully compatible with Apache Spark APIs.
Jump Start on Apache Spark 2.2 with Databricks – Anyscale
Apache Spark 2.0 and subsequent releases of Spark 2.1 and 2.2 have laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data.
In this introductory part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas:
Agenda:
• Overview of Spark Fundamentals & Architecture
• What’s new in Spark 2.x
• Unified APIs: SparkSessions, SQL, DataFrames, Datasets
• Introduction to DataFrames, Datasets and Spark SQL
• Introduction to Structured Streaming Concepts
• Four Hands-On Labs
Enabling Next Gen Analytics with Azure Data Lake and StreamSets – StreamSets Inc.
Big data and the cloud are perfect partners for companies who want to unlock maximum value from all of their unstructured, semi-structured, and structured data. The challenge has been how to create and manage a reliable end-to-end solution that spans data ingestion, storage and analysis in the face of the volume, velocity and variety of big data sources.
In this webinar, we will show you how to achieve big data bliss by combining StreamSets Data Collector, which specializes in creating and running complex any-to-any dataflows, with Microsoft's Azure Data Lake and Azure analytic solutions.
We will walk through an example of how a major bank is using StreamSets to transport their on-premise data to the Azure Cloud Computing Platform and Azure Data Lake to take advantage of analytics tools with unprecedented scale and performance.
Here I talk about examples and use cases for Big Data & Big Data Analytics and how we accomplished massive-scale sentiment, campaign and marketing analytics for Razorfish using a collection of database, Big Data and analytics technologies.
Using Spark and Riak for IoT Apps—Patterns and Anti-Patterns: Spark Summit Ea... – Spark Summit
Everybody agrees that IoT is changing the world… and creates new challenges for software developers, architects and DevOps. How can we build efficient and highly scalable distributed applications using open-source technologies? What are characteristics of data generated by IoT devices and how it differs from traditional enterprise or Big Data problems? Which architectural patterns are beneficial for IoT use cases and why some trusted methods eventually turn out to be “anti-patterns”? This talk will show how to combine best-of-breed open-source technologies, like Apache Spark, Riak and Mesos to build scalable IoT pipelines to ingest, store and analyze huge amounts of data, while keeping operational complexity and costs under control. We will discuss cons and pros of using relational, NoSQL and object storage products for storing and archiving IoT data. Then we cover best practices how to use Spark with Riak NoSQL database. Will describe how Apache Spark advanced modules (Spark SQL, Spark Streaming and MLlib) can solve the problems common to IoT apps, while using Riak for fast and scalable persistence. At the end, will explain why Structured Spark Streaming is a godsend for IoT data and make a case for Time Series databases deserving a separate category in NoSQL classification.
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big... – Big Data Spain
Hadoop clusters can store nearly everything in your data lake in a cheap and blazingly fast way. Answering questions and gaining insights out of this ever-growing stream becomes the decisive part for many businesses.
https://www.bigdataspain.org/2017/talk/fishing-graphs-in-a-hadoop-data-lake
Big Data Spain 2017
16th - 17th November Kinépolis Madrid
Streaming Real-time Data to Azure Data Lake Storage Gen 2 – Carole Gunst
Check out this presentation to learn the basics of using Attunity Replicate to stream real-time data to Azure Data Lake Storage Gen2 for analytics projects.
At our March Data Analytics Meetup, Dan Rodriguez and Cherian Mathew demonstrated the variations in Microsoft Azure programs and how they are impacting digital transformation.
In this session we present the different ways to use SQL Server in a cloud infrastructure (Microsoft Azure). We cover hybrid, migration, backup, and hosting scenarios for SQL Server databases in IaaS or PaaS mode.
Customer migration to Azure SQL database, December 2019 – George Walters
This is a real life story on how a software as a service application moved to the cloud, to azure, over a period of two years. We discuss migration, business drivers, technology, and how it got done. We talk through more modern ways to refactor or change code to get into the cloud nowadays.
Data relay introduction to big data clusters – Chris Adkin
Data relay introduction to SQL Server 2019 big data clusters deck, including a brief overview of containers, Kubernetes and a recorded demo available on youtube.
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack – Alluxio, Inc.
Alluxio Tech Talk
January 21, 2020
Speakers:
Matt Fuller, Starburst
Dipti Borkar, Alluxio
With the advent of the public clouds and data increasingly siloed across many locations -- on premises and in the public cloud -- enterprises are looking for more flexibility and higher performance approaches to analyze their structured data.
Join us for this tech talk where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency analytics platform. This stack provides a strong solution to run fast SQL across multiple storage systems including HDFS, S3, and others in public cloud, hybrid cloud, and multi-cloud environments. You’ll learn more about:
- The architecture of Presto, an open source distributed SQL engine
- How the Presto + Alluxio stack queries data from cloud object storage like S3 for faster and more cost-effective analytics
- Achieving data locality and cross-job caching with Alluxio regardless of where data is persisted
Azure Days 2019: Grösser und Komplexer ist nicht immer besser (Meinrad Weiss) – Trivadis
"Modern" data warehouse/data lake architectures often overflow with layers and services. Such systems can manage and analyze petabytes of data, but this comes at a price (complexity, latency, stability), and not every project is happy with this approach.
The talk shows the journey from a technology-infatuated solution to an environment tuned to the users' needs. It shows the bright and dark sides of massively parallel systems and aims to sharpen awareness for capturing real customer requirements.
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp... – Sumeet Singh
Since 2006, Hadoop and its ecosystem components have evolved into a platform that Yahoo has begun to trust for running its businesses globally. In this talk, we will take a broad look at some of the top software, hardware, and services considerations that have gone in to make the platform indispensable for nearly 1,000 active developers, including the challenges that come from scale, security and multi-tenancy. We will cover the current technology stack that we have built or assembled, infrastructure elements such as configurations, deployment models, and network, and what it takes to offer hosted Hadoop services to a large customer base.
Delivering Data Democratization in the Cloud with Snowflake – Kent Graziano
This is a brief introduction to Snowflake Cloud Data Platform and our revolutionary architecture. It contains a discussion of some of our unique features along with some real world metrics from our global customer base.
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey – Alluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Sandipan Chakraborty, Director of Engineering (Rakuten)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
10 Reasons Snowflake Is Great for Analytics – Senturus
Learn why Snowflake analytic data warehouse makes sense for BI including data loading flexibility and scalability, consumption-based storage and compute costs, Time Travel and data sharing features, support across a range of BI tools like Power BI and Tableau and ability to allocate compute costs. View this on-demand webinar: https://senturus.com/resources/10-reasons-snowflake-is-great-for-analytics/.
Senturus offers a full spectrum of services in business intelligence and training on Cognos, Tableau and Power BI. Our resource library has hundreds of free live and recorded webinars, blog posts, demos and unbiased product reviews available on our website at: http://www.senturus.com/senturus-resources/.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
The Art of the Pitch: WordPress Relationships and Sales – Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers without pulling teeth or pulling your hair out: practical tips and strategies for successful relationship building that leads to closing the deal.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality – Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
2. Argelo Royce P. Bautista
4 years in the IT industry
Full-stack developer
Enjoys creating reports (SSRS, Excel, & Power Query)
Enjoys exploring Microsoft technologies
Enjoys learning and knowledge sharing with the tech community
3. Hybrid Will Be the Most Common Use of the Cloud
Gartner says that by 2020, a corporate "no-cloud" policy will be as rare as a "no-internet" policy is today.
By 2019, more than 30 percent of the 100 largest vendors' new software investments will have shifted from cloud-first to cloud-only.
By 2020, more compute power will have been sold by IaaS and PaaS cloud providers than sold and deployed into enterprise data centers.
The Infrastructure as a Service (IaaS) market has been growing more than 40 percent in revenue per year since 2011, and it is projected to continue to grow more than 25 percent per year through 2019.
By 2019, the majority of virtual machines (VMs) will be delivered by IaaS providers. By 2020, the revenue for compute IaaS and Platform as a Service (PaaS) will exceed $55 billion, and likely pass the revenue for servers.
Reference: http://www.gartner.com/newsroom/id/3354117
4. Stretch feature of SQL Server 2016: a new way to archive your data.
Disclaimer: the majority of this presentation was taken from Joe Yong's presentation of Stretch DB and Microsoft's introduction of SQL Server 2016's new features at Microsoft Ignite 2015. The videos of Stretch DB and Always Encrypted were taken from Microsoft's "New SQL. No Equal" series at https://channel9.msdn.com.
5. Agenda
1. Introduction
2. Use cases
3. Enabling Stretch on a database
4. Stretching the table
5. Disabling stretching
6. Migrating data that satisfies a condition
7. Using a function to migrate data that satisfies a condition
8. Showing trickling and space used
9. Migrating child rows from related tables
10. Querying the stretched table
11. Querying related stretched tables
12. Backup and restore
13. Enabling Always Encrypted
14. Stretching temporal tables
15. Pricing
6.
7. Stretch SQL Server into Azure: securely stretch cold tables to Azure with remote query processing
Capability: stretch large operational tables from on-premises to Azure with the ability to query.
Benefits: BI integration for on-premises and cloud.
(Diagram: hot/active data such as Orders stays on-premises in an in-memory OLTP table; cold/closed data such as order history lives in the stretched table in Azure, with trickle data movement and remote query processing between on-premises and Azure.)
8. Databases today
Business users and decision makers: want or need to retain cold data for many years (indefinitely); cold data must be online; want to access cold data from the same application.
Database administrators: SAN consumption increasing faster than IT budgets; uncontrollable growth in size and sprawl; operations windows exceeding SLA (index, backup/restore); frequent storage acquisition and provisioning; users can't/won't say what data can be deleted.
(Chart: current solutions plotted against TCO, cost, performance, data access and admin overhead.)
10. StretchDB considerations
Goals: no change to access controls, DBA processes, applications, tools or operations; cold data always online; significantly lower storage TCO; access cold data with existing applications; easier performance and index maintenance; faster backup/restore; automatically managed and protected cold data.
Tradeoffs: moderate performance reduction for cold data access; update and delete on cold data become an administrative function; some functional limitations.
(Chart: Stretch Database vs. current solutions plotted against TCO, cost, performance, data access and admin overhead.)
29. Enabling tables and specifying criteria for cold data (filter via T-SQL using a function)
30. StretchDB – data movement
(Diagram: on-premises applications work against the local SQL Server instance with the database in SAN/local storage. For a dedicated archive table (Ord_detail_archive) the entire table is moved to Azure storage and compute; for a table that keeps hot and cold rows together (Txn_detail) only the cold rows are moved. Movement happens via trickle data migration.)
38. StretchDB – smart query processing
(Diagram: same topology as slide 30. Queries that touch local data only are answered on-premises; queries that need local + remote data use transparent remote data access against the migrated rows in Azure, while trickle data migration continues in the background.)
52. SELECT * FROM Department FOR SYSTEM_TIME AS OF '2010.01.01'
Facts: (1) history is much bigger than the actual data; (2) it is retained between 3 and 10 years; (3) "warm" data covers up to a few weeks/months; (4) "cold" data is rarely queried.
Solution: keep the history as a stretch table in Azure SQL Database, e.g. PeriodEnd < "Now - 6 months".
74. Questions?
Ask and you may or may not be answered.
The presentation will be uploaded on Docs.com.
Join us @ https://www.facebook.com/groups/phissug
Disclaimer: the majority of this presentation was taken from Joe Yong's presentation of Stretch DB and Microsoft's introduction of SQL Server 2016's new features at Microsoft Ignite 2015. The videos of Stretch DB and Always Encrypted were taken from Microsoft's "New SQL. No Equal" series at https://channel9.msdn.com.
75. References
Stretch Database: https://msdn.microsoft.com/en-us/library/dn935011.aspx
Stretching On-Premises Databases to the Cloud: https://channel9.msdn.com/events/Ignite/2015/BRK2574
SQL Server Stretch DB: https://channel9.msdn.com/Shows/Data-Exposed/SQL-Server-Stretch-DB
Stretch Database – securely and transparently leverage infinite storage and compute capacity in Azure with SQL Server 2016: https://channel9.msdn.com/events/DataDriven/SQLServer2016/StretchDatabase
Microsoft SQL Server 2016, Stretch Database: https://channel9.msdn.com/events/DataDriven/New-SQL-No-Equal/Microsoft-SQL-Server-2016-Stretch-Database
Introduction to Stretch Database: http://sqlperformance.com/2015/08/sql-server-2016/intro-stretch-database
If any of you have students, you can invite me as a resource speaker. I usually discuss C#, ASP.NET MVC, LINQ, MS SQL Server, MS Excel, Power BI, and Power Query. A certificate and food are enough.
The curious case of hybrid and cloud.
Source: https://msdn.microsoft.com/en-us/library/dn935011(v=sql.130).aspx
Stretch Database lets you archive your historical data transparently and securely. In SQL Server 2016 Community Technology Preview 2 (CTP2), Stretch Database stores your historical data in the Microsoft Azure cloud. After you enable Stretch Database, it silently migrates your historical data to an Azure SQL Database.
You don't have to change existing queries and client apps. You continue to have seamless access to both local and remote data.
Your local queries and database operations against current data typically run faster.
You typically enjoy reduced cost and complexity.
We have two conflicting requirements, and we need to find a middle ground between them.
The proposed solution is SQL Server 2016’s StretchDB
By enabling Stretch DB we give the Database Engine a way to connect to an Azure SQL Database and perform Stretch DB operations there, such as migrating rows and querying the remote database.
To enable Stretch on a database:
Right-click the desired database.
Point to Tasks.
Point to Stretch.
Click Enable.
This will take you to the Stretch DB wizard.
The wizard starts with a brief introductory window. Read it if it interests you; after reading, click Next.
Select the tables that contain the data to be migrated to Azure and then click Next.
By default, SQL Server migrates the whole table to Azure.
After selecting the tables, you will be prompted for your Azure credentials. Sign in to your Azure account and then proceed to the next steps.
After signing in you need to specify the following:
The Azure subscription that you are going to use.
The region where the server is located, or where the new server will be created if you opt for a new server.
The name of the server. This is auto-generated if you are creating a new server.
The server admin credentials. This will create a new server login and a new Azure SQL Database, regardless of whether you created a new SQL Server or not.
After specifying these, click Next.
Enter a password for the database master key. The database master key secures the credentials that Stretch Database uses to connect to the remote database.
This dialog allows you to create exceptions in Azure's firewall. At a minimum, add an exception for the source SQL Server's public IP.
This dialog shows the summary of the setup. It also includes an estimate of the cost of stretching your database.
When you proceed to stretching, you will have to wait until SQL Server is done with its checks and Azure is done with provisioning.
After everything is done, the icon of your database changes to the one shown above.
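If you prefer scripting over the wizard, the same database-level setup can be done in T-SQL. This is only a minimal sketch; the database name, server name, credential name, identity and passwords below are placeholders, not values from the demo:

-- Allow this SQL Server instance to use Stretch Database
EXEC sys.sp_configure N'remote data archive', 1;
RECONFIGURE;
GO
-- In the user database: a master key protects the credential used to reach Azure
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
CREATE DATABASE SCOPED CREDENTIAL StretchCredential
    WITH IDENTITY = '<azure sql admin login>', SECRET = '<azure sql admin password>';
-- Point the database at the remote Azure SQL server
ALTER DATABASE StretchDemo
    SET REMOTE_DATA_ARCHIVE = ON
        (SERVER = '<yourserver>.database.windows.net', CREDENTIAL = StretchCredential);
GO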
If you need to enable stretching on another table in the same database, just right-click the table, point to Stretch, and then click Enable. You will be shown a dialog similar to the one you have seen before.
This dialog shows the same introduction to Stretch DB.
This is similar to the previous dialog where you selected the tables that you want to stretch. However, this time only one table is shown because you specifically chose the table from Object Explorer.
This shows a summary of the operations that will be carried out.
After confirming the stretch, the wizard shows the status of the stretch; you have to wait until SQL Server finishes its operations.
After SQL Server is finished, the table is as good as stretched.
Please note that migration does not begin immediately, especially if a lot of this is happening on the server.
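The table-level equivalent of the wizard is a single ALTER TABLE statement; a minimal sketch, assuming a table named dbo.StretchDemoTable (hypothetical name):

-- Enable Stretch for one table and start moving rows to Azure
ALTER TABLE dbo.StretchDemoTable
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));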
If cold data is mixed with hot data in a single table, you can specify criteria for migration by clicking the cell under the Migrate column in the row for that table.
In this dialog you can specify the condition for migration. This dialog only allows simple conditions (one column only).
If you need a more complex condition, you can create a function, formatted as shown above, containing a WHERE clause. This WHERE clause determines whether a row is eligible for migration or not.
This MSDN documentation discusses the limitations on the complexity of the WHERE clause: https://msdn.microsoft.com/en-us/library/mt613432.aspx
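A sketch of that pattern, using a hypothetical table dbo.StretchDemoComplexFilter with an OrderDate column; the cutoff date is made up for illustration:

-- Inline table-valued function used as the migration predicate (must be schema-bound)
CREATE FUNCTION dbo.fn_stretchpredicate (@OrderDate datetime)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS is_eligible
       WHERE @OrderDate < CONVERT(datetime, '2015-01-01', 120);
GO
-- Rows for which the function returns a row are eligible for migration
ALTER TABLE dbo.StretchDemoComplexFilter
    SET (REMOTE_DATA_ARCHIVE = ON (
            FILTER_PREDICATE = dbo.fn_stretchpredicate(OrderDate),
            MIGRATION_STATE = OUTBOUND));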
Stretch DB migrates data to Azure clump by clump, and the pace usually depends on how fast your upload speed is. During the demonstration at Microsoft PH, Stretch DB was migrating 100k rows per transaction; when I prepared this presentation at home, it migrated 10k rows per transaction.
The frequency of these transactions depends on whether the server is busy or not. SQL Server handles this on its own.
To monitor the status of Stretch DB and its migration:
Right-click on the database.
Point to Tasks.
Point to Stretch.
Click Monitor.
A dashboard will show up in SSMS.
This dashboard shows an overview of the stretching, including the current size of the database (data + log), the Azure details, and the tables that are configured for stretch.
For each table configured for stretch, it shows the total rows in the table, the rows that are currently on the local server, and the rows that have been migrated to Azure.
You can also click "View Stretch Database Health Events"; this shows a table of events related to Stretch DB.
The table of events shows the name of each event and the time when it happened. As you can guess, "stretch_table_row_migration" is the event where the DB Engine migrates rows from the local database to Azure. It does not happen immediately after we configure Stretch DB; in this case it took my SQL Server a minute or two after I had successfully configured the table for stretching.
And as you can see, row migration happens on average about twice every minute on an idle server.
This network graph shows the burst of uploads and downloads whenever the SQL Server is migrating rows.
Other than the monitor, you can also see the migration status by using the DMV sys.dm_db_rda_migration_status. This DMV provides a more detailed view of the migration events.
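For example, a quick look at the most recent migration batches (a sketch; column list taken from the documented DMV, adjust to taste):

-- Latest migration batches, newest first
SELECT TOP (20) table_id, migrated_rows, start_time_utc, end_time_utc, error_number
FROM sys.dm_db_rda_migration_status
ORDER BY end_time_utc DESC;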
You can also monitor the space used by your database with the sp_spaceused stored procedure. The screenshot above shows three different uses of the stored procedure: the first shows the space used by the whole database, the second the space used by the local data, and the third the space used by the data in Azure.
As you can see, as we let Stretch DB migrate data from local to Azure, we free up space on our local servers.
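The three calls from the screenshot correspond to the @mode argument of sp_spaceused (a sketch; run it in the stretched database):

EXEC sp_spaceused @mode = 'ALL';          -- whole database: local + remote data
EXEC sp_spaceused @mode = 'LOCAL_ONLY';   -- space used by local data only
EXEC sp_spaceused @mode = 'REMOTE_ONLY';  -- space used by data migrated to Azure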
Comparing Stretch DB to a manually archived table: with a manually archived table, depending on your implementation, you might need additional SQL logic to include the archived data. With Stretch DB, on the other hand, querying a stretched table is the same as querying a normal table. Smart query processing automatically includes the data that resides in Azure in your query results, and it detects whether the query requires data from Azure or not.
In this screenshot, the StretchDemoSimpleFilter table migrates rows with ID values greater than 10000. So when we need rows with ID values below 10000, the query execution plan shows that we are querying normally.
On the other hand, if we issue a query that requires data from Azure, we see a different execution plan: one that includes a remote query action.
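Two queries that reproduce the behavior in the screenshots, using the StretchDemoSimpleFilter table and its ID column from the demo:

-- Served from local data only: all requested IDs are below the 10000 migration threshold
SELECT * FROM dbo.StretchDemoSimpleFilter WHERE ID < 5000;

-- Needs migrated rows, so the plan includes a remote query against the Azure SQL Database
SELECT * FROM dbo.StretchDemoSimpleFilter WHERE ID > 20000;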
When you stretch a child table, smart query processing is smart enough to know during a join that some of the child table's data is in Azure.
However, the smart query processor is not that smart when processing joins; it doesn't really know whether we only need the data that is on the local server.
But when you use the WHERE clause as a guide, smart query processing gets smarter.
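A sketch of what "using the WHERE clause as a guide" looks like; the parent table dbo.Orders and the OrderID join column are hypothetical names for illustration:

-- The predicate on the stretched child's ID column lets the optimizer answer the join locally
SELECT o.OrderID, d.ID
FROM dbo.Orders AS o
INNER JOIN dbo.StretchDemoSimpleFilter AS d
        ON d.OrderID = o.OrderID
WHERE d.ID < 5000;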
When you do a backup or restore, the DB Engine knows that it only needs to back up the data that is on the local server, so it doesn't download all the data from Azure just for the backup. The same goes for restore.
Here we can see a backup operation performed on a database that doesn't have any rows migrated to Azure.
Notice the difference in backup time after we have migrated a lot of rows to Azure.
We can also notice the difference in the size of the database backup file.
Here we can see a restore operation performed on a database that doesn't have any rows migrated to Azure.
I was expecting a faster restore, but weird things happened when I attempted one: my database restore speed wouldn't get back to 29 MB/sec.
One of the neat features of the cloud is agility in scaling. Instead of waiting for a hardware order, a few clicks are enough and the additional load is handled without further effort.
This scaling feature of Azure is also available in the Stretch DB service. Just adjust the number of DSUs that you want.
DSU stands for Database Stretch Unit. It is a unit of measure created by Microsoft so that Azure subscribers have a relative measure for comparing and sizing the Stretch DB service.
Source: http://channel9.msdn.com/Shows/Data-Exposed/Temporal-in-SQL-Server-2016
Example of using temporal tables with Azure SQL Database stretch tables.
I wasn't able to complete my screenshots because I ran out of Azure credits. If you don't mind experimenting, you can create a system-versioned table and stretch the history table. Make changes, and see whether smart query processing is applied to the FOR clause. During our demonstration, the FOR clause did not seem to exhibit that behavior.
You cannot stretch the system-versioned table itself, but you can stretch the history table.
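A minimal sketch of that setup, based on the Department example from slide 52 (column names are illustrative):

-- System-versioned (temporal) table with an explicit history table
CREATE TABLE dbo.Department
(
    DeptID       int PRIMARY KEY CLUSTERED,
    DeptName     nvarchar(50) NOT NULL,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.DepartmentHistory));
GO
-- Stretch only the history table; the current table stays local
ALTER TABLE dbo.DepartmentHistory
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));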
A normal query on the table will not hit the Azure Database because the temporal table itself is not stretched.
This shows that the smart query processor takes temporal-table queries into account. However, it is not behaving as I expected: the row values are not shown as of the specified date.
And when you try to disable Stretch Database on the history table, SSMS shows null-exception errors.
SQL Server Always Encrypted. When data is migrated from the local database to the Azure database, it is encrypted before being sent to Azure. But if that security is not enough, you can enable Always Encrypted on selected columns and proceed with the migration.
To encrypt columns, right click the database and click encrypt columns.
You will be taken into a wizard that shows an overview of the always encrypted feature.
Choose which columns are to be encrypted and select the encryption type that you want.
Configure master key and then press next.
This part of the wizard warns you about the resources that Always Encrypted will consume.
This shows the summary of the tasks that will be done.
After the SQL Server Database Engine finishes the encryption, you can proceed to stretch the table as normal.
Configure stretch database on the encrypted table.
After configuring and verifying that it has been stretched, we take a look at our encrypted table.
As you can see, even though the columns are encrypted, they are still stored in Microsoft Azure, remain encrypted there, and are still included in the query.
This shows that the query execution plan for a stretched table is the same as that of an encrypted table.
The pricing calculator is located at https://azure.microsoft.com/en-us/pricing/calculator/. It gives you a good estimate of how much it will cost to implement the Stretch DB feature.
Pausing the Stretch feature means temporarily stopping the migration of rows from the local server to Azure. To pause it, right-click the table, point to Stretch, and click Pause.
Clicking Pause shows a dialog like the one above; wait until the operation has succeeded and then click Close.
If you want to stop stretching, you can disable it by right-clicking the stretched table, pointing to Stretch, pointing to Disable, and clicking one of the two choices: bring back the data from Azure, or leave the data in Azure.
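The same pause/disable operations can be scripted; a minimal sketch against the hypothetical dbo.StretchDemoTable:

-- Pause migration (rows already in Azure stay there, no new rows are sent)
ALTER TABLE dbo.StretchDemoTable
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = PAUSED));

-- Disable Stretch and first copy the migrated rows back from Azure
ALTER TABLE dbo.StretchDemoTable
    SET (REMOTE_DATA_ARCHIVE (MIGRATION_STATE = INBOUND));

-- Disable Stretch and leave the migrated rows in Azure
ALTER TABLE dbo.StretchDemoTable
    SET (REMOTE_DATA_ARCHIVE = OFF_WITHOUT_DATA_RECOVERY (MIGRATION_STATE = PAUSED));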
If you choose to bring back the data, SQL Server will begin trickling data back from the Azure SQL Database, moving all of the data that was stored in Azure clump by clump.
Once the table's stretch has been disabled, data left in Azure will no longer be queried smartly.
You only see a strange execution plan when you try to hit the remote database with your queries.
In the event that you run out of Azure credits while a table is being stretched, you will not be able to run any queries involving the stretched table.
Regarding the question "Can I bring back my data if I run out of Azure credits?", I cannot answer this plainly. All I can say is that I could not get my data back (using the disable + bring back data from Azure option) after I ran out of credits. The Stretch DB monitor says: login account is not a valid Azure account.
Although, a few days after running out of credits, I happened to see this green graph on my chart. I'm pretty sure it won't show up as a charge on my credit card :P