The document discusses disaster recovery scenarios for a BI solution involving Azure Synapse, Data Lake, and Data Share. Scenario 2 involves provisioning these services in a paired secondary region, then synchronizing the Data Lake, restoring the SQL Pool, and activating Synapse pipeline and Data Share triggers to bring a standby environment online. A step-by-step guide for implementing Scenario 2 covers provisioning, synchronization, restore, activation of pipelines and triggers, and notification of consumers. References are included.
4. BI007 - DR Scenarios
RTO: Recovery Time Objective
RPO: Recovery Point Objective
Normal BI solution:
RPO = 0 –> data can be re-imported from the sources
RTO = how long can the solution be down?
5. Data Share
To be prepared for a data center outage, the data provider can have a data share environment provisioned in a secondary region. Measures can be taken to ensure a smooth failover in the event that a data center outage does occur.
In this context, data consumers can have an active share subscription that is idle for DR purposes.
https://docs.microsoft.com/en-us/azure/data-share/disaster-recovery
6. Data Lake
Storage accounts that have hierarchical namespace enabled (such as for Data Lake Storage Gen2) are not supported for failover at this time.
https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance
Trigger Failover – not available:
Failover for storage accounts with hierarchical namespace enabled (Azure Data Lake Storage Gen2 storage accounts) is not supported at this time.
7. Data Lake – copying data as an alternative to failover
If your storage account is configured for read access to the secondary, then you can design your application to read from the secondary endpoint. If you prefer not to fail over in the event of an outage in the primary region, you can use tools such as AzCopy, Azure PowerShell, or the Azure Data Movement library to copy data from your storage account in the secondary region to another storage account in an unaffected region. You can then point your applications to that storage account for both read and write availability.
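As a rough illustration of that approach, the sketch below copies the contents of the read-accessible secondary endpoint to an account in an unaffected region with AzCopy, driven from PowerShell. The account names, container name and SAS tokens are placeholders, and the "-secondary" endpoint is only readable when RA-GRS/RA-GZRS is enabled.
# Placeholder account names, container and SAS tokens - substitute your own.
$sourceSas = "<sas_token_read_list>"
$targetSas = "<sas_token_write>"
# Read from the secondary (read-only) endpoint of the geo-redundant account...
$source = "https://<primary_account>-secondary.blob.core.windows.net/<container>?$sourceSas"
# ...and copy into an account in an unaffected region.
$target = "https://<other_account>.blob.core.windows.net/<container>?$targetSas"
azcopy copy $source $target --recursive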
8. Data Lake
RA on secondary
Geo-redundant storage (with GRS or GZRS) replicates your data to another physical location in the secondary region to protect against regional outages. However, that data is available to be read only if the customer or Microsoft initiates a failover from the primary to secondary region.
When you enable read access to the secondary region, your data is available to be read at all times, including in a situation where the primary region becomes unavailable. For read access to the secondary region, enable read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS).
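For illustration, a minimal Azure PowerShell sketch that switches an existing account to read-access geo-zone-redundant storage and then checks replication; resource group and account names are placeholder assumptions.
# Switch the account to read-access geo-zone-redundant storage (placeholder names).
Set-AzStorageAccount -ResourceGroupName "rg-bi007" -Name "dlbi007prd" -SkuName "Standard_RAGZRS"
# Check replication status and the last sync time (writes before this time have reached the secondary).
$acct = Get-AzStorageAccount -ResourceGroupName "rg-bi007" -Name "dlbi007prd" -IncludeGeoReplicationStats
$acct.GeoReplicationStats | Format-List Status, LastSyncTime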
9. Synapse
Geo-backups and disaster recovery
A geo-backup is created once per day to a paired data center. The RPO for a geo-restore is 24 hours. You can restore the geo-backup to a server in any other region where dedicated SQL pool is supported. A geo-backup ensures you can restore the data warehouse in case you cannot access the restore points in your primary region.
You can also create a user-defined restore point and restore from the newly created restore point to a new data warehouse in a different region. After you have restored, you have the data warehouse online and can pause it indefinitely to save compute costs. The paused database incurs storage charges at the Azure Premium Storage rate. Another common pattern for a shorter recovery point is to ingest data into primary and secondary instances of a data warehouse in parallel. In this scenario, data is ingested from a source (or sources) and persisted to two separate instances of the data warehouse (primary and secondary). To save on compute costs, you can pause the secondary instance of the warehouse. If you need an active copy of the data warehouse, you can resume, which should take only a few minutes.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/backup-and-restore
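A minimal sketch of creating a user-defined restore point with Azure PowerShell (Az.Sql), as mentioned above; server, database and resource group names are assumptions, and the same operation is available from the Azure portal.
# Create a user-defined restore point on the dedicated SQL pool (placeholder names).
New-AzSqlDatabaseRestorePoint -ResourceGroupName "rg-bi007" `
    -ServerName "sqlbi007" `
    -DatabaseName "dwbi007" `
    -RestorePointLabel "before-dr-test"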
10. Synapse
Move Synapse from one region to another
https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-move-workspace-from-one-region-to-another
Summary of steps:
• Provision a new Synapse instance and restore your last state to it from an automated or user-defined snapshot
• Grant the proper permissions
• Re-establish connections to Azure services
• Propagate the new connection parameters to the end users
• Mitigate model drift
11. Current Scenario
[Diagram] Main region: Synapse Workspace, SQL Pool, Data Lake (GZRS), Data Share. Pair region: secondary Data Lake (LRS); automated snapshot copied across regions. RPO < 24h (geo-backup copy), RPO = 8h (automated snapshot), RPO < 15m for data lake replication (no SLA). Data lake failover not available due to hierarchical namespace.
12. Current scenario
Data lake (RPO < 15 minutes):
• Activate RA-GRS – allows read-only access to the secondary region
SQL Pool (RPO 8 h + 24 h):
• Possibility to use a user-defined snapshot backup
13. Scenario 1 : recover from current scenario
[Diagram] Main region: Synapse Workspace, SQL Pool, Data Lake (GZRS), Data Share. Pair region (provisioned after the outage): new Synapse Workspace and SQL Pool, the read-only LRS Data Lake (Microsoft-activated failover), a new writable Data Lake, and Data Share. The SQL Pool is restored from the copied snapshot and the writable data lake is synced from the read-only one.
14. Scenario 1 : recover from current scenario
• Provision new Azure services – BI007 infrastructure: network, roles, data lake permissions…
• Wait for failover by Microsoft – allows read-only access in the secondary region (see the check sketch below)
• Restore the SQL Pool database
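While waiting, the state of the data lake account can be checked to see whether the Microsoft-managed failover has completed; a hedged sketch with placeholder names, based on the documented behaviour that a failed-over account reports the former pair region as primary and is converted to LRS.
# Check whether the Microsoft-managed failover has completed (placeholder names).
$acct = Get-AzStorageAccount -ResourceGroupName "rg-bi007" -Name "dlbi007prd"
$acct.PrimaryLocation   # reports the former pair region once failover has completed
$acct.Sku.Name          # expected to read Standard_LRS after failover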
15. Scenario 2 : Pair region provisioned stand by
[Diagram] Same layout as Scenario 1, but the pair-region services are provisioned up front and kept on stand by, with stepped activation: (1) sync the writable Data Lake from the read-only LRS replica (Microsoft-activated failover) and restore the SQL Pool from the copied snapshot, (2) activate Synapse pipeline triggers, (3) activate Data Share triggers.
16. Scenario 2 : Pair region provisioned stand by
The deployment is made to both regions – CI/CD?
All Azure services are deployed and configured
1. Activate data lake synchronization – read only to read/write – and restore the SQL Pool from the snapshot
2. Activate Synapse pipeline triggers
3. Activate Data Share triggers
17. Scenario 3 : current replicated hot stand by
[Diagram] Main region: Synapse Workspace, SQL Pool, Data Lake (GZRS), Data Share. Pair region: a second full stack – Synapse Workspace, SQL Pool, writable Data Lake (GZRS), Data Share – activated daily.
18. Scenario 3 : current replicated hot stand by
Two production systems, one in each region – deployed via CI/CD
Data Share and Synapse pipelines activated in both regions
Start/pause the SQL Pool as needed
Possibility to have two online systems
Two production systems to maintain
If the data lake replication is removed, the data lake cost stays roughly equal
Additional cost for the Synapse pipelines
Low RPO and RTO – replicated environment
20. Scenario 2 : Pair region provisioned stand by
[Diagram repeated from slide 15: pre-provisioned pair region on stand by with stepped activation – (1) data lake sync and SQL Pool restore, (2) Synapse pipeline triggers, (3) Data Share triggers.]
21. Scenario 2 : Pair region provisioned stand by
Pre-Requisites
Phase 0 : Provisioning and failover validation
Phase 1 : Data Lake synchronization and SQL Pool restore
Phase 2 : Activate Synapse trigger pipelines
Phase 3 : Activate Data Share triggers
Phase 4 : Adjust/Notify consumers for new endpoints
References
22. Pre-Requisites
Ensure that the current redundant Storage Account has read access to the secondary activated (RA-GZRS)
Ensure that a Read/Write Storage Account is provisioned in the secondary region, using ZRS
Ensure that the Synapse Dedicated SQL Pool geo-backup policy is enabled
(A minimal validation sketch follows below.)
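A hedged validation sketch for these pre-requisites, assuming the Az.Storage and Az.Sql modules and placeholder resource names:
# 1. Existing data lake account is readable on the secondary (RA-GZRS).
(Get-AzStorageAccount -ResourceGroupName "rg-bi007" -Name "dlbi007prd").Sku.Name     # expect Standard_RAGZRS
# 2. Read/Write standby account exists in the secondary region with ZRS.
(Get-AzStorageAccount -ResourceGroupName "rg-bi007-dr" -Name "dlbi007dr").Sku.Name   # expect Standard_ZRS
# 3. Geo-backup policy is enabled on the dedicated SQL pool.
Get-AzSqlDatabaseGeoBackupPolicy -ResourceGroupName "rg-bi007" -ServerName "sqlbi007" -DatabaseName "dwbi007"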
23. Phase 0 : Provisioning and failover validation
• Use current automation deployment strategy (Terraform)
• Provision all Azure services that represent the current MD Data hub infrastructure (as in the previous diagram)
• Ensure network, roles, data lake permissions, etc.
• Replicate Data Share subscription(s)
• Consider a secondary backup region from the data provider(s)
• Use current automation (DevOps CI/CD) to replicate the latest developments
• Periodic failover validation
24. Phase 1 : Data Lake synchronization and SQL Pool restore
Evaluate possible data loss (effective RPO)
These steps can run in parallel:
• Sync Data Lake (avg 3s/GB)
• Execute an AzCopy script to sync the Read Only Storage Account with the Read/Write Storage Account:
azcopy sync "https://<source_storage>.blob.core.windows.net/?<sas_token>" `
    "https://<destination_storage>.blob.core.windows.net/?<sas_token>" `
    --recursive --delete-destination=true
https://docs.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-sync?toc=/azure/storage/blobs/toc.json
• SQL Pool
• Restore (using PowerShell or the Azure portal – a PowerShell sketch follows below)
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-restore-from-geo-backup#restore-from-an-azure-geographical-region-through-powershell
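For the SQL Pool restore, a hedged Azure PowerShell sketch following the geo-restore pattern in the linked article; resource names and the performance level (DW1000c) are placeholder assumptions.
# Find the geo-backup of the primary dedicated SQL pool (placeholder names).
$geo = Get-AzSqlDatabaseGeoBackup -ResourceGroupName "rg-bi007" -ServerName "sqlbi007" -DatabaseName "dwbi007"
# Restore it onto the pre-provisioned server in the pair region.
Restore-AzSqlDatabase -FromGeoBackup `
    -ResourceGroupName "rg-bi007-dr" `
    -ServerName "sqlbi007dr" `
    -TargetDatabaseName "dwbi007" `
    -ResourceId $geo.ResourceId `
    -ServiceObjectiveName "DW1000c"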
25. Phase 2 : Activate Synapse trigger pipelines
Using the Synapse Workspace:
• Open Synapse Studio
• Go to Manage / Integration / Triggers (menu)
• Start the triggers
A scripted alternative is sketched below.
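If the activation needs to be scripted rather than done in Synapse Studio, something along these lines should work, assuming the Az.Synapse module and a placeholder workspace name:
# Start every trigger in the recovered workspace (placeholder workspace name).
Get-AzSynapseTrigger -WorkspaceName "synw-bi007-dr" |
    ForEach-Object { Start-AzSynapseTrigger -WorkspaceName "synw-bi007-dr" -Name $_.Name }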
26. Phase 3 : Activate Data Share triggers
Using Azure Portal:
• Go to Data Share Service
• Select Received Shares (on left menu)
• For each share subscription: on the snapshot schedule, enable the recurrence interval (a scripted sketch follows)
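The Az.DataShare module can likely script this as well; the sketch below is an assumption (verify the cmdlet and parameter names against the installed module) with placeholder resource names.
# Re-create a daily snapshot trigger on each received share subscription (placeholder names; verify cmdlet/parameters).
New-AzDataShareTrigger -ResourceGroupName "rg-bi007-dr" `
    -AccountName "ds-bi007-dr" `
    -ShareSubscriptionName "provider-share-sub" `
    -Name "daily-snapshot" `
    -RecurrenceInterval "Day" `
    -SynchronizationTime "2022-01-01T02:00:00Z"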
27. Phase 4 : Adjust/Notify consumers for new endpoints
• Distribute the new endpoints to consumers, or
• If DNS-based, point the existing DNS entries at the new endpoints
29. Ricardo Linhares
BI Specialist | Data & AI Solutions @ DevScope
Started with SQL Server 2000
r.linhas@gmail.com
https://www.linkedin.com/in/r-linhares/
https://twitter.com/RLinhas