Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)

This document discusses Azure Data Factory (ADF) and how it can be used to build and orchestrate data pipelines without code. It describes how ADF is a hybrid data integration service that improves on its previous version. It also explains how existing SSIS packages can be "lifted and shifted" to ADF to modernize solutions while retaining investments. The document demonstrates creating pipelines and data flows in ADF, handling schema drift, and best practices for development.

Pipelines and Packages:
Introduction to Azure Data Factory
Cathrine Wilhelmsen
Techorama NL · Oct 2, 2019

Pipelines and Packages: Introduction to Azure Data Factory
As Data Engineers and ETL Developers, our main responsibilities are to move, transform, integrate and
prepare data for our end users as quickly and efficiently as possible. With the ever-increasing volume
and variety of data, this can easily start to feel like a daunting task.
Azure Data Factory (ADF) is a hybrid data integration service that lets you build, orchestrate and monitor
complex and scalable data pipelines - without writing any code. The first version of Azure Data Factory
may not have lived entirely up to its nickname "SSIS in the Cloud", but the second version has been
drastically improved and expanded with new capabilities.
But wait, what's that? You have already invested years and millions in a comprehensive SSIS solution, you
say? No problem! You can lift and shift your existing SSIS packages into Azure Data Factory to start
modernizing your solution while retaining the investments you have already made.
In this session, we will first go through the fundamentals of Azure Data Factory and see how easy it is to
build new data pipelines or migrate your existing SSIS packages. Then, we will explore some of the major
improvements in Azure Data Factory v2, including the new Mapping Data Flows. Finally, we will look at
design patterns and best practices for development to speed up productivity while keeping costs down.

Data Warehousing
Business Intelligence
Artificial Intelligence
Big Data and Analytics
Machine Learning
Data Science

Collect
Store
Transform
Integrate
Prepare

What is Azure Data Factory?
Hybrid data integration service
Complex and scalable pipelines
No-code ETL/ELT data flows

What can you do in Azure Data Factory?
Copy Data
Transform Data

What is inside Azure Data Factory?
Pipelines
Activities
Datasets
Linked Services
Data Flows
Templates
Triggers
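
To make the building blocks above concrete, here is a minimal sketch of how a pipeline, a Copy activity, two datasets and their linked services refer to each other. It builds plain Python dictionaries shaped like the JSON that ADF stores behind its visual editor. All object names (SalesCsv, BlobStorageLS and so on) are hypothetical, and the property layout is an approximation that should be checked against the current ADF documentation. Triggers, templates and data flows are left out to keep it short.

```python
import json

# Hypothetical names throughout; only the "type" strings are meant to
# mirror ADF's own vocabulary.
def dataset(name, dataset_type, linked_service_name):
    """A dataset describes the data and points at it through a linked service."""
    return {
        "name": name,
        "properties": {
            "type": dataset_type,
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
        },
    }

def dataset_ref(ds):
    """Activities refer to datasets by name rather than embedding them."""
    return {"referenceName": ds["name"], "type": "DatasetReference"}

source = dataset("SalesCsv", "DelimitedText", "BlobStorageLS")
sink = dataset("SalesTable", "AzureSqlTable", "AzureSqlLS")

# A pipeline is a named collection of activities; this one has a single Copy.
pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySalesCsvToSql",
                "type": "Copy",
                "inputs": [dataset_ref(source)],
                "outputs": [dataset_ref(sink)],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The chain of references is the point here: activity to dataset to linked service. Connection details live in one place and are reused by every dataset that needs them.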

DEMO
Let's look inside
Azure Data Factory!

Wait…
I already have
thousands of SSIS
packages!

And…
You told me to use
this Biml thing!

What does Lift and Shift mean?
Lift up existing SSIS packages
Shift them to a new location

Why should you Lift and Shift SSIS?
Modernize while retaining investments
Continue to use familiar tools
Reduce maintenance and costs (*)

How do you Lift and Shift SSIS?
1. Configure Azure-SSIS Integration Runtime
2. Deploy SSIS Packages to SSISDB in Azure
3. Orchestrate SSIS Packages in Azure Data Factory
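
Step 3 can be sketched as a pipeline that runs packages one after another. The snippet below only builds Python dictionaries resembling the pipeline JSON. The runtime name, package paths and activity names are hypothetical, and the property layout of the Execute SSIS Package activity is an approximation to be verified against the current documentation. Steps 1 and 2 (creating the runtime and deploying to SSISDB) are assumed to be done already.

```python
def execute_ssis_package(name, package_path, runtime_name, after=None):
    """Build one Execute SSIS Package activity (approximate JSON shape)."""
    activity = {
        "name": name,
        "type": "ExecuteSSISPackage",
        "typeProperties": {
            # The Azure-SSIS Integration Runtime that will run the package.
            "connectVia": {
                "referenceName": runtime_name,
                "type": "IntegrationRuntimeReference",
            },
            # Path of the deployed package inside SSISDB.
            "packageLocation": {"packagePath": package_path},
        },
    }
    if after is not None:
        # Only start once the previous activity has succeeded.
        activity["dependsOn"] = [
            {"activity": after, "dependencyConditions": ["Succeeded"]}
        ]
    return activity

def orchestrate(package_paths, runtime_name):
    """Chain the packages so they run sequentially in the given order."""
    activities = []
    previous = None
    for index, path in enumerate(package_paths, start=1):
        name = f"RunPackage{index}"
        activities.append(
            execute_ssis_package(name, path, runtime_name, after=previous)
        )
        previous = name
    return {"name": "RunSsisPackages", "properties": {"activities": activities}}

# Hypothetical package paths and runtime name.
pipeline = orchestrate(
    ["Folder/Project/Extract.dtsx", "Folder/Project/Load.dtsx"],
    "MyAzureSsisIR",
)
```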

Azure-SSIS Integration Runtime
Managed cluster of Azure VMs dedicated to SSIS
Billed while running (like all VMs)
Manage cost by running when necessary
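
A rough back-of-the-envelope calculation shows why running the runtime only when necessary matters. The hourly rate below is a made-up placeholder, not a real Azure price, and the model ignores startup time and licensing. It only illustrates that cost scales with the hours the cluster is running.

```python
# Hypothetical price per node-hour; look up real prices before budgeting.
HOURLY_RATE = 1.00

def monthly_cost(hours_per_day, nodes=1, days=30, hourly_rate=HOURLY_RATE):
    """Cost of keeping the Azure-SSIS IR running for the given daily hours."""
    return hours_per_day * days * nodes * hourly_rate

always_on = monthly_cost(24)  # runtime never stopped
nightly = monthly_cost(2)     # started for a two-hour nightly load, then stopped
savings = always_on - nightly

print(f"always on: {always_on:.2f}, nightly window: {nightly:.2f}, saved: {savings:.2f}")
```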

DEMO
Let’s explore SSIS in
Azure Data Factory!

ADF vs SSIS
Pipeline ≈ Package
Linked Service ≈ Connection Manager
Source ≈ Source
Sink ≈ Destination
Activity ≈ Control Flow Task
Data Flow ≈ Data Flow

What are Mapping Data Flows?
Data transformation at scale
Runs on Azure Databricks
Visual editor, no-code experience

Why use Mapping Data Flows?
Transform Data
Upsert Data
Load a Data Warehouse
Handle schema drift

What is Schema Drift?
Rapidly changing source files and metadata:
• Added / Removed Columns
• Renamed Column Names
• Changed Data Types
If not handled properly, Schema Drift can (and most likely
will) cause problems in downstream pipelines
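
The three kinds of drift listed above can be illustrated with a small comparison of an expected schema against the schema that actually arrived. This is a plain Python sketch, not how Mapping Data Flows handle drift internally. The column names and type labels are invented for the example.

```python
def detect_drift(expected, actual):
    """Compare two {column: type} mappings and report the differences."""
    expected_cols, actual_cols = set(expected), set(actual)
    return {
        "added": sorted(actual_cols - expected_cols),
        "removed": sorted(expected_cols - actual_cols),
        "retyped": sorted(
            col for col in expected_cols & actual_cols
            if expected[col] != actual[col]
        ),
    }

# Hypothetical schemas: "name" was renamed, "amount" changed type,
# and "country" is new.
expected = {"id": "int", "name": "string", "amount": "decimal"}
actual = {"id": "int", "full_name": "string", "amount": "string", "country": "string"}

drift = detect_drift(expected, actual)
print(drift)
```

Note that a renamed column cannot be told apart from one removal plus one addition by names alone: here "name" shows up as removed and "full_name" as added.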

DEMO
Let's create some
mapping data flows!

Lessons Learned
In ADF, everything has a price
SSIS best practices != ADF best practices
Learn how to learn and adapt

@cathrinew
cathrinew.net
hi@cathrinew.net
thank you!

Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)

7. What? When? Why?

10. 10 years ago…

11. SSIS SQL Server Integration Services

13. Then…

14. ADF v1 Azure Data Factory Version 1

17. Today…

18. ADF v2 Azure Data Factory Version 2

20. Azure Data Factory

27. SSIS Lift and Shift

36. Mapping Data Flows

38. How do Mapping Data Flows work?

41. Schema Drift in SSIS

42. Schema Drift in ADF

43. Oh no!

46. Good luck!
