Next Generation of Data Integration with Azure Data Factory
Learn more about Azure Data Factory, the easiest cloud-based data integration service at scale.
Hi, I’m Tom Kerkhove
Azure Architect
Microsoft Azure MVP & Advisor
@TomKerkhove
tomkerkhove
Agenda
| Basics of Azure Data Factory
| Demo
| How is it different from Logic Apps?
| Q&A
September 2018 Next Generation of Data Integration with Azure Data Factory 3
Azure Serverless
| Azure Logic Apps
| Azure Functions
| Azure Event Grid
| Azure Data Factory
Diagram: Azure Serverless scenarios across cloud and on-premises: storage, analytics & AI, 3rd-party integration, taking actions, decision making & reporting, hybrid integration and (industrial) IoT.
Basics of Azure Data Factory
What is Azure Data Factory?
| Managed data orchestration service
| Allows you to run pipelines
| Support for hybrid scenarios
| Support for executing SSIS packages
| Data movement-as-a-service with 70+ connectors
| Visual tooling & programmability
| .NET, Python, REST, ARM
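Because pipelines are programmable over REST, an on-demand run can be started with a single POST to the pipeline's `createRun` endpoint, with the pipeline parameters as the JSON body. The sketch below only builds the request URL and body; it does not send the request or obtain the Azure AD token that would be required. The subscription, resource group, factory, pipeline and parameter names are hypothetical placeholders, and `2018-06-01` is assumed to be the Data Factory V2 REST API version in use.

```python
import json
from urllib.parse import quote

API_VERSION = "2018-06-01"  # Data Factory V2 REST API version (assumed)


def build_create_run_request(subscription_id, resource_group, factory, pipeline, parameters=None):
    """Return (url, body) for an on-demand pipeline run.

    Sending the request and authenticating are deliberately left out of this sketch.
    """
    url = (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}/createRun"
        f"?api-version={API_VERSION}"
    )
    # Pipeline parameters travel as a flat JSON object in the request body.
    body = json.dumps(parameters or {})
    return url, body


if __name__ == "__main__":
    url, body = build_create_run_request(
        "00000000-0000-0000-0000-000000000000",  # hypothetical subscription id
        "rg-data",                               # hypothetical resource group
        "adf-demo",                              # hypothetical factory
        "CopyUserData",                          # hypothetical pipeline
        {"userEmail": "someone@example.com"},    # hypothetical parameter
    )
    print("POST", url)
    print(body)
```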
Anatomy of a data pipeline
Diagram: trigger(s) starting a pipeline made up of connected activities.
| A pipeline represents a business process that contains one or more steps, called activities.
| Triggers start a specific pipeline and can pass it parameters
| Activities represent a step in the business process that performs a specific action.
| Each activity can be chained to the outcome of the previous step: on success, failure, skip or completion
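In a pipeline's JSON definition, this chaining is expressed through each activity's `dependsOn` list, whose `dependencyConditions` take `Succeeded`, `Failed`, `Skipped` or `Completed`. The sketch below pairs a trimmed-down pipeline (activity names are hypothetical, `typeProperties` omitted) with a simplified local evaluator. It assumes activities are listed in dependency order and that an activity whose conditions are not met ends up `Skipped`; it approximates the behaviour for illustration and is not the service's scheduler.

```python
# Trimmed-down pipeline definition in Data Factory's JSON shape (hypothetical names).
PIPELINE = {
    "name": "CopyWithNotifications",
    "properties": {
        "activities": [
            {"name": "CopyData", "type": "Copy", "dependsOn": []},
            {"name": "NotifySuccess", "type": "WebActivity",
             "dependsOn": [{"activity": "CopyData", "dependencyConditions": ["Succeeded"]}]},
            {"name": "NotifyFailure", "type": "WebActivity",
             "dependsOn": [{"activity": "CopyData", "dependencyConditions": ["Failed"]}]},
            {"name": "Cleanup", "type": "Delete",
             "dependsOn": [{"activity": "CopyData", "dependencyConditions": ["Completed"]}]},
        ]
    },
}


def condition_met(condition, upstream_status):
    # "Completed" is satisfied whether the upstream activity succeeded or failed.
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return condition == upstream_status


def simulate(pipeline, outcomes):
    """Return each activity's final status, given the outcome it would have if it ran.

    Simplified local model: every dependency must match at least one of its conditions.
    """
    status = {}
    for activity in pipeline["properties"]["activities"]:
        runs = all(
            any(condition_met(c, status[dep["activity"]]) for c in dep["dependencyConditions"])
            for dep in activity["dependsOn"]
        )
        status[activity["name"]] = outcomes[activity["name"]] if runs else "Skipped"
    return status
```

If `CopyData` fails, `NotifyFailure` and `Cleanup` run while `NotifySuccess` is skipped; if it succeeds, the two notifications swap roles.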
Integration Runtime (IR)
| Compute infrastructure used by Data Factory
| Azure, Azure-SSIS or Self-Hosted (Any cloud or on-prem)
| Core capabilities
| Data movement
| Pipeline activity execution
| SSIS package execution
| The pipeline issues command & control; the integration runtime executes
| Data movement is from IR to IR
| All execution happens in the sources & sinks
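Which integration runtime does the work is configured per linked service through its `connectVia` reference; when it is omitted, the default Azure IR is used. The helper below builds a SQL Server linked service routed through a Self-Hosted IR so an on-premises database can be reached. The linked service, IR, server and database names are hypothetical, and credentials are left out of the connection string in this sketch.

```python
import json


def sql_server_linked_service(name, connection_string, integration_runtime=None):
    """Build a SqlServer linked service; if an IR name is given, route the
    connection through that integration runtime via `connectVia`."""
    properties = {
        "type": "SqlServer",
        "typeProperties": {"connectionString": connection_string},
    }
    if integration_runtime is not None:
        # Reference to an integration runtime (e.g. Self-Hosted) defined in the same factory.
        properties["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "properties": properties}


if __name__ == "__main__":
    linked_service = sql_server_linked_service(
        "OnPremUsersDb",                           # hypothetical name
        "Data Source=sql01;Initial Catalog=Users;",  # credentials omitted in this sketch
        integration_runtime="SelfHostedIR",        # hypothetical IR name
    )
    print(json.dumps(linked_service, indent=2))
```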
High-Level Overview
| Azure Data Factory
| Scheduling of pipelines
| Orchestrating the activities across Integration Runtimes
| Monitoring the progress
| Integration Runtime (IR)
| Execution engine
| Core capabilities:
| Data movement
| Pipeline activity execution
| SSIS package execution
Triggers
| Different types of triggers
| On-Demand
| Triggered via REST API, .NET, etc.
| Azure API Management can make this easier (http://bit.ly/api-management-adf)
| Scheduled / Wall-clock
| Tumbling Windows (aka “data slicing”)
| Event-based (e.g. a new file is added to Blob Storage)
| Support for passing parameters
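Tumbling windows slice time into fixed-size, contiguous, non-overlapping intervals, and each window's boundaries can be handed to the pipeline as parameters through `@trigger().outputs.windowStartTime` and `@trigger().outputs.windowEndTime`. The sketch below builds a `TumblingWindowTrigger` definition and, separately, computes locally which windows have elapsed between two instants to make "data slicing" concrete. Trigger, pipeline and parameter names are hypothetical, the start time is assumed to be UTC, and the window arithmetic is a local illustration rather than the service's scheduler.

```python
from datetime import datetime, timedelta, timezone


def tumbling_window_trigger(name, pipeline, start_utc, interval_hours=1):
    """Build an hourly TumblingWindowTrigger that passes each window's bounds to
    the pipeline (`windowStart`/`windowEnd` are hypothetical parameter names)."""
    return {
        "name": name,
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Hour",
                "interval": interval_hours,
                "startTime": start_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "maxConcurrency": 1,  # process one window at a time
            },
            "pipeline": {
                "pipelineReference": {"type": "PipelineReference", "referenceName": pipeline},
                "parameters": {
                    "windowStart": "@trigger().outputs.windowStartTime",
                    "windowEnd": "@trigger().outputs.windowEndTime",
                },
            },
        },
    }


def elapsed_windows(start, until, size):
    """Local illustration: yield the [start, end) windows fully elapsed by `until`."""
    current = start
    while current + size <= until:
        yield current, current + size
        current += size


if __name__ == "__main__":
    start = datetime(2018, 9, 1, tzinfo=timezone.utc)
    now = start + timedelta(hours=3, minutes=30)
    for window_start, window_end in elapsed_windows(start, now, timedelta(hours=1)):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

With a start at 00:00 UTC and the clock at 03:30, three hourly windows have elapsed; the 03:00 to 04:00 window is only processed once it has closed.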
Activities, data sets & linked services
| An activity can produce or consume a data set: a representation of a data structure in a data store that can be used as a source or sink.
| Linked services define how an activity connects to an external system, which can be a data store or a compute resource.
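In JSON these relationships are expressed by reference: a dataset names its linked service through `linkedServiceName`, and a Copy activity lists the datasets it consumes as `inputs` and the ones it produces as `outputs`. The sketch below uses Azure Blob Storage datasets with hypothetical names and folder paths (dataset properties are keyed by dataset name for brevity), plus a small local check that every reference resolves. That check is a consistency test for this example only, not a Data Factory feature.

```python
# Linked service properties keyed by name (hypothetical).
LINKED_SERVICES = {
    "UserDataStorage": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<kept in Key Vault in practice>"},
    },
}


def blob_dataset(folder_path, linked_service):
    """Properties of a blob dataset that points at a folder via a linked service."""
    return {
        "type": "AzureBlob",
        "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"},
        "typeProperties": {"folderPath": folder_path},
    }


DATASETS = {
    "RawProfiles": blob_dataset("raw/profiles", "UserDataStorage"),
    "ConsolidatedProfiles": blob_dataset("consolidated/profiles", "UserDataStorage"),
}

COPY_ACTIVITY = {
    "name": "CopyProfiles",
    "type": "Copy",
    "inputs": [{"referenceName": "RawProfiles", "type": "DatasetReference"}],            # consumed
    "outputs": [{"referenceName": "ConsolidatedProfiles", "type": "DatasetReference"}],  # produced
    "typeProperties": {"source": {"type": "BlobSource"}, "sink": {"type": "BlobSink"}},
}


def unresolved_references(activity, datasets, linked_services):
    """List dataset or linked service names referenced but not defined."""
    missing = []
    for ref in activity["inputs"] + activity["outputs"]:
        dataset = datasets.get(ref["referenceName"])
        if dataset is None:
            missing.append(ref["referenceName"])
        elif dataset["linkedServiceName"]["referenceName"] not in linked_services:
            missing.append(dataset["linkedServiceName"]["referenceName"])
    return missing
```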
Diagram: an activity consumes and produces data sets; a data set represents data stored in the system that a linked service connects to.
Activities
| Data Movement
| Azure, Databases, NoSQL, File, SaaS, Web, etc.
| Data Transformation
| Pig, Hive, Stored Procedure, U-SQL, ML, Spark, MapReduce, etc.
| Control Flow
| Web call, Lookup, Get Metadata, If, Wait, ForEach, Execute Pipeline, etc.
| Custom
| Run commands on an Azure Batch cluster
| Run R scripts on an HDInsight cluster
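Control-flow activities compose other activities. As an illustration, the sketch below builds a ForEach activity that iterates over an array pipeline parameter via `@pipeline().parameters.tables` and runs a Copy activity per element, reading the current element through `item()`. The activity names, the `tables` parameter and the SQL-to-blob copy are hypothetical, dataset references are omitted, and the 1 to 50 range check reflects the documented upper limit on parallel iterations (`batchCount`).

```python
def for_each_table(tables_param="tables", batch_count=4):
    """Build a ForEach activity that copies every table named in an array parameter."""
    if not 1 <= batch_count <= 50:
        raise ValueError("ForEach batchCount must be between 1 and 50")
    inner_copy = {
        "name": "CopyTable",
        "type": "Copy",
        # inputs/outputs dataset references omitted for brevity
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                # item() is the current element of the ForEach items array.
                "sqlReaderQuery": "@concat('SELECT * FROM ', item())",
            },
            "sink": {"type": "BlobSink"},
        },
    }
    return {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
            "items": {"value": f"@pipeline().parameters.{tables_param}", "type": "Expression"},
            "isSequential": False,  # iterations may run in parallel...
            "batchCount": batch_count,  # ...up to this many at once
            "activities": [inner_copy],
        },
    }
```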
Data Transformation
| Support for running activities against other Azure services
| Schema column mappings (with UI support)
| Visual Data Flow Authoring (private preview)
| Serverless scale-out transformation execution engine
| No knowledge of Spark, Scala, Python or Java is required
| Walkthrough on YouTube (http://bit.ly/adf-data-flow-preview)
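Schema column mappings are declared on the Copy activity's `translator`. The sketch shows a mapping in the explicit `TabularTranslator` form, where each entry pairs a source column name with a sink column name, plus a tiny local function that applies the same mapping to in-memory rows to show the effect. Column names are hypothetical; factories from this period also used a compact `columnMappings` string, so check the Copy activity schema-mapping documentation for the form your version expects.

```python
TRANSLATOR = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Id"}, "sink": {"name": "UserId"}},
        {"source": {"name": "DisplayName"}, "sink": {"name": "Name"}},
        {"source": {"name": "Reputation"}, "sink": {"name": "Reputation"}},
    ],
}


def apply_mappings(rows, translator):
    """Local illustration: rename mapped source columns to their sink names;
    columns without a mapping are not carried over."""
    pairs = [(m["source"]["name"], m["sink"]["name"]) for m in translator["mappings"]]
    return [{sink: row[source] for source, sink in pairs} for row in rows]
```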
Running SSIS packages in Azure
| Stores the SSIS catalog (SSISDB) in Azure SQL Database or Managed Instance
| Azure-SSIS integration runtime as the compute layer
| Compute part for running SSIS
| Managed cluster of Azure VMs
| Can be joined to a VNet for hybrid scenarios
| Lift & shift packages to the cloud
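Once the Azure-SSIS IR is running, a deployed package can be invoked from a pipeline with an Execute SSIS Package activity that points at the package in SSISDB and at the IR through `connectVia`. A minimal sketch, assuming a package deployed to SSISDB under a hypothetical folder/project/package path and an IR with a hypothetical name; optional settings such as logging level or environment path are left out.

```python
def execute_ssis_package(name, folder, project, package, ir_name):
    """Build an Execute SSIS Package activity for a package stored in SSISDB."""
    return {
        "name": name,
        "type": "ExecuteSSISPackage",
        "typeProperties": {
            # Package deployed to SSISDB, addressed as folder/project/package.
            "packageLocation": {"type": "SSISDB", "packagePath": f"{folder}/{project}/{package}"},
            # Azure-SSIS integration runtime that executes the package.
            "connectVia": {"referenceName": ir_name, "type": "IntegrationRuntimeReference"},
        },
    }


if __name__ == "__main__":
    # Hypothetical folder, project, package and IR names.
    print(execute_ssis_package("RunLegacyEtl", "Legacy", "Etl", "LoadUsers.dtsx", "AzureSsisIR"))
```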
Security
| Native support for Managed Service Identity (MSI)
| Native integration with Azure Key Vault
| Encryption in transit via HTTPS
| Supports encryption at rest in the data stores
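With the Key Vault integration, a linked service holds a reference to a secret instead of the secret itself: an `AzureKeyVaultSecret` object names a Key Vault linked service and a secret, and Data Factory resolves it at runtime (typically with its managed identity, which then needs read access to the vault's secrets). The helpers below build a Key Vault linked service and an Azure SQL Database linked service whose connection string comes from it. The vault URL, linked service and secret names are hypothetical.

```python
def key_vault_linked_service(name, vault_url):
    """Linked service pointing at an Azure Key Vault."""
    return {"name": name, "properties": {"type": "AzureKeyVault", "typeProperties": {"baseUrl": vault_url}}}


def secret_ref(vault_linked_service, secret_name):
    # Resolved by Data Factory at runtime; the secret value never appears in the definition.
    return {
        "type": "AzureKeyVaultSecret",
        "store": {"referenceName": vault_linked_service, "type": "LinkedServiceReference"},
        "secretName": secret_name,
    }


def azure_sql_linked_service(name, vault_linked_service, secret_name):
    """Azure SQL Database linked service whose connection string lives in Key Vault."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": secret_ref(vault_linked_service, secret_name)},
        },
    }
```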
Monitoring
| Visual monitoring in the portal
| Monitoring per pipeline run
| Detailed information per activity
| Azure Monitor integration
| Diagnostic Logs
| Metrics
| Alerts
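The per-activity details shown in the portal can also be retrieved programmatically; for example, the REST API's activity-run query for a pipeline run returns records with fields such as `activityName`, `status` and `durationInMs`. Assuming records of that shape (the sample below is made-up input, not real output), a few lines are enough to flag failed activities and find the slowest one.

```python
def summarize(activity_runs):
    """Return the names of failed activities and the slowest activity."""
    if not activity_runs:
        return {"failed": [], "slowest": None}
    failed = [r["activityName"] for r in activity_runs if r["status"] == "Failed"]
    # durationInMs may be missing or null for runs still in progress.
    slowest = max(activity_runs, key=lambda r: r.get("durationInMs") or 0)
    return {"failed": failed, "slowest": slowest["activityName"]}


SAMPLE_RUNS = [  # illustrative input only
    {"activityName": "CopyUserInfo", "status": "Succeeded", "durationInMs": 41000},
    {"activityName": "SendSummary", "status": "Failed", "durationInMs": 1200},
]
```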
Demo - Using Azure Serverless to become GDPR compliant
Using Azure Serverless to become GDPR compliant
| Every user should be able to request their data
Diagram: the user's profile information and the StackExchange data set, for a user identified by e-mail address (Kerkhove.tom@gmail.com).
Diagram: pipeline steps Send “Consolidation Started”, Copy User Info from DB, Consolidate User Info, Copy Consolidated Data and Send Summary.
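A pipeline like the demo's can be declared as a linear chain in which each step runs only after the previous one succeeded. The sketch below assumes the step order Send “Consolidation Started”, Copy User Info from DB, Consolidate User Info, Copy Consolidated Data, Send Summary. The activity names and types are hypothetical: Web activities stand in for the notifications, Copy activities for data movement, the consolidation step uses `Copy` purely as a placeholder type, and `typeProperties` are omitted. It is not the demo's actual definition.

```python
def chain(steps):
    """Turn [(name, type), ...] into activities where each one depends on the
    previous activity having succeeded."""
    activities = []
    previous = None
    for name, activity_type in steps:
        depends_on = []
        if previous is not None:
            depends_on.append({"activity": previous, "dependencyConditions": ["Succeeded"]})
        activities.append({"name": name, "type": activity_type, "dependsOn": depends_on})
        previous = name
    return activities


GDPR_STEPS = [  # hypothetical names and types
    ("SendConsolidationStarted", "WebActivity"),
    ("CopyUserInfoFromDb", "Copy"),
    ("ConsolidateUserInfo", "Copy"),  # placeholder type
    ("CopyConsolidatedData", "Copy"),
    ("SendSummary", "WebActivity"),
]
```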
How is this different from Logic Apps?
| Serverless orchestration
| Pay for what you use
| Data-centric (Data Factory) vs application-centric (Logic Apps) workflows
| Work together seamlessly
Conclusion
| Azure Data Factory is a great way to orchestrate data processes and build data-integration pipelines
| Allows you to get to market very quickly with the built-in connectors
| Very powerful for data-centric workloads
| A perfect match with Azure Logic Apps
| Unsung hero in the serverless space
Any question(s)?
Read more about my demo and other Azure adventures on codit.eu/blog!
Thank you for your attention!
