© 2018 YASH Technologies | | Confidential
Azure Data Factory
- Mahesh Pandit
 Why Azure data Factory
 Introduction
 Steps involved in ADF
 ADF Components
 ADF Activities
 Linked Services
 Integration Runtime and its types
 How Azure Data Factory works
 Azure Data Factory V1 vs V2
 System Variables
 Functions in ADF
 Expressions in ADF
 Question- Answers
Why Azure Data Factory
 Modern DW for BI
 Modern DW for SaaS Apps
 Lift & Shift existing SSIS packages to Cloud
Modern DW for Business Intelligence
[Diagram: logs, files and media plus business/custom apps (on-premises and cloud apps and data) pass through the stages Ingest, Store, Prep & Train, Model & Serve, and Intelligence. Services shown: Data Factory, Azure Storage, Azure Databricks, Azure SQL, Azure Data Lake, Azure SQL Data Warehouse, Azure Analysis Services, Power BI.]
Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
Modern DW for SaaS Apps
[Diagram: logs, files and media plus business/custom apps (on-premises and cloud apps and data) pass through the stages Ingest, Store, Prep & Train, Model & Serve, and Intelligence. Services shown: Data Factory, Azure Storage, Azure Databricks; the result is served to a SaaS app used from browsers/devices.]
Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
Lift & Shift existing SSIS packages to Cloud
[Diagram: on-premises data sources (SQL Server) and cloud data sources connect through Data Factory; SQL DB Managed Instance is shown on the cloud side.]
Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
Azure Data
Integration Service
 It is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
 Data-driven workflows can be scheduled.
 Sources and destinations can be either on-premises or in the cloud.
 Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
How does it work?
 The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
Steps involved in ADF
 Connect and collect
 Transform and enrich
 Publish
 Monitor
Connect and collect
 The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.
 With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
 For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
Transform and enrich
 After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as:
 HDInsight Hadoop
 Spark
 Data Lake Analytics
 Machine Learning
Publish
 After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or another store, as the user needs.
Monitor
 After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.
 Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
ADF Components
 Pipeline
 Activity
 Datasets
 Linked Services
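ADF Components
[Diagram: relationships among the four components. Boxes: Dataset (table, file), Activity (hive, copy), Pipeline (schedule, monitor), Linked service (SQL Server, Hadoop). Relationship labels: "produces", "is a logical group of", "runs on", "data item stored in".]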
 An Azure subscription might have one or more Azure Data Factory instances (or data factories).
 Azure Data Factory is composed of four key components.
 These components work together to provide the platform on which you can compose data-driven workflows with
steps to move and transform data.
 A data factory might have one or more pipelines.
 A pipeline is a logical grouping of activities that performs a
unit of work.
 Together, the activities in a pipeline perform a task.
 The pipeline allows you to manage the activities as a set
instead of managing each one individually.
 The activities in a pipeline can be chained together to operate
sequentially, or they can operate independently in parallel.
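The chaining described above can be sketched in Python. This is only a model of the ordering logic, not a call into Data Factory: the activity names and the `execution_waves` helper are hypothetical, and the dictionary stands in for the predecessor list that each activity declares in the pipeline definition.

```python
from typing import Dict, List, Set


def execution_waves(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group activities into waves.

    Activities in the same wave have no unmet dependencies, so they could
    run in parallel; the waves themselves run one after another.
    """
    remaining: Dict[str, Set[str]] = {name: set(deps) for name, deps in depends_on.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # An activity is ready once every activity it depends on has finished.
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency between activities")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Hypothetical pipeline: two independent copies, then a transform, then a load.
pipeline = {
    "CopySales": [],
    "CopyCustomers": [],
    "TransformJoined": ["CopySales", "CopyCustomers"],
    "LoadWarehouse": ["TransformJoined"],
}
waves = execution_waves(pipeline)
print(waves)
```

The two copy activities land in the first wave because neither depends on the other, which is the "operate independently in parallel" case; the remaining two are chained sequentially.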
 To create a Data Factory pipeline, we can use any one of the following methods:
Data Factory UI, Copy Data tool, Azure PowerShell, REST API, Resource Manager template, .NET, Python
Pipeline Execution
 Triggers represent the unit of processing that
determines when a pipeline execution needs to
be kicked off.
 There are different types of triggers for different
types of events.
Pipeline Runs
 A pipeline run is an instance of the pipeline
 Pipeline runs are typically instantiated by
passing the arguments to the parameters
that are defined in pipelines.
 The arguments can be passed manually or
within the trigger definition.
 Parameters are key-value pairs of read-only configuration.
 Parameters are defined in the pipeline.
 The arguments for the defined parameters are
passed during execution from the run context
that was created by a trigger or a pipeline that
was executed manually.
 Activities within the pipeline consume the
parameter values.
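A minimal sketch of the parameter flow described above, written as plain Python rather than against any Data Factory API. The `start_run` helper and the parameter names are hypothetical; rejecting arguments for undefined parameters is a choice made for this sketch, not a documented behavior.

```python
from types import MappingProxyType
from typing import Any, Mapping


def start_run(defined: Mapping[str, Any], arguments: Mapping[str, Any]) -> Mapping[str, Any]:
    """Combine parameter defaults from the pipeline with arguments from the run context.

    `defined` maps each pipeline parameter to its default value;
    `arguments` holds the values passed manually or by a trigger.
    """
    unknown = set(arguments) - set(defined)
    if unknown:
        raise KeyError(f"arguments for undefined parameters: {sorted(unknown)}")
    # Arguments override defaults; the result is wrapped so that
    # activities can read the values but not change them.
    return MappingProxyType({**defined, **arguments})


run_parameters = start_run(
    defined={"inputPath": "raw/in", "outputPath": "raw/out"},
    arguments={"inputPath": "sales/2019"},
)
print(run_parameters["inputPath"], run_parameters["outputPath"])
```

Wrapping the merged values in `MappingProxyType` mirrors the read-only nature of parameters: an attempt to assign to `run_parameters["inputPath"]` raises `TypeError`.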
Control Flow
 Control flow is an orchestration of pipeline
activities that includes chaining activities in a
sequence, branching, defining parameters at
the pipeline level, and passing arguments
while invoking the pipeline on-demand or
from a trigger.
 It also includes custom-state passing and looping containers, that is, For-each iterators.
ADF Activities
 Data Movement Activities
 Data Transformation Activities
 Control Activities
 Activities represent a processing step in a pipeline.
 For example, you might use a copy activity to copy data from one data store to another data store.
 Data Factory supports three types of activities:
1. Data movement activities
2. Data transformation activities
3. Control activities.
[Diagram: Copy Activity. Labels: Copy Activity, Azure Blob, Copy Activity, output data, Azure SQL Data Warehouse, BI tool.]
Data Movement Activities
 Copy Activity in Data Factory copies data from a source data store to a sink data store.
 Data from any source can be written to any sink.
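To make the source-to-sink idea concrete, the sketch below builds a Copy activity definition as a Python dictionary and prints it as JSON. The activity and dataset names are hypothetical, and the property layout (`inputs`, `outputs`, `typeProperties` with a `source` and a `sink`) follows the general shape of Data Factory activity JSON as an assumption; check the current documentation for the exact source and sink types of your stores.

```python
import json

# Hypothetical names; the datasets are assumed to be defined elsewhere.
copy_activity = {
    "name": "CopyBlobToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "BlobInputDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SqlOutputDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},  # where the data is read from
        "sink": {"type": "SqlSink"},       # where the data is written to
    },
}

activity_json = json.dumps(copy_activity, indent=2)
print(activity_json)
```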
Data Transformation Activities
 Azure Data Factory supports the following transformation activities that can be added to pipelines either
individually or chained with another activity.
[Diagram: data transformation compute environments: Azure SQL, Azure SQL DW or SQL Server; Azure VM; Azure Data Lake; Azure Batch; Azure Databricks.]
Control Activities
 The following control flow activities are supported:
Activity | Description
Execute Pipeline Activity | Allows a Data Factory pipeline to invoke another pipeline.
For Each Activity | Defines a repeating control flow in your pipeline.
Web Activity | Calls a custom REST endpoint from a Data Factory pipeline.
Lookup Activity | Reads or looks up a record, table name, or value from any external source.
Get Metadata Activity | Retrieves metadata of any data in Azure Data Factory.
Until Activity | Implements a Do-Until loop, similar to the Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.
If Condition Activity | Branches based on a condition that evaluates to true or false.
Wait Activity | Makes the pipeline wait for the specified period of time before continuing with the execution of subsequent activities.
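The Do-Until behavior of the Until activity can be illustrated with a short Python sketch. The `until` helper is hypothetical, and the `max_iterations` guard is an assumption added here so the sketch always terminates; it is not a claim about how Data Factory bounds the loop.

```python
from typing import Callable


def until(body: Callable[[], None], condition: Callable[[], bool], max_iterations: int = 100) -> int:
    """Run `body` repeatedly until `condition` is true; return the number of runs.

    The body always runs at least once, because the condition is
    checked after each iteration rather than before it.
    """
    iterations = 0
    while True:
        body()
        iterations += 1
        if condition() or iterations >= max_iterations:
            return iterations


# Hypothetical use: poll something three times before the condition holds.
state = {"attempts": 0}


def poll() -> None:
    state["attempts"] += 1


runs = until(poll, lambda: state["attempts"] >= 3)
print(runs)
```

The difference from a plain while loop is visible when the condition is already true at the start: the body still executes once.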
Linked services
 Linked services are much like connection strings: they define the connection information that Data Factory needs to connect to external resources.
 A linked service defines the connection to the data source.
 For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.
 Linked services are used for two purposes in Data Factory:
 To represent a data store, whether located on-premises or in the cloud, e.g. tables, files, folders, or documents.
 To represent a compute resource that can host the execution of an activity. For example, the HDInsightHive activity runs on an HDInsight Hadoop cluster.
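As an illustration of the Azure Storage case above, the sketch below assembles a linked service definition as a Python dictionary. The service name and the placeholder connection string are hypothetical, and the property layout is an assumption based on the general shape of Data Factory JSON definitions; confirm the exact type name and properties against the current connector documentation.

```python
import json

# Placeholder values only; never hard-code a real account key.
linked_service = {
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>",
            }
        },
    },
}

linked_service_json = json.dumps(linked_service, indent=2)
print(linked_service_json)
```

A dataset would then point at this definition by name, which is how the "data item stored in" relationship between datasets and linked services is expressed.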
[Diagram: linked services connect Data Factory to data stores and to compute resources (for example, Apache Spark).]
Integration Runtime
 Think of it as a bridge between two networks.
 It is the compute infrastructure that provides the following capabilities across different network environments:
Data Movement
 Copies data across data stores in a public network and data stores in a private network (on-premises or virtual private network).
 Provides support for built-in connectors, format conversion, column mapping, and scalable data transfer.
Activity Dispatch
 This capability is used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more run transformation activities.
SSIS Package Execution
 This capability is used when SSIS packages need to be executed in a managed Azure compute environment.
Integration runtime types
 These three types are:
IR type | Public network | Private network
Azure | Data movement, Activity dispatch | (none)
Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch
Azure-SSIS | SSIS package execution | SSIS package execution
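The table above can be read as a lookup: given a capability and a network environment, which runtime types qualify? The sketch below encodes it that way in Python; the `runtimes_for` helper and the lowercase capability labels are hypothetical names chosen for this example.

```python
from typing import Dict, List, Set

# Capability matrix taken from the table above.
IR_CAPABILITIES: Dict[str, Dict[str, Set[str]]] = {
    "Azure": {
        "public": {"data movement", "activity dispatch"},
        "private": set(),
    },
    "Self-hosted": {
        "public": {"data movement", "activity dispatch"},
        "private": {"data movement", "activity dispatch"},
    },
    "Azure-SSIS": {
        "public": {"ssis package execution"},
        "private": {"ssis package execution"},
    },
}


def runtimes_for(capability: str, network: str) -> List[str]:
    """Return the IR types that offer `capability` in the given network ('public' or 'private')."""
    return sorted(
        name for name, networks in IR_CAPABILITIES.items() if capability in networks[network]
    )


print(runtimes_for("data movement", "private"))
print(runtimes_for("data movement", "public"))
```

Under this reading, copying data out of a private network leaves the self-hosted runtime as the only option.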
How Azure Data Factory Works
[Diagram: activities are dispatched through integration runtimes to the stores and compute they work on. Labels: Integration Runtime (three instances), Activity (three instances), SQL Server.]
Data Factory V1 vs. V2
Data Factory V1
 Datasets
 Linked Services
 Pipelines
 On-Premises Gateway
 Schedule on Dataset availability and Pipeline start/end time
Data Factory V2
 Datasets
 Linked Services
 Pipelines
 Self-hosted Integration Runtime
 Schedule triggers (time or tumbling window)
 Host and execute SSIS packages
 Parameters
 New Control Flow
System Variables
 Pipeline scope
 Schedule Trigger scope
 Tumbling Window Trigger scope
Pipeline Scope
 These system variables can be referenced anywhere in the pipeline JSON.
Variable | Description
@pipeline().DataFactory | Name of the data factory the pipeline run is running within
@pipeline().Pipeline | Name of the pipeline
@pipeline().RunId | ID of the specific pipeline run
@pipeline().TriggerType | Type of the trigger that invoked the pipeline (Manual, Scheduler)
@pipeline().TriggerId | ID of the trigger that invoked the pipeline
@pipeline().TriggerName | Name of the trigger that invoked the pipeline
@pipeline().TriggerTime | Time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
Schedule Trigger Scope
 These system variables can be referenced anywhere in the trigger JSON if the trigger is of type ScheduleTrigger.
Variable | Description
@trigger().scheduledTime | Time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, and 2017-06-01T22:30:00Z respectively.
@trigger().startTime | Time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, and 2017-06-01T22:30:00.9935483Z respectively.
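The scheduled times for a trigger that fires every 5 minutes can be worked out with a few lines of Python. The `scheduled_times` helper is hypothetical; it only illustrates the arithmetic behind the scheduled time, while the actual fired time will trail each value by a small, unpredictable delay.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def scheduled_times(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the first `count` scheduled fire times, spaced exactly `interval` apart."""
    return [start + i * interval for i in range(count)]


first = datetime(2017, 6, 1, 22, 20, tzinfo=timezone.utc)
stamps = [
    t.strftime("%Y-%m-%dT%H:%M:%SZ")
    for t in scheduled_times(first, timedelta(minutes=5), 3)
]
print(stamps)
```

Three consecutive runs starting at 22:20 are therefore scheduled for 22:20, 22:25, and 22:30.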
Tumbling Window Trigger Scope
 These system variables can be referenced anywhere in the trigger JSON if the trigger is of type TumblingWindowTrigger.
Variable | Description
@trigger().outputs.windowStartTime | Start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the beginning of the hour.
@trigger().outputs.windowEndTime | End of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the end of the hour.
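A short Python sketch of how hourly tumbling windows line up. The `tumbling_windows` helper and the start time are hypothetical; the point is only that each window's end is the next window's start, so the windows are contiguous and do not overlap.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def tumbling_windows(
    start: datetime, interval: timedelta, count: int
) -> List[Tuple[datetime, datetime]]:
    """Return `count` back-to-back (window_start, window_end) pairs."""
    return [(start + i * interval, start + (i + 1) * interval) for i in range(count)]


windows = tumbling_windows(
    datetime(2017, 6, 1, 22, 0, tzinfo=timezone.utc), timedelta(hours=1), 3
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```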
Functions in ADF
 String Functions
 Collection Functions
 Logical Functions
 Conversion Functions
 Math Functions
 Date Functions
String Functions
Function | Description | Example
concat | Combines any number of strings together. | concat('Hi ', 'team') : Hi team
substring | Returns a subset of characters from a string. | substring('somevalue',1,3) : ome
replace | Replaces a string with a given string. | replace('Hi team', 'Hi', 'Hey') : Hey team
guid | Generates a globally unique string. | guid() : c2ecc88d-88c8-4096-912c-d6
toLower | Converts a string to lowercase. | toLower('Two') : two
toUpper | Converts a string to uppercase. | toUpper('Two') : TWO
indexof | Finds the index of a value within a string, case insensitively. | indexof('Hi team', 'Hi') : 0
endswith | Checks if the string ends with a value, case insensitively. | endswith('Hi team', 'team') : true
startswith | Checks if the string starts with a value, case insensitively. | startswith('Hi team', 'team') : false
split | Splits the string using a separator. | split('Hi;team', ';') : ["Hi", "team"]
lastindexof | Finds the last index of a value within a string, case insensitively. | lastindexof('foofoo', 'foo') : 3
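For readers more familiar with Python, the string functions above map roughly onto the following helpers. These are approximations written for this comparison, not the Data Factory implementation; the helper names are hypothetical, and behavior for nulls or out-of-range indices may differ.

```python
def concat(*parts: str) -> str:
    """Join the parts with no separator, so any space must be part of an argument."""
    return "".join(parts)


def substring(text: str, start: int, length: int) -> str:
    """Take `length` characters beginning at zero-based index `start`."""
    return text[start:start + length]


def index_of(text: str, value: str) -> int:
    """Case-insensitive position of the first match, or -1 if absent."""
    return text.lower().find(value.lower())


def last_index_of(text: str, value: str) -> int:
    """Case-insensitive position of the last match, or -1 if absent."""
    return text.lower().rfind(value.lower())


def starts_with(text: str, value: str) -> bool:
    return text.lower().startswith(value.lower())


def ends_with(text: str, value: str) -> bool:
    return text.lower().endswith(value.lower())


print(concat("Hi ", "team"))
print(substring("somevalue", 1, 3))
print(index_of("Hi team", "hi"), last_index_of("foofoo", "foo"))
print(starts_with("Hi team", "team"), ends_with("Hi team", "TEAM"))
print("Hi;team".split(";"))
```

Note that `concat` inserts nothing between its arguments, which is why a space has to be included in one of the strings to get "Hi team".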
Collection Functions
Function | Description | Example
contains | Returns true if dictionary contains a key, list contains value, or string contains substring. | : true
length | Returns the number of elements in an array or string. | : 3
empty | Returns true if object, array, or string is empty. | : true
intersection | Returns a single array or object with the common elements between the arrays or objects passed to it. | intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2]
union | Returns a single array or object with all of the elements that are in either array or object passed to it. | union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 10, 101]
first | Returns the first element in the array or string passed in. | : 0
last | Returns the last element in the array or string passed in. |
take | Returns the first Count elements from the array or string passed in. | take([1, 2, 3, 4], 2) : [1, 2]
skip | Returns the elements in the array starting at index Count. | skip([1, 2, 3, 4], 2) : [3, 4]
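The array examples in the table above can be reproduced with a few Python helpers. These are hypothetical equivalents for illustration only; in particular, the order of elements returned by `union` here is a choice of this sketch and need not match what Data Factory returns.

```python
from typing import List, Sequence


def intersection(*collections: Sequence[int]) -> List[int]:
    """Elements present in every collection, in the order of the first one."""
    common = set(collections[0]).intersection(*collections[1:])
    return [item for item in collections[0] if item in common]


def union(*collections: Sequence[int]) -> List[int]:
    """Every distinct element from all collections, in first-seen order."""
    seen: List[int] = []
    for collection in collections:
        for item in collection:
            if item not in seen:
                seen.append(item)
    return seen


def take(items: Sequence[int], count: int) -> List[int]:
    """First `count` elements."""
    return list(items[:count])


def skip(items: Sequence[int], count: int) -> List[int]:
    """Everything from index `count` onward."""
    return list(items[count:])


print(intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]))
print(sorted(union([1, 2, 3], [101, 2, 1, 10])))
print(take([1, 2, 3, 4], 2), skip([1, 2, 3, 4], 2))
```

`take` and `skip` are complementary: for the same count, concatenating their results gives back the original array.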
Conversion Functions
Function | Description | Example
int | Converts the parameter to an integer. | int('100') : 100
string | Converts the parameter to a string. | string(10) : '10'
json | Converts the parameter to a JSON type value. | json('[1,2,3]') : [1,2,3]; json('{"bar" : "baz"}') : { "bar" : "baz" }
float | Converts the parameter argument to a floating-point number. | : 10.333
bool | Converts the parameter to a Boolean. | bool(0) : false
coalesce | Returns the first non-null object in the arguments passed in. Note: an empty string is not null. | er1', pipeline().parameters.parameter2 : fallback
array | Converts the parameter to an array. | array('abc') : ["abc"]
createArray | Creates an array from the parameters. | createArray('a', 'c') : ["a", "c"]
Math Functions
Function | Description | Example
add | Returns the result of the addition of the two numbers. | add(10,10.333) : 20.333
sub | Returns the result of the subtraction of the two numbers. | sub(10,10.333) : -0.333
mul | Returns the result of the multiplication of the two numbers. | mul(10,10.333) : 103.33
div | Returns the result of the division of the two numbers. | div(10.333,10) : 1.0333
mod | Returns the remainder after the division of the two numbers (modulo). | mod(10,4) : 2
min | There are two different patterns for calling this function. Note, all values must be numbers. | min([0,1,2]) : 0; min(0,1,2) : 0
max | There are two different patterns for calling this function. Note, all values must be numbers. | max([0,1,2]) : 2; max(0,1,2) : 2
range | Generates an array of integers starting from a certain number; you define the length of the returned array. | range(3,4) : [3,4,5,6]
rand | Generates a random integer within the specified range. | rand(-1000,1000) : 42
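Two entries in the table above are easy to misread: `range` takes a start value and a length (not an end value), and `min`/`max` accept either one array or several separate numbers. The Python sketch below models both; the `adf_`-prefixed helper names are hypothetical.

```python
from typing import List, Union

Number = Union[int, float]


def adf_range(start: int, count: int) -> List[int]:
    """Return `count` consecutive integers beginning at `start`."""
    return list(range(start, start + count))


def adf_min(*values) -> Number:
    """Accept either a single list of numbers or the numbers as separate arguments."""
    if len(values) == 1 and isinstance(values[0], (list, tuple)):
        values = tuple(values[0])
    return min(values)


def adf_max(*values) -> Number:
    """Accept either a single list of numbers or the numbers as separate arguments."""
    if len(values) == 1 and isinstance(values[0], (list, tuple)):
        values = tuple(values[0])
    return max(values)


print(adf_range(3, 4))
print(adf_min([0, 1, 2]), adf_min(0, 1, 2))
print(adf_max([0, 1, 2]), adf_max(0, 1, 2))
print(10 % 4)
```

Python's built-in `range(3, 4)` would yield only `[3]`, which is why the second argument is converted into an end value of `start + count`.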
Date Functions
Function | Description | Example
utcnow | Returns the current timestamp as a string. | utcnow() : 2019-02-21T13:27:36Z
addseconds | Adds an integer number of seconds to a string timestamp passed in. The number of seconds can be positive or negative. | addseconds('2015-03-15T13:27:36Z', -36)
addminutes | Adds an integer number of minutes to a string timestamp passed in. The number of minutes can be positive or negative. | addminutes('2015-03-15T13:27:36Z', 33)
addhours | Adds an integer number of hours to a string timestamp passed in. The number of hours can be positive or negative. | addhours('2015-03-15T13:27:36Z', 12)
adddays | Adds an integer number of days to a string timestamp passed in. The number of days can be positive or negative. | adddays('2015-03-15T13:27:36Z', -20)
formatDateTime | Returns a string in date format. | formatDateTime('2015-03-15T13:27:36Z',
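The date arithmetic in the examples above can be checked with Python's standard library. The `shift` helper is hypothetical and assumes timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form used on the slide; it is a stand-in for the add-functions, not their implementation.

```python
from datetime import datetime, timedelta, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def shift(timestamp: str, **delta: int) -> str:
    """Add a (possibly negative) amount of time to a UTC timestamp string.

    Keyword arguments are passed to timedelta, e.g. seconds=-36 or days=-20.
    """
    parsed = datetime.strptime(timestamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return (parsed + timedelta(**delta)).strftime(STAMP_FORMAT)


base = "2015-03-15T13:27:36Z"
print(shift(base, seconds=-36))  # counterpart of addseconds(base, -36)
print(shift(base, minutes=33))   # counterpart of addminutes(base, 33)
print(shift(base, hours=12))     # counterpart of addhours(base, 12)
print(shift(base, days=-20))     # counterpart of adddays(base, -20)
```

Negative amounts move the timestamp backward, and the result rolls over day and month boundaries as needed, as the hours and days cases show.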
Expressions in Azure Data Factory
 JSON values in the definition can be literal or expressions that are evaluated at runtime.
E.g. "name": "value" OR "name": "@pipeline().parameters.password"
 Expressions can appear anywhere in a JSON string value and always result in another JSON value.
 If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@).
 If a literal string needs to start with @, it must be escaped by using @@.
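The rule above can be sketched as a small Python classifier. The `interpret` helper is hypothetical and covers only whole-value expressions and the `@@` escape; it does not evaluate the expression body or handle expressions embedded inside a longer string.

```python
from typing import Tuple


def interpret(value: str) -> Tuple[str, str]:
    """Classify a JSON string value as a literal or an expression.

    Returns ("literal", text) or ("expression", body), where body is the
    value with its leading at-sign removed.
    """
    if value.startswith("@@"):
        # Escaped at-sign: drop one '@' and treat the rest as plain text.
        return ("literal", value[1:])
    if value.startswith("@"):
        return ("expression", value[1:])
    return ("literal", value)


for sample in ["parameters", "parameters[1]", "@@", " @", "@pipeline().parameters.password"]:
    print(repr(sample), "->", interpret(sample))
```

A value such as `" @"` stays a literal because the at-sign is not the first character.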
JSON value | Result
"parameters" | The characters 'parameters' are returned.
"parameters[1]" | The characters 'parameters[1]' are returned.
"@@" | A 1 character string that contains '@' is returned.
" @" | A 2 character string that contains ' @' is returned.
A dataset with a parameter
 Suppose the BlobDataset takes a parameter named path.
 Its value is used to set a value for the folderPath property by using the following expressions:
"folderPath": "@dataset().path"
A pipeline with a parameter
 In the following example, the pipeline takes inputPath and outputPath parameters.
 The path for the parameterized blob dataset is set by using values of these parameters.
 The syntax used here is:
"path": "@pipeline().parameters.inputPath"
Question- Answers
Feel free to write to me at:
in case of any queries / clarifications.

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf

Azure Data Factory Introduction.pdf

  • 1. © 2018 YASH Technologies | | Confidential Azure Data Factory - Mahesh Pandit
  • 2. Agenda
      - Why Azure Data Factory
      - Introduction
      - Steps involved in ADF
      - ADF Components
      - ADF Activities
      - Linked Services
      - Integration Runtime and its types
      - How Azure Data Factory works
      - Azure Data Factory V1 vs V2
      - System Variables
      - Functions in ADF
      - Expressions in ADF
      - Questions and Answers
  • 3. Why Azure Data Factory
      - Modern DW for BI
      - Modern DW for SaaS Apps
      - Lift & Shift existing SSIS packages to Cloud
  • 4. Why Azure Data Factory (diagram: Azure SQL DW, Azure Data Lake, Azure Data Factory)
  • 5. Modern DW for Business Intelligence (diagram)
      - Sources: logs, files & media (unstructured); on-premises and cloud apps & data; business/custom apps (structured)
      - Stages: Ingest, Store, Prep & Train, Model & Serve, Intelligence
      - Components shown: Data Factory, Azure Storage, Azure Databricks (Spark), Azure SQL Data Warehouse, Azure Analysis Services, analytical dashboards (Power BI)
      - Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
  • 6. Modern DW for SaaS Apps (diagram)
      - Sources: logs, files & media (unstructured); on-premises and cloud apps & data; business/custom apps (structured)
      - Stages: Ingest, Store, Prep & Train, Model & Serve, Intelligence
      - Components shown: Data Factory, Azure Storage, Azure Databricks (Spark), App Storage, SaaS app, browser/devices
      - Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
  • 7. Lift & Shift existing SSIS packages to Cloud (diagram)
      - On premises: on-premises data sources, SQL Server
      - Cloud: Data Factory, cloud data sources, SQL DB Managed Instance, VNET
      - Azure Data Factory orchestrates data pipeline activity workflow and scheduling.
  • 8. Introduction: Azure Data Factory, a cloud-based integration service
      - It is a cloud-based integration service that lets you create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
      - Workflows are scheduled and data-driven.
      - Sources and destinations can be either on-premises or in the cloud.
      - Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
  • 9. How does it work?
      - The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
  • 10. Steps involved in ADF
      - Connect and collect
      - Transform and enrich
      - Publish
      - Monitor
  • 11. Connect and collect
      - The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.
      - With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
      - For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
  • 12. Transform and enrich
      - After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as:
          - HDInsight Hadoop
          - Spark
          - Data Lake Analytics
          - Machine Learning
  • 13. Publish
      - After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or another store, depending on the user's needs.
  • 14. Monitor
      - After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.
      - Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
  • 15. ADF Components
      - Pipeline
      - Activity
      - Datasets
      - Linked Services
  • 16. ADF Components
      - Diagram: a Pipeline (schedule, monitor) is a logical group of Activities (Hive, copy); an Activity consumes and produces Datasets (table, file) and runs on a Linked Service (SQL Server, Hadoop cluster); a Dataset represents a data item stored in a Linked Service.
      - An Azure subscription might have one or more Azure Data Factory instances (or data factories).
      - Azure Data Factory is composed of four key components.
      - These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
  • 17. Pipeline
      - A data factory might have one or more pipelines.
      - A pipeline is a logical grouping of activities that performs a unit of work.
      - Together, the activities in a pipeline perform a task.
      - The pipeline allows you to manage the activities as a set instead of managing each one individually.
      - The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.
      - To create a data factory pipeline, you can use any one of the following methods: Data Factory UI, Copy Data tool, Azure PowerShell, REST, Resource Manager template, .NET, Python.
  • 18. Pipeline Execution
      Triggers
      - Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.
      - There are different types of triggers for different types of events.
      Pipeline Runs
      - A pipeline run is an instance of the pipeline execution.
      - Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines.
      - The arguments can be passed manually or within the trigger definition.
      Parameters
      - Parameters are key-value pairs of read-only configuration.
      - Parameters are defined in the pipeline.
      - The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or a pipeline that was executed manually.
      - Activities within the pipeline consume the parameter values.
      Control Flow
      - Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on demand or from a trigger.
      - It also includes custom-state passing and looping containers, that is, For-each iterators.
  • 19. ADF Activities
      - Data Movement Activities
      - Data Transformation Activities
      - Control Activities
  • 20. Activity
      - Activities represent a processing step in a pipeline.
      - For example, you might use a copy activity to copy data from one data store to another data store.
      - Data Factory supports three types of activities:
          1. Data movement activities
          2. Data transformation activities
          3. Control activities
      - Diagram: copy activities, Azure Blob, a transformation activity, output data, Azure SQL Data Warehouse, and a BI tool.
  • 21. Data Movement Activities
      - Copy Activity in Data Factory copies data from a source data store to a sink data store.
      - Data from any supported source can be written to any supported sink.
  • 22. Data Transformation Activities
      - Azure Data Factory supports transformation activities that can be added to pipelines either individually or chained with another activity.
      - Compute environments listed in the slide's table (Compute Environment | Data Transformation Activity): HDInsight; Azure SQL, Azure SQL DW, or SQL Server; Azure VM; Azure Data Lake Analytics; Azure Batch; Azure Databricks.
  • 23. Control Activities
      - The following control flow activities are supported:
      Activity | Description
      Execute Pipeline Activity | Allows a Data Factory pipeline to invoke another pipeline.
      For Each Activity | Defines a repeating control flow in your pipeline.
      Web Activity | Calls a custom REST endpoint from a Data Factory pipeline.
      Lookup Activity | Reads or looks up a record, table name, or value from any external source.
      Get Metadata Activity | Retrieves metadata of any data in Azure Data Factory.
      Until Activity | Implements a do-until loop, similar to the do-until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.
      If Condition Activity | Branches based on a condition that evaluates to true or false.
      Wait Activity | Makes the pipeline wait for the specified period of time before continuing with execution of subsequent activities.
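The do-until behaviour of the Until activity can be illustrated with a small Python sketch. This is only an analogy, not how Data Factory runs activities: `run_until`, `body`, `condition`, and the `max_iterations` safety guard are hypothetical names introduced here. The point it shows is that the body runs first and the condition is checked afterwards, so the body executes at least once.

```python
from typing import Callable


def run_until(body: Callable[[], None],
              condition: Callable[[], bool],
              max_iterations: int = 100) -> int:
    """Run body, then test condition; repeat until the condition is true.

    max_iterations is a hypothetical guard against endless loops.
    Returns the number of times the body ran.
    """
    iterations = 0
    while True:
        body()                      # the body always runs before the check
        iterations += 1
        if condition() or iterations >= max_iterations:
            return iterations


# Illustrative state: a counter that the loop body increments.
counter = {"value": 0}


def increment() -> None:
    counter["value"] += 1


runs = run_until(increment, lambda: counter["value"] >= 3)
print(runs)
```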
  • 24. Linked services
      - Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources.
      - A linked service defines the connection to the data source.
      - For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.
      - Linked services are used for two purposes in Data Factory:
          - To represent a data store, including data stores located on-premises and in the cloud, e.g. tables, files, folders, or documents.
          - To represent a compute resource that can host the execution of an activity. For example, the HDInsightHive activity runs on an HDInsight Hadoop cluster.
      - Diagram: data stores (tables, files, ...) and compute resources (HDInsight, Apache Spark, ...).
  • 25. Integration Runtime
      - Think of it as a bridge between two networks.
      - It is the compute infrastructure that provides the following capabilities across different network environments:
      Data Movement
      - Copies data across data stores in a public network and data stores in a private network (on-premises or virtual private network).
      - Provides support for built-in connectors, format conversion, column mapping, and scalable data transfer.
      Activity Dispatch
      - This capability is used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more are used for transformation activities.
      SSIS Package Execution
      - This capability is used when SSIS packages need to be executed in a managed Azure compute environment.
  • 26. Integration runtime types
      - There are three types of integration runtime:
      IR type | Public network | Private network
      Azure | Data movement, Activity dispatch | (none)
      Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch
      Azure-SSIS | SSIS package execution | SSIS package execution
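As a rough aid for reading the capability table on this slide, the sketch below transcribes it into a Python dictionary and looks up which integration runtime types offer a capability on a given network. The names `IR_CAPABILITIES` and `ir_types_for` are hypothetical, and the data reflects only what the slide lists.

```python
# Capability table transcribed from the slide:
# IR type -> network kind -> set of capabilities.
IR_CAPABILITIES = {
    "Azure": {
        "public": {"Data movement", "Activity dispatch"},
        "private": set(),
    },
    "Self-hosted": {
        "public": {"Data movement", "Activity dispatch"},
        "private": {"Data movement", "Activity dispatch"},
    },
    "Azure-SSIS": {
        "public": {"SSIS package execution"},
        "private": {"SSIS package execution"},
    },
}


def ir_types_for(capability: str, network: str) -> list:
    """Return the IR types that list the capability for the network kind."""
    return [ir_type
            for ir_type, networks in IR_CAPABILITIES.items()
            if capability in networks[network]]


print(ir_types_for("Data movement", "private"))
print(ir_types_for("Data movement", "public"))
```

Read this way, copying data from a store inside a private network points to the self-hosted integration runtime.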
  • 27. How Azure Data Factory Works (diagram: a pipeline of activities, each with its dataset and integration runtime, connected to an on-premises SQL Server database)
  • 28. Data Factory V1 vs. V2
  • 29. Data Factory V1 vs. V2
      Data Factory V1
      - Datasets
      - Linked Services
      - Pipelines
      - On-Premises Gateway
      - Schedule on dataset availability and pipeline start/end time
      Data Factory V2
      - Datasets
      - Linked Services
      - Pipelines
      - Self-hosted Integration Runtime
      - Schedule triggers (time or tumbling window)
      - Host and execute SSIS packages
      - Parameters
      - New control flow activities
  • 30. System Variables
      - Pipeline scope
      - Schedule Trigger scope
      - Tumbling Window Trigger scope
  • 31. Pipeline Scope
      - These system variables can be referenced anywhere in the pipeline JSON.
      Variable | Description
      @pipeline().DataFactory | Name of the data factory the pipeline run is running within.
      @pipeline().Pipeline | Name of the pipeline.
      @pipeline().RunId | ID of the specific pipeline run.
      @pipeline().TriggerType | Type of the trigger that invoked the pipeline (Manual, Scheduler).
      @pipeline().TriggerId | ID of the trigger that invoked the pipeline.
      @pipeline().TriggerName | Name of the trigger that invoked the pipeline.
      @pipeline().TriggerTime | Time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
  • 32. Schedule Trigger Scope
      - These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "ScheduleTrigger".
      Variable | Description
      @trigger().scheduledTime | Time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, and 2017-06-01T22:30:00Z respectively.
      @trigger().startTime | Time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, and 2017-06-01T22:30:00.9935483Z respectively.
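To make the scheduled-time values concrete, here is a minimal Python sketch that lists the times a fixed-interval schedule would produce. It is an illustration of the arithmetic only, not Data Factory's scheduler; `scheduled_times` and its arguments are hypothetical names. Successive scheduled times are exactly one interval apart, whereas the actual start time of each run can lag slightly behind.

```python
from datetime import datetime, timedelta, timezone


def scheduled_times(start: datetime, interval_minutes: int, count: int) -> list:
    """Return `count` scheduled times as ISO 8601 UTC strings."""
    return [
        (start + timedelta(minutes=interval_minutes * i)).strftime("%Y-%m-%dT%H:%M:%SZ")
        for i in range(count)
    ]


# A trigger that fires every 5 minutes, first scheduled at 22:20 UTC.
first = datetime(2017, 6, 1, 22, 20, tzinfo=timezone.utc)
times = scheduled_times(first, 5, 3)
print(times)
```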
  • 33. Tumbling Window Trigger Scope
      - These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "TumblingWindowTrigger".
      Variable | Description
      @trigger().outputs.windowStartTime | Start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the beginning of the hour.
      @trigger().outputs.windowEndTime | End of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the end of the hour.
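The window start and end values can be pictured as a series of fixed-size, back-to-back intervals. The Python sketch below computes such windows for an hourly frequency; it is an illustration under the assumption that each window begins where the previous one ended, and `tumbling_windows` is a hypothetical helper, not a Data Factory API.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start: datetime, interval: timedelta, count: int) -> list:
    """Return `count` contiguous (window_start, window_end) pairs."""
    windows = []
    for i in range(count):
        window_start = start + i * interval
        windows.append((window_start, window_start + interval))
    return windows


# Hourly windows beginning at 22:00 UTC (illustrative start time).
first_window_start = datetime(2017, 6, 1, 22, 0, tzinfo=timezone.utc)
windows = tumbling_windows(first_window_start, timedelta(hours=1), 3)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```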
  • 34. Functions in ADF
      - String Functions
      - Collection Functions
      - Logical Functions
      - Conversion Functions
      - Math Functions
      - Date Functions
  • 35. String Functions
      Function | Description | Example
      concat | Combines any number of strings together. | concat('Hi ', 'team') : Hi team
      substring | Returns a subset of characters from a string. | substring('somevalue', 1, 3) : ome
      replace | Replaces a string with a given string. | replace('Hi team', 'Hi', 'Hey') : Hey team
      guid | Generates a globally unique string. | guid() : c2ecc88d-88c8-4096-912c-d6
      toLower | Converts a string to lowercase. | toLower('Two') : two
      toUpper | Converts a string to uppercase. | toUpper('Two') : TWO
      indexof | Finds the index of a value within a string, case-insensitively. | indexof('Hi team', 'Hi') : 0
      endswith | Checks whether the string ends with a value, case-insensitively. | endswith('Hi team', 'team') : true
      startswith | Checks whether the string starts with a value, case-insensitively. | startswith('Hi team', 'team') : false
      split | Splits the string using a separator. | split('Hi;team', ';') : ["Hi", "team"]
      lastindexof | Finds the last index of a value within a string, case-insensitively. | lastindexof('foofoo', 'foo') : 3
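For readers more familiar with general-purpose languages, a few of these string functions can be approximated in Python as below. These are analogues written for illustration, not the Data Factory implementations; case-insensitive matching is modelled by lowercasing both arguments, which is an assumption of this sketch. Note that `concat` joins its arguments without inserting a separator.

```python
def concat(*parts: str) -> str:
    """Join strings with no separator added between them."""
    return "".join(parts)


def substring(text: str, start: int, length: int) -> str:
    """Return `length` characters beginning at zero-based index `start`."""
    return text[start:start + length]


def indexof(text: str, value: str) -> int:
    """First index of value in text, ignoring case (-1 if absent)."""
    return text.lower().find(value.lower())


def lastindexof(text: str, value: str) -> int:
    """Last index of value in text, ignoring case (-1 if absent)."""
    return text.lower().rfind(value.lower())


def startswith(text: str, value: str) -> bool:
    return text.lower().startswith(value.lower())


def endswith(text: str, value: str) -> bool:
    return text.lower().endswith(value.lower())


print(concat("Hi ", "team"))
print(substring("somevalue", 1, 3))
print(indexof("Hi team", "Hi"), lastindexof("foofoo", "foo"))
print(startswith("Hi team", "team"), endswith("Hi team", "team"))
```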
  • 36. Collection Functions
      Function | Description | Example
      contains | Returns true if a dictionary contains a key, a list contains a value, or a string contains a substring. | contains('abacaba', 'aca') : true
      length | Returns the number of elements in an array or string. | length('abc') : 3
      empty | Returns true if an object, array, or string is empty. | empty('') : true
      intersection | Returns a single array or object with the common elements between the arrays or objects passed to it. | intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2]
      union | Returns a single array or object with all of the elements that are in either array or object passed to it. | union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 10, 101]
      first | Returns the first element in the array or string passed in. | first([0, 2, 3]) : 0
      last | Returns the last element in the array or string passed in. | last('0123') : 3
      take | Returns the first Count elements from the array or string passed in. | take([1, 2, 3, 4], 2) : [1, 2]
      skip | Returns the elements in the array starting at index Count. | skip([1, 2, 3, 4], 2) : [3, 4]
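The array forms of intersection, union, take, and skip can be sketched in Python as follows. These are illustrative analogues for lists only, not the Data Factory implementations; in particular, this sketch keeps elements in order of first appearance, so the element order of `union` may differ from the order shown on the slide.

```python
def intersection(first: list, *others: list) -> list:
    """Elements of `first` that appear in every other list, without duplicates."""
    result = []
    for item in first:
        if item not in result and all(item in other for other in others):
            result.append(item)
    return result


def union(*collections: list) -> list:
    """All distinct elements from every list, in order of first appearance."""
    result = []
    for collection in collections:
        for item in collection:
            if item not in result:
                result.append(item)
    return result


def take(items: list, count: int) -> list:
    """The first `count` elements."""
    return items[:count]


def skip(items: list, count: int) -> list:
    """The elements starting at index `count`."""
    return items[count:]


print(intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]))
print(sorted(union([1, 2, 3], [101, 2, 1, 10])))
print(take([1, 2, 3, 4], 2), skip([1, 2, 3, 4], 2))
```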
  • 37. Conversion Functions
      Function | Description | Example
      int | Converts the parameter to an integer. | int('100') : 100
      string | Converts the parameter to a string. | string(10) : '10'
      json | Converts the parameter to a JSON type value. | json('[1,2,3]') : [1,2,3]; json('{"bar" : "baz"}') : { "bar" : "baz" }
      float | Converts the parameter to a floating-point number. | float('10.333') : 10.333
      bool | Converts the parameter to a Boolean. | bool(0) : false
      coalesce | Returns the first non-null object in the arguments passed in. Note: an empty string is not null. | coalesce(pipeline().parameters.parameter1, pipeline().parameters.parameter2, 'fallback') : fallback (when both parameters are null)
      array | Converts the parameter to an array. | array('abc') : ["abc"]
      createArray | Creates an array from the parameters. | createArray('a', 'c') : ["a", "c"]
  • 38. Math Functions
      Function | Description | Example
      add | Returns the result of the addition of the two numbers. | add(10, 10.333) : 20.333
      sub | Returns the result of the subtraction of the two numbers. | sub(10, 10.333) : -0.333
      mul | Returns the result of the multiplication of the two numbers. | mul(10, 10.333) : 103.33
      div | Returns the result of the division of the two numbers. | div(10.333, 10) : 1.0333
      mod | Returns the remainder after the division of the two numbers (modulo). | mod(10, 4) : 2
      min | Can be called with an array or with a comma-separated list of values. All values must be numbers. | min([0, 1, 2]) : 0; min(0, 1, 2) : 0
      max | Can be called with an array or with a comma-separated list of values. All values must be numbers. | max([0, 1, 2]) : 2; max(0, 1, 2) : 2
      range | Generates an array of integers starting from a certain number; you define the length of the returned array. | range(3, 4) : [3, 4, 5, 6]
      rand | Generates a random integer within the specified range. | rand(-1000, 1000) : 42
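Two points in this table are easy to misread: the second argument of `range` is a length rather than an end value, and `min` and `max` accept either one array or several separate numbers. The Python sketch below mirrors both behaviours; `adf_range` and `adf_min` are hypothetical names chosen to avoid clashing with Python's built-ins, and the code is an analogue rather than the Data Factory implementation.

```python
def adf_range(start: int, count: int) -> list:
    """Return `count` consecutive integers beginning at `start`."""
    return list(range(start, start + count))


def adf_min(*values):
    """Accept either a single list of numbers or several numbers."""
    if len(values) == 1 and isinstance(values[0], (list, tuple)):
        return min(values[0])
    return min(values)


print(adf_range(3, 4))
print(adf_min([0, 1, 2]), adf_min(0, 1, 2))
```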
  • 39. Date Functions
      Function | Description | Example
      utcnow | Returns the current timestamp as a string. | utcnow() : 2019-02-21T13:27:36Z
      addseconds | Adds an integer number of seconds to a string timestamp passed in. The number of seconds can be positive or negative. | addseconds('2015-03-15T13:27:36Z', -36) : 2015-03-15T13:27:00Z
      addminutes | Adds an integer number of minutes to a string timestamp passed in. The number of minutes can be positive or negative. | addminutes('2015-03-15T13:27:36Z', 33) : 2015-03-15T14:00:36Z
      addhours | Adds an integer number of hours to a string timestamp passed in. The number of hours can be positive or negative. | addhours('2015-03-15T13:27:36Z', 12) : 2015-03-16T01:27:36Z
      adddays | Adds an integer number of days to a string timestamp passed in. The number of days can be positive or negative. | adddays('2015-03-15T13:27:36Z', -20) : 2015-02-23T13:27:36Z
      formatDateTime | Returns a string in date format. | formatDateTime('2015-03-15T13:27:36Z', 'o') : 2015-03-15T13:27:36.0000000Z
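The date arithmetic in these examples can be checked with Python's standard `datetime` module. The sketch below is an analogue for verifying the sample values, not the Data Factory implementation; `shift` and `TIMESTAMP_FORMAT` are hypothetical names, and the helper only handles timestamps in the whole-second UTC form used on this slide.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def shift(timestamp: str, **delta) -> str:
    """Add a timedelta (seconds=, minutes=, hours=, days=) to a UTC timestamp string."""
    parsed = datetime.strptime(timestamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return (parsed + timedelta(**delta)).strftime(TIMESTAMP_FORMAT)


base = "2015-03-15T13:27:36Z"
print(shift(base, seconds=-36))   # analogue of addseconds
print(shift(base, minutes=33))    # analogue of addminutes
print(shift(base, hours=12))      # analogue of addhours
print(shift(base, days=-20))      # analogue of adddays
```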
  • 40. Expressions in Azure Data Factory
      - JSON values in the definition can be literals or expressions that are evaluated at runtime, e.g. "name": "value" or "name": "@pipeline().parameters.password".
      - Expressions can appear anywhere in a JSON string value and always result in another JSON value.
      - If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@).
      JSON value | Result
      "parameters" | The characters 'parameters' are returned.
      "parameters[1]" | The characters 'parameters[1]' are returned.
      "@@" | A 1-character string that contains '@' is returned.
      " @" | A 2-character string that contains ' @' is returned.
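The literal-versus-expression rules in this table can be expressed as a small Python sketch. It only classifies a JSON string value according to the rows on the slide (a leading `@` marks an expression, `@@` escapes a literal at-sign, anything else is a literal); it does not evaluate the expression body, and `classify` is a hypothetical helper rather than part of Data Factory.

```python
def classify(value: str) -> tuple:
    """Return ("literal", text) or ("expression", body) for a JSON string value."""
    if value.startswith("@@"):
        # An escaped at-sign: drop one '@' and treat the rest as literal text.
        return ("literal", value[1:])
    if value.startswith("@"):
        # An expression: the body is the value with the leading '@' removed.
        return ("expression", value[1:])
    return ("literal", value)


for sample in ["parameters", "parameters[1]", "@@", " @",
               "@pipeline().parameters.password"]:
    print(repr(sample), "->", classify(sample))
```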
  • 41. A dataset with a parameter
      - Suppose the BlobDataset takes a parameter named path.
      - Its value is used to set a value for the folderPath property by using the following expression: "folderPath": "@dataset().path"
      A pipeline with a parameter
      - In the following example, the pipeline takes inputPath and outputPath parameters.
      - The path for the parameterized blob dataset is set by using values of these parameters.
      - The syntax used here is: "path": "@pipeline().parameters.inputPath"
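How a parameter reference turns into a concrete value at run time can be sketched in Python. This is a deliberately narrow illustration: it only recognises values of the exact form `@pipeline().parameters.<name>` and looks the name up in a dictionary of run arguments. The `resolve` helper and the sample paths are hypothetical, and real expression evaluation in Data Factory covers far more than this.

```python
import re

# Matches only a whole-value reference such as "@pipeline().parameters.inputPath".
PARAMETER_REFERENCE = re.compile(r"^@pipeline\(\)\.parameters\.(\w+)$")


def resolve(value: str, parameters: dict) -> str:
    """Replace a pipeline parameter reference with its run-time argument."""
    match = PARAMETER_REFERENCE.match(value)
    if match is None:
        return value                      # not a reference: keep the literal
    return parameters[match.group(1)]     # KeyError if the argument is missing


# Hypothetical arguments supplied when the pipeline run is started.
run_arguments = {"inputPath": "input/2019", "outputPath": "output/2019"}

print(resolve("@pipeline().parameters.inputPath", run_arguments))
print(resolve("plain/literal/path", run_arguments))
```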
  • 43. Feel free to write to me at: in case of any queries / clarifications.