© 2018 YASH Technologies | www.yash.com | Confidential
Azure Data Factory
- Mahesh Pandit
Agenda
 Why Azure Data Factory
 Introduction
 Steps involved in ADF
 ADF Components
 ADF Activities
 Linked Services
 Integration Runtime and its types
 How Azure Data Factory works
 Azure Data Factory V1 vs V2
 System Variables
 Functions in ADF
 Expressions in ADF
 Question & Answers
Why Azure Data Factory
 Modern DW for BI
 Modern DW for SaaS Apps
 Lift & Shift existing SSIS packages to Cloud
Why Azure Data Factory
[Diagram: Azure Data Factory orchestrating data movement into Azure Data Lake and Azure SQL DW]
Modern DW for Business Intelligence
[Diagram: Ingest → Store → Prep & Train → Model & Serve → Intelligence]
Sources: logs, files & media (unstructured) and business/custom apps (structured), from on-premises and cloud apps & data.
Ingest: Data Factory. Store: Azure Storage. Prep & Train: Azure Databricks (Spark). Model & Serve: Azure SQL Data Warehouse and Azure Analysis Services. Intelligence: analytical dashboards (Power BI).
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Modern DW for SaaS Apps
[Diagram: Ingest → Store → Prep & Train → Model & Serve → Intelligence]
Sources: logs, files & media (unstructured) and business/custom apps (structured), from on-premises and cloud apps & data.
Ingest: Data Factory. Store: Azure Storage. Prep & Train: Azure Databricks (Spark). Model & Serve: app storage. Intelligence: SaaS app via browser/devices.
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Lift & Shift existing SSIS packages to Cloud
[Diagram: on-premises data sources and SQL Server connect through a VNET to Data Factory, Azure SQL DB Managed Instance, and cloud data sources]
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Introduction
 Azure Data Factory is a cloud-based integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
 Workflows are data-driven and can be scheduled.
 Sources and destinations can be either on-premises or in the cloud.
 Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
How does it work?
 The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
Steps involved in ADF
 Connect and collect
 Transform and enrich
 Publish
 Monitor
Connect and collect
 The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.
 With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
 For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
Transform and enrich
 After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as:
  HDInsight Hadoop
  Spark
  Data Lake Analytics
  Machine Learning
Publish
 After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or another store as per the user's need.
Monitor
 After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.
 Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
ADF Components
 Pipeline
 Activity
 Datasets
 Linked Services
ADF Components
[Diagram: a LINKED SERVICE (SQL Server, Hadoop cluster) represents a connection; a DATA SET (table, file) represents a data item stored in a linked service; an ACTIVITY (Hive, Copy) consumes and produces datasets and runs on a linked service; a PIPELINE (schedule, monitor) is a logical group of activities.]
 An Azure subscription might have one or more Azure Data Factory instances (or data factories).
 Azure Data Factory is composed of four key components.
 These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
Pipeline
 A data factory might have one or more pipelines.
 A pipeline is a logical grouping of activities that performs a unit of work.
 Together, the activities in a pipeline perform a task.
 The pipeline allows you to manage the activities as a set instead of managing each one individually.
 The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.
 To create a data factory pipeline, we can use any one of the following methods: Data Factory UI, Copy Data Tool, Azure PowerShell, REST API, Resource Manager template, .NET, Python.
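Whichever authoring method is used, the result is the same underlying JSON definition. A minimal sketch of a v2 pipeline with a single copy activity might look like this (names such as CopyPipeline, BlobDataset, and SqlDataset are hypothetical placeholders):

```json
{
  "name": "CopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "BlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ]
  }
}
```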
Pipeline Execution
Triggers
 Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.
 There are different types of triggers for different types of events.
Pipeline Runs
 A pipeline run is an instance of a pipeline execution.
 Pipeline runs are typically instantiated by passing arguments to the parameters that are defined in pipelines.
 The arguments can be passed manually or within the trigger definition.
Parameters
 Parameters are key-value pairs of read-only configuration.
 Parameters are defined in the pipeline.
 The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or by a pipeline that was executed manually.
 Activities within the pipeline consume the parameter values.
Control Flow
 Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on demand or from a trigger.
 It also includes custom-state passing and looping containers, that is, For-Each iterators.
ADF Activities
 Data Movement Activities
 Data Transformation Activities
 Control Activities
Activity
 Activities represent a processing step in a pipeline.
 For example, you might use a copy activity to copy data from one data store to another data store.
 Data Factory supports three types of activities:
1. Data movement activities
2. Data transformation activities
3. Control activities
[Diagram: a Copy Activity loads data into Azure Blob; a Transformation Activity processes it; a Copy Activity writes the output data to Azure SQL Data Warehouse, which feeds a BI tool.]
Data Movement Activities
 Copy Activity in Data Factory copies data from a source data store to a sink data store.
 Data from any supported source can be written to any supported sink.
Data Transformation Activities
 Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity.
Data Transformation Activity: Compute Environment
Hive: HDInsight
Pig: HDInsight
MapReduce: HDInsight
Hadoop Streaming: HDInsight
Spark: HDInsight
Stored Procedure: Azure SQL, Azure SQL DW, or SQL Server
Machine Learning activities: Azure VM
Data Lake Analytics U-SQL: Azure Data Lake Analytics
Custom Activity: Azure Batch
Databricks Notebook: Azure Databricks
Control Activities
 The following control flow activities are supported:
Execute Pipeline Activity: allows a Data Factory pipeline to invoke another pipeline.
For Each Activity: defines a repeating control flow in your pipeline.
Web Activity: can be used to call a custom REST endpoint from a Data Factory pipeline.
Lookup Activity: can be used to read or look up a record, table name, or value from any external source.
Get Metadata Activity: can be used to retrieve metadata of any data in Azure Data Factory.
Until Activity: implements a Do-Until loop similar to the Do-Until looping structure in programming languages; it executes a set of activities in a loop until the condition associated with the activity evaluates to true.
If Condition Activity: can be used to branch based on a condition that evaluates to true or false.
Wait Activity: when used in a pipeline, the pipeline waits for the specified period of time before continuing with execution of subsequent activities.
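As an illustration of a control activity's JSON shape, a ForEach activity that runs a Wait activity once per item might be sketched like this (a hedged example; the parameter name fileNames and activity names are made up):

```json
{
  "name": "ForEachFile",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": { "value": "@pipeline().parameters.fileNames", "type": "Expression" },
    "activities": [
      {
        "name": "WaitPerItem",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 1 }
      }
    ]
  }
}
```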
Linked services
 Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources.
 A linked service defines the connection to the data source.
 For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.
 Linked services are used for two purposes in Data Factory:
  To represent a data store, including data stores located on-premises and in the cloud, e.g. tables, files, folders, or documents.
  To represent a compute resource that can host the execution of an activity. For example, the HDInsight Hive activity runs on an HDInsight Hadoop cluster.
[Diagram: data stores (tables, files, …) and compute resources (HDInsight, Apache Spark, …).]
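The Azure Storage example above can be sketched as a linked service definition (account name and key are placeholders, and the linked service name is hypothetical):

```json
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```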
Integration Runtime
 Think of it as a bridge between two networks.
 It is the compute infrastructure that provides the following capabilities across different network environments:
Data Movement: copies data across data stores in a public network and data stores in a private network (on-premises or virtual private network). It provides support for built-in connectors, format conversion, column mapping, and scalable data transfer.
Activity Dispatch: these capabilities are used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more are used for transformation activities.
SSIS Package Execution: these capabilities are used when SSIS packages need to be executed in a managed Azure compute environment.
Integration runtime types
 These three types are:
IR type      | Public network                    | Private network
Azure        | Data movement, activity dispatch  | -
Self-hosted  | Data movement, activity dispatch  | Data movement, activity dispatch
Azure-SSIS   | SSIS package execution            | SSIS package execution
How Azure Data Factory Works
[Diagram: a pipeline chains activities; each activity reads and writes datasets and runs on an integration runtime that connects to stores such as an on-premises SQL Server database.]
Data Factory V1 vs. V2
Data Factory V1
 Datasets
 Linked Services
 Pipelines
 On-Premises Gateway
 Schedule on dataset availability and pipeline start/end time
Data Factory V2
 Datasets
 Linked Services
 Pipelines
 Self-hosted Integration Runtime
 Schedule triggers (time or tumbling window)
 Host and execute SSIS packages
 Parameters
 New control flow activities
System Variables
 Pipeline scope
 Schedule Trigger scope
 Tumbling Window Trigger scope
Pipeline Scope
These system variables can be referenced anywhere in the pipeline JSON.
@pipeline().DataFactory: name of the data factory the pipeline run is running within.
@pipeline().Pipeline: name of the pipeline.
@pipeline().RunId: ID of the specific pipeline run.
@pipeline().TriggerType: type of the trigger that invoked the pipeline (Manual, Scheduler).
@pipeline().TriggerId: ID of the trigger that invoked the pipeline.
@pipeline().TriggerName: name of the trigger that invoked the pipeline.
@pipeline().TriggerTime: time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
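These variables can be embedded in any string property. For example, writing each run's output under a folder named after the run ID might look like this (a sketch; note that inside a function call the nested pipeline() reference is not prefixed with @):

```json
"folderPath": "@concat('output/', pipeline().RunId)"
```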
Schedule Trigger Scope
 These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "ScheduleTrigger".
@trigger().scheduledTime: time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, 2017-06-01T22:30:00Z respectively.
@trigger().startTime: time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, 2017-06-01T22:30:00.9935483Z respectively.
Tumbling Window Trigger Scope
 These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "TumblingWindowTrigger".
@trigger().outputs.windowStartTime: start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the beginning of the hour.
@trigger().outputs.windowEndTime: end of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the end of the hour.
Functions in ADF
 String Functions
 Collection Functions
 Logical Functions
 Conversion Functions
 Math Functions
 Date Functions
String Functions
concat: combines any number of strings together. E.g. concat('Hi ', 'team') : Hi team
substring: returns a subset of characters from a string. E.g. substring('somevalue', 1, 3) : ome
replace: replaces a string with a given string. E.g. replace('Hi team', 'Hi', 'Hey') : Hey team
guid: generates a globally unique string. E.g. guid() : c2ecc88d-88c8-4096-912c-d6
toLower: converts a string to lowercase. E.g. toLower('Two') : two
toUpper: converts a string to uppercase. E.g. toUpper('Two') : TWO
indexof: finds the index of a value within a string, case-insensitively. E.g. indexof('Hi team', 'Hi') : 0
endswith: checks if the string ends with a value, case-insensitively. E.g. endswith('Hi team', 'team') : true
startswith: checks if the string starts with a value, case-insensitively. E.g. startswith('Hi team', 'team') : false
split: splits the string using a separator. E.g. split('Hi;team', ';') : ["Hi", "team"]
lastindexof: finds the last index of a value within a string, case-insensitively. E.g. lastindexof('foofoo', 'foo') : 3
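The semantics of a few of these string functions can be mirrored in plain Python for a quick sanity check (illustrative equivalents only; ADF evaluates its expression language server-side):

```python
# Plain-Python equivalents of a few ADF string functions (illustrative).
def concat(*parts):
    """ADF concat: joins strings with no separator."""
    return "".join(parts)

def substring(s, start, length):
    """ADF substring(string, startIndex, length)."""
    return s[start:start + length]

def indexof(s, value):
    """ADF indexof is case-insensitive; returns -1 when not found."""
    return s.lower().find(value.lower())

def lastindexof(s, value):
    """ADF lastindexof, case-insensitive."""
    return s.lower().rfind(value.lower())

print(concat("Hi ", "team"))         # Hi team
print(substring("somevalue", 1, 3))  # ome
print(indexof("Hi team", "hi"))      # 0
print(lastindexof("foofoo", "foo"))  # 3
```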
Collection Functions
contains: returns true if a dictionary contains a key, a list contains a value, or a string contains a substring. E.g. contains('abacaba', 'aca') : true
length: returns the number of elements in an array or string. E.g. length('abc') : 3
empty: returns true if the object, array, or string is empty. E.g. empty('') : true
intersection: returns a single array or object with the common elements between the arrays or objects passed to it. E.g. intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2]
union: returns a single array or object with all of the elements that are in either array or object passed to it. E.g. union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 101, 10]
first: returns the first element in the array or string passed in. E.g. first([0, 2, 3]) : 0
last: returns the last element in the array or string passed in. E.g. last('0123') : 3
take: returns the first Count elements from the array or string passed in. E.g. take([1, 2, 3, 4], 2) : [1, 2]
skip: returns the elements in the array starting at index Count. E.g. skip([1, 2, 3, 4], 2) : [3, 4]
Conversion Functions
int: converts the parameter to an integer. E.g. int('100') : 100
string: converts the parameter to a string. E.g. string(10) : '10'
json: converts the parameter to a JSON-type value. E.g. json('[1,2,3]') : [1,2,3]; json('{"bar" : "baz"}') : { "bar" : "baz" }
float: converts the parameter argument to a floating-point number. E.g. float('10.333') : 10.333
bool: converts the parameter to a Boolean. E.g. bool(0) : false
coalesce: returns the first non-null object in the arguments passed in. Note: an empty string is not null. E.g. coalesce(pipeline().parameters.parameter1, pipeline().parameters.parameter2, 'fallback') : fallback
array: converts the parameter to an array. E.g. array('abc') : ["abc"]
createArray: creates an array from the parameters. E.g. createArray('a', 'c') : ["a", "c"]
Math Functions
add: returns the result of the addition of the two numbers. E.g. add(10, 10.333) : 20.333
sub: returns the result of the subtraction of the two numbers. E.g. sub(10, 10.333) : -0.333
mul: returns the result of the multiplication of the two numbers. E.g. mul(10, 10.333) : 103.33
div: returns the result of the division of the two numbers. E.g. div(10.333, 10) : 1.0333
mod: returns the remainder after the division of the two numbers (modulo). E.g. mod(10, 4) : 2
min: returns the minimum; there are two calling patterns, and all values must be numbers. E.g. min([0, 1, 2]) : 0; min(0, 1, 2) : 0
max: returns the maximum; there are two calling patterns, and all values must be numbers. E.g. max([0, 1, 2]) : 2; max(0, 1, 2) : 2
range: generates an array of integers starting from a certain number, with the length of the returned array given by the second parameter. E.g. range(3, 4) : [3, 4, 5, 6]
rand: generates a random integer within the specified range. E.g. rand(-1000, 1000) : 42
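The two calling patterns of min/max and the start/count behavior of range can be sketched in plain Python (illustrative only):

```python
# Plain-Python equivalents of ADF's mod, min (two calling patterns), and range.
def adf_mod(a, b):
    """ADF mod: remainder after division."""
    return a % b

def adf_min(*args):
    """ADF min accepts either a single array or a list of numbers."""
    values = args[0] if len(args) == 1 and isinstance(args[0], list) else args
    return min(values)

def adf_range(start, count):
    """ADF range(start, count): `count` integers starting at `start`."""
    return list(range(start, start + count))

print(adf_mod(10, 4))      # 2
print(adf_min([0, 1, 2]))  # 0
print(adf_min(0, 1, 2))    # 0
print(adf_range(3, 4))     # [3, 4, 5, 6]
```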
Date Functions
utcnow: returns the current timestamp as a string. E.g. utcnow() : 2019-02-21T13:27:36Z
addseconds: adds an integer number of seconds (positive or negative) to a string timestamp. E.g. addseconds('2015-03-15T13:27:36Z', -36) : 2015-03-15T13:27:00Z
addminutes: adds an integer number of minutes (positive or negative) to a string timestamp. E.g. addminutes('2015-03-15T13:27:36Z', 33) : 2015-03-15T14:00:36Z
addhours: adds an integer number of hours (positive or negative) to a string timestamp. E.g. addhours('2015-03-15T13:27:36Z', 12) : 2015-03-16T01:27:36Z
adddays: adds an integer number of days (positive or negative) to a string timestamp. E.g. adddays('2015-03-15T13:27:36Z', -20) : 2015-02-23T13:27:36Z
formatDateTime: returns a string in the specified date format. E.g. formatDateTime('2015-03-15T13:27:36Z', 'o') : 2015-03-15T13:27:36.0000000Z
Expressions in Azure Data Factory
 JSON values in the definition can be literals or expressions that are evaluated at runtime. E.g. "name": "value" or "name": "@pipeline().parameters.password"
 Expressions can appear anywhere in a JSON string value and always result in another JSON value.
 If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@).
"parameters" : the characters 'parameters' are returned.
"parameters[1]" : the characters 'parameters[1]' are returned.
"@@" : a 1-character string that contains '@' is returned.
" @" : a 2-character string that contains ' @' is returned.
A dataset with a parameter
 Suppose the BlobDataset takes a parameter named path.
 Its value is used to set a value for the folderPath property by using the following expression:
"folderPath": "@dataset().path"
A pipeline with a parameter
 In the following example, the pipeline takes inputPath and outputPath parameters.
 The path for the parameterized blob dataset is set by using the values of these parameters.
 The syntax used here is:
"path": "@pipeline().parameters.inputPath"
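Putting the two together, a copy activity can pass the pipeline parameters down into the dataset parameter (a hedged sketch; the activity name is hypothetical, and BlobDataset is the parameterized dataset from the example above):

```json
{
  "name": "CopyWithParams",
  "type": "Copy",
  "inputs": [
    {
      "referenceName": "BlobDataset",
      "type": "DatasetReference",
      "parameters": { "path": "@pipeline().parameters.inputPath" }
    }
  ],
  "outputs": [
    {
      "referenceName": "BlobDataset",
      "type": "DatasetReference",
      "parameters": { "path": "@pipeline().parameters.outputPath" }
    }
  ]
}
```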
Question & Answers
Feel free to write to me at mahesh.pandit@yash.com in case of any queries / clarifications.
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Azure Data Factory Introduction.pdf

  • 1. © 2018 YASH Technologies | www.yash.com | Confidential Azure Data Factory - Mahesh Pandit
  • 2. 2 Agenda  Why Azure Data Factory  Introduction  Steps involved in ADF  ADF Components  ADF Activities  Linked Services  Integration Runtime and its types  How Azure Data Factory works  Azure Data Factory V1 vs V2  System Variables  Functions in ADF  Expressions in ADF  Questions & Answers
  • 3. 3 Why Azure Data Factory  Modern DW for BI  Modern DW for SaaS Apps  Lift & Shift existing SSIS Pkgs. to Cloud
  • 4. 4 Why Azure Data Factory Azure SQL DW Azure Data Lake Azure Data Factory
  • 5. 5 Modern DW for Business Intelligence Log, Files & Media (Unstructured) On-Prem. & Cloud Apps & Data Business/Custom apps (Structured) Data Factory Azure Storage Azure Databricks Spark Ingest Store Prep & Train Model & Serve Intelligence Azure SQL Data Warehouse Azure Analysis Services Analytical Dashboards (Power BI) Azure Data Factory orchestrates data pipeline activity workflow & scheduling
  • 6. 6 Modern DW for SaaS Apps Log, Files & Media (Unstructured) On-Prem. & Cloud Apps & Data Business/Custom apps (Structured) Data Factory Azure Storage Azure Databricks Spark Ingest Store Prep & Train Model & Serve Intelligence SaaS App Browser/Devices App Storage Azure Data Factory orchestrates data pipeline activity workflow & scheduling
  • 7. 7 Lift & Shift existing SSIS packages to Cloud Cloud On-Premises On-Premises Data Sources SQL Server Data Factory Cloud Data Sources SQL DB Managed Instance VNET Azure Data Factory orchestrates data pipeline activity workflow & scheduling
  • 8. 8 Introduction Azure Data Factory Cloud-based Integration Service  It is a cloud-based integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.  Scheduled data-driven workflows.  Sources and destinations can be either on-premises or in the cloud.  Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics and ML.
  • 9. 9 How does it work?  The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
  • 10. 10 Steps involved in ADF  Connect and collect  Transform and enrich  Publish  Monitor
  • 11. 11 Connect and collect  The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.  With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.  For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
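The Copy Activity described above is defined in pipeline JSON. A minimal sketch follows, written as a Python dict that mirrors the shape of that JSON; the dataset names "BlobInput" and "SqlOutput" are hypothetical placeholders, not names from the deck.

```python
# Illustrative Copy Activity definition as a dict mirroring ADF pipeline JSON.
# "BlobInput" and "SqlOutput" are hypothetical dataset names.
copy_activity = {
    "name": "CopyBlobToSql",
    "type": "Copy",
    "inputs": [{"referenceName": "BlobInput", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SqlOutput", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},   # read from the source data store
        "sink": {"type": "SqlSink"},        # write to the sink data store
    },
}
```

The source and sink types change with the connected data stores; the activity itself stays the same shape.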
  • 12. 12 Transform and enrich  After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as  HDInsight Hadoop  Spark  Data Lake Analytics  Machine Learning.
  • 13. 13 Publish  After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or other stores as needed.
  • 14. 14 Monitor  After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.  Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
  • 15. 15 ADF Components  Pipeline  Activity  Datasets  Linked Services
  • 16. 16 ADF Components  An Azure subscription might have one or more Azure Data Factory instances (or data factories).  Azure Data Factory is composed of four key components: a DATASET (table, file) represents a data item stored in a linked service; an ACTIVITY (Hive, Copy) consumes and produces datasets and runs on a linked service; a PIPELINE (schedule, monitor) is a logical group of activities; a LINKED SERVICE (SQL Server, Hadoop cluster) defines the connection.  These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
  • 17. 17 Pipeline  A data factory might have one or more pipelines.  A pipeline is a logical grouping of activities that performs a unit of work.  Together, the activities in a pipeline perform a task.  The pipeline allows you to manage the activities as a set instead of managing each one individually.  The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.  To create a Data Factory pipeline, we can use any one of the following methods: Data Factory UI, Copy Data Tool, Azure PowerShell, REST API, Resource Manager Template, .NET, Python.
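Sequential chaining of activities, as described above, is expressed in the pipeline JSON via `dependsOn`. A minimal sketch, as a Python dict mirroring that JSON; the pipeline and activity names ("DemoPipeline", "StepA", "StepB") are hypothetical, and Wait activities stand in for real work.

```python
# A pipeline is a logical group of activities. StepB is chained after
# StepA via "dependsOn"; activities without dependencies run in parallel.
pipeline = {
    "name": "DemoPipeline",  # hypothetical name
    "properties": {
        "activities": [
            {
                "name": "StepA",
                "type": "Wait",
                "typeProperties": {"waitTimeInSeconds": 1},
            },
            {
                "name": "StepB",
                "type": "Wait",
                "dependsOn": [
                    {"activity": "StepA", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {"waitTimeInSeconds": 1},
            },
        ]
    },
}
```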
  • 18. 18 Pipeline Execution Triggers  Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.  There are different types of triggers for different types of events. Pipeline Runs  A pipeline run is an instance of the pipeline execution.  Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines.  The arguments can be passed manually or within the trigger definition. Parameters  Parameters are key-value pairs of read-only configuration.  Parameters are defined in the pipeline.  The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or a pipeline that was executed manually.  Activities within the pipeline consume the parameter values. Control Flow  Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on-demand or from a trigger.  It also includes custom-state passing and looping containers, that is, For-each iterators.
  • 19. 19 ADF Activities  Data Movement Activities  Data Transformation Activities  Control Activities
  • 20. 20 Activity  Activities represent a processing step in a pipeline.  For example, you might use a copy activity to copy data from one data store to another data store.  Data Factory supports three types of activities: 1. Data movement activities 2. Data transformation activities 3. Control activities. (Diagram: a Copy Activity ingests data from Azure Blob, a Transformation Activity refines it, and a Copy Activity loads the output data into Azure SQL Data Warehouse for a BI tool.)
  • 21. 21 Data Movement Activities  Copy Activity in Data Factory copies data from a source data store to a sink data store.  Data from any source can be written to any sink.
  • 22. 22 Data Transformation Activities  Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity. Data Transformation Activity / Compute Environment: Hive, Pig, MapReduce, Hadoop Streaming and Spark run on HDInsight; Machine Learning activities run on an Azure VM; Stored Procedure runs on Azure SQL, Azure SQL DW or SQL Server; U-SQL runs on Azure Data Lake Analytics; Custom Activity runs on Azure Batch; Notebook runs on Azure Databricks.
  • 23. 23 Control Activities  The following control flow activities are supported: Execute Pipeline Activity: allows a Data Factory pipeline to invoke another pipeline. ForEach Activity: defines a repeating control flow in your pipeline. Web Activity: can be used to call a custom REST endpoint from a Data Factory pipeline. Lookup Activity: can be used to read or look up a record/table name/value from any external source. Get Metadata Activity: can be used to retrieve metadata of any data in Azure Data Factory. Until Activity: implements a Do-Until loop similar to the Do-Until looping structure in programming languages; it executes a set of activities in a loop until the condition associated with the activity evaluates to true. If Condition Activity: can be used to branch based on a condition that evaluates to true or false. Wait Activity: the pipeline waits for the specified period of time before continuing with execution of subsequent activities.
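As an example of the branching activities above, here is a sketch of an If Condition activity definition as a Python dict mirroring ADF JSON. The activity names, the `rowCount` parameter, and the Wait activities inside the branches are hypothetical placeholders.

```python
# Hypothetical If Condition activity: branches on a pipeline parameter.
# The expression uses ADF's @greater() function; Wait activities stand in
# for whatever each branch would really do.
if_activity = {
    "name": "CheckRowCount",  # hypothetical name
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@greater(pipeline().parameters.rowCount, 0)",
            "type": "Expression",
        },
        "ifTrueActivities": [
            {"name": "ProceedWait", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 1}}
        ],
        "ifFalseActivities": [
            {"name": "SkipWait", "type": "Wait",
             "typeProperties": {"waitTimeInSeconds": 1}}
        ],
    },
}
```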
  • 24. 24 Linked services  Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources.  A linked service defines the connection to the data source.  For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.  Linked services are used for two purposes in Data Factory:  To represent a data store, including data stores located on-premises and in the cloud, e.g. tables, files, folders or documents.  To represent a compute resource that can host the execution of an activity; for example, the HDInsight Hive activity runs on an HDInsight Hadoop cluster.
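The Azure Storage linked service mentioned above can be sketched as a dict mirroring the ADF JSON. The linked service name is hypothetical and the connection string uses placeholder values, never real credentials.

```python
# Sketch of an Azure Blob Storage linked service definition.
# "MyBlobStore" and the <account>/<key> placeholders are hypothetical.
linked_service = {
    "name": "MyBlobStore",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<account>;AccountKey=<key>"
            )
        },
    },
}
```

Datasets then reference this linked service by name, so the connection details live in exactly one place.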
  • 25. 25 Integration Runtime  Think of it as a bridge between two networks.  It is the compute infrastructure that provides the following capabilities across different network environments: Data Movement, Activity Dispatch, and SSIS Package Execution.  Data Movement: copy data across data stores in a public network and data stores in a private network (on-premises or virtual private network); it provides support for built-in connectors, format conversion, column mapping and scalable data transfer.  Activity Dispatch: this capability is used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more are used for transformation activities.  SSIS Package Execution: this capability is used when SSIS packages need to be executed in a managed Azure compute environment.
  • 26. 26 Integration runtime types  These three types are: Azure IR: data movement and activity dispatch in public networks. Self-hosted IR: data movement and activity dispatch in both public and private networks. Azure-SSIS IR: SSIS package execution in both public and private networks.
  • 27. 27 How Azure Data Factory Works (Diagram: a pipeline groups activities; each activity consumes and produces datasets and runs on an integration runtime, for example one connecting to an on-premises SQL Server DB.)
  • 28. 28 Data Factory V1 vs. V2
  • 29. 29 Data Factory V1 vs. V2 Data Factory V1  Datasets  Linked Services  Pipelines  On-Premises Gateway  Schedule on dataset availability and pipeline start/end time Data Factory V2  Datasets  Linked Services  Pipelines  Self-hosted Integration Runtime  Schedule triggers (time or tumbling window)  Host and execute SSIS packages  Parameters  New control flow activities
  • 30. 30 System Variables  Pipeline scope  Schedule Trigger scope  Tumbling Window Trigger scope
  • 31. 31 Pipeline Scope These system variables can be referenced anywhere in the pipeline JSON. @pipeline().DataFactory Name of the data factory the pipeline run is running within @pipeline().Pipeline Name of the pipeline @pipeline().RunId ID of the specific pipeline run @pipeline().TriggerType Type of the trigger that invoked the pipeline (Manual, Scheduler) @pipeline().TriggerId ID of the trigger that invoked the pipeline @pipeline().TriggerName Name of the trigger that invoked the pipeline @pipeline().TriggerTime Time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
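A common use of these system variables is to make each pipeline run write to its own output folder. A sketch follows, as a dict mirroring a dataset reference inside an activity; the dataset name "BlobOutput" and the "logs/" prefix are hypothetical.

```python
# Sketch: building a per-run folder path from the RunId system variable,
# using ADF's @concat() function. Names here are hypothetical.
sink_dataset_ref = {
    "referenceName": "BlobOutput",
    "type": "DatasetReference",
    "parameters": {
        # Evaluated at runtime, e.g. "logs/<run id guid>"
        "folderPath": "@concat('logs/', pipeline().RunId)",
    },
}
```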
  • 32. 32 Schedule Trigger Scope  These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "ScheduleTrigger". @trigger().scheduledTime Time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, 2017-06-01T22:30:00Z respectively. @trigger().startTime Time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, 2017-06-01T22:30:00.9935483Z respectively.
  • 33. 33 Tumbling window Trigger Scope  These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "TumblingWindowTrigger". @trigger().outputs.windowStartTime Start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the beginning of the hour. @trigger().outputs.windowEndTime End of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the end of the hour.
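The window semantics above can be illustrated in plain Python: a tumbling window trigger produces contiguous, non-overlapping, fixed-size windows. This is a sketch of the window arithmetic only, not the ADF trigger itself; the start date is arbitrary.

```python
from datetime import datetime, timedelta

def hourly_windows(start, count):
    """Yield (windowStartTime, windowEndTime) pairs as an hourly tumbling
    window trigger would produce them: back-to-back one-hour windows."""
    for i in range(count):
        window_start = start + timedelta(hours=i)
        yield window_start, window_start + timedelta(hours=1)

# Three windows starting 2019-02-21 00:00:
# 00:00-01:00, 01:00-02:00, 02:00-03:00
windows = list(hourly_windows(datetime(2019, 2, 21, 0, 0), 3))
```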
  • 34. 34 Functions in ADF  String Functions  Collection Functions  Logical Functions  Conversion Functions  Math Functions  Date Functions
  • 35. 35 String Functions Function Description Example concat Combines any number of strings together. concat('Hi ', 'team') : Hi team substring Returns a subset of characters from a string. substring('somevalue',1,3) : ome replace Replaces a string with a given string. replace('Hi team', 'Hi', 'Hey') : Hey team guid Generates a globally unique string guid() : c2ecc88d-88c8-4096-912c-d6 toLower Converts a string to lowercase. toLower('Two') : two toUpper Converts a string to uppercase. toUpper('Two') : TWO indexof Finds the index of a value within a string, case-insensitively. indexof('Hi team', 'Hi') : 0 endswith Checks if the string ends with a value, case-insensitively. endswith('Hi team', 'team') : true startswith Checks if the string starts with a value, case-insensitively. startswith('Hi team', 'team') : false split Splits the string using a separator. split('Hi;team', ';') : ["Hi", "team"] lastindexof Finds the last index of a value within a string, case-insensitively. lastindexof('foofoo', 'foo') : 3
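The semantics of a few of the functions above can be sketched as pure-Python equivalents. This is only an illustration of the behavior shown in the table (zero-based substring, case-insensitive indexof), not the ADF expression runtime.

```python
# Pure-Python equivalents of some ADF string functions (illustration only).
def concat(*args):
    return "".join(args)

def substring(s, start, length):
    # ADF substring is zero-based and takes a length, not an end index.
    return s[start:start + length]

def index_of(s, value):
    # ADF indexof matches case-insensitively.
    return s.lower().find(value.lower())

def split(s, sep):
    return s.split(sep)
```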
  • 36. 36 Collection Functions Function Description Example contains Returns true if dictionary contains a key, list contains value, or string contains substring. contains('abacaba','aca') : true length Returns the number of elements in an array or string. length('abc') : 3 empty Returns true if object, array, or string is empty. empty('') : true intersection Returns a single array or object with the common elements between the arrays or objects passed to it. intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2] union Returns a single array or object with all of the elements that are in either array or object passed to it. union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 10, 101] first Returns the first element in the array or string passed in. first([0,2,3]) : 0 last Returns the last element in the array or string passed in. last('0123') : 3 take Returns the first Count elements from the array or string passed in. take([1, 2, 3, 4], 2) : [1, 2] skip Returns the elements in the array starting at index Count. skip([1, 2, 3, 4], 2) : [3, 4]
  • 37. 37 Conversion Functions Function Description Example int Converts the parameter to an integer. int('100') : 100 string Converts the parameter to a string. string(10) : '10' json Converts the parameter to a JSON type value. json('[1,2,3]') : [1,2,3] json('{"bar" : "baz"}') : { "bar" : "baz" } float Converts the parameter argument to a floating-point number. float('10.333') : 10.333 bool Converts the parameter to a Boolean. bool(0) : false coalesce Returns the first non-null object in the arguments passed in. Note: an empty string is not null. coalesce(pipeline().parameters.parameter1, pipeline().parameters.parameter2, 'fallback') : fallback array Converts the parameter to an array. array('abc') : ["abc"] createArray Creates an array from the parameters. createArray('a', 'c') : ["a", "c"]
  • 38. 38 Math Functions Function Description Example add Returns the result of the addition of the two numbers. add(10,10.333): 20.333 sub Returns the result of the subtraction of the two numbers. sub(10,10.333): -0.333 mul Returns the result of the multiplication of the two numbers. mul(10,10.333): 103.33 div Returns the result of the division of the two numbers. div(10.333,10): 1.0333 mod Returns the result of the remainder after the division of the two numbers (modulo). mod(10,4) :2 min There are two different patterns for calling this function. Note, all values must be numbers min([0,1,2]) :0 min(0,1,2) : 0 max There are two different patterns for calling this function. Note, all values must be numbers max([0,1,2]) :2 max(0,1,2) : 2 range Generates an array of integers starting from a certain number, and you define the length of the returned array. range(3,4) : [3,4,5,6] rand Generates a random integer within the specified range rand(-1000,1000) : 42
  • 39. 39 Date Functions Function Description Example utcnow Returns the current timestamp as a string. utcnow() : 2019-02-21T13:27:36Z addseconds Adds an integer number of seconds to a string timestamp passed in. The number of seconds can be positive or negative. addseconds('2015-03-15T13:27:36Z', -36) : 2015-03-15T13:27:00Z addminutes Adds an integer number of minutes to a string timestamp passed in. The number of minutes can be positive or negative. addminutes('2015-03-15T13:27:36Z', 33) : 2015-03-15T14:00:36Z addhours Adds an integer number of hours to a string timestamp passed in. The number of hours can be positive or negative. addhours('2015-03-15T13:27:36Z', 12) : 2015-03-16T01:27:36Z adddays Adds an integer number of days to a string timestamp passed in. The number of days can be positive or negative. adddays('2015-03-15T13:27:36Z', -20) : 2015-02-23T13:27:36Z formatDateTime Returns a string in date format. formatDateTime('2015-03-15T13:27:36Z', 'o') : 2015-03-15T13:27:36.0000000Z
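The add* functions above can be mimicked in plain Python with `datetime`, which makes their behavior on ISO-8601 UTC timestamps easy to check. This is an illustrative sketch, not the ADF runtime.

```python
from datetime import datetime, timedelta

ISO = "%Y-%m-%dT%H:%M:%SZ"  # the Z-suffixed UTC format used in the examples

def add_hours(ts, n):
    """Mimic ADF addhours on an ISO-8601 UTC timestamp string; n may be negative."""
    return (datetime.strptime(ts, ISO) + timedelta(hours=n)).strftime(ISO)

def add_days(ts, n):
    """Mimic ADF adddays; n may be negative."""
    return (datetime.strptime(ts, ISO) + timedelta(days=n)).strftime(ISO)
```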
  • 40. 40 Expressions in Azure Data Factory  JSON values in the definition can be literal or expressions that are evaluated at runtime. E.g. "name": "value" OR "name": "@pipeline().parameters.password"  Expressions can appear anywhere in a JSON string value and always result in another JSON value.  If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@). JSON value Result "parameters" The characters 'parameters' are returned. "parameters[1]" The characters 'parameters[1]' are returned. "@@" A 1 character string that contains '@' is returned. " @" A 2 character string that contains ' @' is returned.
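The escaping rules in the table above can be sketched as two tiny helper functions: a string value is an expression only when it starts with a single "@", "@@" escapes a literal "@", and a leading space suppresses evaluation. This is an illustration of the rules, not ADF's parser.

```python
# Sketch of ADF's string-value escaping rules (illustration only).
def is_expression(value):
    """True when the JSON string value would be evaluated as an expression."""
    return value.startswith("@") and not value.startswith("@@")

def literal_value(value):
    """Literal result for a non-expression value: '@@' collapses to '@'."""
    return value[1:] if value.startswith("@@") else value
```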
  • 41. 41 A dataset with a parameter  Suppose the BlobDataset takes a parameter named path.  Its value is used to set a value for the folderPath property by using the following expression: "folderPath": "@dataset().path" A pipeline with a parameter  In the following example, the pipeline takes inputPath and outputPath parameters.  The path for the parameterized blob dataset is set by using the values of these parameters.  The syntax used here is: "path": "@pipeline().parameters.inputPath"
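Putting the two halves above together, the parameterized pipeline can be sketched as a dict mirroring the ADF JSON. The dataset name "BlobDataset" follows the slide; the pipeline and activity names are hypothetical placeholders.

```python
# Sketch: a pipeline with inputPath/outputPath parameters, passed into a
# parameterized BlobDataset via @pipeline().parameters expressions.
param_pipeline = {
    "name": "CopyWithParams",  # hypothetical name
    "properties": {
        "parameters": {
            "inputPath": {"type": "String"},
            "outputPath": {"type": "String"},
        },
        "activities": [{
            "name": "CopyFiles",  # hypothetical name
            "type": "Copy",
            "inputs": [{
                "referenceName": "BlobDataset",
                "type": "DatasetReference",
                "parameters": {"path": "@pipeline().parameters.inputPath"},
            }],
            "outputs": [{
                "referenceName": "BlobDataset",
                "type": "DatasetReference",
                "parameters": {"path": "@pipeline().parameters.outputPath"},
            }],
            "typeProperties": {
                "source": {"type": "BlobSource"},
                "sink": {"type": "BlobSink"},
            },
        }],
    },
}
```

Inside the dataset, "folderPath": "@dataset().path" then picks up whichever value the pipeline run supplied.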
  • 43. Feel free to write to me at: mahesh.pandit@yash.com in case of any queries / clarifications.