Presentation on how to chat with PDF using ChatGPT code interpreter
Â
New capabilities for modern data integration in the cloud
1.
2.
3.
4. Modern DW for BI
Modern DW for SaaS
Lift & Shift Existing SSIS Packages to Cloud
5. Business / custom apps
(Structured)
Logs, files and media
(unstructured)
Azure storage
Polybase
Azure SQL Data Warehouse
Data factory
Data factory
Azure Databricks
(Spark)
Analytical dashboards
(PowerBI)
Model & ServePrep & TrainStoreIngest Intelligence
AZURE DATA FACTORY ORCHESTRATES DATA PIPELINE ACTIVITY WORKFLOW & SCHEDULING
Azure Analysis ServicesOn Prem, Cloud
Apps & Data
6. Business / custom apps
(Structured)
Logs, files and media
(unstructured)
Azure storage
Polybase
Data factory
Data factory
Azure Databricks
(Spark)
Browser/Device
Model & ServePrep & TrainStoreIngest Intelligence
AZURE DATA FACTORY ORCHESTRATES DATA PIPELINE ACTIVITY WORKFLOW & SCHEDULING
App Storage
SaaS App
7. On-Premise data sources
SQL DB Managed Instance
SQL Server
VNET
Azure Data Factory
Cloud data sources
Cloud
On-premises
8.
9. Flexible Pipeline Model
Rich pipeline orchestration
Triggers: on-demand, schedule, event
Data Movement as a Service
Cloud, Hybrid
70+ connectors provided
SSIS Package Execution
In a managed cloud environment
Use familiar tools, SSMS & SSDT
Author & Monitor
Programmability (Python, .NET, Powershell, etc.)
Visual Tools
12. 140 MillionPassengers are stranded
annually due to IROPS
IRREGUL AR OPERATIONS (IROPS) COST US AIRLINES
8.3 BillionIn revenue
USD
250Rebooking per passenger
from a cancelled flight
USD
4,000For rebooking
crew per airline
USD
GLOBALLY Passengers most frustrated by a lack of communication
51%
USD 16.7 Billion
Estimates cost of passengersâ time
lost due to IROPS
Loss in demand due to IROPS
USD 3.9 Billion
In annual costs to Airlines and customers
USD60Billion
Airline Pain Points
Poor Communication | Lost Revenue | High Re-accommodational Costs |
Upset Customers | Brand Damage
Airline IROPS (IRREGULAR OPERATIONS)
The Billion Dollar Problem Airlines Must Solve
13. Activity 1 Activity 2
Activity 3
âOn Errorâ
Activity 1
params
params
My Pipeline 1
âŚ
My Pipeline 2
For EachâŚ
Activity 4
params
Trigger
Event
Wall Clock
On Demand
params
14. Scalable
Per job elasticity
Up to 1 GB/s
Simple
Visually author or via code (Python, .Net, etc)
Serverless, no infrastructure to manage
Access all your data
70+ connectors provided and growing (cloud, on premises, SaaS)
Data Movement as a Service: 20+ points of presence world wide
Self-hostable Integration Runtime for hybrid movement
Data Movement
15. Azure Database File Storage NoSQL Services and Apps Generic
Azure Blob Storage Amazon Redshift SQL Server Amazon S3 Couchbase Dynamics 365 Salesforce HTTP
Azure Data Lake
Store
Oracle MySQL File System Cassandra Dynamics CRM Salesforce Service
Cloud
OData
Azure SQL DB Netezza PostgreSQL FTP MongoDB SAP C4C ServiceNow ODBC
Azure SQL DW SAP BW SAP HANA SFTP Oracle CRM Hubspot
Azure Cosmos DB Google BigQuery Informix HDFS Oracle Service
Cloud
Marketo
Azure DB for
MySQL
Sybase DB2 SAP ECC Oracle Responsys
Azure DB for
PostgreSQL
Greenplum MariaDB Zendesk Oracle Eloqua
Azure Search Microsoft Access Drill Zoho CRM
Salesforce
ExactTarget
Azure Table
Storage
Hive Phoenix Amazon
Marketplace
Atlassian Jira
Azure File Storage Hbase Presto Megento Concur
Impala Spark PayPal QuickBooks Online
Vertica Shopify Xero
GE Historian Square
Web table
* Supported file formats: CSV, AVRO, ORC, Parquet, JSON
16.
17. Data Movement Performance Optimization
Copy Scenario Behind the Scenes
Loading data into Azure Cosmos DB
⢠10+ times throughput boost comparing to previous solution
⢠Scalable to utilize 100% Request Units (Rus) during ingestion
⢠Improved reliability
Loading from Azure Blob or ADLS into
SQL DW
Polybase used whenever possible. BULKINSERT otherwise
Applicable when format is text, ORC, Parquet and meets these criteria.
Loading from data sources other than
Azure Blob/ADLS into SQL DW
Use staged copy via Azure Blob
Loading data from Amazon Redshift
Use UNLOAD to copy data from Amazon Redshift to Amazon S3
20. On-premises data sources SQL Server
ď OS: Windows/Linux
ď SCALABILITY: Scale-Out feature
ď EDITION: Standard/Enterprise
ď TOOLS: SSDT/SSMS to design/deploy/
manage/execute/monitor packages
ď EXTENSIBILITY: ISVs can build
components/extensions on SSIS
ď PRICING: Bundled w/ on-prem SQL Server
21. On-premises data sources
Cloud data sources Azure SQL DB/Managed InstanceAzure Data Factory
Cloud
On-premises
SQL Server
ď LIFT & SHIFT: Use Azure SQL DB/Managed
Instance (MI) to host SSISDB
ď SCALABILITY: Use ADF to provision a
managed cluster of Azure VMs dedicated to
run your packages â Azure-SSIS Integration
Runtime (IR)
ď EDITION: Standard/Enterprise
ď TOOLS: SSDT/SSMS + ADF app to
design/deploy/manage/execute/monitor
packages (activities)
ď EXTENSIBILITY: ISVs can build
components/extensions + SaaS on SSIS in
ADF via custom setup + 3rd party licensing
ď PRICING: Pay per hour + Azure Hybrid Benefit
(AHB) to Bring Your Own License (BYOL)
22. On-premises data sources
Azure SQL DB/Managed Instance
VNet
Azure Data FactoryCloud data sources
Cloud
On-premises
SQL Server
ď HYBRID: Join Azure-SSIS IR to a VNet that is
connected to your on-prem network to
enable on-prem data access
ď MODERNIZATION: Schedule first-class SSIS
activities in ADF pipelines via SSMS and
chain/group them w/ other activities via ADF
app
ď COMPLEMENTARY: Splice/inject built-in/3rd
party SSIS tasks and transformations in ADF
pipelines
ď READINESS: Public Preview w/ 24/7 live-site
support
23.
24.
25. Azure-SSIS IR node
Container
ISV Setup1. Specify Product Key in setup script ISV Activation Server
2. Get Activation Key by submitting Cluster ID + Product Key
Local Store
(e.g. Registry)
3. Write Activation Key
SSIS Executor
ISV Extension
4. Read Activation Key and
validate it with Cluster ID
Setup
Runtime
4. Get Cluster ID
4. Report on Node Count (Optional)
SSIS Runtime
2. Activation Key
26.
27.
28.
29. Area Details
Control Flow ⢠Control Flow-Released!!
⢠Azure Data Bricks Integration-Released!!
⢠File arrival trigger-Released!!
⢠Schedule roll up
⢠20 plus activities and growing
Data Transformation Visual author data transformation executed with Spark
Data Movement ⢠More connectors (Hybrid data movement)
⢠More geographic regions
⢠Continue performance Improvements (10x increase for CosmosDB connector)-Released!!
SSIS ⢠More Region and VM Choices
⢠Enterprise Edition-Released!!
⢠Azure Hybrid Benefit (BYOL)-Released!!
⢠Custom Setup + 3rd Party Licensing/Extensibility/Ecosystem-Released!!
⢠First-Class SSIS Activities in ADF Pipelines-Released!!
Tools Visual Control flow, data movement and monitoring-Released!!
Programmatic Interfaces (.NET, Python, Powershell, Rest APIs, ARM templates)-Released!!
Source Control Integration- VSTS GIT & GitHub
Monitoring Single pane of glass to monitor & manage your pipelines
41. Select its location
ď Select the right location for your Azure-SSIS IR to achieve high
performance in ETL workflows â East US/East US 2/Central US/West
US 2/North Europe/West Europe/UK South/Australia East are
available in Public Preview, more will be provided in the future
ď It needs not be the same as the location of your ADF, but it should
be the same as the location of your own Azure SQL DB/MI server
where SSISDB will be hosted and or the location of VNet connected
to your on-prem network
ď Avoid Azure-SSIS IR accessing SSISDB/data across different
locations
42. Select its node size
ď A selection of node sizes specifying the number of cores (CPU) and
size of memory (RAM) per node is provided â
Standard_A4_v2/Standard_A8_v2/
Standard_D1_v2/Standard_D2_v2/Standard_D3_v2/Standard_D4_v2
are available in Public Preview, more will be provided in the future
ď Select a large node size (scale up), if you want to run many
compute/ memory âintensive packages
43. Select its cluster size
ď A range of 1-10 for the number of nodes per cluster is provided
ď Select a large cluster with many nodes (scale out), if you want to
run many packages in parallel
ď Like Azure VMs, you can manage the cost of running Azure-SSIS IR
by stopping/starting it, effectively releasing/reallocating its nodes,
as you see fit
44. Select its edition/license
ď Standard/Enterprise edition/license is available in Public Preview
ď Enterprise Edition lets you use advanced/premium features on your
Azure-SSIS IR
45. Select to use Azure Hybrid Benefit
(AHB)
ď AHB lets you Bring Your Own License (BYOL) to benefit from
substantial cost savings â Essentially paying only for the
infrastructure and service management fees of Azure-SSIS IR
46. Bring your own Azure SQL DB/MI
server to host SSISDB
ď A selection of your existing Azure SQL DB/MI servers in the same
location as your Azure-SSIS IR is provided â If there is none, create
a new one to ensure that your Azure-SSIS IR can easily access
SSISDB without incurring excessive traffics between different
locations
ď The endpoint of your selected Azure SQL DB/MI server becomes
the connection info to enter on SSDT/SSMS
47. Bring your own Azure SQL DB/MI
server to host SSISDB
ď If you select your existing Azure SQL DB w/ VNet service endpoints,
you must also join your Azure-SSIS IR to the same VNet, so we can
prepare and manage SSISDB on your behalf
48. Bring your own Azure SQL DB/MI
server to host SSISDB
ď If you select your existing Azure SQL MI that was provisioned
inside a VNet, you must also join your Azure-SSIS IR to the same
VNet, but in a different subnet, so we can prepare and manage
SSISDB on your behalf
49. Select to use Azure Active
Directory (AAD) authentication
ď AAD authentication w/ ADF Managed Service Identity (MSI) lets
you provision Azure-SSIS IR w/o Azure SQL DB/MI server admin
credentials â Need to add your ADF MSI into an AAD group that
has access permissions to your Azure SQL DB/MI server
50. Enter your Azure SQL DB/MI server
admin credentials
ď Your Azure SQL DB/MI server admin username and password are
required for us to prepare and manage SSISDB on your behalf
51. Select the service tier and provide
access consent for your Azure SQL
DB server
ď A selection of service tiers (e.g. Basic/S0/S1/etc.) for your Azure
SQL DB server is provided â Not applicable to Azure SQL MI
ď Select a large service tier, if you want to achieve high database
performance and store many SSIS projects/packages/execution logs
ď By default, you must allow Azure services to access your Azure SQL
DB server â Not applicable to Azure SQL MI/Azure SQL DB w/ VNet
service endpoints
ď This is required for us to prepare and manage SSISDB on your behalf
52. Test connection to your Azure SQL
DB server
ď Ensure that your Azure SQL DB server is accessible and has no
existing SSISDB for us to prepare and manage it on your behalf
53. Select the concurrency level of
your Azure-SSIS IR nodes
ď A range of 1-8 for the number of parallel executions per node is
provided
ď Select a low concurrency level, if you want to use more than one
cores to run a single large/compute-intensive package
ď Select a high concurrency level, if you want to run one or more
small/light-weight packages in a single core
54. Add custom setup for your Azure-
SSIS IR
ď Provide a Shared Access Signature (SAS) Uniform Resource
Identifier (URI) of your Azure Storage Blob container where your
setup script + its associated files are stored
55. Optionally join your Azure-SSIS IR
to a VNet and provide consent for
VNet configuration
ď If you use Azure SQL MI that was provisioned inside a VNet/Azure SQL DB
w/ VNet service endpoints, you must also join your Azure-SSIS IR to the
same VNet, so we can prepare and manage SSISDB on your behalf
ď If you have on-prem data sources/destinations in your SSIS packages, you
must join your Azure-SSIS IR to a VNet that is connected to your on-prem
network to enable on-prem data access
ď By default, you must allow Azure services to configure the
permissions/settings of selected VNet for your Azure-SSIS IR to join
56. Select VNet and subnet for your
Azure-SSIS IR to join
ď A selection of existing VNets and subnets in the location of your
Azure-SSIS IR is provided â ARM VNet is recommended, since
Classic VNet will be deprecated soon
ď Ideally, your Azure-SSIS IR, Azure SQL DB/MI server, and VNet
connected to your on-prem network should all be in the same
location
ď If that is not the case, you can first create your Azure-SSIS IR in the
location of Azure SQL DB/MI server and join it to a VNet in the same
location, then configure a VNet-to-VNet connection with the VNet
connected to your on-prem network
57. When you submit a
provisioning request, it
takes 20 â 30 minutes to
allocate/prepare the
nodes of your Azure-SSIS
IR and start billing
58. Once your Azure-SSIS IR
is started, you can deploy
SSIS packages to execute
on it and you can stop it
as you see fit
93. Add a sproc parameter
âstmtâ of type âstringâ
with your T-SQL script to
create/start SSIS package
execution using SSISDB
sprocs as its value
108. Provide the path to
identify your deployed
package in SSISDB, e.g.
FolderName/ProjectName
/PackageName.dtsx
109. Provide the path to
identify your package
execution environment in
SSISDB, e.g.
FolderName/Environment
Name, assigning values to
SSIS project/package
parameters
176. Alternatively, if you use SSIS
activity for this, specify a
sufficient number of retry
attempts that are frequent
enough to wait until your
Azure-SSIS IR is started
177.
178.
179. Enterprise Features Descriptions
CDC components ⢠CDC Source/Splitter/Control Task are preinstalled on your Azure-SSIS IR
Enterprise Edition
⢠To connect to Oracle, you will also need to install CDC Designer/Service on
another machine
Oracle connectors ⢠Oracle Connection Manager/Source/Destination are preinstalled on your Azure-
SSIS IR Enterprise Edition
⢠You will also need to install Oracle Call Interface (OCI) driver, and if necessary
configure Oracle Transport Network Substrate (TNS), on your Azure-SSIS IR (via
Custom Setup Interface)
Teradata connectors ⢠You will need to install Teradata Connection Manager/Source/Destination and
Teradata Parallel Transporter (TPT) API + Teradata ODBC driver on your Azure-
SSIS IR Enterprise Edition (via Custom Setup Interface)
180. Enterprise Features Descriptions
SAP BW connectors ⢠SAP BW Connection Manager/Source/Destination are preinstalled on your Azure-
SSIS IR Enterprise Edition
⢠You will also need to install SAP BW driver on your Azure-SSIS IR (via Custom
Setup Interface)
⢠These connectors support SAP BW 7.0 or earlier versions
⢠To connect to later versions or other SAP products, you can install SAP
connectors from 3rd party ISVs on your Azure-SSIS IR (via Custom Setup Interface)
181. Enterprise Features Descriptions
SSAS/AAS components ⢠Data Mining Model Training/Dimension Processing/Partition Processing Destinations and
Data Mining Query Transform are preinstalled on your Azure-SSIS IR Enterprise Edition
⢠All support SSAS, but only Partition Processing Destination supports AAS
⢠To connect to SSAS, you will also need to configure Windows Authentication credentials in
your SSISDB
⢠On top of these components, Analysis Services Execute DDL/Processing and Data Mining
Query Tasks are also preinstalled on your Azure-SSIS IR Standard/Enterprise Edition
Fuzzy Grouping/Lookup
Transforms
⢠They are preinstalled on your Azure-SSIS IR Enterprise Edition and support SQL
Server/Azure SQL DB for storing reference data
Term Extraction/Lookup
Transforms
⢠They are preinstalled on your Azure-SSIS IR Enterprise Edition and support SQL
Server/Azure SQL DB for storing reference data
186. ď If you want to use gacutil.exe to install assemblies in the Global Assembly
Cache (GAC), you need to provide it as part of your custom setup, or use
the copy provided in the Public Preview container
ď If you need to join your Azure-SSIS IR w/ custom setup to a VNet, only
ARM VNet is supported
199. ď âSampleâ folder that contains a
custom setup to install a
simple task that just sleeps for
a few seconds on each node of
your Azure-SSIS IR â It also
contains a âgacutilâ folder,
which contains gacutil.exe
200. ď Double click
âUserScenariosâ and you
can find
ď â.NET FRAMEWORK 3.5â folder,
that contains a custom setup
to install an earlier version of
the .NET Framework on each
node of your Azure-SSIS IR
ď âBCPâ folder that contains a
custom setup to install SQL
Server command line utilities
(MsSqlCmdLnUtils.msi),
including bulk copy program
(bcp), on each node of your
Azure-SSIS IR
ď âEXCELâ folder that contains a
custom setup to install Open
Source Excel assemblies on
each node of your Azure-SSIS
IR