SKILLWISE-SSIS DESIGN PATTERN
FOR
DATA WAREHOUSING
SSIS Design Patterns for Data
Warehousing and Change Data
Capture
What is a Design Pattern?
• Pattern – a design for a package that solves a certain
scenario
• Over time certain SSIS logic flows have emerged as
best practices
• These designs have been classified into patterns for
reference purposes
• Standard Design Patterns
– Learn from others
– Common patterns make it easier for new personnel to
understand and work with packages
– Easy to apply in new projects
Design Patterns and Data
Warehousing
• SSIS most commonly used in Data Warehousing
• Patterns in this course most commonly used in
Data Warehousing
• Applicable to non-DW projects
• Definitions
– Type 1 – dimension updates simply overwrite pre-existing values
– Type 2 – each update to a dimension causes a new record to be created
– Fact – records the measures for a transaction and associates them with dimensions
What You Need
• SQL Server Data Tools – BI Components
– For SQL Server 2012 use Visual Studio 2012
– For SQL Server 2014 use Visual Studio 2013
• SQL Server Data Tools – Database Project – SQL Server 2012
– Uses Visual Studio 2012
• SQL Server Data Tools – Database Project – SQL Server 2014
– Included in Visual Studio 2013 Community Edition
– Included in other versions of VS 2013 out of the box
– Make sure to install Update 4
Versions of SQL Server
• We will use SQL Server 2014 in Project Deployment Mode
• Material works identically in 2012 (Project Deployment Mode)
– Package Deployment Mode for 2012/2014 requires older-style configurations for Master/Child
• Patterns applicable in 2008 R2 & 2008 with limitations
– CDC has to be manually implemented; no controls in the SSIS Toolbox
• Master / Child works differently – uses configurations
• Limited applicability to SQL Server 2005
– No Hashbytes
– No Merge
– No CDC
– Master / Child works differently – uses configurations
Deploying the Test Database
• Before running the project you will need to
deploy and set up the test database
• Uses an SSDT Database Project that is part of the
solution
• Deploy the database
• After deployment, run the stored procedure
DDL.CreateAllObjects
The 13 Patterns
• Truncate and Load
• SCD Wizard
– Type 1
– Type 2
• Set Based Updates
– Type 1
– Type 2
• Hashbytes
– Different Databases
– Same Database
• Change Data Capture
• Merge
• Date Based
• Fact Table Pattern
• Master / Child
– Basic
– Passing Parameters
– Load Balancing
Truncate and Load
• Deletes all rows in target, then
completely reloads from source
• Commonly used in staging environments,
often with other patterns
• Pros
– Simple to implement
– Fast for small to medium datasets
• Cons
– No change tracking
– Slower for large datasets
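A minimal T-SQL sketch of the pattern, assuming hypothetical stage.Product (target) and dbo.Product (source) tables; in SSIS this is typically an Execute SQL Task followed by a Data Flow Task:

-- Empty the target, then reload everything from the source
TRUNCATE TABLE stage.Product;

INSERT INTO stage.Product (ProductID, Name, ListPrice)
SELECT ProductID, Name, ListPrice
FROM dbo.Product;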
SCD Wizard Type 1
• SCD (Slowly Changing Dimension) Wizard
• Pattern for tables with Type 1 attributes only
• Pros
– Easy to create
– Good for very, very small updates
• Cons
– When something changes all SCD generated
components must be deleted and recreated.
– Incredibly slow.
– Did we mention it is slow?
– It is really really slow.
SCD Wizard Type 2
• SCD (Slowly Changing Dimension) Wizard
• Pattern for tables with Type 1 and 2 attributes
• The wizard is the same for both patterns, just with
different options
• Pros
– Easy to create
– Good for very small updates
• Cons
– When something changes all SCD generated
components must be deleted and recreated.
– Incredibly slow.
– It didn’t get any faster since the last section
Set Based Updates – Type 1
• Pros
– Scales well
– Runs fast
• Cons
– Requires extra tables in the database
– Requires more setup work in the
package
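A sketch of the set-based Type 1 logic, assuming hypothetical stage.Customer (latest source rows) and dim.Customer tables; the staging table is the "extra table" the cons refer to:

-- Overwrite changed attributes in place (Type 1)
UPDATE d
SET    d.CustomerName = s.CustomerName,
       d.City         = s.City
FROM   dim.Customer AS d
JOIN   stage.Customer AS s ON s.CustomerID = d.CustomerID
WHERE  d.CustomerName <> s.CustomerName
   OR  d.City <> s.City;

-- Insert rows that do not exist yet
INSERT INTO dim.Customer (CustomerID, CustomerName, City)
SELECT s.CustomerID, s.CustomerName, s.City
FROM   stage.Customer AS s
WHERE  NOT EXISTS (SELECT 1 FROM dim.Customer AS d
                   WHERE d.CustomerID = s.CustomerID);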
Set Based Updates – Type 2
• Pros
– Scales well
– Runs fast
• Cons
– Logic is somewhat complex
– Requires extra tables in the database
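A sketch of the Type 2 variant under the same hypothetical tables, assuming RowStartDate/RowEndDate/IsCurrent columns on the dimension; expiring rows and then inserting new versions is the "somewhat complex" logic:

DECLARE @LoadDate datetime2 = SYSDATETIME();

-- Expire the current version of rows whose attributes changed
UPDATE d
SET    d.RowEndDate = @LoadDate,
       d.IsCurrent  = 0
FROM   dim.Customer AS d
JOIN   stage.Customer AS s ON s.CustomerID = d.CustomerID
WHERE  d.IsCurrent = 1
  AND (d.CustomerName <> s.CustomerName OR d.City <> s.City);

-- Insert a new current version for changed and brand-new customers
INSERT INTO dim.Customer (CustomerID, CustomerName, City,
                          RowStartDate, RowEndDate, IsCurrent)
SELECT s.CustomerID, s.CustomerName, s.City, @LoadDate, NULL, 1
FROM   stage.Customer AS s
WHERE  NOT EXISTS (SELECT 1 FROM dim.Customer AS d
                   WHERE d.CustomerID = s.CustomerID
                     AND d.IsCurrent = 1);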
Hashbytes – Different Databases
• Uses the Hashbytes function to generate a
unique value for comparisons
• Pros
– Good for tables with many columns
– Scales well – fast
• Cons
– Requires use of lookups – caching requires memory
– Requires concatenation of all data columns in the select
statement
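A sketch of the comparison value; the column list, delimiter, and choice of SHA2_256 are illustrative (the deck only says "Hashbytes"), and non-string columns must be cast before concatenation:

SELECT ProductID,
       HASHBYTES('SHA2_256',
                 CONCAT(Name, '|',
                        CAST(ListPrice AS varchar(20)), '|',
                        ISNULL(Color, ''))) AS RowHash
FROM dbo.Product;

Comparing RowHash between source and target (for example in an SSIS Lookup) detects changed rows without comparing every column individually.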
Hashbytes – Same Database
• Uses Hashbytes with a Merge Join
• Pros
– Avoids use of lookups, lowers memory
requirements
– Scales very well
– Will work on different databases but most
efficient in a single database
• Cons
– Requires data sources to be sorted
– Requires common key to sort on
– Needs to concatenate data columns for
Hashbytes
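Because the Merge Join needs sorted inputs, both source queries typically push the sort into the database; a sketch with the same hypothetical names:

-- Both source and target queries are sorted on the common key
-- so the SSIS Merge Join can stream them row by row.
SELECT ProductID,
       HASHBYTES('SHA2_256', CONCAT(Name, '|',
                 CAST(ListPrice AS varchar(20)))) AS RowHash
FROM   dbo.Product
ORDER BY ProductID;

In SSIS the source outputs must also be marked as sorted (IsSorted and SortKeyPosition in the Advanced Editor) before the Merge Join will accept them.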
Change Data Capture
• Lets SQL Server track which rows in source have
changed
• Pros
– Tracks changes to data
– Only read rows which have changed
– Easy to determine Create / Update / Delete actions
• Cons
– Only works with SQL Server
– Requires setup work in database and tables before it
can be used
– Must have ability to alter the source system
Merge
• Uses SQL Server MERGE statement
• Pros
– Simple to implement
– Very fast
• Cons
– No transformations
– No ability to track progress
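A minimal sketch with hypothetical stage.Product and dim.Product tables; because MERGE handles inserts, updates, and deletes in one statement, there is no room for per-row transformations or progress tracking:

MERGE dim.Product AS t
USING stage.Product AS s
    ON t.ProductID = s.ProductID
WHEN MATCHED AND (t.Name <> s.Name OR t.ListPrice <> s.ListPrice) THEN
    UPDATE SET t.Name = s.Name, t.ListPrice = s.ListPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Name, ListPrice)
    VALUES (s.ProductID, s.Name, s.ListPrice)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;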
Date Based
• Uses date driven values to determine
changes
• Pros
– Easy to determine changes to rows
– Reduces number of rows that are read
– Can be combined with any of the other patterns
• Cons
– Requires source system to have a reliable date
field indicating changes
– Still requires logic to determine new rows vs
updates
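A sketch of the source query, assuming a reliable ModifiedDate column and a hypothetical etl.LoadLog table recording the last successful load:

DECLARE @LastLoadDate datetime2 =
    (SELECT MAX(LoadDate) FROM etl.LoadLog WHERE TableName = 'Product');

-- Only rows changed since the previous load are read
SELECT ProductID, Name, ListPrice, ModifiedDate
FROM   dbo.Product
WHERE  ModifiedDate > @LastLoadDate;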
Fact Table Pattern
• Used to update metrics /
measurements in data warehouse
• Pros
– Common pattern
– Easy to implement
• Cons
– Can require many lookups
– Updates not always simple
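A set-based sketch of a fact load with hypothetical staging, dimension, and fact tables; in SSIS the surrogate-key resolution below is usually a chain of Lookup components, which is where the "many lookups" con comes from:

INSERT INTO fact.Sales (DateKey, ProductKey, CustomerKey, Quantity, Amount)
SELECT d.DateKey, p.ProductKey, c.CustomerKey, s.Quantity, s.Amount
FROM   stage.Sales AS s
JOIN   dim.Date     AS d ON d.FullDate   = s.OrderDate
JOIN   dim.Product  AS p ON p.ProductID  = s.ProductID
JOIN   dim.Customer AS c ON c.CustomerID = s.CustomerID;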
Master / Child (Basic)
• A master (parent) package which coordinates
the execution of other packages (children)
• Pros
– Simple to implement
• Cons
– Not always efficient when many packages are
involved
Master / Child (Parameters)
• Passing values from master package to children
• Pros
– SQL Server 2012 / 2014 project deployment mode
makes it very easy to pass values
– Easy to reuse values across multiple child packages
• Cons
– Package deployment mode, or SQL Server 2008 R2
and earlier, requires the more complex
configurations
Master/Child (Load Balanced)
• Uses a table to drive package execution
• Pros
– Easy to alter execution – just update a table
– Can easily balance parallel execution of
packages
• Cons
– Needs many variables
– Requires some manual effort and
monitoring to effectively balance
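A sketch of a hypothetical driver table for this pattern; the master package reads it to decide which child packages each parallel "thread" executes:

CREATE TABLE etl.PackageQueue
(
    PackageName  sysname   NOT NULL,
    ThreadNumber int       NOT NULL,   -- which parallel slot runs the package
    IsEnabled    bit       NOT NULL DEFAULT 1,
    LastRunTime  datetime2 NULL
);

Balancing then amounts to updating the ThreadNumber assignments as package run times drift, which is the manual monitoring the cons mention.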
Choosing a Pattern
• Truncate and Load
– Low to moderate number of rows
– No requirement to track changes
– Good for staging tables
• SCD Wizard, Type 1 & 2
– Very small number of rows (< 2000)
– Packages that won’t change
– There is almost always a better pattern
Choosing a Pattern
• Set Based Updates, Type 1 & 2
– Scales well
– Good for limited number of columns
– Extra RAM required
• Hashbytes
– Scales well
– Good for large number of columns
– Source system needs to support a form of
the Hashbytes function
• Change Data Capture
– Excellent pattern – SQL Server tells you all the
changes
– Data source must be SQL Server
Choosing a Pattern
• Merge
– Good for very simple ETL when no
monitoring is required
• Date Based
– Limits number of rows read in
– Can be combined with other patterns
• Fact Table Pattern
• Master / Child
– Basic
– Passing Parameters
– Load Balancing
CHANGE DATA CAPTURE
Introduction
• Many applications have requirements for
identifying data changes that have occurred in a
database for various reasons
– Tracking historical changes to data
– Auditing changes to data
– Synchronizing data changes across disconnected
systems
– Implementing an Operational Data Store (ODS)
– Incremental data loading for a Data Warehouse
• SQL Server provides many techniques for tracking
changes to data
Techniques for Identifying Changes
• DML table triggers
– Can track before and after row state and deleted rows
– Can be customized to include the user, modification
time, or input buffer
– Can introduce significant performance overhead on
transactional systems
• Modified datetime or timestamp column
– Can introduce performance overhead to pull changes
– Does not track deleted rows
– Requires schema modifications to source tables and
code to set value
Techniques for Identifying Changes
• Data comparisons
– Comparing source and destination data requires
scanning all rows to determine changes and
introduces significant performance overhead
• Replication and Subscriber triggers
– Offloads change identification to subscription
database
– Requires customizations and manual management
of schema changes
Change Data Capture Solution
• Change Data Capture provides information about DML
changes to a table in near real-time using the same Log
Reader Agent as transactional replication
– Eliminates expensive techniques that require schema
modifications
• DML triggers
• Timestamp columns
• Data comparisons and complex JOIN queries
• May be used to answer a variety of critical questions:
– What are all of the changes that happened to a table since
the last ETL?
– Which columns changed?
– What type of changes occurred? INSERT/UPDATE/DELETE?
– What was the before image of a row that was modified?
• Supports net change identification, with a performance
trade-off due to an additional index on change tables
Configuring Change Data Capture
• Change Data Capture provides the ability to
capture the row data from DML changes to a
database when enabled for capture
• Configuring Change Data Capture has specific
requirements which, when met, allow individual
tables to be configured for change capture
• The options for configuring a table for change
capture affect performance, the data collected,
and security controls for accessing the capture
tables
Requirements to Enable CDC
• Enterprise Edition feature only
• Enabling CDC for a database requires
sysadmin privileges
– Requires executing sp_cdc_enable_db
• Enabling CDC for a table requires
db_owner privileges in the database
– Requires executing sp_cdc_enable_table for
each table that will track changes within the
database
• Querying the results from the CDC tables
requires membership in the database
role specified in the sp_cdc_enable_table
procedure call, if one was specified
sp_cdc_enable_table Options
• @source_schema – the name of the source table schema
(required)
• @source_name – the name of the source table (required)
• @role_name – the name of the database security role used for
gating access to the change data (required, but can explicitly be
set to NULL)
• @capture_instance – the name of the capture instance
(optional)
• @supports_net_changes – indicates whether querying for net
changes is supported by the capture instance (optional, defaults
to 1)
– Enabling net change support adds an additional non-clustered index to
the capture table, which can impact insert performance for change
rows
• @index_name – the name of a unique index used to uniquely
identify rows in the source table (optional, defaults to the primary
key)
sp_cdc_enable_table Options
• @captured_column_list – the list of source table
columns to include in the change table (optional)
• @filegroup_name – the filegroup to be used for the
change table (optional)
• @allow_partition_switch – indicates whether the
SWITCH PARTITION command of ALTER TABLE can
be executed against a table that is enabled for CDC
(optional)
– Switching a partition into a CDC enabled table does
not generate INSERT change data for rows that
previously existed in the partition prior to the switch
– Switching a partition out of a CDC enabled table does
not generate DELETE change data for the rows
contained within the partition
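A hypothetical call showing the optional parameters from these two slides together (the deck's own setup example later uses only the first five); the column list and filegroup name are illustrative:

EXEC sys.sp_cdc_enable_table
    @source_schema          = N'Production'
  , @source_name            = N'Product'
  , @role_name              = N'cdc_Admin'
  , @capture_instance       = N'Production_Product'
  , @supports_net_changes   = 1
  , @captured_column_list   = N'ProductID, Name, ListPrice'  -- capture a subset of columns
  , @filegroup_name         = N'CDC_FG'                      -- hypothetical filegroup
  , @allow_partition_switch = 1;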
QUERYING CHANGE DATA USING
T-SQL
Introduction
• After enabling a database for Change Data
Capture and configuring capture instances for the
source tables, the change data must be queried
for processing
• All change rows are identified by the Log
Sequence Number (LSN) associated with the
transaction that changed the row
• Change tables include internal metadata columns
that describe the change row as well as the
captured columns configured for the table
Finding Change Table
Metadata
• Stored Procedures
– sys.sp_cdc_help_change_data_capture –
returns the CDC capture information for each
table enabled within a database
• May return up to two rows per table, one for each
capture instance
• The @source_schema parameter specifies the
source schema to return results for when the
procedure executes
• The @source_name parameter specifies the source
table to return results for when the procedure
executes
– sys.sp_cdc_get_captured_columns – returns
the captured columns for the capture instance
specified by the @capture_instance parameter
Finding Change Table
Metadata
• System tables
– cdc.change_tables – contains up to two
rows, one per capture instance enabled
on a source table
– cdc.captured_columns – contains one
row per captured column for a source
capture instance
• Querying the system tables directly is not
recommended; use the stored
procedures instead
Understanding Change Table Columns
• Change tables exist within the CDC schema and are named
<capture_instance>_CT
• The first five columns are metadata columns:
– __$start_lsn – the starting LSN of the transaction
– __$end_lsn – the ending LSN of the transaction
– __$seqval – the sequence or order of the row changes within a
transaction
– __$operation – the type of operation reflected by the change row
• 1 = Delete
• 2 = Insert
• 3 = Value before update
• 4 = Value after update
– __$update_mask – bitmask of columns changed by operation within
row
• Remaining columns match the source table column definition
when the capture instance was created
Determining Change Rows to
Process
• By LSN:
– sys.fn_cdc_map_time_to_lsn ( '<relational_operator>',
tracking_time )
• Returns the LSN value from the start_lsn column of the
cdc.lsn_time_mapping system table for the tracking_time specified
• The relational_operator specifies the comparison to be applied against
the tran_end_time of the cdc.lsn_time_mapping table when
determining the LSN value to return
– largest less than, largest less than or equal, smallest greater than, or
smallest greater than or equal
– sys.fn_cdc_get_min_lsn ( 'capture_instance_name' )
• Returns the start_lsn value for the capture instance from
cdc.change_tables
• Sets the lower endpoint for change data for a given capture instance
– sys.fn_cdc_get_max_lsn ()
• Returns the maximum start_lsn column value from
cdc.lsn_time_mapping system table setting the upper endpoint for all
capture instances
Determining Change Rows to Process
• By LSN:
– Custom tracking table updated by application code to track
the capture instance name and last processed LSN
– sys.fn_cdc_decrement_lsn ( lsn_value )
• Returns the previous LSN in the sequence based upon the
specified LSN
• Often used to decrement sys.fn_cdc_get_max_lsn () value to set
upper endpoint without overlapping LSNs across different data
loads
– sys.fn_cdc_increment_lsn ( lsn_value )
• Returns the next LSN in the sequence based upon the specified
LSN
• Often used to increment the last saved LSN from a custom
tracking table to set a new lower endpoint without overlapping
LSNs across different data loads
• By time:
– sys.fn_cdc_map_lsn_to_time ( lsn_value )
• Returns the tran_end_time column from cdc.lsn_time_mapping
for the specified LSN, allowing an LSN value to be mapped back
to the transaction commit time
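A sketch combining these functions into a non-overlapping LSN window for an incremental load; etl.CdcTracking is a hypothetical custom tracking table as described above:

DECLARE @last_lsn binary(10) =
    (SELECT LastLSN FROM etl.CdcTracking
     WHERE CaptureInstance = 'Production_Product');

-- Start just past the last processed LSN, end at the current maximum
DECLARE @from_lsn binary(10) = sys.fn_cdc_increment_lsn(@last_lsn);
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();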
Change Row Table-Valued
Functions
• cdc.fn_cdc_get_all_changes_<capture_instance>
– Returns one row for each modification
applied to the source table within the
specified LSN range
– Multiple modifications of a source row
within the LSN range will be represented
individually in the result set
• cdc.fn_cdc_get_net_changes_<capture_instance>
– Returns a single net change row for each
source row modified within the specified
LSN range
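A sketch of querying all changes for the Production_Product capture instance (created in the setup example later in the deck) across the full available LSN range:

DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('Production_Product');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM   cdc.fn_cdc_get_all_changes_Production_Product(@from_lsn, @to_lsn, N'all');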
Determining Whether a Column
Changed
• sys.fn_cdc_get_column_ordinal (
'capture_instance', 'column_name' )
– Returns the ordinal position of a column name
within the specified capture instance's update mask
• sys.fn_cdc_is_bit_set ( position, update_mask )
– Checks the specified ordinal position of the update
mask to determine if the change bit is set
• sys.fn_cdc_has_column_changed (
'capture_instance', 'column_name', update_mask )
– Identifies whether the specified column has been
updated in the associated change row
– Ideally only used for post processing
– Use sys.fn_cdc_get_column_ordinal once to set the
position, and sys.fn_cdc_is_bit_set to parse the
update_mask in queries against change tables for
better performance
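A sketch of that recommended approach, reusing the Production_Product capture instance and its Name column from the deck's example:

-- Resolve the ordinal once, then parse the mask per row
DECLARE @name_pos int =
    sys.fn_cdc_get_column_ordinal('Production_Product', 'Name');

SELECT ProductID,
       sys.fn_cdc_is_bit_set(@name_pos, __$update_mask) AS NameChanged
FROM   cdc.fn_cdc_get_all_changes_Production_Product(
           sys.fn_cdc_get_min_lsn('Production_Product'),
           sys.fn_cdc_get_max_lsn(),
           N'all')
WHERE  __$operation = 4;   -- after-update images only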
SQL SERVER 2012 SSIS COMPONENTS
Introduction
• SQL Server 2012 introduced new
SQL Server Integration Services
(SSIS) components for CDC to simplify
extracting and consuming change
data
• Using the CDC components does
not require advanced knowledge
of SSIS to move change data from
a source system to a target for
further processing
CDC Control Task Component
• Used to control the life cycle of CDC packages in SSIS
– Synchronizes initial package load and the management of
LSN ranges processed by the CDC package executions
– Maintains the state across executions by persisting state
variable to a table
– Handles error scenarios and recovery from problems
during processing
• Supports two types of operations
– Synchronization of the initial data load and change processing
• Mark initial load start and initial load end for a full load from an
active source
• Resetting the CDC state variable to restart tracking
• Marking the CDC start from a snapshot LSN from a snapshot
database
– Management of change processing LSN ranges and tracking
what is processed successfully
• Getting a processing range before execution
• Marking a processing range after successfully processing changes
CDC Control Task Component
• Persisting state across executions
– Manual state persistence requires the package developer
to read and write the state variable for the package
– Automated state persistence reads the value of the state
variable from the table configured in the Control Task
editor to get the processing range and writes the value to
the table to mark the processed range
• Errors can be reported by the Control Task if:
– A get processing range is called after a previous get
processing range operation without the mark processed
range operation occurring
• Possibly a different package running concurrently with the same
state variable name
– Reading the persisted state variable value from the
persisted store fails
– The state variable value read from the persistent store is
not consistent
– Writing the state variable value to the persistent store fails
CDC Source Component
• Reads the processing range of change data from a capture instance change table and
delivers the changes to other SSIS components
– The processing range is derived from the state package
variable that is set by a CDC Control task executed before
the data flow starts
• The CDC source requires the following configurations:
– ADO.NET connection manager to access the SQL Server
CDC database
– The name of a table enabled for CDC
– The name of the capture instance of the table to read the
changes from
– The change processing mode to use for reading the
changes
– The name of the CDC state package variable used to determine the CDC processing range
• The CDC source does not modify that variable; a subsequent CDC
Control task execution after the data flow must be used to update
the state values
CDC Processing Modes (1)
• All
– Returns a single row for each change applied to the source
table
– Similar to querying the
cdc.fn_cdc_get_all_changes_<capture_instance> table-valued
function with the 'all' filter option
• All with old values
– Similar to All, but with two rows per update, one for the
Before value and one for the After value
– Similar to querying the
cdc.fn_cdc_get_all_changes_<capture_instance> table-valued
function with the 'all update old' filter option
– The __$operation column distinguishes between Before (3)
and After (4)
CDC Processing Modes (2)
• Net
– Rolls up all changes for a key into a single row to simplify ETL processing
– Requires @supports_net_changes = 1 for the capture instance
– Similar to querying the cdc.fn_cdc_get_net_changes_<capture_instance>
table-valued function with the 'all' filter option
• Net with update mask
– Similar to Net but includes additional boolean columns
(__$<column_name>_Changed) specifying whether a column was changed
– Similar to querying the cdc.fn_cdc_get_net_changes_<capture_instance>
table-valued function with the 'all with mask' filter option
CDC Processing Modes (3)
• Net with merge
– Groups INSERT and UPDATE operations
together, making it easier to use the
MERGE statement (__$operation = 5)
– Similar to querying the
cdc.fn_cdc_get_net_changes_<capture_instance>
table-valued function with the
'all with merge' filter option
– Only the DELETE and UPDATE split paths
will receive rows from the CDC Splitter in
this mode
CDC Splitter Components
• Splits a single input of change rows from the CDC
Source component into separate outputs for Insert,
Update and Delete operations based on the
__$operation column value from the change table
– 1 – Delete
– 2 – Insert (not available using Net with Merge mode)
– 3 – Before Update row (only when using All with Old Values
mode)
– 4 – After Update row
– 5 – Merged Update row (only when using Net with Merge
mode)
• The CDC Source for the Data Flow must have a
Net CDC processing mode configured to use the CDC
Splitter
• No advanced configuration is required for the CDC
Splitter
Package Design Considerations
• Configure separate packages for handling Initial Load and Incremental Loads
– Initial load will mark the start LSN before transferring data
from the source, and the end LSN after, using the CDC tracking
variable for all tables associated with the data flow
– Facilitates easier re-initialization from the source system if necessary
• Error handling considerations need to be made when operation order must be
maintained as a part of the data flow
– CDC components can redirect error rows when appropriate
to prevent component failures, but this may result in out-of-order
processing of changes
• Consider using staging tables to fast load change data
and perform batch processing of changes in Transact-SQL
to prevent row-by-row processing of changes in SSIS
– Change from ETL (SSIS processing of rows) to ELT (database engine processing) to benefit from set-based operations
CDC Setup
• Step 1. Enable CDC for the database

USE AdventureWorks2012
GO
EXEC sp_changedbowner 'sa'
GO
EXEC sys.sp_cdc_enable_db
GO

• Enabling CDC for the database creates the CDC functions,
stored procedures, and tables
CDC Setup
• Step 2. Enable CDC for table(s)

USE AdventureWorks2012
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'Production'
  , @source_name = N'Product'
  , @role_name = N'cdc_Admin'
  , @capture_instance = N'Production_Product'
  , @supports_net_changes = 1

• This creates the change table cdc.Production_Product_CT
Anatomy of a CDC Table
• __$start_lsn and __$seqval
– Link record to a transaction
– Specify order of operations
• __$operation
– 1 = delete
– 2 = insert
– 3 = update (record data before change)
– 4 = update (record data after change)
– 5 = merge
• __$update_mask
– Identify which columns changed
– Use with sys.fn_cdc_has_column_changed
CDC in Integration Services
[Diagram: SSIS manages the current state of CDC processing in a state table, moving change data from the source database to a staging database]
Table Structures
• One staging table per type of change, with both the source
columns AND the change columns
• Exception: the Updates table also includes a ChangeType column
when both Type 1 and Type 2 processing are required
CDC in Integration Services Control Flow – Extraction
[Diagram: a CDC Control Task marks the beginning and end of processing; the control flow truncates the 3 staging tables; separate paths handle the Initial Load and the Incremental Load]
Data Flow Task – Extraction (Incremental Load Only)
Control Flow – Transform and Load
Data Flow – Transform and Load
SELECT [__$start_lsn]
      ,[__$operation]
      ,[__$update_mask]
      ,[ProductID]
      ,[Name]
      . . .
FROM [stage].[stageProduct_Inserts]
UNION ALL
SELECT [__$start_lsn]
      ,[__$operation]
      ,[__$update_mask]
      ,[ProductID]
      ,[Name]
      . . .
FROM [stage].[stageProduct_Updates]
WHERE ChangeType = 2
SKILLWISE-SSIS DESIGN PATTERN FOR DATA WAREHOUSING

More Related Content

What's hot

Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
Mark Kromer
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
Databricks
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
Alex Tumanoff
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
DataWorks Summit/Hadoop Summit
 
How to Prepare for a BI Migration
How to Prepare for a BI MigrationHow to Prepare for a BI Migration
How to Prepare for a BI Migration
Senturus
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Rakesh Jayaram
 
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
Ahmed Alorage
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
Databricks
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
Mark Kromer
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Difference
jeetendra mandal
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Oracle Tablespace - Basic
Oracle Tablespace - BasicOracle Tablespace - Basic
Oracle Tablespace - Basic
Eryk Budi Pratama
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
BRIJESH KUMAR
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de Kreuk
Erwin de Kreuk
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 

What's hot (20)

Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
How to Prepare for a BI Migration
How to Prepare for a BI MigrationHow to Prepare for a BI Migration
How to Prepare for a BI Migration
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
‏‏‏‏Chapter 9: Data Warehousing and Business Intelligence Management
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Difference
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Oracle Tablespace - Basic
Oracle Tablespace - BasicOracle Tablespace - Basic
Oracle Tablespace - Basic
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx1- Introduction of Azure data factory.pptx
1- Introduction of Azure data factory.pptx
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de Kreuk
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 

Similar to SKILLWISE-SSIS DESIGN PATTERN FOR DATA WAREHOUSING

Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database design
Salehein Syed
 
Sql Server2008
Sql Server2008Sql Server2008
Sql Server2008
Microsoft Iceland
 
Road to database automation - Database source control
Road to database automation - Database source controlRoad to database automation - Database source control
Road to database automation - Database source control
Eduardo Piairo
 
Survey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data WarehousesSurvey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data Warehouses
Etisalat
 
Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?
Performance Tuning Corporation
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]shuwutong
 
Optimizing Application Performance - 2022.pptx
Optimizing Application Performance - 2022.pptxOptimizing Application Performance - 2022.pptx
Optimizing Application Performance - 2022.pptx
JasonTuran2
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and Liquibase
Dan Stine
 
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Microsoft Technet France
 
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
BIOVIA
 
TCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksTCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksMrunal Shridhar
 
Data flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingData flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingDr. Dipti Patil
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
Rob Winters
 
Presentation cloud control enterprise manager 12c
Presentation   cloud control enterprise manager 12cPresentation   cloud control enterprise manager 12c
Presentation cloud control enterprise manager 12c
xKinAnx
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptx
NurulIzrin
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
Keeyong Han
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
Saurabh K. Gupta
 

Similar to SKILLWISE-SSIS DESIGN PATTERN FOR DATA WAREHOUSING (20)

Evolutionary database design
Evolutionary database designEvolutionary database design
Evolutionary database design
 
NoSql
NoSqlNoSql
NoSql
 
SQLDay2013_MarcinSzeliga_StoredProcedures
SQLDay2013_MarcinSzeliga_StoredProceduresSQLDay2013_MarcinSzeliga_StoredProcedures
SQLDay2013_MarcinSzeliga_StoredProcedures
 
Sql Server2008
Sql Server2008Sql Server2008
Sql Server2008
 
Road to database automation - Database source control
Road to database automation - Database source controlRoad to database automation - Database source control
Road to database automation - Database source control
 
Survey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data WarehousesSurvey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data Warehouses
 
SQLDay2013_ChrisWebb_CubeDesign&PerformanceTuning
SQLDay2013_ChrisWebb_CubeDesign&PerformanceTuningSQLDay2013_ChrisWebb_CubeDesign&PerformanceTuning
SQLDay2013_ChrisWebb_CubeDesign&PerformanceTuning
 
Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?Database 12c is ready for you... Are you ready for 12c?
Database 12c is ready for you... Are you ready for 12c?
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
Optimizing Application Performance - 2022.pptx
Optimizing Application Performance - 2022.pptxOptimizing Application Performance - 2022.pptx
Optimizing Application Performance - 2022.pptx
 
Database Migrations with Gradle and Liquibase
Database Migrations with Gradle and LiquibaseDatabase Migrations with Gradle and Liquibase
Database Migrations with Gradle and Liquibase
 
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
 
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools(ATS4-DEV02) Accelrys Query Service: Technology and Tools
(ATS4-DEV02) Accelrys Query Service: Technology and Tools
 
TCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooksTCC14 tour hague optimising workbooks
TCC14 tour hague optimising workbooks
 
Data flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousingData flow in Extraction of ETL data warehousing
Data flow in Extraction of ETL data warehousing
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Presentation cloud control enterprise manager 12c
Presentation   cloud control enterprise manager 12cPresentation   cloud control enterprise manager 12c
Presentation cloud control enterprise manager 12c
 
Week 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptxWeek 2 - Database System Development Lifecycle-old.pptx
Week 2 - Database System Development Lifecycle-old.pptx
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Harness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data LakeHarness the power of Data in a Big Data Lake
Harness the power of Data in a Big Data Lake
 

More from Skillwise Group

Skillwise Consulting New updated
Skillwise Consulting New updatedSkillwise Consulting New updated
Skillwise Consulting New updated
Skillwise Group
 
Email Etiquette
Email Etiquette Email Etiquette
Email Etiquette
Skillwise Group
 
Healthcare profile
Healthcare profileHealthcare profile
Healthcare profile
Skillwise Group
 
Manufacturing courses
Manufacturing coursesManufacturing courses
Manufacturing courses
Skillwise Group
 
Retailing & logistics profile
Retailing & logistics profileRetailing & logistics profile
Retailing & logistics profile
Skillwise Group
 
Skillwise orientation
Skillwise orientationSkillwise orientation
Skillwise orientation
Skillwise Group
 
Overview- Skillwise Consulting
Overview- Skillwise Consulting Overview- Skillwise Consulting
Overview- Skillwise Consulting
Skillwise Group
 
Skillwise corporate presentation
Skillwise corporate presentationSkillwise corporate presentation
Skillwise corporate presentation
Skillwise Group
 
Skillwise Profile
Skillwise ProfileSkillwise Profile
Skillwise Profile
Skillwise Group
 
Skillwise Softskill Training Workshop
Skillwise Softskill Training WorkshopSkillwise Softskill Training Workshop
Skillwise Softskill Training Workshop
Skillwise Group
 
Skillwise Insurance profile
Skillwise Insurance profileSkillwise Insurance profile
Skillwise Insurance profile
Skillwise Group
 
Skillwise Train and Hire Services
Skillwise Train and Hire ServicesSkillwise Train and Hire Services
Skillwise Train and Hire Services
Skillwise Group
 
Skillwise Digital Technology
Skillwise Digital Technology Skillwise Digital Technology
Skillwise Digital Technology
Skillwise Group
 
Skillwise Boot Camp Training
Skillwise Boot Camp TrainingSkillwise Boot Camp Training
Skillwise Boot Camp Training
Skillwise Group
 
Skillwise Academy Profile
Skillwise Academy ProfileSkillwise Academy Profile
Skillwise Academy Profile
Skillwise Group
 
Skillwise Overview
Skillwise OverviewSkillwise Overview
Skillwise Overview
Skillwise Group
 
SKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPTSKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPT
Skillwise Group
 
Skillwise - Business writing
Skillwise - Business writing Skillwise - Business writing
Skillwise - Business writing
Skillwise Group
 
Imc.ppt
Imc.pptImc.ppt
Skillwise cics part 1
Skillwise cics part 1Skillwise cics part 1
Skillwise cics part 1
Skillwise Group
 

More from Skillwise Group (20)

Skillwise Consulting New updated
Skillwise Consulting New updatedSkillwise Consulting New updated
Skillwise Consulting New updated
 
Email Etiquette
Email Etiquette Email Etiquette
Email Etiquette
 
Healthcare profile
Healthcare profileHealthcare profile
Healthcare profile
 
Manufacturing courses
Manufacturing coursesManufacturing courses
Manufacturing courses
 
Retailing & logistics profile
Retailing & logistics profileRetailing & logistics profile
Retailing & logistics profile
 
Skillwise orientation
Skillwise orientationSkillwise orientation
Skillwise orientation
 
Overview- Skillwise Consulting
Overview- Skillwise Consulting Overview- Skillwise Consulting
Overview- Skillwise Consulting
 
Skillwise corporate presentation
Skillwise corporate presentationSkillwise corporate presentation
Skillwise corporate presentation
 
Skillwise Profile
Skillwise ProfileSkillwise Profile
Skillwise Profile
 
Skillwise Softskill Training Workshop
Skillwise Softskill Training WorkshopSkillwise Softskill Training Workshop
Skillwise Softskill Training Workshop
 
Skillwise Insurance profile
Skillwise Insurance profileSkillwise Insurance profile
Skillwise Insurance profile
 
Skillwise Train and Hire Services
Skillwise Train and Hire ServicesSkillwise Train and Hire Services
Skillwise Train and Hire Services
 
Skillwise Digital Technology
Skillwise Digital Technology Skillwise Digital Technology
Skillwise Digital Technology
 
Skillwise Boot Camp Training
Skillwise Boot Camp TrainingSkillwise Boot Camp Training
Skillwise Boot Camp Training
 
Skillwise Academy Profile
Skillwise Academy ProfileSkillwise Academy Profile
Skillwise Academy Profile
 
Skillwise Overview
Skillwise OverviewSkillwise Overview
Skillwise Overview
 
SKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPTSKILLWISE - OOPS CONCEPT
SKILLWISE - OOPS CONCEPT
 
Skillwise - Business writing
Skillwise - Business writing Skillwise - Business writing
Skillwise - Business writing
 
Imc.ppt
Imc.pptImc.ppt
Imc.ppt
 
Skillwise cics part 1
Skillwise cics part 1Skillwise cics part 1
Skillwise cics part 1
 

Recently uploaded

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

SKILLWISE-SSIS DESIGN PATTERN FOR DATA WAREHOUSING

  • 2. SSIS Design Patterns for Data Warehousing and Change Data Capture
  • 3. What is a Design Pattern? • Pattern –A design for a package that solves a certain scenario • Over time certain SSIS logic flows have emerged as best practices • These designs have been classified into patterns for reference purposes • Standard Design Patterns – Learn from others – Common patterns make it easy for new personnel to understand and work with – Easy to apply in new projects
  • 4. Design Patterns and Data Warehousing • SSIS most commonly used in Data Warehousing • Patterns in this course most commonly used in Data Warehousing • Applicable to non DW projects • Definitions – Type 1 –Dimension updates simply overwrite pre- existing values – Type 2 –Each update to a dimension causes a new record to be created – Fact –Records the measures for a transaction and associates with dimensions
  • 5. What You Need? • SQL Server Data Tools –BI Components – For SQL Server 2012 use Visual Studio 2012 – For SQL Server 2014 use Visual Studio 2013 • SQL Server Data Tools –Database Project –SQL Server 2012 – Uses Visual Studio 2012 • SQL Server Data Tools –Database Project –SQL Server 2014 – Included in Visual Studio 2013 Community Edition – Included in other versions of VS 2013 out of box – Make sure to install update 4
  • 6. Versions of SQL Server • We will use SQL Server 2014 Project Deployment Mode • Material works identically in 2012 (Project Deployment Mode) – Package Deployment Mode for 2012/2014 requires older style configurations for Master/Child • Patterns applicable in 2008R2 & 2008 with limitations – CDC has to be manually implemented, no controls in SSIS Toolbox • Master / Child works differently –uses configurations • Limited applicability to SQL Server 2005 – No Hashbytes – No Merge – No CDC – Master / Child works differently –uses configurations
  • 7. Deploying the Test Database • Before running project you will need to deploy and setup the test database • Uses SSDT Database Project as part of the solution • Deploy a database • After deploy run stored procedure DDL.CreateAllObjects
  • 8. The 13 Patterns• Truncate and Load • SCD Wizard – Type 1 – Type 2 • Set Based Updates – Type 1 – Type 2 • Hashbytes – Different Databases – Same Database • Change Data Capture • Merge • Date Based • Fact Table Pattern • Master / Child – Basic – Passing Parameters – Load Balancing
  • 9. Truncate and Load • Deletes all rows in target, then completely reloads from source • Commonly used in staging environments, often with other patterns • Pros – Simple to implement – Fast for small to medium datasets • Cons – No change tracking – Slower for large datasets
  • 10. SCD Wizard Type 1 • SCD (Slowly Changing Dimension) Wizard • Pattern for tables with Type 1 attributes only • Pros – Easy to create – Good for very very small updates • Cons – When something changes all SCD generated components must be deleted and recreated. – Incredibly slow. – Did we mention it is slow? – It is really really slow.
  • 11. SCD Wizard Type 2 • SCD (Slowly Changing Dimension) Wizard • Pattern for tables with Type 1 and 2 attributes • Wizard is same for both patterns, just different options • Pros – Easy to create – Good for very small updates • Cons – When something changes all SCD generated components must be deleted and recreated. – Incredibly slow. – It didn’t get any faster since the last section
  • 12. Set Based Updates –Type 1 • Set Based Updates –Type 1 • Pros – Scales well – Runs fast • Cons – Requires extra tables in the database – Requires more setup work in the package
• 13. Set Based Updates – Type 2 • Pros – Scales well – Runs fast • Cons – Logic is somewhat complex – Requires extra tables in the database
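A minimal sketch of the Type 1 variant, assuming a hypothetical staging table stage.Customer (loaded by the package) and a dimension table dbo.DimCustomer; the Type 2 variant would expire and re-insert changed rows instead of overwriting:

-- Set based Type 1 sketch; table and column names are illustrative.
-- Step 1: insert rows whose business key is not yet in the dimension.
INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, City)
SELECT s.CustomerID, s.CustomerName, s.City
FROM stage.Customer AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimCustomer AS d
                  WHERE d.CustomerID = s.CustomerID);

-- Step 2: overwrite changed attributes in place (Type 1) in one UPDATE,
-- rather than row by row through an OLE DB Command transform.
UPDATE d
SET    d.CustomerName = s.CustomerName,
       d.City         = s.City
FROM   dbo.DimCustomer AS d
JOIN   stage.Customer  AS s ON s.CustomerID = d.CustomerID
WHERE  d.CustomerName <> s.CustomerName
   OR  d.City <> s.City;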
• 14. Hashbytes – Different Databases • Uses the Hashbytes function to generate a near-unique hash value per row for comparisons • Pros – Good for tables with many columns – Scales well, fast • Cons – Requires use of lookups – caching requires memory – Requires concatenation of all data columns in the SELECT statement
• 15. Hashbytes – Same Database • Uses Hashbytes with a Merge Join • Pros – Avoids use of lookups, lowers memory requirements – Scales very well – Will work across different databases but is most efficient in a single database • Cons – Requires data sources to be sorted – Requires a common key to sort on – Needs to concatenate data columns for Hashbytes
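A hedged sketch of the hash comparison itself; the table and columns are illustrative, CONCAT and the SHA2_256 algorithm require SQL Server 2012 or later, and before SQL Server 2016 the input to HASHBYTES is limited to 8000 bytes:

-- Compute one hash per row over the concatenated data columns; a delimiter
-- guards against false matches such as ('ab','c') vs ('a','bc').
SELECT CustomerID,
       HASHBYTES('SHA2_256',
                 CONCAT(CustomerName, '|', City, '|', PostalCode)) AS RowHash
FROM dbo.Customer;

-- In the package, rows whose RowHash differs from the hash stored on the
-- target row are routed as updates; business keys with no match are inserts.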
• 16. Change Data Capture • Lets SQL Server track which rows in the source have changed • Pros – Tracks changes to data – Only reads rows that have changed – Easy to determine Insert / Update / Delete actions • Cons – Only works with SQL Server – Requires setup work in the database and tables before it can be used – Must have the ability to alter the source system
• 17. Merge • Uses the SQL Server MERGE statement • Pros – Simple to implement – Very fast • Cons – No transformations – No ability to track progress
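A minimal MERGE sketch under assumed names (stage.Product as the source, dbo.DimProduct as the target); a real statement would list every column:

MERGE dbo.DimProduct AS tgt
USING stage.Product  AS src
   ON src.ProductID = tgt.ProductID
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.ListPrice <> src.ListPrice)
    THEN UPDATE SET tgt.Name = src.Name, tgt.ListPrice = src.ListPrice
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ProductID, Name, ListPrice)
         VALUES (src.ProductID, src.Name, src.ListPrice)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;    -- only appropriate when the source is a full extract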
  • 18. Date Based • Uses date driven values to determine changes • Pros – Easy to determine changes to rows – Reduces number of rows that are read – Can be combined with any of the other patterns • Cons – Requires source system to have a reliable date field indicating changes – Still requires logic to determine new rows vs updates
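A sketch of the date based read, assuming a hypothetical watermark table etl.LoadControl and a reliable ModifiedDate column on the source:

DECLARE @LastLoadDate datetime2 =
    (SELECT LastLoadDate FROM etl.LoadControl WHERE TableName = N'Product');

-- Read only rows changed since the previous load
SELECT ProductID, Name, ListPrice, ModifiedDate
FROM dbo.Product
WHERE ModifiedDate > @LastLoadDate;

-- After a successful load, advance the watermark
UPDATE etl.LoadControl
SET    LastLoadDate = SYSDATETIME()
WHERE  TableName = N'Product';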
  • 19. Fact Table Pattern • Used to update metrics / measurements in data warehouse • Pros – Common pattern – Easy to implement • Cons – Can require many lookups – Updates not always simple
  • 20. Master / Child (Basic) • A master (parent) package which coordinates the execution of other packages (children) • Pros – Simple to implement • Cons – Not always efficient when many packages are involved
• 21. Master / Child (Parameters) • Passing values from the master package to children • Pros – SQL Server 2012 / 2014 project deployment mode makes it very easy to pass values – Easy to reuse values across multiple child packages • Cons – Package deployment mode, or SQL Server 2008 R2 and earlier, requires the more complex configurations approach
• 22. Master / Child (Load Balanced) • Uses a table to drive package execution • Pros – Easy to alter execution – just update a table – Can easily balance parallel execution of packages • Cons – Needs many variables – Requires some manual effort and monitoring to balance effectively
  • 23. Choosing a Pattern • Truncate and Load – Low to moderate number of rows – No requirement to track changes – Good for staging tables • SCD Wizard, Type 1 & 2 – Very small number of rows (< 2000) – Packages that won’t change – There is almost always a better pattern
• 24. Choosing a Pattern • Set Based Updates, Type 1 & 2 – Scales well – Good for a limited number of columns – Extra RAM required • Hashbytes – Scales well – Good for a large number of columns – Source system needs to implement a form of the Hashbytes function • Change Data Capture – Excellent pattern – SQL Server tells you all changes – Data source must be SQL Server
  • 25. Choosing a Pattern • Merge – Good for very simple ETL when no monitoring is required • Date Based – Limits number of rows read in – Can be combined with other patterns • Fact Table Pattern • Master / Child – Basic – Passing Parameters – Load Balancing
  • 27. Introduction • Many applications have requirements for identifying data changes that have occurred in a database for various reasons – Tracking historical changes to data – Auditing changes to data – Synchronizing data changes across disconnected systems – Implementing an Operational Data Store (ODS) – Incremental data loading for a Data Warehouse • SQL Server provides many techniques for tracking changes to data
  • 28. Techniques for Identifying Changes • DML table triggers – Can track before and after row state and deleted rows – Can be customized to include the user, modification time, or input buffer – Can introduce significant performance overhead on transactional systems • Modified datetime or timestamp column – Can introduce performance overhead to pull changes – Does not track deleted rows – Requires schema modifications to source tables and code to set value
• 29. Techniques for Identifying Changes • Data comparisons – Comparing source and destination data requires scanning all rows to determine changes and introduces significant performance overhead • Replication and Subscriber triggers – Offloads change identification to the subscription database – Requires customizations and manual management of schema changes
  • 30. Change Data Capture Solution • Change Data Capture provides information about DML changes to a table in near real-time using the same Log Reader Agent as transactional replication – Eliminates expensive techniques that require schema modifications • DML triggers • Timestamp columns • Data comparisons and complex JOIN queries • May be used to answer a variety of critical questions: – What are all of the changes that happened to a table since the last ETL? – Which columns changed? – What type of changes occurred? INSERT/UPDATE/DELETE? – What was the before image of a row that was modified? • Supports net change identification, with a performance trade-off due to an additional index on change tables
• 31. Configuring Change Data Capture • Change Data Capture provides the ability to capture the row data from DML changes to a database when enabled for capture • Configuring Change Data Capture has specific requirements which, when met, allow individual tables to be configured for change capture • The options for configuring a table for change capture affect performance, the data collected, and the security controls for accessing the capture tables
• 32. Requirements to Enable CDC • Enterprise Edition feature only • Enabling CDC for a database requires sysadmin privileges – Requires executing sp_cdc_enable_db • Enabling CDC for a table requires db_owner privileges in the database – Requires executing sp_cdc_enable_table for each table that will track changes within the database • Querying the results from the CDC tables requires membership in the database role specified in the sp_cdc_enable_table procedure call, if one was specified
• 33. sp_cdc_enable_table Options • @source_schema – the name of the source table schema (required) • @source_name – the name of the source table (required) • @role_name – the name of the database security role used for gating access to the change data (required but can explicitly be set to NULL) • @capture_instance – the name of the capture instance (optional) • @supports_net_changes – indicates whether querying for net changes is supported by the capture instance (optional, defaults to 1) – Enabling net change support adds an additional non-clustered index to the capture table, which can impact insert performance for change rows • @index_name – the name of a unique index to uniquely identify rows in the source table (optional, defaults to the primary key)
  • 34. sp_cdc_enable_table Options • @captured_column_list – the list of source table columns to include in the change table (optional) • @filegroup_name – the filegroup to be used for the change table (optional) • @allow_partition_switch – indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for CDC (optional) – Switching a partition into a CDC enabled table does not generate INSERT change data for rows that previously existed in the partition prior to the switch – Switching a partition out of a CDC enabled table does not generate DELETE change data for the rows contained within the partition
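An illustrative call exercising the optional parameters; the capture instance name, column list, and filegroup are assumptions, not requirements:

EXEC sys.sp_cdc_enable_table
     @source_schema          = N'Production'
    ,@source_name            = N'Product'
    ,@role_name              = N'cdc_Admin'
    ,@capture_instance       = N'Production_Product_v2'
    ,@supports_net_changes   = 1                 -- adds the extra index
    ,@index_name             = NULL              -- default: the primary key
    ,@captured_column_list   = N'ProductID, Name, ListPrice'
    ,@filegroup_name         = N'CDC_FG'         -- assumes this filegroup exists
    ,@allow_partition_switch = 1;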
  • 35. QUERYING CHANGE DATA USING T-SQL
  • 36. Introduction • After enabling a database for Change Data Capture and configuring capture instances for the source tables, the change data must be queried for processing • All change rows are identified by the Log Sequence Number (LSN) associated with the transaction that changed the row • Change tables include internal metadata columns that describe the change row as well as the captured columns configured for the table
  • 37. Finding Change Table Metadata • Stored Procedures – sys.sp_cdc_help_change_data_capture – returns the CDC capture information for each table enabled within a database • May return up to two rows per table, one for each capture instance • The @source_schema parameter specifies the source schema to return results for when the procedure executes • The @source_name parameter specifies the source table to return results for when the procedure executes – sys.sp_cdc_get_captured_columns – returns the captured columns for the capture instance specified by the @capture_instance parameter
• 38. Finding Change Table Metadata • System tables – cdc.change_tables – contains up to two rows, one per capture instance enabled on a source table – cdc.captured_columns – contains one row per captured column for a source capture instance • Querying the system tables directly is not recommended; use the stored procedures instead
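For example, using the Production_Product capture instance configured later in this deck:

-- Capture configuration for one source table (omit the parameters to list all)
EXEC sys.sp_cdc_help_change_data_capture
     @source_schema = N'Production'
    ,@source_name   = N'Product';

-- Columns captured by a specific capture instance
EXEC sys.sp_cdc_get_captured_columns
     @capture_instance = N'Production_Product';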
• 39. Understanding Change Table Columns • Change tables exist within the cdc schema and are named <capture_instance>_CT • The first five columns are metadata columns: – __$start_lsn – the starting LSN of the transaction – __$end_lsn – the ending LSN of the transaction – __$seqval – the sequence or order of the row changes within a transaction – __$operation – the type of operation reflected by the change row • 1 = Delete • 2 = Insert • 3 = Value before update • 4 = Value after update – __$update_mask – bitmask of the columns changed by the operation within the row • The remaining columns match the source table column definition when the capture instance was created
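For illustration, a direct read of the change table for the Production_Product capture instance (in practice the table-valued functions on the following slides are preferred):

SELECT __$start_lsn, __$seqval, __$operation, __$update_mask,
       ProductID, Name            -- captured source columns follow the metadata
FROM cdc.Production_Product_CT
ORDER BY __$start_lsn, __$seqval; -- transaction order, then order within it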
• 40. Determining Change Rows to Process • By LSN: – sys.fn_cdc_map_time_to_lsn ( '<relational_operator>', tracking_time ) • Returns the LSN value from the start_lsn column of the cdc.lsn_time_mapping system table for the tracking_time specified • The relational_operator specifies the comparison to be applied against the tran_end_time of the cdc.lsn_time_mapping table when determining the LSN value to return – largest less than, largest less than or equal, smallest greater than, or smallest greater than or equal – sys.fn_cdc_get_min_lsn ( 'capture_instance_name' ) • Returns the start_lsn value for the capture instance from cdc.change_tables • Sets the lower endpoint of change data for a given capture instance – sys.fn_cdc_get_max_lsn () • Returns the maximum start_lsn column value from the cdc.lsn_time_mapping system table, setting the upper endpoint for all capture instances
• 41. Determining Change Rows to Process • By LSN: – Custom tracking table updated by application code to track the capture instance name and last processed LSN – sys.fn_cdc_decrement_lsn ( lsn_value ) • Returns the previous LSN in the sequence based upon the specified LSN • Often used to decrement the sys.fn_cdc_get_max_lsn () value to set the upper endpoint without overlapping LSNs across different data loads – sys.fn_cdc_increment_lsn ( lsn_value ) • Returns the next LSN in the sequence based upon the specified LSN • Often used to increment the last saved LSN from a custom tracking table to set a new lower endpoint without overlapping LSNs across different data loads • By time: – sys.fn_cdc_map_lsn_to_time ( lsn_value ) • Returns the tran_end_time column from cdc.lsn_time_mapping for the specified LSN, allowing an LSN value to be mapped back to a transaction time
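Putting these functions together, a minimal sketch that establishes a non-overlapping LSN range for one load; the tracking table etl.CdcState is hypothetical:

DECLARE @from_lsn binary(10), @to_lsn binary(10);

-- Lower endpoint: one LSN past the last one processed by the previous load
SELECT @from_lsn = sys.fn_cdc_increment_lsn(LastProcessedLsn)
FROM   etl.CdcState
WHERE  CaptureInstance = N'Production_Product';

-- Upper endpoint: the highest LSN currently available
SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- Or map a point in time to an LSN instead:
-- SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', SYSDATETIME());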
• 42. Change Row Table-Valued Functions • cdc.fn_cdc_get_all_changes_<capture_instance> – Returns one row for each modification applied to the source table within the specified LSN range – Multiple modifications of a source row within the LSN range will be represented individually in the result set • cdc.fn_cdc_get_net_changes_<capture_instance> – Returns a single net change row for each source row modified within the specified LSN range
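Continuing the sketch above (@from_lsn and @to_lsn already set); note that the function names embed the capture instance:

-- Every individual change in the range
SELECT *
FROM cdc.fn_cdc_get_all_changes_Production_Product(@from_lsn, @to_lsn, N'all');

-- One net row per modified source row (requires @supports_net_changes = 1)
SELECT *
FROM cdc.fn_cdc_get_net_changes_Production_Product(@from_lsn, @to_lsn, N'all');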
• 43. Determining Whether a Column Changed • sys.fn_cdc_get_column_ordinal ( 'capture_instance', 'column_name' ) – Returns the ordinal position of a column name within the specified capture instance's update mask • sys.fn_cdc_is_bit_set ( position, update_mask ) – Checks the specified ordinal position of the update mask to determine if the change bit is set • sys.fn_cdc_has_column_changed ( 'capture_instance', 'column_name', update_mask ) – Identifies whether the specified column has been updated in the associated change row – Ideally only used for post processing – Use sys.fn_cdc_get_column_ordinal once to set the position, and sys.fn_cdc_is_bit_set to parse the update_mask in queries against change tables for better performance
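A sketch of the recommended approach, resolving the ordinal once and then testing the bit per row (it reuses the @from_lsn/@to_lsn range from the earlier sketch):

DECLARE @name_ordinal int =
    sys.fn_cdc_get_column_ordinal(N'Production_Product', N'Name');

SELECT ProductID,
       sys.fn_cdc_is_bit_set(@name_ordinal, __$update_mask) AS NameChanged
FROM cdc.fn_cdc_get_all_changes_Production_Product(@from_lsn, @to_lsn, N'all')
WHERE __$operation = 4;   -- examine after-update rows only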
• 44. SQL SERVER 2012 SSIS COMPONENTS
• 45. Introduction • SQL Server 2012 introduced new SQL Server Integration Services (SSIS) components for CDC to simplify extracting and consuming change data • Using the CDC components does not require advanced knowledge of SSIS to move change data from a source system to a target for further processing
• 46. CDC Control Task Component • Used to control the life cycle of CDC packages in SSIS – Synchronizes the initial package load and the management of LSN ranges processed by the CDC package executions – Maintains state across executions by persisting a state variable to a table – Handles error scenarios and recovery from problems during processing • Supports two types of operations – Synchronization of the initial data load and change processing • Marking the initial load start and initial load end for a full load from an active source • Resetting the CDC state variable to restart tracking • Marking the CDC start from a snapshot LSN of a snapshot database – Management of change processing LSN ranges and tracking of what has been processed successfully • Getting a processing range before execution • Marking a processing range after successfully processing changes
  • 47. CDC Control Task Component • Persisting state across executions – Manual state persistence requires the package developer to read and write the state variable for the package – Automated state persistence reads the value of the state variable from the table configured in the Control Task editor to get the processing range and writes the value to the table to mark the processed range • Errors can be reported by the Control Task if: – A get processing range is called after a previous get processing range operation without the mark processed range operation occurring • Possibly a different package running concurrently with the same state variable name – Reading the persisted state variable value from the persisted store fails – The state variable value read from the persistent store is not consistent – Writing the state variable value to the persistent store fails
• 48. CDC Source Component • Reads the processing range of change data from a capture instance change table and delivers the changes to other SSIS components – The processing range is derived from the state package variable that is set by a CDC Control task executed before the data flow starts • The CDC source requires the following configuration: – An ADO.NET connection manager to access the SQL Server CDC database – The name of a table enabled for CDC – The name of the capture instance of the table to read the changes from – The change processing mode to use for reading the changes – The name of the CDC state package variable used to determine the CDC processing range • The CDC source does not modify that variable; a subsequent CDC Control task execution after the data flow must be used to update the state values
• 49. CDC Processing Modes (1) • All – Returns a single row for each change applied to the source table – Similar to querying the cdc.fn_cdc_get_all_changes_<capture_instance> table-valued function with the 'all' filter option • All with old values – Similar to All, but with two rows per update, one for the Before value and one for the After value – Similar to querying the cdc.fn_cdc_get_all_changes_<capture_instance> table-valued function with the 'all update old' filter option – The __$operation column distinguishes between Before (3) and After (4)
• 50. CDC Processing Modes (2) • Net – Rolls up all changes for a key into a single row to simplify ETL processing – Requires @supports_net_changes = 1 for the capture instance – Similar to querying the cdc.fn_cdc_get_net_changes_<capture_instance> table-valued function with the 'all' filter option • Net with update mask – Similar to Net but includes additional Boolean columns (__$<column_name>_Changed) specifying whether a column was changed – Similar to querying the cdc.fn_cdc_get_net_changes_<capture_instance> table-valued function with the 'all with mask' filter option
• 51. CDC Processing Modes (3) • Net with merge – Groups INSERT and UPDATE operations together, making it easier to use the MERGE statement (__$operation = 5) – Similar to querying the cdc.fn_cdc_get_net_changes_<capture_instance> table-valued function with the 'all with merge' filter option – Only the DELETE and UPDATE split paths will receive rows from the CDC Splitter in this mode
• 52. CDC Splitter Component • Splits a single input of change rows from the CDC Source component into separate outputs for Insert, Update, and Delete operations based on the __$operation column value from the change table – 1 – Delete – 2 – Insert (not available using Net with Merge mode) – 3 – Before Update row (only when using All with Old Values mode) – 4 – After Update row – 5 – Merged Update row (only when using Net with Merge mode) • The CDC Source for the Data Flow must have a Net CDC processing mode configured to use the CDC Splitter • No advanced configuration is required for the CDC Splitter
• 53. Package Design Considerations • Configure separate packages for handling the Initial Load and Incremental Loads – The initial load will mark the start LSN before transferring data from the source, and the end LSN after, using the CDC tracking variable for all tables associated with the data flow – Facilitates easier re-initialization from the source system if necessary • Error handling considerations need to be made when operation order must be maintained as part of the data flow – CDC components can redirect error rows when appropriate to prevent component failures, but this may result in out-of-order processing of changes • Consider using staging tables to fast load change data and perform batch processing of changes in Transact-SQL to prevent row-by-row processing of changes in SSIS (see the sketch at the end of this section) – Change from ETL (SSIS processing of rows) to ELT (database engine processing) to benefit from set based operations
• 54. CDC Setup • Step 1. Enable CDC for the database – enabling creates the CDC functions, stored procedures, and tables:

USE AdventureWorks2012
GO
EXEC sp_changedbowner 'sa'
GO
EXEC sys.sp_cdc_enable_db
GO
• 55. CDC Setup • Step 2. Enable CDC for table(s) – this creates the change table cdc.Production_Product_CT:

USE AdventureWorks2012
GO
EXEC sys.sp_cdc_enable_table
     @source_schema        = N'Production'
    ,@source_name          = N'Product'
    ,@role_name            = N'cdc_Admin'
    ,@capture_instance     = N'Production_Product'
    ,@supports_net_changes = 1
• 56. Anatomy of a CDC Table • __$start_lsn and __$seqval – Link the record to a transaction – Specify the order of operations • __$operation – 1 = delete – 2 = insert – 3 = update (record data before change) – 4 = update (record data after change) – 5 = merge • __$update_mask – Identifies which columns changed – Use with sys.fn_cdc_has_column_changed
• 57. CDC in Integration Services • Diagram: Source Database → Staging Database; SSIS manages the current state of CDC processing in the staging database • Table structures – one staging table per type of change, with source AND change columns – Exception: the Updates table includes a ChangeType column when both Type 1 and Type 2 processing is required
• 58. CDC in Integration Services • Control Flow – Extraction – CDC Control Tasks mark the beginning and end of processing – Truncate the 3 staging tables – Separate control flows for the Initial Load and the Incremental Load
• 59. Data Flow Task – Extraction (Incremental Load only)
• 61. Data Flow – Transform and Load

SELECT [__$start_lsn]
      ,[__$operation]
      ,[__$update_mask]
      ,[ProductID]
      ,[Name]
      . . .
FROM [stage].[stageProduct_Inserts]
UNION ALL
SELECT [__$start_lsn]
      ,[__$operation]
      ,[__$update_mask]
      ,[ProductID]
      ,[Name]
      . . .
FROM [stage].[stageProduct_Updates]
WHERE ChangeType = 2
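Tying the staging tables back to the ELT advice on the package design slide, a hedged sketch that batch-applies the staged changes to a hypothetical target dbo.DimProduct. It assumes a third staging table for deletes (stage.stageProduct_Deletes, by analogy with the two named above) and that the staged rows are net changes, so each ProductID appears at most once per table:

-- Apply deletes first
DELETE t
FROM  dbo.DimProduct AS t
JOIN  stage.stageProduct_Deletes AS d ON d.ProductID = t.ProductID;

-- Upsert the staged inserts and Type 1 updates in one set based MERGE
MERGE dbo.DimProduct AS tgt
USING (SELECT ProductID, Name FROM stage.stageProduct_Inserts
       UNION ALL
       SELECT ProductID, Name FROM stage.stageProduct_Updates
       WHERE  ChangeType = 2) AS src
   ON src.ProductID = tgt.ProductID
WHEN MATCHED
    THEN UPDATE SET tgt.Name = src.Name
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ProductID, Name) VALUES (src.ProductID, src.Name);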