1) The document discusses how to integrate Master Data Services (MDS) with Change Data Capture (CDC) in SQL Server 2012. MDS allows end users to directly make changes to records instead of relying on technical staff. CDC tracks changes made to tables and identifies who made the changes.
2) It provides steps to set up MDS including creating a database and website. It also explains how to enable CDC for a database and table.
3) The document outlines two SSIS packages needed - one for initial load from source to target, and another incremental package to handle changes captured by CDC between target tables.
MDS CDC Integration
MDS CDC Implementation
Sainatth Wagh
Preface
This presentation focuses on the integration between MDS and CDC and how it can help the client. The aim is to use both features together so that the time spent on manual activity is removed or at least reduced.
The document covers two topics in order: first Master Data Services, which is comfortable for clients and end users to work with, and second Change Data Capture, which not only supports incremental loads but also acts as an auditor that tracks the changes (and, where the table itself records the user, who made them).
The target audience is someone who already has some exposure to SQL Server Integration Services (SSIS) and SQL Server Analysis Services (SSAS).
Master Data Services [MDS/MDM]
Feature: At any client there are always a few tables that need frequent updates, where records are added, modified or deleted. The tedious part is that these changes are made on request from the end customer, so there is a dependency on a technical person to carry them out. With MDS/MDM this dependency is reduced to a large extent: instead of relying on the technical person, end users can make the necessary changes themselves, and the changes are recorded in the database.
What it does: A common scenario is one table holding records for more than one location because the business is spread geographically. For example, consider a table warehouse_inventory that records the inventory facts of different countries. If a new product is introduced at one location, the table needs records for all the related locations. To avoid the hassle, it is better to make the change only once, at a central location. This is where the concept of master data services comes in.
The section related to master data services is being divided into two parts:
1) MDS installation
2) MDS Usage
1) MDS Installation: The snapshots below show how to set up MDS so that it can be used effectively.
Once the installation of MDS is completed in SQL Server 2012, the MDS component appears in the programs list. Before you start using MDS, you need to create an MDS database (where your records will be stored) and an MDS website (so that changes made later from the front end are written directly to the MDS database).
Under MDS (Master Data Services) there is a component called Configuration Manager; click on it to open it.
Next, proceed to create the MDS database where the changes will be recorded.
Specify the database name and the collation.
Specify the Windows account that will be used for connectivity.
Specify the administrator account. Note that once specified this account cannot be changed, so use a service account instead of a personal account.
Creation of the MDS database is completed.
Steps for creating the MDS Website:
The first step is to select the site where the web application will be stored. The configuration application queries the available sites from IIS and presents them in the drop-down list.
Once the website is created, select it and its virtual directory. On this page, select the IP address, site name and, if necessary, the host address, and configure an application pool with the username and password for the application pool identity.
Now, with the MDS database and MDS website successfully set up, one can explore how it looks.
With this, the MDS database and website are set up. Now we can install the MDS Excel add-in so that we can connect to the MDS database from Excel and add, modify and delete records. The snapshot below appears when you click on Explorer.
Under the Entities section, all the entities that belong to the model are listed. These entities can be thought of as dimensions, similar to SSAS.
If you want a new entity, you can add it either through the Excel add-in or through the website, under the model in which the entity should appear.
The above snapshot is the front-end view that the end user sees and, if required, modifies. To add or delete a record, click the appropriate button, “Add member” or “Delete Member”, to make the change effective. To modify a record, select it; its values appear in the right pane. Change whichever values apply and click OK. The changes become effective.
Note: Since this presentation deals only with the integration of MDS with CDC, the coverage of MDS stops here. A complete presentation on MDS alone will follow separately.
With this we are ready to change values in MDS. Within this scope, the MDS scenarios handled are addition and update. The Change Data Capture section shows how the changes made in MDS are carried through to CDC.
Change Data Capture Implementation using SQL 2012
The article below gives a head start for working with the CDC [Change Data Capture] feature of SQL 2012 and covers the things one needs to take care of while working with it.
Introduction:
Change Data Capture [CDC] is a feature introduced in SQL 2008 that plays an auditing role and helps trace back the changes that were carried out. In the 2008 version, however, a lot of procedures had to be written manually. With the SQL 2012 version, this manual scripting is largely taken care of by the CDC components that ship with SSIS.
Before focusing on CDC and its packages, let's create a procedure that handles the scenarios (addition/update) applied to the tables in MDS.
Once this procedure is compiled and executed successfully, create an Execute SQL Task in the SSIS package and set it to execute this procedure. Based on its success, package 2 (Incremental Load) will be executed.
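The procedure itself is not reproduced in this document. As a rough sketch of what such a procedure might look like, the T-SQL below merges additions and updates from an MDS subscription view into the CDC-enabled table. Every object and column name here (mdm.vw_Customer, dbo.usp_ApplyMdsCustomerChanges, CustomerCode, CustomerName) is an assumption made for illustration, not the author's actual code.

```sql
-- Hypothetical sketch: mdm.vw_Customer stands in for an MDS subscription view,
-- dbo.customercdc for the CDC-enabled table, and the column names are invented.
CREATE PROCEDURE dbo.usp_ApplyMdsCustomerChanges
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.customercdc AS tgt
    USING (SELECT Code, Name FROM mdm.vw_Customer) AS src
        ON tgt.CustomerCode = src.Code
    -- Update: the member exists in both places but its value was changed in MDS
    WHEN MATCHED AND tgt.CustomerName <> src.Name THEN
        UPDATE SET tgt.CustomerName = src.Name
    -- Addition: the member was added in MDS and is not yet in the table
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerCode, CustomerName)
        VALUES (src.Code, src.Name);
END;
GO
```

Because the target table is CDC-enabled, each row this procedure inserts or updates would then show up as a change for the incremental package to pick up. Note that the simple `<>` comparison ignores NULL values; a real procedure would need to handle them.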
CDC basically involves creating two packages:
Package 1: Initial Load [This package performs the first-time load from the source table to the destination table]. The destination table is the one that is frequently updated with inserts, updates and deletes.
Package 2: Incremental Load [This package deals with the inserts, updates and deletes that are carried out on the destination table]. Execute this package to pick up the changes made in the CDC-enabled table.
Below is the list of tables required for the implementation of CDC:
cdc_states: This table records the state after each package execution. A few of the important states are:
ILSTART: Initial load started.
ILEND: Initial load ended.
ILUPDATE: Updates that follow the initial load are being processed.
TFSTART: Incremental (trickle-feed) load started.
TFEND: Incremental (trickle-feed) load ended.
Two tables (Updates and Deletes) for each table whose values change. Their structure is the same as that of the frequently changed table. For example, if the table dbo.dimCustomer changes frequently, two more tables are introduced: one containing the updated records (dbo.stg_dimCustomer_Updates) and the other containing the deleted records (dbo.stg_dimCustomer_Deletes).
Before CDC can be used, both the database and the table on which the CDC feature will be implemented need to be enabled. The two scripts below do this.
Exec sys.sp_cdc_enable_db
Go
Execute the above script to enable the database.
Exec sys.sp_cdc_enable_table
@source_schema=N'schemaname',
@source_name=N'tablename',
@role_name=N'cdc_admin',
@supports_net_changes=1
Go
Execute the above script to enable the table.
Note: If @supports_net_changes is set to 0, the net changes function is not created and only the full list of changes can be queried, so every intermediate change is returned instead of one net change per row, thus defeating the purpose. Setting it to 1 requires the table to have a primary key (or a unique index named through @index_name).
Also, one needs sysadmin access to enable CDC on the database.
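After running the two scripts, it can be worth confirming that CDC is actually switched on. The queries below are a minimal check using the standard catalog views and the CDC help procedure; they assume they are run in the database that was just enabled.

```sql
-- Is CDC enabled for the current database?
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = DB_NAME();

-- Which tables in this database are tracked by CDC?
SELECT s.name AS schema_name,
       t.name AS table_name,
       t.is_tracked_by_cdc
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE t.is_tracked_by_cdc = 1;

-- Lists the capture instances, including whether net changes are supported
EXEC sys.sp_cdc_help_change_data_capture;
```

If is_cdc_enabled is 0 or the table does not appear in the second result, the enable scripts did not take effect and the SSIS packages will have nothing to read.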
The snapshots below show how to create the package for the Initial Load. This package moves the data from source to destination. The components required in this package are two CDC Control Tasks and one Data Flow Task.
Step 1:
On opening the first CDC Control Task, fill in the necessary details as shown:
The above snapshot tells the system to start the initial load, hence the CDC control operation is Mark initial load start. The variable containing the CDC state can be kept simply as User::cdc_states. However, if CDC is going to be implemented on multiple tables, declare a separate variable for each table; in this case it is User::cdc_state_customer.
Next is the table that stores the CDC states. Initially it is blank by default. Click on New and a table named cdc_states appears in the drop-down list. One can also use a different table name if cdc_states is already in use.
Note: The state name and the variable containing the CDC state should be the same, otherwise the feature won’t work as expected.
Step 2: The next step is a normal Data Flow Task that moves the data from the OLE DB source table to the OLE DB destination table.
Step 3: Then comes the final CDC Control Task, which indicates that the initial load, a one-time activity, is completed.
The difference between the first and the final control task is the CDC control operation. Since the data has been loaded from source to destination by this point, select Mark initial load end in this control task.
Once this package is ready and executed, verify the cdc_states table: the state value should begin with ILEND. Any other value indicates that something is wrong with this package.
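The check described above can be done with a simple query. This sketch assumes the state table was created through the New button of the CDC Control Task, which typically generates a table with the columns name and state; if the table was created by hand, the column names may differ.

```sql
-- Assumes dbo.cdc_states has the columns [name] and [state]
-- (the layout the CDC Control Task usually generates).
SELECT name,
       state,
       CASE WHEN state LIKE N'ILEND%'
            THEN 'initial load completed'
            ELSE 'check the initial load package'
       END AS verdict
FROM dbo.cdc_states;
```

The stored state is a longer string that carries the processing range after the state keyword, which is why the comparison uses LIKE rather than an exact match.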
Incremental Load Package Execution:
Step 1: Create an Execute SQL Task to create the tables that store the changes for inserts, updates and deletes.
The code for the SQL statement can be seen below. Modify it based on the database name and the table name. These are the tables that record or track the changes that are happening.
-- top 0 copies only the column structure of the destination table, no rows
If not exists (select * from sys.objects where object_id = Object_id(N'[dbo].[stg_table_Updates]') and type in (N'U'))
Begin
Select top 0 * into dbo.stg_table_Updates
From table_destination
End
If not exists (select * from sys.objects where object_id = Object_id(N'[dbo].[stg_table_Deletes]') and type in (N'U'))
Begin
Select top 0 * into dbo.stg_table_Deletes
From table_destination
End
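The step mentions tables for inserts, updates and deletes, while the script above creates only the updates and deletes tables. If a separate staging table for inserts is wanted, the same pattern can be repeated; the name stg_table_Inserts is an assumption that simply follows the naming of the other two tables.

```sql
-- Same pattern as the updates and deletes tables; stg_table_Inserts is an assumed name.
IF NOT EXISTS (SELECT *
               FROM sys.objects
               WHERE object_id = OBJECT_ID(N'[dbo].[stg_table_Inserts]')
                 AND type IN (N'U'))
BEGIN
    -- TOP 0 copies the column structure of the destination without copying rows
    SELECT TOP 0 *
    INTO dbo.stg_table_Inserts
    FROM dbo.table_destination;
END
```

Whether this table is needed depends on the design: inserts can also be written straight to the destination table, in which case only the updates and deletes need staging.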
Step 2: The remaining flow is the same as the control flow of the Initial Load package, with two CDC Control Tasks and one Data Flow Task.
The details behind the control flow tasks can be seen in the snapshot below:
Note: The first control task uses the Get processing range operation, which picks up the processing range from the state left by the initial load. Remember to map the variable containing the CDC state correctly, especially if there are multiple CDC state variables.
Step 3: The Data Flow Task in this package needs a specific structure, which can be seen in the snapshot below:
Here the CDC Source reads the changes from the CDC-enabled table. Its settings can be seen in the snapshot below.
The Net processing mode selected here means that only the latest value of each inserted or updated row is considered; earlier intermediate values are ignored. Note that this option is available only if @supports_net_changes was set to 1 when CDC was enabled on the table.
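To see what the Net mode returns outside SSIS, the net changes function can be queried directly. This is only an illustration: it assumes the capture instance is named dbo_customercdc (the default schema_table naming for the CDC-enabled table listed later in this document) and that the table was enabled with @supports_net_changes = 1.

```sql
-- Assumed capture instance name: dbo_customercdc
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_customercdc');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- One row per changed source row within the LSN range.
-- __$operation: 1 = delete, 2 = insert, 4 = update
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_customercdc(@from_lsn, @to_lsn, N'all');
```

If a row was inserted and then updated twice within the range, this returns a single row with its final values, which is the behaviour the Net option relies on.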
Once this is done, add the CDC Splitter and check in its input properties that all the columns are included.
Then create three ADO.NET destinations for the inserts, updates and deletes. For each destination, select the target table into which the records should be inserted.
Step 4: Add an Execute SQL Task so that the destination is updated with the changes that have taken place. This step is optional: if the tables used in step 3 are sufficient, step 4 can be skipped.
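As a sketch of what the Execute SQL Task in step 4 might run, the statements below apply the staged updates and deletes to the destination. It assumes dbo.dimcustomer_destination is the table receiving the changes, and the columns CustomerKey and CustomerName are hypothetical; the real key and attribute columns need to be substituted.

```sql
-- Hypothetical columns: CustomerKey (business key) and CustomerName (attribute).
BEGIN TRANSACTION;

-- Apply the staged updates to the destination
UPDATE d
SET    d.CustomerName = u.CustomerName
FROM   dbo.dimcustomer_destination AS d
JOIN   dbo.stg_dimcustomer_updates AS u
       ON u.CustomerKey = d.CustomerKey;

-- Remove the rows that were deleted at the source
DELETE d
FROM   dbo.dimcustomer_destination AS d
JOIN   dbo.stg_dimcustomer_deletes AS x
       ON x.CustomerKey = d.CustomerKey;

COMMIT TRANSACTION;

-- Empty the staging tables so the next run starts clean
TRUNCATE TABLE dbo.stg_dimcustomer_updates;
TRUNCATE TABLE dbo.stg_dimcustomer_deletes;
```

Set-based statements like these are usually preferable to updating the destination row by row inside the data flow, which is one reason for staging the updates and deletes in the first place.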
Step 5: Next is the final CDC Control Task (Mark processed range), which marks the CDC processing as complete.
So, with the creation and execution of these two packages, Change Data Capture [CDC] can be implemented in SQL 2012.
Below is the list of tables necessary for the execution of the packages.
1) dbo.cdc_states (records the CDC state after each package execution)
2) dbo.customercdc (the CDC-enabled table)
3) dbo.dimcustomer_destination (an optional table that can be used in the staging layer)
4) dbo.stg_dimcustomer_inserts (records all the inserts)
5) dbo.stg_dimcustomer_deletes (records all the deletes)
6) dbo.stg_dimcustomer_updates (records all the updates)