SlideShare a Scribd company logo
1 of 84
Tutorial: Enterprise Information Management
using SSIS, MDS, and DQS Together
SQL Server Technical Article
Writer: SreedharPelluru,JaimeAlvaBravo
Technical Reviewer: CarlaSabotta,Jimvan de Erve,Kumar Vivek,MattMasson,Matthew Roche
Published: October2012
Applies to: SQL Server 2012
Summary: Managinginformationinanenterprise typicallyinvolvesintegratingdatafromacrossthe
enterprise andbeyond,cleansingthe data,matchingthe datato remove anyduplicates,standardizing
the data, enrichingthe data,makingthe dataconform tolegal and compliance requirements,andthen
storingthe data ina centralizedlocationwithall the necessarysecuritysettings.
In thistutorial,youwill learnhowtouse SQLServerIntegrationServices(SSIS),MasterData Services
(MDS),and Data QualityServices(DQS) togethertoimplementasample Enterprise Information
Management(EIM) solution.First,youwill use DQStocreate a knowledgebasethatcontainsknowledge
aboutthe data (metadata),cleansethe datainan Excel file usingthe knowledge base,andmatchthe
data to identifyandremove duplicatesinthe data.Next,youwill use the MDSAdd-inforExcel toupload
the cleansedandmatcheddata to MDS. Then,youwill automate the whole processusinganSSIS
solution.
2
Copyright
Thisdocumentisprovided“as-is”.Informationandviewsexpressedinthisdocument,includingURLand
otherInternetWebsite references,maychange withoutnotice.Youbearthe riskof usingit.
Some examplesdepictedhereinare providedforillustrationonlyandare fictitious. Noreal association
or connectionisintendedorshouldbe inferred.
Thisdocumentdoesnotprovide youwithanylegal rightstoany intellectual propertyinanyMicrosoft
product. You may copyand use thisdocumentforyour internal,referencepurposes.
© 2011 Microsoft.All rightsreserved.
3
Contents
Overview........................................................................................................................................5
Prerequisites...............................................................................................................................6
Lessons.......................................................................................................................................7
Lesson 1: Creating the Suppliers DQS Knowledge Base......................................................................7
Task 1: Creating a Knowledge Base and Domains...........................................................................8
Task 2: Adding Domain Values Manually.....................................................................................12
Task 3: Importing Domain Values from an Excel File ....................................................................12
Task 4: Setting Domain Rules .....................................................................................................13
Task 5: Setting Term-Based Relations .........................................................................................15
Task 6: Setting Synonyms...........................................................................................................15
Task 7: Creating a Composite Domain.........................................................................................16
Task 8: Creating a Composite Domain Rule .................................................................................17
Task 9: Configuring a Reference Data Service..............................................................................18
Task 10: Configuring Composite Domain to Use Reference Data Service .......................................19
Task 11: Publishing the Knowledge Base.....................................................................................20
Task 12: Discovering Knowledge (Knowledge Discovery)..............................................................21
Lesson 2: Cleansing Supplier Data using the Suppliers Knowledge Base............................................24
Task1: Creating a Data Quality Project........................................................................................24
Task 2: Mapping Excel Columns to DQS Domains.........................................................................25
Task 3: Cleansing Data against the Supplier Knowledge Base........................................................27
Task 4: Managing and Viewing Results........................................................................................28
Task 5: Exporting Cleansing Results to an Excel File .....................................................................31
Task 6: Importing Values from the Cleanse Supplier List Project...................................................32
Lesson 3: Matching Data to Remove Duplicates from Supplier List...................................................33
Task 1: Defining a Matching Policy..............................................................................................33
Task 2: Testing and Publishing the Matching Policy......................................................................36
Task 3: Creating and Running a Data Quality Project for Matching................................................38
Task 4: Exporting the Results from Matching Activity to an Excel File............................................39
Lesson 4: Storing Supplier Data in MDS ..........................................................................................40
Task 1: Creating Suppliers Model using Master Data Manager .....................................................42
Task 2: Uploading Supplier Data to MDS using MDS Add-in for Excel ............................................43
4
Task 3: Verifying the Data in Master Data Manager.....................................................................46
Task 4 (Optional): Combining, Matching, and PublishingNew Set of Data.....................................47
Task 5: Creating a Domain-Based Attribute from Excel.................................................................50
Task 6: Verify that the Domain-Based Attribute is Created using Master Data Manager.................52
Task 7: Viewing Updates Made using Master Data Manager in Excel ............................................54
Task 8: Adding a New Value for State Entity in Excel....................................................................55
Task 9: Creating a Derived Hierarchy using Master Data Manager ................................................57
Lesson 5: Automating the Cleansing and Matching using SSIS..........................................................60
Task 1 (Prerequisite): Removing Supplier Data in MDS.................................................................61
Task 2 (Optional): Creating a MDS SubscriptionView using Master Data Manager ........................62
Task 3 (Optional): Reviewing the Subscription Views...................................................................63
Task 4: Creating an SSIS Project using SQL Server Data Tools........................................................63
Task 5: Adding Data Flow Task....................................................................................................65
Task 6: Adding Excel Source to the Data Flow..............................................................................66
Task 7: Adding DQS Cleansing Transform to the Data Flow...........................................................67
Task 8: Adding Conditional Split Transform to Split Cleansing Output...........................................69
Task 9: Adding Union All Transform to Combine Correct and Corrected Records...........................71
Task 10: Adding Fuzzy Group Transform to Identify Duplicates.....................................................73
Task 11: Adding Conditional Split Transform to Filter Duplicates ..................................................75
Task 12: Adding Derived Column Transform to Add Columns Required by MDS.............................75
Task 13: Adding OLE DB Destination to Write Data to MDS Staging Table......................................77
Task 14: Adding Execute SQL Task to Control Flow to Run the Stored Procedure for MDS..............79
Task 15: Building and Running the SSIS Project............................................................................81
Task 16: Verifying with Master Data Manager.............................................................................83
Task 17: Reviewing DQS Cleansing Project Created by the SSIS package........................................83
Conclusion....................................................................................................................................84
5
Overview
Managing informationinanenterprise typicallyinvolvesintegratingdatafromacrossthe enterprise and
beyond,cleansingthe data,matchingthe datato remove anyduplicates,standardizingthe data,
enrichingthe data,makingthe data conformto legal andcompliance requirements,andthenstoringthe
data ina centralizedlocationwithall the necessarysecuritysettings.
SQL Server2012 providesall the componentsneededforaneffective Enterprise Information
Management(EIM) solutioninasingle product.Keycomponentsof SQLServer2012 that helpyoubuild
an EIM solutionare:
 SQL ServerIntegrationServices
 SQL ServerData QualityServices
 SQL ServerMaster Data Services
SQL ServerIntegrationServices(SSIS) providesapowerful,extensible platformforintegratingdatafrom
a varietyof sourcesina comprehensive extract,transform, andload(ETL) solutionthatsupports
businessworkflows,adatawarehouse,ormasterdata management.See IntegrationServicesOverview
topicfor a quickoverviewandtypical usesof SSIS.
SQL ServerData QualityServices (DQS) enablesyoutocleanse,match,standardize,andenrichdata,so
youcan delivertrustedinformationforbusinessintelligence,adata warehouse,andtransaction
processingworkloads. See IntroducingDataQualityServices topicforthe businessneedforDQSand
howDQS answersthe need.
SQL ServerMaster Data Services (MDS) providesacentral datahub thatensures thatthe integrityof
informationandconsistencyof datais constantacross differentapplications. See MasterDataServices
Overview topicforbrief descriptionsof importantfeaturesof MDS.
See Enterprise InformationManagementwithSQLServer2012 and CleansingandMatchingMasterData
usingEIMTechnologies whitepapersforacomprehensive guidance onimplementinganEIMsolution
usingthese MicrosoftEIMtechnologiestogetherandwatch Enterprise InformationManagement(EIM):
BringingtogetherSSIS,DQS,andMDS videofora cool demonstrationof anEIMscenario.
In thistutorial,youwill learnhowtouse SSIS,MDS, andDQS togethertoimplementasample Enterprise
InformationManagement(EIM) solution.First,youwill use DQStocreate a knowledgebase that
containsknowledgeaboutthe data(metadata),cleansethe datainan Excel file usingthe knowledge
base,and matchthe data to identifyandremove duplicatesinthe data.Next,youwilluse the MDSAdd-
infor Excel to uploadthe cleansedandmatcheddatato MDS. Then,youwill automate the whole
processusingan SSISsolution.The SSISsolutioninthistutorialreadsthe inputdatafroman Excel file,
but youcan extendittoread froma varietyof sourcessuchas Oracle,Teradata,DB2, andSQL Azure.
6
Prerequisites
 MicrosoftSQL Server2012 withthe following componentsinstalled.
o IntegrationServices(SSIS).
o Master Data Services(MDS)
o Data QualityServices(DQS)
o SQL ServerData Tools
See SQL Server2012 InstallationGuide fordetailsaboutinstallingthe product.
 Configure MDSusingMaster Data ServicesConfigurationManager
Use the ConfigurationManagertocreate and configure aMaster Data Servicesdatabase.After
youcreate the MDS database,create a Web application forMDS ina Web site (forexample:
http://localhost/MDS) andassociate the MDSdatabase with the MDS Webapplication.Note
that, to create an MDS Web application,youneedtohave IISinstalledonyourcomputer.See
WebApplicationRequirements(MasterDataServices) andDatabase Requirements(Master
Data Services) fordetailsaboutthe prerequisitesforconfiguringMDSdatabase and Web
application.
 Install andConfigure DQSusingDataQualityServerInstaller.ClickStart,clickAll Programs,
clickMicrosoft SQL Server 2012,clickData Quality Services,and thenclick Data Quality
Server Installer.
 MicrosoftExcel 2010 (32-bitispreferred)
 Install MasterData ServicesAdd-infor Excel (32-bitor 64-bitbasedon the versionof Excel you
have on yourcomputer) from here.Tofindthe versionof Excel installedonyourcomputer,run
Excel,click File on menubarand click Help to see the versioninthe rightpane.Note thatyou
needtoinstall Visual Studio2010 ToolsforOffice Runtime priortoinstallingthe Excel Add-in.
 (Optional) Create anaccountwith WindowsAzure Marketplace.One of the tasksinthe tutorial
requiresyoutohave an Azure Marketplace (originallynamed DataMarket) account.You can
skipthistask if youwant andproceedwiththe nexttask.
 DQS doesnotallowyouto exportthe cleansingormatchingresultstoan Excel file if youare
using64-bit versionof Excel. Thisis a knownissue.Toworkaroundthe issue,dothe following:
o Install SQLServer2012 SP1 (on64-bit computerswith64-bitExcel).
o Run DQLInstaller.exe –upgrade.If youinstalledthe defaultinstance of SQLServer,the
DQSInstaller.exe file will be available atC:ProgramFilesMicrosoftSQL
ServerMSSQL11.MSSQLSERVERMSSQLBinn.Double-clickthe DQSInstaller.exe file.
o In Master Data ServicesConfigurationManager,clickSelectDatabase, selectexisting
MDS database,and thenclick Upgrade.
7
Lessons
Thistutorial includesthe followinglessons:
Lesson Briefdescription Estimated
time to
complete (in
minutes)
Lesson1: Creating
the Suppliers DQS
KnowledgeBase
In thislesson,youwill create aDQSknowledge base named
Suppliers.
45
Lesson2: Cleansing
SupplierDatausing
the Suppliers
KnowledgeBase
In thislessonyouwill create andruna DQS projectto cleanse
the supplierdatainan Excel file usingthe SuppliersKByou
createdinthe firstlesson.
30
Lesson3: Matching
Data to Remove
Duplicatesfrom
SupplierList
In thislesson,youwill create aDQSprojectto perform
matchingactivitytoidentifyandremove duplicatesfromthe
cleansedsupplerlist.
30
Lesson4: Storing
SupplierDatainMDS
In thislesson,youwill uploadthe cleansedandmatched
supplierdatatoMaster Data Services(MDS) byusingthe MDS
Add-infor Excel.
30
Lesson5: Automating
the Cleansingand
Matching usingSSIS
In thislesson,youwill create anSSISsolutionthatcleanses
inputdata usingDQS,matchesthe cleanseddatato remove
duplicates,andstoresthe cleansedandmatcheddataonMDS
inan automatedmanner.
75
Lesson 1: Creating the Suppliers DQS Knowledge Base
In thislesson,youwill create aDQSknowledge base named Supplierswiththe knowledge (metadata)
aboutsupplierdata.Youwill use the knowledge base toperformcleansingandmatchingactivitieson
inputsupplierdata.The cleansingactivityidentifiesincorrect/invaliddata,correctsthe incorrectdataor
proposescorrections/suggestions,standardizesthe data,andenrichesthe datawithadditional
information.The matchingactivitycomparesdataandidentifiessimilarrecords(maybe slightly
different) inthe datathat helpsyouperformde-duplication(remove duplicates) onthe data.
You can use both interactive andcomputer-assistedprocessestocreate,build,andmanage aknowledge
base.Knowledgeinaknowledge base ismaintainedindomains,eachof whichisspecifictoa data field
inthe data that youwantto cleanse and/ormatch.
In thislesson,youwill performthe followingtaskstocreate the Suppliersknowledge base:
 Create a DQS knowledge base named Suppliers.Youcancreate a knowledge base inseveral
ways.You can builda KB fromscratch or builditbased on an existingknowledgebase orby
importingaDQS file (.dqs) thatcontainsapre-builtandexportedknowledge base,orby
performingaknowledgediscoveryactivityonsample data.Inthistutorial,youwill create the KB
fromscratch.
8
 Create domainsin the Suppliersknowledgebase thatyouwill use forcleansingdata,and
matchingdata to identifyduplicates.Youshouldcreate domainsfordatafieldsthatyouwantto
use incleansingandmatchingactivities,notforall the datafieldsinthe data.
 Addvaluestoa domainbyaddingvaluesmanually,importingvaluesfromanexcel file,by
performingaknowledgediscoveryactivityonsample data,andby importingprojectvaluesfrom
a cleansingproject.Youcan alsoimportdomainvaluesbyimportingaDQS file containing
domainpropertiesandvalues,whichyouwill notperforminthe tutorial.
 Setrulesfora domain.A domainrule isa conditionthatwill be usedbyDQSto validate,correct,
and standardize domainvalues.
 Setterm-basedrelationshipsforadomain.A term-basedrelationshipenablesyoutomake a
correctionto a termthat is part of a value ina domain.Forexample,inthe value ContosoInc.,
Inc. is a termthat can be definedasIncorporated.Thishelpsinstandardizingthe dataaswell as
inidentifyingduplicates.Forexample, ContosoInc.andContoso Incorporated can be
consideredduplicates.
 Specifysynonymsindomainvalues.Youcan settwoor more valuesassynonymsandsetone of
themas a leadingvalue,whichreplacesitssynonymvaluesduringacleansingactivityto
standardize the data.
 Create a composite domainnamed AddressValidation thatcomprises AddressLine, City,State,
and Zipdomains.A composite domainisadomainthatconsistsof one or more single domains.
It letsyou create a rule thatinvolvesmultiple domains.Forexample,youcandefine arule:if
CityisLos Angeles,State mustbe CA,where CityandState are twoseparate domains.
 Configure anduse a reference dataprovider/service. The Reference DataService feature inData
QualityServices(DQS) enablesyoutosubscribe tothird-partyreference dataproviders,andto
easilycleanse andenrichyourbusinessdatabyvalidatingitagainsttheirhigh-qualitydata.You
can use servicesfromleadingdataqualityservice providersfromwithinDQStostandardize,
correct, or enrichyourdata duringthe cleansingprocess. Inthistutorial,youwill learnhowto
configure yourDQSenvironmenttouse areference dataservice onWindowsAzure
Marketplace anduse the service associatedwiththe AddressValidationcompositedomainto
cleanse addressdata.
 Publishthe knowledgebase sothatthe KB can be usedincleansingandmatchingactivities.
Task1: Creatinga KnowledgeBaseandDomains
In thistask,you will create the Suppliersknowledge base andcreate domainsthatwill be usedfor
cleansingdataandmatchingdata to remove duplicates.
1. Launch Data Quality Client.Click Start,pointto All Programs, clickMicrosoftSQL Server2012, click
Data Quality Services,andthenclickData Quality Client.
2. In the Connectto Serverdialogbox,selectthe database serverinstance onwhichthe DQSis
installed,andclick Connect.
9
3. In the Data QualityClienthome page,inthe Knowledge Base Managementpane,click New
Knowledge Base.
4. Enter SuppliersforName of the knowledge base.
10
5. ConfirmthatCreate Knowledge Base from fieldissetto None since youare creatingthe Suppliers
knowledge base fromscratch.
6. ConfirmthatDomain Managementis selectedforthe Activityandclick Next.The Domain
Managementactivityletsyoucreate andmanage domainsinthe knowledgebase.
7. In the Domain Managementwindow,click Create a domain toolbarbuttontocreate a domain.
8. In the Create Domain dialogbox,type SupplierID forthe Domain Name,and click OK.
11
9. Repeatprevioussteptocreate the followingdomainswithall the defaultsettings.Tokeepthe
tutorial simple,youwill keepthe Data Type of all the domainsas String.The otheralloweddata
typesare:Integer,Decimal,andDate.When the Use LeadingValuesoptionisselected(default),all
synonymsare replacedwiththe leadingvalue of the synonymgroupinthe output.Setting
Normalize String option(default) removesanyspecial charactersinthe domainvalues.The Format
Output to optionletsyouselectthe formattingthatwill be appliedwhenthe datavaluesinthe
domainare output.Select Enable Speller(default) torunSpelleronall stringvalueswhen
populatingthe domain.The Language settingspecifieswhichlanguageversionof the Spelleryou
wantto apply.Select Disable SyntaxError Algorithms to populate the domainwithoutchecking
stringvaluesforsyntax errors.See Create aDomain topic inthe MSDN libraryfor more details.
a. SupplierName
b. Contact Email
c. AddressLine
d. City
e. State
f. Country
g. Zip
12
Task2: AddingDomainValuesManually
In thistask,you will adda value forthe Country domainmanually.See ChangeDomainValues topicfor
more detailsaboutthe fieldsonthispage.
1. Click Country domaininthe Domain list.
2. In the rightpane,switchto the Domain Valuestab.
3. Click Add newdomain value buttonon the toolbarin the rightpane.
4. Type UnitedStates forthe Value fieldandpress ENTER. Youcan see that,by default,the Type isset
to Correct (greencheck).The Type can be setto Error (redcross) or Invalid (orange triangle),anda
correct value can be enteredinthe CorrectTo field.
Task3: ImportingDomainValuesfromanExcel File
In thistask,you will importvaluesforthe State domainfroma worksheetof anExcel file.
1. Click State domaininthe Domain list.
2. Ensure that the Domain Valuestab is active inthe rightpane.
3. In the rightpane,fromthe toolbar,click downarrow nextto the Import Valuesbutton,and click
Import Valid Valuesfrom Excel.
4. Click Browse,selectSuppliers.xls,andclick Open.
5. SelectStatesToImport$ forthe Worksheet.
13
6. Click OK to close the Import Domain Valuesdialogbox.Youshouldsee all the namesof statesyou
importedinthe list.Notice that ShowOnly Newoptionisautomaticallyselectedafterimporting.
Whenyouimportvaluesandyou don’tsee the oldvaluesinthe list,itisbecause thisoptionis
automaticallyenabledafterimporting.Tosee all the values,justclearthe checkbox.If youimport
the same set of valuesagain,none of the valueswill be importedastheyalreadyexistinthe
domain.
Task4: SettingDomainRules
In thistask,you will create arule forthe Contact Email domaintoverifyif the email addressendswith
@adventure-works.com.See CreatingaDomainRule topicformore detailsonthe page.
1. Click Contact Email inthe Domain list.
2. Switchto the Domain Rulestab in the rightpane.
3. In the rightpane,click Add a newdomain rule toolbarbutton(see the image) toadda new rule.
14
4. Type Email Validationfor the rule name and press ENTER. The Active check box shouldbe checked
by default.Thiscontrol allowsyoutodeactivate arule temporarily.
5. In the Builda Rule pane,click down arrow, andselectValue endswith.
6. Type @adventure-works.cominthe textbox and pressTAB. You can addmore conditionsby
clickingAdda newconditionto the selectedclause toolbarbuttoninthe Build a Rule pane.Inthis
scenario,youwill notbe addinganothercondition.
7. Click Run the selecteddomainrule on test data buttonon the toolbarinthe rightpane to test the
rule againstsample data.
8. In the Test Domain Rule dialogbox,click Adds a new testingterm for the domainrule buttonon
the toolbar.
9. Type frank7@adventure-works.com(a validvalue) inthe ContactEmail column.
15
10. Repeatprevioustwostepstoadd joe2@adventure-work.com(aninvalidvalue withno‘s’).
11. Clickthe lastbutton(Test the domain rule on all the terms) on the toolbarto testthe inputdata
againstthe rule.
12. Notice thatthe firstentryis shownas a validitemandthe secondone as an invaliditem.
13. Click Close to close the TestDomain Rule dialogbox.
Task5: SettingTerm-BasedRelations
In thistask,you will define afewtermbasedrelationsforvaluesforthe SupplierName domain.A term-
basedrelationenablesyoutomake a correctionto a termthat is part of a value ina domain.Itenables
multiple valuesthatare identical exceptfor the spellingof acommon part of themto be considered
identical synonyms.Forexample, Inc.canbe correctedto Incorporated. DQS will use these relationsin
the knowledge discovery,cleansing,ormatchingprocesses.See Create Term-basedRelations formore
details.
1. SelectSupplierName inthe Domain list.
2. Switchto the Term-BasedRelationshipstabin the rightpane.
3. Click Add newrelationbuttonon the toolbarto add a new relationtothe table.
4. Type Co. for the Value fieldand Companyfor the Correct To field.
5. Repeatthe previoustwostepsforthe followingvalues:
Value Correct To
Corp. Corporation
Inc. Incorporated
Task6: SettingSynonyms
In thistask,you will settwodomainvalues,USAandUnitedStates,of the Country domainas synonyms
withUnitedStatesas the leadingvalue.Since the Use LeadingValues optionwasselectedwhen
16
creatingthe Country domain,any USA valuesforthe Country domainwill be outputas UnitedStates(as
thisisthe leadingvalue).See ChangeDomainValues formore details.
1. SelectCountryfromthe listof domains.
2. Switchto the Domain Valuestab.
3. Click Add newdomain value buttonon the toolbar.
4. Type USA forthe value andpress ENTER.
5. Multi-selectUnitedStatesandUSA usingCTRL or SHIFT keys,right-clickthe selecteditems,andthen
clickSet as Synonyms. DQS will groupthese valuesanddesignateone of the valuesasthe leading
value thatthe other valueswill be replacedwith.
6. Notice thatUnitedStates isset as the leadingvalue.If youwantUSA to be the leadingvalue,you
can right-clickonUSA andselect Setas Leadingoption.Forthistutorial,we will use UnitedStatesas
the leadingvalue.
Task7: Creatinga CompositeDomain
In thistask,you will create acomposite domain, AddressValidation,whichcomprises AddressLine,
City, State and Zip domains.A composite domainletsyoudefineacross-domainrule thatinvolves
multiple domainsinarule.There are otheradvantagestoa composite domainsuchasbeingable to
parse a fieldvalueintomultipledomains. Forexample,avalue fora Full Name fieldcanbe parsedinto
separate FirstName,Middle Name,andLastName domains.Inthistutorial,we will onlybe defininga
cross-domainrule.See ManagingaComposite Domain formore details.
1. In the leftpane,clickCreate a composite domain toolbarbutton.
17
2. Enter AddressValidationforthe Composite DomainName.
3. From the domainlistselect AddressLine, City, State,and Zip and click right arrow to addthemto
the Domains in Composite Domain list.
4. Click OK to close the dialogbox.
Task8: Creatinga CompositeDomainRule
In thistask,you will create arule forthe AddressValidationcomposite domain.Youwill define across-
domainrule:if Cityis Los Angeles,State mustbe CA where CityandState are two domains.
1. In the rightpane,switchto the CD Rules tab.
2. Click Add a new domain rule fromthe toolbar.
3. Type City-State Rule forName andpress ENTER.
4. In the Builda Rule pane,selectCityinthe domainlist,andselectcondition Value isequal to and
type Los Angelesforthe value.
18
5. In the Then pane,selectState inthe domainlist,andselect Value isequal to, type CA for the value,
and press TAB.
6. Click Close buttonat the bottomof the page to switchto the mainpage of DQS Client.Youwill
publishthe knowledgebase inthe nextlesson.Notice thatthe KBisin lockedstate (lockicon).
Task9: ConfiguringaReferenceDataService
In thistask,you will configureDQStouse a Reference DataService onWindowsAzure Marketplace.In
the nexttask,you will be configuringthe AddressValidationdomaintouse thisservice.Atruntime,
duringcleansingactivity,DQSpassesthe valuesof domainsinthe AddressValidationdomaintothe
service forcleansing.See Configure DQStoUse Reference Data formore details.
1. In the mainpage of DQS Client,inthe Administrationpane,click Configuration.
2. Ensure that Reference Data tab isactive.
3. In the Network Settingsarea,type appropriate valuesinthe ProxyServerand Port fieldsif youneed
to use a proxyserverto connectto internet.
4. Type your WindowsAzure Data Market (Marketplace) Account Key for the DataMarket Account ID
field.
19
5. Click Validate buttonnexttothe textbox to validate the accountID.
6. Click OK on the message box.
7. Click Close at the bottomof the page to switchto the mainpage of DQS Client.
Task10: ConfiguringCompositeDomainto UseReferenceDataService
In thistask,you will configurethe AddressValidationcomposite domaintouse the MelissaData –
AddressCheck service. Atruntime,duringcleansingactivity,DQSpassesthe valuesof domainsinthe
AddressValidationdomaintothe service forcleansing.See MapDomain/Composite Domainto
Reference Dataformore details.
1. In the mainpage of DQS Client,clickSuppliers(DomainManagement) underRecentKnowledge
Bases to launchthe Domain Managementpage.
2. Selectthe AddressValidationcomposite domainisselectedinthe listofdomains.
3. In the rightpane,switchto the Reference Data tab.
4. Click Browse buttononthe toolbar.
5. On the Online Reference Data ProvidersCatalog dialogbox,select checkboxnextto MelissaData –
AddressCheck.
20
6. In the rightpane,inthe Schemasection,map AddressLine domaintothe AddressLine (M) schema
itemusingthe drop-downlist.
7. Click Add Schema Entry (+) buttonon the toolbarto create a new entryin the list.
8. Map the followingDQSdomainsusingthe drop-downlistsasshowninthe followingpicture.
9. Click OK to close the dialogbox.
Task11: PublishingtheKnowledgeBase
In thistask,you will publishthe knowledge base.A publishedknowledgebasecanbe usedforcleansing
or matchingactivityindata qualityproject.
1. Click Finishbuttonat the bottomof the window.
2. Click Publishinthe SQL ServerData Quality Services dialogbox.
3. Click OK to close the message box.
21
Task12: DiscoveringKnowledge(KnowledgeDiscovery)
In thistask,you will performthe KnowledgeDiscoveryactivityon SupplierIDand SupplierName
domains.Inthisscenario,the knowledge discoveryprocessmainlyimportsvaluesforthese two
domains.
In thistutorial,youstartedbuildingknowledge base fromscratch.Youcan alsostart creatinga
knowledge base byperformingaknowledge discoveryactivity.Whenyouclick Create a Knowledge Base
inthe mainpage,DQS clienttakesyoutoa page with Domain Managementactivityselectedforthe
activity.Youcan change the activity to Knowledge Discoveryandthenin the nextpage youcan create
domainsaspart of the knowledge discoveryprocess.See PerformKnowledge Discovery formore
details.
1. In the mainpage of DQS Client,inthe RecentKnowledge Base section,click right-arrownexttothe
SuppliersKBand click Knowledge Discovery.Alternatively,youcanclick OpenKnowledge Base,
selectSuppliersfromthe KB list,selectKnowledge Discoveryasactivity and click Next.
2. SelectExcel File forData Source.
3. Click Browse,navigate andselect Suppliers.xls,andclick Open.
4. SelectSuppliersforDiscoveryfor Worksheet.
22
5. In the Mappingssection,map SupplierIDcolumnfromthe Excel file tothe SupplierID domainand
SupplierName columnto the SupplierName domainby usingdrop-downlists.The Excel file has
sample datafor the SupplierID and SupplierName domains.Inthe discoveryprocess,youcan
selectthe domainsforwhichyouwantto discoverthe values.Note thatyoucreate domainsonthis
page and thenmap the source columnsto those domains.Itisnotuncommonto create domains
duringknowledgediscoveryactivityinsteadof creatingdomainsduringdomainmanagement
activity.
6. Click Nextto switchto the Discoverpage.
7. On the Discover page,click Start to start the discoveryprocess.Discoveryisperformedonthe
columns SupplierIDandSupplierName in the Suppliers.xlsfile.The SupplierIDand SupplierName
domainswill be populatedwiththe knowledge drawnfromthe discovery.
8. Afterthe analysishascompleted,review the Source Statisticsinthe Profilertab at the bottomof
the page. Notice that10 newrecordswithtotal 20 values(SupplierIDandSupplierName values
fromthe Excel worksheet) were discovered.Youwill alsoseehow manyof the valuesare new,
23
unique,newandunique,andvalid.Inthe listbox tothe right,you can see more detailsforeach
domaininvolvedinthe discoveryprocess.If youhoverthe mouse overthe statusbar inthe
Completenesscolumn,youcansee if there are any missingvaluesinthe columnsinthe source.
9. Click Nextto switchto the Manage Domain Valuespage.
10. In the Manage Domain Valuespage,click SupplierName domainfromthe listof domains.
11. In the rightpane,right-clickLazyCountry Storex (notice ‘x’atthe end),andselectLazy Country
Store. DQSsuggeststhischange afterrunningthe spell checkeronthe domain.Bydefault,spelleris
enabledonthe domainsyoucreate.
12. In the domainvalueslist,confirmthatthe value LazyCountry Storex is setas an error (red X mark)
withLazy Country Store as the correctionand alsothe Lazy Country Store isalsoaddedas a valid
value.
13. Click Finish.
14. On SQL Server Data Quality Services dialogbox,click Publish.
15. Click OK on the successmessage box.
24
Lesson 2: Cleansing Supplier Data using the Suppliers Knowledge Base
In thislessonyouwill cleanse the supplierdatainan Excel file usingthe SuppliersKByouhave created
inthe firstlesson.DatacleansinginDQSincludesa computer-assistedprocessthatanalyzeshow data
conformsto the knowledgeinaknowledge base,andan interactive process that enablesyoutoreview
and modifyresultsfromthe computer-assistedprocess.The datacleansingfeature identifiesincorrect
data inyour data source,andthencorrects or suggestscorrectionsforthe incorrectdata. It also
standardizesandenrichescustomerdatabyusingdomainvalues,leadingvaluesforsynonyms,domain
rules,term-basedrelations,andreference data.Youcan interactivelyapproveorrejectchanges
proposedbythe computer-assistedprocess.See DataCleansingformore details.
The computer-assistedprocessusesthe followingthresholdvaluesthatyoucan configure usingthe
Configurationoptiononthe DQSClientmainpage.
 Min score for suggestions:The minimumscore or confidence level thatwill be usedbyDQSfor
suggestingreplacementfora value.
 Min score for auto corrections:The minimumscore orconfidence levelthatwill be usedbyDQS
for automaticallycorrectingavalue.
See Configure ThresholdValuesforCleansingandMatching for detailsonhow to configure these
settings.
In thislesson,youwill performthe followingtaskstocleanse the inputdatausingthe SuppliersKB.
1. Create a Data QualityProjectforCleansing,selectthe SuppliersKBasthe KB to use to analyze
and cleanse the source datainan Excel file,andselectthe Cleansingactivity.
2. Map Excel columnstobe cleansedtoappropriate DQSdomains/compositedomainsinthe
knowledge base.
3. Run the computer-assistedcleansingactivity.The computer-assistedprocessdisplaysdata
qualityinformationinthe DataQuality Clientthatyoucan use to interactivelycleanse the data.
4. Viewandmanage the resultsof the cleansingactivity.Youcanreview the valuesthatare found
by the computer-assistedprocesstobe correct,incorrectbut corrected,incorrectwitha
suggested change,orinvalid.Youcaninteractivelyapprove orrejectchanges,correctingor
overridingthe suggestionfromthe computer-assistedprocessusingthe CorrectTofield.
5. Exportthe resultsfromthe cleansingprocesstoanExcel file.
6. Importthe values fromthe cleansingprojectintodomainstoaugmentthe knowledgeinthe
knowledge base withnewrules,values,correctionsetc…
Task1:CreatingaData QualityProject
In thistask,you will create aData QualityProjectforcleansingthe supplierdatain an Excel file against
the Suppliersknowledge base youcreatedearlierinthistutorial.
1. In the Data Quality Projectpane onthe mainpage,click NewData Quality Project.
25
2. Type Cleanse SupplierListfor the name of project.
3. Important: SelectSuppliersforthe Use Knowledge Base field.Youwill be cleansinginputsupplier
data againstthe Suppliersknowledgebase youcreatedearlierinthistutorial.
4. Ensure that Cleansingisselectedasthe activityat the bottomof the rightpane and click Next.
Task2: MappingExcel Columnsto DQS Domains
In thistask,you will mapcolumnsinanExcel file toDQS domainsinthe Suppliersknowledge base.
26
1. In the Map page,selectExcel File forData Source.
2. Click Browse,selectSuppliers.xlsx,andclick Open.
3. SelectIncomingSuppliers$forthe Worksheet.
4. Map columnsasshowninthe followingtable andscreenshot.Whencreatingmappingsforthe State
domain,click Adda column mappingtoolbarbuttonlocatedjustabove the list.Note thatyouare
not usingSupplierID column/domainforcleansing.Youwill use the SupplierIDdomainlaterinthe
matchingactivity.
Excel column DQS Domain
SupplierName SupplierName
ContactEmailAddress Contact Email
AddressLine AddressLine
City City
State State
Country(Click +(Adda columnmapping) toolbar
to add a newrow)
Country
ZipCode Zip
5. As youhave mappedall the individual domainswithinthe AddressValidationcomposite domain,
the composite domainautomaticallyparticipatesinthe cleansingprocess.Click View/Select
Composite Domains buttonto see thatthe Address Validationcomposite domainisautomatically
selected,andthenclick OK.
27
6. Click Nextto switchto the Cleanse page.
Task3: CleansingDataagainsttheSupplierKnowledgeBase
In thistask,you will runthe computer-assistedcleansingprocess.DQSusesadvancedalgorithmsand
confidence levelsbasedonthe thresholdvaluesspecifiedtoanalyze the dataagainstthe selected
knowledge base,andthencleanse it.See CleansingDataUsingDQS (Internal) Knowledgeformore
details.
1. Click Start to start the computer-assistedcleansingprocess.
28
2. Whenthe cleansingprocessiscompleted,review statisticsinthe Profilertab.The Source Statistics
provide the numberof recordsprocessed,numberof recordsthatare foundto be correct, number
of recordsthat are correctedby DQS, numberof recordsthat have changessuggestedbyDQS,and
the numberof records thatare invalid.Inthe listbox tothe right,you can see the correctedvalues,
suggestedvalues,andthe completeness(the extenttowhich the dataispresent) andaccuracy (the
extenttowhichthe data can be usedfor intendedpurposes)of valuesforeachdomaininvolvedin
the cleansingprocess.
3. Click Nextto switchto the Manage and ViewResults page.
Task4: ManagingandViewingResults
In thistask,you will reviewthe resultsof computer-assistedcleansingandalsoperforminteractive
cleansingonthe supplierdata.See Interactive CleansingStage formore details.
1. SelectContact Email domainfromthe listof domains.
2. Switchto the Invalidtab in the rightpane.Notice thattwo emailsaddressthatwere missing
character ‘s’at the end.These twoemailswere foundtobe invalidbythe domainrule thatrequires
all email addressestoendwith @adventure-works.com(with‘s’).DQSusesthe domainrule while
cleansingtodeterminewhetheranemail isavalidone or not.Thistab displaysthe domainvalues
that were eithermarkedasinvalidinthe KBor failedadomainrule.Inthiscase,these valuesfailed
the domainrule (Email Validation).
3. In the Correct To column,type the rightemail addressthatendwith @adventure-works.com(with
‘s’).
29
4. Click Approve for boththe records to approve boththe changes.Whenyouapprove,the records
move to the Correctedtab. Insteadof approvingeachitemseparately,youcanapprove all the
changesat once usingthe Approve all termstoolbarbutton.
5. Switchto the Newtab in the rightpane.These are the valuesforwhichDQS doesnothave enough
informationinthe KByetto determine whetherthe valuesare correct.Therefore,itcannotmake
changesor suggestchangesto the domainvalues.
6. Reviewthe valuestoconfirmthatall the emailsendwith @adventure-works.comandclick Approve
all terms onthe toolbar.The approvedvaluesfromthistabmove tothe Correct tab.
7. Selectthe Countrydomainfromthe listof domains.
8. Switchto the Correctedtab in the rightpane and notice that UnitedState value isautomatically
correctedto the UnitedStates with‘s’at the end.Thisis not a rule youdefinedforthe Country
domain,butDQS is 83% confidentthatthe correct value is UnitedStates.The Approve buttonis
automaticallyselectedforall the Correcteditems.Youcan override thisandrejectachange if
needed.
9. Notice thatUSA iscorrectedto UnitedStates because theyare synonymsand UnitedStatesis the
leading(preferred) value.
10. Notice thatthe Approve buttonisalreadyselectedforthese correctedvalues.Thisisthe default
behaviorforthe correctedvalues.Youcan rejecta change and whenyoudoso, the value movesto
the Invalid tab.
11. SelectSupplierName fromthe listof domains.
12. Switchto the Correctedtab in the rightpane.
30
a. Notice thatA. Datum Corp. iscorrectedto A. Datum Corporation and the Reason isset to
Term based relation.A. Datum Corporation is a knowndomainvalue toDQS because itwas
discoveredduringthe knowledge discoveryprocess.Therefore,DQSis 100% confident
aboutthiscorrection.
b. Notice thatthat Lazy Country Storex is correctedto Lazy Country Store, Confidence Level is
setto 100%, and the Reason isset to Domain Value.Duringthe knowledge discovery
process,youset Lazy Country Storex as an error with Lazy Country Store as the correction,
so DQS is 100% confidentaboutmakingthiscorrection.
c. DQS isnot familiarwiththe othervaluesinthe list,butitfoundthe correctionsforthese
valuesusingthe Spell Checkerandproposesthe appropriate corrections.DQSis not 100%
confidentaboutthese corrections,butthe confidence levelisabove 80%, whichisthe
thresholdlevel formakingcorrections,soDQSproposesthe corrections.
13. Notice thatthe Approve isautomaticallyenabledforall the values.Youcanoverride the corrected
value or rejectthe change as appropriate.Bydefaultthe Approve buttonisselectedforall the
valuesonthe Correctedtab.
14. Switchto the Newtab.
15. Notice thatCorp. is correctedto Corporation,Co. iscorrectedto Company, and Inc. iscorrectedto
Incorporated. For example, Consolidate Inc.iscorrectedto Consolidate Incorporatedand
ConsolidatedCo.is correctedto ConsolidatedCompany,and Frabrikam Corp. is correctedto
Fabrikam Corporation. You can see that term-basedrelationismentionedasthe reason.These
changesare proposedbyusingthe term-basedrelations youdefinedduringthe domain
managementactivity.Youcanchange the Correct To valuesmanuallyhere.
16. Scroll the listto see Hunxgry Coyote Store witha redsquigglyline.Right-clickonitandclick Hungy
Coyote Store (withno‘x’).The Correct To columnshouldbe automaticallypopulatedwith Hungry
Coyote Store. You can alsomanuallytype avalue inthe Correct To column.
17. Click Approve all terms fromthe toolbar.The domainvalueswiththe CorrectTo value specified
move to the Correctedtab and the new valueswithnoassociated CorrectTo valuesmove tothe
Correct tab.
18. Selectthe AddressValidationcomposite domainfromthe domainlist.
19. In the rightpane,switchto the Correct tab. You shouldsee the addressesthatare foundto be
correct by the MelissaData – Address CheckDQS service onthe Azure Marketplace.
20. Switchto the Correctedtab.
31
21. Notice thatState for the record that has City as Los Angelesissetto CA now.Notice inthe Reason
fieldisthatCorrectedby Rule ‘City-State Rule’.
22. Notice thatthe Approve radiobuttonis alreadyselectedforthisiteminthe list.Thisisthe default
behaviorforitemsonthe Correctedtab.
23. Switchto the Suggestedtab.Reviewthe changessuggestedbythe MelissaData – AddressCheck
service.
24. ClickApprove all terms on the toolbarbuttonand click OK onthe Confirmationmessage box.
25. Click Nextto switchto the Export page.
Task5: ExportingCleansingResultsto anExcel File
In thistask,you will exportresultsfromthe cleansingactivitytoanExcel file.See ExportStage topicfor
more details.
1. In the rightpane,select Excel forthe DestinationType.
2. Click Browse,specifythe outputfile name as CleansedSupplierList.xls,andthenclick Open.
3. SelectData Only for the Output formatto exportjustthe cleanseddata.The secondoption, Data
and CleansingInfo,letsyouexportcleansingactivitydetailsalongwiththe cleanseddata.The
Standardize Format optionletsyouapplyanyoutputformatsyoudefine ona domaintothe values
of thatdomain.You have notdefinedanoutputformatonany domaininthe tutorial.
4. Click Export to exportthe data.Do not click Finishyet.
5. Click Close onthe Exportingdialogbox.
6. Click Finishto finishthe activity.If youhadforgottentoexportresultsbefore clicking Finish,click
OpenData QualityProject inthe main page of DQS Client,selectCleanse SupplierListfromthe list
32
of projects,andclick Nextat the bottomof the screento getto the Export stage of cleansing
processagain.You can alsoswitchto the Manage and ViewResults tab byclickingBack button.
7. Openthe CleansedSupplierList.xls anddo the following:
a. Ensure that there are no email address thatendwithadventure-work.com(without
character ‘s’) bysearchingforadventure-work.cominthe worksheet.
b. See that there isno USA value inthe Country column.
c. Searchfor Los Angelesandsee that the State is setto CA.
d. Confirmthatthere are notermsCo., Corp., andInc.
e. Important: delete the AddressValidationcolumnfromthe spreadsheetandsave the excel
file.Thisadditional columncorrespondstothe AddressValidationcomposite domain.
Task6: ImportingValuesfromtheCleanseSupplierList Project
In thistask,you will importthe dataqualityknowledge gatheredduringthe cleansingprocess.See
ImportingCleansingProjectValuesintoaDomain topicformore details.Youwill alsoexportthe
knowledge base intoaDQSfile before publishingthe updated SuppliersKB.
1. In the mainpage of DQS Client,clickright-arrow nextto SuppliersunderRecentKnowledge Bases
and click Domain Management.
2. Click Contact Email inthe listof domains,andswitchtothe Domain Valuestab inthe right pane.
3. Click down arrow nextto the Import Valuesiconon the toolbarandclick Import ProjectValues.
4. On the Import Project Values dialogbox,selectthe Cleanse SupplierListproject,andclick OK.
5. Notice thatall the emailsare importedalongwiththe twocorrectionsyoudidduringinteractive
cleansing.Scroll tosee the twocorrections.
Value Correct To
bobby0@adventure-work.com bobby0@adventure-works.com
tad0@adventure-work.com tad0@adventure-works.com
6. Repeatthe previousstepof importingprojectvaluesforthe Countrydomainandnotice thata new
entryisaddedfor correcting UnitedState toUnitedStates (with‘s’)
Value Correct To
UnitedState UnitedStates
7. To see the olddomainvalues,clear ShowOnly Newcheckbox.
8. Repeatthe previousstepof importingprojectvaluesforthe SupplierName domain.Bydefault,
afterimporting,youwill onlyseethe new values.Tosee all the values,clear ShowOnlyNewcheck
box.We have enrichedthe SuppliersKBwithwhatwe learnedfromthe cleansingactivity.The
strongerthe KB is,the betterthe cleansingresultsare.Note thatitisnot possible importvaluesfor
a composite domain.
9. Click Export Knowledge Base icon onthe toolbarandthenclick Export Knowledge Base.
33
10. Navigate tothe Tutorial folder,type Suppliers.dqsforthe file name,andclick Save. You can use this
DQS file tocreate a newknowledge base basedonit.
11. Click OK to close the Export Knowledge Base – Suppliersmessage box.
12. Click Finishto finishthe activity.
13. Click Publish.
14. Click OK on the message box.
Lesson 3: Matching Data to Remove Duplicates from Supplier List
You prepare the knowledgebase forperformingmatchingactivitybycreatingamatching policyinthe
knowledge base.There canbe onlyone matchingpolicyina knowledge base.A matchingpolicyconsists
of one or more matchingrules.A rule identifiesthe domainsthatwill be involvedinthe matching
process,andspecifiesthe weightthateachdomainvalue carriesinthe matchingjudgment.Youspecify
inthe rule whetherdomainvalueshave tobe anexactmatch or can justbe similar,andto whatdegree
of similarity.Youalsospecifywhetheradomainmatch isa prerequisiteforthe matchingprocess.You
can testeach rule separatelyandtestthe entire policyagainstsample data.The testingprocessdisplays
recordswhose matchingscoresare greaterthan the Min record score thresholdspecifiedinthe DQS
configurationinacluster(group).Youcan continue totweakthe rulesinthe policyuntil youare
satisfied.
Afterdefiningthe policy,youcreate aData QualityProjecttorun the matchingactivity.The matching
projectappliesthe matchingrulesinthe matchingpolicytothe datasource to be assessed.Thisprocess
assessesthe likelihoodthatanytworowsare matches.WhenDQSperformsthe matchinganalysis,it
createsclustersof recordsthat DQS considersmatches.DQSrandomlyidentifiesone of the recordsas a
pivotrecord.You can verifyandrejectanyrecord thatis not an appropriate matchforthe cluster.See
Create a Matching Policy topicformore details.
In thislesson,youwill performamatchingactivitytoremove duplicatesfromthe supplierlist.First,you
will create amatchingpolicywithone rule toidentifyduplicatesinthe supplierlistandpublishthe
policytothe knowledge base.Next,youwillcreate andruna data qualityprojectformatching.Finally,
youwill exportthe resultsfromthe matchingactivitytoanExcel file thatyouwill use laterinuploading
data to Master Data Services(MDS).
Task1: DefiningaMatchingPolicy
In thistask,you will create amatchingpolicywithone rule init.The rule will have one prerequisite:
SupplierID, whichmeansthatthe SupplierIDsmustmatch before usingthe otherdomainsinthe rule.
The rule usestwo otherdomains: SupplierName withSimilarityvalue setto70% andContact Email
withSimilarityvalue setto30%.
34
1. In the mainpage of DQS Client,clickright-arrow nextto SuppliersKB,andselectMatchingPolicy.
2. SelectExcel File forData Source inthe Map page.
3. Click Browse,ensure filterissetto Excel Workbook,and selectCleansedSupplierList.xlsfilethat
youexportedafterperformingthe cleansingactivity.Note thatatthe endof thisactivity,youwill
not be able to exportresultsbecause thisactivityisprimarilyfocusedondefiningamatchingpolicy.
You will create aData QualityProjectforthe Matchingactivityandrun it to remove duplicatesfrom
the supplierlistusingthismatchingpolicyinthe nextlesson.
4. Map SupplierIDcolumntoSupplierID domain, SupplierName columnto SupplierName domain,
ContactEmailAddress columnto Contact Email domain.Youonlyneedtomap source columnsto
domainsthatyou wantto use in definingthe matchingpolicy.Inthiscase,youare makingthe
SupplierID,SupplierName,and ContactEmail domainsavailable forthe matchingpolicyactivity.
35
5. Click Nextto move to the Matching Policypage where youwill be definingamatchingpolicywith
one rule init.
6. Click Create a matching rule buttonon the toolbarto create a rule inthe policy.
7. In the Rule Details pane on the right,enterRemove Duplicate Suppliers forthe Rule name.
8. Click Add a new domain elementinthe toolbarinthe rightpane.
9. SelectSupplierIDfor the domain and selectthe Prerequisite checkbox.Notice that Similarityis
automaticallysetto Exact. By settingSupplierIDas the Prerequisite,youspecifythatthe valuesfor
thisfieldinthe tworecordsmustreturn a 100% match, else the recordsare notconsidereda match
and the otherclausesinthe rule are disregarded.
36
10. Click Add a new domain elementfromthe toolbaragain.
11. SelectSupplierName domain,selectSimilarforSimilarity,andType 70 forthe Weight. Here,you
are specifyingthatsuppliernamesdonot needtobe identical butcanbe similarforthe recordsto
be consideredasa match. The weightindicatesthe contributionof thisfield’sscore tothe overall
matchingscore.
12. Repeatsteps10-11 to add Contact Email domainwith 30 for the Weight.
13. Notice thatthe min matching score isset to 80%, whichisthe value yousee inthe General tabof
the Configurationpage of DQS Administration.You can onlyincrease thisscore above this
thresholdvalue here.
14. Notice thatOverlappingClusters optionisselected. Withthisoption,arecordcan show upin
multiple clusters.If youchange the settingtoNonOverlappingClusters,the clustersthathave
commonrecordsare combinedintoone single cluster.
15. The Start buttonon thispage allowsyouto testeach rule inthe policyseparately,whereas,the
Start buttoninthe nextpage allowsyoutotestentire policy(all the rulesinthe policy).
16. Click Nextto switchto the Matching Resultspage.
Task2: TestingandPublishingtheMatchingPolicy
In thistask,you will testandpublishthe Remove Duplicate Suppliers matchingpolicy.
1. In the Matching Resultspage,click Start to testthe entire policy.Inourcase,we have onlyrule in
the policy,sothe resultsfromtestingthe rule andthe policyshouldbe the same.
2. Reviewall the matchedrecordsandtheirmatchingscore inthe listbox.A recordthat has a Green
iconassociatedwithitisa duplicate of the pivotrecordthatprecedesit.Here are couple of
examples:
a. The record with Record ID: 1000005 isa match of the record with RecordId: 1000004 with
Score: 100% because boththe recordshave the same valuesfor SupplierID(prerequisite),
SupplierName,and ContactEmailAddresscolumns.DQS randomlypicksarecord as the
pivotrecordfor a cluster.
b. The record 1000023 is a match of the record 1000022 withthe matchingscore: 93%
because the tworecordshave the same valuesfor SupplierID(prerequisite) andSupplier
Name columns,butdifferentvaluesforthe ContactEmailAddresscolumn.
c. Scroll to the bottomof the listto see tworecordswithrecordsIDs: 1000051 and 1000052.
Record1000052 isconsideredamatch withmatchingscore 91% because the tworecords
have the same valuesforthe SupplierIDand ContactEmailAddress columns,butdifferent
valuesforthe SupplierName column.
37
3. Right-clickonanymatchedrecord(withgreenicon) andclick ViewDetailsto see more detailsabout
the matchingsuch as contributionof eachfieldscore tothe overall matchingscore.
4. Click Close to close the MatchingScore Details dialogbox.
5. Click Matching Resultstab at the bottomof the page.Thiswindow givesyoudetail suchasnumber
of matchedrecords,numberof unmatchedrecords,numberof clusterswithmatchedrecords,the
average clustersize,minimumclustersize,andmaximumclustersize.See Create aMatchingPolicy
for more details.NOTE:youwill notbe able exportresultsfromthisactivity.You are justdefininga
matchingpolicyusingthe sample datatotest rulesandthe policyagainstthe sample data.
38
6. Click Finishto finishcreatingthe matchingpolicy.
NOTE: You have definedthe matchingpolicyhere;therefore youcannotexportresultsto anoutput
file.Youbasicallyusedasample inputfile,createdrules,andtestedthe rulesandpolicyagainstthe
sample datawiththe goal of definingthe policy.
7. Click Publishonthe SQL ServerData QualityServicesdialogbox andclick OKon the message box.
Now,the matchingpolicyyoudefinedispublishedintothe SuppliersKnowledge Base.Youcanuse
the knowledge base torunthe matchingprocessagainstan inputfile toidentifyandremove
duplicates.
Task3: CreatingandRunninga DataQuality ProjectforMatching
In thistask,you will create aData QualityProjectforthe matchingactivityandrun the matchingprocess
on cleansedsupplierdatato remove anyduplicatesinthe data.
1. On the mainpage of DQS Client,click NewData Quality Project.
2. Type Remove SupplierDuplicates from the Name of the project.
3. Important: SelectSuppliersfromthe listof KBs forthe Use Knowledge Base field.Youhave created
a matchingpolicyinthisknowledge base inthe previouslesson.
4. Important: SelectMatching fromthe list ofactivitiesfrom the bottom-rightpane.
39
5. Click Next.
6. In the Map page,selectExcel File forthe Data Source.
7. Click Browse andselectCleansedSupplierList.xls,whichisthe outputfile fromthe cleansing
activity.
8. Map SupplierIDsource columntothe SupplierID domain, SupplierName columntoSupplierName
domain,and ContactEmailAddress columnto Contact Email domain.
9. Click Nextto switchto the Matching page.
10. Click Start to start the matchingprocess.Youwill see resultssimilartothose fromthe previoustask
because youusedthe same inputfile fordefiningthe matchingpolicy.
11. Reviewall the matchedrecordsandtheirmatchingscore inthe listbox.The resultsshould be same
as the onesyou sawin the previoustask.See the stepsinthe previoustasktoanalyze the results
fromthismatchingactivity.
12. Click Nextto switchto the Export page.
Task4: ExportingtheResultsfromMatchingActivityto an Excel File
In thistask,you will exportthe resultsfromthe matchingactivitytoanExcel file.
1. In the Export page,selectExcel File forthe DestinationType.
2. SelectSurvivorshipResultsoption.Inthe survivorshipprocess,DQSdeterminesasurvivorrecordfor
each clusterbasedonthe SurvivorshipRule youselected.
3. Click Browse andnavigate to the folderwhere youwanttostore the outputfile.
4. Type Cleansedand MatchedSuppliers.xlsforthe name and click Open.
40
5. ConfirmthatPivot Record isselectedforthe SurvivorshipRule.Whenyouselectthisoption,the
pivotrecordfor eachclusterispickedforthe outputfroma cluster.The otheroptionsforthe
SurvivorshipRule are:
a. Most complete record: Survivorrecordisthe one withthe largestnumberof populated
fields.
b. Longest record: Survivorrecordisthe one withthe largestnumberof termsinsource fields.
c. Most complete and longestrecord: Survivorrecordisthe one withthe largestnumberof
populatedfields,andhasthe largestnumberof termsin eachfield.
6. Click Export to exportthe resultstoexcel file.
7. Click Close to close the MatchingExport dialogbox.
8. Click Finishto finishthe matchingactivity.
9. Openthe Cleansedand MatchedSuppliers.xlsx file andconfirmthatyoudonot see anyduplicates
(SupplierID).
Now,we have supplierdatathathas beencleansedandmatchedtoremove duplicates.
Lesson 4: Storing Supplier Data in MDS
Master Data Services(MDS) isthe SQL Serversolutionformasterdatamanagement.Masterdata
management(MDM) describesthe effortsmade byanorganizationtodiscoveranddefinenon-
transactional listsof data.
Modelsare the highestlevel of organizationinMasterData Servicesandorganize the structure of your
masterdata. Your MDS implementationcanhave one ormany modelswhere eachmodelgroupssimilar
kindsof data. Ingeneral,masterdatacan be categorizedinone of fourways:people,place,things,or
41
concepts.Forexample,youcancreate a Product model tocontainproduct-relateddataor Customer
model tocontaincustomer-relateddata.See Models(MasterDataServices) formore details.
A model cancontainone or more entities.Eachentityhasattributes(columns) andmembers(rows).
Each row containsthe masterdata. In thislesson,youwill create aSuppliersmodelwithtwoentities
namedSupplierandState.The Supplierentitywill have the followingattributes:Code,Name,Contact
FirstName,ContactLast Name,ContactEmail Address,AddressLine,City,State,Zip,andCountry.See
Attributes(MasterDataServices) formore detailsaboutattributesingeneral.The Code andName
attributescorrespondtothe SupplierIDandSupplierName columnsinthe CleansedandMatched
SuppliersExcel file.
A domainbasedattribute isanattribute withvaluesthatare populatedbymembersof anotherentity.
Domain-basedattributespreventusersfromenteringattributevaluesthatare not valid.Anattribute
valuescanbe selectedonlyfromthe drop-downlistthatispopulatedbyanotherentity.Inthistutorial,
the State attribute of the Supplierentityisadomainbasedattribute withvaluesfromthe State entity.
You can onlychange the value of the State attribute of the Supplierentitytoone of the valuesinthe
State entity.See Domain-BasedAttributes formore details.
A derivedhierarchyinMDSisderivedfromthe domain-basedattributerelationshipinthe model.Inthis
tutorial,youwill create aderivedhierarchybetweenthe Supplierentityandthe State entity.Afteryou
create the derivedhierarchy,youwillsee alistof statesinthe Browserof Master Data Manager.When
youclick ona state inthe list,youwill see the suppliersinthatstate inthe right pane.Youwill be
creatinga derivedhierarchylaterbasedonthisrelationship.See DerivedHierarchies formore details.
You builta knowledgebase inDQSandusedit to cleanse andmatchsupplierdataandstoredthe results
inthe CleansedandMatchedSupplierData.xlsfile.Inthislesson,youwilluploadthe cleansedand
matcheddata intoMDS. Note that DQS onlycontainsknowledge aboutthe data(metadata) whereas
MDS storesthe data itself (masterset).For example:DQSmayhave knowledgeaboutseveral suppliers
but MDS onlymaintainsthe suppliersthata companyuses.
In thislesson,youwill performthe followingtasks:
1. Create the Suppliersmodel in MDSby usingthe Master Data Manager WebApplication.
2. OpenCleansedandMatched SupplierData.xls inExcel and use the MDS Add-infor Excel to create
an entitynamed Supplieranduploadthe datato MDS.
3. Verifythatthe data iscreatedinMDS by usingthe Master Data Manager.
4. Create a newentitynamed State and update the State attribute of Supplierentitytobe a domain-
basedattribute dependingonthe State entity.Youwill dothisall usingthe MDS Add-infor Excel.
5. Verifythatthe domainbasedattribute iscreatedbyusing MasterData Manager and update the
valuesforthe Name attribute of the State entity.
6. Viewthe updatesyoumade using MasterData Manager inExcel.
7. Load valuesfromthe State entityinto Excel andadd a new value,andverifythe additionbyusing
Master Data Manager.
42
8. Create and use a derivedhierarchyusingthe domain-basedattribute relationshipbetweenthe
Supplierentityandthe State entity(the State attribute of the Supplierentityisof type State entity)
by usingMasterData Manager.
Task1: CreatingSuppliersModel using MasterDataManager
In thistask,you will create amodel named SuppliersinMDSusingMaster Data Manager.
1. Navigate tohttp://localhost/mds tolaunch MasterData Manager.Replace the URL if youhave
configuredWebApplicationwithadifferentname oron a differentaWebsite.
2. Click SystemAdministrationin the Administrative Tasks section.
3. If you do notsee the Add Model page,hovermouse overManage on the menubar, click Models
and thenclick Add Model (+) toolbarbuttonto create a new model.
43
4. Enter SuppliersforModel name.
5. ClearCreate entity withsame name as model option.We will be creatinganentitylaterusingthe
MDS Add-infor Excel.
6. Click Save Model buttononthe toolbar.
Task2: UploadingSupplierDatato MDS usingMDS Add-inforExcel
In thistask,you will publishthe cleansedandsupplierdatato MDS usingthe MDS Add-infor Excel.You
will create anentitynamed Supplierinthe Suppliersmodel youcreatedinthe previouslesson.The
entitywill have anattribute foreachcolumninthe Excel file.The Code andName attributesof the
Supplierentitycorrespondtothe SupplierIDandSupplierName columnsinExcel.
1. OpenCleansedandMatched Suppliers.xlsinEXCEL.
44
2. Press CTRL+A to selectentire data.Itis important that youselectthe entire datainthe spreadsheet.
3. Click Master Data on the menubar.
4. Click Create Entity buttonon the ribbon.
5. In the Manage Connectionsdialogbox,if youdonot see the connectionto local MDS serverunder
Existingconnections,do the following:
a. SelectCreate a new connection,andclick New button.
b. In the Add NewConnectiondialogbox,type Local MDS Serverfor Descriptionand
http://localhost/MDS for MDS server address,and click OK to close the dialogbox.
6. In the Manage Connectionsdialogbox,select Local MDS Server (http://localhost/MDS),click Testto
testthe connection. Click OKon the message box.
7. Click Connectto connectto the MDS server.
8. In the Create Entity dialogbox,select Suppliersforthe Model.
9. Ensure that VERSION_1 isselectedforVersion.
10. Enter SupplierforNewentityname.
11. SelectSupplierIDforthe column that contains a unique identifierfield(youcanalsogenerate a
code automatically).Youare essentiallymappingthe SupplierIDcolumnin Excel tothe Code
attribute of Supplierentity.
45
12. SelectSupplierName forthe columnthat contains names field.Youare essentiallymappingthe
SupplierName columnin Excel to the Name attribute of the Supplierentity.The Code andName
attributesare mandatoryattributesforanentityinMDS.
13. Click OK to create the entityonMDS, publishthe masterdatato the entity,andclose Create Entity
dialogbox.
14. Now,youshouldsee anewsheettitled Supplier,whichisthe name of the entity,addedtoyour
Excel spreadsheetandat the top of the worksheetyoushouldsee thatthe worksheetisconnected
to the MDS server.Notice thatthe original worksheet(titledSheet1) still exists.
46
15. Keep Excel open.
Task3: VerifyingtheDatain MasterData Manager
In thistask,you will verifythatthe Supplierentityiscreatedon MDS usingMasterData Manager Web
Application.
1. If Master Data Manager isalreadyopen,click SQL Server2012 Master Data Services at the top to
navigate tothe home page.Otherwise,navigate to http://localhost/mds tolaunch MasterData
Manager.
2. SelectSuppliersforModel,andclick Explorer.
3. Reviewthe datastoredonMDS. If youdo not see the data,confirmthat youselected Suppliersfor
the Model on the home page before launching Explorer.Youcanadd to or delete fromthe supplier
listbyusingAdd Memberand Delete Memberbuttons onthe toolbar.
47
Task4 (Optional):Combining,Matching,andPublishingaNewSet of Data
Overtime,youwill wanttoadd more data to the MDS repository.Before addingdata,itcan be useful to
compare the newdata to the data that’s alreadymanagedinMDS, to ensure youare not adding
duplicate orinaccurate data. In the Master Data ServicesAdd-inforExcel, youcancombine datafrom
twoworksheets andthencompare the data to identifyandremove duplicatesbefore publishingthe
data to MDS. The matchingfeature of the MDS Excel Add-inusesthe DQSmatchingfunctionalityto
identifymatchesinthe data. Inthistask,you will combine datafromtwoworksheetsintoone andthen
performmatchingtoidentifyandremove duplicatesbeforepublishingtoMDS.See Data Quality
Matching inthe MDS Add-inforExcel andCombine Datatopicsfor more details.
1. Launch newinstance of Excel.Click Start, pointto Run, type Excel,andclick OK.
2. Switchto the Master Data tab by clickingMasterData on the menubar.
3. Click Connecton the ribboninthe Connect and Load groupto connectto the MDS server.You have
configuredthisconnectionearlierinthislesson.
4. You shouldsee the MasterData Explorer pane to the right.If you do notsee the Master Data
Explorer,click ShowExplorerbuttonon the ribbon.
5. In the Master Data ExplorerWindow,selectSuppliersinthe drop-downlistforthe Model.You
shouldsee thatthe model hasone entity: Supplier.
6. Double-clickSupplierinthe entitylisttoloadthe entitymembersintothe Excel worksheet.
7. Click Sheet2at the bottomto switchtothe Sheet2tab. If you donot see Sheet2,justcreate a new
worksheet.
8. OpenSuppliers.xlsfile(the original inputfilethatisincludedinthe tutorial files) andcopyall (three)
rowsfrom the CombineAndCleanseworksheetto Sheet2.
9. Switchback to the Suppliersheetinthe Book 1 – Microsoft Excel (notthe Cleansedand Matched
SupplierList Excel) thatisconnectedto MDS.
10. Click Combine Data onthe ribbon.Youwill see the Combine Datadialogbox.
11. In the Combine Data dialogbox,clickthe buttonnextto Range to combine with MDS data textbox
as showninthe followingimage.
48
12. You shouldsee the shrunkendialogbox now.Now,click Sheet2toswitchto the Sheet2tab that has
the new supplierdatawith4 rows(includingone headerrow).
13. In the Sheet2,selectall rows includingthe headerrow (evenif theyseemtobe alreadyselected).
You shouldsee the Range to combine with MDS data is automaticallyupdated.
14. Switchback to the Suppliertab withoutclosingthe Combine Datadialogbox.
15. Clickthe button nextto the textbox. You shouldsee thatthe dialogbox isexpandednow.You
shouldsee thatsome columnsof the SupplierMDS entity are mapped to Excel columns.
49
16. Important: Ensure thatCode entitycolumnismappedtothe SupplierIDcolumninthe worksheet
and Zip Code entitycolumnismappedtothe Zip Code columninthe worksheet.
17. On the Combine Data dialogbox,click Combine.
18. Confirmthatthree data rowsare addedto the bottomof the worksheetandtheyshouldbe color
coded.
19. Click Match Data on the ribbonto identifyduplicates.Thisfeatureusesthe matchingfunctionalityof
DQS.
20. In the Match Data dialogbox,select SuppliersforDQS Knowledge Base.
50
21. Map worksheetcolumnstodomainsasshowninthe followingtable.
WorksheetColumn Domain
Code (youuploadedSupplierIDas the Code for
the SupplierentityinMDS).
SupplierID
Name (youuploadedSupplierName asthe Name
for the SupplierentitytoMDS)
SupplierName
ContactEmailAddress ContactEmail
22. SelectPrerequisite forthe Code columnmapping.
23. Enter 70% as the weightfor SupplierName and 30% as the weightfor Contact Email as shownin
the image.
24. Click OK.
25. The matchingprocessshouldidentifyone duplicate forthe supplierwith Code:S1.
26. Selectthe duplicate row (orange),right-click,andclick Delete todeletethe row.
27. Delete the CLUSTER_ID columnsince youdon’tneeditanymore.
28. Click Publishto publish the updatedrecordset(with twonew recordswith CodesS66and S57) to
MDS.
29. In the Publishand Annotate dialogbox,addan annotation,and click Publish.
30. Switchto the Master Data Manager Webapplication.
31. On the home page,ensure that Suppliersisselectedforthe Model,andclick Explorer.If youalready
have the Explorer open,refreshthe internetbrowser.
4. Sort the listby Code and lookfor recordswith S57 andS66 as codes.Youcan alsouse the Filter
buttonon the toolbarto searchfor a specificrecordinthe list.
5. Now,close Book1 – MicrosoftExcel window withoutsavingthe file.
Task5: Creatinga Domain-BasedAttributefromExcel
In thistask,you will convertthe State attribute of the Supplierentityasa domain-basedattribute.After
youconfigure the State attribute tobe a domain-basedone andpublishittoMDS, a new entitynamed
State will be createdonMDS serverwithall the valuesinthe columnandthe State attribute of the
Supplierentitywill be populatedwithvaluesfromthe State entity.Now,the Suppliersmodel should
51
have twoentities: SupplierandState where the State attribute of the Supplierentityisa domain-based
attribute thatdependson State entity.
1. Switchto Excel windowthathas CleansedandMatched Suppliers.xlsxopen.
2. Click Refreshbuttononthe ribbontoget the latestupdatesonMDS. You shouldsee the two
additional recordsif youhave performedthe optional Task4.
3. Clickcolumnname State (Cell I1) inthe headerrow.
4. Click Attribute Properties on the ribbon.
5. In the Attribute Properties dialogbox,select Constrainedlist(Domain-based) forthe Attribute
type.
6. Type State for the Newentity name and click OK.
7. Now,inExcel,youwill see downarrow whenyouclickon any value inthe State column.Youcan
change the value usingthe drop-downlistif youneed.
52
Task6: Verifythat the Domain-BasedAttributeisCreatedusingMasterDataManager
In thistask,you will verifythatthe State entityiscreatedin MDS andthe State attribute of the Supplier
entityisa domain-basedattribute thatdependsonthe State entitybyusingMaster Data Manager.
1. Switchto the Master Data Manger webapplication.
2. Click SQL Server 2012 Master Data Services at the top to getto the home page.
3. Ensure that Suppliersmodel isselectedandclick Explorer.Youcouldjustrefreshthe page if you
alreadyhad Explorer open.
4. Hoveryour mouse overEntitiesinthe menubar and notice thatnow there are twoentities:
SupplierandState.
5. Click State if the entityisnotopenalready.
6. SelectGAfromthe list.
7. In the Detailspane to the right,change the Name to Georgiainthe right pane, andclick OK.
8. Repeatthe previousstepsforotherstates.
Code Name
CA California
CO Colorado
IL Illinois
DC Districtof Columbia
FL Florida
AL Alabama
KY Kentucky
MA Massachusetts
AZ Arizona
53
MI Michigan
MN Minnesota
NJ New Jersey
NV Nevada
NY New York
OH Ohio
OK Oklahoma
OR Oregon
PA Pennsylvania
SC SouthCarolina
KS Kansas
TN Tennessee
TX Texas
UT Utah
VA Virginia
WA Washington
WI Wisconsin
HI Hawaii
MD Maryland
CT Connecticut
9. Selectanyof the above entriesandclick ViewTransactionsfrom the Toolbar.You shouldsee the
transactionforthe update youjustmade isin the listof transactions.
10. Hoverthe mouse overEntitiesmenuand click Supplier.
11. Now,notice thata value forthe State fieldcanbe changedinthe Details pane usingthe drop-down
list.Youcan alsosee that,inthe listtothe leftandinthe drop-downlistinthe Detailspane,code is
displayedfirstandthenthe name incurlybraces.You can alsochange any othervalue inthe Details
pane.
54
Task7: ViewingUpdatesMadeusingMasterDataManagerinExcel
In thistask,you will verifythatyousee the updatesperformedusingMasterDataManager in Excel.
1. Now,switchtothe excel windowthathas Cleansedand MatchedSuppliers spreadsheetopen.
2. Click Refreshbuttonon the ribbon.
3. Notice thatnames showup (California,New Yorketc…) forthe State fieldalongwiththeircodes.
55
Task8: AddingaNew ValueforState Entity inExcel
In thistask,you will adda newvalue forthe State entityinExcel andpublishthe change tothe MDS
server.
1. Create a new work sheetinExcel by clickingona new tab at the bottom.
2. In Excel,clickthe Master Data tab on the menu,andthenclick Show Explorer on the ribbon.
3. In the Master Data Explorer,selectSuppliersforModel.Youshouldsee twoentities: Supplierand
State inthe entitylist.
4. Double-clickState inthe list.All the membersof the State entityfromMDS shouldbe displayedin
the worksheet.
56
5. Now,adda newrowat the endwiththe followingvalues:NorthCarolinafor Name and NC for
Code.The color codingdifferentiatesanynew/updatedrecordsfromthe otherrecords.
6. Click Publishonthe ribbontopublishthe change toMDS.
7. On the Publishand Annotate dialogbox,notice thatthe Use same annotation for all changes is
selected.Youcanentera single annotationforall the changeshere.
8. SelectReviewchangesand provide annotations individually optiontoprovide annotationforeach
change (inthiscase,onlyone).
57
9. Click Publishto publishdatatoMDS.
10. Notice thatcolor coding for the rowwith North Carolina as the State is same as otherrecordsnow.
11. Optional:verifythatthe newmember(NC) isaddedtothe State entitybyusingthe Explorerin
Master Data Manager.
12. In Excel,right-clickthe State worksheetatthe bottom, andclick Delete to delete the worksheet.
Deletingthe worksheetdoesnotdeleteanydatafrom the MDS server.
Task9: Creatinga DerivedHierarchyusingMasterDataManager
In thistask,you will create aderivedhierarchybyusingMasterData Manager.This derivedhierarchyis
derivedfromthe domain-basedattribute relationshipsbetweenthe SupplierandState entities.
1. Switchto the mainpage of Master Data Manager by clickingSQLServer 2012 Master Data Services
at the topof the page.
2. Click SystemAdministrationin the Administrative Tasks section.
3. Hoverthe mouse overManage on the menubar, andclick DerivedHierarchies.
58
4. Click Add DerivedHierarchy (+) buttononthe toolbar.
5. Type SuppliersInState forthe Derivedhierarchy name.
6. Click Save buttononthe toolbarto save.
7. Drag SupplierfromAvailable Levels:SuppliersInState toCurrent Levels:SuppliersInState.
8. Drag State fromAvailable Levels:SuppliersInState toCurrentLevels:SuppliersInState.The screen
shouldhave CurrentLevelsas showninthe followingpicture.
59
9. In the Previewwindow,expand NY{NewYork} andyou shouldsee one supplierinthatstate as
showninthe precedingimage.
10. Switchto the mainpage of Master Data Manager by clickingSQLServer 2012 Master Data Services
at the topof the page.
11. Click Explorer.
12. Hoverthe mouse overHierarchiesand click Derived:SuppliersInState.
13. Clickon any state node inthe tree viewand youshouldsee the suppliersinthatstate in the right
pane.
60
Lesson 5: Automating the Cleansing and Matching using SSIS
In Lesson1, youbuiltthe SuppliersKBandusedthatKB to performcleansingactivityinLesson2andthe
matchingactivityin Lesson3 usingthe tool DQS Client.Ina real worldscenario,youmayhave to pull
data froma source that isnot supportedbyDQSor youwantto automate the cleansingandmatching
processwithouthavingtouse the DQS Clienttool.SQLServerIntegration Services(SSIS) has
componentsthatyoucan use to integrate datafrom variousheterogeneoussourcesanda DQS
CleansingTransform componenttoinvoke the cleansingfunctionality exposedbyDQS.Currently,DQS
doesnotexpose matchingfunctionalityforSSIStouse,but youcan use the FuzzyGroupingTransform
to identifyduplicatesinthe data.
You can uploaddata to MDS by usingthe Entity-basedStaging feature.Whenyoucreate an entityin
MDS, correspondingstagingtablesandstoredproceduresare automaticallycreated.Forexample,when
we createdthe Supplierentity,the stg.supplier_Leaf table andthe stg.udp_Supplier_Leaf stored
procedure were automaticallycreated.Youuse the stagingtablesandprocedurestocreate,update,and
delete entitymembers.Inthislesson,youwill be creatingnew entitymembersforthe SupplierEntity.
To loaddata intothe MDS server,the SSISpackage firstloadsthe data intothe stagingtable
stg.supplier_Leaf andthentriggersthe associatedstoredprocedure stg.udp_Supplier_Leaf.See
ImportingData formore details.
61
In thislesson,youwill performthe followingtasks:
1. Remove supplierdatainMDS (if youhave gone throughLessons1-4). The SSISpackage you
create in thislessonuploadsthe datatoMDS automatically.Earlier,youuploadedthe cleansed
and matchedsupplierdatatoMDS servermanuallyusingthe DQSClient.
2. Create a subscriptionview onthe Supplierentitytoexpose datainthe entitytoother
applications.ThiscreatesaSQL view thatyouwill verifyusingSQLServerManagementStudio.
You will notbe consumingthisview inthisversionof the tutorial.
3. Create and runan SSIS projectusing SQL ServerData Tools. The projectwill use Data Cleansing
transformto submita cleansingrequesttothe DQS server.The matchingfunctionalityisnot
exposedbyDQSyet,soyou will use FuzzyGroupingtransformtoidentifyduplicates.
4. Verifythatthe data iscreatedinMDS by usingMasterData Manger.
5. Reviewthe resultsfromDQScleansingprojectcreatedbythe SSISpackage and optionally
performinteractive cleansingtofurtherbuildthe knowledge base.
Task1 (Prerequisite):RemovingSupplierDatainMDS
In thistask,you will remove the supplierdatastoredinMDS. You haduploadedthe datamanuallyusing
MDS Excel Add-ininthe previouslesson.The SSISpackage youwill be creatinginthislessonwill
automaticallyloadthe dataintoMDS for you.Therefore,beforetestingthe SSISpackage,we needto
remove the supplierdatafromMDS, remove the derivedhierarchy,removesupplierandstate entities,
and create the supplierentitywithnodata.
1. Launch Master Data Manager by navigatingto http://localhost /MDS or the Website and
applicationyouspecifiedwhenconfiguringMDS. If youkeptthe Master Data Manager open,click
SQL Server2012 Master Data Services at the top to switchto the home page.
2. Click SystemAdministrationin the Administrative Tasks section.
3. Hoverthe mouse overManage on the menuandclick DerivedHierarchies.We needtodelete the
derivedhierarchy SuppliersInState before deletingthe entitiesinthe Suppliersmodel.
4. SelectSuppliersInState fromthe DerivedHierarchylistandclick X (Delete) buttononthe toolbar.
5. Click OK to confirmdeletion.
6. Hoverthe mouse overManage on the menuandclick Entities.
7. Click Supplierandclick Delete (X) buttonon toolbarto delete the entity.Click OKonmessage boxes.
8. Repeatthe previoussteptodelete State entity.
9. Don’tclose Master Data Manager.
10. Switchto the Excel windowthathas Cleansedand Matched Suppliers.xlsfile open.Switchtothe
Sheet1tab at the bottom.
11. Selectonlythe firstrow with headers.Don’tselectanyotherrow. You wantto justcreate the
entitiesbasedonthe Excel columnsbutdon’twanttouploadanydata, therefore youselectonlythe
firstrow withthe headers.
12. Click Master Data on the menubar.
13. Click Create Entity from the ribbon.
14. In the Manage Connectionsdialogbox,if youdonot see the connectionto local MDS serverunder
Existingconnections,do the following:
62
a. SelectCreate a new connection,andclick New button.
b. In the Add NewConnectiondialogbox,type Local MDS Serverfor Descriptionand
http://localhost/MDS for MDS server address,and click OK to close the dialogbox.
15. In the Manage Connectionsdialogbox,select Local MDS Server (http://localhost/MDS),click Testto
testthe connection. Click OKon the message box.
16. Click Connectto connectto the MDS server.
17. In the Create Entity dialogbox,dothe following:
a. ConfirmthatRange is setto $1:$1.
b. SelectSuppliersforModel.
c. SelectVERSION_1 forVersion.
d. Type SupplierforNew entityname.
e. SelectSupplierIDforCode.
f. SelectSupplierName forName.
g. ClickOK to create the entityandclose the dialogbox.
18. Close EXCEL and do not save the file.
19. In Master Data Manager, refreshthe internetbrowserandconfirmthat Supplierentityisdisplayed
inthe list.
20. Switchto the home page by clickingSQLServer 2012 Master Data Services at the top.
21. ConfirmthatSuppliersisselectedforModel andVERSION_1 is selectedforVersion.
22. Click Explorer.Notice thatthe Supplierentitywithall the attributesiscreatedwith novalues.
Task2 (Optional):CreatingaMDS SubscriptionViewusingMasterDataManager
In thistask,you will create asubscriptionviewtoexpose the Supplierentityinthe Suppliersmodelto
otherapplications.Youwill notbe consumingthisview inthe currentversionof the tutorial.
1. Switchto the mainpage of Master Data Manager (http://localhost/MDS) byclickingSQLServer
2012 Master Data Services at the top.
2. Click IntegrationManagement.
3. Click Create Viewson the menubar.
4. Click + (Plus) iconon the toolbarto create a new subscriptionview.
5. In the Create Subscription Viewpane,type SuppliersforSubscriptionviewname.
6. SelectSuppliersforModel.
63
7. SelectVERSION_1 forVersion.
8. SelectSupplierforEntity.
9. SelectLeafmembersfor Format.
10. Click Save onthe toolbartosave the subscriptionview.Thisactuallycreatesaview inSQLServer
namedSuppliers.Youcan verifythisusingSQLServerManagementStudio(SSMS).
Task3 (Optional):Reviewingthe SubscriptionViews
In thistask,you will confirmthatthe SQLviewsare createdbyusingSQL ServerManagementStudio.
1. Launch SQL Server ManagementStudio.Clickthe Start button,click All Programs, click Microsoft
SQL Server2012, andthenclick SQL ServerManagement Studio.
2. In the Connectto Serverwindow,setServerType to Database Engine,type the server name (or
select(local),andselectappropriate authentication,andclick Connecttoconnectto the server.
3. In the ObjectExplorer pane,expand Databases,expand MDS,and thenexpand Views.
4. Confirmthatyou see the mdm.Suppliersview inthe list.
Task4: CreatinganSSIS ProjectusingSQL ServerDataTools
In thistask,you will create anSSISprojectusing SQL ServerData Tools to automate cleansingand
matchingsupplierdata.
1. Launch SQL Server Data Tools.Click Start, pointtoAll Programs, expand MicrosoftSQL Server
2012, andclick SQL Server Data Tools.
2. Click File onmenu,pointto New,and click Project.
3. ExpandBusinessIntelligence inthe InstalledTemplates pane,andselect IntegrationServices.
64
4. SelectIntegrationServicesProjectinthe listof project types.
5. Type CleanseAndCurateSuppliers forName andclick OK.
6. In the SolutionExplorerwindow,right-clickPackage.dtsxandselectRename.If youdon’tsee the
SolutionExplorerwindow,click Viewonthe menubar and click SolutionExplorer.
65
7. Type CleanseAndCurate.dtsx andpress ENTER.Make sure thatthe extensionremains .dtsx.
Task5: AddingDataFlowTask
In thistask,you will adda Data FlowTaskto the control flow of SSISpackage.
1. Drag and drop Data FlowTask fromSSIS Toolbox to the Control Flowtab in the SSISDesigner.If you
do notsee the SSIS Toolbox, clickanywhere inthe Control Flowtab, clickSSIS on the menubar,and
clickSSIS Toolbox.
66
2. Right-clickthe Data FlowTask inthe Control Flow tab and click Rename.
3. Type Receive,Cleanse,Match,and Curate SupplierData and press ENTER.
4. Double-clickonthe Data FlowTask to switchto the Data Flowtab.
Task6: AddingExcel Sourceto theData Flow
In thistask,you will addanExcel Source to the data flow to readsupplierdatafromthe source Excel file.
The Excel Source extractsdata from worksheetsorrangesinMicrosoftExcel workbooks.See Excel
Source topicfor more details.
1. Drag-dropExcel Source from OtherSources inSSIS Toolbox to the Data Flow tab.
2. Right-clickonExcel Source in the Data Flow tab,and click Rename.
3. Type Read SupplierData from Excel File and press ENTER.
4. Double-clickReadSupplierData from Excel File tolaunchthe Excel Source Editor dialogbox.
5. In the Excel Source Editor dialogbox,click Newto create a new Excel connection.
6. In the Excel ConnectionManager dialogbox,click Browse,and thenselectthe Suppliers.xlsfile in
the EIM Tutorial folder.Confirmthat MicrosoftExcel 97-2003 isselectedinthe Excel Versionbox
and thenclick OK.
67
7. In the Excel Source Editor dialogbox,select IncomingSuppliers$inthe Name of the Excel sheetlist
box.
8. Click Previewtopreviewthe datainExcel file.
9. Click OK to close the dialogbox.
10. Drag-dropDQS Cleansingtransformin OtherTransforms onthe SSIS Toolboxto the Data Flow tab
underRead SupplierData from Excel File.The DQS CleansingtransformationusesDataQuality
Services(DQS) tocorrectdata by applyingapprovedrulesinthe knowledge base.Thistransform,at
runtime,createsaDQS cleansingprojectonthe DQSserver.See DQSCleansingTransformation
topicfor more details.
Task7: AddingDQS CleansingTransformto theData Flow
In thistask,you will addDQSCleansingTransformtothe data flow tocleanse the inputsupplierdataby
usingDQS.See DQS CleansingTransform formore detailsaboutthe transform.
68
1. Right-clickDQSCleansinginthe Data Flow tab,and click Rename.Type Cleanse SupplierData,and
press ENTER.
2. SelectReadSupplierData from Excel File;drag the blue connectorto Cleanse SupplierData. The
componentsare nowconnected.
3. Double-clickCleanse SupplierData.
4. In the DQS CleansingTransformationEditor, clickNewnextto the Data Quality Connection
Manager drop-downlist.
5. In the DQS CleansingConnectionManagerdialogbox,type (local) orperiod(.) to connectto the
local server.Thislessonassumesthatyouhave DQSinstalledonalocal server.
6. Click Test Connectiontotest the connectiontoDQS server.
7. Click OK to close the dialogbox.
8. SelectSuppliersforthe Data QualityKnowledge Base.
9. Switchto the Mapping tab at the top.
69
10. From Available InputColumns,selectSupplierName,ContactEmailAddress,AddressLine,City,
State, Country,and Zip Code byclickingthe checkboxes.
11. In the bottompane,map these columnsusingdrop-downlistsinthe Domaincolumn:
Column Domain
SupplierName SupplierName
ContactEmailAddress Contact Email
AddressLine AddressLine
City City
State State
Country Country
ZipCode Zip
12. Click OK to close the DQS CleansingTransformation Editor dialogbox.
Task8: AddingConditional SplitTransformto SplitCleansingOutput
In thistransform,youwill adda Conditional SplitTransformtothe dataflow.The Conditional Split
transformationcanroute rowsto differentoutputsdependingonthe contentof the data. For the
70
purpose of thistutorial,youwill use the RecordStatus outputcolumnfromthe DQS Cleansing
transform.Youwill uploadonlycorrector correctedrecordsto MDS serverinthistutorial.Therefore
youwill checkif the Record Status isCorrect or Corrected,and combine the recordsbefore uploading
the recordsto MDS.
1. Drag-dropConditional SplitTransform fromCommon sectioninthe SSIS Toolboxto the Data Flow
tab below Cleanse SupplierData.
2. Right-clickConditional Split,andclick Rename.Type PickCorrect and CorrectedRecords and press
ENTER.
3. ConnectCleanse SupplierData and Pick Correct and CorrectedRecords usingthe blue connector.
4. Double-clickPickCorrectand CorrectedRecords inthe Data Flowtab.
5. Change the DefaultOutput Name at the bottomof the screento Correct.
6. ExpandColumnsin the top-leftpane.
71
7. Drag-dropRecord Status to the Conditioncolumn.
8. Type ==“Corrected” nextto[Record Status] for the Conditioncolumn.
9. Click Case 1 inthe Output Name Column,and change the name to Corrected.
10. Click OK to close the Conditional SplitTransformation Editordialogbox.
Task9: AddingUnionAll Transformto CombineCorrectandCorrectedRecords
In thistask,you will addthe UnionAll Transformtothe data flow.The UnionAll transformation
combinesmultiple inputsintoone output.Inourscenario,it will combinebothCorrectandCorrected
recordsintoone stream.
1. Drag-dropUnionAll TransformfromCommon sectionof the SSIS Toolboxto the Data Flowtab and
place it below PickCorrectand CorrectedRecords.
72
2. Right-clickUnionAll Transforminthe Data Flowtab, and click Rename.Type Combine Correct and
CorrectedRecords, and press ENTER.
3. ConnectPick Correct and CorrectedRecords to Combine Correct and CorrectedRecords in the Data
Flowtab usingthe blue connector.Youshouldsee the Input Output Selectiondialogbox.
4. In the Input Output dialogbox,select CorrectforOutput and click OK.
5. Move the connectortitled Correctto the leftbydraggingand droppingthe dotat the endof the
connectorto left.
73
6. If you selectPickCorrect and Corrected Records transform, youshouldsee another blue connector.
Drag that blue connectorto Combine Correctand CorrectedRecords.
7. Thisconnector shouldbe titled Corrected.Since we have onlytwoconditions Correctand
Corrected,and one conditionwasalreadyused,the InputOutputSelectiondialogbox isnot
displayedthistime.If the connectorsoverlap,moveone toleftandthe otherone to rightby
draggingthe connectorto leftor right.
Task10: AddingFuzzyGroupTransform to IdentifyDuplicates
In thistask,you will adda FuzzyGroupTransformto the data flow.The FuzzyGrouptransformationcan
helpidentifyduplicatesinthe source data.See FuzzyGroupingTransformation formore details.
1. Drag-dropFuzzy Grouptransformin Other Transforms onthe SSIS Toolboxto the Data Flow tab
below Combine Correctand CorrectedRecords.
2. Right-clickFuzzyGroupTransforminthe Data Flow tab, andclick Rename. Type GroupSuppliers
with matching IDs and press ENTER.
3. ConnectCombine Correctand CorrectedRecords to GroupSupplierswith matching IDs usingthe
blue connector.
4. Double-clickGroupSupplierswithmatchingIDs.
74
5. In the Fuzzy GroupTransformation Editor, clickNewnextto OLE DB ConnectionManager drop-
down listto launchConfigure OLE DB ConnectionManager dialogbox.
6. In the dialogbox,click Newto launchConnectionManagerdialogbox.
7. Type (local) or period(.) for the Servername.
8. SelectMDSfor Selector enter a database name field.We will be usingMDSdatabase as the
temporarystorage forthe FuzzyGroup Transform. The Fuzzy Groupingtransformationrequiresa
connectiontoan instance of SQL Servertocreate the temporarySQLServertables thatthe
transformationalgorithmrequirestodoits work. You can create a new database or use another
existingdatabase forthispurpose.
9. Click Test Connectiontotest the connectionandclick OK on the message box.
10. In the ConnectionManager dialogbox,click OK.
11. Select(local).MDS(orlocalhost.MDS) fromthe list of Data Connections andclick OK.
12. In the Fuzzy GroupingTransformation Editor,confirmthat(local).MDSor localhost.MDS isselected
for the OLE DB ConnectionManager.
13. Switchto the Columnstab.
14. Select(checkbox) SupplierID_Outputfromthe listof Available InputColumns.To configure the
transformation,youmustselectthe inputcolumnstouse whenidentifyingduplicates.Tokeepit
simple,youwill onlyuse the SupplierIDinthisstep.
15. Click OK to close the FuzzyGroup Transformation Editor.
75
Task11: AddingConditional SplitTransformto FilterDuplicates
In thistask,you will addthe ConditionalSplitTransformtothe data flow.Thistransformwill helpyou
filterduplicatesfrom the incomingrecordset.The FuzzyGrouptransformgroupsthe recordsthat it
findstobe matchesandpicksone of the record as a pivotrecord. All the recordsin a group have the
same _key_outvalue.The pivotrecordinthe group has _key_insame asthe _key_outvalue.The other
recordsin the grouphave differentvaluesfor_key_inand_key_out.Therefore,whenyoufilterusing
the condition_key_in==_key_out,youonlygetthe pivotrow inthe group.
1. Drag-dropConditional SplitTransformfrom Commonsectioninthe SSIS Toolbox to the Data Flow
tab.
2. Right-clickConditional SplitTransforminthe Data Flow tab, andclick Rename. Type Filter
Duplicatesand press ENTER.
3. ConnectGroupSupplierswith Matching IDs to FilterDuplicates.
4. Double-clickFilterDuplicatestolaunchthe Conditional SplitTransform Editor dialogbox.
5. ExpandColumnsin the top-leftpane.
6. Drag-drop_key_in to the Conditioncolumn.
7. Type == (equalsto) nextto _key_inand drag-drop_key_out.
8. Click Case 1 inthe Output Name column,type Unique Records,andpress ENTER.
9. Click OK to close the Conditional SplitTransformation Editor dialogbox.
Task12: AddingDerivedColumnTransformto AddColumnsRequiredbyMDS
In thistask,you will addthe Derive ColumnTransformtothe dataflow.Youwill addtwoderived
columns, ImportType and BatchTag, to the recordspassedto thistransform.Youneedtoadd these
columnsbefore uploadingthe datatostagingtablesinMDS. These twoare requiredcolumnsforthe
stagingtablesinMDS. See Leaf MemberStagingTables formore details.
1. Drag-dropDerivedColumntransform fromCommon sectioninthe SSIS Toolboxto the Data Flow
tab.
76
2. Right-clickDerivedColumnTransforminthe Data Flowtab, and click Rename.Type Add Columns
Requiredby MDS and press ENTER.
3. ConnectFilterDuplicatesto Add ColumnsRequiredby MDS usingthe blue connector.Thiswill
launchthe Input Output Selectiondialogbox.
4. In the Input Output Selectiondialogbox, selectUnique Records,andclick OK.
5. Click SSIS on the menubar andclick Variables.
6. In the Variableswindow,click AddVariable buttonon the toolbar.
7. Type ImportType forthe Name and 2 forthe value.You specifythe value as2 because youwantto
add newmemberstoan entityinMDS.For detailsaboutthisparameter,see Leaf MemberStaging
Table.
8. Click Add Variable toolbarbuttonagain.
9. Type BatchTag for the Name, selectStringforthe Data type,and EIMBatch forthe Value. BatchTag
isjust a unique name forthe batch youwill be submittingtoMDS.
10. In the Data Flow tab,double-clickAddColumnsRequiredbyMDS.
11. In the DerivedColumnTransformation Editor dialogbox,inthe listbox inthe bottom pane, type
ImportType forthe DerivedColumnName.
12. ExpandVariablesand Parameters inthe top-leftpane,drag-dropUser::ImportType tothe
Expressioncolumn.
77
13. Type BatchTag in the nextrowfor the DerivedColumn Name.
14. Drag-dropUser::BatchTag fromVariablesand Parameters to the Expressioncolumn.
15. Click OK to close the DerivedColumnTransformation dialogbox.
Task13: AddingOLE DBDestinationto Write Data to MDS StagingTable
Nowthat youhave added ImportType and BatchTag valuestoall records,youare readyto sendthem
overto MDS forstaging.In thistask,youwill use the OLE DB Destinationtowrite the datainto
stg.supplier_Leaf stagingtable.
1. Drag OLE DB DestinationfromOther Destinations sectioninthe SSIS Toolboxto the Data Flowtab
and dropit below AddColumnsRequiredby MDS.
2. Right-clickOLEDB Destinationinthe Data Flowtab, and click Rename.Type Write SupplierData to
MDS Staging Table and press ENTER.
3. Connectthe Add ColumnsRequiredby MDS to Write SupplierData to MDS Staging Table usingthe
blue connector.
4. Double-clickWrite SupplierDatato MDS Staging Table inthe Data Flow tab.
5. In the OLE DB DestinationEditor dialogbox,make sure that (local).MDS(orlocalhost.MDS) is
selectedforthe OLE DB ConnectionManager field.
6. Selectstg.Supplier_Leaf table fromthe listof Name ofthe table or the view.
78
7. Switchto the Mappingspage byclickingMappinginthe menuonleft.
8. Map inputand destinationcolumnsasshowninthe followingtable.
9. Confirmthatyou are using_Output columnsfor InputColumns,notthe _Status or _Source
columns. _Output columnscontainthe outputvaluesfromDQSCleansing.
10. Click OK to close the OLE DB DestinationEditor dialogbox.
11. The data flowshouldlike the followingimage.
79
Task14: AddingExecuteSQLTaskto Control Flowto Runthe StoredProcedureforMDS
Afterloadingdataintothe stagingtablesof MDS, you needtorun a storedprocedure associatedwith
that table toload the data fromstagingintothe appropriate tables inthe MDS database.Thisstored
procedure hastworequiredparametersthatyouneedtopass:LogFlag andVersionName.LogFlag
specifieswhethertransactionsare loggedduringthe stagingprocessandVersionName representsthe
versionof the model.See StagedStoredProcedure topicformore details.
In thistask,you will addthe Execute SQLTask to the control flow toinvoke the storedprocedure toload
the stageddata intoappropriate MDS tables.
1. Now,switchtothe Control Flow tab.
80
2. Drag-dropExecute SQL Task fromthe SSIS Toolbox to the Control Flow tab.
3. Right-clickExecute SQLTask inthe Control Flowtab,and click Rename.Type Trigger Stored
Procedure to Load Data into MDS and pressENTER.
4. ConnectReceive,Cleanse,Match,and Curate SupplierData to Trigger Stored Procedure to Load
Data usingthe greenconnector.
5. Usingthe Variableswindow,addtwonew variableswiththe followingsettings.If youdonotsee the
Variableswindow,click SSISonthe menubar andclick Variables.
Name Data Type Value
LogFlag Int32 1
VersionName String VERSION_1
6. Double-clickTriggerStoredProcedure to Load Data into MDS.
7. In the Execute SQL Task Editor dialogbox,select (local).MDS(orlocalhost.MDS) forConnection.
8. Type EXEC [stg].[udp_Supplier_Leaf]?,?,? forSQL Statement.You can verifythe name usingSQL
ServerManagementStudio.
81
9. Click Parameter Mappingfrom the menuonleft.
10. In the Parameter Mappingpage,click Add to add a new mapping.Maximize the windowandresize
columnssothat you can see valuesindrop-downlistsproperly.
11. SelectUser::VersionName fromthe drop-downlistforthe Variable Name.
12. SelectNVARCHARfor Data Type.
13. Type 0 (zero) forParameter Name.
14. Repeatthe previousfourstepstoaddtwo more variables.
Variable Name Data Type (important) Parameter Name
User::LogFlag LONG 1
User::BatchTag NVARCHAR 2
15. Click OK to close the Execute SQL Editor dialogbox.
Task15: BuildingandRunningtheSSIS Project
In thistask,you will buildandrunthe SSISproject.If youhave the 64-bit versionof Excel 2010 installed
on yourcomputer,youneedtoset the value of Run64BitRuntime to False for the Excel source to work.
Thisis a knownissue.
82
1. In the SolutionExplorerwindow,Click Projectonthe menu,andclick CleanseAndCurateSuppliers
Properties.
2. In the Propertiesdialogbox,expand ConfigurationProperties onleft,andclick Debugging.
3. SetRun64BitRuntime to False.
4. Click OK to close the Propertiesdialogbox.
5. Click Buildon menubar and click BuildCleanseAndCurateSuppliers.Make sure thatthere are no
builderrors.
6. Click Debugon the menubar and click Start Debugging.
7. Reviewmessagesinthe Progresswindow andverifythatpackage executedandendedsuccessfully.
83
8. Click Debugon menubar and click Stop Debuggingto stop the debuggingsession.If the package
fails,youmaywant to enable dataviewersandsee how the dataflowsbetweencomponents.
Task16: VerifyingwithMasterDataManager
In thistask,you will checkthe statusof the batch jobsubmittedbythe SSISpackage and verifythatthe
data was uploadedtoMDS serverusingMasterData Manager.
1. Launch Master Data Manager (http://localhost/MDS).If itisalreadyopen,click MicrosoftSQL
ServerMaster Data Services at the topto switchto the home page.
2. Click IntegrationManagement.
3. Notice thatthere isa batch withnamed EIMBatch that yousubmittedinthe list.Click ImportData
on the menubar if you donot see the followingscreen.
4. Switchback to the home page by click SQL Server 2012 Master Data Servicesat the top.
5. Make sure that SuppliersisselectedforModel andVERSION_1 isselectedforVersion,andclick
Explorer.
6. You can see the data SSISpackage importedintoMDS. The data shouldbe cleansedandhave no
duplicates Code values(Note:SupplierIDcolumninExcel correspondsto Code attribute of Supplier
entityinMDS).
Task17: ReviewingDQS CleansingProjectCreatedbythe SSIS package
In thisproject,youwill openthe DQSprojectcreatedbythe SSISpackage inDQS Client,review the
resultsfromthe cleansingprocess,andoptionallyperforminteractivecleansingandexportthe results.
1. Launch Data Quality Client.
2. Click ActivityMonitoring inthe Administrationpane.
3. Sort the listbasedon Activity Start Time to see the latestrecord.
4. Notice thatyou see a name of the projectin the followingformat: CleanseAndCurate.Cleanse
SupplierData.GUID.
5. Notice thatthe value inthe Is Active fieldis Active.
6. Click Profilertabin the bottompane to see profilerstatisticsforthe Cleansingactivitythatthe SSIS
package performed.
EIM Tutorial

More Related Content

What's hot

Getting started with Master Data Services 2012
Getting started with Master Data Services 2012 Getting started with Master Data Services 2012
Getting started with Master Data Services 2012 Luis Figueroa
 
Microsoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMicrosoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMark Ginnebaugh
 
Enterprise Information Management (EIM) in SQL Server 2012
Enterprise Information Management (EIM) in SQL Server 2012Enterprise Information Management (EIM) in SQL Server 2012
Enterprise Information Management (EIM) in SQL Server 2012Mark Gschwind
 
Master data services
Master data servicesMaster data services
Master data servicesSteve Xu
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Stéphane Fréchette
 
Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Stéphane Fréchette
 
Data quality services
Data quality servicesData quality services
Data quality servicesSteve Xu
 
Bi an ia with sap sybase power designer
Bi an ia with sap sybase power designerBi an ia with sap sybase power designer
Bi an ia with sap sybase power designerJane Kitabayashi
 
xRM - as an Evolution of CRM
xRM - as an Evolution of CRMxRM - as an Evolution of CRM
xRM - as an Evolution of CRMCatherine Eibner
 
Microsoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft Private Cloud
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data ManagementHai Nguyen
 
How Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data WarehouseHow Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data Warehousemark madsen
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digitalsambiswal
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-AshishGuleria
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesIvo Andreev
 
Power BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual WorkshopPower BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual WorkshopCCG
 
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2Introduction to Business Intelligence in Microsoft SQL Server 2008 R2
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2Quang Nguyễn Bá
 
Data Warehouse Methodology
Data Warehouse MethodologyData Warehouse Methodology
Data Warehouse MethodologySQL Power
 

What's hot (20)

Getting started with Master Data Services 2012
Getting started with Master Data Services 2012 Getting started with Master Data Services 2012
Getting started with Master Data Services 2012
 
Microsoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data ServicesMicrosoft SQL Server 2012 Master Data Services
Microsoft SQL Server 2012 Master Data Services
 
DQS & MDS in SQL Server 2016
DQS & MDS in SQL Server 2016DQS & MDS in SQL Server 2016
DQS & MDS in SQL Server 2016
 
Enterprise Information Management (EIM) in SQL Server 2012
Enterprise Information Management (EIM) in SQL Server 2012Enterprise Information Management (EIM) in SQL Server 2012
Enterprise Information Management (EIM) in SQL Server 2012
 
Master data services
Master data servicesMaster data services
Master data services
 
Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012Data Quality Services in SQL Server 2012
Data Quality Services in SQL Server 2012
 
Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012Introduction to Master Data Services in SQL Server 2012
Introduction to Master Data Services in SQL Server 2012
 
Data quality services
Data quality servicesData quality services
Data quality services
 
Bi an ia with sap sybase power designer
Bi an ia with sap sybase power designerBi an ia with sap sybase power designer
Bi an ia with sap sybase power designer
 
xRM - as an Evolution of CRM
xRM - as an Evolution of CRMxRM - as an Evolution of CRM
xRM - as an Evolution of CRM
 
Microsoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database DatasheetMicrosoft SQL Azure - Cloud Based Database Datasheet
Microsoft SQL Azure - Cloud Based Database Datasheet
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data Management
 
How Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data WarehouseHow Real TIme Data Changes the Data Warehouse
How Real TIme Data Changes the Data Warehouse
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-Data warehouse 101-fundamentals-
Data warehouse 101-fundamentals-
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Power BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual WorkshopPower BI Advanced Data Modeling Virtual Workshop
Power BI Advanced Data Modeling Virtual Workshop
 
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2Introduction to Business Intelligence in Microsoft SQL Server 2008 R2
Introduction to Business Intelligence in Microsoft SQL Server 2008 R2
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Data Warehouse Methodology
Data Warehouse MethodologyData Warehouse Methodology
Data Warehouse Methodology
 

Similar to EIM Tutorial

Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Heath Turner
 
Fundamentals of Database Systems Laboratory Manual
Fundamentals of Database Systems Laboratory ManualFundamentals of Database Systems Laboratory Manual
Fundamentals of Database Systems Laboratory ManualSurafiel Habib
 
Gerry Hughes Bi Portfolio
Gerry Hughes Bi PortfolioGerry Hughes Bi Portfolio
Gerry Hughes Bi Portfoliophilistineking
 
Hadoop as an extension of DW
Hadoop as an extension of DWHadoop as an extension of DW
Hadoop as an extension of DWSidi yazid
 
Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16MILL5
 
Sql server community_fa_qs_manual
Sql server community_fa_qs_manualSql server community_fa_qs_manual
Sql server community_fa_qs_manualSteve Xu
 
SAP MM Tutorial ds_42_tutorial_en.pdf
SAP MM Tutorial    ds_42_tutorial_en.pdfSAP MM Tutorial    ds_42_tutorial_en.pdf
SAP MM Tutorial ds_42_tutorial_en.pdfsjha120721
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Kun Le
 
InformationDrivenShort
InformationDrivenShortInformationDrivenShort
InformationDrivenShortDirk Ortloff
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence PortfolioDoug Armantrout
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guideCynthia Saracco
 
BI Project report
BI Project reportBI Project report
BI Project reporthlel
 
Consistent join queries in cloud data stores
Consistent join queries in cloud data storesConsistent join queries in cloud data stores
Consistent join queries in cloud data storesJoão Gabriel Lima
 
Businessobjects access analysis
Businessobjects access analysisBusinessobjects access analysis
Businessobjects access analysisjoin2006
 
The Microsoft approach to Cloud Transparency
The Microsoft approach to Cloud TransparencyThe Microsoft approach to Cloud Transparency
The Microsoft approach to Cloud TransparencyNerea
 

Similar to EIM Tutorial (20)

Tx2014 Feature and Highlights
Tx2014 Feature and Highlights Tx2014 Feature and Highlights
Tx2014 Feature and Highlights
 
Fundamentals of Database Systems Laboratory Manual
Fundamentals of Database Systems Laboratory ManualFundamentals of Database Systems Laboratory Manual
Fundamentals of Database Systems Laboratory Manual
 
Gerry Hughes Bi Portfolio
Gerry Hughes Bi PortfolioGerry Hughes Bi Portfolio
Gerry Hughes Bi Portfolio
 
Hadoop as an extension of DW
Hadoop as an extension of DWHadoop as an extension of DW
Hadoop as an extension of DW
 
Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16
 
Sql server community_fa_qs_manual
Sql server community_fa_qs_manualSql server community_fa_qs_manual
Sql server community_fa_qs_manual
 
SAP CPI-DS.pdf
SAP CPI-DS.pdfSAP CPI-DS.pdf
SAP CPI-DS.pdf
 
hci10_help_sap_en.pdf
hci10_help_sap_en.pdfhci10_help_sap_en.pdf
hci10_help_sap_en.pdf
 
Cloud view platform-highlights-web3
Cloud view platform-highlights-web3Cloud view platform-highlights-web3
Cloud view platform-highlights-web3
 
Openobject bi
Openobject biOpenobject bi
Openobject bi
 
SAP MM Tutorial ds_42_tutorial_en.pdf
SAP MM Tutorial    ds_42_tutorial_en.pdfSAP MM Tutorial    ds_42_tutorial_en.pdf
SAP MM Tutorial ds_42_tutorial_en.pdf
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
 
InformationDrivenShort
InformationDrivenShortInformationDrivenShort
InformationDrivenShort
 
paper
paperpaper
paper
 
Business Intelligence Portfolio
Business Intelligence PortfolioBusiness Intelligence Portfolio
Business Intelligence Portfolio
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
BI Project report
BI Project reportBI Project report
BI Project report
 
Consistent join queries in cloud data stores
Consistent join queries in cloud data storesConsistent join queries in cloud data stores
Consistent join queries in cloud data stores
 
Businessobjects access analysis
Businessobjects access analysisBusinessobjects access analysis
Businessobjects access analysis
 
The Microsoft approach to Cloud Transparency
The Microsoft approach to Cloud TransparencyThe Microsoft approach to Cloud Transparency
The Microsoft approach to Cloud Transparency
 

Recently uploaded

OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...marjmae69
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxmavinoikein
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)Basil Achie
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸mathanramanathan2005
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 

Recently uploaded (20)

OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸Mathan flower ppt.pptx slide orchids ✨🌸
Mathan flower ppt.pptx slide orchids ✨🌸
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 

EIM Tutorial

  • 1. Tutorial: Enterprise Information Management using SSIS, MDS, and DQS Together SQL Server Technical Article Writer: SreedharPelluru,JaimeAlvaBravo Technical Reviewer: CarlaSabotta,Jimvan de Erve,Kumar Vivek,MattMasson,Matthew Roche Published: October2012 Applies to: SQL Server 2012 Summary: Managinginformationinanenterprise typicallyinvolvesintegratingdatafromacrossthe enterprise andbeyond,cleansingthe data,matchingthe datato remove anyduplicates,standardizing the data, enrichingthe data,makingthe dataconform tolegal and compliance requirements,andthen storingthe data ina centralizedlocationwithall the necessarysecuritysettings. In thistutorial,youwill learnhowtouse SQLServerIntegrationServices(SSIS),MasterData Services (MDS),and Data QualityServices(DQS) togethertoimplementasample Enterprise Information Management(EIM) solution.First,youwill use DQStocreate a knowledgebasethatcontainsknowledge aboutthe data (metadata),cleansethe datainan Excel file usingthe knowledge base,andmatchthe data to identifyandremove duplicatesinthe data.Next,youwill use the MDSAdd-inforExcel toupload the cleansedandmatcheddata to MDS. Then,youwill automate the whole processusinganSSIS solution.
  • 2. 2 Copyright Thisdocumentisprovided“as-is”.Informationandviewsexpressedinthisdocument,includingURLand otherInternetWebsite references,maychange withoutnotice.Youbearthe riskof usingit. Some examplesdepictedhereinare providedforillustrationonlyandare fictitious. Noreal association or connectionisintendedorshouldbe inferred. Thisdocumentdoesnotprovide youwithanylegal rightstoany intellectual propertyinanyMicrosoft product. You may copyand use thisdocumentforyour internal,referencepurposes. © 2011 Microsoft.All rightsreserved.
  • 3. 3 Contents Overview........................................................................................................................................5 Prerequisites...............................................................................................................................6 Lessons.......................................................................................................................................7 Lesson 1: Creating the Suppliers DQS Knowledge Base......................................................................7 Task 1: Creating a Knowledge Base and Domains...........................................................................8 Task 2: Adding Domain Values Manually.....................................................................................12 Task 3: Importing Domain Values from an Excel File ....................................................................12 Task 4: Setting Domain Rules .....................................................................................................13 Task 5: Setting Term-Based Relations .........................................................................................15 Task 6: Setting Synonyms...........................................................................................................15 Task 7: Creating a Composite Domain.........................................................................................16 Task 8: Creating a Composite Domain Rule .................................................................................17 Task 9: Configuring a Reference Data Service..............................................................................18 Task 10: Configuring Composite Domain to Use Reference Data Service .......................................19 Task 11: Publishing the Knowledge Base.....................................................................................20 Task 12: Discovering Knowledge (Knowledge Discovery)..............................................................21 Lesson 2: Cleansing Supplier Data using the Suppliers Knowledge Base............................................24 Task1: Creating a Data Quality Project........................................................................................24 Task 2: Mapping Excel Columns to DQS Domains.........................................................................25 Task 3: Cleansing Data against the Supplier Knowledge Base........................................................27 Task 4: Managing and Viewing Results........................................................................................28 Task 5: Exporting Cleansing Results to an Excel File .....................................................................31 Task 6: Importing Values from the Cleanse Supplier List Project...................................................32 Lesson 3: Matching Data to Remove Duplicates from Supplier List...................................................33 Task 1: Defining a Matching Policy..............................................................................................33 Task 2: Testing and Publishing the Matching Policy......................................................................36 Task 3: Creating and Running a Data Quality Project for Matching................................................38 Task 4: Exporting the Results from Matching Activity to an Excel File............................................39 Lesson 4: Storing Supplier Data in MDS ..........................................................................................40 Task 1: Creating Suppliers Model using Master Data Manager .....................................................42 Task 2: Uploading Supplier Data to MDS using MDS Add-in for Excel ............................................43
  • 4. 4 Task 3: Verifying the Data in Master Data Manager.....................................................................46 Task 4 (Optional): Combining, Matching, and PublishingNew Set of Data.....................................47 Task 5: Creating a Domain-Based Attribute from Excel.................................................................50 Task 6: Verify that the Domain-Based Attribute is Created using Master Data Manager.................52 Task 7: Viewing Updates Made using Master Data Manager in Excel ............................................54 Task 8: Adding a New Value for State Entity in Excel....................................................................55 Task 9: Creating a Derived Hierarchy using Master Data Manager ................................................57 Lesson 5: Automating the Cleansing and Matching using SSIS..........................................................60 Task 1 (Prerequisite): Removing Supplier Data in MDS.................................................................61 Task 2 (Optional): Creating a MDS SubscriptionView using Master Data Manager ........................62 Task 3 (Optional): Reviewing the Subscription Views...................................................................63 Task 4: Creating an SSIS Project using SQL Server Data Tools........................................................63 Task 5: Adding Data Flow Task....................................................................................................65 Task 6: Adding Excel Source to the Data Flow..............................................................................66 Task 7: Adding DQS Cleansing Transform to the Data Flow...........................................................67 Task 8: Adding Conditional Split Transform to Split Cleansing Output...........................................69 Task 9: Adding Union All Transform to Combine Correct and Corrected Records...........................71 Task 10: Adding Fuzzy Group Transform to Identify Duplicates.....................................................73 Task 11: Adding Conditional Split Transform to Filter Duplicates ..................................................75 Task 12: Adding Derived Column Transform to Add Columns Required by MDS.............................75 Task 13: Adding OLE DB Destination to Write Data to MDS Staging Table......................................77 Task 14: Adding Execute SQL Task to Control Flow to Run the Stored Procedure for MDS..............79 Task 15: Building and Running the SSIS Project............................................................................81 Task 16: Verifying with Master Data Manager.............................................................................83 Task 17: Reviewing DQS Cleansing Project Created by the SSIS package........................................83 Conclusion....................................................................................................................................84
  • 5. 5 Overview Managing informationinanenterprise typicallyinvolvesintegratingdatafromacrossthe enterprise and beyond,cleansingthe data,matchingthe datato remove anyduplicates,standardizingthe data, enrichingthe data,makingthe data conformto legal andcompliance requirements,andthenstoringthe data ina centralizedlocationwithall the necessarysecuritysettings. SQL Server2012 providesall the componentsneededforaneffective Enterprise Information Management(EIM) solutioninasingle product.Keycomponentsof SQLServer2012 that helpyoubuild an EIM solutionare:  SQL ServerIntegrationServices  SQL ServerData QualityServices  SQL ServerMaster Data Services SQL ServerIntegrationServices(SSIS) providesapowerful,extensible platformforintegratingdatafrom a varietyof sourcesina comprehensive extract,transform, andload(ETL) solutionthatsupports businessworkflows,adatawarehouse,ormasterdata management.See IntegrationServicesOverview topicfor a quickoverviewandtypical usesof SSIS. SQL ServerData QualityServices (DQS) enablesyoutocleanse,match,standardize,andenrichdata,so youcan delivertrustedinformationforbusinessintelligence,adata warehouse,andtransaction processingworkloads. See IntroducingDataQualityServices topicforthe businessneedforDQSand howDQS answersthe need. SQL ServerMaster Data Services (MDS) providesacentral datahub thatensures thatthe integrityof informationandconsistencyof datais constantacross differentapplications. See MasterDataServices Overview topicforbrief descriptionsof importantfeaturesof MDS. See Enterprise InformationManagementwithSQLServer2012 and CleansingandMatchingMasterData usingEIMTechnologies whitepapersforacomprehensive guidance onimplementinganEIMsolution usingthese MicrosoftEIMtechnologiestogetherandwatch Enterprise InformationManagement(EIM): BringingtogetherSSIS,DQS,andMDS videofora cool demonstrationof anEIMscenario. In thistutorial,youwill learnhowtouse SSIS,MDS, andDQS togethertoimplementasample Enterprise InformationManagement(EIM) solution.First,youwill use DQStocreate a knowledgebase that containsknowledgeaboutthe data(metadata),cleansethe datainan Excel file usingthe knowledge base,and matchthe data to identifyandremove duplicatesinthe data.Next,youwilluse the MDSAdd- infor Excel to uploadthe cleansedandmatcheddatato MDS. Then,youwill automate the whole processusingan SSISsolution.The SSISsolutioninthistutorialreadsthe inputdatafroman Excel file, but youcan extendittoread froma varietyof sourcessuchas Oracle,Teradata,DB2, andSQL Azure.
  • 6. 6 Prerequisites  MicrosoftSQL Server2012 withthe following componentsinstalled. o IntegrationServices(SSIS). o Master Data Services(MDS) o Data QualityServices(DQS) o SQL ServerData Tools See SQL Server2012 InstallationGuide fordetailsaboutinstallingthe product.  Configure MDSusingMaster Data ServicesConfigurationManager Use the ConfigurationManagertocreate and configure aMaster Data Servicesdatabase.After youcreate the MDS database,create a Web application forMDS ina Web site (forexample: http://localhost/MDS) andassociate the MDSdatabase with the MDS Webapplication.Note that, to create an MDS Web application,youneedtohave IISinstalledonyourcomputer.See WebApplicationRequirements(MasterDataServices) andDatabase Requirements(Master Data Services) fordetailsaboutthe prerequisitesforconfiguringMDSdatabase and Web application.  Install andConfigure DQSusingDataQualityServerInstaller.ClickStart,clickAll Programs, clickMicrosoft SQL Server 2012,clickData Quality Services,and thenclick Data Quality Server Installer.  MicrosoftExcel 2010 (32-bitispreferred)  Install MasterData ServicesAdd-infor Excel (32-bitor 64-bitbasedon the versionof Excel you have on yourcomputer) from here.Tofindthe versionof Excel installedonyourcomputer,run Excel,click File on menubarand click Help to see the versioninthe rightpane.Note thatyou needtoinstall Visual Studio2010 ToolsforOffice Runtime priortoinstallingthe Excel Add-in.  (Optional) Create anaccountwith WindowsAzure Marketplace.One of the tasksinthe tutorial requiresyoutohave an Azure Marketplace (originallynamed DataMarket) account.You can skipthistask if youwant andproceedwiththe nexttask.  DQS doesnotallowyouto exportthe cleansingormatchingresultstoan Excel file if youare using64-bit versionof Excel. Thisis a knownissue.Toworkaroundthe issue,dothe following: o Install SQLServer2012 SP1 (on64-bit computerswith64-bitExcel). o Run DQLInstaller.exe –upgrade.If youinstalledthe defaultinstance of SQLServer,the DQSInstaller.exe file will be available atC:ProgramFilesMicrosoftSQL ServerMSSQL11.MSSQLSERVERMSSQLBinn.Double-clickthe DQSInstaller.exe file. o In Master Data ServicesConfigurationManager,clickSelectDatabase, selectexisting MDS database,and thenclick Upgrade.
  • 7. 7 Lessons Thistutorial includesthe followinglessons: Lesson Briefdescription Estimated time to complete (in minutes) Lesson1: Creating the Suppliers DQS KnowledgeBase In thislesson,youwill create aDQSknowledge base named Suppliers. 45 Lesson2: Cleansing SupplierDatausing the Suppliers KnowledgeBase In thislessonyouwill create andruna DQS projectto cleanse the supplierdatainan Excel file usingthe SuppliersKByou createdinthe firstlesson. 30 Lesson3: Matching Data to Remove Duplicatesfrom SupplierList In thislesson,youwill create aDQSprojectto perform matchingactivitytoidentifyandremove duplicatesfromthe cleansedsupplerlist. 30 Lesson4: Storing SupplierDatainMDS In thislesson,youwill uploadthe cleansedandmatched supplierdatatoMaster Data Services(MDS) byusingthe MDS Add-infor Excel. 30 Lesson5: Automating the Cleansingand Matching usingSSIS In thislesson,youwill create anSSISsolutionthatcleanses inputdata usingDQS,matchesthe cleanseddatato remove duplicates,andstoresthe cleansedandmatcheddataonMDS inan automatedmanner. 75 Lesson 1: Creating the Suppliers DQS Knowledge Base In thislesson,youwill create aDQSknowledge base named Supplierswiththe knowledge (metadata) aboutsupplierdata.Youwill use the knowledge base toperformcleansingandmatchingactivitieson inputsupplierdata.The cleansingactivityidentifiesincorrect/invaliddata,correctsthe incorrectdataor proposescorrections/suggestions,standardizesthe data,andenrichesthe datawithadditional information.The matchingactivitycomparesdataandidentifiessimilarrecords(maybe slightly different) inthe datathat helpsyouperformde-duplication(remove duplicates) onthe data. You can use both interactive andcomputer-assistedprocessestocreate,build,andmanage aknowledge base.Knowledgeinaknowledge base ismaintainedindomains,eachof whichisspecifictoa data field inthe data that youwantto cleanse and/ormatch. In thislesson,youwill performthe followingtaskstocreate the Suppliersknowledge base:  Create a DQS knowledge base named Suppliers.Youcancreate a knowledge base inseveral ways.You can builda KB fromscratch or builditbased on an existingknowledgebase orby importingaDQS file (.dqs) thatcontainsapre-builtandexportedknowledge base,orby performingaknowledgediscoveryactivityonsample data.Inthistutorial,youwill create the KB fromscratch.
  • 8. 8  Create domainsin the Suppliersknowledgebase thatyouwill use forcleansingdata,and matchingdata to identifyduplicates.Youshouldcreate domainsfordatafieldsthatyouwantto use incleansingandmatchingactivities,notforall the datafieldsinthe data.  Addvaluestoa domainbyaddingvaluesmanually,importingvaluesfromanexcel file,by performingaknowledgediscoveryactivityonsample data,andby importingprojectvaluesfrom a cleansingproject.Youcan alsoimportdomainvaluesbyimportingaDQS file containing domainpropertiesandvalues,whichyouwill notperforminthe tutorial.  Setrulesfora domain.A domainrule isa conditionthatwill be usedbyDQSto validate,correct, and standardize domainvalues.  Setterm-basedrelationshipsforadomain.A term-basedrelationshipenablesyoutomake a correctionto a termthat is part of a value ina domain.Forexample,inthe value ContosoInc., Inc. is a termthat can be definedasIncorporated.Thishelpsinstandardizingthe dataaswell as inidentifyingduplicates.Forexample, ContosoInc.andContoso Incorporated can be consideredduplicates.  Specifysynonymsindomainvalues.Youcan settwoor more valuesassynonymsandsetone of themas a leadingvalue,whichreplacesitssynonymvaluesduringacleansingactivityto standardize the data.  Create a composite domainnamed AddressValidation thatcomprises AddressLine, City,State, and Zipdomains.A composite domainisadomainthatconsistsof one or more single domains. It letsyou create a rule thatinvolvesmultiple domains.Forexample,youcandefine arule:if CityisLos Angeles,State mustbe CA,where CityandState are twoseparate domains.  Configure anduse a reference dataprovider/service. The Reference DataService feature inData QualityServices(DQS) enablesyoutosubscribe tothird-partyreference dataproviders,andto easilycleanse andenrichyourbusinessdatabyvalidatingitagainsttheirhigh-qualitydata.You can use servicesfromleadingdataqualityservice providersfromwithinDQStostandardize, correct, or enrichyourdata duringthe cleansingprocess. Inthistutorial,youwill learnhowto configure yourDQSenvironmenttouse areference dataservice onWindowsAzure Marketplace anduse the service associatedwiththe AddressValidationcompositedomainto cleanse addressdata.  Publishthe knowledgebase sothatthe KB can be usedincleansingandmatchingactivities. Task1: Creatinga KnowledgeBaseandDomains In thistask,you will create the Suppliersknowledge base andcreate domainsthatwill be usedfor cleansingdataandmatchingdata to remove duplicates. 1. Launch Data Quality Client.Click Start,pointto All Programs, clickMicrosoftSQL Server2012, click Data Quality Services,andthenclickData Quality Client. 2. In the Connectto Serverdialogbox,selectthe database serverinstance onwhichthe DQSis installed,andclick Connect.
  • 9. 9 3. In the Data QualityClienthome page,inthe Knowledge Base Managementpane,click New Knowledge Base. 4. Enter SuppliersforName of the knowledge base.
  • 10. 10 5. ConfirmthatCreate Knowledge Base from fieldissetto None since youare creatingthe Suppliers knowledge base fromscratch. 6. ConfirmthatDomain Managementis selectedforthe Activityandclick Next.The Domain Managementactivityletsyoucreate andmanage domainsinthe knowledgebase. 7. In the Domain Managementwindow,click Create a domain toolbarbuttontocreate a domain. 8. In the Create Domain dialogbox,type SupplierID forthe Domain Name,and click OK.
  • 11. 11 9. Repeatprevioussteptocreate the followingdomainswithall the defaultsettings.Tokeepthe tutorial simple,youwill keepthe Data Type of all the domainsas String.The otheralloweddata typesare:Integer,Decimal,andDate.When the Use LeadingValuesoptionisselected(default),all synonymsare replacedwiththe leadingvalue of the synonymgroupinthe output.Setting Normalize String option(default) removesanyspecial charactersinthe domainvalues.The Format Output to optionletsyouselectthe formattingthatwill be appliedwhenthe datavaluesinthe domainare output.Select Enable Speller(default) torunSpelleronall stringvalueswhen populatingthe domain.The Language settingspecifieswhichlanguageversionof the Spelleryou wantto apply.Select Disable SyntaxError Algorithms to populate the domainwithoutchecking stringvaluesforsyntax errors.See Create aDomain topic inthe MSDN libraryfor more details. a. SupplierName b. Contact Email c. AddressLine d. City e. State f. Country g. Zip
  • 12. 12 Task2: AddingDomainValuesManually In thistask,you will adda value forthe Country domainmanually.See ChangeDomainValues topicfor more detailsaboutthe fieldsonthispage. 1. Click Country domaininthe Domain list. 2. In the rightpane,switchto the Domain Valuestab. 3. Click Add newdomain value buttonon the toolbarin the rightpane. 4. Type UnitedStates forthe Value fieldandpress ENTER. Youcan see that,by default,the Type isset to Correct (greencheck).The Type can be setto Error (redcross) or Invalid (orange triangle),anda correct value can be enteredinthe CorrectTo field. Task3: ImportingDomainValuesfromanExcel File In thistask,you will importvaluesforthe State domainfroma worksheetof anExcel file. 1. Click State domaininthe Domain list. 2. Ensure that the Domain Valuestab is active inthe rightpane. 3. In the rightpane,fromthe toolbar,click downarrow nextto the Import Valuesbutton,and click Import Valid Valuesfrom Excel. 4. Click Browse,selectSuppliers.xls,andclick Open. 5. SelectStatesToImport$ forthe Worksheet.
  • 13. 13 6. Click OK to close the Import Domain Valuesdialogbox.Youshouldsee all the namesof statesyou importedinthe list.Notice that ShowOnly Newoptionisautomaticallyselectedafterimporting. Whenyouimportvaluesandyou don’tsee the oldvaluesinthe list,itisbecause thisoptionis automaticallyenabledafterimporting.Tosee all the values,justclearthe checkbox.If youimport the same set of valuesagain,none of the valueswill be importedastheyalreadyexistinthe domain. Task4: SettingDomainRules In thistask,you will create arule forthe Contact Email domaintoverifyif the email addressendswith @adventure-works.com.See CreatingaDomainRule topicformore detailsonthe page. 1. Click Contact Email inthe Domain list. 2. Switchto the Domain Rulestab in the rightpane. 3. In the rightpane,click Add a newdomain rule toolbarbutton(see the image) toadda new rule.
  • 14. 14 4. Type Email Validationfor the rule name and press ENTER. The Active check box shouldbe checked by default.Thiscontrol allowsyoutodeactivate arule temporarily. 5. In the Builda Rule pane,click down arrow, andselectValue endswith. 6. Type @adventure-works.cominthe textbox and pressTAB. You can addmore conditionsby clickingAdda newconditionto the selectedclause toolbarbuttoninthe Build a Rule pane.Inthis scenario,youwill notbe addinganothercondition. 7. Click Run the selecteddomainrule on test data buttonon the toolbarinthe rightpane to test the rule againstsample data. 8. In the Test Domain Rule dialogbox,click Adds a new testingterm for the domainrule buttonon the toolbar. 9. Type frank7@adventure-works.com(a validvalue) inthe ContactEmail column.
  • 15. 15 10. Repeatprevioustwostepstoadd joe2@adventure-work.com(aninvalidvalue withno‘s’). 11. Clickthe lastbutton(Test the domain rule on all the terms) on the toolbarto testthe inputdata againstthe rule. 12. Notice thatthe firstentryis shownas a validitemandthe secondone as an invaliditem. 13. Click Close to close the TestDomain Rule dialogbox. Task5: SettingTerm-BasedRelations In thistask,you will define afewtermbasedrelationsforvaluesforthe SupplierName domain.A term- basedrelationenablesyoutomake a correctionto a termthat is part of a value ina domain.Itenables multiple valuesthatare identical exceptfor the spellingof acommon part of themto be considered identical synonyms.Forexample, Inc.canbe correctedto Incorporated. DQS will use these relationsin the knowledge discovery,cleansing,ormatchingprocesses.See Create Term-basedRelations formore details. 1. SelectSupplierName inthe Domain list. 2. Switchto the Term-BasedRelationshipstabin the rightpane. 3. Click Add newrelationbuttonon the toolbarto add a new relationtothe table. 4. Type Co. for the Value fieldand Companyfor the Correct To field. 5. Repeatthe previoustwostepsforthe followingvalues: Value Correct To Corp. Corporation Inc. Incorporated Task6: SettingSynonyms In thistask,you will settwodomainvalues,USAandUnitedStates,of the Country domainas synonyms withUnitedStatesas the leadingvalue.Since the Use LeadingValues optionwasselectedwhen
  • 16. 16 creatingthe Country domain,any USA valuesforthe Country domainwill be outputas UnitedStates(as thisisthe leadingvalue).See ChangeDomainValues formore details. 1. SelectCountryfromthe listof domains. 2. Switchto the Domain Valuestab. 3. Click Add newdomain value buttonon the toolbar. 4. Type USA forthe value andpress ENTER. 5. Multi-selectUnitedStatesandUSA usingCTRL or SHIFT keys,right-clickthe selecteditems,andthen clickSet as Synonyms. DQS will groupthese valuesanddesignateone of the valuesasthe leading value thatthe other valueswill be replacedwith. 6. Notice thatUnitedStates isset as the leadingvalue.If youwantUSA to be the leadingvalue,you can right-clickonUSA andselect Setas Leadingoption.Forthistutorial,we will use UnitedStatesas the leadingvalue. Task7: Creatinga CompositeDomain In thistask,you will create acomposite domain, AddressValidation,whichcomprises AddressLine, City, State and Zip domains.A composite domainletsyoudefineacross-domainrule thatinvolves multiple domainsinarule.There are otheradvantagestoa composite domainsuchasbeingable to parse a fieldvalueintomultipledomains. Forexample,avalue fora Full Name fieldcanbe parsedinto separate FirstName,Middle Name,andLastName domains.Inthistutorial,we will onlybe defininga cross-domainrule.See ManagingaComposite Domain formore details. 1. In the leftpane,clickCreate a composite domain toolbarbutton.
  • 17. 17 2. Enter AddressValidationforthe Composite DomainName. 3. From the domainlistselect AddressLine, City, State,and Zip and click right arrow to addthemto the Domains in Composite Domain list. 4. Click OK to close the dialogbox. Task8: Creatinga CompositeDomainRule In thistask,you will create arule forthe AddressValidationcomposite domain.Youwill define across- domainrule:if Cityis Los Angeles,State mustbe CA where CityandState are two domains. 1. In the rightpane,switchto the CD Rules tab. 2. Click Add a new domain rule fromthe toolbar. 3. Type City-State Rule forName andpress ENTER. 4. In the Builda Rule pane,selectCityinthe domainlist,andselectcondition Value isequal to and type Los Angelesforthe value.
  • 18. 18 5. In the Then pane,selectState inthe domainlist,andselect Value isequal to, type CA for the value, and press TAB. 6. Click Close buttonat the bottomof the page to switchto the mainpage of DQS Client.Youwill publishthe knowledgebase inthe nextlesson.Notice thatthe KBisin lockedstate (lockicon). Task9: ConfiguringaReferenceDataService In thistask,you will configureDQStouse a Reference DataService onWindowsAzure Marketplace.In the nexttask,you will be configuringthe AddressValidationdomaintouse thisservice.Atruntime, duringcleansingactivity,DQSpassesthe valuesof domainsinthe AddressValidationdomaintothe service forcleansing.See Configure DQStoUse Reference Data formore details. 1. In the mainpage of DQS Client,inthe Administrationpane,click Configuration. 2. Ensure that Reference Data tab isactive. 3. In the Network Settingsarea,type appropriate valuesinthe ProxyServerand Port fieldsif youneed to use a proxyserverto connectto internet. 4. Type your WindowsAzure Data Market (Marketplace) Account Key for the DataMarket Account ID field.
  • 19. 19 5. Click Validate buttonnexttothe textbox to validate the accountID. 6. Click OK on the message box. 7. Click Close at the bottomof the page to switchto the mainpage of DQS Client. Task10: ConfiguringCompositeDomainto UseReferenceDataService In thistask,you will configurethe AddressValidationcomposite domaintouse the MelissaData – AddressCheck service. Atruntime,duringcleansingactivity,DQSpassesthe valuesof domainsinthe AddressValidationdomaintothe service forcleansing.See MapDomain/Composite Domainto Reference Dataformore details. 1. In the mainpage of DQS Client,clickSuppliers(DomainManagement) underRecentKnowledge Bases to launchthe Domain Managementpage. 2. Selectthe AddressValidationcomposite domainisselectedinthe listofdomains. 3. In the rightpane,switchto the Reference Data tab. 4. Click Browse buttononthe toolbar. 5. On the Online Reference Data ProvidersCatalog dialogbox,select checkboxnextto MelissaData – AddressCheck.
  • 20. 20 6. In the rightpane,inthe Schemasection,map AddressLine domaintothe AddressLine (M) schema itemusingthe drop-downlist. 7. Click Add Schema Entry (+) buttonon the toolbarto create a new entryin the list. 8. Map the followingDQSdomainsusingthe drop-downlistsasshowninthe followingpicture. 9. Click OK to close the dialogbox. Task11: PublishingtheKnowledgeBase In thistask,you will publishthe knowledge base.A publishedknowledgebasecanbe usedforcleansing or matchingactivityindata qualityproject. 1. Click Finishbuttonat the bottomof the window. 2. Click Publishinthe SQL ServerData Quality Services dialogbox. 3. Click OK to close the message box.
  • 21. 21 Task12: DiscoveringKnowledge(KnowledgeDiscovery) In thistask,you will performthe KnowledgeDiscoveryactivityon SupplierIDand SupplierName domains.Inthisscenario,the knowledge discoveryprocessmainlyimportsvaluesforthese two domains. In thistutorial,youstartedbuildingknowledge base fromscratch.Youcan alsostart creatinga knowledge base byperformingaknowledge discoveryactivity.Whenyouclick Create a Knowledge Base inthe mainpage,DQS clienttakesyoutoa page with Domain Managementactivityselectedforthe activity.Youcan change the activity to Knowledge Discoveryandthenin the nextpage youcan create domainsaspart of the knowledge discoveryprocess.See PerformKnowledge Discovery formore details. 1. In the mainpage of DQS Client,inthe RecentKnowledge Base section,click right-arrownexttothe SuppliersKBand click Knowledge Discovery.Alternatively,youcanclick OpenKnowledge Base, selectSuppliersfromthe KB list,selectKnowledge Discoveryasactivity and click Next. 2. SelectExcel File forData Source. 3. Click Browse,navigate andselect Suppliers.xls,andclick Open. 4. SelectSuppliersforDiscoveryfor Worksheet.
  • 22. 22 5. In the Mappingssection,map SupplierIDcolumnfromthe Excel file tothe SupplierID domainand SupplierName columnto the SupplierName domainby usingdrop-downlists.The Excel file has sample datafor the SupplierID and SupplierName domains.Inthe discoveryprocess,youcan selectthe domainsforwhichyouwantto discoverthe values.Note thatyoucreate domainsonthis page and thenmap the source columnsto those domains.Itisnotuncommonto create domains duringknowledgediscoveryactivityinsteadof creatingdomainsduringdomainmanagement activity. 6. Click Nextto switchto the Discoverpage. 7. On the Discover page,click Start to start the discoveryprocess.Discoveryisperformedonthe columns SupplierIDandSupplierName in the Suppliers.xlsfile.The SupplierIDand SupplierName domainswill be populatedwiththe knowledge drawnfromthe discovery. 8. Afterthe analysishascompleted,review the Source Statisticsinthe Profilertab at the bottomof the page. Notice that10 newrecordswithtotal 20 values(SupplierIDandSupplierName values fromthe Excel worksheet) were discovered.Youwill alsoseehow manyof the valuesare new,
  • 23. 23 unique,newandunique,andvalid.Inthe listbox tothe right,you can see more detailsforeach domaininvolvedinthe discoveryprocess.If youhoverthe mouse overthe statusbar inthe Completenesscolumn,youcansee if there are any missingvaluesinthe columnsinthe source. 9. Click Nextto switchto the Manage Domain Valuespage. 10. In the Manage Domain Valuespage,click SupplierName domainfromthe listof domains. 11. In the rightpane,right-clickLazyCountry Storex (notice ‘x’atthe end),andselectLazy Country Store. DQSsuggeststhischange afterrunningthe spell checkeronthe domain.Bydefault,spelleris enabledonthe domainsyoucreate. 12. In the domainvalueslist,confirmthatthe value LazyCountry Storex is setas an error (red X mark) withLazy Country Store as the correctionand alsothe Lazy Country Store isalsoaddedas a valid value. 13. Click Finish. 14. On SQL Server Data Quality Services dialogbox,click Publish. 15. Click OK on the successmessage box.
  • 24. 24 Lesson 2: Cleansing Supplier Data using the Suppliers Knowledge Base In thislessonyouwill cleanse the supplierdatainan Excel file usingthe SuppliersKByouhave created inthe firstlesson.DatacleansinginDQSincludesa computer-assistedprocessthatanalyzeshow data conformsto the knowledgeinaknowledge base,andan interactive process that enablesyoutoreview and modifyresultsfromthe computer-assistedprocess.The datacleansingfeature identifiesincorrect data inyour data source,andthencorrects or suggestscorrectionsforthe incorrectdata. It also standardizesandenrichescustomerdatabyusingdomainvalues,leadingvaluesforsynonyms,domain rules,term-basedrelations,andreference data.Youcan interactivelyapproveorrejectchanges proposedbythe computer-assistedprocess.See DataCleansingformore details. The computer-assistedprocessusesthe followingthresholdvaluesthatyoucan configure usingthe Configurationoptiononthe DQSClientmainpage.  Min score for suggestions:The minimumscore or confidence level thatwill be usedbyDQSfor suggestingreplacementfora value.  Min score for auto corrections:The minimumscore orconfidence levelthatwill be usedbyDQS for automaticallycorrectingavalue. See Configure ThresholdValuesforCleansingandMatching for detailsonhow to configure these settings. In thislesson,youwill performthe followingtaskstocleanse the inputdatausingthe SuppliersKB. 1. Create a Data QualityProjectforCleansing,selectthe SuppliersKBasthe KB to use to analyze and cleanse the source datainan Excel file,andselectthe Cleansingactivity. 2. Map Excel columnstobe cleansedtoappropriate DQSdomains/compositedomainsinthe knowledge base. 3. Run the computer-assistedcleansingactivity.The computer-assistedprocessdisplaysdata qualityinformationinthe DataQuality Clientthatyoucan use to interactivelycleanse the data. 4. Viewandmanage the resultsof the cleansingactivity.Youcanreview the valuesthatare found by the computer-assistedprocesstobe correct,incorrectbut corrected,incorrectwitha suggested change,orinvalid.Youcaninteractivelyapprove orrejectchanges,correctingor overridingthe suggestionfromthe computer-assistedprocessusingthe CorrectTofield. 5. Exportthe resultsfromthe cleansingprocesstoanExcel file. 6. Importthe values fromthe cleansingprojectintodomainstoaugmentthe knowledgeinthe knowledge base withnewrules,values,correctionsetc… Task1:CreatingaData QualityProject In thistask,you will create aData QualityProjectforcleansingthe supplierdatain an Excel file against the Suppliersknowledge base youcreatedearlierinthistutorial. 1. In the Data Quality Projectpane onthe mainpage,click NewData Quality Project.
  • 25. 25 2. Type Cleanse SupplierListfor the name of project. 3. Important: SelectSuppliersforthe Use Knowledge Base field.Youwill be cleansinginputsupplier data againstthe Suppliersknowledgebase youcreatedearlierinthistutorial. 4. Ensure that Cleansingisselectedasthe activityat the bottomof the rightpane and click Next. Task2: MappingExcel Columnsto DQS Domains In thistask,you will mapcolumnsinanExcel file toDQS domainsinthe Suppliersknowledge base.
  • 26. 26 1. In the Map page,selectExcel File forData Source. 2. Click Browse,selectSuppliers.xlsx,andclick Open. 3. SelectIncomingSuppliers$forthe Worksheet. 4. Map columnsasshowninthe followingtable andscreenshot.Whencreatingmappingsforthe State domain,click Adda column mappingtoolbarbuttonlocatedjustabove the list.Note thatyouare not usingSupplierID column/domainforcleansing.Youwill use the SupplierIDdomainlaterinthe matchingactivity. Excel column DQS Domain SupplierName SupplierName ContactEmailAddress Contact Email AddressLine AddressLine City City State State Country(Click +(Adda columnmapping) toolbar to add a newrow) Country ZipCode Zip 5. As youhave mappedall the individual domainswithinthe AddressValidationcomposite domain, the composite domainautomaticallyparticipatesinthe cleansingprocess.Click View/Select Composite Domains buttonto see thatthe Address Validationcomposite domainisautomatically selected,andthenclick OK.
  • 27. 27 6. Click Nextto switchto the Cleanse page. Task3: CleansingDataagainsttheSupplierKnowledgeBase In thistask,you will runthe computer-assistedcleansingprocess.DQSusesadvancedalgorithmsand confidence levelsbasedonthe thresholdvaluesspecifiedtoanalyze the dataagainstthe selected knowledge base,andthencleanse it.See CleansingDataUsingDQS (Internal) Knowledgeformore details. 1. Click Start to start the computer-assistedcleansingprocess.
  • 28. 28 2. Whenthe cleansingprocessiscompleted,review statisticsinthe Profilertab.The Source Statistics provide the numberof recordsprocessed,numberof recordsthatare foundto be correct, number of recordsthat are correctedby DQS, numberof recordsthat have changessuggestedbyDQS,and the numberof records thatare invalid.Inthe listbox tothe right,you can see the correctedvalues, suggestedvalues,andthe completeness(the extenttowhich the dataispresent) andaccuracy (the extenttowhichthe data can be usedfor intendedpurposes)of valuesforeachdomaininvolvedin the cleansingprocess. 3. Click Nextto switchto the Manage and ViewResults page. Task4: ManagingandViewingResults In thistask,you will reviewthe resultsof computer-assistedcleansingandalsoperforminteractive cleansingonthe supplierdata.See Interactive CleansingStage formore details. 1. SelectContact Email domainfromthe listof domains. 2. Switchto the Invalidtab in the rightpane.Notice thattwo emailsaddressthatwere missing character ‘s’at the end.These twoemailswere foundtobe invalidbythe domainrule thatrequires all email addressestoendwith @adventure-works.com(with‘s’).DQSusesthe domainrule while cleansingtodeterminewhetheranemail isavalidone or not.Thistab displaysthe domainvalues that were eithermarkedasinvalidinthe KBor failedadomainrule.Inthiscase,these valuesfailed the domainrule (Email Validation). 3. In the Correct To column,type the rightemail addressthatendwith @adventure-works.com(with ‘s’).
  • 29. 29 4. Click Approve for boththe records to approve boththe changes.Whenyouapprove,the records move to the Correctedtab. Insteadof approvingeachitemseparately,youcanapprove all the changesat once usingthe Approve all termstoolbarbutton. 5. Switchto the Newtab in the rightpane.These are the valuesforwhichDQS doesnothave enough informationinthe KByetto determine whetherthe valuesare correct.Therefore,itcannotmake changesor suggestchangesto the domainvalues. 6. Reviewthe valuestoconfirmthatall the emailsendwith @adventure-works.comandclick Approve all terms onthe toolbar.The approvedvaluesfromthistabmove tothe Correct tab. 7. Selectthe Countrydomainfromthe listof domains. 8. Switchto the Correctedtab in the rightpane and notice that UnitedState value isautomatically correctedto the UnitedStates with‘s’at the end.Thisis not a rule youdefinedforthe Country domain,butDQS is 83% confidentthatthe correct value is UnitedStates.The Approve buttonis automaticallyselectedforall the Correcteditems.Youcan override thisandrejectachange if needed. 9. Notice thatUSA iscorrectedto UnitedStates because theyare synonymsand UnitedStatesis the leading(preferred) value. 10. Notice thatthe Approve buttonisalreadyselectedforthese correctedvalues.Thisisthe default behaviorforthe correctedvalues.Youcan rejecta change and whenyoudoso, the value movesto the Invalid tab. 11. SelectSupplierName fromthe listof domains. 12. Switchto the Correctedtab in the rightpane.
  • 30. 30 a. Notice thatA. Datum Corp. iscorrectedto A. Datum Corporation and the Reason isset to Term based relation.A. Datum Corporation is a knowndomainvalue toDQS because itwas discoveredduringthe knowledge discoveryprocess.Therefore,DQSis 100% confident aboutthiscorrection. b. Notice thatthat Lazy Country Storex is correctedto Lazy Country Store, Confidence Level is setto 100%, and the Reason isset to Domain Value.Duringthe knowledge discovery process,youset Lazy Country Storex as an error with Lazy Country Store as the correction, so DQS is 100% confidentaboutmakingthiscorrection. c. DQS isnot familiarwiththe othervaluesinthe list,butitfoundthe correctionsforthese valuesusingthe Spell Checkerandproposesthe appropriate corrections.DQSis not 100% confidentaboutthese corrections,butthe confidence levelisabove 80%, whichisthe thresholdlevel formakingcorrections,soDQSproposesthe corrections. 13. Notice thatthe Approve isautomaticallyenabledforall the values.Youcanoverride the corrected value or rejectthe change as appropriate.Bydefaultthe Approve buttonisselectedforall the valuesonthe Correctedtab. 14. Switchto the Newtab. 15. Notice thatCorp. is correctedto Corporation,Co. iscorrectedto Company, and Inc. iscorrectedto Incorporated. For example, Consolidate Inc.iscorrectedto Consolidate Incorporatedand ConsolidatedCo.is correctedto ConsolidatedCompany,and Frabrikam Corp. is correctedto Fabrikam Corporation. You can see that term-basedrelationismentionedasthe reason.These changesare proposedbyusingthe term-basedrelations youdefinedduringthe domain managementactivity.Youcanchange the Correct To valuesmanuallyhere. 16. Scroll the listto see Hunxgry Coyote Store witha redsquigglyline.Right-clickonitandclick Hungy Coyote Store (withno‘x’).The Correct To columnshouldbe automaticallypopulatedwith Hungry Coyote Store. You can alsomanuallytype avalue inthe Correct To column. 17. Click Approve all terms fromthe toolbar.The domainvalueswiththe CorrectTo value specified move to the Correctedtab and the new valueswithnoassociated CorrectTo valuesmove tothe Correct tab. 18. Selectthe AddressValidationcomposite domainfromthe domainlist. 19. In the rightpane,switchto the Correct tab. You shouldsee the addressesthatare foundto be correct by the MelissaData – Address CheckDQS service onthe Azure Marketplace. 20. Switchto the Correctedtab.
  • 31. 31 21. Notice thatState for the record that has City as Los Angelesissetto CA now.Notice inthe Reason fieldisthatCorrectedby Rule ‘City-State Rule’. 22. Notice thatthe Approve radiobuttonis alreadyselectedforthisiteminthe list.Thisisthe default behaviorforitemsonthe Correctedtab. 23. Switchto the Suggestedtab.Reviewthe changessuggestedbythe MelissaData – AddressCheck service. 24. ClickApprove all terms on the toolbarbuttonand click OK onthe Confirmationmessage box. 25. Click Nextto switchto the Export page. Task5: ExportingCleansingResultsto anExcel File In thistask,you will exportresultsfromthe cleansingactivitytoanExcel file.See ExportStage topicfor more details. 1. In the rightpane,select Excel forthe DestinationType. 2. Click Browse,specifythe outputfile name as CleansedSupplierList.xls,andthenclick Open. 3. SelectData Only for the Output formatto exportjustthe cleanseddata.The secondoption, Data and CleansingInfo,letsyouexportcleansingactivitydetailsalongwiththe cleanseddata.The Standardize Format optionletsyouapplyanyoutputformatsyoudefine ona domaintothe values of thatdomain.You have notdefinedanoutputformatonany domaininthe tutorial. 4. Click Export to exportthe data.Do not click Finishyet. 5. Click Close onthe Exportingdialogbox. 6. Click Finishto finishthe activity.If youhadforgottentoexportresultsbefore clicking Finish,click OpenData QualityProject inthe main page of DQS Client,selectCleanse SupplierListfromthe list
  • 32. 32 of projects,andclick Nextat the bottomof the screento getto the Export stage of cleansing processagain.You can alsoswitchto the Manage and ViewResults tab byclickingBack button. 7. Openthe CleansedSupplierList.xls anddo the following: a. Ensure that there are no email address thatendwithadventure-work.com(without character ‘s’) bysearchingforadventure-work.cominthe worksheet. b. See that there isno USA value inthe Country column. c. Searchfor Los Angelesandsee that the State is setto CA. d. Confirmthatthere are notermsCo., Corp., andInc. e. Important: delete the AddressValidationcolumnfromthe spreadsheetandsave the excel file.Thisadditional columncorrespondstothe AddressValidationcomposite domain. Task6: ImportingValuesfromtheCleanseSupplierList Project In thistask,you will importthe dataqualityknowledge gatheredduringthe cleansingprocess.See ImportingCleansingProjectValuesintoaDomain topicformore details.Youwill alsoexportthe knowledge base intoaDQSfile before publishingthe updated SuppliersKB. 1. In the mainpage of DQS Client,clickright-arrow nextto SuppliersunderRecentKnowledge Bases and click Domain Management. 2. Click Contact Email inthe listof domains,andswitchtothe Domain Valuestab inthe right pane. 3. Click down arrow nextto the Import Valuesiconon the toolbarandclick Import ProjectValues. 4. On the Import Project Values dialogbox,selectthe Cleanse SupplierListproject,andclick OK. 5. Notice thatall the emailsare importedalongwiththe twocorrectionsyoudidduringinteractive cleansing.Scroll tosee the twocorrections. Value Correct To bobby0@adventure-work.com bobby0@adventure-works.com tad0@adventure-work.com tad0@adventure-works.com 6. Repeatthe previousstepof importingprojectvaluesforthe Countrydomainandnotice thata new entryisaddedfor correcting UnitedState toUnitedStates (with‘s’) Value Correct To UnitedState UnitedStates 7. To see the olddomainvalues,clear ShowOnly Newcheckbox. 8. Repeatthe previousstepof importingprojectvaluesforthe SupplierName domain.Bydefault, afterimporting,youwill onlyseethe new values.Tosee all the values,clear ShowOnlyNewcheck box.We have enrichedthe SuppliersKBwithwhatwe learnedfromthe cleansingactivity.The strongerthe KB is,the betterthe cleansingresultsare.Note thatitisnot possible importvaluesfor a composite domain. 9. Click Export Knowledge Base icon onthe toolbarandthenclick Export Knowledge Base.
  • 33. 33 10. Navigate tothe Tutorial folder,type Suppliers.dqsforthe file name,andclick Save. You can use this DQS file tocreate a newknowledge base basedonit. 11. Click OK to close the Export Knowledge Base – Suppliersmessage box. 12. Click Finishto finishthe activity. 13. Click Publish. 14. Click OK on the message box. Lesson 3: Matching Data to Remove Duplicates from Supplier List You prepare the knowledgebase forperformingmatchingactivitybycreatingamatching policyinthe knowledge base.There canbe onlyone matchingpolicyina knowledge base.A matchingpolicyconsists of one or more matchingrules.A rule identifiesthe domainsthatwill be involvedinthe matching process,andspecifiesthe weightthateachdomainvalue carriesinthe matchingjudgment.Youspecify inthe rule whetherdomainvalueshave tobe anexactmatch or can justbe similar,andto whatdegree of similarity.Youalsospecifywhetheradomainmatch isa prerequisiteforthe matchingprocess.You can testeach rule separatelyandtestthe entire policyagainstsample data.The testingprocessdisplays recordswhose matchingscoresare greaterthan the Min record score thresholdspecifiedinthe DQS configurationinacluster(group).Youcan continue totweakthe rulesinthe policyuntil youare satisfied. Afterdefiningthe policy,youcreate aData QualityProjecttorun the matchingactivity.The matching projectappliesthe matchingrulesinthe matchingpolicytothe datasource to be assessed.Thisprocess assessesthe likelihoodthatanytworowsare matches.WhenDQSperformsthe matchinganalysis,it createsclustersof recordsthat DQS considersmatches.DQSrandomlyidentifiesone of the recordsas a pivotrecord.You can verifyandrejectanyrecord thatis not an appropriate matchforthe cluster.See Create a Matching Policy topicformore details. In thislesson,youwill performamatchingactivitytoremove duplicatesfromthe supplierlist.First,you will create amatchingpolicywithone rule toidentifyduplicatesinthe supplierlistandpublishthe policytothe knowledge base.Next,youwillcreate andruna data qualityprojectformatching.Finally, youwill exportthe resultsfromthe matchingactivitytoanExcel file thatyouwill use laterinuploading data to Master Data Services(MDS). Task1: DefiningaMatchingPolicy In thistask,you will create amatchingpolicywithone rule init.The rule will have one prerequisite: SupplierID, whichmeansthatthe SupplierIDsmustmatch before usingthe otherdomainsinthe rule. The rule usestwo otherdomains: SupplierName withSimilarityvalue setto70% andContact Email withSimilarityvalue setto30%.
  • 34. 34 1. In the mainpage of DQS Client,clickright-arrow nextto SuppliersKB,andselectMatchingPolicy. 2. SelectExcel File forData Source inthe Map page. 3. Click Browse,ensure filterissetto Excel Workbook,and selectCleansedSupplierList.xlsfilethat youexportedafterperformingthe cleansingactivity.Note thatatthe endof thisactivity,youwill not be able to exportresultsbecause thisactivityisprimarilyfocusedondefiningamatchingpolicy. You will create aData QualityProjectforthe Matchingactivityandrun it to remove duplicatesfrom the supplierlistusingthismatchingpolicyinthe nextlesson. 4. Map SupplierIDcolumntoSupplierID domain, SupplierName columnto SupplierName domain, ContactEmailAddress columnto Contact Email domain.Youonlyneedtomap source columnsto domainsthatyou wantto use in definingthe matchingpolicy.Inthiscase,youare makingthe SupplierID,SupplierName,and ContactEmail domainsavailable forthe matchingpolicyactivity.
  • 35. 35 5. Click Nextto move to the Matching Policypage where youwill be definingamatchingpolicywith one rule init. 6. Click Create a matching rule buttonon the toolbarto create a rule inthe policy. 7. In the Rule Details pane on the right,enterRemove Duplicate Suppliers forthe Rule name. 8. Click Add a new domain elementinthe toolbarinthe rightpane. 9. SelectSupplierIDfor the domain and selectthe Prerequisite checkbox.Notice that Similarityis automaticallysetto Exact. By settingSupplierIDas the Prerequisite,youspecifythatthe valuesfor thisfieldinthe tworecordsmustreturn a 100% match, else the recordsare notconsidereda match and the otherclausesinthe rule are disregarded.
  • 36. 36 10. Click Add a new domain elementfromthe toolbaragain. 11. SelectSupplierName domain,selectSimilarforSimilarity,andType 70 forthe Weight. Here,you are specifyingthatsuppliernamesdonot needtobe identical butcanbe similarforthe recordsto be consideredasa match. The weightindicatesthe contributionof thisfield’sscore tothe overall matchingscore. 12. Repeatsteps10-11 to add Contact Email domainwith 30 for the Weight. 13. Notice thatthe min matching score isset to 80%, whichisthe value yousee inthe General tabof the Configurationpage of DQS Administration.You can onlyincrease thisscore above this thresholdvalue here. 14. Notice thatOverlappingClusters optionisselected. Withthisoption,arecordcan show upin multiple clusters.If youchange the settingtoNonOverlappingClusters,the clustersthathave commonrecordsare combinedintoone single cluster. 15. The Start buttonon thispage allowsyouto testeach rule inthe policyseparately,whereas,the Start buttoninthe nextpage allowsyoutotestentire policy(all the rulesinthe policy). 16. Click Nextto switchto the Matching Resultspage. Task2: TestingandPublishingtheMatchingPolicy In thistask,you will testandpublishthe Remove Duplicate Suppliers matchingpolicy. 1. In the Matching Resultspage,click Start to testthe entire policy.Inourcase,we have onlyrule in the policy,sothe resultsfromtestingthe rule andthe policyshouldbe the same. 2. Reviewall the matchedrecordsandtheirmatchingscore inthe listbox.A recordthat has a Green iconassociatedwithitisa duplicate of the pivotrecordthatprecedesit.Here are couple of examples: a. The record with Record ID: 1000005 isa match of the record with RecordId: 1000004 with Score: 100% because boththe recordshave the same valuesfor SupplierID(prerequisite), SupplierName,and ContactEmailAddresscolumns.DQS randomlypicksarecord as the pivotrecordfor a cluster. b. The record 1000023 is a match of the record 1000022 withthe matchingscore: 93% because the tworecordshave the same valuesfor SupplierID(prerequisite) andSupplier Name columns,butdifferentvaluesforthe ContactEmailAddresscolumn. c. Scroll to the bottomof the listto see tworecordswithrecordsIDs: 1000051 and 1000052. Record1000052 isconsideredamatch withmatchingscore 91% because the tworecords have the same valuesforthe SupplierIDand ContactEmailAddress columns,butdifferent valuesforthe SupplierName column.
  • 37. 37 3. Right-clickonanymatchedrecord(withgreenicon) andclick ViewDetailsto see more detailsabout the matchingsuch as contributionof eachfieldscore tothe overall matchingscore. 4. Click Close to close the MatchingScore Details dialogbox. 5. Click Matching Resultstab at the bottomof the page.Thiswindow givesyoudetail suchasnumber of matchedrecords,numberof unmatchedrecords,numberof clusterswithmatchedrecords,the average clustersize,minimumclustersize,andmaximumclustersize.See Create aMatchingPolicy for more details.NOTE:youwill notbe able exportresultsfromthisactivity.You are justdefininga matchingpolicyusingthe sample datatotest rulesandthe policyagainstthe sample data.
  • 38. 38 6. Click Finishto finishcreatingthe matchingpolicy. NOTE: You have definedthe matchingpolicyhere;therefore youcannotexportresultsto anoutput file.Youbasicallyusedasample inputfile,createdrules,andtestedthe rulesandpolicyagainstthe sample datawiththe goal of definingthe policy. 7. Click Publishonthe SQL ServerData QualityServicesdialogbox andclick OKon the message box. Now,the matchingpolicyyoudefinedispublishedintothe SuppliersKnowledge Base.Youcanuse the knowledge base torunthe matchingprocessagainstan inputfile toidentifyandremove duplicates. Task3: CreatingandRunninga DataQuality ProjectforMatching In thistask,you will create aData QualityProjectforthe matchingactivityandrun the matchingprocess on cleansedsupplierdatato remove anyduplicatesinthe data. 1. On the mainpage of DQS Client,click NewData Quality Project. 2. Type Remove SupplierDuplicates from the Name of the project. 3. Important: SelectSuppliersfromthe listof KBs forthe Use Knowledge Base field.Youhave created a matchingpolicyinthisknowledge base inthe previouslesson. 4. Important: SelectMatching fromthe list ofactivitiesfrom the bottom-rightpane.
  • 39. 39 5. Click Next. 6. In the Map page,selectExcel File forthe Data Source. 7. Click Browse andselectCleansedSupplierList.xls,whichisthe outputfile fromthe cleansing activity. 8. Map SupplierIDsource columntothe SupplierID domain, SupplierName columntoSupplierName domain,and ContactEmailAddress columnto Contact Email domain. 9. Click Nextto switchto the Matching page. 10. Click Start to start the matchingprocess.Youwill see resultssimilartothose fromthe previoustask because youusedthe same inputfile fordefiningthe matchingpolicy. 11. Reviewall the matchedrecordsandtheirmatchingscore inthe listbox.The resultsshould be same as the onesyou sawin the previoustask.See the stepsinthe previoustasktoanalyze the results fromthismatchingactivity. 12. Click Nextto switchto the Export page. Task4: ExportingtheResultsfromMatchingActivityto an Excel File In thistask,you will exportthe resultsfromthe matchingactivitytoanExcel file. 1. In the Export page,selectExcel File forthe DestinationType. 2. SelectSurvivorshipResultsoption.Inthe survivorshipprocess,DQSdeterminesasurvivorrecordfor each clusterbasedonthe SurvivorshipRule youselected. 3. Click Browse andnavigate to the folderwhere youwanttostore the outputfile. 4. Type Cleansedand MatchedSuppliers.xlsforthe name and click Open.
  • 40. 40 5. ConfirmthatPivot Record isselectedforthe SurvivorshipRule.Whenyouselectthisoption,the pivotrecordfor eachclusterispickedforthe outputfroma cluster.The otheroptionsforthe SurvivorshipRule are: a. Most complete record: Survivorrecordisthe one withthe largestnumberof populated fields. b. Longest record: Survivorrecordisthe one withthe largestnumberof termsinsource fields. c. Most complete and longestrecord: Survivorrecordisthe one withthe largestnumberof populatedfields,andhasthe largestnumberof termsin eachfield. 6. Click Export to exportthe resultstoexcel file. 7. Click Close to close the MatchingExport dialogbox. 8. Click Finishto finishthe matchingactivity. 9. Openthe Cleansedand MatchedSuppliers.xlsx file andconfirmthatyoudonot see anyduplicates (SupplierID). Now,we have supplierdatathathas beencleansedandmatchedtoremove duplicates. Lesson 4: Storing Supplier Data in MDS Master Data Services(MDS) isthe SQL Serversolutionformasterdatamanagement.Masterdata management(MDM) describesthe effortsmade byanorganizationtodiscoveranddefinenon- transactional listsof data. Modelsare the highestlevel of organizationinMasterData Servicesandorganize the structure of your masterdata. Your MDS implementationcanhave one ormany modelswhere eachmodelgroupssimilar kindsof data. Ingeneral,masterdatacan be categorizedinone of fourways:people,place,things,or
  • 41. 41 concepts.Forexample,youcancreate a Product model tocontainproduct-relateddataor Customer model tocontaincustomer-relateddata.See Models(MasterDataServices) formore details. A model cancontainone or more entities.Eachentityhasattributes(columns) andmembers(rows). Each row containsthe masterdata. In thislesson,youwill create aSuppliersmodelwithtwoentities namedSupplierandState.The Supplierentitywill have the followingattributes:Code,Name,Contact FirstName,ContactLast Name,ContactEmail Address,AddressLine,City,State,Zip,andCountry.See Attributes(MasterDataServices) formore detailsaboutattributesingeneral.The Code andName attributescorrespondtothe SupplierIDandSupplierName columnsinthe CleansedandMatched SuppliersExcel file. A domainbasedattribute isanattribute withvaluesthatare populatedbymembersof anotherentity. Domain-basedattributespreventusersfromenteringattributevaluesthatare not valid.Anattribute valuescanbe selectedonlyfromthe drop-downlistthatispopulatedbyanotherentity.Inthistutorial, the State attribute of the Supplierentityisadomainbasedattribute withvaluesfromthe State entity. You can onlychange the value of the State attribute of the Supplierentitytoone of the valuesinthe State entity.See Domain-BasedAttributes formore details. A derivedhierarchyinMDSisderivedfromthe domain-basedattributerelationshipinthe model.Inthis tutorial,youwill create aderivedhierarchybetweenthe Supplierentityandthe State entity.Afteryou create the derivedhierarchy,youwillsee alistof statesinthe Browserof Master Data Manager.When youclick ona state inthe list,youwill see the suppliersinthatstate inthe right pane.Youwill be creatinga derivedhierarchylaterbasedonthisrelationship.See DerivedHierarchies formore details. You builta knowledgebase inDQSandusedit to cleanse andmatchsupplierdataandstoredthe results inthe CleansedandMatchedSupplierData.xlsfile.Inthislesson,youwilluploadthe cleansedand matcheddata intoMDS. Note that DQS onlycontainsknowledge aboutthe data(metadata) whereas MDS storesthe data itself (masterset).For example:DQSmayhave knowledgeaboutseveral suppliers but MDS onlymaintainsthe suppliersthata companyuses. In thislesson,youwill performthe followingtasks: 1. Create the Suppliersmodel in MDSby usingthe Master Data Manager WebApplication. 2. OpenCleansedandMatched SupplierData.xls inExcel and use the MDS Add-infor Excel to create an entitynamed Supplieranduploadthe datato MDS. 3. Verifythatthe data iscreatedinMDS by usingthe Master Data Manager. 4. Create a newentitynamed State and update the State attribute of Supplierentitytobe a domain- basedattribute dependingonthe State entity.Youwill dothisall usingthe MDS Add-infor Excel. 5. Verifythatthe domainbasedattribute iscreatedbyusing MasterData Manager and update the valuesforthe Name attribute of the State entity. 6. Viewthe updatesyoumade using MasterData Manager inExcel. 7. Load valuesfromthe State entityinto Excel andadd a new value,andverifythe additionbyusing Master Data Manager.
  • 42. 42 8. Create and use a derivedhierarchyusingthe domain-basedattribute relationshipbetweenthe Supplierentityandthe State entity(the State attribute of the Supplierentityisof type State entity) by usingMasterData Manager. Task1: CreatingSuppliersModel using MasterDataManager In thistask,you will create amodel named SuppliersinMDSusingMaster Data Manager. 1. Navigate tohttp://localhost/mds tolaunch MasterData Manager.Replace the URL if youhave configuredWebApplicationwithadifferentname oron a differentaWebsite. 2. Click SystemAdministrationin the Administrative Tasks section. 3. If you do notsee the Add Model page,hovermouse overManage on the menubar, click Models and thenclick Add Model (+) toolbarbuttonto create a new model.
  • 43. 43 4. Enter SuppliersforModel name. 5. ClearCreate entity withsame name as model option.We will be creatinganentitylaterusingthe MDS Add-infor Excel. 6. Click Save Model buttononthe toolbar. Task2: UploadingSupplierDatato MDS usingMDS Add-inforExcel In thistask,you will publishthe cleansedandsupplierdatato MDS usingthe MDS Add-infor Excel.You will create anentitynamed Supplierinthe Suppliersmodel youcreatedinthe previouslesson.The entitywill have anattribute foreachcolumninthe Excel file.The Code andName attributesof the Supplierentitycorrespondtothe SupplierIDandSupplierName columnsinExcel. 1. OpenCleansedandMatched Suppliers.xlsinEXCEL.
  • 44. 44 2. Press CTRL+A to selectentire data.Itis important that youselectthe entire datainthe spreadsheet. 3. Click Master Data on the menubar. 4. Click Create Entity buttonon the ribbon. 5. In the Manage Connectionsdialogbox,if youdonot see the connectionto local MDS serverunder Existingconnections,do the following: a. SelectCreate a new connection,andclick New button. b. In the Add NewConnectiondialogbox,type Local MDS Serverfor Descriptionand http://localhost/MDS for MDS server address,and click OK to close the dialogbox. 6. In the Manage Connectionsdialogbox,select Local MDS Server (http://localhost/MDS),click Testto testthe connection. Click OKon the message box. 7. Click Connectto connectto the MDS server. 8. In the Create Entity dialogbox,select Suppliersforthe Model. 9. Ensure that VERSION_1 isselectedforVersion. 10. Enter SupplierforNewentityname. 11. SelectSupplierIDforthe column that contains a unique identifierfield(youcanalsogenerate a code automatically).Youare essentiallymappingthe SupplierIDcolumnin Excel tothe Code attribute of Supplierentity.
  • 45. 45 12. SelectSupplierName forthe columnthat contains names field.Youare essentiallymappingthe SupplierName columnin Excel to the Name attribute of the Supplierentity.The Code andName attributesare mandatoryattributesforanentityinMDS. 13. Click OK to create the entityonMDS, publishthe masterdatato the entity,andclose Create Entity dialogbox. 14. Now,youshouldsee anewsheettitled Supplier,whichisthe name of the entity,addedtoyour Excel spreadsheetandat the top of the worksheetyoushouldsee thatthe worksheetisconnected to the MDS server.Notice thatthe original worksheet(titledSheet1) still exists.
  • 46. 46 15. Keep Excel open. Task3: VerifyingtheDatain MasterData Manager In thistask,you will verifythatthe Supplierentityiscreatedon MDS usingMasterData Manager Web Application. 1. If Master Data Manager isalreadyopen,click SQL Server2012 Master Data Services at the top to navigate tothe home page.Otherwise,navigate to http://localhost/mds tolaunch MasterData Manager. 2. SelectSuppliersforModel,andclick Explorer. 3. Reviewthe datastoredonMDS. If youdo not see the data,confirmthat youselected Suppliersfor the Model on the home page before launching Explorer.Youcanadd to or delete fromthe supplier listbyusingAdd Memberand Delete Memberbuttons onthe toolbar.
  • 47. 47 Task4 (Optional):Combining,Matching,andPublishingaNewSet of Data Overtime,youwill wanttoadd more data to the MDS repository.Before addingdata,itcan be useful to compare the newdata to the data that’s alreadymanagedinMDS, to ensure youare not adding duplicate orinaccurate data. In the Master Data ServicesAdd-inforExcel, youcancombine datafrom twoworksheets andthencompare the data to identifyandremove duplicatesbefore publishingthe data to MDS. The matchingfeature of the MDS Excel Add-inusesthe DQSmatchingfunctionalityto identifymatchesinthe data. Inthistask,you will combine datafromtwoworksheetsintoone andthen performmatchingtoidentifyandremove duplicatesbeforepublishingtoMDS.See Data Quality Matching inthe MDS Add-inforExcel andCombine Datatopicsfor more details. 1. Launch newinstance of Excel.Click Start, pointto Run, type Excel,andclick OK. 2. Switchto the Master Data tab by clickingMasterData on the menubar. 3. Click Connecton the ribboninthe Connect and Load groupto connectto the MDS server.You have configuredthisconnectionearlierinthislesson. 4. You shouldsee the MasterData Explorer pane to the right.If you do notsee the Master Data Explorer,click ShowExplorerbuttonon the ribbon. 5. In the Master Data ExplorerWindow,selectSuppliersinthe drop-downlistforthe Model.You shouldsee thatthe model hasone entity: Supplier. 6. Double-clickSupplierinthe entitylisttoloadthe entitymembersintothe Excel worksheet. 7. Click Sheet2at the bottomto switchtothe Sheet2tab. If you donot see Sheet2,justcreate a new worksheet. 8. OpenSuppliers.xlsfile(the original inputfilethatisincludedinthe tutorial files) andcopyall (three) rowsfrom the CombineAndCleanseworksheetto Sheet2. 9. Switchback to the Suppliersheetinthe Book 1 – Microsoft Excel (notthe Cleansedand Matched SupplierList Excel) thatisconnectedto MDS. 10. Click Combine Data onthe ribbon.Youwill see the Combine Datadialogbox. 11. In the Combine Data dialogbox,clickthe buttonnextto Range to combine with MDS data textbox as showninthe followingimage.
  • 48. 48 12. You shouldsee the shrunkendialogbox now.Now,click Sheet2toswitchto the Sheet2tab that has the new supplierdatawith4 rows(includingone headerrow). 13. In the Sheet2,selectall rows includingthe headerrow (evenif theyseemtobe alreadyselected). You shouldsee the Range to combine with MDS data is automaticallyupdated. 14. Switchback to the Suppliertab withoutclosingthe Combine Datadialogbox. 15. Clickthe button nextto the textbox. You shouldsee thatthe dialogbox isexpandednow.You shouldsee thatsome columnsof the SupplierMDS entity are mapped to Excel columns.
  • 49. 49 16. Important: Ensure thatCode entitycolumnismappedtothe SupplierIDcolumninthe worksheet and Zip Code entitycolumnismappedtothe Zip Code columninthe worksheet. 17. On the Combine Data dialogbox,click Combine. 18. Confirmthatthree data rowsare addedto the bottomof the worksheetandtheyshouldbe color coded. 19. Click Match Data on the ribbonto identifyduplicates.Thisfeatureusesthe matchingfunctionalityof DQS. 20. In the Match Data dialogbox,select SuppliersforDQS Knowledge Base.
  • 50. 50 21. Map worksheetcolumnstodomainsasshowninthe followingtable. WorksheetColumn Domain Code (youuploadedSupplierIDas the Code for the SupplierentityinMDS). SupplierID Name (youuploadedSupplierName asthe Name for the SupplierentitytoMDS) SupplierName ContactEmailAddress ContactEmail 22. SelectPrerequisite forthe Code columnmapping. 23. Enter 70% as the weightfor SupplierName and 30% as the weightfor Contact Email as shownin the image. 24. Click OK. 25. The matchingprocessshouldidentifyone duplicate forthe supplierwith Code:S1. 26. Selectthe duplicate row (orange),right-click,andclick Delete todeletethe row. 27. Delete the CLUSTER_ID columnsince youdon’tneeditanymore. 28. Click Publishto publish the updatedrecordset(with twonew recordswith CodesS66and S57) to MDS. 29. In the Publishand Annotate dialogbox,addan annotation,and click Publish. 30. Switchto the Master Data Manager Webapplication. 31. On the home page,ensure that Suppliersisselectedforthe Model,andclick Explorer.If youalready have the Explorer open,refreshthe internetbrowser. 4. Sort the listby Code and lookfor recordswith S57 andS66 as codes.Youcan alsouse the Filter buttonon the toolbarto searchfor a specificrecordinthe list. 5. Now,close Book1 – MicrosoftExcel window withoutsavingthe file. Task5: Creatinga Domain-BasedAttributefromExcel In thistask,you will convertthe State attribute of the Supplierentityasa domain-basedattribute.After youconfigure the State attribute tobe a domain-basedone andpublishittoMDS, a new entitynamed State will be createdonMDS serverwithall the valuesinthe columnandthe State attribute of the Supplierentitywill be populatedwithvaluesfromthe State entity.Now,the Suppliersmodel should
  • 51. 51 have twoentities: SupplierandState where the State attribute of the Supplierentityisa domain-based attribute thatdependson State entity. 1. Switchto Excel windowthathas CleansedandMatched Suppliers.xlsxopen. 2. Click Refreshbuttononthe ribbontoget the latestupdatesonMDS. You shouldsee the two additional recordsif youhave performedthe optional Task4. 3. Clickcolumnname State (Cell I1) inthe headerrow. 4. Click Attribute Properties on the ribbon. 5. In the Attribute Properties dialogbox,select Constrainedlist(Domain-based) forthe Attribute type. 6. Type State for the Newentity name and click OK. 7. Now,inExcel,youwill see downarrow whenyouclickon any value inthe State column.Youcan change the value usingthe drop-downlistif youneed.
  • 52. 52 Task6: Verifythat the Domain-BasedAttributeisCreatedusingMasterDataManager In thistask,you will verifythatthe State entityiscreatedin MDS andthe State attribute of the Supplier entityisa domain-basedattribute thatdependsonthe State entitybyusingMaster Data Manager. 1. Switchto the Master Data Manger webapplication. 2. Click SQL Server 2012 Master Data Services at the top to getto the home page. 3. Ensure that Suppliersmodel isselectedandclick Explorer.Youcouldjustrefreshthe page if you alreadyhad Explorer open. 4. Hoveryour mouse overEntitiesinthe menubar and notice thatnow there are twoentities: SupplierandState. 5. Click State if the entityisnotopenalready. 6. SelectGAfromthe list. 7. In the Detailspane to the right,change the Name to Georgiainthe right pane, andclick OK. 8. Repeatthe previousstepsforotherstates. Code Name CA California CO Colorado IL Illinois DC Districtof Columbia FL Florida AL Alabama KY Kentucky MA Massachusetts AZ Arizona
  • 53. 53 MI Michigan MN Minnesota NJ New Jersey NV Nevada NY New York OH Ohio OK Oklahoma OR Oregon PA Pennsylvania SC SouthCarolina KS Kansas TN Tennessee TX Texas UT Utah VA Virginia WA Washington WI Wisconsin HI Hawaii MD Maryland CT Connecticut 9. Selectanyof the above entriesandclick ViewTransactionsfrom the Toolbar.You shouldsee the transactionforthe update youjustmade isin the listof transactions. 10. Hoverthe mouse overEntitiesmenuand click Supplier. 11. Now,notice thata value forthe State fieldcanbe changedinthe Details pane usingthe drop-down list.Youcan alsosee that,inthe listtothe leftandinthe drop-downlistinthe Detailspane,code is displayedfirstandthenthe name incurlybraces.You can alsochange any othervalue inthe Details pane.
  • 54. 54 Task7: ViewingUpdatesMadeusingMasterDataManagerinExcel In thistask,you will verifythatyousee the updatesperformedusingMasterDataManager in Excel. 1. Now,switchtothe excel windowthathas Cleansedand MatchedSuppliers spreadsheetopen. 2. Click Refreshbuttonon the ribbon. 3. Notice thatnames showup (California,New Yorketc…) forthe State fieldalongwiththeircodes.
  • 55. 55 Task8: AddingaNew ValueforState Entity inExcel In thistask,you will adda newvalue forthe State entityinExcel andpublishthe change tothe MDS server. 1. Create a new work sheetinExcel by clickingona new tab at the bottom. 2. In Excel,clickthe Master Data tab on the menu,andthenclick Show Explorer on the ribbon. 3. In the Master Data Explorer,selectSuppliersforModel.Youshouldsee twoentities: Supplierand State inthe entitylist. 4. Double-clickState inthe list.All the membersof the State entityfromMDS shouldbe displayedin the worksheet.
  • 56. 56 5. Now,adda newrowat the endwiththe followingvalues:NorthCarolinafor Name and NC for Code.The color codingdifferentiatesanynew/updatedrecordsfromthe otherrecords. 6. Click Publishonthe ribbontopublishthe change toMDS. 7. On the Publishand Annotate dialogbox,notice thatthe Use same annotation for all changes is selected.Youcanentera single annotationforall the changeshere. 8. SelectReviewchangesand provide annotations individually optiontoprovide annotationforeach change (inthiscase,onlyone).
  • 57. 57 9. Click Publishto publishdatatoMDS. 10. Notice thatcolor coding for the rowwith North Carolina as the State is same as otherrecordsnow. 11. Optional:verifythatthe newmember(NC) isaddedtothe State entitybyusingthe Explorerin Master Data Manager. 12. In Excel,right-clickthe State worksheetatthe bottom, andclick Delete to delete the worksheet. Deletingthe worksheetdoesnotdeleteanydatafrom the MDS server. Task9: Creatinga DerivedHierarchyusingMasterDataManager In thistask,you will create aderivedhierarchybyusingMasterData Manager.This derivedhierarchyis derivedfromthe domain-basedattribute relationshipsbetweenthe SupplierandState entities. 1. Switchto the mainpage of Master Data Manager by clickingSQLServer 2012 Master Data Services at the topof the page. 2. Click SystemAdministrationin the Administrative Tasks section. 3. Hoverthe mouse overManage on the menubar, andclick DerivedHierarchies.
  • 58. 58 4. Click Add DerivedHierarchy (+) buttononthe toolbar. 5. Type SuppliersInState forthe Derivedhierarchy name. 6. Click Save buttononthe toolbarto save. 7. Drag SupplierfromAvailable Levels:SuppliersInState toCurrent Levels:SuppliersInState. 8. Drag State fromAvailable Levels:SuppliersInState toCurrentLevels:SuppliersInState.The screen shouldhave CurrentLevelsas showninthe followingpicture.
  • 59. 59 9. In the Previewwindow,expand NY{NewYork} andyou shouldsee one supplierinthatstate as showninthe precedingimage. 10. Switchto the mainpage of Master Data Manager by clickingSQLServer 2012 Master Data Services at the topof the page. 11. Click Explorer. 12. Hoverthe mouse overHierarchiesand click Derived:SuppliersInState. 13. Clickon any state node inthe tree viewand youshouldsee the suppliersinthatstate in the right pane.
  • 60. 60 Lesson 5: Automating the Cleansing and Matching using SSIS In Lesson1, youbuiltthe SuppliersKBandusedthatKB to performcleansingactivityinLesson2andthe matchingactivityin Lesson3 usingthe tool DQS Client.Ina real worldscenario,youmayhave to pull data froma source that isnot supportedbyDQSor youwantto automate the cleansingandmatching processwithouthavingtouse the DQS Clienttool.SQLServerIntegration Services(SSIS) has componentsthatyoucan use to integrate datafrom variousheterogeneoussourcesanda DQS CleansingTransform componenttoinvoke the cleansingfunctionality exposedbyDQS.Currently,DQS doesnotexpose matchingfunctionalityforSSIStouse,but youcan use the FuzzyGroupingTransform to identifyduplicatesinthe data. You can uploaddata to MDS by usingthe Entity-basedStaging feature.Whenyoucreate an entityin MDS, correspondingstagingtablesandstoredproceduresare automaticallycreated.Forexample,when we createdthe Supplierentity,the stg.supplier_Leaf table andthe stg.udp_Supplier_Leaf stored procedure were automaticallycreated.Youuse the stagingtablesandprocedurestocreate,update,and delete entitymembers.Inthislesson,youwill be creatingnew entitymembersforthe SupplierEntity. To loaddata intothe MDS server,the SSISpackage firstloadsthe data intothe stagingtable stg.supplier_Leaf andthentriggersthe associatedstoredprocedure stg.udp_Supplier_Leaf.See ImportingData formore details.
  • 61. 61 In thislesson,youwill performthe followingtasks: 1. Remove supplierdatainMDS (if youhave gone throughLessons1-4). The SSISpackage you create in thislessonuploadsthe datatoMDS automatically.Earlier,youuploadedthe cleansed and matchedsupplierdatatoMDS servermanuallyusingthe DQSClient. 2. Create a subscriptionview onthe Supplierentitytoexpose datainthe entitytoother applications.ThiscreatesaSQL view thatyouwill verifyusingSQLServerManagementStudio. You will notbe consumingthisview inthisversionof the tutorial. 3. Create and runan SSIS projectusing SQL ServerData Tools. The projectwill use Data Cleansing transformto submita cleansingrequesttothe DQS server.The matchingfunctionalityisnot exposedbyDQSyet,soyou will use FuzzyGroupingtransformtoidentifyduplicates. 4. Verifythatthe data iscreatedinMDS by usingMasterData Manger. 5. Reviewthe resultsfromDQScleansingprojectcreatedbythe SSISpackage and optionally performinteractive cleansingtofurtherbuildthe knowledge base. Task1 (Prerequisite):RemovingSupplierDatainMDS In thistask,you will remove the supplierdatastoredinMDS. You haduploadedthe datamanuallyusing MDS Excel Add-ininthe previouslesson.The SSISpackage youwill be creatinginthislessonwill automaticallyloadthe dataintoMDS for you.Therefore,beforetestingthe SSISpackage,we needto remove the supplierdatafromMDS, remove the derivedhierarchy,removesupplierandstate entities, and create the supplierentitywithnodata. 1. Launch Master Data Manager by navigatingto http://localhost /MDS or the Website and applicationyouspecifiedwhenconfiguringMDS. If youkeptthe Master Data Manager open,click SQL Server2012 Master Data Services at the top to switchto the home page. 2. Click SystemAdministrationin the Administrative Tasks section. 3. Hoverthe mouse overManage on the menuandclick DerivedHierarchies.We needtodelete the derivedhierarchy SuppliersInState before deletingthe entitiesinthe Suppliersmodel. 4. SelectSuppliersInState fromthe DerivedHierarchylistandclick X (Delete) buttononthe toolbar. 5. Click OK to confirmdeletion. 6. Hoverthe mouse overManage on the menuandclick Entities. 7. Click Supplierandclick Delete (X) buttonon toolbarto delete the entity.Click OKonmessage boxes. 8. Repeatthe previoussteptodelete State entity. 9. Don’tclose Master Data Manager. 10. Switchto the Excel windowthathas Cleansedand Matched Suppliers.xlsfile open.Switchtothe Sheet1tab at the bottom. 11. Selectonlythe firstrow with headers.Don’tselectanyotherrow. You wantto justcreate the entitiesbasedonthe Excel columnsbutdon’twanttouploadanydata, therefore youselectonlythe firstrow withthe headers. 12. Click Master Data on the menubar. 13. Click Create Entity from the ribbon. 14. In the Manage Connectionsdialogbox,if youdonot see the connectionto local MDS serverunder Existingconnections,do the following:
  • 62. 62 a. SelectCreate a new connection,andclick New button. b. In the Add NewConnectiondialogbox,type Local MDS Serverfor Descriptionand http://localhost/MDS for MDS server address,and click OK to close the dialogbox. 15. In the Manage Connectionsdialogbox,select Local MDS Server (http://localhost/MDS),click Testto testthe connection. Click OKon the message box. 16. Click Connectto connectto the MDS server. 17. In the Create Entity dialogbox,dothe following: a. ConfirmthatRange is setto $1:$1. b. SelectSuppliersforModel. c. SelectVERSION_1 forVersion. d. Type SupplierforNew entityname. e. SelectSupplierIDforCode. f. SelectSupplierName forName. g. ClickOK to create the entityandclose the dialogbox. 18. Close EXCEL and do not save the file. 19. In Master Data Manager, refreshthe internetbrowserandconfirmthat Supplierentityisdisplayed inthe list. 20. Switchto the home page by clickingSQLServer 2012 Master Data Services at the top. 21. ConfirmthatSuppliersisselectedforModel andVERSION_1 is selectedforVersion. 22. Click Explorer.Notice thatthe Supplierentitywithall the attributesiscreatedwith novalues. Task2 (Optional):CreatingaMDS SubscriptionViewusingMasterDataManager In thistask,you will create asubscriptionviewtoexpose the Supplierentityinthe Suppliersmodelto otherapplications.Youwill notbe consumingthisview inthe currentversionof the tutorial. 1. Switchto the mainpage of Master Data Manager (http://localhost/MDS) byclickingSQLServer 2012 Master Data Services at the top. 2. Click IntegrationManagement. 3. Click Create Viewson the menubar. 4. Click + (Plus) iconon the toolbarto create a new subscriptionview. 5. In the Create Subscription Viewpane,type SuppliersforSubscriptionviewname. 6. SelectSuppliersforModel.
  • 63. 63 7. SelectVERSION_1 forVersion. 8. SelectSupplierforEntity. 9. SelectLeafmembersfor Format. 10. Click Save onthe toolbartosave the subscriptionview.Thisactuallycreatesaview inSQLServer namedSuppliers.Youcan verifythisusingSQLServerManagementStudio(SSMS). Task3 (Optional):Reviewingthe SubscriptionViews In thistask,you will confirmthatthe SQLviewsare createdbyusingSQL ServerManagementStudio. 1. Launch SQL Server ManagementStudio.Clickthe Start button,click All Programs, click Microsoft SQL Server2012, andthenclick SQL ServerManagement Studio. 2. In the Connectto Serverwindow,setServerType to Database Engine,type the server name (or select(local),andselectappropriate authentication,andclick Connecttoconnectto the server. 3. In the ObjectExplorer pane,expand Databases,expand MDS,and thenexpand Views. 4. Confirmthatyou see the mdm.Suppliersview inthe list. Task4: CreatinganSSIS ProjectusingSQL ServerDataTools In thistask,you will create anSSISprojectusing SQL ServerData Tools to automate cleansingand matchingsupplierdata. 1. Launch SQL Server Data Tools.Click Start, pointtoAll Programs, expand MicrosoftSQL Server 2012, andclick SQL Server Data Tools. 2. Click File onmenu,pointto New,and click Project. 3. ExpandBusinessIntelligence inthe InstalledTemplates pane,andselect IntegrationServices.
  • 64. 64 4. SelectIntegrationServicesProjectinthe listof project types. 5. Type CleanseAndCurateSuppliers forName andclick OK. 6. In the SolutionExplorerwindow,right-clickPackage.dtsxandselectRename.If youdon’tsee the SolutionExplorerwindow,click Viewonthe menubar and click SolutionExplorer.
  • 65. 65 7. Type CleanseAndCurate.dtsx andpress ENTER.Make sure thatthe extensionremains .dtsx. Task5: AddingDataFlowTask In thistask,you will adda Data FlowTaskto the control flow of SSISpackage. 1. Drag and drop Data FlowTask fromSSIS Toolbox to the Control Flowtab in the SSISDesigner.If you do notsee the SSIS Toolbox, clickanywhere inthe Control Flowtab, clickSSIS on the menubar,and clickSSIS Toolbox.
  • 66. 66 2. Right-clickthe Data FlowTask inthe Control Flow tab and click Rename. 3. Type Receive,Cleanse,Match,and Curate SupplierData and press ENTER. 4. Double-clickonthe Data FlowTask to switchto the Data Flowtab. Task6: AddingExcel Sourceto theData Flow In thistask,you will addanExcel Source to the data flow to readsupplierdatafromthe source Excel file. The Excel Source extractsdata from worksheetsorrangesinMicrosoftExcel workbooks.See Excel Source topicfor more details. 1. Drag-dropExcel Source from OtherSources inSSIS Toolbox to the Data Flow tab. 2. Right-clickonExcel Source in the Data Flow tab,and click Rename. 3. Type Read SupplierData from Excel File and press ENTER. 4. Double-clickReadSupplierData from Excel File tolaunchthe Excel Source Editor dialogbox. 5. In the Excel Source Editor dialogbox,click Newto create a new Excel connection. 6. In the Excel ConnectionManager dialogbox,click Browse,and thenselectthe Suppliers.xlsfile in the EIM Tutorial folder.Confirmthat MicrosoftExcel 97-2003 isselectedinthe Excel Versionbox and thenclick OK.
  • 67. 67 7. In the Excel Source Editor dialogbox,select IncomingSuppliers$inthe Name of the Excel sheetlist box. 8. Click Previewtopreviewthe datainExcel file. 9. Click OK to close the dialogbox. 10. Drag-dropDQS Cleansingtransformin OtherTransforms onthe SSIS Toolboxto the Data Flow tab underRead SupplierData from Excel File.The DQS CleansingtransformationusesDataQuality Services(DQS) tocorrectdata by applyingapprovedrulesinthe knowledge base.Thistransform,at runtime,createsaDQS cleansingprojectonthe DQSserver.See DQSCleansingTransformation topicfor more details. Task7: AddingDQS CleansingTransformto theData Flow In thistask,you will addDQSCleansingTransformtothe data flow tocleanse the inputsupplierdataby usingDQS.See DQS CleansingTransform formore detailsaboutthe transform.
  • 68. 68 1. Right-clickDQSCleansinginthe Data Flow tab,and click Rename.Type Cleanse SupplierData,and press ENTER. 2. SelectReadSupplierData from Excel File;drag the blue connectorto Cleanse SupplierData. The componentsare nowconnected. 3. Double-clickCleanse SupplierData. 4. In the DQS CleansingTransformationEditor, clickNewnextto the Data Quality Connection Manager drop-downlist. 5. In the DQS CleansingConnectionManagerdialogbox,type (local) orperiod(.) to connectto the local server.Thislessonassumesthatyouhave DQSinstalledonalocal server. 6. Click Test Connectiontotest the connectiontoDQS server. 7. Click OK to close the dialogbox. 8. SelectSuppliersforthe Data QualityKnowledge Base. 9. Switchto the Mapping tab at the top.
  • 69. 69 10. From Available InputColumns,selectSupplierName,ContactEmailAddress,AddressLine,City, State, Country,and Zip Code byclickingthe checkboxes. 11. In the bottompane,map these columnsusingdrop-downlistsinthe Domaincolumn: Column Domain SupplierName SupplierName ContactEmailAddress Contact Email AddressLine AddressLine City City State State Country Country ZipCode Zip 12. Click OK to close the DQS CleansingTransformation Editor dialogbox. Task8: AddingConditional SplitTransformto SplitCleansingOutput In thistransform,youwill adda Conditional SplitTransformtothe dataflow.The Conditional Split transformationcanroute rowsto differentoutputsdependingonthe contentof the data. For the
  • 70. 70 purpose of thistutorial,youwill use the RecordStatus outputcolumnfromthe DQS Cleansing transform.Youwill uploadonlycorrector correctedrecordsto MDS serverinthistutorial.Therefore youwill checkif the Record Status isCorrect or Corrected,and combine the recordsbefore uploading the recordsto MDS. 1. Drag-dropConditional SplitTransform fromCommon sectioninthe SSIS Toolboxto the Data Flow tab below Cleanse SupplierData. 2. Right-clickConditional Split,andclick Rename.Type PickCorrect and CorrectedRecords and press ENTER. 3. ConnectCleanse SupplierData and Pick Correct and CorrectedRecords usingthe blue connector. 4. Double-clickPickCorrectand CorrectedRecords inthe Data Flowtab. 5. Change the DefaultOutput Name at the bottomof the screento Correct. 6. ExpandColumnsin the top-leftpane.
  • 71. 71 7. Drag-dropRecord Status to the Conditioncolumn. 8. Type ==“Corrected” nextto[Record Status] for the Conditioncolumn. 9. Click Case 1 inthe Output Name Column,and change the name to Corrected. 10. Click OK to close the Conditional SplitTransformation Editordialogbox. Task9: AddingUnionAll Transformto CombineCorrectandCorrectedRecords In thistask,you will addthe UnionAll Transformtothe data flow.The UnionAll transformation combinesmultiple inputsintoone output.Inourscenario,it will combinebothCorrectandCorrected recordsintoone stream. 1. Drag-dropUnionAll TransformfromCommon sectionof the SSIS Toolboxto the Data Flowtab and place it below PickCorrectand CorrectedRecords.
  • 72. 72 2. Right-clickUnionAll Transforminthe Data Flowtab, and click Rename.Type Combine Correct and CorrectedRecords, and press ENTER. 3. ConnectPick Correct and CorrectedRecords to Combine Correct and CorrectedRecords in the Data Flowtab usingthe blue connector.Youshouldsee the Input Output Selectiondialogbox. 4. In the Input Output dialogbox,select CorrectforOutput and click OK. 5. Move the connectortitled Correctto the leftbydraggingand droppingthe dotat the endof the connectorto left.
  • 73. 73 6. If you selectPickCorrect and Corrected Records transform, youshouldsee another blue connector. Drag that blue connectorto Combine Correctand CorrectedRecords. 7. Thisconnector shouldbe titled Corrected.Since we have onlytwoconditions Correctand Corrected,and one conditionwasalreadyused,the InputOutputSelectiondialogbox isnot displayedthistime.If the connectorsoverlap,moveone toleftandthe otherone to rightby draggingthe connectorto leftor right. Task10: AddingFuzzyGroupTransform to IdentifyDuplicates In thistask,you will adda FuzzyGroupTransformto the data flow.The FuzzyGrouptransformationcan helpidentifyduplicatesinthe source data.See FuzzyGroupingTransformation formore details. 1. Drag-dropFuzzy Grouptransformin Other Transforms onthe SSIS Toolboxto the Data Flow tab below Combine Correctand CorrectedRecords. 2. Right-clickFuzzyGroupTransforminthe Data Flow tab, andclick Rename. Type GroupSuppliers with matching IDs and press ENTER. 3. ConnectCombine Correctand CorrectedRecords to GroupSupplierswith matching IDs usingthe blue connector. 4. Double-clickGroupSupplierswithmatchingIDs.
  • 74. 74 5. In the Fuzzy GroupTransformation Editor, clickNewnextto OLE DB ConnectionManager drop- down listto launchConfigure OLE DB ConnectionManager dialogbox. 6. In the dialogbox,click Newto launchConnectionManagerdialogbox. 7. Type (local) or period(.) for the Servername. 8. SelectMDSfor Selector enter a database name field.We will be usingMDSdatabase as the temporarystorage forthe FuzzyGroup Transform. The Fuzzy Groupingtransformationrequiresa connectiontoan instance of SQL Servertocreate the temporarySQLServertables thatthe transformationalgorithmrequirestodoits work. You can create a new database or use another existingdatabase forthispurpose. 9. Click Test Connectiontotest the connectionandclick OK on the message box. 10. In the ConnectionManager dialogbox,click OK. 11. Select(local).MDS(orlocalhost.MDS) fromthe list of Data Connections andclick OK. 12. In the Fuzzy GroupingTransformation Editor,confirmthat(local).MDSor localhost.MDS isselected for the OLE DB ConnectionManager. 13. Switchto the Columnstab. 14. Select(checkbox) SupplierID_Outputfromthe listof Available InputColumns.To configure the transformation,youmustselectthe inputcolumnstouse whenidentifyingduplicates.Tokeepit simple,youwill onlyuse the SupplierIDinthisstep. 15. Click OK to close the FuzzyGroup Transformation Editor.
  • 75. 75 Task11: AddingConditional SplitTransformto FilterDuplicates In thistask,you will addthe ConditionalSplitTransformtothe data flow.Thistransformwill helpyou filterduplicatesfrom the incomingrecordset.The FuzzyGrouptransformgroupsthe recordsthat it findstobe matchesandpicksone of the record as a pivotrecord. All the recordsin a group have the same _key_outvalue.The pivotrecordinthe group has _key_insame asthe _key_outvalue.The other recordsin the grouphave differentvaluesfor_key_inand_key_out.Therefore,whenyoufilterusing the condition_key_in==_key_out,youonlygetthe pivotrow inthe group. 1. Drag-dropConditional SplitTransformfrom Commonsectioninthe SSIS Toolbox to the Data Flow tab. 2. Right-clickConditional SplitTransforminthe Data Flow tab, andclick Rename. Type Filter Duplicatesand press ENTER. 3. ConnectGroupSupplierswith Matching IDs to FilterDuplicates. 4. Double-clickFilterDuplicatestolaunchthe Conditional SplitTransform Editor dialogbox. 5. ExpandColumnsin the top-leftpane. 6. Drag-drop_key_in to the Conditioncolumn. 7. Type == (equalsto) nextto _key_inand drag-drop_key_out. 8. Click Case 1 inthe Output Name column,type Unique Records,andpress ENTER. 9. Click OK to close the Conditional SplitTransformation Editor dialogbox. Task12: AddingDerivedColumnTransformto AddColumnsRequiredbyMDS In thistask,you will addthe Derive ColumnTransformtothe dataflow.Youwill addtwoderived columns, ImportType and BatchTag, to the recordspassedto thistransform.Youneedtoadd these columnsbefore uploadingthe datatostagingtablesinMDS. These twoare requiredcolumnsforthe stagingtablesinMDS. See Leaf MemberStagingTables formore details. 1. Drag-dropDerivedColumntransform fromCommon sectioninthe SSIS Toolboxto the Data Flow tab.
  • 76. 76 2. Right-clickDerivedColumnTransforminthe Data Flowtab, and click Rename.Type Add Columns Requiredby MDS and press ENTER. 3. ConnectFilterDuplicatesto Add ColumnsRequiredby MDS usingthe blue connector.Thiswill launchthe Input Output Selectiondialogbox. 4. In the Input Output Selectiondialogbox, selectUnique Records,andclick OK. 5. Click SSIS on the menubar andclick Variables. 6. In the Variableswindow,click AddVariable buttonon the toolbar. 7. Type ImportType forthe Name and 2 forthe value.You specifythe value as2 because youwantto add newmemberstoan entityinMDS.For detailsaboutthisparameter,see Leaf MemberStaging Table. 8. Click Add Variable toolbarbuttonagain. 9. Type BatchTag for the Name, selectStringforthe Data type,and EIMBatch forthe Value. BatchTag isjust a unique name forthe batch youwill be submittingtoMDS. 10. In the Data Flow tab,double-clickAddColumnsRequiredbyMDS. 11. In the DerivedColumnTransformation Editor dialogbox,inthe listbox inthe bottom pane, type ImportType forthe DerivedColumnName. 12. ExpandVariablesand Parameters inthe top-leftpane,drag-dropUser::ImportType tothe Expressioncolumn.
  • 77. 77 13. Type BatchTag in the nextrowfor the DerivedColumn Name. 14. Drag-dropUser::BatchTag fromVariablesand Parameters to the Expressioncolumn. 15. Click OK to close the DerivedColumnTransformation dialogbox. Task13: AddingOLE DBDestinationto Write Data to MDS StagingTable Nowthat youhave added ImportType and BatchTag valuestoall records,youare readyto sendthem overto MDS forstaging.In thistask,youwill use the OLE DB Destinationtowrite the datainto stg.supplier_Leaf stagingtable. 1. Drag OLE DB DestinationfromOther Destinations sectioninthe SSIS Toolboxto the Data Flowtab and dropit below AddColumnsRequiredby MDS. 2. Right-clickOLEDB Destinationinthe Data Flowtab, and click Rename.Type Write SupplierData to MDS Staging Table and press ENTER. 3. Connectthe Add ColumnsRequiredby MDS to Write SupplierData to MDS Staging Table usingthe blue connector. 4. Double-clickWrite SupplierDatato MDS Staging Table inthe Data Flow tab. 5. In the OLE DB DestinationEditor dialogbox,make sure that (local).MDS(orlocalhost.MDS) is selectedforthe OLE DB ConnectionManager field. 6. Selectstg.Supplier_Leaf table fromthe listof Name ofthe table or the view.
  • 78. 78 7. Switchto the Mappingspage byclickingMappinginthe menuonleft. 8. Map inputand destinationcolumnsasshowninthe followingtable. 9. Confirmthatyou are using_Output columnsfor InputColumns,notthe _Status or _Source columns. _Output columnscontainthe outputvaluesfromDQSCleansing. 10. Click OK to close the OLE DB DestinationEditor dialogbox. 11. The data flowshouldlike the followingimage.
  • 79. 79 Task14: AddingExecuteSQLTaskto Control Flowto Runthe StoredProcedureforMDS Afterloadingdataintothe stagingtablesof MDS, you needtorun a storedprocedure associatedwith that table toload the data fromstagingintothe appropriate tables inthe MDS database.Thisstored procedure hastworequiredparametersthatyouneedtopass:LogFlag andVersionName.LogFlag specifieswhethertransactionsare loggedduringthe stagingprocessandVersionName representsthe versionof the model.See StagedStoredProcedure topicformore details. In thistask,you will addthe Execute SQLTask to the control flow toinvoke the storedprocedure toload the stageddata intoappropriate MDS tables. 1. Now,switchtothe Control Flow tab.
  • 80. 80 2. Drag-dropExecute SQL Task fromthe SSIS Toolbox to the Control Flow tab. 3. Right-clickExecute SQLTask inthe Control Flowtab,and click Rename.Type Trigger Stored Procedure to Load Data into MDS and pressENTER. 4. ConnectReceive,Cleanse,Match,and Curate SupplierData to Trigger Stored Procedure to Load Data usingthe greenconnector. 5. Usingthe Variableswindow,addtwonew variableswiththe followingsettings.If youdonotsee the Variableswindow,click SSISonthe menubar andclick Variables. Name Data Type Value LogFlag Int32 1 VersionName String VERSION_1 6. Double-clickTriggerStoredProcedure to Load Data into MDS. 7. In the Execute SQL Task Editor dialogbox,select (local).MDS(orlocalhost.MDS) forConnection. 8. Type EXEC [stg].[udp_Supplier_Leaf]?,?,? forSQL Statement.You can verifythe name usingSQL ServerManagementStudio.
  • 81. 81 9. Click Parameter Mappingfrom the menuonleft. 10. In the Parameter Mappingpage,click Add to add a new mapping.Maximize the windowandresize columnssothat you can see valuesindrop-downlistsproperly. 11. SelectUser::VersionName fromthe drop-downlistforthe Variable Name. 12. SelectNVARCHARfor Data Type. 13. Type 0 (zero) forParameter Name. 14. Repeatthe previousfourstepstoaddtwo more variables. Variable Name Data Type (important) Parameter Name User::LogFlag LONG 1 User::BatchTag NVARCHAR 2 15. Click OK to close the Execute SQL Editor dialogbox. Task15: BuildingandRunningtheSSIS Project In thistask,you will buildandrunthe SSISproject.If youhave the 64-bit versionof Excel 2010 installed on yourcomputer,youneedtoset the value of Run64BitRuntime to False for the Excel source to work. Thisis a knownissue.
  • 82. 82 1. In the SolutionExplorerwindow,Click Projectonthe menu,andclick CleanseAndCurateSuppliers Properties. 2. In the Propertiesdialogbox,expand ConfigurationProperties onleft,andclick Debugging. 3. SetRun64BitRuntime to False. 4. Click OK to close the Propertiesdialogbox. 5. Click Buildon menubar and click BuildCleanseAndCurateSuppliers.Make sure thatthere are no builderrors. 6. Click Debugon the menubar and click Start Debugging. 7. Reviewmessagesinthe Progresswindow andverifythatpackage executedandendedsuccessfully.
  • 83. 83 8. Click Debugon menubar and click Stop Debuggingto stop the debuggingsession.If the package fails,youmaywant to enable dataviewersandsee how the dataflowsbetweencomponents. Task16: VerifyingwithMasterDataManager In thistask,you will checkthe statusof the batch jobsubmittedbythe SSISpackage and verifythatthe data was uploadedtoMDS serverusingMasterData Manager. 1. Launch Master Data Manager (http://localhost/MDS).If itisalreadyopen,click MicrosoftSQL ServerMaster Data Services at the topto switchto the home page. 2. Click IntegrationManagement. 3. Notice thatthere isa batch withnamed EIMBatch that yousubmittedinthe list.Click ImportData on the menubar if you donot see the followingscreen. 4. Switchback to the home page by click SQL Server 2012 Master Data Servicesat the top. 5. Make sure that SuppliersisselectedforModel andVERSION_1 isselectedforVersion,andclick Explorer. 6. You can see the data SSISpackage importedintoMDS. The data shouldbe cleansedandhave no duplicates Code values(Note:SupplierIDcolumninExcel correspondsto Code attribute of Supplier entityinMDS). Task17: ReviewingDQS CleansingProjectCreatedbythe SSIS package In thisproject,youwill openthe DQSprojectcreatedbythe SSISpackage inDQS Client,review the resultsfromthe cleansingprocess,andoptionallyperforminteractivecleansingandexportthe results. 1. Launch Data Quality Client. 2. Click ActivityMonitoring inthe Administrationpane. 3. Sort the listbasedon Activity Start Time to see the latestrecord. 4. Notice thatyou see a name of the projectin the followingformat: CleanseAndCurate.Cleanse SupplierData.GUID. 5. Notice thatthe value inthe Is Active fieldis Active. 6. Click Profilertabin the bottompane to see profilerstatisticsforthe Cleansingactivitythatthe SSIS package performed.