Data Quality Services in SQL Server 2012


An introduction to Data Quality Services (DQS). DQS enables you to discover, build, and manage knowledge about your data, and to use that knowledge to perform data cleansing, matching, and profiling. We will explore the features and capabilities of Data Quality Services and its integration with SSIS through the DQS Cleansing Transform.

Published in: Technology

Data Quality Services in SQL Server 2012

  1. Data Quality Services in SQL Server 2012 (An Introduction)
     Stéphane Fréchette
     Friday April 26, 2013
     Matching, Cleansing, DQS
  2. Who am I?
     My name is Stéphane Fréchette.
     I'm a Database & Business Intelligence Professional and CEO | Founder of
     I have a passion for architecting, designing and building solutions that matter.
     Self-proclaimed Open Data Hacker/Advocate. I founded Gatineau Ouverte, a citizen-led initiative which aims to promote open access to civic data of the city of Gatineau.
     Twitter: @sfrechette
     Email: stephanefrechette@ukubu.com
     Blog:
  3. Session Outline
     • Microsoft Business Intelligence (The Stack)
     • Dirty Data…
     • SQL Server Data Quality Services (DQS)
     • Data Steward
     • Knowledge Base and Domains
     • Data Quality Projects
     • Data Cleansing Transform in SSIS
     • DQS (Install & Architecture)
     • Enterprise Information Management (EIM)
     • Resources
  4. Microsoft Business Intelligence
     (Diagram labels) Analysis Services, Reporting Services, Integration Services, Master Data Services, SharePoint Collaboration, Excel Workbooks, PowerPivot Applications, SharePoint Dashboards & Scorecards, Data Quality Services, OData Feeds, Line of Business Applications, Hadoop Big Data
  5. Dirty Data…
     Do you have dirty data? (All projects have it! It's inevitable.)
  6. Dirty Data… Causes?
     • Bad data entry
     • Poor data governance
     • Duplicate entities in different LOB systems
  7. Sample Data Representation
     • Prospect in CRM System:
       Mark Smith | 613.111-1234 | Ottawa | ON | K1P 1K1
     • Prospect buys goods, now entered in POS System:
       Markus Smith | 1234 Stilton Ave | Kanata | ON | K1P 1K1
     • Record also entered into Accounting System:
       Markus Smith | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
     ETL process imports these records into the Data Warehouse / Data Mart:
     FirstName | LastName | Phone        | Address             | City   | Province | PostalCode
     Mark      | Smith    | 613.111-1234 |                     | Ottawa | ON       | K1P 1K1
     Markus    | Smith    |              | 1234 Stilton Ave    | Kanata | ON       | K1P 1K1
     Markus    | Smith    |              | 1234 Stilton Avenue | Kanata | ON       | K1P 1K1
  8. Sample Data Representation
     • Duplicate records and inaccurate, incomplete data
     • What we want is a golden record (one version of the truth)
     Imported records:
     FirstName | LastName | Phone        | Address             | City   | Province | PostalCode
     Mark      | Smith    | 613.111-1234 |                     | Ottawa | ON       | K1P 1K1
     Markus    | Smith    |              | 1234 Stilton Ave    | Kanata | ON       | K1P 1K1
     Markus    | Smith    |              | 1234 Stilton Avenue | Kanata | ON       | K1P 1K1
     Golden record:
     FirstName | LastName | Phone        | Address          | City   | Province | PostalCode
     Markus    | Smith    | 613-111-1234 | 1234 Stilton Ave | Kanata | ON       | K1P 1K1
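
To make the golden-record idea concrete, here is a minimal Python sketch that merges the three sample rows into one. It is not how DQS or MDS work internally. The survivorship rule (most frequent non-empty value per field, ties going to the value seen first) and the names `golden_record` and `normalize_phone` are assumptions made purely for illustration.

```python
from collections import Counter

FIELDS = ["FirstName", "LastName", "Phone", "Address", "City", "Province", "PostalCode"]

# The three imported rows from the slide; empty strings mark missing values.
records = [
    dict(zip(FIELDS, ["Mark", "Smith", "613.111-1234", "", "Ottawa", "ON", "K1P 1K1"])),
    dict(zip(FIELDS, ["Markus", "Smith", "", "1234 Stilton Ave", "Kanata", "ON", "K1P 1K1"])),
    dict(zip(FIELDS, ["Markus", "Smith", "", "1234 Stilton Avenue", "Kanata", "ON", "K1P 1K1"])),
]


def normalize_phone(phone):
    """Standardize a 10-digit phone number to NNN-NNN-NNNN; leave anything else alone."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) != 10:
        return phone
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def golden_record(rows):
    """Assumed survivorship rule: most frequent non-empty value, ties go to the first seen."""
    golden = {}
    for field in FIELDS:
        values = [row[field] for row in rows if row[field]]
        if not values:
            golden[field] = ""
            continue
        counts = Counter(values)
        # Higher count wins; on equal counts the earlier position wins.
        golden[field] = max(values, key=lambda v: (counts[v], -values.index(v)))
    golden["Phone"] = normalize_phone(golden["Phone"])
    return golden


merged = golden_record(records)
print(merged)
```

With this rule the merged row matches the golden record shown on the slide. A real project would usually pick survivorship rules per field (for example, trusting one source system over another) rather than relying on frequency alone.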
  9. SQL Server Data Quality Services (DQS)
     • New in SQL Server 2012
     • Enables cleansing, matching, standardizing and enriching data
     • Delivers trusted information for business intelligence, data warehouse, and transaction processing workloads
     • Knowledge-driven solution (create/edit)
       • A knowledge management process that builds the knowledge base
       • A data quality project that proposes changes to source data based on the knowledge in the knowledge base (cleansing and matching)
     • A key component of an Enterprise Information Management (EIM) solution
  10. Answering the Need with DQS
     • DQS helps resolve issues involving incompleteness, lack of conformity, inconsistency, inaccuracy, invalidity, and data duplication
     • Provides the following features to resolve data quality issues:
       • Data Cleansing
       • Matching
       • Reference Data Services
       • Profiling
       • Monitoring
       • Knowledge Base
  11. Data Steward
     • Key role: usually a business user, not someone from the Information Technology side
     • In a nutshell: responsible for maintaining data elements in a metadata registry…
     • Data Steward -> DQS Client
       • Create and edit Knowledge Bases
       • Run and process data through them, continually and iteratively improving the Knowledge Bases
       • Knowledge Bases can be consumed and used by other Data Stewards and IT (SSIS / ETL Developers)
     (Diagram labels) DQS Data Steward, MDS Data Steward, SSIS Developer, Matching, Cleansing
  12. Knowledge Bases and Domains
     The knowledge base is a repository of knowledge about your data that enables you to understand your data and maintain its integrity.
     • Processes:
       • Computer-assisted
       • Interactive
     • Components:
       • Knowledge Discovery
       • Domain Management
       • Reference Data Services
       • Matching Policy
  13. Demo: Knowledge Base Management
     (Creating a Knowledge Base)
  14. Data Quality Projects
     Improve the quality of source data by performing data cleansing and data matching activities using defined knowledge bases.
     • Cleansing Activity (2-step process)
       • Computer-assisted: data is categorized (suggested, new, invalid, corrected, and correct)
       • Interactive: the data steward approves, rejects, or modifies the results proposed by the computer-assisted cleansing process
     • Matching Activity
       • Uses the existing knowledge base matching policy
       • Prevents and removes data duplication
     • Data Profiling and Notifications
       • Profiling provides data quality stats and info: completeness and accuracy
       • Notifications on actions that can be taken to enhance operations
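
The computer-assisted step above sorts each value into one of five categories. The Python sketch below is a deliberately simplified illustration of that idea, not the DQS algorithm: the domain layout (`valid`, `synonyms`, `invalid`), the sample city values, the `cleanse` function, and the 0.8 similarity cutoff are all assumptions. String similarity comes from the standard library's `difflib`.

```python
from difflib import get_close_matches

# Hypothetical "City" domain: known-good values, known corrections, known-bad values.
CITY_DOMAIN = {
    "valid": {"Ottawa", "Kanata", "Gatineau"},
    "synonyms": {"Bytown": "Ottawa"},  # illustrative correction rule
    "invalid": {"N/A", "Unknown"},
}


def cleanse(value, domain, cutoff=0.8):
    """Return (category, output_value) for one source value."""
    if value in domain["valid"]:
        return ("correct", value)
    if value in domain["synonyms"]:
        return ("corrected", domain["synonyms"][value])
    if value in domain["invalid"]:
        return ("invalid", value)
    # Not known to the domain: look for a near match among the valid values.
    close = get_close_matches(value, sorted(domain["valid"]), n=1, cutoff=cutoff)
    if close:
        return ("suggested", close[0])
    # Nothing similar enough: surface it for the data steward to review.
    return ("new", value)


results = {v: cleanse(v, CITY_DOMAIN) for v in ["Ottawa", "Bytown", "N/A", "Otawa", "Toronto"]}
for source, (category, output) in results.items():
    print(f"{source!r:10} -> {category:9} {output!r}")
```

In the interactive step, the data steward would review the `suggested` and `new` rows and approve, reject, or modify them. Approved values can then be folded back into the domain so the next run classifies them automatically.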
  15. Demo: Data Quality Project
     (Cleansing and Matching)
  16. DQS Cleansing Transform in SSIS
     • Use it when you want to automate the cleansing and matching process and not use the DQS Client
     • Use SSIS for batch data cleansing
     • Matching can be done with Master Data Services (MDS)
     • SSIS can be leveraged to bring DQS and MDS together
     * DQS does not expose matching functionality for SSIS, but you can use the Fuzzy Grouping Transform to identify duplicate data
     * The Cleansing Transform is single-threaded; use multiple transforms for parallelism
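
As a rough picture of what fuzzy duplicate detection does, the Python sketch below groups rows whose text is similar enough to an earlier row. It does not reproduce the SSIS Fuzzy Grouping Transform. The greedy grouping strategy, the `group_duplicates` name, the 0.85 threshold, and the third sample row are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(rows, threshold=0.85):
    """Greedy grouping: each row joins the first group whose lead row is similar enough."""
    groups = []
    for row in rows:
        for group in groups:
            if similarity(row, group[0]) >= threshold:
                group.append(row)
                break
        else:
            # No existing group matched, so this row starts a new one.
            groups.append([row])
    return groups


rows = [
    "Markus Smith, 1234 Stilton Ave, Kanata",
    "Markus Smith, 1234 Stilton Avenue, Kanata",
    "Jane Doe, 55 Bank St, Ottawa",  # made-up row that should stay on its own
]
groups = group_duplicates(rows)
for i, group in enumerate(groups, start=1):
    print(i, group)
```

Comparing every row against every group lead is quadratic in the worst case, so this only suits small batches. The threshold plays the same role as a minimum similarity setting: raising it yields fewer, tighter groups.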
  17. Demo: Data Cleansing Transform
     (Automating the Cleansing and Matching using SSIS)
  18. Installing DQS
     • Requires the Business Intelligence or Enterprise/Developer edition of SQL Server 2012
     • During SQL Server setup:
       • Instance Features -> Data Quality Services
       • Shared Features -> Data Quality Client
     • Execute the Data Quality Server Installer:
       • C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\DQSInstaller.exe
       • Data Quality Services -> Data Quality Server Installer (Apps - Microsoft SQL Server 2012)
  19. DQS Architecture
     DQS Server
     DQS Catalog (3 databases)
     • DQS_MAIN (Knowledge Bases)
     • DQS_PROJECTS (Projects)
     • DQS_STAGING_DATA (Sandbox, scratch pad area)
     Security: Database Roles
     • dqs_administrator
     • dqs_kb_editor
     • dqs_kb_operator
  20. Windows Azure Marketplace
     Reference Data Services -> validating, cleansing and enriching your data
  21. Performance considerations - FYI
     • Major performance improvements from the RTM to the CU1 release of SQL Server 2012 (strongly recommend patching and upgrading)
     • Must read -> DQS Performance Best Practices Guide
     • Understand data volumes and hardware requirements… plan wisely!
  22. Enterprise Information Management (EIM)
     The EIM stack as a whole is the 'Master Data Management' solution from Microsoft and consists of the following:
     • SQL Server Data Quality Services (DQS): captures and records knowledge, rules, and actions
     • SQL Server Master Data Services (MDS): Master Data Management repository, dimension data
     • SQL Server Integration Services (SSIS): moves data, integration
     (Diagram caption) Enterprise Information Management (EIM), 'Master Data Management'
  23. Resources
     • Data Quality Services Team Blog (MSDN)
     • SQL Server Data Quality Services (TechNet)
     • DQS Performance Best Practices Guide
     • Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS (Video, Channel 9)
     • Matt Masson: Getting Started with DQS and MDS
     • Paras Doshi's Blog (DQS)
  24. What Questions Do You Have?
  25. Thank You
     For attending this session