SQL 2012 Data Quality Services

Slides from 60 minute screencast for DevelopMentor March 2012

Published in: Technology
1 Comment
  • How do you use a third-party reference? I have to code my own TPR, but I don't know how. Which information does DQS need?
    I have seen that you have a service from Infochimps. I tried to use the same one, but it doesn't work. :(
Slide notes
  • SQL Server 2012 learning resources - http://www.microsoft.com/sqlserver/en/us/learning-center/resources.aspx
  • More about PowerView - http://technet.microsoft.com/en-us/library/hh213579(v=sql.110).aspx
  • Excel add-in for MDS - http://www.microsoft.com/download/en/details.aspx?id=28149
  • More about column store index - http://msdn.microsoft.com/en-us/library/gg492088(v=SQL.110).aspx
  • New data type (enhancement to filestream), Filetable - http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx#Description
  • Also enhancements to Full-text indexing (adding ability to search file metadata for .pdf, etc.)
  • Also CDC support for Oracle
  • MSDN- what is DQS? - http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx
  • http://msdn.microsoft.com/en-us/sqlserver/hh323828.aspx
  • More about Matching policies - http://blogs.msdn.com/b/dqs/archive/2011/11/02/matching-policy-a-closer-look-into-data-quality-services-data-matching.aspx
  • About MDS integration (from MSDN), "Data Quality Processes in Master Data Services": DQS functionality has been integrated into Master Data Services (MDS), so you can perform de-duplication on source data and master data within MDS workflows. Matching is included in the Microsoft SQL Server 2011 Master Data Services Add-in for Microsoft Excel. To perform matching, the data must be in an Excel spreadsheet. The Data Quality Server components must be installed with MDS.
  • About the Data cleansing task in SSIS - http://msdn.microsoft.com/en-us/library/ee677619(v=sql.110).aspx
  • Also from the DQS team blog (on the SSIS DQS task) - http://blogs.msdn.com/b/dqs/archive/2011/07/18/using-the-ssis-dqs-cleansing-component.aspx
  • Also a video on the same - http://msdn.microsoft.com/en-us/sqlserver/hh323819.aspx
  • On the advanced tab, you can enable output of additional information, such as confidence score, reason for correction, etc…
  • MSDN – how to configure output - http://msdn.microsoft.com/en-us/library/ee677612(v=sql.110).aspx
  • Output is Correct, Corrected, or ToBeCorrectedManually
  • SSIS Tasks
      – Lookup transformation (this for that, substitutions)
      – Cache transformation (multiple lookups)
      – Fuzzy Lookup (lookup based on threshold matching)
      – Fuzzy Grouping (grouping based on thresholds)
      – Data Mining Query (based on mining model algorithms)
      – DQS Cleansing (uses a KB)
  • http://blogs.msdn.com/b/dqs/archive/2011/11/29/sql-server-2012-rc0-what-s-new-in-dqs.aspx
  • From the DQS Team Performance whitepaper - http://www.microsoft.com/download/en/details.aspx?id=29075
    They also note suggested hardware and note the following:
      – Recommend using SSD drives
      – Warn on use of tempdb, recommend monitoring
  • TechEd Video - http://www.youtube.com/watch?v=jfDVG8Nf8No
  • Teach from www.TeachingKidsProgramming.org – Donate at www.MonaFoundation.org
  • Transcript of "SQL 2012 Data Quality Services"

    1. Data Quality Services in SQL Server 2012 (@LynnLangit)
    2. Breakthrough Insights = Better BI
    3. What is Data Quality Services?
       A set of tools and services that allow domain experts to improve data quality
       • Produces result set with suggested improvements
       • Does NOT change source data
    4. Why Use DQS?
       SME input
       • Manually define, match, cleanse
       Machine cleansing
       • Programmatically define, match, cleanse, then manually approve
       • Can 'learn'
       • Can incorporate 3rd party data
       Integration
       • Can integrate with other data processes (SSIS)
    5. When to use DQS (scenarios)
       Issue          Detail
       Completeness   Is all information present?
       Conformity     Is all data in the correct format?
       Consistency    Do values represent the same meaning?
       Accuracy       Do data objects represent their real-world values?
       Validity       Do data values fall within acceptable ranges?
       Duplication    Are there multiple copies of the same data?
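To make a few of the issue types on slide 5 concrete, here is a minimal Python sketch that flags completeness, conformity, validity and duplication problems in a handful of records. It is only an illustration of the categories, not how DQS works internally; the records, the field names (`name`, `zip`, `age`) and the 0 to 120 age range are all made-up assumptions.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are assumptions.
RECORDS = [
    {"name": "Ann Lee", "zip": "98052", "age": 34},
    {"name": "Bob Roy", "zip": "9805", "age": 41},    # zip is not 5 digits
    {"name": "", "zip": "10001", "age": 29},          # name is missing
    {"name": "Cy Dunn", "zip": "60601", "age": 210},  # age out of range
    {"name": "Ann Lee", "zip": "98052", "age": 34},   # copy of the first row
]

ZIP_FORMAT = re.compile(r"\d{5}")  # assumed format rule for this example


def profile(records):
    """Return, per issue type, the indexes of the rows that fail the check."""
    issues = {"completeness": [], "conformity": [], "validity": [], "duplication": []}
    seen = Counter()
    for i, rec in enumerate(records):
        # Completeness: is all information present?
        if any(value in ("", None) for value in rec.values()):
            issues["completeness"].append(i)
        # Conformity: is the data in the expected format?
        if not ZIP_FORMAT.fullmatch(rec["zip"]):
            issues["conformity"].append(i)
        # Validity: does the value fall within an acceptable range?
        if not 0 <= rec["age"] <= 120:
            issues["validity"].append(i)
        # Duplication: has an identical record been seen before?
        key = tuple(sorted(rec.items()))
        seen[key] += 1
        if seen[key] > 1:
            issues["duplication"].append(i)
    return issues


if __name__ == "__main__":
    for issue, rows in profile(RECORDS).items():
        print(issue, rows)
```

Consistency and accuracy are left out on purpose: both need outside knowledge (synonyms, real-world reference data), which is what a knowledge base supplies.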
    6. DQS Architecture
    7. Installing DQS
       • SQL Server 2012: not installed by default
       • Components: Client / Server / SSIS task
       • Editions: BI Edition, Enterprise edition
       • Must run 'DQS Server Installer' post SQL install
       Post install
       • Grant 1 of 3 DQS roles on the DQS_Main db
       • Make your data accessible for SQL operations
       • Do MDS integration
       • Enable TCP/IP for remote DQS
    8. DQS Components on SQL Server 2012
    9. Data Quality Services client interface
    10. How to Use DQS?
        List of Basic Steps
        • Create/Refine/Use a Knowledge Base
        • Perform a Data Quality Evaluation
        • Generate output (results)
        List of Components
        • DQS Server
        • DQS Client(s)
    11. How to Use DQS? Step 1 - KB
        Knowledge Bases
        • Can use included KB
        • Can refine included KB
        • Can create KB from source data
        • Can manually create KB
    12. Parts of DQS – Domain Management
    13. Adding Domain Values
        • Correct
        • Error
        • Invalid
    14. More on Domain Values
        • Link as synonyms
        • Set as leading value
    15. Regular or Composite Domains
    16. More about Domain Management
        • Domain Properties – Description, Language…
        • Reference Data – relate to 3rd party data
        • Domain Rules – RegEx/length, etc. (rule-based)
        • Domain Values – shows substitute values
        • Term-Based Relations – common word corrections
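The domain concepts on slides 13 to 16 (domain values, synonyms with a leading value, domain rules, term-based relations) can be pictured with a small Python sketch. This is not the DQS engine or any DQS API; the "State" domain, its values, the rule pattern, and the function names are hypothetical, and the status labels are borrowed from the ones the slides mention (Correct, Corrected, Invalid, New).

```python
import re

# Hypothetical "State" domain, for illustration only.
SYNONYMS = {            # synonym -> leading value
    "Washington": "Washington",
    "WA": "Washington",
    "Wash.": "Washington",
}
KNOWN_ERRORS = {"Washingtn": "Washington"}            # error value -> correction
DOMAIN_RULE = re.compile(r"[A-Za-z .]{2,30}")         # assumed RegEx/length rule
TERM_RELATIONS = {"St.": "Street", "Inc": "Incorporated"}  # common word corrections


def cleanse_value(value):
    """Return (output value, status) for one domain value."""
    value = value.strip()
    if not DOMAIN_RULE.fullmatch(value):
        return value, "Invalid"                 # fails the domain rule
    if value in KNOWN_ERRORS:
        return KNOWN_ERRORS[value], "Corrected"  # known error value
    if value in SYNONYMS:
        leading = SYNONYMS[value]
        # A synonym is replaced by its leading value.
        return leading, "Correct" if leading == value else "Corrected"
    return value, "New"                          # not in the knowledge base yet


def apply_term_relations(text):
    """Replace whole words using the term-based relations table."""
    return " ".join(TERM_RELATIONS.get(word, word) for word in text.split())


if __name__ == "__main__":
    for raw in ["WA", "Washington", "Washingtn", "W4", "Oregon"]:
        print(raw, "->", cleanse_value(raw))
    print(apply_term_relations("12 Main St."))
```

The order of the checks (rule first, then known errors, then synonyms) is a design choice of this sketch, not something the slides specify.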
    17. Parts of DQS – Knowledge Discovery
    18. Parts of DQS – Knowledge Discovery
        Step two – Running Discovery
    19. Parts of DQS – Knowledge Discovery
        Step three – Correcting Values
    20. DQS KB – Creating a Matching Policy
        • Step One
        • Select data to be matched for each domain
    21. DQS KB – Creating a Matching Policy
        • Step Two
        • Create matching rules per domain
          • Similar
            • Set similarity score, when matching score < 60
            • For numbers, set threshold (% or int)
            • For dates, set threshold (DD, MM or YY)
          • Exact – identical values (score of 100)
          • Configure Weight, must sum to 100
          • Can configure Prerequisites
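The weighting idea on slide 21 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the DQS matching algorithm: `difflib.SequenceMatcher` stands in for whatever similarity measure DQS uses, the rule set and field names are invented, and treating a similarity below 60 as 0 is one reading of the "< 60" bullet. Prerequisites are omitted.

```python
from difflib import SequenceMatcher

# Hypothetical matching rules; the weights must sum to 100.
RULES = [
    {"domain": "name", "kind": "similar", "weight": 60},
    {"domain": "city", "kind": "exact", "weight": 40},
]


def domain_score(kind, a, b):
    """Score one domain from 0 to 100."""
    if kind == "exact":
        return 100.0 if a == b else 0.0
    # Stand-in similarity measure (an assumption, not the DQS algorithm).
    score = 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
    # Assumed reading of the slide: similarity under 60 counts as no match.
    return score if score >= 60 else 0.0


def match_score(rules, rec_a, rec_b):
    """Weighted matching score between two records, from 0 to 100."""
    if sum(rule["weight"] for rule in rules) != 100:
        raise ValueError("rule weights must sum to 100")
    total = sum(
        rule["weight"] * domain_score(rule["kind"], rec_a[rule["domain"]], rec_b[rule["domain"]])
        for rule in rules
    )
    return total / 100.0


if __name__ == "__main__":
    a = {"name": "Jon Smith", "city": "Seattle"}
    b = {"name": "John Smith", "city": "Seattle"}
    print(match_score(RULES, a, b))
```

Two records would then be treated as a match when the weighted score reaches whatever minimum matching score the policy sets.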
    22. DQS KB – Creating a Matching Policy
        • Step Three
        • Test matching rules per domain
          • Click 'Start'
          • Review 'Matching Results' tabs to compare one or more results
    23. Matching – See Results
        Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates
    24. More Matching Output
    25. Using the DQS KB to do Cleaning
        • Create or Open a Data Quality Project
        • Map the DQS KB to the new data
        • Perform Cleansing
        • Manage / View Results
        • Export corrected results
    26. DQS Project -- Cleansing
    27. DQS Cleaning in Process…
    28. DQS Cleaning complete
    29. DQS Cleaning – Manage Results
    30. DQS Output file Information
        Export file column names (with option to include "Data and Cleansing Info")
        – XXX_Source - original source column value
        – XXX_Output - clean column value
        – XXX_Reason - reason column value was either valid or invalid
        – XXX_Confidence - column confidence percentage returned by the DQS server algorithms
        – XXX_Status - column processing status (i.e. Correct, New, Invalid, etc.)
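Once a cleansing project has been exported with the column pattern from slide 30, the file can be post-processed with ordinary tooling. The Python sketch below assumes a CSV export for a domain named `City`; the sample rows, the reason texts and the confidence values are invented for illustration, and only the `XXX_Source` / `XXX_Output` / `XXX_Reason` / `XXX_Confidence` / `XXX_Status` naming comes from the slide.

```python
import csv
import io

# Hypothetical export content; only the column naming follows the slide.
EXPORT = """City_Source,City_Output,City_Reason,City_Confidence,City_Status
Seattle,Seattle,Domain value,100,Correct
Seatle,Seattle,Corrected to leading value,90,Corrected
Zzyzx,Zzyzx,Not in knowledge base,0,New
"""


def needs_review(csv_text, column, ok_statuses=("Correct",)):
    """Return the source values whose status is not in ok_statuses."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[f"{column}_Source"]
        for row in reader
        if row[f"{column}_Status"] not in ok_statuses
    ]


if __name__ == "__main__":
    print(needs_review(EXPORT, "City"))
```

Passing `ok_statuses=("Correct", "Corrected")` would narrow the list to values that DQS could neither confirm nor correct.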
    31. DQS Administration - General
    32. DQS Administration – Reference Data
    33. DQS Administration - Logging
    34. DQS Integration
        List of Integration Points
        • API? – not at this time
        • SSIS task
        • MDS (Master Data Services)
    35. DQS Cleansing Task in SSIS
    36. DQS Cleansing Task in SSIS - mapping
        For each input column define columns for
        • Source – contains input values
        • Output – contains correct or corrected or invalid output values
        • Status – contains auto suggest, correct, invalid or new
    37. Running Package Status
    38. DQS SSIS Task
    39. DQS SSIS Task Complex Example
    40. What is Master Data Management?
        Define it
        • Central repository for data
        • Rule-based
        • Can work with DQS
    41. Types of Data Quality Projects
        T-SQL scripts (boolean match)
        • Exact matches (WHERE =, WHERE <>, WHERE IN)
        • LIKE (% string matching)
        Full-text matching (semantic word match)
        • CONTAINS
        Semantic Search (semantic phrase match)
        • SEMANTICSIMILARITYTABLE
        SSIS tasks (transactional, multi-valued matching)
        • See the SSIS task list in the slide notes
        DQS (KB matching)
        • Knowledge Base rules/matches
        • Data Quality project: clean / correct data
        MDS (One view of truth)
        • Versioned Entities, Attributes and Rules
    42. New in RC0
        • Use knowledge import from projects back to your knowledge base (KB) with Cleanse2KB
        • Use the Office speller as part of the DQS client
        • Use Composite Domain rules
          – to correct values
          – to detect rules violations
        • Import values from Excel
          – import values together with their synonyms
        • Use unstructured composite domain values?
          – KB parsing is a new feature that takes advantage of your knowledge for a more accurate parsing
        • Modify server log settings through the client UI
    43. Performance Information
    44. Resources
        • www.Develop.com
        • DQS Team Blog - here
        • DQS video – here
        • DQS on TechNet - here
        • More samples – here
    45. www.TeachingKidsProgramming.org
        • Do a Recipe, Teach a Kid (Ages 10+)
        • Microsoft SmallBasic, Free Courseware (recipes)
    46. Keep up with Data
        • Follow me @LynnLangit
        • RSS my blog: www.LynnLangit.com
        • Hire me
          • To help build your BI/Big Data solution
          • To teach your team next gen BI with SQL Server 2012
