SQL 2012 DQS


deck from DevTeach V

Published in: Technology

  • SQL Server 2012 learning resources: about Power View; add-in for MDS; about columnstore index; data type (enhancement to FILESTREAM): FileTable; enhancements to full-text indexing (adding the ability to search file metadata for .pdf, etc.). Also CDC support for Oracle.
  • MSDN- what is DQS? -
  • More about Matching policies -
  • About MDS integration (from MSDN): Data Quality Processes in Master Data Services. DQS functionality has been integrated into Master Data Services (MDS), so you can perform de-duplication on source data and master data within MDS workflows. Matching is included in the Microsoft SQL Server 2011 Master Data Services Add-in for Microsoft Excel. To perform matching, the data must be in an Excel spreadsheet. The Data Quality Server components must be installed with MDS.
  • About the Data cleansing task in SSIS - from DQS team blog (on SSIS DQS task) - a video on the same -
  • On the advanced tab, you can enable output of additional information, such as confidence score, reason for correction, etc…
  • MSDN – how to configure output -
  • Output is Correct, Corrected, or ToBeCorrectedManually
  • SSIS tasks: Lookup transformation (this for that, substitutions); Cache transformation (multiple lookups); Fuzzy Lookup (lookup based on threshold matching); Fuzzy Grouping (grouping based on thresholds); Data Mining Query (based on mining model algorithms); DQS Cleansing (uses a KB)
  • The DQS Team Performance whitepaper suggests hardware and notes the following: recommend using SSD drives; warn on use of tempdb, recommend monitoring
  • TechEd Video -
  • Teach from – Donate at
  • Transcript

    • 1. Data Quality Services @LynnLangit
    • 2. Breakthrough Insights = Better BI
    • 3. What is Data Quality Services? A set of tools and services that allow domain experts to improve Data Quality • Produces result set with suggested improvements • Does NOT change source data
    • 4. Why Use DQS? • SME input: manually define, match, cleanse • Machine: programmatically “”, then manually approve; can 'learn' • Cleansing: can incorporate 3rd party data • Integration: can integrate with other data processes (SSIS)
    • 5. When to use DQS (scenarios), Issue: Detail • Completeness: Is all information present? • Conformity: Is all data in the correct format? • Consistency: Do values represent the same meaning? • Accuracy: Do data objects represent their real-world values? • Validity: Do data values fall within acceptable ranges? • Duplication: Are there multiple copies of the same data?
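A rough Python sketch of how four of the issues on slide 5 (completeness, conformity, validity, duplication) could be checked on a handful of rows. This is not how DQS works internally; the sample rows, the field names, the email pattern and the 0 to 120 age range are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"name": "Ann Lee", "email": "ann@example.com", "age": 34},
    {"name": "Bob Roy", "email": "bob(at)example.com", "age": 41},
    {"name": "", "email": "cy@example.com", "age": 230},
    {"name": "Ann Lee", "email": "ann@example.com", "age": 34},
]

# Assumed format rule for the email field (deliberately simple).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_rows(rows):
    """Return (row_index, issue) pairs for four of the slide's issue types."""
    issues = []
    seen = Counter()
    for i, row in enumerate(rows):
        # Completeness: every field must hold a non-blank value.
        if not all(str(value).strip() for value in row.values()):
            issues.append((i, "Completeness"))
        # Conformity: the email must match the assumed format.
        if not EMAIL_RE.match(row["email"]):
            issues.append((i, "Conformity"))
        # Validity: the age must fall inside the assumed range.
        if not 0 <= row["age"] <= 120:
            issues.append((i, "Validity"))
        # Duplication: flag the second and later copies of an identical row.
        key = tuple(sorted(row.items()))
        seen[key] += 1
        if seen[key] > 1:
            issues.append((i, "Duplication"))
    return issues


for index, issue in check_rows(rows):
    print(index, issue)
```

Consistency and accuracy are left out because they need reference knowledge (synonyms, real-world values), which is the role the knowledge base plays in DQS.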
    • 6. DQS Architecture
    • 7. Installing DQS • SQL Server 2012: BI Edition; Enterprise edition; CU1 • Not installed by default: Client / Server / SSIS task; must run 'DQS Server Installer' post SQL install; do MDS integration • Post install: grant 1 of 3 DQS roles on the DQS_Main db; make your data accessible for SQL operations; enable TCP/IP for remote DQS
    • 8. DQS Components on SQL Server 2012
    • 9. Data Quality Services client interface
    • 10. How to Use DQS? List of Basic Steps • Create/Refine/Use a Knowledge Base • Perform a Data Quality Evaluation • Generate output (results) • List of Components • DQS Server • DQS Client(s)
    • 11. How to Use DQS? Start with the KB. Knowledge Bases • Can use included KB • Can refine included KB • Can create KB from source data • Can manually create KB
    • 12. Parts of DQS KB – Domain Management
    • 13. Adding Domain Values • Correct • Error • Invalid
    • 14. More on Domain Values • Link as synonyms • Set as leading value
    • 15. Regular or Composite Domains
    • 16. More about KB Domain Management • Domain Properties – Description, Language, etc. • Reference Data – relate to 3rd party data • Domain Rules – RegEx/length, etc. (rule-based) • Domain Values – shows substitute values • Term-Based Relations – common word corrections
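The domain management ideas on slides 13 to 16 (value types, synonyms with a leading value, a domain rule, term-based relations) can be sketched as a small lookup in Python. This is only an illustration of the concepts, not the DQS engine or any DQS API; the "State" domain, its values, the regular expression and the helper names `cleanse` and `apply_term_relations` are all hypothetical.

```python
import re

# Hypothetical "State" domain; every value and rule here is illustrative.
DOMAIN = {
    "correct": {"Washington", "Oregon"},
    "errors": {"Washingtn": "Washington"},             # Error value -> correction
    "invalid": {"N/A", "Unknown"},                     # values marked Invalid
    "synonyms": {"WA": "Washington", "OR": "Oregon"},  # synonym -> leading value
    "rule": re.compile(r"^[A-Za-z ]{2,20}$"),          # assumed RegEx/length rule
}

# Hypothetical term-based relations (common word corrections).
TERM_RELATIONS = {"Inc.": "Incorporated", "Co.": "Company"}


def cleanse(value, domain=DOMAIN):
    """Return (output_value, status) for one source value."""
    value = value.strip()
    if value in domain["invalid"]:
        return value, "Invalid"
    if value in domain["correct"]:
        return value, "Correct"
    if value in domain["synonyms"]:
        # Synonyms are replaced by their leading value.
        return domain["synonyms"][value], "Corrected"
    if value in domain["errors"]:
        return domain["errors"][value], "Corrected"
    if not domain["rule"].match(value):
        # Unknown value that also breaks the domain rule.
        return value, "Invalid"
    # Unknown value that passes the rule: a candidate for the KB.
    return value, "New"


def apply_term_relations(text, relations=TERM_RELATIONS):
    """Replace whole words that have a term-based relation."""
    return " ".join(relations.get(word, word) for word in text.split())


for source in ["WA", "Washingtn", "N/A", "Idaho", "12345"]:
    print(source, cleanse(source))
print(apply_term_relations("Contoso Co."))
```

The status names reuse the ones the deck lists for cleansing output (Correct, Corrected, Invalid, New); the order of the checks is a design choice of this sketch.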
    • 17. Parts of DQS KB – Knowledge Discovery
    • 18. Parts of DQS KB – Knowledge Discovery – 1/2 • Step two – Running Discovery
    • 19. Parts of DQS KB – Knowledge Discovery – 2/2 • Step three – Correcting Values
    • 20. DQS KB – Creating a Matching Policy – 1/3 • Select data to be matched for each domain
    • 21. DQS KB – Creating a Matching Policy – 2/3 • Create matching rules per domain • Similar • set similarity score, when matching score < 60 • For numbers, set threshold (% or int) • For dates, set threshold (DD, MM or YY) • Exact – identical values (score of 100) • Configure Weight, must sum to 100 • Can configure Prerequisites
    • 22. DQS KB – Creating a Matching Policy – 3/3 • Test matching rules per domain • Click 'Start' • Review 'Matching Results' tabs to compare one or more results
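The matching rule pieces on slides 20 to 22 (Similar vs. Exact, weights that sum to 100, prerequisites) can be sketched as a weighted score in Python. This is a stand-in, not the DQS matching algorithm: string similarity comes from the standard library's `difflib`, the rule, the 80 minimum matching score and the field names are hypothetical, and treating a similarity below 60 as 0 is only one possible reading of the slide's "matching score < 60" note.

```python
from difflib import SequenceMatcher

# Hypothetical matching rule: (domain, kind, weight). Weights must sum to 100.
RULE = [
    ("name", "similar", 60),
    ("city", "similar", 30),
    ("zip", "exact", 10),
]
PREREQUISITES = ["country"]  # must be identical, otherwise no match at all
MIN_MATCHING_SCORE = 80      # assumed threshold for calling two records a match
SIMILARITY_FLOOR = 60        # assumed reading of the slide's "< 60" note


def similarity(a, b):
    """0-100 string similarity; a difflib stand-in, not the DQS algorithm."""
    score = 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return score if score >= SIMILARITY_FLOOR else 0.0


def matching_score(rec_a, rec_b, rule=RULE, prerequisites=PREREQUISITES):
    """Weighted 0-100 score for a pair of records."""
    if sum(weight for _, _, weight in rule) != 100:
        raise ValueError("weights must sum to 100")
    if any(rec_a[d] != rec_b[d] for d in prerequisites):
        return 0.0
    total = 0.0
    for domain, kind, weight in rule:
        if kind == "exact":
            score = 100.0 if rec_a[domain] == rec_b[domain] else 0.0
        else:
            score = similarity(rec_a[domain], rec_b[domain])
        total += score * weight / 100.0
    return total


a = {"name": "John Smith", "city": "Seattle", "zip": "98101", "country": "US"}
b = {"name": "Jon Smith", "city": "Seattle", "zip": "98101", "country": "US"}
score = matching_score(a, b)
print(round(score, 1), score >= MIN_MATCHING_SCORE)
```

Number and date thresholds from slide 21 are not modeled here; they would replace `similarity` for those domains.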
    • 23. Matching – See Results • Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates
    • 24. More Matching Output
    • 25. Using the DQS KB to do Data Cleaning • Create or Open a Data Quality Project • Map the DQS KB to the new data • Perform Cleansing • Manage / View Results • Export corrected results
    • 26. DQS Project -- Cleansing
    • 27. DQS Cleaning in Process…
    • 28. DQS Cleaning complete
    • 29. DQS Cleaning – Manage Results
    • 30. DQS Output file Information • Export file column names (with option to include "Data and Cleansing Info"): • XXX_Source - original source column value • XXX_Output - clean column value • XXX_Reason - reason column value was either valid or invalid • XXX_Confidence - column confidence percentage returned by the DQS server algorithms • XXX_Status - column processing status (i.e. Correct, New, Invalid, etc.)
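As an illustration of how the export columns on slide 30 might be consumed downstream, here is a small Python sketch that reads a CSV export and picks out rows a person should review. The sample data, the `City` column name, the reason strings, the 80 confidence cut-off and the helper name `rows_needing_review` are all hypothetical; only the `_Source`, `_Output`, `_Reason`, `_Confidence` and `_Status` suffixes come from the slide.

```python
import csv
import io

# Hypothetical export for one mapped column named "City".
EXPORT = """City_Source,City_Output,City_Reason,City_Confidence,City_Status
Seattle,Seattle,Domain value,100,Correct
Seatle,Seattle,Corrected to leading value,92,Corrected
Sea-Tac??,Sea-Tac??,Failed domain rule,0,Invalid
Tukwila,Tukwila,New value,0,New
"""


def rows_needing_review(text, column, min_confidence=80.0):
    """Yield (source, output, status) for rows that need manual attention."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        status = row[f"{column}_Status"]
        confidence = float(row[f"{column}_Confidence"])
        # Review anything invalid or new, plus low-confidence corrections.
        if status in {"Invalid", "New"} or confidence < min_confidence:
            yield row[f"{column}_Source"], row[f"{column}_Output"], status


for source, output, status in rows_needing_review(EXPORT, "City"):
    print(source, output, status)
```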
    • 31. DQS Administration - General
    • 32. DQS Administration – Reference Data
    • 33. DQS Administration - Logging
    • 34. DQS Integration • List of Integration Points • API? – not at this time • SSIS task • MDS (Master Data Management)
    • 35. DQS Cleansing Task in SSIS
    • 36. DQS Cleansing Task in SSIS - mapping For each input column define columns for • Source – contains input values • Output – contains correct or corrected or invalid output values • Status – contains auto suggest, correct, invalid or new
    • 37. Running Package Status
    • 38. DQS SSIS Task
    • 39. DQS SSIS Task Complex Example
    • 40. What is Master Data Management? Defining MDS • Central repository for data • Rule-based • Can work with DQS
    • 41. Types of Data Quality Projects • T-SQL scripts (boolean match): exact matches (WHERE =, WHERE <>, WHERE IN); LIKE (% string matching) • Full-text matching (semantic word match): CONTAINS • Semantic Search (semantic phrase match): SEMANTICSIMILARITYTABLE • SSIS tasks (transactional, multi-valued matching): list below • DQS (KB matching): Knowledge Base - rules/matches; Data Quality project - clean / correct data • MDS (One view of truth): Versioned Entities, Attributes and Rules
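To show what the "boolean match" row on slide 41 means in practice, here is a minimal Python sketch that runs the exact-match and LIKE predicates against an in-memory SQLite table. SQLite stands in for SQL Server only so the example is self-contained; the `customers` table, its data and the helper `ids` are hypothetical. CONTAINS and SEMANTICSIMILARITYTABLE are SQL Server features and are not shown.

```python
import sqlite3

# In-memory database with a hypothetical table of city names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO customers (city) VALUES (?)",
    [("Seattle",), ("Seatle",), ("Portland",), ("Port Land",)],
)


def ids(sql, params=()):
    """Run a query and return the matching ids in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY id", params)]


# Exact matches: a row either satisfies the predicate or it does not.
exact = ids("SELECT id FROM customers WHERE city = ?", ("Seattle",))
not_equal = ids("SELECT id FROM customers WHERE city <> ?", ("Seattle",))
in_list = ids(
    "SELECT id FROM customers WHERE city IN (?, ?)", ("Seattle", "Portland")
)

# LIKE: wildcard string matching, still a yes/no answer per row.
pattern = ids("SELECT id FROM customers WHERE city LIKE ?", ("Sea%",))

print(exact, not_equal, in_list, pattern)
```

Note that the exact match misses the misspelled "Seatle", while the LIKE pattern picks it up only because the prefix happens to agree; neither yields a similarity score, which is the gap the fuzzy SSIS transformations and DQS matching address.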
    • 42. New since RC0 • Use knowledge import from projects back to your knowledge base (KB) with Cleanse2KB • Use the Office speller as part of the DQS client • Use Composite Domain rules: to correct values; to detect rules violations • Import values from Excel: import values together with their synonyms • Use unstructured composite domain values? KB parsing is a new feature that takes advantage of your knowledge for a more accurate parsing • Modify server log settings through the client UI
    • 43. Performance Information
    • 44. Resources • www.Develop.com • DQS Team Blog - here • DQS video – here • DQS on TechNet - here • More samples – here • DQS videos (playlist) - here
    • 45. Next Steps • Install DQS • Create a KB • Try out Data Cleansing
    • 46. Related Session(s) • SQL BI: SQL 366 - Understanding Analysis Services in SQL Server 2012; SQL 422 – Integrating Spreadsheets with Enterprise Data; SQL 245 - Why Data Warehousing Projects Fail
    • 47. www.TeachingKidsProgramming.org • Do a Recipe - Teach a Kid (Ages 10 ++) • Microsoft SmallBasic - Free Courseware (recipes)
    • 48. Keep up with Data • Follow me @LynnLangit • RSS my blog • Hire me: to help build your BI/Big Data solution; to teach your team next gen BI with SQL Server 2012