SQL 2012 DQS

deck from DevTeach V

Published in: Technology
  • SQL Server 2012 learning resources - http://www.microsoft.com/sqlserver/en/us/learning-center/resources.aspx
    More about PowerView - http://technet.microsoft.com/en-us/library/hh213579(v=sql.110).aspx
    Excel add-in for MDS - http://www.microsoft.com/download/en/details.aspx?id=28149
    More about column store index - http://msdn.microsoft.com/en-us/library/gg492088(v=SQL.110).aspx
    New data type (enhancement to filestream) – Filetable - http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx#Description
    Also enhancements to Full-text indexing (adding ability to search file metadata for .pdf, etc.)
    Also CDC support for Oracle
  • MSDN- what is DQS? - http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx
  • Both
  • http://msdn.microsoft.com/en-us/sqlserver/hh323828.aspx
    CU1 - http://blogs.msdn.com/b/dqs/archive/2012/04/17/significant-performance-enhancements-in-dqs-with-the-cumulative-update-1-cu1-release-for-sql-server-2012.aspx
  • Both
  • Both
  • Both
  • Both
  • Both
  • Both
  • Both
  • More about Matching policies - http://blogs.msdn.com/b/dqs/archive/2011/11/02/matching-policy-a-closer-look-into-data-quality-services-data-matching.aspx
  • Both
  • About MDS integration (from MSDN): Data Quality Processes in Master Data Services. DQS functionality has been integrated into Master Data Services (MDS), so you can perform de-duplication on source data and master data within MDS workflows. Matching is included in the Microsoft SQL Server 2012 Master Data Services Add-in for Microsoft Excel. To perform matching, the data must be in an Excel spreadsheet. The Data Quality Server components must be installed with MDS.
  • About the Data cleansing task in SSIS - http://msdn.microsoft.com/en-us/library/ee677619(v=sql.110).aspx
    Also from DQS team blog (on SSIS DQS task) - http://blogs.msdn.com/b/dqs/archive/2011/07/18/using-the-ssis-dqs-cleansing-component.aspx
    Also a video on the same - http://msdn.microsoft.com/en-us/sqlserver/hh323819.aspx
  • On the advanced tab, you can enable output of additional information, such as confidence score, reason for correction, etc…
  • MSDN – how to configure output - http://msdn.microsoft.com/en-us/library/ee677612(v=sql.110).aspx
  • Output is Correct -or- Corrected -or- ToBeCorrectedManually
  • Both
  • SSIS Tasks:
    Lookup transformation - (this for that, substitutions)
    Cache transformation - (multiple lookups)
    Fuzzy Lookup - (lookup based on threshold matching)
    Fuzzy Grouping - (grouping based on thresholds)
    Data Mining Query - (based on mining model algorithms)
    DQS Cleansing - (uses a KB)
  • http://blogs.msdn.com/b/dqs/archive/2011/11/29/sql-server-2012-rc0-what-s-new-in-dqs.aspx
  • From the DQS Team Performance whitepaper - http://www.microsoft.com/download/en/details.aspx?id=29075
    Also they note suggested hardware and note the following:
    Recommend using SSD drives
    Warn on use of tempdb, recommend monitoring
  • TechEd Video - http://www.youtube.com/watch?v=jfDVG8Nf8No
  • Teach from www.TeachingKidsProgramming.org – Donate at www.MonaFoundation.org

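The SSIS notes above contrast the Lookup transformation (exact substitutions) with Fuzzy Lookup (threshold matching). A minimal Python sketch of that difference follows. It is not SSIS code: the reference table, the function names, and the 0.8 threshold are made up for illustration, and `difflib` stands in for whatever similarity measure SSIS uses internally.

```python
from difflib import get_close_matches

# Hypothetical reference data: code -> full name.
REFERENCE = {"WA": "Washington", "OR": "Oregon", "CA": "California"}


def exact_lookup(value, table=REFERENCE):
    """Lookup-style substitution: the key must match exactly."""
    return table.get(value)


def fuzzy_lookup(value, candidates, threshold=0.8):
    """Fuzzy-Lookup-style match: accept the best candidate at or above a threshold."""
    matches = get_close_matches(value, candidates, n=1, cutoff=threshold)
    return matches[0] if matches else None


states = list(REFERENCE.values())
print(exact_lookup("WA"))
print(fuzzy_lookup("Washingtn", states))
print(fuzzy_lookup("Texas", states))
```

An exact lookup returns nothing for a misspelled key, while the fuzzy version tolerates small typos at the cost of having to choose a threshold.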
    1. 1. Data Quality Services @LynnLangit
    2. 2. Breakthrough Insights = Better BI
    3. 3. What is Data Quality Services? A set of tools and services that allow domain experts to improve Data Quality • Produces result set with suggested improvements • Does NOT change source data
    4. 4. Why Use DQS? • SME input: manually define, match, cleanse • Machine cleansing: programmatically "", then manually approve; can 'learn'; can incorporate 3rd party data • Integration: can integrate with other data processes (SSIS)
    5. 5. When to use DQS (scenarios) • Issue: Detail • Completeness: Is all information present? • Conformity: Is all data in the correct format? • Consistency: Do values represent the same meaning? • Accuracy: Do data objects represent their real-world values? • Validity: Do data values fall within acceptable ranges? • Duplication: Are there multiple copies of the same data?
    6. 6. DQS Architecture
    7. 7. Installing DQS • SQL Server 2012: BI Edition, Enterprise edition, DQS CU1 • Not installed by default: Client / Server / SSIS task; must run 'DQS Server Installer' post SQL install • Post install: grant 1 of 3 DQS roles on the DQS_Main db; do MDS integration; make your data accessible for SQL operations; enable TCP/IP for remote DQS
    8. 8. DQS Components on SQL Server 2012
    9. 9. Data Quality Services client interface
    10. 10. How to Use DQS? • List of Basic Steps: Create/Refine/Use a Knowledge Base • Perform a Data Quality Evaluation • Generate output (results) • List of Components: DQS Server • DQS Client(s)
    11. 11. How to Use DQS? Start with the KB • Knowledge Bases • Can use included KB • Can refine included KB • Can create KB from source data • Can manually create KB
    12. 12. Parts of DQS KB – Domain Management
    13. 13. Adding Domain Values • Correct • Error • Invalid
    14. 14. More on Domain Values • Link as synonyms • Set as leading value
    15. 15. Regular or Composite Domains
    16. 16. More about KB Domain Management • Domain Properties – Description, Language, etc. • Reference Data – relate to 3rd party data • Domain Rules – rule-based (RegEx, length, etc.) • Domain Values – shows substitute values • Term-Based Relations – common word corrections
    17. 17. Parts of DQS KB – Knowledge Discovery
    18. 18. Parts of DQS – Knowledge Discovery – 1/2 • Step two – Running Discovery
    19. 19. Parts of DQS KB – Knowledge Discovery – 2/2 • Step three – Correcting Values
    20. 20. DQS KB – Creating a Matching Policy – 1/3 • Select data to be matched for each domain
    21. 21. DQS KB – Creating a Matching Policy – 2/3 • Create matching rules per domain • Similar: the similarity score is set to 0 when the matching score < 60 • For numbers, set threshold (% or int) • For dates, set threshold (DD, MM or YY) • Exact: identical values (score of 100) • Configure Weight, must sum to 100 • Can configure Prerequisites
    22. 22. DQS KB – Creating a Matching Policy – 3/3 • Test matching rules per domain • Click 'Start' • Review 'Matching Results' tabs to compare one or more results
    23. 23. Matching – See Results • Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates
    24. 24. More Matching Output
    25. 25. Using the DQS KB to do Data Cleaning • Create or Open a Data Quality Project • Map the DQS KB to the new data • Perform Cleansing • Manage / View Results • Export corrected results
    26. 26. DQS Project -- Cleansing
    27. 27. DQS Cleaning in Process…
    28. 28. DQS Cleaning complete
    29. 29. DQS Cleaning – Manage Results
    30. 30. DQS Output file Information • Export file column names (with option to include "Data and Cleansing Info") • XXX_Source - original source column value • XXX_Output - clean column value • XXX_Reason - reason column value was either valid or invalid • XXX_Confidence - column confidence percentage returned by the DQS server algorithms • XXX_Status - column processing status (i.e. Correct, New, Invalid, etc.)
    31. 31. DQS Administration - General
    32. 32. DQS Administration – Reference Data
    33. 33. DQS Administration - Logging
    34. 34. DQS Integration • List of Integration Points • API? – not at this time • SSIS task • MDS (Master Data Services)
    35. 35. DQS Cleansing Task in SSIS
    36. 36. DQS Cleansing Task in SSIS - mapping • For each input column, define columns for: • Source – contains input values • Output – contains correct or corrected or invalid output values • Status – contains auto suggest, correct, invalid or new
    37. 37. Running Package Status
    38. 38. DQS SSIS Task
    39. 39. DQS SSIS Task Complex Example
    40. 40. What is Master Data Management? Defining MDS • Central repository for data • Rule-based • Can work with DQS
    41. 41. Types of Data Quality Projects • T-SQL scripts (boolean match): Exact matches (WHERE =, WHERE <>, WHERE IN); LIKE (% string matching) • Full-text matching (semantic word match): CONTAINS • Semantic Search (semantic phrase match): SEMANTICSIMILARITYTABLE • SSIS tasks (transactional, multi-valued matching): List below • DQS (KB matching): Knowledge Base - rules/matches; Data Quality project - clean / correct data • MDS (One view of truth): Versioned Entities, Attributes and Rules
    42. 42. New since RC0 • Use knowledge import from projects back to your knowledge base (KB) with Cleanse2KB • Use the Office speller as part of the DQS client • Use Composite Domain rules to correct values and to detect rules violations • Import values from Excel, together with their synonyms • Use unstructured composite domain values? KB parsing is a new feature that takes advantage of your knowledge for a more accurate parsing • Modify server log settings through the client UI
    43. 43. Performance Information
    44. 44. Resources • www.Develop.com • DQS Team Blog - here • DQS video – here • DQS on TechNet - here • More samples – here • DQS videos (playlist) - here
    45. 45. Next Steps • Install DQS • Create a KB • Try out Data Cleansing
    46. 46. Related Session(s) • SQL BI • SQL 366 - Understanding Analysis Services in SQL Server 2012 • SQL 422 – Integrating Spreadsheets with Enterprise Data • SQL 245 - Why Data Warehousing Projects Fail
    47. 47. www.TeachingKidsProgramming.org • Do a Recipe • Teach a Kid (Ages 10 ++) • Microsoft SmallBasic • Free Courseware (recipes)
    48. 48. Keep up with Data Follow me @LynnLangit RSS my blog www.LynnLangit.com Hire me • To help build your BI/Big Data solution • To teach your team next gen BI with SQL Server 2012
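
The matching rules on slide 21 (per-domain Similar or Exact comparisons, weights that sum to 100) can be illustrated with a small Python sketch. This is not DQS code: the rule set, the field names, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, since DQS uses its own algorithms. It also assumes that a Similar comparison scoring under 60 contributes 0.

```python
from difflib import SequenceMatcher

# Hypothetical rule set: domain -> (comparison type, weight).
RULES = {
    "name": ("similar", 60),
    "city": ("similar", 20),
    "zip": ("exact", 20),
}


def field_score(kind, a, b):
    """Return a 0-100 score for one domain."""
    if kind == "exact":
        return 100.0 if a == b else 0.0
    # Stand-in similarity measure (assumption, not the DQS algorithm).
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100
    # Assumed cut-off: a Similar score under 60 counts as no similarity.
    return score if score >= 60 else 0.0


def matching_score(rec_a, rec_b, rules=RULES):
    """Combine per-domain scores using weights that must sum to 100."""
    if sum(weight for _, weight in rules.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(
        field_score(kind, rec_a[domain], rec_b[domain]) * weight / 100
        for domain, (kind, weight) in rules.items()
    )


a = {"name": "Jon Smith", "city": "Seattle", "zip": "98101"}
b = {"name": "John Smith", "city": "Seattle", "zip": "98101"}
c = {"name": "Mary Jones", "city": "Portland", "zip": "97201"}
print(round(matching_score(a, b), 1))
print(round(matching_score(a, c), 1))
```

Records whose combined score clears a chosen minimum would be treated as probable duplicates. Changing the weights shows how much each domain influences the result.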

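The export columns on slide 30 (XXX_Source, XXX_Output, XXX_Reason, XXX_Confidence, XXX_Status) lend themselves to a quick review pass after cleansing. The sketch below assumes a CSV export for a hypothetical "City" domain. The sample rows, the reason strings, the confidence scale (0 to 100), and the 80 threshold are invented for illustration, so check them against a real export before relying on them.

```python
import csv
import io

# Invented sample data shaped like the export columns on slide 30.
SAMPLE_EXPORT = """City_Source,City_Output,City_Reason,City_Confidence,City_Status
Seatle,Seattle,Corrected to leading value,90,Corrected
Seattle,Seattle,Domain value,100,Correct
Xyzzy,Xyzzy,Not in knowledge base,0,New
"""


def rows_needing_review(rows, domain, min_confidence=80.0):
    """Return rows whose status is not accepted or whose confidence is low."""
    accepted = {"Correct", "Corrected"}
    flagged = []
    for row in rows:
        status = row[f"{domain}_Status"]
        confidence = float(row[f"{domain}_Confidence"])
        if status not in accepted or confidence < min_confidence:
            flagged.append(row)
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
review = rows_needing_review(rows, "City")
for row in review:
    print(row["City_Source"], "->", row["City_Output"], row["City_Status"])
```

Raising `min_confidence` sends more automatically corrected values to a domain expert for manual approval.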