SQL 2012 DQS
deck from DevTeach V

Notes
  • SQL Server 2012 learning resources - http://www.microsoft.com/sqlserver/en/us/learning-center/resources.aspx
    More about Power View - http://technet.microsoft.com/en-us/library/hh213579(v=sql.110).aspx
    Excel add-in for MDS - http://www.microsoft.com/download/en/details.aspx?id=28149
    More about columnstore index - http://msdn.microsoft.com/en-us/library/gg492088(v=SQL.110).aspx
    New feature (enhancement to FILESTREAM): FileTable - http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx#Description
    Also enhancements to full-text indexing (adding the ability to search file metadata for .pdf, etc.)
    Also CDC support for Oracle
  • MSDN- what is DQS? - http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx
  • Both
  • http://msdn.microsoft.com/en-us/sqlserver/hh323828.aspx
    CU1 - http://blogs.msdn.com/b/dqs/archive/2012/04/17/significant-performance-enhancements-in-dqs-with-the-cumulative-update-1-cu1-release-for-sql-server-2012.aspx
  • Both
  • Both
  • Both
  • Both
  • Both
  • Both
  • Both
  • More about Matching policies - http://blogs.msdn.com/b/dqs/archive/2011/11/02/matching-policy-a-closer-look-into-data-quality-services-data-matching.aspx
  • Both
  • About MDS integration (from MSDN): Data Quality Processes in Master Data Services
    DQS functionality has been integrated into Master Data Services (MDS), so you can perform de-duplication on source data and master data within MDS workflows. Matching is included in the Microsoft SQL Server 2012 Master Data Services Add-in for Microsoft Excel. To perform matching, the data must be in an Excel spreadsheet. The Data Quality Server components must be installed with MDS.
  • About the Data cleansing task in SSIS - http://msdn.microsoft.com/en-us/library/ee677619(v=sql.110).aspx
    Also from the DQS team blog (on the SSIS DQS task) - http://blogs.msdn.com/b/dqs/archive/2011/07/18/using-the-ssis-dqs-cleansing-component.aspx
    Also a video on the same - http://msdn.microsoft.com/en-us/sqlserver/hh323819.aspx
  • On the advanced tab, you can enable output of additional information, such as confidence score, reason for correction, etc…
  • MSDN – how to configure output - http://msdn.microsoft.com/en-us/library/ee677612(v=sql.110).aspx
  • Output is Correct –or- Corrected –or- ToBeCorrectedManually
  • Both
  • SSIS Tasks
    Lookup transformation - (this for that, substitutions)
    Cache transformation - (multiple lookups)
    Fuzzy Lookup - (lookup based on threshold matching)
    Fuzzy Grouping - (grouping based on thresholds)
    Data Mining Query - (based on mining model algorithms)
    DQS Cleansing - (uses a KB)
  • http://blogs.msdn.com/b/dqs/archive/2011/11/29/sql-server-2012-rc0-what-s-new-in-dqs.aspx
  • From the DQS Team Performance whitepaper - http://www.microsoft.com/download/en/details.aspx?id=29075
    They also suggest hardware and note the following:
    Recommend using SSD drives
    Warn on use of tempdb, recommend monitoring
  • TechEd Video - http://www.youtube.com/watch?v=jfDVG8Nf8No
  • Teach from www.TeachingKidsProgramming.org – Donate at www.MonaFoundation.org
  • Transcript

    • 1. Data Quality Services @LynnLangit
    • 2. Breakthrough Insights = Better BI
    • 3. What is Data Quality Services? A set of tools and services that allow domain experts to improve Data Quality • Produces result set with suggested improvements • Does NOT change source data
    • 4. Why Use DQS?
        SME input • Manually define, match, cleanse
        Machine • Programmatically define, match, cleanse, then manually approve • Can 'learn'
        Cleansing • Can incorporate 3rd-party data
        Integration • Can integrate with other data processes (SSIS)
    • 5. When to use DQS (scenarios)
        Issue: Detail
        Completeness: Is all information present?
        Conformity: Is all data in the correct format?
        Consistency: Do values represent the same meaning?
        Accuracy: Do data objects represent their real-world values?
        Validity: Do data values fall within acceptable ranges?
        Duplication: Are there multiple copies of the same data?
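As a rough illustration of four of these checks, here is a minimal Python sketch. It is not DQS code; consistency and accuracy usually need reference data, so they are left out. The records, the 5-digit ZIP pattern and the 0 to 120 age range are all hypothetical.

```python
import re

# Hypothetical customer records, for illustration only.
records = [
    {"id": 1, "name": "Contoso", "zip": "98052", "age": 34},
    {"id": 2, "name": "", "zip": "9805", "age": 34},
    {"id": 3, "name": "Contoso", "zip": "98052", "age": 34},
    {"id": 4, "name": "Fabrikam", "zip": "10001", "age": 212},
]

ZIP_PATTERN = re.compile(r"\d{5}")  # assumed format: 5-digit US ZIP code


def completeness(rows):
    """IDs of rows with a missing (empty) name."""
    return [r["id"] for r in rows if not r["name"]]


def conformity(rows):
    """IDs of rows whose ZIP code is not in the expected format."""
    return [r["id"] for r in rows if not ZIP_PATTERN.fullmatch(r["zip"])]


def validity(rows, low=0, high=120):
    """IDs of rows whose age falls outside an assumed acceptable range."""
    return [r["id"] for r in rows if not low <= r["age"] <= high]


def duplication(rows):
    """Pairs of (first id, duplicate id) for rows with identical values."""
    seen, dupes = {}, []
    for r in rows:
        key = (r["name"], r["zip"], r["age"])
        if key in seen:
            dupes.append((seen[key], r["id"]))
        else:
            seen[key] = r["id"]
    return dupes


print(completeness(records))  # [2]
print(conformity(records))    # [2]
print(validity(records))      # [4]
print(duplication(records))   # [(1, 3)]
```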
    • 6. DQS Architecture
    • 7. Installing DQS
        SQL Server 2012: Not installed by default • Client / Server / SSIS task • BI Edition • Enterprise edition • CU1 for DQS
        Post install: Must run 'DQS Server Installer' after SQL install • Grant 1 of 3 DQS roles on the DQS_Main db • Do MDS integration • Make your data accessible for SQL operations • Enable TCP/IP for remote DQS
    • 8. DQS Components on SQL Server 2012
    • 9. Data Quality Services client interface
    • 10. How to Use DQS? List of basic steps • Create/Refine/Use a Knowledge Base • Perform a Data Quality Evaluation • Generate output (results); List of components • DQS Server • DQS Client(s)
    • 11. How to Use DQS? Start with the KB. Knowledge Bases • Can use included KB • Can refine included KB • Can create KB from source data • Can manually create KB
    • 12. Parts of DQS KB – Domain Management
    • 13. Adding Domain Values • Correct • Error • Invalid
    • 14. More on Domain Values • Link as synonyms • Set as leading value
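To make the value types concrete, the sketch below models one domain as a Python dictionary: Correct values (optionally synonyms pointing at a leading value), Error values with a correction, and Invalid values. It is an illustration only, not how DQS stores a KB; the "State" domain, its values and the status labels returned are assumptions modeled on the slide wording.

```python
# Hypothetical "State" domain. Each known value is Correct, Error (with a
# correction) or Invalid; a Correct value may be a synonym of a leading value.
DOMAIN_VALUES = {
    "WA": {"type": "Correct"},
    "Washington": {"type": "Correct", "leading": "WA"},  # synonym; "WA" leads
    "Wash.": {"type": "Error", "correct_to": "WA"},
    "XX": {"type": "Invalid"},
}


def resolve(value):
    """Return (output value, status) for one input value (assumed labels)."""
    entry = DOMAIN_VALUES.get(value)
    if entry is None:
        return value, "New"  # value is not in the domain yet
    if entry["type"] == "Invalid":
        return value, "Invalid"
    if entry["type"] == "Error":
        return entry["correct_to"], "Corrected"
    # Correct value: a synonym is replaced by its leading value.
    leading = entry.get("leading", value)
    return leading, ("Correct" if leading == value else "Corrected")


for v in ["WA", "Washington", "Wash.", "XX", "Oregon"]:
    print(v, "->", resolve(v))
```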
    • 15. Regular or Composite Domains
    • 16. More about KB Domain Management • Domain Properties – Description, Language… • Reference Data – relate to 3rd-party data • Domain Rules – RegEx/length, etc. (rule-based) • Domain Values – shows substitute values • Term-Based Relations – common word corrections
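A term-based relation rewrites a term inside a value, while a domain rule accepts or rejects the whole value. The Python sketch below shows both ideas side by side. It is not the DQS rule engine; the relations table and the regex/length rule are hypothetical examples.

```python
import re

# Hypothetical term-based relations: whole-term replacements inside a value.
TERM_RELATIONS = {"Inc.": "Incorporated", "Corp.": "Corporation", "Co.": "Company"}

# Hypothetical domain rule: 2 to 50 characters, letters/digits/spaces/'&' only.
RULE = re.compile(r"[A-Za-z0-9 &]{2,50}")


def apply_term_relations(value):
    """Replace each whitespace-separated term that has a relation."""
    return " ".join(TERM_RELATIONS.get(term, term) for term in value.split())


def passes_rule(value):
    """True if the whole value satisfies the domain rule."""
    return RULE.fullmatch(value) is not None


raw = "Contoso Inc."
fixed = apply_term_relations(raw)
print(fixed, passes_rule(raw), passes_rule(fixed))  # Contoso Incorporated False True
```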
    • 17. Parts of DQS KB – Knowledge Discovery
    • 18. Parts of DQS KB – Knowledge Discovery – 1/2: Step two – Running Discovery
    • 19. Parts of DQS KB – Knowledge Discovery – 2/2: Step three – Correcting Values
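Conceptually, discovery profiles the source column and proposes corrections for you to review in step three. The sketch below approximates that with value counts plus `difflib.get_close_matches`, suggesting a frequent value for each rare, similar-looking one. This is a swapped-in technique, not the DQS discovery algorithm; the sample column, `min_count` and `cutoff` are assumptions.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical source column.
source_column = ["Seattle", "Seattle", "Seattle", "Seatle", "Redmond", "Redmond", "Redmon"]


def discover(values, min_count=2, cutoff=0.8):
    """Count distinct values and suggest a frequent value for each rare one."""
    counts = Counter(values)
    frequent = [v for v, c in counts.items() if c >= min_count]
    suggestions = {}
    for value, count in counts.items():
        if count < min_count:
            match = get_close_matches(value, frequent, n=1, cutoff=cutoff)
            if match:
                suggestions[value] = match[0]  # candidate Error -> correction
    return counts, suggestions


counts, suggestions = discover(source_column)
print(suggestions)  # {'Seatle': 'Seattle', 'Redmon': 'Redmond'}
```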
    • 20. DQS KB – Creating a Matching Policy – 1/3 • Select data to be matched for each domain
    • 21. DQS KB – Creating a Matching Policy – 2/3 • Create matching rules per domain • Similar • set similarity score, when matching score < 60 • For numbers, set threshold (% or int) • For dates, set threshold (DD, MM or YY) • Exact – identical values (score of 100) • Configure weights, which must sum to 100 • Can configure prerequisites
    • 22. DQS KB – Creating a Matching Policy – 3/3 • Test matching rules per domain • Click 'Start' • Review 'Matching Results' tabs to compare one or more results
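The weighting idea can be sketched as a weighted sum of per-domain similarity scores. Similar domains contribute their similarity, exact domains contribute 100 or 0, and a prerequisite domain must be identical or the pair is not a match. In the sketch, `difflib.SequenceMatcher` stands in for DQS's own similarity function, and the policy, the 80 threshold and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy; weights of non-prerequisite domains sum to 100.
POLICY = {
    "name": {"kind": "similar", "weight": 70},
    "city": {"kind": "exact", "weight": 30},
    "country": {"kind": "prerequisite"},
}
MIN_MATCHING_SCORE = 80  # assumed threshold for treating a pair as a match


def similarity(a, b):
    """0-100 similarity; SequenceMatcher is a stand-in, not the DQS algorithm."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_score(r1, r2, policy=POLICY):
    weights = [rule["weight"] for rule in policy.values() if "weight" in rule]
    if sum(weights) != 100:
        raise ValueError("domain weights must sum to 100")
    score = 0.0
    for domain, rule in policy.items():
        a, b = r1[domain], r2[domain]
        if rule["kind"] == "prerequisite":
            if a != b:
                return 0.0  # prerequisite failed: not a match
        elif rule["kind"] == "exact":
            score += rule["weight"] * (100 if a == b else 0) / 100
        else:
            score += rule["weight"] * similarity(a, b) / 100
    return score


r1 = {"name": "Contoso Ltd", "city": "Seattle", "country": "US"}
r2 = {"name": "Contoso Ltd.", "city": "Seattle", "country": "US"}
print(matching_score(r1, r2) >= MIN_MATCHING_SCORE)  # True
```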
    • 23. Matching – See Results: Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates
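Once pairs of records are judged to match, duplicates are usually grouped into clusters so one record per cluster can be kept. The sketch below does that with a simple union-find over pairwise matches. It only illustrates the grouping step: the `is_match` helper, its 0.75 threshold and the sample names are hypothetical, and the matching itself is not DQS's.

```python
from difflib import SequenceMatcher

# Hypothetical, already-cleansed company names.
names = ["Contoso Ltd", "Contoso Ltd.", "Fabrikam", "Fabrikam Inc", "Northwind"]


def is_match(a, b, threshold=0.75):
    """Hypothetical pairwise match test based on string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster(values):
    """Group values into clusters of mutual (transitive) matches."""
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if is_match(values[i], values[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


print(cluster(names))
```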
    • 24. More Matching Output
    • 25. Using the DQS KB to do Data Cleaning • Create or open a Data Quality Project • Map the DQS KB to the new data • Perform cleansing • Manage / view results • Export corrected results
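The project steps can be sketched in a few lines: map source columns to KB domains, look up each value, and emit per-column source, output and status fields. This is a toy model, not the DQS client; the KB contents, the `cust_city`/`cust_state` column names and the status labels are hypothetical.

```python
# Hypothetical KB: domain -> {known value: corrected value, or None if invalid}.
KB = {
    "City": {"Seattle": "Seattle", "Seatle": "Seattle", "Nowhere": None},
    "State": {"WA": "WA", "Wash.": "WA"},
}

# Map the KB domains to the new data (hypothetical column names).
COLUMN_TO_DOMAIN = {"cust_city": "City", "cust_state": "State"}


def cleanse_value(domain, value):
    """Return (output value, status) for one value in one domain."""
    known = KB[domain]
    if value not in known:
        return value, "New"
    corrected = known[value]
    if corrected is None:
        return value, "Invalid"
    return corrected, ("Correct" if corrected == value else "Corrected")


def cleanse(rows):
    """Cleanse every mapped column, keeping source, output and status."""
    results = []
    for row in rows:
        out = {}
        for column, domain in COLUMN_TO_DOMAIN.items():
            value, status = cleanse_value(domain, row[column])
            out[f"{column}_Source"] = row[column]
            out[f"{column}_Output"] = value
            out[f"{column}_Status"] = status
        results.append(out)
    return results


rows = [
    {"cust_city": "Seatle", "cust_state": "WA"},
    {"cust_city": "Nowhere", "cust_state": "Wash."},
    {"cust_city": "Tacoma", "cust_state": "WA"},
]
for result in cleanse(rows):
    print(result)
```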
    • 26. DQS Project -- Cleansing
    • 27. DQS Cleaning in Process…
    • 28. DQS Cleaning complete
    • 29. DQS Cleaning – Manage Results
    • 30. DQS Output File Information: export file column names (with the option to include "Data and Cleansing Info")
        XXX_Source - original source column value
        XXX_Output - clean column value
        XXX_Reason - reason the column value was either valid or invalid
        XXX_Confidence - column confidence percentage returned by the DQS server algorithms
        XXX_Status - column processing status (e.g. Correct, New, Invalid, etc.)
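A quick way to triage an exported file is to count rows per status and list low-confidence rows for manual review. The sketch below reads a CSV with the column-name pattern listed above, using the standard `csv` module. The sample rows, the "City" prefix, the 0 to 100 confidence scale and the 80 review threshold are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical export with "Data and Cleansing Info" for a City column.
EXPORT = """City_Source,City_Output,City_Reason,City_Confidence,City_Status
Seattle,Seattle,Domain value,100,Correct
Seatle,Seattle,DQS cleansing,92,Corrected
Redmnd,Redmond,DQS cleansing,71,Auto suggest
Atlantis,Atlantis,New value,0,New
"""


def summarize(text, prefix, review_below=80.0):
    """Count statuses and collect rows whose confidence is below a threshold."""
    status_counts = Counter()
    needs_review = []
    for row in csv.DictReader(io.StringIO(text)):
        status_counts[row[f"{prefix}_Status"]] += 1
        if float(row[f"{prefix}_Confidence"]) < review_below:
            needs_review.append((row[f"{prefix}_Source"], row[f"{prefix}_Output"]))
    return status_counts, needs_review


counts, review = summarize(EXPORT, "City")
print(counts)
print(review)  # [('Redmnd', 'Redmond'), ('Atlantis', 'Atlantis')]
```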
    • 31. DQS Administration - General
    • 32. DQS Administration – Reference Data
    • 33. DQS Administration - Logging
    • 34. DQS Integration: List of integration points • API? – not at this time • SSIS task • MDS (Master Data Services)
    • 35. DQS Cleansing Task in SSIS
    • 36. DQS Cleansing Task in SSIS - mapping For each input column define columns for • Source – contains input values • Output – contains correct or corrected or invalid output values • Status – contains auto suggest, correct, invalid or new
    • 37. Running Package Status
    • 38. DQS SSIS Task
    • 39. DQS SSIS Task Complex Example
    • 40. What is Master Data Management? Defining MDS • Central repository for data • Rule-based • Can work with DQS
    • 41. Types of Data Quality Projects
        T-SQL scripts (boolean match) • Exact matches (WHERE =, WHERE <>, WHERE IN) • LIKE (%string matching)
        Full-text matching (semantic word match) • CONTAINS
        Semantic Search (semantic phrase match) • SEMANTICSIMILARITYTABLE
        SSIS tasks (transactional, multi-valued matching) • List below
        DQS (KB matching) • Knowledge Base - rules/matches • Data Quality project - clean / correct data
        MDS (one view of truth) • Versioned Entities, Attributes and Rules
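The difference between boolean, pattern and similarity matching shows up on even a tiny list. The Python sketch below contrasts an exact comparison (like WHERE =), a translation of LIKE's `%` and `_` wildcards into a regex, and a similarity threshold. It mimics the behavior only: case-insensitive LIKE assumes a case-insensitive collation, bracket wildcards are not handled, and the `difflib` similarity is a stand-in for fuzzy/KB matching, not the SSIS or DQS algorithm. Sample values are hypothetical.

```python
import re
from difflib import SequenceMatcher

values = ["Contoso", "Contoso Ltd", "Kontoso", "Fabrikam"]


def exact(values, target):
    """Boolean match, like WHERE col = target."""
    return [v for v in values if v == target]


def like(values, pattern):
    """Approximate T-SQL LIKE: % -> any run, _ -> one character ([..] not handled)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return [v for v in values if re.fullmatch(regex, v, re.IGNORECASE)]


def fuzzy(values, target, threshold=0.8):
    """Similarity match above a threshold (stand-in for fuzzy/KB matching)."""
    return [
        v for v in values
        if SequenceMatcher(None, v.lower(), target.lower()).ratio() >= threshold
    ]


print(exact(values, "Contoso"))   # ['Contoso']
print(like(values, "Contoso%"))   # ['Contoso', 'Contoso Ltd']
print(fuzzy(values, "Contoso"))   # ['Contoso', 'Kontoso']
```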
    • 42. New since RC0
        • Use knowledge import from projects back to your knowledge base (KB) with Cleanse2KB
        • Use the Office speller as part of the DQS client
        • Use Composite Domain rules: to correct values; to detect rule violations
        • Import values from Excel: import values together with their synonyms
        • Use unstructured composite domain values? KB parsing is a new feature that takes advantage of your knowledge for more accurate parsing
        • Modify server log settings through the client UI
    • 43. Performance Information
    • 44. Resources • www.Develop.com • DQS Team Blog - here • DQS video - here • DQS on TechNet - here • More samples - here • DQS videos (playlist) - here
    • 45. Next Steps • Install DQS • Create a KB • Try out Data Cleansing
    • 46. Related Session(s) • SQL BI: SQL 366 - Understanding Analysis Services in SQL Server 2012; SQL 422 - Integrating Spreadsheets with Enterprise Data; SQL 245 - Why Data Warehousing Projects Fail
    • 47. www.TeachingKidsProgramming.org • Do a Recipe, Teach a Kid (Ages 10+) • Microsoft Small Basic • Free Courseware (recipes)
    • 48. Keep up with Data Follow me @LynnLangit RSS my blog www.LynnLangit.com Hire me • To help build your BI/Big Data solution • To teach your team next gen BI with SQL Server 2012