An introduction to Data Quality Services (DQS)

Speaker: Koen Verbeeck

Download SQL Server 2012: http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx

Presentation Transcript

  • AN INTRODUCTION TO DATA QUALITY SERVICES
    Koen Verbeeck, BI consultant
  • WHO AM I
    • BI consultant @ Ordina
    • member of SQLUG.be
    • MCTS, MCITP in SQL Server 2008
    • working with Microsoft BI for over 2 years
    • beer and comic books enthusiast
    • married with children…
  • INTRODUCTION
    Data quality?
    Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). (Wikipedia on Data Quality)
    • achieved through people, technology & processes
    • can be measured along various dimensions
      o accuracy
      o consistency
      o completeness
      o duplicates (uniqueness)
      o timeliness
      o validity
    • bad data = bad business
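To make the measurable dimensions concrete, here is a minimal Python sketch that scores three of them (completeness, uniqueness and validity) as simple ratios over a handful of rows. It is not how DQS computes its statistics; the field names, the sample rows and the simplified e-mail pattern are assumptions made for illustration.

```python
import re

# Hypothetical customer rows; field names and values are illustrative only.
records = [
    {"id": 1, "last_name": "Doe", "zip": "10001", "email": "john.doe@hotmail.com"},
    {"id": 2, "last_name": "", "zip": "99999", "email": "john.doe@hotmail"},
    {"id": 3, "last_name": "Smith", "zip": "10001", "email": "smith@example.com"},
    {"id": 3, "last_name": "Smith", "zip": "", "email": "smith@example.com"},
]

# Simplified e-mail pattern (an assumption, not a full RFC 5322 check).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field, placeholders=("",)):
    """Share of rows whose field holds a real value: not None, blank or a placeholder."""
    filled = 0
    for row in rows:
        value = row.get(field)
        if value is not None and value not in placeholders:
            filled += 1
    return filled / len(rows)


def uniqueness(rows, key):
    """Share of distinct key values; 1.0 means no duplicate keys."""
    return len({row[key] for row in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of rows whose field matches the expected pattern."""
    return sum(1 for row in rows if pattern.match(row.get(field) or "")) / len(rows)


print(completeness(records, "last_name"))           # 0.75
print(completeness(records, "zip", ("", "99999")))  # 0.5
print(uniqueness(records, "id"))                    # 0.75
print(validity(records, "email", EMAIL_PATTERN))    # 0.75
```

Treating "99999" as a placeholder mirrors the zip-code example in the next slide: a value can be present and still count as missing.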
  • INTRODUCTION
    Data Quality Issue | Description | Sample Data Problem
    Standard | Are data elements consistently defined and understood? | Gender code = M, F, U in one system and gender code = 0, 1, 2 in another system
    Complete | Is all necessary data present? | 20% of customers' last names are blank; 50% of zip codes are 99999
    Accurate | Does the data accurately represent reality or a verifiable source? | A supplier is listed as 'Active' but went out of business six years ago
    Valid | Do data values fall within acceptable ranges? | Temperature recordings should be between -100°C and +100°C
    Unique | Does the same data appear several times? | Prince, The Artist formerly known as Prince, The Artist, … are they the same person?
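The "Standard" and "Valid" rows translate into two small checks. The sketch below (plain Python, not DQS) maps two systems' gender codes onto one shared vocabulary and applies the temperature range rule. Which numeric code means which gender is an assumption, since the table only lists the code values.

```python
from typing import Optional

# Code tables from the "Standard" row. The meaning of 0, 1, 2 in the second
# system is an assumption made for this sketch.
SYSTEM_A_CODES = {"M": "Male", "F": "Female", "U": "Unknown"}
SYSTEM_B_CODES = {"0": "Male", "1": "Female", "2": "Unknown"}
CODE_TABLES = {"A": SYSTEM_A_CODES, "B": SYSTEM_B_CODES}


def standardize_gender(code: str, system: str) -> Optional[str]:
    """Map a source-specific gender code onto one shared vocabulary.

    Returns None for unrecognized codes so they can be reviewed instead of
    being silently mapped to a default.
    """
    return CODE_TABLES[system].get(code.strip().upper())


def is_valid_temperature(celsius: float, low: float = -100.0, high: float = 100.0) -> bool:
    """Range rule from the "Valid" row: readings must lie in [-100, +100] °C."""
    return low <= celsius <= high


print(standardize_gender("f", "A"))  # Female
print(standardize_gender("1", "B"))  # Female
print(standardize_gender("X", "A"))  # None
print(is_valid_temperature(21.5))    # True
print(is_valid_temperature(150.0))   # False
```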
  • INTRODUCTION
    • Monitoring: tracking and monitoring the state of quality activities and the quality of data.
    • Cleansing: amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.
    • Profiling: analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
    • Matching: identifying, linking or merging related entries within or across sets of data.
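Profiling is the easiest of the four activities to illustrate. A minimal sketch, assuming a column arrives as a plain Python list: count rows, blanks and distinct values, and list the most frequent ones. This is a generic column profile, not the DQS profiler.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Summarize one column: row count, blanks, distinct values and most frequent values."""
    non_blank = [v for v in values if v not in (None, "")]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        "top": counts.most_common(top_n),
    }


# Hypothetical sample column: a variant spelling and blanks show up in the profile.
cities = ["Brussels", "Brussel", "Brussels", "", "Antwerp", None, "Brussels"]
profile = profile_column(cities)
print(profile["rows"], profile["blank"], profile["distinct"])  # 7 2 3
print(profile["top"][0])  # ('Brussels', 3)
```

A rare value such as "Brussel" sitting next to a frequent "Brussels" is exactly the kind of hint a profile gives before any cleansing starts.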
  • OUTLINE
    • introduction
    • overview of data quality services
    • building a knowledge base
    • data cleansing & matching
    • SSIS integration
    • conclusion
  • OVERVIEW OF DQS
    Data Quality Services (DQS) is a knowledge-driven data quality solution, enabling IT pros and data stewards to easily improve the quality of their data.
  • OVERVIEW OF DQS
    • Knowledge-Driven: based on a Data Quality Knowledge Base (DQKB)
    • Semantics: data domains capture the semantics of your data
    • Knowledge Discovery: acquires additional knowledge the more you use it
    • Open and Extendible: supports user-generated knowledge and IP, and can be extended by 3rd party reference data providers
    • Easy to use: compelling user experience designed for increased productivity
  • OVERVIEW OF DQS
    • easy installation
      • pre-installation checks
        o SQL Server 2012 database engine (server)
        o .NET 4.0 & IE 6.0 or higher (client)
      • installation of DQS using SQL Server set-up
      • post-installation tasks
        o run DQSInstaller.exe
        o grant DQS roles to users
        o enable TCP/IP
  • BUILDING A KNOWLEDGE BASE
    [Diagram: knowledge management. Build: discover / explore data / profiling. Connect: integrated knowledge base. Use: DQ projects.]
  • BUILDING A KNOWLEDGE BASE
    [Diagram: a knowledge base is made up of domains (which represent the data type), values, composite domains, rules & relations, 3rd party reference data and a matching policy.]
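One way to picture how these pieces relate is as a small data model. The Python dataclasses below are a hypothetical, simplified sketch: the class and attribute names are ours, not the DQS object model, and 3rd party reference data is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical, simplified model of the knowledge base pieces named on the slide.
# Class and attribute names are illustrative; they are not the DQS object model.


@dataclass
class Domain:
    name: str
    data_type: type  # a domain represents one data type
    values: Set[str] = field(default_factory=set)  # known values
    corrections: Dict[str, str] = field(default_factory=dict)  # wrong value or synonym -> correct value
    rules: List[Callable[[object], bool]] = field(default_factory=list)  # domain rules

    def is_valid(self, value: object) -> bool:
        """A value is valid if it has the domain's type and passes every rule."""
        return isinstance(value, self.data_type) and all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    name: str
    domains: List[Domain]  # e.g. an address built from street and city


@dataclass
class KnowledgeBase:
    name: str
    domains: Dict[str, Domain] = field(default_factory=dict)
    composite_domains: List[CompositeDomain] = field(default_factory=list)
    matching_policy: Dict[str, float] = field(default_factory=dict)  # domain name -> weight

    def add_domain(self, domain: Domain) -> None:
        self.domains[domain.name] = domain


kb = KnowledgeBase("Customers")
street = Domain("Street", str, rules=[lambda v: len(v.strip()) > 0])
city = Domain("City", str, values={"Brussels", "Antwerp"}, corrections={"Brussel": "Brussels"})
kb.add_domain(street)
kb.add_domain(city)
kb.composite_domains.append(CompositeDomain("Address", [street, city]))
kb.matching_policy = {"Street": 0.6, "City": 0.4}

print(kb.domains["Street"].is_valid("High Street 13"))  # True
print(kb.domains["Street"].is_valid(""))                # False
```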
  • DEMO
    • our first knowledge base
  • BUILDING A KNOWLEDGE BASE
    • iterative process
    • knowledge discovery
      • gather knowledge from
        o Excel
        o SQL Server
      • profiling of data
        o not the same as the SSIS profiling task!
      • automatically detects anomalies
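Knowledge discovery can be approximated with a simple heuristic: count the values in a sample and flag rare values that look like a frequent one. This swaps in a frequency plus string-similarity rule (Python's difflib) for whatever DQS does internally; the thresholds min_share and cutoff are assumptions.

```python
from collections import Counter
from difflib import get_close_matches


def discover(values, min_share=0.2, cutoff=0.8):
    """Count the distinct values in a sample and flag likely typos.

    A value is flagged when it is not frequent itself but closely resembles a
    frequent value (similarity ratio >= cutoff). Both thresholds are assumptions.
    """
    counts = Counter(v.strip() for v in values if v and v.strip())
    total = sum(counts.values())
    frequent = [v for v, c in counts.items() if c / total >= min_share]
    anomalies = {}
    for value in counts:
        if value in frequent:
            continue
        close = get_close_matches(value, frequent, n=1, cutoff=cutoff)
        if close:
            anomalies[value] = close[0]
    return counts, anomalies


sample = ["Microsoft"] * 4 + ["Microsot", "Oracle", "Oracle", "IBM"]
counts, anomalies = discover(sample)
print(counts["Microsoft"])  # 4
print(anomalies)            # {'Microsot': 'Microsoft'}
```

"IBM" is rare too, but it resembles no frequent value, so it is kept as a legitimate value rather than flagged.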
  • BUILDING A KNOWLEDGE BASE
    • domain management
      • knowledge about fields is kept in domains
      • data steward can
        o create rules
        o assign synonyms and corrections
        o create term-based relations (str. --> street)
        o link domains together into composite domains
      • import knowledge from
        o reference data (e.g. Azure Marketplace)
        o other knowledge bases
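A term-based relation and a synonym or correction differ in where they apply: a term-based relation rewrites part of a value (str. inside an address), while a correction or synonym replaces the whole value. A minimal sketch with hypothetical lookup tables (the entries are illustrative, not DQS defaults):

```python
import re

# Hypothetical domain knowledge entered by a data steward.
TERM_RELATIONS = {"str.": "street", "st.": "street", "ave.": "avenue"}  # term-based relations
CORRECTIONS = {"Microsot": "Microsoft"}  # value-level corrections
SYNONYMS = {"The Artist": "Prince"}  # synonym -> leading value (illustrative)


def apply_term_relations(value: str) -> str:
    """Replace known terms inside a value, token by token (case-insensitive lookup)."""
    return re.sub(r"\S+", lambda m: TERM_RELATIONS.get(m.group(0).lower(), m.group(0)), value)


def to_leading_value(value: str) -> str:
    """Apply a correction first, then map a synonym onto its leading value."""
    corrected = CORRECTIONS.get(value, value)
    return SYNONYMS.get(corrected, corrected)


print(apply_term_relations("High Str. 13"))  # High street 13
print(to_leading_value("Microsot"))          # Microsoft
print(to_leading_value("The Artist"))        # Prince
```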
  • DATA CLEANSING & MATCHING
    • cleansing
      • why?
        o identifies incomplete or incorrect data
        o standardizes and enriches data by using domain values, domain rules and reference data
      • DQS cleansing
        o create a knowledge base or select an existing one
        o create a data quality project
        o 2-step process
          – computer-assisted cleansing
          – interactive cleansing
        o export results
    • examples
      o St. --> street (corrected)
      o Microsot --> Microsoft (corrected)
      o john.doe@hotmail (invalid)
      o 0472/34672 (invalid)
      o Verbeek --> Verbeeck (suggested)
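The statuses in the examples (corrected, invalid, suggested) can be mimicked with a small rule-then-lookup-then-fuzzy-match function. This is a hypothetical sketch, not the DQS cleansing algorithm: the known values, the phone pattern and both similarity thresholds are assumptions, and 0.94 was chosen only so that the slide's examples come out as listed, which shows how sensitive such thresholds are.

```python
import re
from difflib import SequenceMatcher

# Hypothetical knowledge for the examples on the slide.
KNOWN_VALUES = ("Microsoft", "Verbeeck", "street")
CORRECTIONS = {"St.": "street"}
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^0\d{3}/\d{6}$"),  # assumed format: 4-digit prefix, 6 digits
}


def cleanse(value, rule=None, known=KNOWN_VALUES, corrections=CORRECTIONS,
            auto_threshold=0.94, suggest_threshold=0.8):
    """Return (output value, status) for one input value."""
    if rule is not None and not rule.match(value):
        return value, "invalid"  # breaks a domain rule
    if value in known:
        return value, "correct"  # already a known value
    if value in corrections:
        return corrections[value], "corrected"  # explicit correction in the knowledge
    # Fuzzy lookup: find the most similar known value.
    best, best_score = value, 0.0
    for candidate in known:
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= auto_threshold:
        return best, "corrected"  # confident enough to correct automatically
    if best_score >= suggest_threshold:
        return best, "suggested"  # left for the data steward to approve
    return value, "new"  # unknown value without a close match


for raw, rule in [("St.", None), ("Microsot", None), ("john.doe@hotmail", RULES["email"]),
                  ("0472/34672", RULES["phone"]), ("Verbeek", None)]:
    print(raw, "->", cleanse(raw, rule))
```

Splitting the fuzzy result into "corrected" and "suggested" is what keeps a human in the loop for borderline cases, which matches the interactive step of the 2-step process.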
  • DATA CLEANSING & MATCHING
    • matching
      • why?
        o identify duplicates within the data source
        o create a consolidated view of the data
      • DQS matching
        o build a matching policy in the knowledge base
        o matching training
        o create a matching project
        o choose survivors
    • examples
      o Prince / The Artist Formerly Known As Prince / The Artist
      o Jon Doe, High Street 13, NY, doe@gmail.com
        John Doe, High Str, NY, doe@gmail.com
    [Screenshot: DQ Client – Match Results]
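A matching policy essentially says which fields to compare and how much each one counts. The sketch below scores record pairs with a weighted sum of string similarities; the weights, the 0.8 threshold and the use of difflib's ratio are assumptions, not the DQS matching rules.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical matching policy: field -> weight (weights sum to 1.0).
POLICY = {"name": 0.3, "street": 0.2, "city": 0.1, "email": 0.4}
MIN_MATCHING_SCORE = 0.8  # assumed threshold for "is a duplicate"


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict, policy=POLICY) -> float:
    """Weighted sum of per-field similarities."""
    return sum(weight * similarity(r1[f], r2[f]) for f, weight in policy.items())


def find_duplicates(records, policy=POLICY, min_score=MIN_MATCHING_SCORE):
    """Compare every pair of records and keep the pairs that score high enough."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = match_score(a, b, policy)
        if score >= min_score:
            pairs.append((i, j, round(score, 2)))
    return pairs


customers = [
    {"name": "Jon Doe", "street": "High Street 13", "city": "NY", "email": "doe@gmail.com"},
    {"name": "John Doe", "street": "High Str", "city": "NY", "email": "doe@gmail.com"},
    {"name": "Jane Roe", "street": "Main Street 1", "city": "LA", "email": "roe@example.com"},
]
print(find_duplicates(customers))  # [(0, 1, 0.93)]
```

With these weights the two Doe records are paired, while the third record stays below the threshold. Comparing every pair is quadratic, which is fine for a sketch but not for large tables.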
  • DEMO
    • cleanse data
    • use a matching policy to find duplicates
  • DATA CLEANSING & MATCHING
    • create a cleansing project
      o uses the knowledge gathered in a DQS knowledge base
      o simple, user-friendly process
      o profile results
  • DATA CLEANSING & MATCHING
    • create a matching project
      o uses a matching policy created in a knowledge base
      o eliminates duplicates
      o profile results
      o the more knowledge is added, the better the results will be
        – tip: clean up the data first using a cleansing project
    • choose survivors at the end
    • export results into .csv or SQL Server
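Choosing survivors can follow a simple rule such as "keep the most complete record". A minimal Python sketch of that rule, plus a CSV export of the result; the rule and the field names are assumptions for illustration, not the survivorship options DQS offers.

```python
import csv
import io


def filled_fields(record: dict) -> int:
    """Count the fields that hold a value."""
    return sum(1 for v in record.values() if v not in (None, ""))


def choose_survivor(cluster):
    """Keep the most complete record of a duplicate cluster; ties keep the first one."""
    return max(cluster, key=filled_fields)


def export_csv(records, fieldnames) -> str:
    """Write records as CSV text into an in-memory buffer and return it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


cluster = [
    {"name": "Jon Doe", "street": "", "city": "NY", "email": "doe@gmail.com"},
    {"name": "John Doe", "street": "High Street 13", "city": "NY", "email": "doe@gmail.com"},
]
survivor = choose_survivor(cluster)
print(export_csv([survivor], ["name", "street", "city", "email"]))
```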
  • SSIS INTEGRATION
    [Diagram: in an SSIS package data flow, data moves from a source through the data correction component to a destination; the component uses a mapping definition and the knowledge base (values/rules, reference data).]
  • DEMO
    • an SSIS cleansing project
  • SSIS INTEGRATION
    • cleansing as a batch process
    • only cleansing: matching is not (yet?) possible
    • composite domains are supported
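The batch idea behind the SSIS integration can be sketched outside SSIS: every row is cleansed against the knowledge, and rows are routed by status. The Python below is a hypothetical stand-in for the data flow (the domain, the column names such as city_status and the routing are assumptions), and, like the SSIS component described above, it only cleanses and does not match.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical knowledge for a single "City" domain, standing in for a knowledge base.
KNOWN_CITIES = {"Brussels", "Antwerp", "Ghent"}
CITY_CORRECTIONS = {"Brussel": "Brussels", "Gent": "Ghent"}


def cleanse_city(value: str) -> Tuple[str, str]:
    """Return (cleansed value, status) for one city name."""
    if value in KNOWN_CITIES:
        return value, "correct"
    if value in CITY_CORRECTIONS:
        return CITY_CORRECTIONS[value], "corrected"
    return value, "new"


def run_batch(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Cleanse every row and route it to one output per status, like a conditional split."""
    outputs: Dict[str, List[Dict[str, str]]] = {"correct": [], "corrected": [], "new": []}
    for row in rows:
        cleansed, status = cleanse_city(row["city"])
        # Keep the source value next to the cleansed value and its status.
        outputs[status].append(dict(row, city_source=row["city"], city=cleansed, city_status=status))
    return outputs


source = [{"id": "1", "city": "Brussel"}, {"id": "2", "city": "Antwerp"}, {"id": "3", "city": "Leuven"}]
result = run_batch(source)
print({status: [r["id"] for r in rows] for status, rows in result.items()})
```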
  • CONCLUSION
    • Knowledge-driven
      o rich knowledge base
      o continuous improvement and knowledge acquisition
      o build once, reuse for multiple DQ improvements
    • Easy to use
      o focus on productivity and user experience
      o designed for business users
      o out-of-the-box knowledge
    • Open & extendible
      o focus on cloud-based reference data
      o user-generated knowledge
      o integration with SSIS
  • RESOURCES
    • DQS Team Blog @ MSDN: http://blogs.msdn.com/b/dqs/
    • DQS documentation @ MSDN: http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx
    • SQL Server 2012 Resource Center (nice how-to videos): http://msdn.microsoft.com/en-us/sqlserver/ff898410.aspx
    • DQS Forum @ MSDN: http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads
    • TechEd presentation about DQS by Elad Ziklik: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207
  • THE END
    thanks for watching!