Data Quality Services in SQL Server 2012

Stéphane Fréchette
Data & Business Intelligence Solutions Architect | Consultant | Big Data | NoSQL | Data Science | Data Platform MVP
Data Quality Services in SQL Server 2012
(An Introduction)
Stéphane Fréchette
Friday April 26, 2013
Who am I?
My name is Stéphane Fréchette
I’m a Database & Business Intelligence Professional and CEO | Founder of
I have a passion for architecting, designing and building solutions that matter.
A self-proclaimed Open Data Hacker/Advocate, I founded Gatineau Ouverte, a citizen-led
initiative that aims to promote open access to the civic data of the city of Gatineau.
Twitter: @sfrechette
Email: stephanefrechette@ukubu.com
Blog: stephanefrechette.com
Session Outline
• Microsoft Business Intelligence (The Stack)
• Dirty Data…
• SQL Server Data Quality Services (DQS)
• Data Steward
• Knowledge Base and Domains
• Data Quality Projects
• Data Cleansing Transform – SSIS
• DQS (Install & Architecture)
• Enterprise Information Management (EIM)
• Resources
Microsoft Business Intelligence
(Diagram of the stack: Analysis Services, Reporting Services, Integration Services, Master Data Services, Data Quality Services; SharePoint Collaboration, Excel Workbooks, PowerPivot Applications, SharePoint Dashboards & Scorecards; OData Feeds, Line of Business Applications, Hadoop Big Data)
Dirty Data…
Do you have dirty data?
(All projects have it! It's inevitable.)
Dirty Data…
Causes?
Bad data entry
Poor Data Governance
Duplicate entities in different LOB systems
Sample Data Representation
• Prospect in CRM System:
Mark Smith | 613.111-1234 | Ottawa | ON | K1P 1K1
• Prospect buys goods now entered in POS System:
Markus Smith | 1234 Stilton Ave | Kanata | ON | K1P 1K1
• Record also entered into Accounting System:
Markus Smith | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
ETL process imports these records into the Data Warehouse / Data Mart
FirstName | LastName | Phone        | Address             | City   | Province | PostalCode
Mark      | Smith    | 613.111-1234 |                     | Ottawa | ON       | K1P 1K1
Markus    | Smith    |              | 1234 Stilton Ave    | Kanata | ON       | K1P 1K1
Markus    | Smith    |              | 1234 Stilton Avenue | Kanata | ON       | K1P 1K1
Sample Data Representation
• Duplicate records and inaccurate, incomplete data
• What we want is a golden record (one version of the truth)
FirstName | LastName | Phone        | Address             | City   | Province | PostalCode
Mark      | Smith    | 613.111-1234 |                     | Ottawa | ON       | K1P 1K1
Markus    | Smith    |              | 1234 Stilton Ave    | Kanata | ON       | K1P 1K1
Markus    | Smith    |              | 1234 Stilton Avenue | Kanata | ON       | K1P 1K1
Golden record:
FirstName | LastName | Phone        | Address          | City   | Province | PostalCode
Markus    | Smith    | 613-111-1234 | 1234 Stilton Ave | Kanata | ON       | K1P 1K1
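The path from the three source rows to the golden record can be sketched in a few lines of Python. This is only an illustration of the idea (standardize, match, then pick surviving values), not how DQS works internally. The helper names, the 0.7 similarity threshold, the "Avenue" to "Ave" rule, and the "most common non-empty value wins" survivorship rule are all assumptions made for this example.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

FIELDS = ["FirstName", "LastName", "Phone", "Address", "City", "Province", "PostalCode"]

# The three rows from the slide; empty strings stand for missing values.
rows = [
    ["Mark", "Smith", "613.111-1234", "", "Ottawa", "ON", "K1P 1K1"],
    ["Markus", "Smith", "", "1234 Stilton Ave", "Kanata", "ON", "K1P 1K1"],
    ["Markus", "Smith", "", "1234 Stilton Avenue", "Kanata", "ON", "K1P 1K1"],
]
records = [dict(zip(FIELDS, row)) for row in rows]


def normalize_phone(phone):
    """Format 10-digit phone numbers as NNN-NNN-NNNN; leave anything else alone."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) != 10:
        return phone
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize(record):
    """Hypothetical standardization rules: phone format and 'Avenue' -> 'Ave'."""
    cleaned = dict(record)
    cleaned["Phone"] = normalize_phone(record["Phone"])
    cleaned["Address"] = re.sub(r"\bAvenue\b", "Ave", record["Address"])
    return cleaned


def is_match(a, b, threshold=0.7):
    """Assumed matching rule: same last name and postal code, similar first name."""
    if a["LastName"] != b["LastName"] or a["PostalCode"] != b["PostalCode"]:
        return False
    first_a, first_b = a["FirstName"].lower(), b["FirstName"].lower()
    return SequenceMatcher(None, first_a, first_b).ratio() >= threshold


def group_matches(candidates):
    """Put each record into the first group whose lead record it matches."""
    groups = []
    for record in candidates:
        for group in groups:
            if is_match(group[0], record):
                group.append(record)
                break
        else:
            groups.append([record])
    return groups


def golden_record(group):
    """Assumed survivorship rule: the most common non-empty value wins per field."""
    golden = {}
    for name in FIELDS:
        values = [r[name] for r in group if r[name]]
        golden[name] = Counter(values).most_common(1)[0][0] if values else ""
    return golden


groups = group_matches([standardize(r) for r in records])
golden = golden_record(groups[0])
print(golden)
```

With these rules the three rows fall into one group, and the surviving values are the ones shown in the golden record on the slide.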
SQL Server Data Quality Services (DQS)
• New in SQL Server 2012
• Enables cleansing, matching, standardizing and enriching data
• Delivers trusted information for business intelligence, data warehouse, and transaction
processing workloads
• Knowledge-Driven Solution (create/edit)
• A knowledge management process that builds the knowledge base
• A data quality project that proposes changes to source data based on the knowledge in the knowledge
base (cleansing and matching)
• A key component of an Enterprise Information Management (EIM) solution
Answering the Need with DQS
• DQS helps resolve issues involving incompleteness, lack of conformity, inconsistency,
inaccuracy, invalidity, and data duplication
• Provides the following features to resolve data quality issues:
• Data Cleansing
• Matching
• Reference Data Services
• Profiling
• Monitoring
• Knowledge Base
Data Steward
• Key role: usually a business user rather than someone from the Information Technology side
• Nutshell: Responsible for maintaining data elements in a metadata registry…
• Data Steward -> DQS Client
• Create and edit Knowledge Bases
• Run and process data through, continually and iteratively improving the Knowledge Bases
• Knowledge Bases can be consumed and used by other Data Stewards and IT (SSIS / ETL Developers)
(Diagram: DQS Data Steward, MDS Data Steward, SSIS Developer; Matching, Cleansing)
Knowledge Bases and Domains
The knowledge base is a repository of knowledge about your data that enables you to understand
your data and maintain its integrity.
• Processes:
• Computer-assisted
• Interactive
• Components:
• Knowledge Discovery
• Domain Management
• Reference Data Services
• Matching Policy
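One way to picture a knowledge base and its domains is as a small data structure. The sketch below is a mental model only: the class names, fields, and the `learn` method are invented for this example and are not the DQS object model. It covers knowledge discovery and domain management and leaves out reference data services and matching policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class Domain:
    """Hypothetical domain: what is known about one column of data."""
    name: str
    valid_values: Set[str] = field(default_factory=set)
    synonyms: Dict[str, str] = field(default_factory=dict)  # wrong value -> leading value
    rule: Optional[Callable[[str], bool]] = None

    def learn(self, values: Iterable[str]) -> None:
        """Stand-in for knowledge discovery: keep sample values that pass the rule."""
        for value in values:
            if self.rule is None or self.rule(value):
                self.valid_values.add(value)


@dataclass
class KnowledgeBase:
    """Hypothetical knowledge base: a named collection of domains."""
    name: str
    domains: Dict[str, Domain] = field(default_factory=dict)

    def add_domain(self, domain: Domain) -> None:
        self.domains[domain.name] = domain


# Computer-assisted step: discover values from a data sample.
province = Domain("Province", rule=lambda v: len(v) == 2 and v.isupper())
province.learn(["ON", "QC", "Ontario"])

# Interactive step: the data steward records that "Ontario" means "ON".
province.synonyms["Ontario"] = "ON"

kb = KnowledgeBase("Customers")
kb.add_domain(province)
print(sorted(kb.domains["Province"].valid_values))
```

The split between `learn` and the manual synonym entry mirrors the computer-assisted and interactive processes listed above.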
Demo
Knowledge Base Management
(Creating a Knowledge Base)
Data Quality Projects
Improve quality of source data by performing data cleansing and data matching activities
using defined knowledge bases
• Cleansing Activity (2 step process)
• Computer-assisted: data is categorized (suggested, new, invalid, corrected, and correct)
• Interactive: the data steward approves, rejects, or modifies the results proposed by the computer-assisted
cleansing process
• Matching Activity
• Using existing knowledge base matching policy
• Prevent and remove data duplication
• Data Profiling and Notifications
• Profiling provides data quality statistics and information on completeness and accuracy
• Notifications point to actions that can be taken to enhance operations
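The five categories of the computer-assisted cleansing step can be illustrated with a short Python sketch. The decision order, the 0.8 similarity cutoff, and the sample city values are assumptions chosen for this example. DQS applies its own logic and confidence thresholds, so treat this as a way to read the categories, not as the product's algorithm.

```python
from difflib import get_close_matches

# Hypothetical knowledge for a "City" domain.
VALID_CITIES = {"Ottawa", "Kanata", "Gatineau"}
CORRECTIONS = {"Otawa": "Ottawa"}  # known wrong value -> correct value


def is_city_name(value):
    """Assumed domain rule: letters and spaces only."""
    return value.replace(" ", "").isalpha()


def cleanse(value, valid=VALID_CITIES, corrections=CORRECTIONS, rule=is_city_name, cutoff=0.8):
    """Return (output value, status) for one source value."""
    if value in valid:
        return value, "correct"
    if value in corrections:
        return corrections[value], "corrected"
    if not rule(value):
        return value, "invalid"
    close = get_close_matches(value, sorted(valid), n=1, cutoff=cutoff)
    if close:
        return close[0], "suggested"
    return value, "new"


results = {v: cleanse(v) for v in ["Kanata", "Otawa", "Gatinau", "K1P 1K1", "Toronto"]}
for source, (output, status) in results.items():
    print(f"{source!r} -> {output!r} ({status})")
```

In the interactive step the data steward would then review these proposals, for example approving a suggested value so that it is treated as a known correction in later runs.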
Demo
Data Quality Project
(Cleansing and Matching)
DQS Cleansing Transform in SSIS
• When you want to automate the cleansing and matching process
and not use the DQS Client
• Use SSIS for batch data cleansing
• Matching can be done with Master Data Services (MDS)
• SSIS can be leveraged to bring DQS and MDS together
*DQS does not expose matching functionality for SSIS, but you can use the Fuzzy Grouping transform to
identify duplicate data
*The Cleansing transform is single-threaded; use multiple transforms for parallelism
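The advice about parallelism comes down to splitting the rows into partitions and cleansing each partition independently. The Python sketch below shows that pattern in general terms. It does not call SSIS or DQS, and `cleanse_batch` is a hypothetical stand-in for one cleansing transform.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin split of rows into n partitions."""
    return [rows[i::n] for i in range(n)]


def cleanse_batch(batch):
    """Hypothetical stand-in for one cleansing transform: tidy the Province column."""
    return [{**row, "Province": row["Province"].strip().upper()} for row in batch]


def cleanse_parallel(rows, workers=2):
    """Cleanse each partition in its own worker, then merge the results."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(cleanse_batch, parts))
    return [row for batch in batches for row in batch]


source_rows = [{"Id": i, "Province": " on "} for i in range(5)]
cleaned_rows = cleanse_parallel(source_rows, workers=2)
print(cleaned_rows)
```

The merged output is grouped by partition, so the original row order is not preserved. Carry a key such as `Id` through the process if order matters downstream.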
Demo
Data Cleansing Transform
(Automating the Cleansing and Matching using SSIS)
Installing DQS
• Requires the Business Intelligence, Enterprise, or Developer edition of SQL Server 2012
• During SQL Server setup:
• Instance Features -> Data Quality Services
• Shared Features -> Data Quality Client
• Execute the Data Quality Server Installer:
• C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\DQSInstaller.exe
• Data Quality Services – Data Quality Server Installer
(Apps - Microsoft SQL Server 2012)
DQS Architecture
DQS Server
DQS Catalog (3 databases)
• DQS_MAIN (Knowledge Bases)
• DQS_PROJECTS (Projects)
• DQS_STAGING_DATA (Sandbox, scratch pad area)
Security – Database Roles
• dqs_administrator
• dqs_kb_editor
• dqs_kb_operator
Windows Azure Marketplace
Reference Data Services -> validating, cleansing and enriching your data
Performance considerations - FYI
• Major performance improvements from RTM to CU1 release of SQL Server 2012 (strongly
recommend patching and upgrading) http://bit.ly/11eEhHC
• Must read -> DQS Performance Best Practices Guide http://bit.ly/16Gwenl
• Understand data volumes and hardware requirements… plan wisely!
Enterprise Information Management (EIM)
The EIM stack as a whole is the ‘Master Data Management’ solution from Microsoft and
consists of the following:
• SQL Server Data Quality Services (DQS) - Capture and record knowledge, rules, and actions
• SQL Server Master Data Services (MDS) - Master Data Management repository, Dimension data
• SQL Server Integration Services (SSIS) – Moves data, integration
Enterprise Information Management (EIM)
‘Master Data Management’
Resources
• Data Quality Services Team Blog (MSDN) http://bit.ly/WCI2nO
• SQL Server Data Quality Services (TechNet) http://bit.ly/ZaUO8k
• DQS Performance Best Practices Guide http://bit.ly/16Gwenl
• Enterprise Information Management (EIM) Bringing Together SSIS, DQS, and
MDS (Video – Channel 9) http://bit.ly/NJXvKr
• Matt Masson – Getting Started with DQS and MDS http://bit.ly/149Ga9n
• Paras Doshi’s Blog (DQS) http://bit.ly/YoLthh
What Questions Do You Have?
Thank You
For attending this session
Data Quality Services in SQL Server 2012

  • 1. Data Quality Services in SQL Server 2012 (An Introduction) Stéphane Fréchette Friday April 26, 2013 Matching Cleansing DQS
  • 2. Who am I? My name is Stéphane Fréchette I’m a Database & Business Intelligence Professional and CEO | Founder of I have a passion for architecting, designing and building solutions that matter. Self proclaimed Open Data Hacker/Advocate I founded Gatineau Ouverte a citizen led initiative which aims to promote open access to civic data of the city of Gatineau. Twitter: @sfrechette Email: stephanefrechette@ukubu.com Blog: stephanefrechette.com
  • 3. Session Outline • Microsoft Business Intelligence (The Stack) • Dirty Data… • SQL Server Data Quality Services (DQS) • Data Steward • Knowledge Base and Domains • Data Quality Projects • Data Cleansing Transform – SSIS • DQS (Install & Architecture) • Enterprise Information Management (EMI) • Resources
  • 4. Analysis Services Reporting Services Integration Services Master Data Services SharePoint Collaboration Excel Workbooks PowerPivot Applications SharePoint Dashboards & Scorecards Data Quality Services OData Feeds Line of Business Applications Hadoop Big Data Microsoft Business Intelligence
  • 5. Dirty Data… Do you have dirty data? (all projects have it! Its inevitable)
  • 6. Dirty Data… Causes? Bad data entry Poor Data Governance Duplicate entities in different LOB systems
  • 7. Sample Data Representation • Prospect in CRM System: Mark Smith | 613.111-1234 | Ottawa | ON | K1P 1K1 • Prospect buys goods now entered in POS System: Markus Smith | 1234 Stilton Ave | Kanata |ON | K1P 1K1 • Record also entered into Accounting System: Markus Smith | 1234 Stilton Avenue | Kanata | ON | K1P 1K1 ETL process imports these records into the Data Warehouse / Data Mart FirstName LastName Phone Address City Province PostalCode Mark Smith 613.111-1234 Ottawa ON K1P 1K1 Markus Smith 1234 Stilton Ave Kanata ON K1P 1K1 Markus Smith 1234 Stilton Avenue Kanata ON K1P 1K1
  • 8. Sample Data Representation • Duplicate records and inaccurate, incomplete data • What we want is a golden record (one version of the truth) FirstName LastName Phone Address City Province PostalCode Mark Smith 613.111-1234 Ottawa ON K1P 1K1 Markus Smith 1234 Stilton Ave Kanata ON K1P 1K1 Markus Smith 1234 Stilton Avenue Kanata ON K1P 1K1 FirstName LastName Phone Address City Province PostalCode Markus Smith 613-111-1234 1234 Stilton Ave Kanata ON K1P 1K1
  • 9. SQL Server Data Quality Services (DQS) • New in SQL Server 2012 • Enables cleansing, matching, standardizing and enriching data • Delivers trusted information for business intelligence, data warehouse, and transaction processing workloads • Knowledge-Driven Solution (create/edit) • A knowledge management process that builds the knowledge base • A data quality project that proposes changes to source data based on the knowledge in the knowledge base (cleansing and matching) • A key component of an Enterprise Information Management (EIM) solution
  • 10. Answering the Need with DQS • DQS enables you to resolve issues involving incompleteness, lack of conformity, inconsistency, inaccuracy, invalidity, and data duplication • Provides the following features to resolve data quality issues: • Data Cleansing • Matching • Reference Data Services • Profiling • Monitoring • Knowledge Base
  • 11. Data Steward • Key role: usually a business user, not someone from the Information Technology side • In a nutshell: responsible for maintaining data elements in a metadata registry… • Data Steward -> DQS Client • Creates and edits Knowledge Bases • Runs and processes data through them, continually and iteratively improving the Knowledge Bases • Knowledge Bases can be consumed and used by other Data Stewards and IT (SSIS / ETL Developers) [Diagram labels: DQS Data Steward, MDS Data Steward, SSIS Developer, Matching, Cleansing]
  • 12. Knowledge Bases and Domains The knowledge base is a repository of knowledge about your data that enables you to understand your data and maintain its integrity. • Processes: • Computer-assisted • Interactive • Components: • Knowledge Discovery • Domain Management • Reference Data Services • Matching Policy
  • 14. Data Quality Projects Improve the quality of source data by performing data cleansing and data matching activities using defined knowledge bases • Cleansing Activity (two-step process) • Computer-assisted: data is categorized (suggested, new, invalid, corrected, and correct) • Interactive: the data steward approves, rejects, or modifies the results proposed by the computer-assisted cleansing process • Matching Activity • Uses the existing knowledge base matching policy • Prevents and removes data duplication • Data Profiling and Notifications • Profiling provides data quality statistics and information: completeness and accuracy • Notifications on actions that can be taken to enhance operations
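The computer-assisted cleansing step can be pictured with a small Python sketch that sorts incoming values into the five categories named on this slide. Everything here is an assumption for illustration: the single "City" domain, its `VALID`, `SYNONYMS` and `INVALID` contents, the 0.8 similarity cutoff, and the use of Python's `difflib` in place of whatever DQS does internally.

```python
from difflib import get_close_matches

# Hypothetical knowledge for a single "City" domain.
VALID = {"Ottawa", "Kanata", "Gatineau"}
SYNONYMS = {"Bytown": "Ottawa"}   # known variant -> leading value
INVALID = {"N/A", "Unknown"}      # values the domain explicitly rejects


def cleanse(value):
    """Return (status, output_value) for one source value."""
    if value in VALID:
        return ("correct", value)
    if value in SYNONYMS:
        return ("corrected", SYNONYMS[value])
    if value in INVALID:
        return ("invalid", value)
    # Close to a known value, but not certain: propose it for review.
    close = get_close_matches(value, sorted(VALID), n=1, cutoff=0.8)
    if close:
        return ("suggested", close[0])
    # Nothing in the knowledge base covers this value.
    return ("new", value)


for source_value in ["Ottawa", "Bytown", "N/A", "Otawa", "Toronto"]:
    print(source_value, "->", cleanse(source_value))
```

The interactive step then corresponds to a person reviewing the "suggested" and "new" rows in particular, and feeding approved values back into the domain so the next run classifies them as correct or corrected.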
  • 16. DQS Cleansing Transform in SSIS • For when you want to automate the cleansing and matching process and not use the DQS Client • Use SSIS for batch data cleansing • Matching can be done with Master Data Services (MDS) • SSIS can be leveraged to bring DQS and MDS together * DQS does not expose matching functionality for SSIS, but you can use the Fuzzy Grouping Transform to identify duplicate data * The Cleansing Transform is single threaded; use multiple transforms for parallelism
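As a rough picture of what similarity-based duplicate grouping does, the Python sketch below groups rows whose name and postal code are nearly identical. It does not reproduce the algorithm used by the SSIS Fuzzy Grouping Transform; the greedy grouping strategy, the 0.85 threshold, the choice of match fields, and the extra "Mary Jones" row are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

rows = [
    ("Mark", "Smith", "K1P 1K1"),
    ("Markus", "Smith", "K1P 1K1"),
    ("Markus", "Smith", "K1P 1K1"),
    ("Mary", "Jones", "K2K 2B2"),  # hypothetical unrelated row
]


def match_key(row):
    """Build a lower-cased comparison string from the chosen match fields."""
    return " ".join(row).lower()


def similarity(a, b):
    """Similarity score between 0.0 and 1.0 from the standard library matcher."""
    return SequenceMatcher(None, a, b).ratio()


def group_duplicates(rows, threshold=0.85):
    """Greedy grouping: a row joins the first group whose first member is
    similar enough, otherwise it starts a new group. Returns row indexes."""
    groups = []
    for index, row in enumerate(rows):
        key = match_key(row)
        for group in groups:
            representative = match_key(rows[group[0]])
            if similarity(key, representative) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


print(group_duplicates(rows))  # [[0, 1, 2], [3]]
```

The three Smith rows land in one group and the unrelated row stays on its own. Each group could then be handed to a survivorship step to pick the record that is kept.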
  • 17. Demo Data Cleansing Transform (Automating the Cleansing and Matching using SSIS)
  • 18. Installing DQS • Requires the Business Intelligence or Enterprise/Developer edition of SQL Server 2012 • During SQL Server setup: • Instance Features -> Data Quality Services • Shared Features -> Data Quality Client • Execute the Data Quality Server Installer: • C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\DQSInstaller.exe • Or: Data Quality Services – Data Quality Server Installer (Apps - Microsoft SQL Server 2012)
  • 19. DQS Architecture DQS Server DQS Catalog (3 databases) • DQS_MAIN (Knowledge Bases) • DQS_PROJECTS (Projects) • DQS_STAGING_DATA (Sandbox, scratch pad area) Security – Database Roles • dqs_administrator • dqs_kb_editor • dqs_kb_operator
  • 20. Windows Azure Marketplace Reference Data Services -> validating, cleansing and enriching your data
  • 21. Performance considerations - FYI • Major performance improvements from RTM to CU1 release of SQL Server 2012 (strongly recommend patching and upgrading) http://bit.ly/11eEhHC • Must read -> DQS Performance Best Practice Guide http://bit.ly/16Gwenl • Understand data volumes and hardware requirements… plan wisely!
  • 22. Enterprise Information Management (EIM) The EIM stack as a whole is the ‘Master Data Management’ solution from Microsoft and consists of the following: • SQL Server Data Quality Services (DQS) – Capture and record knowledge, rules, and actions • SQL Server Master Data Services (MDS) – Master Data Management repository, dimension data • SQL Server Integration Services (SSIS) – Moves data, integration
  • 23. Resources • Data Quality Services Team Blog (MSDN) http://bit.ly/WCI2nO • SQL Server Data Quality Services (TechNet) http://bit.ly/ZaUO8k • DQS Performance Best Practices Guide http://bit.ly/16Gwenl • Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS (Video – Channel 9) http://bit.ly/NJXvKr • Matt Masson – Getting Started with DQS and MDS http://bit.ly/149Ga9n • Paras Doshi – Blog (DQS) http://bit.ly/YoLthh
  • 24. What Questions Do You Have?
  • 25. Thank You For attending this session