3. Agenda
• Intro
– Data Quality – what is it about?
– Data Quality in Business Intelligence projects
– Tools selection
• Data Quality Services
– Structure
– Project component
– Data Quality routine
• Conclusions
7. Data Quality: What is it?
Business intelligence (BI) is a set of
methodologies, processes, and technologies
that transform raw data into meaningful and
useful information for business purposes.
Data Quality represents the degree to which
data is suitable for business use.
8. Data Quality: Tools selection
Custom Tools
PROS
• Variety of technologies
• Flexibility
• Accuracy
CONS
• Higher competence level in business area / tech. stack
• Lots of development effort
3rd-party software
PROS
• Established methods, standards, algorithms
• Open / Expandable / Reusable
• Lower entry level for newcomers
CONS
• Scalability / performance issues
• Limitations
9. Gartner Magic Quadrant for BI platforms
[Figure: Gartner Magic Quadrant. Axes: Completeness of Vision (horizontal), Ability to Execute (vertical). Quadrants: Challengers, Leaders, Niche Players, Visionaries.]
10. Data Quality: tasks
Data Quality Services (DQS) is a Knowledge-Driven data quality
solution enabling data stewards to easily improve the quality
of their data
• Cleansing
• Matching
• Profiling
• Monitoring
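To make the profiling, cleansing, and matching tasks above more concrete, here is a minimal sketch in plain Python. It does not use DQS or any of its APIs; the function names, the `synonyms` mapping (standing in for a knowledge-base entry), and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def profile_column(values):
    """Profiling: return simple statistics for one column of raw values."""
    total = len(values)
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "total": total,
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
    }


def cleanse(value, synonyms):
    """Cleansing: normalize whitespace and case, then map known synonyms."""
    key = " ".join(str(value).split()).lower()
    return synonyms.get(key, key)


def is_match(a, b, threshold=0.85):
    """Matching: flag two strings as likely duplicates by similarity ratio."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


cities = ["New York", "new york ", "NYC", None, "Boston", ""]
synonyms = {"nyc": "new york"}  # hypothetical knowledge-base entry

stats = profile_column(cities)
cleaned = [cleanse(c, synonyms) for c in cities if c]
```

Monitoring, in this simplified picture, would amount to recomputing statistics such as `completeness` on each new data delivery and tracking them over time.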
12. DQS Structure
[Diagram: DQS structure]
• Azure Market Place
• 3rd Party / Internal
• DQ Clients
– DQS User Interface
– SSIS DQ Component
• DQ Server
– RD Services API (Browse, Set, Validate…)
– Reference Data API (Browse, Get, Update…)
– DQ Engine: Knowledge Discovery, Data Profiling & Exploration, Cleansing, Matching
– DQ Projects Store: DQ Active Projects
– Common Knowledge Store: Data Domains, Reference Data
– Knowledge Base Store: Published KBs
15. Business Case – Source Data Quality Assurance
[Diagram. Stages: Source Data, Screen, Confirm, Load. Elements shown: Oracle, DB2, csv, DQS, DQ Reports, KDVH.]
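One possible reading of the Screen, Confirm, and Load stages is sketched below in plain Python for a csv source. This is an illustration under assumptions, not the flow used in the business case: the `RULES` checks, the `max_reject_ratio` gate, and the sample data are hypothetical, and DQS itself is not involved.

```python
import csv
import io

# Hypothetical screening rules: column name -> validity check.
RULES = {
    "id": lambda v: v.isdigit(),
    "email": lambda v: "@" in v,
}


def screen(rows, rules):
    """Screen: split rows into valid rows and rejected rows with reasons."""
    valid, rejected = [], []
    for row in rows:
        failed = [col for col, check in rules.items()
                  if not check(row.get(col) or "")]
        if failed:
            rejected.append({"row": row, "failed": failed})
        else:
            valid.append(row)
    return valid, rejected


def confirm(valid, rejected, max_reject_ratio=0.5):
    """Confirm: accept the batch only if the reject ratio is within the limit."""
    total = len(valid) + len(rejected)
    return total > 0 and len(rejected) / total <= max_reject_ratio


# Stand-in for a csv source file.
source = io.StringIO(
    "id,email\n"
    "1,a@example.com\n"
    "x2,b@example.com\n"
    "3,not-an-email\n"
    "4,d@example.com\n"
)
rows = list(csv.DictReader(source))

valid, rejected = screen(rows, RULES)
# Load: only confirmed batches pass their valid rows on to the target.
loaded = valid if confirm(valid, rejected) else []
```

The `rejected` list keeps the failed column names per row, which is the kind of detail a DQ report could summarize.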
16. How could DQS help a QA Engineer?
• In general, it brings the things Data Analytics usually
deals with closer to the QA engineer
• Helps to understand the underlying data better
• Introduces measurement and manageability to DQ
matters
• Increases re-use / decreases re-work
• Offers an open and extensible proposal for a new standard for
storing and maintaining Knowledge Bases iteratively