Data Quality
View Point
Maveric Systems
Overview
Data quality is a critical component of enterprise success. With the exponential rise in the ways data is
generated and consumed, organizations are increasingly focusing on ensuring data quality. Studies indicate that
fewer than 50% of IT decision makers have confidence in their organization’s data quality initiatives, although
more than 90% acknowledge the growing importance and volume of the data they will have to grapple with in future.
This whitepaper aims to shed light on the importance of data quality and the means of assuring it, focusing on key
facets and tools. In today’s Big Data environment, ensuring data quality is not merely desirable but critical to
an organization’s survival.
Data Quality Framework
A robust data quality framework needs to address the following six focus areas: Profile, Cleanup, Standardize,
Monitor, Fix and Report.
Profile: Analyze the data to check conformance to the data quality dimensions outlined below:
• Completeness: A field-level (e.g., missing data) or data-level (e.g., missing line item for an order) check that
all requisite details are available, so that the key fields are populated with relevant data
• Conformity: Validation that the data conforms to the required formats, i.e., that values match the attributes and
structure of the target field
• Consistency: A check that values in one data set are consistent with the corresponding values in another data
set. This can be within a table (e.g., inconsistency between Title and Gender columns) or across tables (e.g.,
aggregates and parent/child relationships)
• Accuracy: A check that the data correctly represents its intended purpose or requirement. Inaccuracies may arise
from typos, use of acronyms or untimely arrival of data
• Duplication: A field-level or record-level check that the same data is not stored more than once within a table
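The profiling dimensions above can be illustrated with a minimal Python sketch. The sample records, the field names and the simplified email pattern are all assumptions made for illustration; they are not taken from the whitepaper, and a real profile would run against the organization's own tables and format rules.

```python
import re
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Asha Rao", "email": "asha@example.com"},
    {"id": 2, "name": "", "email": "ravi@example.com"},
    {"id": 3, "name": "Meena Iyer", "email": "meena-at-example.com"},
    {"id": 4, "name": "Asha Rao", "email": "asha@example.com"},
]

# Deliberately simplified format rule (assumption), not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Return the fraction of rows in which `field` is populated."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def conformity_failures(rows, field, pattern):
    """Return ids of rows whose `field` does not match the expected format."""
    return [row["id"] for row in rows if not pattern.match(row.get(field) or "")]


def duplicates(rows, key_fields):
    """Return key tuples that occur more than once across the rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, count in counts.items() if count > 1]


name_completeness = completeness(records, "name")
bad_emails = conformity_failures(records, "email", EMAIL_PATTERN)
duplicate_keys = duplicates(records, ["name", "email"])
```

Each function maps to one dimension (Completeness, Conformity, Duplication); Consistency and Accuracy checks usually need a second data set or a trusted reference and are therefore left out of this sketch.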
Cleanup: Revamp existing data to improve its quality using the following methods:
• Lookup: Identify a clean data set and use it as a lookup to clean up the data. The lookup may be within a table
(e.g., a Gender column holding values like M, Male, Mr, etc.) or against another table (a master table or a
manually created lookup table). This method can also be used to fill in missing values (e.g., populating the
Gender column based on Title or vice versa)
• Auto/Manual: Create scripts based on requirements to fix the data issues. However, certain data issues cannot
be fixed through any of these methods, e.g., when mandatory columns are empty and the data is instead stored in a
free-text column such as ‘Comments’
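The Lookup method described above can be sketched as follows. The two lookup tables and their mappings are hypothetical; a real cleanup would take them from a master table or a reviewed, manually created lookup table, and unresolved values would be routed to manual review.

```python
# Hypothetical lookup tables; the mappings are illustrative assumptions.
GENDER_LOOKUP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
TITLE_TO_GENDER = {"mr": "Male", "mrs": "Female", "ms": "Female"}


def clean_gender(row):
    """Return a copy of `row` with Gender standardized or derived from Title."""
    cleaned = dict(row)
    raw = (row.get("Gender") or "").strip().lower().rstrip(".")
    title = (row.get("Title") or "").strip().lower().rstrip(".")
    if raw in GENDER_LOOKUP:
        # Standardize variants such as "M" or "male" to one clean value.
        cleaned["Gender"] = GENDER_LOOKUP[raw]
    elif raw in TITLE_TO_GENDER:
        # A title such as "Mr" was stored in the Gender column.
        cleaned["Gender"] = TITLE_TO_GENDER[raw]
    elif title in TITLE_TO_GENDER:
        # Gender is missing or unrecognized; derive it from the Title column.
        cleaned["Gender"] = TITLE_TO_GENDER[title]
    else:
        # Cannot be resolved automatically; leave empty for manual review.
        cleaned["Gender"] = None
    return cleaned


rows = [
    {"Title": "Mr", "Gender": "M"},
    {"Title": "Ms.", "Gender": ""},
    {"Title": "Dr", "Gender": "Mr"},
    {"Title": "Dr", "Gender": None},
]
cleaned_genders = [clean_gender(row)["Gender"] for row in rows]
```

Returning a copy rather than modifying the row in place keeps the original values available for audit, which is a design choice of this sketch rather than something the whitepaper prescribes.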
Standardize: Regulate the data to improve accuracy through the following:
• Geo-coding: Ensure that addresses conform to geographical standards by standardizing address columns to meet
postal rules (e.g., USPS for US addresses) and by obtaining and storing the latitude and longitude of the physical
location from the standardized address
• Matching/Linking: At times, details about a field are available from multiple data sources, or duplicate data
does not match an existing value exactly but closely resembles it. To address these issues and create accurate
data, business logic is needed that matches such data and removes anomalies
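One possible form of the matching logic described above is approximate string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the normalization rules, the 0.85 threshold and the sample values are assumptions chosen for illustration, and production matching typically needs domain-specific rules on top of this.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lower-case the value, drop periods and commas, collapse whitespace."""
    return " ".join(value.lower().replace(".", " ").replace(",", " ").split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_near_duplicates(values, threshold=0.85):
    """Return pairs of values whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                pairs.append((values[i], values[j]))
    return pairs


# Hypothetical customer names from two sources.
names = ["Maveric Systems Ltd.", "maveric systems ltd", "Maveric System Ltd", "Acme Bank"]
near_duplicates = find_near_duplicates(names)
```

The pairwise comparison is quadratic in the number of values, so on large tables it is usually restricted to candidates that already share a key such as a postal code.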
Monitor: Establish systems or business processes to track the quality of incoming data. Tracking mechanisms
include the following:
• Business rules: Create quality checks based on business rules (e.g., daily transactions cannot exceed a
specific amount per account), execute the checks periodically in line with requirements, and raise an alert
when a check fails
• Trending/Threshold: Analyze the data to identify trends (e.g., the number of customers created per day should
be between ‘x’ and ‘y’; the sum of all loan amounts disbursed in a day should be between ‘x’ and ‘y’). Quality
checks can then be created from this analysis and run periodically as required. The boundary values used for
analyzing the trends also need to be revised periodically based on the patterns observed
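Both monitoring mechanisms can be sketched in a few lines of Python. The daily limit, the transaction fields and the choice of mean plus or minus three standard deviations for the ‘x’ and ‘y’ boundaries are assumptions for illustration; the whitepaper does not specify how the boundaries are derived.

```python
from collections import defaultdict
from statistics import mean, stdev

DAILY_LIMIT = 10_000  # hypothetical per-account daily limit


def accounts_over_daily_limit(transactions, limit=DAILY_LIMIT):
    """Business rule: return (account, date) keys whose daily total exceeds the limit."""
    totals = defaultdict(float)
    for txn in transactions:
        totals[(txn["account"], txn["date"])] += txn["amount"]
    return sorted(key for key, total in totals.items() if total > limit)


def trend_bounds(history, width=3.0):
    """Derive (low, high) bounds as mean +/- width * stdev of past daily values."""
    centre = mean(history)
    spread = stdev(history)
    return centre - width * spread, centre + width * spread


def within_bounds(value, bounds):
    """Threshold check: True when the value lies inside the bounds."""
    low, high = bounds
    return low <= value <= high


# Hypothetical sample data.
transactions = [
    {"account": "A1", "date": "2024-01-01", "amount": 6000},
    {"account": "A1", "date": "2024-01-01", "amount": 5000},
    {"account": "A2", "date": "2024-01-01", "amount": 3000},
]
violations = accounts_over_daily_limit(transactions)

daily_customer_counts = [100, 110, 90, 105, 95]
bounds = trend_bounds(daily_customer_counts)
normal_day_ok = within_bounds(102, bounds)
spike_day_ok = within_bounds(250, bounds)
```

Recomputing `trend_bounds` from a recent window of history is one way to revise the boundary values periodically as patterns change.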
Fix: Clean up existing data to improve quality. This needs to be carried out manually: for each data quality
failure alert, analyze the data set and identify the root cause (missing data, data duplication, data corruption,
incorrectly implemented business logic, etc.). Before that, check whether the alert is a false alarm or genuinely
requires attention.
Report: Build a reporting system to give a better understanding of the quality of data using the following means:
• Dashboard: Provides details of the quality checks being run, their execution outcomes, and the issues
identified and resolved. In addition to the current status, it also shows trends over a period of time
• Email: An email-based alert system should be established to deliver the results of each data quality check
execution. The system should be configurable at various levels
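The reporting step can be sketched as a summary for a dashboard plus an alert body whose verbosity is configurable. The `CheckResult` structure, the three severity levels and the check names are hypothetical; the sketch only builds the text and leaves delivery (email server, recipients) out of scope.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    severity: str  # hypothetical levels: "info", "warning", "critical"
    detail: str = ""


SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


def summarize(results):
    """Return the counts a dashboard tile might display."""
    total = len(results)
    failed = sum(1 for r in results if not r.passed)
    return {"total": total, "passed": total - failed, "failed": failed}


def alert_body(results, min_severity="warning"):
    """Build plain-text alert lines for failed checks at or above min_severity."""
    threshold = SEVERITY_ORDER[min_severity]
    lines = [
        f"[{r.severity.upper()}] {r.name}: {r.detail}"
        for r in results
        if not r.passed and SEVERITY_ORDER[r.severity] >= threshold
    ]
    return "\n".join(lines)


# Hypothetical outcomes of one execution cycle.
results = [
    CheckResult("daily_limit", False, "critical", "1 account over limit"),
    CheckResult("customer_trend", True, "info"),
    CheckResult("email_format", False, "info", "3 malformed values"),
]
summary = summarize(results)
default_alert = alert_body(results)
verbose_alert = alert_body(results, min_severity="info")
```

The `min_severity` parameter is one simple way to make the alerting configurable at different levels, for example critical failures only for managers and every failure for the data team.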
ABOUT MAVERIC
Started in 2000, Maveric Systems is a leading provider of IT Lifecycle Assurance with expertise across
requirements to release. With a strong focus on the Banking and Telecom sectors, Maveric has built a business on
the principles of deep domain expertise and innovation. Maveric’s client portfolio includes a wide array of
renowned banks, financial institutions, insurance companies, leading software product companies and telecom
companies.
Maveric Systems Limited (Corporate Office): “Lords Tower” Block 1, 2nd Floor, Plot No 1&2 NP, Jawaharlal Nehru
Road, Thiru Vi Ka Industrial Estate, Ekkatuthangal, Chennai 600032
India | Singapore | Saudi Arabia | UAE | UK | USA
Write to us at info@maveric-systems.com | www.maveric-systems.com
Data Quality Maturity Model & Tools
Using the data quality framework, organizations can implement and execute initiatives that have a profound effect
on the quality of their data. However, they need to be cognizant of their current level of data quality maturity
and take steps to move towards higher levels. The maturity model outlined below provides a snapshot of the
evolution of data quality maturity initiatives.
An organization needs to move towards the final stage, the managed and optimizing level of data quality. A
plethora of tools, manual approaches and customized solutions are available to help firms move from one maturity
stage to the next. These include profiling and checking tools such as SQL scripting; UI and dashboard technologies
such as HTML, CSS and PHP; and third-party end-to-end solutions such as Informatica Data Quality, IBM QualityStage,
Talend Data Quality and Open Source Data Quality.
Conclusion
Ascertaining and ensuring the quality of data is paramount for business success. With personalized customer
outreach growing in prominence, dependence on data to understand customer preferences and buying behavior has
become more important than ever. Add to this the exponential increase in the amount of data generated across
multiple devices and access mechanisms, and the importance of data quality cannot be overemphasized. The framework
and maturity model provided in this paper serve only as a guide for becoming a more data-quality-focused
organization. Understanding the importance of data quality, assessing the current level of data quality maturity
and taking steps towards the highest level all depend on the individual organization. Businesses that understand
this and move on the path to execution are the ones that will succeed in today’s highly competitive landscape.

More Related Content

What's hot

Geek Sync I Agile Data Management vs. Agile Data Modeling
Geek Sync I Agile Data Management vs. Agile Data ModelingGeek Sync I Agile Data Management vs. Agile Data Modeling
Geek Sync I Agile Data Management vs. Agile Data ModelingIDERA Software
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...DATAVERSITY
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data ManagementBhavendra Chavan
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolPrecisely
 
Establishing a Strategy for Data Quality
Establishing a Strategy for Data QualityEstablishing a Strategy for Data Quality
Establishing a Strategy for Data QualityDatabase Answers Ltd.
 
Foundation of data quality
Foundation of data qualityFoundation of data quality
Foundation of data qualityKhaled Mosharraf
 
How to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachHow to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachPrecisely
 
How to get data lineage right
How to get data lineage rightHow to get data lineage right
How to get data lineage rightLeigh Hill
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Andrey Akulov
 
Data Quality
Data QualityData Quality
Data QualityVijaya K
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profilingShailja Khurana
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
TargetStateFutureArchitect - DV
TargetStateFutureArchitect - DVTargetStateFutureArchitect - DV
TargetStateFutureArchitect - DVBhavendra Chavan
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?Precisely
 
Data quality management Basic
Data quality management BasicData quality management Basic
Data quality management BasicKhaled Mosharraf
 
Balancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityBalancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityDATAVERSITY
 
Geek Sync I Does Data Modeling Have Business Value?
Geek Sync I Does Data Modeling Have Business Value?Geek Sync I Does Data Modeling Have Business Value?
Geek Sync I Does Data Modeling Have Business Value?IDERA Software
 
Asug Gov Sig Data Quality Metrics Report Sapphire 2008
Asug Gov Sig Data Quality Metrics Report Sapphire 2008Asug Gov Sig Data Quality Metrics Report Sapphire 2008
Asug Gov Sig Data Quality Metrics Report Sapphire 2008nnorthrup
 

What's hot (20)

Geek Sync I Agile Data Management vs. Agile Data Modeling
Geek Sync I Agile Data Management vs. Agile Data ModelingGeek Sync I Agile Data Management vs. Agile Data Modeling
Geek Sync I Agile Data Management vs. Agile Data Modeling
 
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
Conformed Dimensions of Data Quality – An Organized Approach to Data Quality ...
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management Tool
 
Establishing a Strategy for Data Quality
Establishing a Strategy for Data QualityEstablishing a Strategy for Data Quality
Establishing a Strategy for Data Quality
 
Foundation of data quality
Foundation of data qualityFoundation of data quality
Foundation of data quality
 
How to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First ApproachHow to Build Data Governance Programs That Last: A Business-First Approach
How to Build Data Governance Programs That Last: A Business-First Approach
 
How to get data lineage right
How to get data lineage rightHow to get data lineage right
How to get data lineage right
 
Strategy For Data Quality
Strategy For Data QualityStrategy For Data Quality
Strategy For Data Quality
 
Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.Эволюция Big Data и Information Management. Reference Architecture.
Эволюция Big Data и Information Management. Reference Architecture.
 
Data Quality
Data QualityData Quality
Data Quality
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
TargetStateFutureArchitect - DV
TargetStateFutureArchitect - DVTargetStateFutureArchitect - DV
TargetStateFutureArchitect - DV
 
You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?You Need a Data Catalog. Do You Know Why?
You Need a Data Catalog. Do You Know Why?
 
Data quality management Basic
Data quality management BasicData quality management Basic
Data quality management Basic
 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality Presentation
 
Balancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityBalancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational Maturity
 
Geek Sync I Does Data Modeling Have Business Value?
Geek Sync I Does Data Modeling Have Business Value?Geek Sync I Does Data Modeling Have Business Value?
Geek Sync I Does Data Modeling Have Business Value?
 
Asug Gov Sig Data Quality Metrics Report Sapphire 2008
Asug Gov Sig Data Quality Metrics Report Sapphire 2008Asug Gov Sig Data Quality Metrics Report Sapphire 2008
Asug Gov Sig Data Quality Metrics Report Sapphire 2008
 

Similar to Data quality

What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf4dalert
 
Data quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data qualityData quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data qualityJaveriaGauhar
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Making Data Quality a Way of Life
Making Data Quality a Way of LifeMaking Data Quality a Way of Life
Making Data Quality a Way of LifeCognizant
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
Data quality management system
Data quality management systemData quality management system
Data quality management systemselinasimpson361
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineSrikanth Sharma Boddupalli
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfShaikSikindar1
 
Quality management best practices
Quality management best practicesQuality management best practices
Quality management best practicesselinasimpson2201
 
Data quality management best practices
Data quality management best practicesData quality management best practices
Data quality management best practicesselinasimpson2201
 
Data Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxData Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxBalvinder Hira
 
SDM Presentation V1.0
SDM Presentation V1.0SDM Presentation V1.0
SDM Presentation V1.0KirSinc
 
5 Best Practices of Effective Data Quality Management
5 Best Practices of Effective Data Quality Management5 Best Practices of Effective Data Quality Management
5 Best Practices of Effective Data Quality ManagementData Entry India Outsource
 
How Data.com Drives Data Quality
How Data.com Drives Data QualityHow Data.com Drives Data Quality
How Data.com Drives Data QualitySalesforce Partners
 
DDMA / T-Mobile: Datakwaliteit
DDMA / T-Mobile: DatakwaliteitDDMA / T-Mobile: Datakwaliteit
DDMA / T-Mobile: DatakwaliteitDDMA
 
A G S004 Smith 091707
A G S004  Smith 091707A G S004  Smith 091707
A G S004 Smith 091707Dreamforce07
 
Data Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better ReportingData Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better Reportingaccenture
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataFindWhitePapers
 

Similar to Data quality (20)

Lecture 01 mis
Lecture 01 misLecture 01 mis
Lecture 01 mis
 
What is Data Observability.pdf
What is Data Observability.pdfWhat is Data Observability.pdf
What is Data Observability.pdf
 
Data quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data qualityData quality testing – a quick checklist to measure and improve data quality
Data quality testing – a quick checklist to measure and improve data quality
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
 
Data quality management
Data quality managementData quality management
Data quality management
 
Making Data Quality a Way of Life
Making Data Quality a Way of LifeMaking Data Quality a Way of Life
Making Data Quality a Way of Life
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023
 
Data quality management system
Data quality management systemData quality management system
Data quality management system
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
 
Quality management best practices
Quality management best practicesQuality management best practices
Quality management best practices
 
Data quality management best practices
Data quality management best practicesData quality management best practices
Data quality management best practices
 
Data Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptxData Quality_ the holy grail for a Data Fluent Organization.pptx
Data Quality_ the holy grail for a Data Fluent Organization.pptx
 
SDM Presentation V1.0
SDM Presentation V1.0SDM Presentation V1.0
SDM Presentation V1.0
 
5 Best Practices of Effective Data Quality Management
5 Best Practices of Effective Data Quality Management5 Best Practices of Effective Data Quality Management
5 Best Practices of Effective Data Quality Management
 
How Data.com Drives Data Quality
How Data.com Drives Data QualityHow Data.com Drives Data Quality
How Data.com Drives Data Quality
 
DDMA / T-Mobile: Datakwaliteit
DDMA / T-Mobile: DatakwaliteitDDMA / T-Mobile: Datakwaliteit
DDMA / T-Mobile: Datakwaliteit
 
A G S004 Smith 091707
A G S004  Smith 091707A G S004  Smith 091707
A G S004 Smith 091707
 
Data Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better ReportingData Quality Management: Cleaner Data, Better Reporting
Data Quality Management: Cleaner Data, Better Reporting
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product Data
 

Recently uploaded

Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 

Recently uploaded (20)

2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

Data quality

  • 2. 2Maveric Systems Overview Quality of data for business operations is considered to be a critical component of enterprise success. With the exponential rise in ways and means by which data is generated and consumed, organizations are more and more focusing on ensuring data quality. Studies indicate that fewer than 50% of IT decision makers have confidence in their organization’s data quality initiatives, although more than 90% acknowledge the growing importance and volumes of data that they have to grapple with in future. This whitepaper aims to throw light on the importance of data quality and means to assure quality of data, focusing on key facets and tools. Ensuring data quality in today’s environment of Big data is not only necessary but also critical to an organization’s survival. Data Quality Data Quality Framework A robust framework that can address data quality needs to look at the following 6 focus areas. Profile: Analyze the data to ensure conformance to data quality dimensions outlined below  Completeness: Field level (e.g., missing data) or data level (e.g., missing line item for an order) check to ensure all requisite details are available. This will ensure that the key fields are populated with relevant data  Conformity: Validation to ensure whether the data conforms to the required formats i.e. whether values are correlating with the attributes and structure of the target field  Consistency: Check to ensure values in one data set are consistent with the same values in another data set. This can be within table (e.g., inconsistency between Title and Gender columns) or across tables (e.g., aggregates and parent/child relationship)  Accuracy: Aim to ensure the data represents its intended purpose/requirement. Data inaccuracy may happen due to typo, using acronyms or untimely arrival of data  Duplication: Check carried out at field level or record level to ensure same data set is not stored more than once within a table
  • 3. Cleanup: Revamp existing data to improve its quality using the following methods:
    - Lookup: Identify a clean data set and use it as a lookup to clean up the data. The lookup may be within a table (e.g., a Gender column with values like M, Male, Mr, etc.) or against another table (a master table or a manually created lookup table). This method can also be used to fill in missing values (e.g., populate the Gender column based on Title, or vice versa).
    - Auto/Manual: Create scripts based on requirements to fix the data issues. However, certain data issues cannot be fixed by any of these methods, e.g., when mandatory columns are empty and the data is instead stored in a free-text column such as a ‘Comments’ column.

    Standardize: Regulate the data to improve accuracy through the following:
    - Geo-coding: Adopt measures to ensure conformance of geographical addresses, such as standardizing address columns to meet postal rules (e.g., USPS for US addresses), then obtaining and storing the latitude and longitude of the physical location using the standardized address.
    - Matching/Linking: At times, details about a field may be available from multiple data sources, or there may be duplicate data that does not match an existing value exactly but is closely related to it. To address these issues and create accurate data, business logic is needed that matches such data and removes anomalies.

    Monitor: Establish systems or business processes to track the quality of incoming data. Tracking mechanisms include the following:
    - Business rules: Create quality checks based on business rules (e.g., the daily transaction total cannot exceed a specific amount per account), execute the checks periodically in line with requirements, and raise an alert when a check fails.
    - Trending/Threshold: Analyze the data and identify its trend (e.g., the number of customers created per day should be between ‘x’ and ‘y’; the sum of all loan amounts disbursed in a day should be between ‘x’ and ‘y’). Quality checks can then be created from this analysis and run periodically as required. The boundary values also need to be revised periodically based on the patterns observed.

    Fix: Clean up existing data to improve quality. This is done manually: for each data quality failure alert, analyze the data set and identify the root cause (missing data, data duplication, data corruption, business logic not implemented correctly, etc.). Each alert should also be checked up front to determine whether it is a false alarm or requires attention.

    Report: Build a reporting system to give a better understanding of the quality of data using the following means:
    - Dashboard: Provides details of the quality checks being run, their execution outcomes, and the issues identified and resolved. In addition to the current status, it also shows trends over a period of time.
    - Email: An email-based alert system should be established to deliver the results of each data quality check execution. The system should be configurable at various levels.
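    The Lookup cleanup and the two Monitor checks described on this slide might look roughly like the sketch below. The lookup table, the function names, the sample transactions and every limit or boundary value are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup table mapping raw Gender values to one clean value.
GENDER_LOOKUP = {
    "m": "M", "male": "M", "mr": "M",
    "f": "F", "female": "F", "ms": "F",
}


def clean_gender(raw):
    """Lookup cleanup: return the standardized value, or None if unknown."""
    return GENDER_LOOKUP.get(raw.strip().lower())


def check_daily_limit(transactions, limit):
    """Business rule: list (account, day, total) where the daily total exceeds limit."""
    totals = {}
    for account, day, amount in transactions:
        totals[(account, day)] = totals.get((account, day), 0) + amount
    return sorted(
        (account, day, total)
        for (account, day), total in totals.items()
        if total > limit
    )


def check_threshold(value, low, high):
    """Trending/threshold check: True if value lies within [low, high]."""
    return low <= value <= high


# Made-up data: (account, day, amount).
transactions = [
    ("A1", "2024-01-01", 600),
    ("A1", "2024-01-01", 500),
    ("A2", "2024-01-01", 300),
]

print(clean_gender(" Male "))                  # M
alerts = check_daily_limit(transactions, limit=1000)
print(alerts)                                  # [('A1', '2024-01-01', 1100)]
# Assumed expectation: 50 to 200 new customers per day; 42 is out of range.
print(check_threshold(42, low=50, high=200))   # False
```

    The `low` and `high` boundaries are passed in rather than hard-coded because, as noted above, they need to be revised periodically as the observed patterns change.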
  • 4. Data Quality Maturity Model & Tools

    Using the data quality framework, organizations can move towards implementing and executing initiatives that have a profound effect on the quality of data. However, organizations need to be cognizant of their current state of data quality maturity and take steps to move towards higher maturity levels.

    The maturity model outlined below provides a snapshot of the evolution of data quality maturity initiatives. An organization needs to move towards the final stage, the managed and optimizing level of data quality.

    A plethora of tools, manual approaches and customized solutions are available to help firms move from one maturity stage to the next. These include profiling and checking tools like SQL scripting; UI and dashboard technologies like HTML, CSS and PHP; and various third-party end-to-end solutions like Informatica Data Quality, IBM QualityStage, Talend Data Quality and Open source data quality.

    Conclusion

    Ascertaining and ensuring the quality of data is paramount for business success. With personalized customer outreach mechanisms increasing in prominence, dependence on data to understand customer preferences and buying behavior has become more important than ever. Add to this the exponential increase in the amount of data generated across multiple devices and access mechanisms, and the importance of data quality cannot be overemphasized.

    The framework and maturity model provided in this article serve only as a means of moving towards becoming a more data-quality-focused organization. Understanding the importance of data quality, assessing the current level of data quality maturity and taking steps to move towards the highest level all depend on the individual organization. Businesses that understand this and move on to execution are the ones that will succeed in today’s highly competitive landscape.

    ABOUT MAVERIC

    Started in 2000, Maveric Systems is a leading provider of IT Lifecycle Assurance with expertise across requirements to release. With a strong focus on the Banking and Telecom sectors, Maveric has built a business on the principles of deep domain expertise and innovation. Maveric’s client portfolio includes a wide array of renowned banks, financial institutions, insurance companies, leading software product companies and telecom companies.

    Maveric Systems Limited (Corporate Office): “Lords Tower” Block 1, 2nd Floor, Plot No 1&2 NP, Jawaharlal Nehru Road, Thiru Vi Ka Industrial Estate, Ekkatuthangal, Chennai 600032
    India | Singapore | Saudi Arabia | UAE | UK | USA
    Write to us at info@maveric-systems.com | www.maveric-systems.com