Busting 10 Myths of Data Quality Management
Introduction
Even though great strides have been made in data quality improvement over the past decades, many
myths and misconceptions are perpetuated through popular articles and presentations. Often, these
simplified views can be confusing or conflicting, and those who blindly accept the statements may find that their attempts at making discrete progress toward improvement slow and stall. The goal of this
paper is to highlight some common “myths” about data quality management, explain why these are
myths, and to guide the reader to make better choices when deciding to pursue a data quality
management strategy.
Adding critical insight into different aspects of data quality management and putting some common
beliefs into perspective will help you put together a more thoughtful plan for a data quality
management program that can lead to measurable improvements in the quality and usability of
organizational data. There are no substitutes for good data management disciplines, and this paper will
advise the practitioner as to what the critical data issues are in the organization and how to leverage the
right tools and technologies to address those issues in the most efficient way.
Defining and deploying well-defined processes within a culture of data governance will simplify
technology acquisition and reduce time to value for implementing a data quality program. Our intent is
to provide a balanced view of the best practices for data quality improvement by examining some common statements, explaining why each may be a myth, and offering considerations for planning your approach to data quality improvement.
Myth #1: The Business is Responsible for Data Quality
What You Heard
Data quality does not lie within the purview of the information technology (IT) department; since poor data quality impacts the business, the business users must take ownership of data quality improvement.
Why it is a Myth
Two of the most intractable issues organizations face when dealing with data quality problems are less
technical and more programmatic: funding the program and ensuring its sustainability. The conflict arises from a difference of opinion regarding financial support and resourcing. Essentially, the
statement is intended by Information Technology to drive business engagement in supporting data
quality activity.
Since the business users presume the equivalence of “data cleansing” and “data quality,” they insist that
if technical processes can be used to clean data, the responsibility lies with the IT department, and
consequently, so should the funding. On the other hand, the IT teams suggest that since poor data
quality impacts the business, and the business users are the ones defining what quality means, then the
business users need to take ownership of data quality and absolve IT from accountability. This
dichotomy pits IT and the business users against each other in terms of the effort to improve data
quality, thereby stalling progress rather than encouraging it.
Considerations and Alternatives
It is worthwhile to remember that the IT department must work in a partnership with the business users
to take best advantage of data. “Information” Technology is always going to be involved in anything that
touches information. And it is naive to presume that there are operational models that support only
non-technical business people taking responsibility for ensuring data quality.
Data quality management must be a collaborative effort that bridges the gaps between IT and the
business. An alternative approach considers a collaborative model in which the business side is
accountable for ensuring that there are good definitions of data quality rules, measures, and
acceptability levels while IT is responsible for instituting the architectural framework for ensuring the
rules are observed and reporting the measures. Data governance policies and procedures can be put in
place to ensure that issues are reported to the business but are handled by selected IT data stewards.
Myth #2: IT Owns Data Governance
What You Heard
Our company has appointed a Chief Data Officer (CDO) who will spearhead IT’s data governance
program.
Why it is a Myth
Data governance comprises the policies and practices that link data policy compliance with achieving
business objectives. The data management dependencies identified within business policies drive the definition of data policies. Data policies cannot be instituted by fiat, nor can they be enforced without alignment and cooperation between the business and technology teams. Therefore, one cannot expect that a CDO operating within the confines of the IT department has the ability to implement, or the authority to enforce, data governance without buy-in from the representatives of the business functions.
Considerations and Alternatives
As with myth #1, the responsibility for deploying data governance is split: the business owns the policies
and processes, but IT owns the implementation. That suggests that all new system and application
development be designed with directly embedded procedures for monitoring data quality and asserting
data policy compliance.
Although the role and the list of responsibilities of the CDO are still evolving, there is a greater risk of
failing to properly institute sustainable practices for data governance when the CDO’s mandate is
designated within the information technology silo. The most effective Chief Data Officer will report
directly to the CEO, and be empowered to implement data governance by leveraging a partnership
between the business and IT. That way, the organization can inaugurate a sustainable data governance
program that directly integrates data policy compliance within defined business processes.
Myth #3: Data Quality Tools Do Not Require Any Set Up
What You Heard
The acquisition of a data cleansing tool is enough to eliminate all your data quality issues. A data quality
tool just plugs into the enterprise and cleans all your data out of the box.
Why it is a Myth
While technology is critical to data quality measurement and assurance, the quality of data is defined
within a business context and is associated with sets of metadata, assertions, and business rules. While
data quality tools have a lot of built-in capabilities out of the box, they must be properly configured with
your organization’s rules in order to identify and cleanse data errors. In addition, the tools will need to
be integrated into the organization’s environment.
Considerations and Alternatives
Often there is a presumption that if there is a data quality problem, then the process of acquiring a data
cleansing tool is the only necessary action to take. However, a data quality tool is just that: a tool. And just as the act of purchasing a shovel does not guarantee that holes will appear in the ground, the purchase of
data quality tools does not guarantee that errors will be identified and corrected.
Addressing data quality issues goes beyond the purchase of a product. Because the tool must be configured with metadata, assertions, and rules in accordance with your business consumers’ expectations, it will be most effective in the hands of professionals who understand the data, the context, and the
technology. That enables you to assemble a program that combines good data management practices,
data stewardship, and the use of tools that will provide the greatest benefit.
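As a minimal sketch of the configuration point above, consider expressing an organization’s rules as executable checks. The rules and field names here (customer_id, a US ZIP format) are invented for illustration and are not drawn from any particular tool:

```python
# A hedged sketch: a data quality tool only finds errors once it is
# configured with the organization's own business rules. The rules and
# field names below are hypothetical examples.
import re

def rule_customer_id_present(record):
    """Completeness rule: every record needs a customer_id."""
    return bool(record.get("customer_id"))

def rule_valid_us_zip(record):
    """Format rule: ZIP codes must be 5 digits or ZIP+4."""
    return bool(re.fullmatch(r"\d{5}(-\d{4})?", record.get("zip", "")))

RULES = {
    "customer_id_present": rule_customer_id_present,
    "valid_us_zip": rule_valid_us_zip,
}

def validate(records):
    """Apply each configured rule to each record; report failures."""
    failures = []
    for i, record in enumerate(records):
        for name, rule in RULES.items():
            if not rule(record):
                failures.append((i, name))
    return failures

records = [
    {"customer_id": "C001", "zip": "20850"},
    {"customer_id": "", "zip": "2085"},  # fails both rules
]
print(validate(records))  # [(1, 'customer_id_present'), (1, 'valid_us_zip')]
```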
Myth #4: Manufacturing Quality Practices Are Easily Applied to Data
What You Heard
Quality processes as applied to manufacturing activities can be directly mapped to an “information
manufacturing” process. Therefore, quality techniques are eminently applicable to information.
Why it is a Myth
There is no doubt that the pioneers advocating quality in manufacturing such as Philip Crosby, W.
Edwards Deming, and Joseph Juran have positively impacted the ways that manufacturers do business.
It makes sense to try to adapt their common-sense approaches to managing the quality of information,
and there have been some purported successes along these lines. But often, those advocating a process-only quality approach to data may find that their successes are bounded by characteristics of information that differ from those of other, presumably “raw,” materials.
Physical manufacturing processes take limited amounts of raw materials that are transformed through a
series of processes into a unique final product. That product’s attributes can be compared
to discrete specifications for its intended use, such as the amount of usable storage on a DVD, or the
melting temperature of a screw.
In this analogy, data is the raw material and information products are the results. Yet in contrast to real
raw materials, data can be used multiple times, and contrary to real resulting manufactured products,
the output of information processes can be reused and repurposed in ways of which the original owners
never dreamt, let alone prepared for. Attempting to monitor compliance to specifications requires that
all those specifications are known beforehand – and this is often not the case with data.
Considerations and Alternatives
There are definitely aspects of the quality movement that are applicable to managing data. But when
you take into account the fact that data sets are often found and reused in a variety of different ways,
you must reconsider what can be discretely defined as quality measures when there is a potential for
uncontrolled reuse.
This consideration can lead to two different conclusions. First, if the different ways that created data can be repurposed are anticipated, that might influence the managers of the original sources to introduce data testing as part of a development life cycle process to anticipate
the types of flaws that might cause problems downstream and attempt to reduce or eliminate them
from the beginning. Second, those repurposing existing data sets might not be able to influence the
insertion of quality controls. In these cases, the data consumers must take on the responsibility to ensure
the data meets their needs, and this may involve the direct application of data quality tools and
techniques.
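For the first conclusion, a hedged sketch of what data testing within a development life cycle might look like, written as a pytest-style test; the extract function and fields are invented for illustration:

```python
# A minimal sketch of folding data testing into the development life
# cycle, so flaws that would hurt downstream consumers are caught before
# the data leaves the source system. The extract and the nonnegative-
# amount rule are hypothetical examples.
def load_new_orders():
    # Stand-in for the real extract; assumed for illustration only.
    return [{"order_id": 1, "amount": 19.99}, {"order_id": 2, "amount": -5.0}]

def test_order_amounts_are_nonnegative():
    bad = [o for o in load_new_orders() if o["amount"] < 0]
    assert not bad, f"{len(bad)} orders violate the nonnegative-amount rule"
```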
Myth #5: You Must Have Perfect Data
What You Heard
Monitor data to identify imperfect data and address process issues that allow any imperfections. By
ensuring that the data is always perfect, no errors can impact the business.
Why it is a Myth
As myth #4 noted, unlike manufactured items whose adherence to engineering specifications can be
measured as they drop off the assembly line, data instances are created once and then used in different
ways by different processes requiring different levels of quality. The concept of “perfect” data is
contextual but is essentially based on use, not creation. Yet most of the time, data perfection is assumed
based on the constraints set by the process creating the record, and it is still relatively uncommon for
the expectations of downstream users to be folded into the requirements as new applications are being
designed and built.
The result is that most creating processes only care about the immediate (i.e., operational) use of data, but will not accommodate the needs of other consumers as the data sets are repurposed. On the other hand, many data values are captured and stored for dubious reasons (they were part of a purchased data model, or were columns retained from a data migration project), but they may have limited (if any) use. In this
scenario, while the concept of perfect data expresses an ideal, the data value may not be business
critical or necessary to achieve any business objectives, and investing energy in ensuring its perfection is
basically a wasted effort.
Considerations and Alternatives
A saying commonly attributed to Voltaire is that “Perfect is the enemy of good.” Obsessive
perfectionism, even in the context of data quality, comes at a cost when the effort needed to reach
perfection exceeds the value to be achieved by progressing from good to perfect.
It is a noble idea to have perfect data, but with limited resources, it's better to focus on the biggest
offenders. Understanding who the data consumers are and what their expectations will be for data
quality allows you to more effectively anticipate process failures that can introduce errors. The law of
diminishing returns demands that the data quality team members be smart about how resources are
allocated to ensure that the organization gets the biggest bang for its buck while providing the greatest
value and efficiency.
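One illustrative way to “focus on the biggest offenders” is to rank known error scenarios by estimated frequency times business impact; the scenarios and numbers below are invented for illustration:

```python
# A minimal sketch of prioritizing remediation effort: score each known
# error scenario by frequency times estimated cost and work from the top
# of the list down. All figures here are hypothetical.
error_scenarios = [
    # (name, monthly occurrences, estimated cost per occurrence in dollars)
    ("missing shipping address", 400, 25.0),
    ("malformed product code",   90,  4.0),
    ("duplicate customer record", 50, 120.0),
]

ranked = sorted(error_scenarios, key=lambda s: s[1] * s[2], reverse=True)
for name, freq, cost in ranked:
    print(f"{name}: estimated monthly impact ${freq * cost:,.0f}")
```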
Myth #6: The Cost of Bad Data is Obvious
What You Heard
Ensuring that your data is perfect will automatically increase revenues and decrease costs.
Why it is a Myth
There is no doubt that pervasive data flaws will lead to negative impacts on the business. And in general,
reducing the frequency and scale of occurrence of data errors should reduce the negative impacts. But
little has been reported regarding the connections between specific errors and identified costs, and that
means that there is a subtle difference between putting controls into place to prevent negative impacts
and making claims of the value of “perfect” data.
The costs of flawed data might not be reflected in the allocation of budget; for example, financial
transaction errors may impose a cost on the contact center rather than on the transaction processing
center when customers call in to complain about errors in their statements. The costs might not be
associated with data quality management within a budget line item. To follow our example, the business
impacts of incorrect customer statements are essentially subsumed within the indirect costs of the
customer service center.
Considerations and Alternatives
This can make it difficult to determine how much you will save when you clean up your data. But it
doesn't mean that you're not saving money! Understanding this subtlety will help in truly identifying
business impacts directly related to data flaws. Awareness of the types of errors that might contribute
to negative impact allows the team to institute data quality controls that can lead to predictably
improved business processes. This suggests:
• Soliciting feedback from the business consumers of a data set;
• Determining the types of errors that have occurred, or that may occur, and that could lead to a negative impact; and
• Defining methods for determining the existence of an error at a point in the process where it
can be most effectively remediated.
It is wise to prevent preventable errors that have material impact. On the other hand, if no one cares
about some type of error, it may not make sense to exert the effort to prevent it. Determine which error scenarios are most likely to cause the greatest impact and prioritize accordingly.
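As a minimal sketch of the third bullet above (detecting an error at the point in the process where it can be remediated most effectively), the hypothetical check below holds a billing statement before it reaches the customer; the record fields and checkpoint are assumptions for illustration:

```python
# A hedged sketch: detect a known error type before it leaves the
# organization. Catching the mismatch here is far cheaper than fielding
# a contact-center call after the statement ships.
def statement_total_matches_lines(statement):
    """Detect a transaction error before it reaches the customer."""
    return statement["total"] == round(sum(statement["line_items"]), 2)

statement = {"total": 100.00, "line_items": [49.99, 50.00]}
if not statement_total_matches_lines(statement):
    # Route to a data steward for remediation before sending.
    print("Hold statement for review: total does not match line items")
```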
Myth #7: Monitoring and Reporting Data Quality Eliminates Errors
What You Heard
By instituting a data quality dashboard populated with continuous measures, you will eliminate data
errors.
Why it is a Myth
The fundamental idea behind reporting conformance to data quality expectations within a data quality
scorecard or dashboard is to alert the data stewards when predictable issues can be identified within
the process. The theory is that by instituting data rules for continuous measurement, you can be
proactive when errors occur and prevent those errors from impacting the business.
However, in many cases, if you know enough to describe and then identify the error so that you can do something about it when it occurs, you would probably be better off seeking out the root cause of the error and eliminating it completely, which would obviate the need to continue monitoring for the error.
Why continue to measure something whose cause might actually be eliminated?
Considerations and Alternatives
In some sense, measuring (and addressing) something you already know to be a problem is not being proactive; rather, it is being reactive earlier in the process. That said, using a data quality dashboard to alert you to issues early helps you stay prepared so that data quality surprises do not blindside you, allowing more flexibility in reacting to emerging data quality issues that have not yet been identified.
On the other hand, being truly proactive means anticipating the types of errors that could occur and
engaging the consumers to assess any potential impacts of those errors. This can inform the design and
implementation teams as to whether fundamental changes to the process can reduce the potential for
those errors occurring. Incorporating this governance practice as part of the development lifecycle
process enables the use of tools and technologies to predict errors so that the application developers
can build stop-gaps and controls to ensure the errors won’t happen in the first place.
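To make the dashboard idea concrete, here is a minimal sketch of a continuously measured data quality rule with an acceptability threshold; the metric, field, and threshold are invented for illustration:

```python
# A hedged sketch of a continuously measured rule feeding a dashboard:
# conformance is computed on each batch, and a steward is alerted when
# the score drops below the agreed acceptability level. The 98% level
# and the email field are hypothetical.
ACCEPTABILITY = 0.98  # agreed with the business consumers

def completeness(batch, field):
    """Fraction of records in the batch with a non-empty field."""
    return sum(1 for r in batch if r.get(field)) / len(batch)

batch = [{"email": "a@x.com"}, {"email": ""}, {"email": "b@y.com"}]
score = completeness(batch, "email")
if score < ACCEPTABILITY:
    print(f"ALERT: email completeness {score:.1%} below {ACCEPTABILITY:.0%}")
```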
Myth #8: Data Quality is Only Solved by Process Improvement
What You Heard
You don’t need data quality tools – all data quality issues can be resolved using good process
management!
Why it is a Myth
Philosophical approaches to management improvement are often based on laboratory conditions, which allow one to make generalizations that might not apply in most real-life scenarios. In particular, be
aware that there are methodologies for data quality improvement that focus only on process
improvement, such as insisting that data providers always validate their data before providing it to the
consumers. Often, these methodologies suggest that technical approaches are not necessary for improving the quality of data.
In the laboratory environment, simplistic approaches bypass any political and logistic issues, and often
do not reflect real-life scenarios in which data suppliers have no budget or interest in changing their
ways to support unknown downstream users, users repurpose “found” data sets with no control over
data creation, or participants in a collaborative environment must agree to rules for standardization for
data sharing. These situations require a combination of process improvement and data quality tools and
techniques to ensure data usability.
Considerations and Alternatives
The unmet challenge, in many cases, is that those advocating process improvement do not understand
the limitations when you do not exercise control over the administrative domain within which the data
flows. For example, when you cannot engage the data creators to modify their ways to ensure the data
meets your application’s needs, you still must do something to prevent data flaws from impacting your
own processes.
Instead, you must look to engage the data source owners when possible, but also have a strategy for maintaining data quality at the necessary level when engagement is not an option. If you cannot influence change over
the processes in the information flow, employing tools for parsing, standardization, and cleansing may
be the best next step in managing data fitness for your own purposes.
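As a minimal sketch of downstream parsing and standardization applied when the upstream data creators cannot be changed, the example below normalizes US-style phone numbers arriving in mixed formats; the formats and the fallback-to-stewardship choice are assumptions for illustration:

```python
# A hedged sketch of parsing and standardization: mixed-format values
# are normalized to one representation, or flagged for manual
# stewardship rather than guessed at.
import re

def standardize_us_phone(raw):
    """Parse a US phone number out of free text and standardize it."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None  # flag for manual stewardship instead of guessing
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

for raw in ["301.754.6350", "+1 (301) 754-6350", "75463"]:
    print(raw, "->", standardize_us_phone(raw))
```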
Myth #9: We Can Establish a Single Enterprise Standard for Data Quality
What You Heard
We will centralize all data quality standards and apply them to all enterprise data. This will ensure that
all data consumers get consistently high-quality data.
Why it is a Myth
While many downstream data consumers share the fundamental set of data quality expectations
regarding timeliness, currency, and completeness, the details of specific consistency and reasonableness
expectations may differ based on the business context and application. Some business processes may be
able to ignore selected types of data flaws, while others have no tolerance for the same errors.
“Data quality” is typically in the eyes of the beholder, and attempting to enforce a single standard may
be too onerous for some users and insufficient for others.
Considerations and Alternatives
Instead of centralizing the data quality standards, centralize data quality management. Embrace a
proper set of data quality tools that can support verification and validation of selected data quality rules
at numerous points along the end-to-end data flows, and train your users on how those tools are used
to evaluate data quality expectations.
Use a collaborative platform for proposing, documenting, and adopting data quality rules. This allows
rules to be shared without demanding that all rules be enforced across all data flows. Centralized data
quality management enables the data users to specify what rules are reasonable for their business
processes and applications, which ones are to be applied, at what points in the process, and the
necessary levels of acceptability.
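A minimal sketch of this arrangement, with invented rule and consumer names: rules live in one shared registry, while each business process selects which rules apply and at what acceptability levels:

```python
# A hedged sketch of centralized data quality management without a
# single enforced standard. The registry, consumer profiles, and
# thresholds below are hypothetical.
RULE_REGISTRY = {
    "email_present": lambda r: bool(r.get("email")),
    "ssn_present":   lambda r: bool(r.get("ssn")),
}

# Each consumer chooses rules and thresholds that fit its own context.
CONSUMER_PROFILES = {
    "marketing": {"email_present": 0.95},  # tolerant of missing SSNs
    "payroll":   {"ssn_present": 1.00, "email_present": 0.80},
}

def conformance(records, consumer):
    """Score each of the consumer's selected rules against its threshold."""
    results = {}
    for rule_name, threshold in CONSUMER_PROFILES[consumer].items():
        rule = RULE_REGISTRY[rule_name]
        score = sum(1 for r in records if rule(r)) / len(records)
        results[rule_name] = (score, score >= threshold)
    return results

records = [{"email": "a@x.com", "ssn": "123-45-6789"}, {"email": "", "ssn": ""}]
print(conformance(records, "marketing"))  # {'email_present': (0.5, False)}
```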
This approach may help to establish common standards. At the same time, it does not lock the entire
organization into a monolithic (and potentially bloated) set of rules. Added flexibility lets you grant
reasonable dispensations from proposed enterprise standards given appropriate business
circumstances.
Myth #10: Data Scientists Manage Their Own Data Quality
What You Heard
Data preparation tools let data scientists discover data quality issues and provide ways to transform raw
data into usable formats with little to no effort.
Why it is a Myth
This is essentially the converse of Myth #8. Data preparation tools provide each end-user with the
means to profile raw data and consider alternative methods of reformulation and transformation. Giving
individual analysts the ability to craft their own sequences of transformations is appealing because it
gives them flexibility in asserting standards and semantics.
However, when isolated analysts apply their transformations without sharing what they are doing or how they are doing it, the risk of inconsistent definitions and specifications across the organization increases.
So even if different data scientists are properly using their data preparation tools, the impacts of slight
variations in their transformations may reverberate when representatives of the business attempt to
interpret potentially conflicting results.
Considerations and Alternatives
Having individuals managing their own data quality in a vacuum can lead to conflicting results. However,
as suggested in Myth #8, having individualized data quality plans is actually a healthy alternative to the
conventional IT-driven data quality program, since data usability is essentially defined in the contexts of
the data consumers.
If the concern is inconsistency of interpretation of analytical results, introduce policies for governing the
ways that end-user data preparation tools are used. Establish a framework for collaboration and
validation among the data scientists about data standards, semantics, and data transformations.
Configure the data preparation tools to motivate reuse of defined transformation sequences to
encourage end-product consistency.
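As a minimal sketch of that last point, with invented transformations: once a transformation sequence has been reviewed and published, every analyst applies the same shared pipeline rather than re-deriving it privately:

```python
# A hedged sketch of motivating reuse: governed transformation sequences
# are published under shared names, so analysts produce consistent
# end products. The pipeline contents are hypothetical examples.
SHARED_PIPELINES = {
    "standardize_customer": [
        lambda r: {**r, "name": r["name"].strip().title()},
        lambda r: {**r, "country": r.get("country", "US").upper()},
    ],
}

def apply_pipeline(record, name):
    """Run one of the shared, governed transformation sequences."""
    for step in SHARED_PIPELINES[name]:
        record = step(record)
    return record

print(apply_pipeline({"name": "  jane doe "}, "standardize_customer"))
# {'name': 'Jane Doe', 'country': 'US'}
```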
Considerations: The Data Quality Strategy
Fulfilling the desire for improved organizational data quality requires a combination of thoughtful
planning and effective management of resources. The responsibility cannot be assigned in a haphazard
way to either the business or the technical side – both perspectives are required in order to institute
controls and procedures that allow data sets to meet the collective needs of the consumer constituency.
Likewise, the quality of the data cannot be improved by only applying technology or only applying the
process improvements dictated by the “quality movement.” It requires a collaborative effort that arms
business process experts with the right technical tools to make cost-effective decisions about
identifying, reacting to, and anticipating the types of data errors that lead to negative business impact.
Tools such as data profiling and data mapping can help to evaluate different types of errors and support
continuous monitoring to generate alerts when errors beyond your control need to be addressed.
Common data quality tools such as parsing, standardization, and identity matching and resolution can
be applied to cleanse errors and normalize data when fixing the root causes of the errors is beyond your
administrative control. Dashboards and scorecards can be configured to support monitoring the
performance and effectiveness of data stewards and data quality analysts in how data quality best
practices are applied.
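As a minimal sketch of the kind of evaluation a data profiling tool automates (the column and data are invented for illustration), per-column null rates and value distributions can surface candidate quality problems, such as the inconsistent casing below:

```python
# A hedged sketch of column profiling: null rates, distinct counts, and
# top values that flag candidate data quality problems for review.
from collections import Counter

def profile(rows, column):
    """Compute simple profile statistics for one column."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

rows = [{"state": "MD"}, {"state": "md"}, {"state": ""}, {"state": "MD"}]
print(profile(rows, "state"))
# {'null_rate': 0.25, 'distinct': 2, 'top_values': [('MD', 2), ('md', 1)]}
# The 'MD' vs 'md' split hints at a standardization issue worth a rule.
```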
Lastly, recognize that adopting the best suggestions from data management professionals will allow the
development of an effective strategy and plan for data quality improvements in the short-, medium- and
long-term. Integrating methods for taking advantage of collaboration between technical implementers
and business data consumers will help in proactively identifying data quality dependencies, anticipating
potential issues, and engineering inspections and controls into the application framework to prevent
errors from being introduced in the first place.
About the Author
David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized
thought leader and expert consultant in the areas of analytics, big data, data governance, data quality,
master data management, and business intelligence. Along with consulting on numerous data
management projects over the past 15 years, David is also a prolific author regarding business
intelligence best practices, as the author of numerous books and papers on data management, including
the recently published “Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools,
Techniques, NoSQL, and Graph,” the second edition of “Business Intelligence – The Savvy Manager’s
Guide,” as well as other books and articles on data quality, master data management, big data, and data
governance. David is a frequent invited speaker at conferences, web seminars, sponsored web sites, and TechTarget channels, and shares additional notes and articles at www.dataqualitybook.com.
David can be reached at loshin@knowledge-integrity.com, or at (301) 754-6350.
About the Sponsor: Information Builders
Information Builders provides solutions for business intelligence (BI), analytics, data integration, and
data quality that help drive performance improvements, innovation, and value. Through one set of
powerful products, we enable organizations to serve everyone – analysts, non-technical users, even
partners, customers, and citizens – with better data and analytics. Our dedication to customer success is
unmatched with thousands of organizations relying on us as their trusted partner. Founded in 1975,
Information Builders is headquartered in New York, NY, with global offices, and remains one of the
largest independent, privately held companies in the industry. Visit us at informationbuilders.com,
follow us on Twitter at @infobldrs, like us on Facebook, and visit our LinkedIn page.