©	2017	Knowledge	Integrity,	Inc.	 1	
www.knowledge-integrity.com	 (301) 754-6350
Knowledge Integrity Incorporated
Business Intelligence Solutions
Busting	10	Myths	about	
Data	Quality	Management	
		Prepared	by:	
		David	Loshin	
		Knowledge	Integrity,	Inc.	
		January	2017	
		Sponsored	by:
Introduction	
Even	though	great	strides	have	been	made	in	data	quality	improvement	over	the	past	decades,	many	
myths	and	misconceptions	are	perpetuated	through	popular	articles	and	presentations.	Often,	these	
simplified	views	can	be	confusing	or	conflicting,	and	those	blindly	accepting	the	statements	may	find	
their attempts at making discrete progress toward improvement slow and stall. The goal of this
paper is to highlight some common “myths” about data quality management, explain why these are
myths, and guide the reader to make better choices when deciding to pursue a data quality
management	strategy.		
Adding	critical	insight	into	different	aspects	of	data	quality	management	and	putting	some	common	
beliefs	into	perspective	will	help	you	put	together	a	more	thoughtful	plan	for	a	data	quality	
management	program	that	can	lead	to	measurable	improvements	in	the	quality	and	usability	of	
organizational	data.	There	are	no	substitutes	for	good	data	management	disciplines,	and	this	paper	will	
advise	the	practitioner	as	to	what	the	critical	data	issues	are	in	the	organization	and	how	to	leverage	the	
right	tools	and	technologies	to	address	those	issues	in	the	most	efficient	way.		
Defining	and	deploying	well-defined	processes	within	a	culture	of	data	governance	will	simplify	
technology	acquisition	and	reduce	time	to	value	for	implementing	a	data	quality	program.	Our	intent	is	
to provide a balanced view of the best practices for data quality improvement by examining some
common statements and, for each, setting out what you heard, why it may be a myth, and some
considerations for planning your approach to data quality improvement.
Myth	#1:	The	Business	is	Responsible	for	Data	Quality	
What	You	Heard	
Data quality does not lie within the purview of the Information Technology department; since poor data
quality	impacts	the	business,	the	business	users	must	take	ownership	for	data	quality	improvement.	
Why	it	is	a	Myth	
Two	of	the	most	intractable	issues	organizations	face	when	dealing	with	data	quality	problems	are	less	
technical	and	more	programmatic:	funding	the	program	and	ensuring	its	sustainability.	The	conflict	
evolves	from	a	difference	of	opinion	regarding	financial	support	and	resourcing.	Essentially,	the	
statement	is	intended	by	Information	Technology	to	drive	business	engagement	in	supporting	data	
quality	activity.		
Since	the	business	users	presume	the	equivalence	of	“data	cleansing”	and	“data	quality,”	they	insist	that	
if	technical	processes	can	be	used	to	clean	data,	the	responsibility	lies	with	the	IT	department,	and	
consequently,	so	should	the	funding.	On	the	other	hand,	the	IT	teams	suggest	that	since	poor	data	
quality	impacts	the	business,	and	the	business	users	are	the	ones	defining	what	quality	means,	then	the	
business	users	need	to	take	ownership	of	data	quality	and	absolve	IT	from	accountability.	This	
dichotomy	pits	IT	and	the	business	users	against	each	other	in	terms	of	the	effort	to	improve	data	
quality,	thereby	stalling	progress	rather	than	encouraging	it.	
Considerations	and	Alternatives	
It	is	worthwhile	to	remember	that	the	IT	department	must	work	in	a	partnership	with	the	business	users	
to	take	best	advantage	of	data.	“Information”	Technology	is	always	going	to	be	involved	in	anything	that	
touches	information.	And	it	is	naive	to	presume	that	there	are	operational	models	that	support	only	
non-technical	business	people	taking	responsibility	for	ensuring	data	quality.	
Data	quality	management	must	be	a	collaborative	effort	that	bridges	the	gaps	between	IT	and	the	
business.	An	alternative	approach	considers	a	collaborative	model	in	which	the	business	side	is	
accountable	for	ensuring	that	there	are	good	definitions	of	data	quality	rules,	measures,	and	
acceptability	levels	while	IT	is	responsible	for	instituting	the	architectural	framework	for	ensuring	the	
rules	are	observed	and	reporting	the	measures.	Data	governance	policies	and	procedures	can	be	put	in	
place	to	ensure	that	issues	are	reported	to	the	business	but	are	handled	by	selected	IT	data	stewards.
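
One way to picture this split, as a minimal sketch rather than a prescribed design: the business side supplies each rule, its measure, and its acceptability level, while IT supplies the framework that evaluates the rules and reports the results. The names used here (QualityRule, measure, and the sample rules, records, and thresholds) are hypothetical and do not come from any specific tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    """A business-defined rule with its acceptability level (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]   # returns True when a record conforms
    min_conformance: float          # acceptability level, between 0.0 and 1.0


def measure(records: list[dict], rules: list[QualityRule]) -> list[dict]:
    """IT-side framework: evaluate each rule and report its measure."""
    report = []
    for rule in rules:
        passed = sum(1 for r in records if rule.check(r))
        rate = passed / len(records) if records else 1.0
        report.append({
            "rule": rule.name,
            "conformance": rate,
            "acceptable": rate >= rule.min_conformance,
        })
    return report


# Illustrative rules and records, made up for this sketch.
rules = [
    QualityRule("email_present", lambda r: bool(r.get("email")), 0.75),
    QualityRule("age_in_range", lambda r: 0 <= r.get("age", -1) <= 120, 1.0),
]
records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 51},
    {"email": "c@example.com", "age": 130},
    {"email": "d@example.com", "age": 27},
]

for line in measure(records, rules):
    print(line)
```

In this arrangement the rule list is the part the business owns and can change without touching the measurement code, which is the part IT maintains.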
Myth	#2:	IT	Owns	Data	Governance	
What	You	Heard	
Our	company	has	appointed	a	Chief	Data	Officer	(CDO)	who	will	spearhead	IT’s	data	governance	
program.	
Why	it	is	a	Myth	
Data	governance	comprises	the	policies	and	practices	that	link	data	policy	compliance	with	achieving	
business objectives. The data management dependencies identified within business policies drive
the definition of data policies. Data policies cannot be instituted by fiat, nor can they be enforced without
alignment	and	cooperation	between	the	business	and	technology	teams.	Therefore,	one	cannot	expect	
that	a	CDO	operating	within	the	confines	of	the	IT	department	has	the	ability	to	implement	or	authority	
to	enforce	data	governance	without	buy-in	from	the	representatives	of	the	business	functions.	
Considerations	and	Alternatives	
As	with	myth	#1,	the	responsibility	for	deploying	data	governance	is	split:	the	business	owns	the	policies	
and	processes,	but	IT	owns	the	implementation.	That	suggests	that	all	new	system	and	application	
development	be	designed	with	directly	embedded	procedures	for	monitoring	data	quality	and	asserting	
data	policy	compliance.		
Although the role and the list of responsibilities of the CDO are still evolving, there is a greater risk of
failing	to	properly	institute	sustainable	practices	for	data	governance	when	the	CDO’s	mandate	is	
designated	within	the	information	technology	silo.	The	most	effective	Chief	Data	Officer	will	report	
directly	to	the	CEO,	and	be	empowered	to	implement	data	governance	by	leveraging	a	partnership	
between	the	business	and	IT.	That	way,	the	organization	can	inaugurate	a	sustainable	data	governance	
program	that	directly	integrates	data	policy	compliance	within	defined	business	processes.
Myth	#3:	Data	Quality	Tools	Do	Not	Require	Any	Set	Up	
What	You	Heard	
The	acquisition	of	a	data	cleansing	tool	is	enough	to	eliminate	all	your	data	quality	issues.	A	data	quality	
tool	just	plugs	into	the	enterprise	and	cleans	all	your	data	out	of	the	box.	
Why	it	is	a	Myth	
While	technology	is	critical	to	data	quality	measurement	and	assurance,	the	quality	of	data	is	defined	
within	a	business	context	and	is	associated	with	sets	of	metadata,	assertions,	and	business	rules.	While	
data	quality	tools	have	a	lot	of	built	in	capabilities	out	of	the	box,	they	must	be	properly	configured	with	
your	organization’s	rules	in	order	to	identify	and	cleanse	data	errors.	In	addition,	the	tools	will	need	to	
be	integrated	into	the	organization’s	environment.		
Considerations	and	Alternatives	
Often	there	is	a	presumption	that	if	there	is	a	data	quality	problem,	then	the	process	of	acquiring	a	data	
cleansing tool is the only necessary action to take. However, a data quality tool is just that: a tool. And
just as the act of purchasing a shovel does not guarantee that holes will appear in the ground, the purchase of
data	quality	tools	does	not	guarantee	that	errors	will	be	identified	and	corrected.	
Addressing data quality issues goes beyond the purchase of a product. Because the tool must be configured
with	metadata,	assertions,	and	rules	in	accordance	with	your	business	consumers’	expectations,	the	tool	
will	be	most	effective	in	the	hands	of	professionals	who	understand	the	data,	the	context,	and	the	
technology.	That	enables	you	to	assemble	a	program	that	combines	good	data	management	practices,	
data	stewardship,	and	the	use	of	tools	that	will	provide	the	greatest	benefit.
Myth	#4:	Manufacturing	Quality	Practices	Are	Easily	Applied	to	Data	
What	You	Heard	
Quality	processes	as	applied	to	manufacturing	activities	can	be	directly	mapped	to	an	“information	
manufacturing”	process.	Therefore,	quality	techniques	are	eminently	applicable	to	information.	
Why	it	is	a	Myth	
There is no doubt that the pioneers advocating quality in manufacturing such as Philip Crosby, W.
Edwards	Deming,	and	Joseph	Juran	have	positively	impacted	the	ways	that	manufacturers	do	business.	
It	makes	sense	to	try	to	adapt	their	common-sense	approaches	to	managing	the	quality	of	information,	
and	there	have	been	some	purported	successes	along	these	lines.	But	often,	those	advocating	applying	a	
process-only	quality	approach	to	data	may	find	that	their	successes	may	be	bounded	by	the	
characteristics	of	information	that	differ	from	other	presumably	“raw”	material.	
Physical	manufacturing	processes	take	limited	amounts	of	raw	materials	that	are	transformed	through	a	
series	of	processes	into	a	unique	final	product.	That	product’s	attribution	and	criteria	can	be	compared	
to	discrete	specifications	for	its	intended	use,	such	as	the	amount	of	usable	storage	on	a	DVD,	or	the	
melting	temperature	of	a	screw.		
In	this	analogy,	data	is	the	raw	material	and	information	products	are	the	results.	Yet	in	contrast	to	real	
raw	materials,	data	can	be	used	multiple	times,	and	contrary	to	real	resulting	manufactured	products,	
the	output	of	information	processes	can	be	reused	and	repurposed	in	ways	of	which	the	original	owners	
never dreamt, let alone prepared for. Attempting to monitor compliance with specifications requires that
all those specifications be known beforehand – and this is often not the case with data.
Considerations	and	Alternatives	
There	are	definitely	aspects	of	the	quality	movement	that	are	applicable	to	managing	data.	But	when	
you	take	into	account	the	fact	that	data	sets	are	often	found	and	reused	in	a	variety	of	different	ways,	
you	must	reconsider	what	can	be	discretely	defined	as	quality	measures	when	there	is	a	potential	for	
uncontrolled	reuse.		
This	consideration	can	lead	to	two	different	conclusions.	First,	if	there	are	ways	of	anticipating	the	
potential	for	different	ways	that	created	data	can	be	repurposed,	that	might	influence	the	managers	of	
the	original	sources	to	introduce	data	testing	as	part	of	a	development	life	cycle	process	to	anticipate	
the	types	of	flaws	that	might	cause	problems	downstream	and	attempt	to	reduce	or	eliminate	them	
from	the	beginning.	Second,	those	repurposing	existing	data	sets	might	not	be	able	to	influence	the	
insertion	of	quality	controls.	In	these	cases	the	data	consumers	must	take	on	the	responsibility	to	ensure	
the	data	meets	their	needs,	and	this	may	involve	the	direct	application	of	data	quality	tools	and	
techniques.
Myth	#5:	You	Must	Have	Perfect	Data	
What	You	Heard	
Monitor	data	to	identify	imperfect	data	and	address	process	issues	that	allow	any	imperfections.	By	
ensuring	that	the	data	is	always	perfect,	no	errors	can	impact	the	business.	
Why	it	is	a	Myth	
As myth #4 noted, unlike manufactured items whose adherence to engineering specifications can be
measured	as	they	drop	off	the	assembly	line,	data	instances	are	created	once	and	then	used	in	different	
ways	by	different	processes	requiring	different	levels	of	quality.	The	concept	of	“perfect”	data	is	
contextual	but	is	essentially	based	on	use,	not	creation.	Yet	most	of	the	time,	data	perfection	is	assumed	
based	on	the	constraints	set	by	the	process	creating	the	record,	and	it	is	still	relatively	uncommon	for	
the	expectations	of	downstream	users	to	be	folded	into	the	requirements	as	new	applications	are	being	
designed	and	built.	
The result is that most creating processes care only about the immediate (i.e., operational) use of data and
do not accommodate the needs of other consumers as the data sets are repurposed. On the other
hand, many data values are captured and stored for dubious reasons (they were part of a purchased
data model, or were columns retained from a data migration project), but they may have limited (if any) use. In this
scenario,	while	the	concept	of	perfect	data	expresses	an	ideal,	the	data	value	may	not	be	business	
critical	or	necessary	to	achieve	any	business	objectives,	and	investing	energy	in	ensuring	its	perfection	is	
basically	a	wasted	effort.	
Considerations	and	Alternatives	
A	saying	commonly	attributed	to	Voltaire	is	that	“Perfect	is	the	enemy	of	good.”	Obsessive	
perfectionism,	even	in	the	context	of	data	quality,	comes	at	a	cost	when	the	effort	needed	to	reach	
perfection	exceeds	the	value	to	be	achieved	by	progressing	from	good	to	perfect.		
It	is	a	noble	idea	to	have	perfect	data,	but	with	limited	resources,	it's	better	to	focus	on	the	biggest	
offenders.	Understanding	who	the	data	consumers	are	and	what	their	expectations	will	be	for	data	
quality	allows	you	to	more	effectively	anticipate	process	failures	that	can	introduce	errors.	The	law	of	
diminishing	returns	demands	that	the	data	quality	team	members	be	smart	about	how	resources	are	
allocated	to	ensure	that	the	organization	gets	the	biggest	bang	for	its	buck	while	providing	the	greatest	
value	and	efficiency.
Myth	#6:	The	Cost	of	Bad	Data	is	Obvious	
What	You	Heard	
Ensuring	that	your	data	is	perfect	will	automatically	increase	revenues	and	decrease	costs.		
Why	it	is	a	Myth	
There	is	no	doubt	that	pervasive	data	flaws	will	lead	to	negative	impacts	to	the	business.	And	in	general,	
reducing	the	frequency	and	scale	of	occurrence	of	data	errors	should	reduce	the	negative	impacts.	But	
little	has	been	reported	regarding	the	connections	between	specific	errors	and	identified	costs,	and	that	
means	that	there	is	a	subtle	difference	between	putting	controls	into	place	to	prevent	negative	impacts	
and	making	claims	of	the	value	of	“perfect”	data.		
The	costs	of	flawed	data	might	not	be	reflected	in	the	allocation	of	budget;	for	example,	financial	
transaction	errors	may	impose	a	cost	on	the	contact	center	rather	than	on	the	transaction	processing	
center	when	customers	call	in	to	complain	about	errors	in	their	statements.	The	costs	might	not	be	
associated	with	data	quality	management	within	a	budget	line	item.	To	follow	our	example,	the	business	
impacts	of	incorrect	customer	statements	are	essentially	subsumed	within	the	indirect	costs	of	the	
customer	service	center.		
Considerations	and	Alternatives	
This	can	make	it	difficult	to	determine	how	much	you	will	save	when	you	clean	up	your	data.	But	it	
doesn't	mean	that	you're	not	saving	money!	Understanding	this	subtlety	will	help	in	truly	identifying	
business	impacts	directly	related	to	data	flaws.	Awareness	of	the	types	of	errors	that	might	contribute	
to	negative	impact	allows	the	team	to	institute	data	quality	controls	that	can	lead	to	predictably	
improved	business	processes.	This	suggests:	
• Soliciting	feedback	from	the	business	consumers	of	a	data	set;	
• Determining the types of errors that have occurred, or may occur, and could lead to a
negative impact; and
• Defining	methods	for	determining	the	existence	of	an	error	at	a	point	in	the	process	where	it	
can	be	most	effectively	remediated.	
It	is	wise	to	prevent	preventable	errors	that	have	material	impact.	On	the	other	hand,	if	no	one	cares	
about some type of error, it may not make sense to exert the effort to prevent it. Determine which
error scenarios are most likely to cause the greatest impact and prioritize accordingly.
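
The prioritization step can be sketched as a simple ranking by expected impact, here taken to be estimated frequency multiplied by estimated cost per occurrence. This is only one possible scoring, and every name and figure below is a placeholder for illustration; real estimates would come from the business consumers of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorScenario:
    name: str
    occurrences_per_month: int     # estimated from consumer feedback
    cost_per_occurrence: float     # estimated impact, in any consistent unit

    @property
    def expected_monthly_impact(self) -> float:
        return self.occurrences_per_month * self.cost_per_occurrence


def prioritize(scenarios):
    """Order scenarios from greatest to least expected impact."""
    return sorted(scenarios, key=lambda s: s.expected_monthly_impact, reverse=True)


# Placeholder figures for illustration only.
scenarios = [
    ErrorScenario("wrong statement balance", 40, 25.0),
    ErrorScenario("missing middle initial", 900, 0.0),
    ErrorScenario("duplicate customer record", 15, 120.0),
]

for s in prioritize(scenarios):
    print(f"{s.name}: {s.expected_monthly_impact:.2f}")
```

Note that the most frequent error in this made-up list ranks last, because no one bears a cost when it occurs.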
Myth	#7:	Monitoring	and	Reporting	Data	Quality	Eliminates	Errors	
What	You	Heard	
By	instituting	a	data	quality	dashboard	populated	with	continuous	measures,	you	will	eliminate	data	
errors.	
Why	it	is	a	Myth	
The	fundamental	idea	behind	reporting	conformance	to	data	quality	expectations	within	a	data	quality	
scorecard	or	dashboard	is	to	alert	the	data	stewards	when	predictable	issues	can	be	identified	within	
the	process.	The	theory	is	that	by	instituting	data	rules	for	continuous	measurement,	you	can	be	
proactive	when	errors	occur	and	prevent	those	errors	from	impacting	the	business.	
However,	in	many	cases,	if	you	know	enough	to	describe	and	then	identify	the	error	so	that	you	can	do	
something about it when it occurs, you would probably be better off seeking out the root cause of the
error	and	completely	eliminating	it,	which	would	obviate	the	need	to	continue	to	monitor	for	the	error.	
Why	continue	to	measure	something	whose	cause	might	actually	be	eliminated?	
Considerations	and	Alternatives	
Actually,	in	some	sense,	measuring	(and	addressing)	something	you	already	know	to	be	a	problem	is	not	
being	proactive,	but	rather	it	is	being	reactive	earlier	in	the	process.	However,	using	a	data	quality	
dashboard	to	alert	you	to	issues	early	on	helps	you	be	prepared	so	that	data	quality	surprises	don't	
knock	you	out,	which	allows	more	flexibility	and	preparedness	in	reacting	to	emerging	data	quality	
issues	that	have	not	yet	been	identified.	
On	the	other	hand,	being	truly	proactive	means	anticipating	the	types	of	errors	that	could	occur	and	
engaging	the	consumers	to	assess	any	potential	impacts	of	those	errors.	This	can	inform	the	design	and	
implementation	teams	as	to	whether	fundamental	changes	to	the	process	can	reduce	the	potential	for	
those	errors	occurring.	Incorporating	this	governance	practice	as	part	of	the	development	lifecycle	
process	enables	the	use	of	tools	and	technologies	to	predict	errors	so	that	the	application	developers	
can	build	stop-gaps	and	controls	to	ensure	the	errors	won’t	happen	in	the	first	place.
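
As a rough illustration of such a control, the sketch below validates a record at the point of creation and refuses to store it when a rule fails, so the error never reaches downstream consumers. The function, field names, and the US ZIP code rule are assumptions chosen for the example, not requirements from this paper.

```python
import re


class ValidationError(ValueError):
    """Raised when a record fails a control at the point of entry."""


# Assumed rule for this sketch: a US ZIP code, optionally with the +4 suffix.
_POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?")


def create_customer(store: list, name: str, postal_code: str) -> dict:
    """Validate the record before it is stored, so flawed data never enters."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if not _POSTAL_CODE.fullmatch(postal_code.strip()):
        problems.append(f"postal code {postal_code!r} is not a US ZIP or ZIP+4")
    if problems:
        raise ValidationError("; ".join(problems))
    record = {"name": name.strip(), "postal_code": postal_code.strip()}
    store.append(record)
    return record


store = []
create_customer(store, "Ada Lovelace", "20902")
try:
    create_customer(store, "Grace Hopper", "2090")
except ValidationError as exc:
    print("rejected:", exc)
print(len(store))
```

A control like this replaces a downstream measurement with an upstream guarantee, which is only possible when you can change the creating application.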
Myth	#8:	Data	Quality	is	Only	Solved	by	Process	Improvement	
What	You	Heard	
You	don’t	need	data	quality	tools	–	all	data	quality	issues	can	be	resolved	using	good	process	
management!	
Why	it	is	a	Myth	
Philosophical	approaches	to	management	improvement	often	are	based	on	laboratory	conditions,	which	
allow	one	to	make	generalizations	that	might	not	apply	in	most	real-life	scenarios.	In	particular,	be	
aware	that	there	are	methodologies	for	data	quality	improvement	that	focus	only	on	process	
improvement,	such	as	insisting	that	data	providers	always	validate	their	data	before	providing	it	to	the	
consumers.	Often,	these	approaches	suggest	that	technical	approaches	are	not	necessary	to	improving	
the	quality	of	data.	
In	the	laboratory	environment,	simplistic	approaches	bypass	any	political	and	logistic	issues,	and	often	
do	not	reflect	real-life	scenarios	in	which	data	suppliers	have	no	budget	or	interest	in	changing	their	
ways	to	support	unknown	downstream	users,	users	repurpose	“found”	data	sets	with	no	control	over	
data	creation,	or	participants	in	a	collaborative	environment	must	agree	to	rules	for	standardization	for	
data	sharing.	These	situations	require	a	combination	of	process	improvement	and	data	quality	tools	and	
techniques	to	ensure	data	usability.	
Considerations	and	Alternatives	
The	unmet	challenge,	in	many	cases,	is	that	those	advocating	process	improvement	do	not	understand	
the	limitations	when	you	do	not	exercise	control	over	the	administrative	domain	within	which	the	data	
flows.	For	example,	when	you	cannot	engage	the	data	creators	to	modify	their	ways	to	ensure	the	data	
meets	your	application’s	needs,	you	still	must	do	something	to	prevent	data	flaws	from	impacting	your	
own	processes.		
Instead, you must look at engaging the data source owners when possible but have a strategy for
maintaining	data	quality	at	the	necessary	level	when	appropriate.	If	you	cannot	influence	change	over	
the	processes	in	the	information	flow,	employing	tools	for	parsing,	standardization,	and	cleansing	may	
be	the	best	next	step	in	managing	data	fitness	for	your	own	purposes.
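
To make parsing and standardization concrete, here is a small sketch that cleanses phone numbers received from a source you cannot change. It assumes US numbers and a target format of NNN-NNN-NNNN; both the function name and the format are choices made for this example, and a commercial data quality tool would handle far more variation.

```python
import re
from typing import Optional


def standardize_us_phone(raw: str) -> Optional[str]:
    """Parse a US phone number in a common format and return NNN-NNN-NNNN.

    Returns None when the value cannot be parsed, so the caller can route
    it to a data steward instead of guessing.
    """
    digits = re.sub(r"\D", "", raw)          # parsing: keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop the country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


incoming = ["(301) 555-0100", "301.555.0100", "+1 301 555 0100", "555-0100"]
cleansed = [standardize_us_phone(v) for v in incoming]
print(cleansed)
```

The first three values reduce to the same standard form, while the fourth is flagged rather than repaired because the area code is missing.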
Myth	#9:	We	Can	Establish	a	Single	Enterprise	Standard	for	Data	Quality	
What	You	Heard	
We	will	centralize	all	data	quality	standards	and	apply	them	to	all	enterprise	data.	This	will	ensure	that	
all	data	consumers	get	consistently	high-quality	data.	
Why	it	is	a	Myth	
While	many	downstream	data	consumers	share	the	fundamental	set	of	data	quality	expectations	
regarding	timeliness,	currency,	and	completeness,	the	details	of	specific	consistency	and	reasonableness	
expectations	may	differ	based	on	the	business	context	and	application.	Some	business	processes	may	be	
able	to	ignore	selected	types	of	data	flaws,	while	others	have	no	tolerance	for	the	same	errors.	
“Data	quality”	is	typically	in	the	eyes	of	the	beholder,	and	attempting	to	enforce	a	single	standard	may	
be	too	onerous	for	some	users	and	insufficient	for	others.	
Considerations	and	Alternatives	
Instead	of	centralizing	the	data	quality	standards,	centralize	data	quality	management.	Embrace	a	
proper	set	of	data	quality	tools	that	can	support	verification	and	validation	of	selected	data	quality	rules	
at	numerous	points	along	the	end-to-end	data	flows,	and	train	your	users	on	how	those	tools	are	used	
to evaluate data quality expectations.
Use	a	collaborative	platform	for	proposing,	documenting,	and	adopting	data	quality	rules.	This	allows	
rules	to	be	shared	without	demanding	that	all	rules	be	enforced	across	all	data	flows.	Centralized	data	
quality	management	enables	the	data	users	to	specify	what	rules	are	reasonable	for	their	business	
processes	and	applications,	which	ones	are	to	be	applied,	at	what	points	in	the	process,	and	the	
necessary	levels	of	acceptability.		
This	approach	may	help	to	establish	common	standards.	At	the	same	time,	it	does	not	lock	the	entire	
organization	into	a	monolithic	(and	potentially	bloated)	set	of	rules.	Added	flexibility	lets	you	grant	
reasonable	dispensations	from	proposed	enterprise	standards	given	appropriate	business	
circumstances.
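
A minimal sketch of this idea, under assumed names: rules live in one shared catalog, and each consumer subscribes only to the rules that matter to it, with its own acceptability level. CATALOG, SUBSCRIPTIONS, and the sample consumers and records are all hypothetical.

```python
from typing import Callable, Dict, List

# Shared catalog: rules are proposed and documented once (names are hypothetical).
CATALOG: Dict[str, Callable[[dict], bool]] = {
    "has_postal_code": lambda r: bool(r.get("postal_code")),
    "has_phone": lambda r: bool(r.get("phone")),
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
}

# Each consumer chooses which rules apply and the acceptability level for each.
SUBSCRIPTIONS: Dict[str, Dict[str, float]] = {
    "billing": {"has_postal_code": 1.0, "amount_non_negative": 1.0},
    "marketing": {"has_postal_code": 0.5, "has_phone": 0.5},
}


def evaluate(consumer: str, records: List[dict]) -> Dict[str, bool]:
    """Report, per subscribed rule, whether the data meets this consumer's level."""
    if not records:
        raise ValueError("no records to evaluate")
    results = {}
    for rule_name, threshold in SUBSCRIPTIONS[consumer].items():
        check = CATALOG[rule_name]
        rate = sum(1 for r in records if check(r)) / len(records)
        results[rule_name] = rate >= threshold
    return results


records = [
    {"postal_code": "20902", "phone": "", "amount": 10.0},
    {"postal_code": "", "phone": "301-555-0100", "amount": 5.0},
]
print(evaluate("billing", records))
print(evaluate("marketing", records))
```

The same two records fail the postal code rule for billing and pass it for marketing, which is the situation a single enterprise standard cannot express.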
Myth	#10:	Data	Scientists	Manage	Their	Own	Data	Quality	
What	You	Heard	
Data	preparation	tools	let	data	scientists	discover	data	quality	issues	and	provide	ways	to	transform	raw	
data	into	usable	formats	with	little	to	no	effort.	
Why	it	is	a	Myth	
This is essentially the converse of Myth #9. Data preparation tools provide each end-user with the
means	to	profile	raw	data	and	consider	alternative	methods	of	reformulation	and	transformation.	Giving	
individual	analysts	the	ability	to	craft	their	own	sequences	of	transformations	is	appealing	because	it	
allows them flexibility in asserting standards and semantics.
However,	when	isolated	analysts	applying	their	transformations	do	not	share	what	they	are	doing	or	
their	processes,	the	risk	of	inconsistent	definitions	and	specifications	across	the	organization	increases.	
So	even	if	different	data	scientists	are	properly	using	their	data	preparation	tools,	the	impacts	of	slight	
variations	in	their	transformations	may	reverberate	when	representatives	of	the	business	attempt	to	
interpret	potentially	conflicting	results.	
Considerations	and	Alternatives	
Having	individuals	managing	their	own	data	quality	in	a	vacuum	can	lead	to	conflicting	results.	However,	
as suggested in Myth #9, having individualized data quality plans is actually a healthy alternative to the
conventional	IT-driven	data	quality	program,	since	data	usability	is	essentially	defined	in	the	contexts	of	
the	data	consumers.	
If	the	concern	is	inconsistency	of	interpretation	of	analytical	results,	introduce	policies	for	governing	the	
ways	that	end-user	data	preparation	tools	are	used.	Establish	a	framework	for	collaboration	and	
validation	among	the	data	scientists	about	data	standards,	semantics,	and	data	transformations.	
Configure	the	data	preparation	tools	to	motivate	reuse	of	defined	transformation	sequences	to	
encourage	end-product	consistency.
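
Reuse of defined transformation sequences might look like the following sketch, in which analysts apply a named, shared sequence instead of writing their own variant. The registry, the sequence names, and the individual steps are assumptions for illustration; actual data preparation tools expose this capability in their own ways.

```python
from typing import Callable, Dict, List

Transform = Callable[[str], str]

# A shared registry of named, reviewed transformation sequences (hypothetical).
SHARED_SEQUENCES: Dict[str, List[Transform]] = {
    "customer_name": [str.strip, lambda s: " ".join(s.split()), str.title],
    "state_code": [str.strip, str.upper],
}


def apply_sequence(name: str, value: str) -> str:
    """Apply a registered sequence step by step, in its defined order."""
    for step in SHARED_SEQUENCES[name]:
        value = step(value)
    return value


# Two analysts reusing the same sequence get the same result.
print(apply_sequence("customer_name", "  ada   lovelace "))
print(apply_sequence("state_code", " md"))
```

Because the sequence is looked up by name, a change agreed among the data scientists takes effect for everyone who uses it.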
Considerations:	The	Data	Quality	Strategy	
Fulfilling	the	desire	for	improved	organizational	data	quality	requires	a	combination	of	thoughtful	
planning	and	effective	management	of	resources.	The	responsibility	cannot	be	assigned	in	a	haphazard	
way	to	either	the	business	or	the	technical	side	–	both	perspectives	are	required	in	order	to	institute	
controls	and	procedures	that	allow	data	sets	to	meet	the	collective	needs	of	the	consumer	constituency.		
Likewise,	the	quality	of	the	data	cannot	be	improved	by	only	applying	technology	or	only	applying	the	
process	improvements	dictated	by	the	“quality	movement.”	It	requires	a	collaborative	effort	that	arms	
business	process	experts	with	the	right	technical	tools	to	make	cost-effective	decisions	about	
identifying,	reacting	to,	and	anticipating	the	types	of	data	errors	that	lead	to	negative	business	impact.		
Tools	such	as	data	profiling	and	data	mapping	can	help	to	evaluate	different	types	of	errors	and	support	
continuous	monitoring	to	generate	alerts	when	errors	beyond	your	control	need	to	be	addressed.	
Common	data	quality	tools	such	as	parsing,	standardization,	and	identity	matching	and	resolution	can	
be	applied	to	cleanse	errors	and	normalize	data	when	fixing	the	root	causes	of	the	errors	is	beyond	your	
administrative	control.	Dashboards	and	scorecards	can	be	configured	to	support	monitoring	the	
performance	and	effectiveness	of	data	stewards	and	data	quality	analysts	in	how	data	quality	best	
practices	are	applied.		
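
For readers unfamiliar with data profiling, the sketch below shows the kind of summary a profiling step produces for a single column. It is a simplified stand-in written for this example, with made-up sample values, and is not a description of any particular product.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, missing values, distinct values, top value."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Made-up sample column with inconsistent representations of the same state.
states = ["MD", "md", "MD", "", None, "VA", "Maryland"]
print(profile_column(states))
```

Even this small profile surfaces two candidate issues: missing values, and several distinct spellings that standardization could normalize.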
Lastly,	recognize	that	adopting	the	best	suggestions	from	data	management	professionals	will	allow	the	
development	of	an	effective	strategy	and	plan	for	data	quality	improvements	in	the	short-,	medium-	and	
long-term.	Integrating	methods	for	taking	advantage	of	collaboration	between	technical	implementers	
and	business	data	consumers	will	help	in	proactively	identifying	data	quality	dependencies,	anticipating	
potential	issues,	and	engineering	inspections	and	controls	into	the	application	framework	to	prevent	
errors	from	being	introduced	in	the	first	place.
About	the	Author	
David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized
thought	leader	and	expert	consultant	in	the	areas	of	analytics,	big	data,	data	governance,	data	quality,	
master	data	management,	and	business	intelligence.	Along	with	consulting	on	numerous	data	
management	projects	over	the	past	15	years,	David	is	also	a	prolific	author	regarding	business	
intelligence	best	practices,	as	the	author	of	numerous	books	and	papers	on	data	management,	including	
the	recently	published	“Big	Data	Analytics:	From	Strategic	Planning	to	Enterprise	Integration	with	Tools,	
Techniques,	NoSQL,	and	Graph,”	the	second	edition	of	“Business	Intelligence	–	The	Savvy	Manager’s	
Guide,”	as	well	as	other	books	and	articles	on	data	quality,	master	data	management,	big	data,	and	data	
governance. David is a frequent invited speaker at conferences, web seminars, and sponsored web sites
and TechTarget channels, and shares additional content in his notes and articles at
www.dataqualitybook.com.
David	can	be	reached	at	loshin@knowledge-integrity.com,	or	at	(301)	754-6350.
About	the	Sponsor	
About	Information	Builders	
Information	Builders	provides	solutions	for	business	intelligence	(BI),	analytics,	data	integration,	and	
data	quality	that	help	drive	performance	improvements,	innovation,	and	value.	Through	one	set	of	
powerful	products,	we	enable	organizations	to	serve	everyone	–	analysts,	non-technical	users,	even	
partners,	customers,	and	citizens	–	with	better	data	and	analytics.	Our	dedication	to	customer	success	is	
unmatched	with	thousands	of	organizations	relying	on	us	as	their	trusted	partner.	Founded	in	1975,	
Information	Builders	is	headquartered	in	New	York,	NY,	with	global	offices,	and	remains	one	of	the	
largest	independent,	privately	held	companies	in	the	industry.	Visit	us	at	informationbuilders.com,	
follow	us	on	Twitter	at	@infobldrs,	like	us	on	Facebook,	and	visit	our	LinkedIn	page.

More Related Content

What's hot

Slides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data GovernanceSlides: Taking an Active Approach to Data Governance
Slides: Taking an Active Approach to Data GovernanceDATAVERSITY
 
Inside the circle of trust: Data management for modern enterprises
Inside the circle of trust: Data management for modern enterprisesInside the circle of trust: Data management for modern enterprises
Inside the circle of trust: Data management for modern enterprisesExperian Data Quality
 
Creating a Data-Driven Organization, Data Day Texas, January 2016
busting_10_myths_dq_management_2017

  • 1. © 2017 Knowledge Integrity, Inc. 1 www.knowledge-integrity.com (301) 754-6350 Knowledge Integrity Incorporated Business Intelligence Solutions Busting 10 Myths about Data Quality Management Prepared by: David Loshin Knowledge Integrity, Inc. January 2017 Sponsored by:
  • 2. Introduction
Even though great strides have been made in data quality improvement over the past decades, many myths and misconceptions are perpetuated through popular articles and presentations. Often, these simplified views can be confusing or conflicting, and those blindly accepting the statements may find that their attempts at making discrete progress toward improvement slow and stall. The goal of this paper is to highlight some common “myths” about data quality management, explain why these are myths, and guide the reader to make better choices when deciding to pursue a data quality management strategy.
Adding critical insight into different aspects of data quality management and putting some common beliefs into perspective will help you put together a more thoughtful plan for a data quality management program that can lead to measurable improvements in the quality and usability of organizational data. There are no substitutes for good data management disciplines, and this paper will advise the practitioner as to what the critical data issues are in the organization and how to leverage the right tools and technologies to address those issues in the most efficient way.
Defining and deploying well-defined processes within a culture of data governance will simplify technology acquisition and reduce time to value for implementing a data quality program. Our intent is to provide a balanced view of the best practices for data quality improvement by examining some common statements that can help differentiate what you heard, why it may be a myth, and some considerations for planning your approach to data quality improvement.
  • 3. Myth #1: The Business is Responsible for Data Quality
What You Heard
Data quality does not lie within the purview of the Information Technology department; since poor data quality impacts the business, the business users must take ownership of data quality improvement.
Why it is a Myth
Two of the most intractable issues organizations face when dealing with data quality problems are less technical and more programmatic: funding the program and ensuring its sustainability. The conflict evolves from a difference of opinion regarding financial support and resourcing. Essentially, the statement is intended by Information Technology to drive business engagement in supporting data quality activity. Since the business users presume the equivalence of “data cleansing” and “data quality,” they insist that if technical processes can be used to clean data, the responsibility lies with the IT department, and consequently, so should the funding. On the other hand, the IT teams suggest that since poor data quality impacts the business, and the business users are the ones defining what quality means, then the business users need to take ownership of data quality and absolve IT from accountability. This dichotomy pits IT and the business users against each other in terms of the effort to improve data quality, thereby stalling progress rather than encouraging it.
Considerations and Alternatives
It is worthwhile to remember that the IT department must work in a partnership with the business users to take best advantage of data. “Information” Technology is always going to be involved in anything that touches information. And it is naive to presume that there are operational models that support only non-technical business people taking responsibility for ensuring data quality.
Data quality management must be a collaborative effort that bridges the gaps between IT and the business. An alternative approach considers a collaborative model in which the business side is accountable for ensuring that there are good definitions of data quality rules, measures, and acceptability levels while IT is responsible for instituting the architectural framework for ensuring the rules are observed and reporting the measures. Data governance policies and procedures can be put in place to ensure that issues are reported to the business but are handled by selected IT data stewards.
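The split of accountability described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `QualityRule` class, the `measure` and `report` functions, the sample records, and the threshold values are all hypothetical and do not come from the paper. The business side supplies each rule and its acceptability level; the IT side supplies the code that measures conformance and reports it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QualityRule:
    """A business-defined rule (hypothetical structure for illustration)."""
    name: str
    check: Callable[[dict], bool]  # returns True when a record conforms
    acceptability: float           # minimum conforming fraction, set by the business


def measure(rule: QualityRule, records: List[dict]) -> float:
    """IT-side measurement: fraction of records that conform to the rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule.check(r)) / len(records)


def report(rules: List[QualityRule], records: List[dict]) -> List[Dict]:
    """IT-side reporting: one row per rule, flagged against the business threshold."""
    rows = []
    for rule in rules:
        score = measure(rule, records)
        rows.append({
            "rule": rule.name,
            "score": score,
            "acceptable": score >= rule.acceptability,
        })
    return rows


# Made-up sample data and thresholds, for illustration only.
records = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C003", "email": "raj@example.com"},
    {"customer_id": "C004", "email": "li@example.com"},
]
rules = [
    QualityRule("customer_id present", lambda r: bool(r.get("customer_id")), 1.0),
    QualityRule("email present", lambda r: bool(r.get("email")), 0.9),
]
summary = report(rules, records)
```

In this sketch, a row whose `acceptable` flag is false is the kind of issue that would be reported to the business and handled by a data steward.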
  • 4. Myth #2: IT Owns Data Governance
What You Heard
Our company has appointed a Chief Data Officer (CDO) who will spearhead IT’s data governance program.
Why it is a Myth
Data governance comprises the policies and practices that link data policy compliance with achieving business objectives. The data management dependencies identified within business policies drive the definition of data policies. Data policies cannot be instituted by fiat, nor can they be enforced without alignment and cooperation between the business and technology teams. Therefore, one cannot expect that a CDO operating within the confines of the IT department has the ability to implement or the authority to enforce data governance without buy-in from the representatives of the business functions.
Considerations and Alternatives
As with myth #1, the responsibility for deploying data governance is split: the business owns the policies and processes, but IT owns the implementation. That suggests that all new system and application development be designed with directly embedded procedures for monitoring data quality and asserting data policy compliance. Although the role and the list of responsibilities of the CDO are still evolving, there is a greater risk of failing to properly institute sustainable practices for data governance when the CDO’s mandate is designated within the information technology silo. The most effective Chief Data Officer will report directly to the CEO and be empowered to implement data governance by leveraging a partnership between the business and IT. That way, the organization can inaugurate a sustainable data governance program that directly integrates data policy compliance within defined business processes.
  • 5. Myth #3: Data Quality Tools Do Not Require Any Set Up
What You Heard
The acquisition of a data cleansing tool is enough to eliminate all your data quality issues. A data quality tool just plugs into the enterprise and cleans all your data out of the box.
Why it is a Myth
While technology is critical to data quality measurement and assurance, the quality of data is defined within a business context and is associated with sets of metadata, assertions, and business rules. While data quality tools have a lot of built-in capabilities out of the box, they must be properly configured with your organization’s rules in order to identify and cleanse data errors. In addition, the tools will need to be integrated into the organization’s environment.
Considerations and Alternatives
Often there is a presumption that if there is a data quality problem, then the process of acquiring a data cleansing tool is the only necessary action to take. However, a data quality tool is just that: a tool. And just as the act of purchasing a shovel does not guarantee that holes will appear in the ground, the purchase of data quality tools does not guarantee that errors will be identified and corrected. Addressing data quality issues goes beyond the purchase of a product. Because the tool must be configured with metadata, assertions, and rules in accordance with your business consumers’ expectations, the tool will be most effective in the hands of professionals who understand the data, the context, and the technology. That enables you to assemble a program that combines good data management practices, data stewardship, and the use of tools that will provide the greatest benefit.
  • 6. Myth #4: Manufacturing Quality Practices Are Easily Applied to Data
What You Heard
Quality processes as applied to manufacturing activities can be directly mapped to an “information manufacturing” process. Therefore, quality techniques are eminently applicable to information.
Why it is a Myth
There is no doubt that the pioneers advocating quality in manufacturing such as Philip Crosby, W. Edwards Deming, and Joseph Juran have positively impacted the ways that manufacturers do business. It makes sense to try to adapt their common-sense approaches to managing the quality of information, and there have been some purported successes along these lines. But often, those advocating a process-only quality approach to data find that their successes are bounded by the characteristics of information that differ from other presumably “raw” material. Physical manufacturing processes take limited amounts of raw materials that are transformed through a series of processes into a unique final product. That product’s attribution and criteria can be compared to discrete specifications for its intended use, such as the amount of usable storage on a DVD, or the melting temperature of a screw. In this analogy, data is the raw material and information products are the results. Yet in contrast to real raw materials, data can be used multiple times, and contrary to real resulting manufactured products, the output of information processes can be reused and repurposed in ways of which the original owners never dreamt, let alone prepared for. Attempting to monitor compliance to specifications requires that all those specifications are known beforehand, and this is often not the case with data.
Considerations and Alternatives
There are definitely aspects of the quality movement that are applicable to managing data.
But when you take into account the fact that data sets are often found and reused in a variety of different ways, you must reconsider what can be discretely defined as quality measures when there is a potential for uncontrolled reuse. This consideration can lead to two different conclusions. First, if there are ways of anticipating the potential for different ways that created data can be repurposed, that might influence the managers of the original sources to introduce data testing as part of a development life cycle process to anticipate the types of flaws that might cause problems downstream and attempt to reduce or eliminate them from the beginning. Second, those repurposing existing data sets might not be able to influence the insertion of quality controls. In these cases the data consumers must take on the responsibility to ensure the data meets their needs, and this may involve the direct application of data quality tools and techniques.
  • 7. Myth #5: You Must Have Perfect Data
What You Heard
Monitor data to identify imperfect data and address process issues that allow any imperfections. By ensuring that the data is always perfect, no errors can impact the business.
Why it is a Myth
As myth #4 noted, unlike manufactured items whose adherence to engineering specifications can be measured as they drop off the assembly line, data instances are created once and then used in different ways by different processes requiring different levels of quality. The concept of “perfect” data is contextual but is essentially based on use, not creation. Yet most of the time, data perfection is assumed based on the constraints set by the process creating the record, and it is still relatively uncommon for the expectations of downstream users to be folded into the requirements as new applications are being designed and built. The result is that most creating processes only care about immediate (i.e., operational) use of data, but will not accommodate the needs of other consumers as the data sets are repurposed. On the other hand, many data values are captured and stored for dubious reasons (they were part of a purchased data model, or retained columns from a data migration project), but they may have limited (if any) use. In this scenario, while the concept of perfect data expresses an ideal, the data value may not be business critical or necessary to achieve any business objectives, and investing energy in ensuring its perfection is basically a wasted effort.
Considerations and Alternatives
A saying commonly attributed to Voltaire is that “Perfect is the enemy of good.” Obsessive perfectionism, even in the context of data quality, comes at a cost when the effort needed to reach perfection exceeds the value to be achieved by progressing from good to perfect.
It is a noble idea to have perfect data, but with limited resources, it's better to focus on the biggest offenders. Understanding who the data consumers are and what their expectations will be for data quality allows you to more effectively anticipate process failures that can introduce errors. The law of diminishing returns demands that the data quality team members be smart about how resources are allocated to ensure that the organization gets the biggest bang for its buck while providing the greatest value and efficiency.
  • 8. Myth #6: The Cost of Bad Data is Obvious
What You Heard
Ensuring that your data is perfect will automatically increase revenues and decrease costs.
Why it is a Myth
There is no doubt that pervasive data flaws will lead to negative impacts on the business. And in general, reducing the frequency and scale of occurrence of data errors should reduce the negative impacts. But little has been reported regarding the connections between specific errors and identified costs, and that means that there is a subtle difference between putting controls into place to prevent negative impacts and making claims of the value of “perfect” data. The costs of flawed data might not be reflected in the allocation of budget; for example, financial transaction errors may impose a cost on the contact center rather than on the transaction processing center when customers call in to complain about errors in their statements. The costs might not be associated with data quality management within a budget line item. To follow our example, the business impacts of incorrect customer statements are essentially subsumed within the indirect costs of the customer service center.
Considerations and Alternatives
This can make it difficult to determine how much you will save when you clean up your data. But it doesn't mean that you're not saving money! Understanding this subtlety will help in truly identifying business impacts directly related to data flaws. Awareness of the types of errors that might contribute to negative impact allows the team to institute data quality controls that can lead to predictably improved business processes.
This suggests:
• Soliciting feedback from the business consumers of a data set;
• Determining the types of errors that have occurred or that may occur that may lead to a negative impact; and
• Defining methods for determining the existence of an error at a point in the process where it can be most effectively remediated.
It is wise to prevent preventable errors that have material impact. On the other hand, if no one cares about some type of error, it may not make sense to exert the effort to prevent it. Determine which error scenarios are most likely to cause greatest impact and prioritize accordingly.
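One way to make the prioritization step concrete is to rank error scenarios by an estimated impact and drop those below a materiality threshold. The following Python sketch assumes a simple impact model (occurrences multiplied by an estimated cost per occurrence); the model, the `ErrorScenario` class, the `prioritize` function, and every figure shown are hypothetical illustrations, not data or methods from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ErrorScenario:
    """An error type with estimates gathered from consumer feedback (hypothetical)."""
    name: str
    occurrences_per_month: int
    cost_per_occurrence: float  # estimated, e.g. contact-center handling cost

    @property
    def monthly_impact(self) -> float:
        # Assumed impact model: frequency times estimated unit cost.
        return self.occurrences_per_month * self.cost_per_occurrence


def prioritize(scenarios: List[ErrorScenario],
               materiality_threshold: float) -> List[ErrorScenario]:
    """Keep only scenarios with material impact, largest impact first."""
    material = [s for s in scenarios if s.monthly_impact >= materiality_threshold]
    return sorted(material, key=lambda s: s.monthly_impact, reverse=True)


# Made-up figures, for illustration only.
scenarios = [
    ErrorScenario("invalid postal code", 120, 4.0),
    ErrorScenario("misspelled middle name", 300, 0.0),
    ErrorScenario("incorrect statement balance", 40, 25.0),
]
ranked = prioritize(scenarios, materiality_threshold=100.0)
```

Note that the most frequent error in this made-up list is dropped entirely because no consumer reports a cost for it, which mirrors the point that an error no one cares about may not be worth preventing.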
  • 9. Myth #7: Monitoring and Reporting Data Quality Eliminates Errors
What You Heard
By instituting a data quality dashboard populated with continuous measures, you will eliminate data errors.
Why it is a Myth
The fundamental idea behind reporting conformance to data quality expectations within a data quality scorecard or dashboard is to alert the data stewards when predictable issues can be identified within the process. The theory is that by instituting data rules for continuous measurement, you can be proactive when errors occur and prevent those errors from impacting the business. However, in many cases, if you know enough to describe and then identify the error so that you can do something about it when it occurs, you would probably be better off seeking out the root cause of the error and completely eliminating it, which would obviate the need to continue to monitor for the error. Why continue to measure something whose cause might actually be eliminated?
Considerations and Alternatives
Actually, in some sense, measuring (and addressing) something you already know to be a problem is not being proactive, but rather it is being reactive earlier in the process. However, using a data quality dashboard to alert you to issues early on helps you be prepared so that data quality surprises don't knock you out, which allows more flexibility and preparedness in reacting to emerging data quality issues that have not yet been identified. On the other hand, being truly proactive means anticipating the types of errors that could occur and engaging the consumers to assess any potential impacts of those errors. This can inform the design and implementation teams as to whether fundamental changes to the process can reduce the potential for those errors occurring.
Incorporating this governance practice as part of the development lifecycle process enables the use of tools and technologies to predict errors so that the application developers can build stop-gaps and controls to ensure the errors won’t happen in the first place.
  • 10. Myth #8: Data Quality is Only Solved by Process Improvement
What You Heard
You don’t need data quality tools; all data quality issues can be resolved using good process management!
Why it is a Myth
Philosophical approaches to management improvement often are based on laboratory conditions, which allow one to make generalizations that might not apply in most real-life scenarios. In particular, be aware that there are methodologies for data quality improvement that focus only on process improvement, such as insisting that data providers always validate their data before providing it to the consumers. Often, these approaches suggest that technical approaches are not necessary for improving the quality of data. In the laboratory environment, simplistic approaches bypass any political and logistic issues, and often do not reflect real-life scenarios in which data suppliers have no budget or interest in changing their ways to support unknown downstream users, users repurpose “found” data sets with no control over data creation, or participants in a collaborative environment must agree to rules for standardization for data sharing. These situations require a combination of process improvement and data quality tools and techniques to ensure data usability.
Considerations and Alternatives
The unmet challenge, in many cases, is that those advocating process improvement do not understand the limitations when you do not exercise control over the administrative domain within which the data flows. For example, when you cannot engage the data creators to modify their ways to ensure the data meets your application’s needs, you still must do something to prevent data flaws from impacting your own processes.
Instead you must look at engaging the data source owners when possible but have a strategy for maintaining data quality at the necessary level when appropriate. If you cannot influence change over the processes in the information flow, employing tools for parsing, standardization, and cleansing may be the best next step in managing data fitness for your own purposes.
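As an illustration of what parsing, standardization, and cleansing can look like when the source cannot be changed, the following is a minimal Python sketch. It assumes a hypothetical feed of records with a free-form "phone" field containing US-style numbers; the function names and the target format are illustrative choices, not part of any particular data quality product.

```python
import re
from typing import List, Optional, Tuple


def standardize_phone(raw: str) -> Optional[str]:
    """Parse a US-style phone number; return (NNN) NNN-NNNN, or None if unparseable."""
    digits = re.sub(r"\D", "", raw)  # parsing: keep only the digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None  # cannot be standardized
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def cleanse(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split records into standardized ones and ones set aside for review."""
    clean, rejected = [], []
    for rec in records:
        phone = standardize_phone(rec.get("phone") or "")
        if phone is None:
            rejected.append(rec)  # keep the flaw out of downstream processes
        else:
            clean.append({**rec, "phone": phone})
    return clean, rejected
```

Rejected records are set aside rather than silently dropped, so they can still be raised with the data source owner if that channel ever opens.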
Myth #9: We Can Establish a Single Enterprise Standard for Data Quality
What You Heard
We will centralize all data quality standards and apply them to all enterprise data. This will ensure that all data consumers get consistently high-quality data.
Why it is a Myth
While many downstream data consumers share the fundamental set of data quality expectations regarding timeliness, currency, and completeness, the details of specific consistency and reasonableness expectations may differ based on the business context and application. Some business processes may be able to ignore selected types of data flaws, while others have no tolerance for the same errors. “Data quality” is typically in the eyes of the beholder, and attempting to enforce a single standard may be too onerous for some users and insufficient for others.
Considerations and Alternatives
Instead of centralizing the data quality standards, centralize data quality management. Embrace a proper set of data quality tools that can support verification and validation of selected data quality rules at numerous points along the end-to-end data flows, and train your users on how those tools are used to evaluate data quality expectations. Use a collaborative platform for proposing, documenting, and adopting data quality rules. This allows rules to be shared without demanding that all rules be enforced across all data flows.
Centralized data quality management enables the data users to specify what rules are reasonable for their business processes and applications, which ones are to be applied, at what points in the process, and the necessary levels of acceptability. This approach may help to establish common standards. At the same time, it does not lock the entire organization into a monolithic (and potentially bloated) set of rules.
Added flexibility lets you grant reasonable dispensations from proposed enterprise standards given appropriate business circumstances.
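One way to picture centralized management without a single enforced standard is a shared rule catalog from which each consumer selects rules and sets its own acceptability levels. The Python sketch below assumes a hypothetical catalog with two rules and made-up field names ("email", "age"); it is an illustration of the idea, not a description of a specific tool.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Shared catalog: rules are documented once and are available to every consumer.
RULE_CATALOG: Dict[str, Rule] = {
    "email_present": lambda r: bool(r.get("email")),
    "age_reasonable": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def pass_rate(records: List[dict], rule: Rule) -> float:
    """Fraction of records that satisfy the rule (1.0 for an empty batch)."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def evaluate(records: List[dict], expectations: Dict[str, float]) -> Dict[str, bool]:
    """Check only the rules a consumer selected, against that consumer's thresholds."""
    return {
        name: pass_rate(records, RULE_CATALOG[name]) >= threshold
        for name, threshold in expectations.items()
    }
```

A marketing process might call evaluate(batch, {"email_present": 0.7}) while a billing process calls evaluate(batch, {"email_present": 1.0, "age_reasonable": 0.75}) on the same batch: the rules are shared, but the tolerance is decided by each consumer.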
Myth #10: Data Scientists Manage Their Own Data Quality
What You Heard
Data preparation tools let data scientists discover data quality issues and provide ways to transform raw data into usable formats with little to no effort.
Why it is a Myth
This is essentially the converse of Myth #8. Data preparation tools provide each end-user with the means to profile raw data and consider alternative methods of reformulation and transformation. Giving individual analysts the ability to craft their own sequences of transformations is appealing because it allows them flexibility in asserting standards and semantics.
However, when isolated analysts applying their transformations do not share what they are doing or their processes, the risk of inconsistent definitions and specifications across the organization increases. So even if different data scientists are properly using their data preparation tools, the impacts of slight variations in their transformations may reverberate when representatives of the business attempt to interpret potentially conflicting results.
Considerations and Alternatives
Having individuals managing their own data quality in a vacuum can lead to conflicting results. However, as suggested in Myth #8, having individualized data quality plans is actually a healthy alternative to the conventional IT-driven data quality program, since data usability is essentially defined in the contexts of the data consumers.
If the concern is inconsistency of interpretation of analytical results, introduce policies for governing the ways that end-user data preparation tools are used. Establish a framework for collaboration and validation among the data scientists about data standards, semantics, and data transformations.
Configure the data preparation tools to motivate reuse of defined transformation sequences to encourage end-product consistency.
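Reuse of defined transformation sequences can be sketched as a small shared registry: a sequence is agreed on once, given a name, and then applied by name rather than re-implemented by each analyst. The registry, the function names, and the "customer_name" sequence below are hypothetical and stand in for whatever sharing mechanism a given data preparation tool provides.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]

# Shared registry of named, agreed-upon transformation sequences.
SHARED_PIPELINES: Dict[str, List[Step]] = {}


def register(name: str, steps: List[Step]) -> None:
    """Publish a transformation sequence; refuse to silently redefine an existing one."""
    if name in SHARED_PIPELINES:
        raise ValueError(f"pipeline {name!r} is already defined; reuse it instead")
    SHARED_PIPELINES[name] = list(steps)


def apply_pipeline(name: str, value: str) -> str:
    """Apply the shared sequence, in order, to a single value."""
    for step in SHARED_PIPELINES[name]:
        value = step(value)
    return value


# One agreed-upon way to normalize customer names: trim the ends, lowercase,
# and collapse runs of internal whitespace to a single space.
register("customer_name", [str.strip, str.lower, lambda s: " ".join(s.split())])
```

Raising an error on redefinition is a deliberate choice in this sketch: it nudges a second analyst toward the existing sequence, or toward a conversation about changing it, instead of a slightly different private copy.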
Considerations: The Data Quality Strategy
Fulfilling the desire for improved organizational data quality requires a combination of thoughtful planning and effective management of resources. The responsibility cannot be assigned in a haphazard way to either the business or the technical side; both perspectives are required in order to institute controls and procedures that allow data sets to meet the collective needs of the consumer constituency.
Likewise, the quality of the data cannot be improved by only applying technology or only applying the process improvements dictated by the “quality movement.” It requires a collaborative effort that arms business process experts with the right technical tools to make cost-effective decisions about identifying, reacting to, and anticipating the types of data errors that lead to negative business impact.
Tools such as data profiling and data mapping can help to evaluate different types of errors and support continuous monitoring to generate alerts when errors beyond your control need to be addressed. Common data quality tools such as parsing, standardization, and identity matching and resolution can be applied to cleanse errors and normalize data when fixing the root causes of the errors is beyond your administrative control. Dashboards and scorecards can be configured to support monitoring the performance and effectiveness of data stewards and data quality analysts in how data quality best practices are applied.
Lastly, recognize that adopting the best suggestions from data management professionals will allow the development of an effective strategy and plan for data quality improvements in the short-, medium- and long-term.
Integrating methods for taking advantage of collaboration between technical implementers and business data consumers will help in proactively identifying data quality dependencies, anticipating potential issues, and engineering inspections and controls into the application framework to prevent errors from being introduced in the first place.
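The profiling and monitoring described above can be pictured with a very small example: profile each incoming batch for missing values and raise an alert when a column drifts beyond an agreed tolerance from its baseline. This Python sketch assumes hypothetical column names, an assumed baseline, and an assumed tolerance of five percentage points; real profiling tools measure far more than null rates.

```python
from typing import Dict, List


def null_rates(records: List[dict], columns: List[str]) -> Dict[str, float]:
    """Profile a batch: fraction of records where each column is missing or empty."""
    total = len(records)
    if total == 0:
        return {c: 0.0 for c in columns}
    return {
        c: sum(1 for r in records if r.get(c) in (None, "")) / total
        for c in columns
    }


def alerts(current: Dict[str, float], baseline: Dict[str, float],
           tolerance: float = 0.05) -> List[str]:
    """Columns whose null rate exceeds the baseline by more than the tolerance."""
    return sorted(
        c for c, rate in current.items()
        if rate - baseline.get(c, 0.0) > tolerance
    )
```

Running null_rates on each new batch and passing the result to alerts with the last accepted profile as the baseline gives a data steward a short list of columns to investigate.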
About the Author
David Loshin, president of Knowledge Integrity, Inc. (www.knowledge-integrity.com), is a recognized thought leader and expert consultant in the areas of analytics, big data, data governance, data quality, master data management, and business intelligence. Along with consulting on numerous data management projects over the past 15 years, David is a prolific author on business intelligence best practices, with numerous books and papers on data management, including the recently published “Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph,” the second edition of “Business Intelligence – The Savvy Manager’s Guide,” as well as other books and articles on data quality, master data management, big data, and data governance. David is a frequent invited speaker at conferences, web seminars, and sponsored web sites and TechTarget channels, and shares additional notes and articles at www.dataqualitybook.com. David can be reached at loshin@knowledge-integrity.com, or at (301) 754-6350.
About the Sponsor
About Information Builders
Information Builders provides solutions for business intelligence (BI), analytics, data integration, and data quality that help drive performance improvements, innovation, and value. Through one set of powerful products, we enable organizations to serve everyone – analysts, non-technical users, even partners, customers, and citizens – with better data and analytics. Our dedication to customer success is unmatched, with thousands of organizations relying on us as their trusted partner. Founded in 1975, Information Builders is headquartered in New York, NY, with global offices, and remains one of the largest independent, privately held companies in the industry.
Visit us at informationbuilders.com, follow us on Twitter at @infobldrs, like us on Facebook, and visit our LinkedIn page.