CA World ’16
Tech	Talk:	Sometimes	Less	is	More	–
Visualization	Can	Reduce	your	Test	
Data	while	Enhancing	Quality!
James	Walker	– Principal	Software	Engineer	– CA	Technologies
DO5T06T
DEVOPS
2 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
©	2016	CA.	All	rights	reserved.	All	trademarks	referenced	herein	belong	to	their	respective	companies.
The	content	provided	in	this CA	World	2016	presentation	is	intended	for	informational	purposes	only	and	does	not	form	any	type	of	
warranty. The information	provided	by	a	CA	partner	and/or	CA	customer	has	not	been	reviewed	for	accuracy	by	CA.	
For	Informational	Purposes	Only	
Terms	of	this	Presentation
Abstract
Effective testing requires high quality test data, but most organizations still rely on
production data, which provides just 10-20% functional coverage. This data is
drawn from “business as usual” scenarios that have occurred in the past, and so
rarely provides the negative scenarios and outliers needed to rigorously test
software. Production data also arrives too quickly, and is too large and
complicated, for any human mind to evaluate, so profiling or modelling
technology is needed. To ensure that they have the quality data needed for
testing, organizations need to be able to evaluate which attributes their existing
data contains, as well as how those attributes combine. Only then can they determine
which missing attributes are needed to execute the tests required to deliver quality software.
This	Tech	Talk	will	show	how	data	visualization	provides	a	quick	and	reliable	
method	to	measure	the	test	coverage	provided	by	existing	test	data,	spotting	any	
missing	or	invalid	data	at	a	glance.	Presenting	data	attributes	and	dimensions	in	
pictorial	form	allows	users	to	understand	what	data	they	have,	how	its	attributes	
relate, and what data is missing. The accurate model created in CA’s Data
Visualization can then be fed into CA Agile Requirements Designer and CA
Test Data Manager, automatically creating the smallest set of data needed to
satisfy every possible test.
James	Walker
CA	Technologies
Principal	Software	
Engineer,	CA	Agile	
Requirements	Designer
About	Me
§ BSc,	MRes,	PhD	– Swansea	
University,	Wales
§ Research	in	Data	Visualisation	/	
Big	Data	problems
§ Grid-Tools	– Software	Engineer	
(2012	– 2015)
§ CA	– Lead	Software	Engineer	
ARD
Agenda
1 MOTIVATION
2 INTRODUCTION TO DATA VISUALIZATION
3 DATA VISUALIZATION FOR TEST DATA
4 DEMO
5 CONCLUDING THOUGHTS
Test	Data	Challenges
§ Production is often viewed as the true source of good test
data (the default is to take a copy of production data and mask it)
§ Production data has high volume but low variance (few edge
cases)
§ Better data in development should be the goal (subjective –
what is better?)
To do this you need to be able to answer a whole bunch of
questions
Test	Data	Challenges
“What	data	do	I	have?”
“What	data	don’t	I	have?”
“Do	I	have the	data	I	need	
for	my	tests?”
“Where	am	I	
undertesting?”
“Where	am	I	
overtesting?”
“How	effective	is	my	test	
data?”
“What	is	my	data	
coverage?”
Old World Order…
1. Custom SQL
2. Data Views / Cubes
3. Off-the-shelf visualisation tool
New	World	Order…	Testing	to	the	Big	Data	Field	
Technological	advancements	over	the	past	decade	have	increased	our	ability	to	collect	data	to	
previously	unimaginable	volumes
It is estimated that people will generate 4.3 exabytes of data in their lifetime (1).
Data	contains	huge	amounts	of	value	for	gaining	insight,	understanding,	decision	making,	and	
prediction.
Virtually	every	field	of	science	and	industry	is	taking	advantage	of	analytics	(medicine,	sports,	
weather,	finances,	etc).
Testing is late to the party – there are huge opportunities for big data techniques to help us test
our software, understand the results, gain insight into quality, make decisions (should we
release?), and eventually predict results before we’ve even run a single test case...
Introduction	to	Data	Visualization
“The purpose of computing is insight, not numbers”
Richard W. Hamming, 1962
Visualization:
§ A tool that allows the user to gain insight into data
§ To form a mental vision, image, or picture of (something not
visible or present to the sight or an abstraction); to make
visible to the mind or imagination [Oxford English Dictionary,
1989]
http://www.comm-dev.org/
Visualization	is	Very	Old
§ Often	an	intuitive	step	to	make	
phenomena	clearer	e.g.	a	graph
§ Classical (easy) approaches known
from business graphics (Excel, etc.)
§ Only in the past decade has its
value started to become widely recognised
https://utah.com/parowan-gap
Data	Sets	Are	Ever-increasing	in	Size	– A	Graphical	
Approach	Is	Necessary	
Before – Simple tabular data (very few data items and
dimensions)
Now – Distributed	systems	creating	millions	of	rows	a	second
Visualization	is	Good	for:
§ Exploration
– Find	the	unknown,	unexpected
– Hypothesis	generation
§ Analysis
– Confirm	or	reject	hypotheses
– Information	drill-down
§ Presentation
– Communicate	/	disseminate	results
https://fluidi.wordpress.com
So	What	is	Data	Visualization?
§ Data	visualization	is	the	process	of	creating	graphical	
abstractions	of	data
§ We use visualisation on a daily basis (e.g. Tube map, weather
report, stock market, web traffic…)
§ Techniques	have	enormous	value	to	all	aspects	of	the	world	
we	live	in	– today	we	focus	on	testing	&	test	data!
17 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
CA	Test	Data	Visualizer
18 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Large	Relational	Databases
Protein Data Bank – pdb.org
19 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Data	Combinations
§ It	is	impossible	to	consider	“All	Combinations”	of	data	
n times	n times	n times	n =	very	large
Each	spin	of	the	lock	is	a	data	attribute
40	possible	positions
4	inputs	required
40	x	40	x	40	x	40	=	2,560,000
40	x	40	x	40	x	40	x	40	=	102,400,000
40	x	40	x	40	x	40	x	40	x	40	=	4,096,000,000
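The arithmetic on this slide can be reproduced with a minimal sketch. The function name `combination_count` is purely illustrative and is not part of any CA tool; it only assumes that every attribute can take the same number of values.

```python
def combination_count(positions: int, inputs: int) -> int:
    """Count the combinations when each of `inputs` attributes
    can independently take one of `positions` values."""
    return positions ** inputs


# Each extra attribute multiplies the total by the number of positions.
for inputs in (4, 5, 6):
    print(f"{inputs} inputs: {combination_count(40, inputs):,}")
```

Adding a single attribute with 40 possible values multiplies the search space by 40, which is why exhaustively testing "all combinations" quickly becomes impractical.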
20 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Data	Concepts
§ Data	concepts	– only	some	combinations	are	relevant	for	a	
test	case	(test	requirements).
§ How	do	tests	relate	to	the	data?
§ Not	all	columns	matter,	but	their	combined	effect	does
§ We	create	a	meta-layer	of	test	data	attributes
21 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Data	View	– Flatten	the	Data	and	Pick	Relevant	
Attributes
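One way to picture a flattened data view is sketched below. This is an assumption-laden illustration, not how CA Test Data Visualizer is implemented: the `customers` and `orders` tables, their columns, the derived `is_minor` attribute, and the `flatten` helper are all hypothetical.

```python
# Hypothetical source tables (names and columns are made up for illustration).
customers = [
    {"customer_id": 1, "country": "UK", "age": 34},
    {"customer_id": 2, "country": "US", "age": 17},
]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "SHIPPED", "total": 120.0},
    {"order_id": 11, "customer_id": 1, "status": "CANCELLED", "total": 15.5},
    {"order_id": 12, "customer_id": 2, "status": "SHIPPED", "total": 9.99},
]


def flatten(customers, orders, attributes):
    """Join each order to its customer, derive a test-relevant attribute,
    and keep only the attributes named in `attributes`."""
    by_id = {c["customer_id"]: c for c in customers}
    view = []
    for order in orders:
        row = {**by_id[order["customer_id"]], **order}
        # Derived attribute: part of the "meta-layer" that tests care about.
        row["is_minor"] = row["age"] < 18
        view.append({name: row[name] for name in attributes})
    return view


data_view = flatten(customers, orders, ["country", "is_minor", "status"])
for row in data_view:
    print(row)
```

The raw columns (`age`, `total`, the IDs) are dropped from the view; only the attributes whose combinations matter to the tests remain.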
22 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Demo
23 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Conclusion	– Test	Data	Visualizer
§ A visualization tool – designed to analyse and assist in
building ‘better’ test data
§ Use	advanced	spot	diagrams	and	parallel	coordinates
– Compare	data	for	valid	and	invalid	sets	of	combinations
– Identify	missing	combinations	of	data
– Identify over and under-testing
– Compare	environments	for	coverage	(QA1,	QA2)
– Measure	data	coverage	accurately
– Reserve	data	amongst	team	members
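The ideas of data coverage, missing combinations, and over-testing listed above can be sketched in a few lines. This is only an illustration under stated assumptions: `coverage_report`, the `domains` mapping, and the sample rows are hypothetical, every combination of the listed values is assumed to be required, and "over-represented" is taken to mean a combination that appears more than once. The tool itself may define these measures differently.

```python
from collections import Counter
from itertools import product


def coverage_report(rows, domains):
    """Compare the attribute combinations present in `rows` with every
    combination of the values listed in `domains`."""
    names = list(domains)
    required = set(product(*(domains[n] for n in names)))
    seen = Counter(tuple(row[n] for n in names) for row in rows)
    covered = required & set(seen)
    return {
        # Fraction of required combinations present at least once.
        "coverage": len(covered) / len(required),
        # Combinations with no data at all (candidate under-testing).
        "missing": sorted(required - covered),
        # Combinations present more than once (candidate over-testing).
        "over_represented": {c: n for c, n in seen.items() if n > 1},
    }


# Hypothetical attribute domains and existing test data.
domains = {"country": ["UK", "US"], "status": ["SHIPPED", "CANCELLED"]}
rows = [
    {"country": "UK", "status": "SHIPPED"},
    {"country": "UK", "status": "SHIPPED"},
    {"country": "UK", "status": "CANCELLED"},
    {"country": "US", "status": "SHIPPED"},
]

report = coverage_report(rows, domains)
print(report)
```

Running the same report against two environments (for example QA1 and QA2) and comparing the `missing` lists is one simple way to compare their coverage.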
24 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Test	Data	Visualizer
“What	data	do	I	have?”
“What	data	don’t	I	have?”
“Do	I	have the	data	I	need	
for	my	tests?”
“Where	am	I	
undertesting?”
“Where	am	I	
overtesting?”
“How	effective	is	my	test	
data?”
“What	is	my	data	
coverage?”
25 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Recommended	Sessions
SESSION # | TITLE | DATE/TIME
DO5T17S | Case Study: Nationwide's CA Test Data Manager Success Story | 11/17/2016 at 1:45 PM
DO5T07T | TechTalk: What Happened in the Backend? The Power of DB Compare | 11/17/2016 at 3:00 PM
DO5X42S | TechVision: Test Data on Demand: Delivering the Right Data, to the Right Place, at the Right Time | 11/17/2016 at 4:30 PM
26 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
Stay	connected	at	communities.ca.com
Thank	you.
27 ©	2016	CA.	ALL	RIGHTS	RESERVED.@CAWORLD				#CAWORLD
DevOps	– Continuous	Delivery
For	more	information	on	DevOps	– Continuous	Delivery,	please	
visit:	http://cainc.to/PiTFpu
