Use and Misuse of the Term Experiment in MSR Research
Natalia Juristo
University of Oulu & Technical University of Madrid
PROMISE, September 7th 2016
Motivation
- Today empiricism is everywhere in SE
- This does not mean SE is empirically mature
- Conducting empirical studies does not imply they are carried out and understood properly
- I focus here on a methodological issue in MSR research
  - The use of experiments in MSR
Motivation
- For several years I have been struggling to match MSR research with the more traditional SE empirical research (conducted over the last 35 years)
- Very often I was shocked to hear MSR works call "experiment" empirical studies that I do not consider as such
- Today I discuss research we are conducting to clarify this issue
Collaboration
- This research has been conducted in collaboration with
  - Claudia Ayala
  - Xavier Franch
  - Burak Turhan
Evidence of Misuse
Small-scale Literature Review
- We conducted a literature review to double-check the use of the term experiment in MSR works
- 2015 editions of MSR, ESEM and EMSE
  - MSR: 42 papers reviewed
  - ESEM: 36 papers
  - EMSE: 55 papers
Findings (2015 venues)
ESEM
- Use of the term experiment: 30.5% (11 out of 36)
- MSR vs. traditional experiment: 72.72% MSR works (8 papers); 27.28% traditional experiments (3 papers)
- MSR use vs. misuse: wrong use 12.5%; proper use 87.5%
MSR
- Use of the term experiment: 42.8% (18 out of 42)
- MSR vs. traditional experiment: 100% MSR works (18 papers); 0% traditional experiments
- MSR use vs. misuse: wrong use 44.45%; proper use 55.55%
EMSE
- Use of the term experiment: 52.72% (29 out of 55)
- MSR vs. traditional experiment: 65.51% MSR works (19 papers); 34.48% traditional experiments (10 papers)
- MSR use vs. misuse: wrong use 52.63%; proper use 47.36%
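The percentages in the findings can be recomputed from the paper counts. The sketch below does that arithmetic. The "misused" counts are an assumption: the slides report only percentages for wrong use, so the counts are inferred from them (for example, 12.5% of 8 papers is 1 paper).

```python
# Paper counts transcribed from the findings above.
# "misused" is NOT given in the slides; it is inferred from the reported
# wrong-use percentages and should be treated as an assumption.
FINDINGS = {
    "ESEM": {"reviewed": 36, "use_term": 11, "msr": 8, "traditional": 3, "misused": 1},
    "MSR": {"reviewed": 42, "use_term": 18, "msr": 18, "traditional": 0, "misused": 8},
    "EMSE": {"reviewed": 55, "use_term": 29, "msr": 19, "traditional": 10, "misused": 10},
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def summarize(row):
    """Derive the table's percentages for one venue from its raw counts."""
    return {
        "use_of_term": pct(row["use_term"], row["reviewed"]),
        "msr_share": pct(row["msr"], row["use_term"]),
        "traditional_share": pct(row["traditional"], row["use_term"]),
        "wrong_use": pct(row["misused"], row["msr"]),
        "proper_use": pct(row["msr"] - row["misused"], row["msr"]),
    }


for venue, row in FINDINGS.items():
    print(venue, summarize(row))
```

Recomputed values are rounded, so a few differ in the last digit from the slide figures, which appear to be truncated (8 out of 11 is 72.73% when rounded).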
	
... Let me elaborate on why the term is misused
What is an experiment
Experiment Definition
- Empirical procedure where key variables of a reality are manipulated to investigate the impact of such variations
What Makes an Experiment
- Manipulation of variables under study
  - Treatments must be assigned to experimental units
- Controlling potential confounding variables impacting results
  - Confounding is eliminated through random assignment of treatments to units
What Makes an Experiment: Intervention
- Experimentation
  - There is a purposeful intervention by researchers
  - Researchers allocate treatments to units
  - Experimental groups (exposed and unexposed) are determined by the researcher
- Observation
  - Researchers have a passive role and do not interfere with reality
  - Data are generated directly from reality and analyzed afterwards
  - Exposure status is not determined by the researcher
What Makes an Experiment: Randomization
- Experiments limit the potential for any confounding factors (biases) by randomly assigning one participant pool to a treatment and another participant pool to control or another treatment
- Random allocation of treatments to subjects minimizes the chance that the incidence of confounding (particularly unknown confounding) variables will differ between the two groups
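Random allocation of treatments to units can be sketched in a few lines. This is a minimal illustration and not a procedure taken from the talk: the participant names, the treatment labels ("TDD", "control") and the seed are all hypothetical.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly allocate units to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the random order decides who gets which treatment
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical participant pool and treatment labels (assumptions).
participants = [f"dev{i:02d}" for i in range(1, 13)]
allocation = randomize(participants, ["TDD", "control"], seed=2016)
for treatment, members in allocation.items():
    print(treatment, sorted(members))
```

The researcher, not the participant, decides group membership, and the decision is left to chance. That is what keeps known and unknown confounders balanced across groups on average.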
What Makes an Experiment: Intervention + Randomization
- Intervention guarantees causality
- Inspiring example
  - In a quasi-experiment the allocation of treatment is not possible
    - Although run under controlled conditions
  - The case of psychology experiments
    - Personality traits
What Does Not Make an Experiment
- Randomization
- Comparison
- Analysis techniques
What Does Not Make an Experiment: Randomization
- Randomization is a strategy aiming to reduce confounding variables (bias)
  - It is mandatory in controlled experiments
  - It can be applied to other types of empirical studies
- Inspiring example
  - Randomization in surveys
What Does Not Make an Experiment: Comparison
- Comparing the impact of different values of a variable does not mean we will be able to reveal causality
- Comparing, within a set of data, units with different values of a variable neither makes the study an experiment nor allows differences to be traced back to treatments
What Does Not Make an Experiment: Analysis
- Analysis techniques do not differentiate experiments from other empirical studies
- What allows causality to be revealed is not the type of analysis technique but the design of the study
- Applying to a set of data an analysis technique typically used in experiments neither makes the study an experiment nor detects causality
What Does Not Make an Experiment
- In an MSR study
  - Applying ANOVA does not mean it is an experiment
  - Comparing pools of data differing in a variable’s value does not imply it is an experiment
- Even if MSR studies were randomized, they would not be experiments
- Design guarantees
  - The reduction of bias and confounding variables
  - That the differences observed in behavior are caused by treatments
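A small simulation can illustrate why comparison alone does not reveal causality while design does. Everything here is an assumption made for illustration: the outcome model, the confounder (developer experience), the effect size and the seed are invented and not taken from any study.

```python
import random
from statistics import mean

TRUE_EFFECT = -1.0  # assumed: the technique removes one defect on average


def defects(experience, treated, rng):
    """Hypothetical outcome model in which experience is a confounder."""
    return 10.0 - 6.0 * experience + TRUE_EFFECT * treated + rng.gauss(0.0, 1.0)


def difference(records):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [y for t, y in records if t]
    untreated = [y for t, y in records if not t]
    return mean(treated) - mean(untreated)


def simulate(n=10_000, seed=7):
    rng = random.Random(seed)
    observed, randomized = [], []
    for _ in range(n):
        experience = rng.random()
        # Observation: experienced developers adopt the technique themselves.
        chose = experience > 0.5
        observed.append((chose, defects(experience, chose, rng)))
        # Experiment: the researcher assigns the technique by coin flip.
        assigned = rng.random() < 0.5
        randomized.append((assigned, defects(experience, assigned, rng)))
    return difference(observed), difference(randomized)


obs_diff, rand_diff = simulate()
print(f"observational estimate: {obs_diff:+.2f}")
print(f"randomized estimate:    {rand_diff:+.2f}")
```

Under this assumed model the observational comparison mixes the effect of the technique with the effect of experience (about -4 in expectation), while the randomized comparison stays close to the true effect of -1. The same comparison of means is applied in both cases, and only the design differs.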
Impact of Randomization and Design
Types of Experiments
- Without intervention
  - Natural environment
    - Natural experiments
- Intervention
  - Where?
    - Artificial controlled environment
      - Laboratory controlled experiments
    - Natural environment
      - Field experiments
Laboratory experiments
Purposeful intervention
Randomized allocation of treatments
Artificial, highly controlled environment

Field experiments
Purposeful intervention
Randomized allocation of treatments
Natural uncontrolled environment

Natural experiments
No intervention
In a natural uncontrolled environment
Mining Software Repositories
- MSR research
  - Outcomes (such as quality and productivity) are studied in large samples of past data to
    - Apply statistical methods to test hypotheses
    - Build machine learning and mining methods on past data into tools to support programming tasks
- The data stored in a repository have been obtained from reality (without intervention)
  - Therefore MSR works are observational studies
  - We could call them natural experiments but that term is misleading
MSR and Epidemiology
Empirical Studies in Medicine
[Figure: empirical studies in medicine. Labels recovered: Method Development; Laboratory Research or Pre-clinical; Non-Human Experiments; Field Research; Ill People; Ill & Healthy People; from 20-100 volunteers to 1-2M patients; Descriptive; Analytic; Retrospective; Prospective]
Empirical Studies in Medicine
- Analytical
  - Experimental
    - Clinical Trial
    - Field Trial
    - Group Trial
  - Observational
    - Cohort Studies (also called: Prospective study; Follow-up study; Concurrent study; Incidence study; Longitudinal study)
    - Historical Cohort Studies
    - Case-Control Studies (also called: Retrospective study; Case comparison study; Case history study; Case compeer study; Case referent study; Trohoc study)
- Descriptive
  - Individuals
    - Cross-Sectional Studies (also called: Prevalence study; Disease frequency study; Morbidity survey; Health survey)
    - Case series
    - Single case
  - Population
    - Ecological Studies
(Prospective) Cohort Study
- Data are collected at regular intervals, over a period of time, from a group of people who do not have the disease, to see who develops it (new incidence)
- Cohort
  - Group of people who share a common characteristic within a defined period
    - e.g., are born, are exposed to a drug, vaccine or pollutant, or undergo a certain medical procedure
- Comparison group
  - The general population from which the cohort is drawn
  - Another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar
  - SE: Projects/commits that have not applied the method under study
- Example
  - Is exposure to X (smoking) associated with outcome Y (lung cancer)?
    - Such a study would recruit a group of smokers and a group of non-smokers (the unexposed group), follow them for a set period of time, and note differences in the incidence of lung cancer between the groups at the end of this time
    - SE: A passive follow-up of projects/commits, collecting data at regular intervals and noting the quality/productivity they achieve
Retrospective Studies
- The researcher collects data from past records and does not follow patients up as in prospective studies
  - All the events (exposure, latent period, and subsequent outcome, i.e. development of disease) have already occurred in the past
- Errors due to confounding and bias are more common in retrospective studies than in prospective studies
Retrospective Studies: Threats to Validity
- Some key data have not been measured
- Biases may affect the selection of controls
- Selection bias
  - Only patients with the necessary information are selected
- Misclassification or information bias as a result of the retrospective aspect
- Researchers cannot control exposure or outcome assessment but instead need to rely on others for accurate recordkeeping
- It can be very difficult to make accurate comparisons between the exposed and the non-exposed
Retrospective Cohort Study
- Records of groups of individuals who are alike in many ways but differ in a certain characteristic are compared for a particular outcome
  - For example, female nurses who smoke and those who do not smoke
  - SE: Use of past data in a repository to compare a certain output of projects with characteristic A and without A
- The researcher collects data from past records and does not follow patients up, as is the case with a prospective study
(Retrospective) Case-Control Study
- Records of individuals are divided into two groups differing in outcome (disease or not) and compared on the basis of some supposed causal attribute
  - Case-control studies select subjects based on their disease status (the effect)
  - Cohort studies select subjects based on their exposure status (the cause)
  - SE: Select projects/commits with a certain outcome level (i.e. quality value) and trace back certain project characteristics that are believed to contribute to quality
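The difference between cohort and case-control selection can be sketched on repository records. The records, the exposure (code review) and the outcome (defect-proneness) below are hypothetical and invented only to show the two ways of forming groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    uses_code_review: bool  # exposure (hypothetical characteristic A)
    defect_prone: bool      # outcome (hypothetical quality level)


# Hypothetical repository records, invented for illustration only.
RECORDS = [
    Project("p1", True, False), Project("p2", True, False),
    Project("p3", True, True), Project("p4", False, True),
    Project("p5", False, True), Project("p6", False, False),
]


def cohort_groups(records):
    """Cohort design: select by exposure (the cause), then compare outcomes."""
    exposed = [r for r in records if r.uses_code_review]
    unexposed = [r for r in records if not r.uses_code_review]
    return exposed, unexposed


def case_control_groups(records):
    """Case-control design: select by outcome (the effect), then look back at exposure."""
    cases = [r for r in records if r.defect_prone]
    controls = [r for r in records if not r.defect_prone]
    return cases, controls


def rate(group, attribute):
    """Fraction of the group for which the boolean attribute is true."""
    return sum(getattr(r, attribute) for r in group) / len(group)


exposed, unexposed = cohort_groups(RECORDS)
print("defect rate, exposed vs unexposed:",
      rate(exposed, "defect_prone"), rate(unexposed, "defect_prone"))

cases, controls = case_control_groups(RECORDS)
print("exposure rate, cases vs controls:",
      rate(cases, "uses_code_review"), rate(controls, "uses_code_review"))
```

Both designs read the same past records and neither involves intervention, so both remain observational whichever way the groups are formed.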
Ecological Studies
- Units of analysis are populations
  - Comparison of groups rather than individuals
  - Explores correlations between group-level exposure and outcomes
Hierarchies of Evidence
Hierarchy of Evidence
- It is critical to understand which type of empirical study you are conducting
  - To fully understand what the results are telling us
  - The type of results depends on the type of study!
- Evidence hierarchies reflect the relative authority of various types of empirical studies
Authority of Evidence
[Figure: hierarchy of evidence. Labels recovered: Laboratory Experiments; Field Experiments; Observational Analytical (Prospective; Retrospective); Observational Descriptive]
Psychology Hierarchy of Evidence
Two MSR examples
Example 1
- MSR’15
  - The Uniqueness of Changes: Characteristics and Applications
  - Ray, Nagappan, Bird, Nagappan, Zimmermann
- Why this paper
  - A very well written paper
  - Several empirical studies of different types about the same issue
  - Prominent MSR authors
Empirical Studies (Authors’ terms)
- Topic
  - Some changes are unique while others are not
  - They propose a way to identify the uniqueness of changes
- Empirical studies (in authors’ terms)
  - Analysis of the properties of unique and non-unique changes
    - What is the extent of unique changes; who introduces unique changes; where do unique changes take place
  - Applications
    - Experiment for Risk Analysis
      - Check whether U (unique) file commits have a higher defect rate than NU (non-unique) file commits
      - Use the Mann-Whitney test for the comparison
    - Recommendation systems
      - A system is embedded in the development environment to suggest changes to developers
      - Precision and recall of the recommendations are analyzed
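The kind of comparison the Mann-Whitney test performs can be sketched as follows. This is an illustration only: the defect values are invented, the function computes just the U statistic (no p-value), and it is not the authors' analysis script.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y, ties counting 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical defects-per-commit values, invented for illustration.
unique_commits = [3, 5, 4, 6, 2, 5]
non_unique_commits = [1, 2, 2, 3, 1, 0]

u = mann_whitney_u(unique_commits, non_unique_commits)
n_pairs = len(unique_commits) * len(non_unique_commits)
print("U =", u, "of", n_pairs, "pairs")
print("share of pairs where the unique commit has more defects:", u / n_pairs)
```

In practice a library routine such as `scipy.stats.mannwhitneyu` would also supply a p-value. Either way the test only quantifies how the two pools of past data differ; it does not turn the comparison into an experiment.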
Type of Empirical Studies (Epidemiology terms)
- Analysis of the properties of unique and non-unique changes
  - What is the extent of unique changes; who introduces unique changes; where do unique changes take place
  - Ecological study
    - Descriptive; use of population-aggregated data
- Application: Experiment for Risk Analysis
  - Check whether U file commits have a higher defect rate than NU file commits
  - Retrospective cohort study
    - Comparison of past data
- Application: Recommendation systems
  - A system is embedded in the development environment to suggest changes to developers; precision and recall of the recommendations are analyzed
  - Prospective observational study; ecological?
    - But no comparison is made (i.e. of the quality/productivity of developments that use the recommendations)
    - Could be conducted as a Field Trial or (Prospective) Cohort study
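Precision and recall of a set of recommendations can be computed as in the sketch below. The change identifiers and the "relevant" set are hypothetical; the paper's own evaluation setup is not reproduced here.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a set of recommendations against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical change suggestions vs. changes the developer actually made.
suggested = {"c1", "c2", "c3", "c4"}
actually_made = {"c2", "c4", "c5"}
precision, recall = precision_recall(suggested, actually_made)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

These measures describe the recommender itself. They do not compare the quality or productivity of developments with and without the recommendations, which is the comparison a field trial or cohort study would add.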
Example 2
- ESEM’15
  - How to Make Best Use of Cross-Company Data for Web Effort Estimation
  - Minku, Sarro, Mendes, Ferrucci
- Topic
  - Compares a CC (cross-company) dataset versus a WC (within-company) dataset for web effort estimation
  - Compares Dycom against NN-filtering
    - Dycom: framework for learning software effort estimation models for a company based on mapping CC models to the company’s context
    - NN-filtering: Nearest Neighbor filtering to make CC estimations
Experiments in Effort Estimation Research
- Intervention
  - The two (effort estimation) techniques compared
- Allocation of treatments to units?
  - Yes
    - Every project belonging to the test data set is an experimental unit
    - Experimental groups are the test data set estimated with one or the other technique
    - Typical AB designs, but others could be tried
- Control of confounding variables through randomization?
  - No
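The design described above, where every test project receives both treatments, can be sketched as a paired comparison. The projects and both estimators are stand-ins invented for illustration; they are not Dycom or NN-filtering.

```python
from statistics import mean

# Hypothetical test projects: (size in function points, actual effort in hours).
TEST_PROJECTS = [(10, 120.0), (25, 260.0), (40, 390.0), (15, 170.0), (30, 330.0)]


def technique_a(size):
    """Stand-in estimator A (NOT Dycom): fixed productivity of 10 hours per point."""
    return 10.0 * size


def technique_b(size):
    """Stand-in estimator B (NOT NN-filtering): constant guess that ignores size."""
    return 250.0


def absolute_errors(estimator, projects):
    """Apply one treatment (estimator) to every experimental unit (project)."""
    return [abs(estimator(size) - actual) for size, actual in projects]


errors_a = absolute_errors(technique_a, TEST_PROJECTS)
errors_b = absolute_errors(technique_b, TEST_PROJECTS)

# Each project receives both treatments, so the comparison is paired.
paired_diffs = [a - b for a, b in zip(errors_a, errors_b)]
print("MAE A:", mean(errors_a), "MAE B:", mean(errors_b))
print("projects where A beats B:", sum(d < 0 for d in paired_diffs))
```

Here the researcher does decide which technique is applied to which unit, which is the intervention. No random allocation is involved, matching the "No" on randomization above.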
Which Uses were Right (2015 venues)
ESEM
- Use of the term experiment: 30.5% (11 out of 36)
- MSR vs. traditional experiment: 72.7% MSR works (8 papers); 27.3% traditional experiments (3 papers)
- MSR use vs. misuse: observational 12.5%; data experiments 87.5%
MSR
- Use of the term experiment: 42.8% (18 out of 42)
- MSR vs. traditional experiment: 100% MSR works (18 papers); 0% traditional experiments
- MSR use vs. misuse: observational 44.4%; data experiments 55.5%
EMSE
- Use of the term experiment: 52.72% (29 out of 55)
- MSR vs. traditional experiment: 65.5% MSR works (19 papers); 34.5% traditional experiments (10 papers)
- MSR use vs. misuse: observational 52.6%; data experiments 47.4%
Conclusions
- MSR is a research method by which several types of empirical studies can be conducted
- In any case most research is
  - Observational
  - Retrospective
    - Unless data is mined from development tools prospectively
- Therefore the evidence obtained is of lower quality than that of
  - Observational prospective studies
  - Field experimental studies
- It shows correlation but it is hard to prove causation
- More powerful types of observational studies (case-control; cohort) could yield better evidence
