
CESI Keynote English


  1. Natalia Juristo, University of Oulu & Technical University of Madrid. Conducting Experiments in Industry: The ESEIL FiDiPro Project
  2. Project People Goal
  3. Experimental Software Engineering Industrial Laboratory (ESEIL), January 2013-December 2017
     •  Gain insight into the challenges of conducting experiments in the software industry
     •  Improve understanding of differences between experiments in the lab and in the field for SE
  4. Research Approach
     •  Experiment topics chosen by companies
        –  Up to three
     •  Each experiment replicated by several industrial partners
     •  Companies running 2-3 experiments over 5 years
        –  With a minimum of 1
  5. Research Goals
     •  Complete the SE experimental path
        –  Are software industrial experiments equivalent to field experiments?
        –  Understand the barriers to software industry experiments
        –  Learn whether experiments can be used for decision making in industry
     •  Understand the differences between students and professionals as experimental subjects
        –  External validity of results with students
        –  Behavior of subjects
  6. Experiment Runs Design
  7. An experiment on TDD
     •  Experiment sold as hands-on exercises embedded in a training course
     •  Limitations are placed on design
     •  Participants are professionals but novices in the technology being evaluated
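  8. Training course and experimental treatments

     |       | TRAINING              | EXERCISES                        | TREATMENTS                     |
     |-------|-----------------------|----------------------------------|--------------------------------|
     | DAY 1 | UT Concepts & Slicing | 2 Slicing Exercises              | BASELINE TASK (Do It Your Way) |
     | DAY 2 | Slicing & TDD         | 2 ITL Exercises, 1 TDD Exercise  | ITL TASK                       |
     | DAY 3 | TDD                   | 1 TDD Exercise                   | TDD TASK                       |
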
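  9. Experiment runs: 130 professionals, 250 students

     | Organization                     | Country            | Date                         | No. of Subjects |
     |----------------------------------|--------------------|------------------------------|-----------------|
     | University of Oulu               | Finland            | May 2014                     | 48              |
     | Technical University of Madrid   | Spain              | Mar 2014; Oct 2014; Oct 2015 | 38              |
     | University of Basilicata         | Italy              | Oct 2015                     | 20              |
     | University of Southern Denmark   | Denmark            | Jan 2016                     | 71              |
     | Technical University of Valencia | Spain              | May 2014                     | 32              |
     | University of ESPE               | Ecuador            | Apr 2014; Apr 2015; Apr 2016 | 43              |
     | Elektrobit/Bittium               | Finland            | Mar 2014                     | 9               |
     | Ericsson                         | Finland            | Mar 2015                     | 21              |
     | F-Secure                         | Finland & Malaysia | Oct 2013                     | 31              |
     | Mapfre                           | Spain              | Jun 2015                     | 14              |
     | Paf                              | Finland            | Mar 2016                     | 13              |
     | PlayTech                         | Estonia            | Mar 2014                     | 18              |
     | Ecuadorian Army                  | Ecuador            | Apr 2015                     | 22              |
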
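
As a quick cross-check of the totals quoted on this slide, the per-organization counts can be summed. This is only a sketch: the split into students and professionals is an assumption based on organization type (universities versus companies and the army), which the slide does not state explicitly.

```python
# Subject counts as listed on the slide. The grouping into "students" and
# "professionals" is an assumption based on the type of organization.
students = {
    "University of Oulu": 48,
    "Technical University of Madrid": 38,
    "University of Basilicata": 20,
    "University of Southern Denmark": 71,
    "Technical University of Valencia": 32,
    "University of ESPE": 43,
}
professionals = {
    "Elektrobit/Bittium": 9,
    "Ericsson": 21,
    "F-Secure": 31,
    "Mapfre": 14,
    "Paf": 13,
    "PlayTech": 18,
    "Ecuadorian Army": 22,
}

total_students = sum(students.values())
total_professionals = sum(professionals.values())

print(f"students: {total_students}, professionals: {total_professionals}")
```

Under that grouping the sums come to 252 students and 128 professionals, so the slide's "250 students" and "130 professionals" look like rounded figures; the overall total is 380 either way.
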
  10. Today I will not report results! :-)
  11. What we have learnt
  12. Recruitment
      •  Hard to sign up a significant number of participants
         –  Developer time is money
         –  Number of participants was low in all cases: 8-20
      •  Training was the only carrot that we found
      •  Company structure has a major impact on recruitment success
         –  Companies with booked time for training were easier to convince (F-Secure)
      •  A project leader as champion works better than innovation managers
         –  The PL administers developer time (Mapfre & Playtech)
  13. Technologies
      •  Technologies vary across companies
         –  Language, IDE and testing framework were different across companies
         –  Experimental instruments had to be adapted several times for different companies
            •  Originally for Java, Eclipse and JUnit (for the academic setting)
            •  Adapted to C++, C#, Boost, Google Test, IntelliJ
         –  We missed some interesting instruments
            •  Such as treatment conformance (only available for Java)
  14. Design
      •  Participant volatility threatens control
         –  Fewer attendees than signed up
         –  More drop-outs
            •  Missing data points that threaten validity (Paf)
         –  Attendees sometimes had a different profile than expected
            •  Redesign on the fly (Ericsson)
  15. @ Paf
      •  Participants
         –  Planned: 14 subjects
         –  Real: 13 subjects
         –  Useful: 8 subjects
      •  Data removed: 5 subjects
         –  4 attended only 1 session
         –  Group 3 had only 1 subject
      •  The importance of staying on to perform all experimental tasks was not well understood
      •  Losing a group meant that we were unable to compare all treatments for several tasks

      |         | DAY 1 (YW) | DAY 2 (ITL) | DAY 3 (TDD) |
      |---------|------------|-------------|-------------|
      | GROUP 1 | BSK        | SS          | MR          |
      | GROUP 2 | SS         | MR          | BSK         |
      | GROUP 3 | MR         | BSK         | SS          |

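
The effect of losing a group in this layout can be sketched in a few lines. The sketch assumes that the table cells (BSK, SS, MR) are task labels and that the day columns fix the treatment (YW on day 1, ITL on day 2, TDD on day 3); the function names are hypothetical and not part of the study's tooling.

```python
# Treatment applied on each day, in column order, as read from the slide.
TREATMENTS = ["YW", "ITL", "TDD"]

# Task performed by each group on day 1, day 2 and day 3 (assumed reading
# of the slide's table: rows are groups, cells are task labels).
DESIGN = {
    "Group 1": ["BSK", "SS", "MR"],
    "Group 2": ["SS", "MR", "BSK"],
    "Group 3": ["MR", "BSK", "SS"],
}


def observed_treatments(design, dropped=()):
    """Map each task to the set of treatments under which it was performed."""
    seen = {}
    for group, tasks in design.items():
        if group in dropped:
            continue
        for treatment, task in zip(TREATMENTS, tasks):
            seen.setdefault(task, set()).add(treatment)
    return seen


def missing_cells(design, dropped=()):
    """For each task, list the treatments with no remaining observations."""
    seen = observed_treatments(design, dropped)
    all_tasks = {task for tasks in design.values() for task in tasks}
    return {
        task: sorted(set(TREATMENTS) - seen.get(task, set()))
        for task in sorted(all_tasks)
    }


full = missing_cells(DESIGN)
without_group_3 = missing_cells(DESIGN, dropped=("Group 3",))
print(full)
print(without_group_3)
```

With all three groups every task is observed under every treatment. Dropping Group 3 leaves each task with one treatment unobserved, which is the kind of gap the slide describes.
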
  16. Design
      •  Participant volatility threatens control
         –  Fewer attendees than signed up
         –  More drop-outs
            •  Missing data points that threaten validity (Paf)
         –  Attendees sometimes had a different profile than expected
            •  Redesign on the fly (Ericsson)
  17. @ Ericsson

      Planned Design. Expected subjects: experienced in C++, Eclipse, Boost and unit testing

      |       | TRAINING              | EXERCISES                                      | TREATMENTS                     |
      |-------|-----------------------|------------------------------------------------|--------------------------------|
      | DAY 1 | Testing Tool Concepts | 2 Tool Exercises, 1 Motivational Exercise (ITL) | BASELINE TASK (Do It Your Way) |
      | DAY 2 | Slicing               | 2 Slicing Exercises                            | CONTROL TASK (ITL)             |
      | DAY 3 | TDD                   | 3 TDD Exercises                                | TREATMENT TASK (TDD)           |

      Real Design. Subjects: very inexperienced in Boost & unit testing; inexperienced in C++

      |       | TRAINING              | EXERCISES                                      | TREATMENTS                     |
      |-------|-----------------------|------------------------------------------------|--------------------------------|
      | DAY 1 | Testing Tool Concepts | 2 Tool Exercises, 1 Motivational Exercise (ITL) | BASELINE TASK (Do It Your Way) |
      | DAY 2 | Slicing               | 2 Slicing Exercises                            | CONTROL TASK (ITL)             |
      | DAY 3 | TDD                   | 3 TDD Exercises                                | TREATMENT TASK (TDD)           |

  18. Characterization of the participants
      Most subjects are:
      •  Very inexperienced in Boost
      •  Very inexperienced/inexperienced in unit testing
      •  Inexperienced in C++
      •  All types for OO, programming and IDE
  19. Behavior
      •  Professionals are less motivated than students
         –  Attendance at a training course <> grading
         –  Preoccupied with work issues
         –  Used to flexible schedule
         –  Young participants more active and enthusiastic than older ones
         –  There might be several other psychological issues
      •  Treatment compliance is lower than for students
         –  Students appear to be more willing to abide by the rules defined by instructors
         –  Professionals tend to have their own ideas about what they expect to get from the course/experiment
      •  Professionals might be afraid of being assessed
         –  Some subjects removed their code
  20. Results Reception
      •  Managers very much welcomed the figures
         –  They were amazed by the quantitative information about development
      •  Significance was hard to grasp
         –  They tended to focus on the average and neglected significance and power
         –  We tried out different representations
            •  Charts were very useful
      •  Reporting needs to differ from research papers
         –  Focus on diagrams rather than numbers
         –  State the findings in words
         –  Discuss the consequences of results in their context
  21. Means and Error Intervals [chart contrasting a non-significant and a significant difference; values shown: 15.9%, 46.4%, 22.5%, 48.3%, 58.4%]
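
A chart of means with error intervals, as on this slide, can be approximated with a few lines of code. The sketch below uses made-up scores (not data from the study) and a rough normal-approximation interval with z = 1.96, which is optimistic for samples this small. The overlap check mirrors what a reader does visually with such a chart; it is a heuristic, not a formal significance test.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


def intervals_overlap(a, b):
    """True if the (mean, lower, upper) intervals a and b overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical scores, invented for illustration only.
control = mean_with_interval([40, 50, 45, 55, 60])
small_shift = mean_with_interval([45, 55, 50, 60, 65])
large_shift = mean_with_interval([70, 80, 75, 85, 90])

for name, result in [("control", control),
                     ("small shift", small_shift),
                     ("large shift", large_shift)]:
    m, lo, hi = result
    print(f"{name}: mean={m:.1f}, interval=[{lo:.1f}, {hi:.1f}]")

print("control vs small shift overlap:", intervals_overlap(control, small_shift))
print("control vs large shift overlap:", intervals_overlap(control, large_shift))
```

The point for reporting is that two groups can differ in their averages while their intervals still overlap, which is why the average alone can mislead.
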
  22. Results Reception
      •  Managers very much welcomed the figures
         –  They were amazed by the quantitative information about development
      •  Significance was hard to grasp
         –  They tended to focus on the average and neglected significance and power
         –  We tried out different representations
            •  Charts were very useful
      •  Reporting needs to differ from research papers
         –  Focus on diagrams rather than numbers
         –  State the findings in words
         –  Discuss the consequences of results in their context
  23. Impact of Findings
      •  Some adopted ideas from the experiment (if not the results)
         –  To improve their development tools
            •  EB adopted instruments to monitor developers
      •  Even when results convinced managers and they opted to adopt TDD, they faced reluctance from developers
         –  Concepts from technology transfer are needed
  24. Conclusions
  25. The concept of field experiment needs more research
      •  Its adaptation to SE is not simple
      •  Strategies to face threats to internal validity
      Both types of experiments are needed
      •  Artificial, highly controlled environments
      •  And natural environments
  26. Need to improve understanding of the validity of subjects
      •  Students, although novices, might not be as bad as we thought as experimental subjects
  27. Natalia Juristo, University of Oulu & Universidad Politécnica de Madrid. Conducting Experiments in Industry: The ESEIL FiDiPro Project
