214 - Sampling Improvement in Software Engineering Surveys

Sampling Improvement in Software
Engineering Surveys
Rafael Maiani de Mello
rmaiani@cos.ufrj.br
Pedro Correa da Silva
pedrorez@poli.ufrj.br
Guilherme Horta Travassos
ght@cos.ufrj.br
ese.cos.ufrj.br

2
Motivation
• Small and non-probabilistic samples usually:
  – reduce external validity;
  – make replication difficult;
  – limit the possibilities for aggregation; and
  – hamper the evaluation of SE technologies.
• In particular, SE survey results are affected when inadequate samples are used.

3
Context
• Survey on Agility Characteristics and Practices in
Software Processes
– 2 previous trials
– 158 invited subjects (P1)
– 25 participants (S1)
– Only 7 participants declared high or very high experience in applying agile approaches in software projects

4
Recruitment Strategy (RS)
Search Question
“Who are the groups from LinkedIn interested in Agility
characteristics and practices concerned with Software
Engineering?”

5
RS Execution
• 289 distinct groups were selected; 62 groups were included after analysis.

Exclusion Criteria                      #     % of Excluded Groups
Local Groups                            97    42.73%
Organizations, publicity and events     66    29.07%
Out of scope                            33    14.54%
Vague description                       25    11.01%
Single member groups                    18     7.93%
Headhunting and job offering groups      8     3.52%
LinkedIn subgroups                       3     1.32%
Non-English                              1     0.44%
Total of Excluded Groups               227    78.55% (of the 289 selected groups)

6
Stratification Based on the
Overlapping Rates
[Figure: overlapping groups A, B, C, and D and their shared members]

7
Overlapping Matrix
     A         B         C         D         E         F         H         I         ...
A    100.00%   4.22%     25.95%    27.60%    25.42%    3.91%     4.35%     3.77%
B    3.48%     100.00%   3.43%     4.00%     2.25%     24.19%    3.01%     2.33%
C    20.04%    3.21%     100.00%   18.65%    20.13%    2.97%     2.62%     2.20%
D    15.49%    2.73%     13.56%    100.00%   16.38%    2.71%     2.35%     2.39%
E    11.32%    1.21%     11.61%    12.99%    100.00%   1.10%     1.52%     1.23%
F    1.70%     12.72%    1.66%     2.09%     1.07%     100.00%   1.62%     1.91%
H    0.90%     0.75%     0.70%     0.87%     0.71%     0.77%     100.00%   45.66%
I    0.64%     0.48%     0.49%     0.73%     0.47%     0.76%     37.73%    100.00%
...
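The matrix is asymmetric (for example, A/B is 4.22% while B/A is 3.48%), which is what one would expect if each cell divides the shared members by the size of only one of the two groups. The slides do not state which group is the denominator, so the sketch below assumes it is the row group. The function name `overlap_matrix` and the member IDs are hypothetical and are not data from the study.

```python
from itertools import product


def overlap_matrix(groups):
    """Return {(row, col): % of row's members that also belong to col}.

    Assumption: the denominator is the size of the row group; the slides
    do not say which group is used.
    """
    matrix = {}
    for row, col in product(groups, repeat=2):
        members = groups[row]
        shared = members & groups[col]
        matrix[(row, col)] = 100.0 * len(shared) / len(members) if members else 0.0
    return matrix


# Hypothetical member IDs, for illustration only.
groups = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5, 6, 7, 8, 9, 10},
}
rates = overlap_matrix(groups)
print(rates[("A", "B")])  # 50.0: 2 of A's 4 members are also in B
print(rates[("B", "A")])  # 25.0: 2 of B's 8 members are also in A
```

Groups with high mutual overlap rates (such as H and I above) are natural candidates for the same stratum.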

8
Stratified Sampling
Recruitment and Effective Sample Size
Strata  Name                      #Distinct Members  Sample Size  Respondents  CI for CL=95%
E1      Agility                   114,827            1,031        57           12.98%
E2      Project Management        5,488              874          40           15.44%
E3      Agile Practices 1         11,633             955          56           13.06%
E4      Agile Practices 2         3,864              820          35           16.49%
E5      Software Testing 1        56,400             1,021        26           19.22%
E6      Software Testing 2        5,791              882          22           20.86%
E7      Configuration Management  17,234             981          23           20.88%
E8      SW Architecture           7,335              911          31           17.57%
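The slides do not state how the "CI for CL=95%" column was computed. As a minimal sketch, the usual margin of error for a proportion with a finite population correction, assuming z = 1.96 and the most conservative proportion p = 0.5, reproduces the reported values for strata such as E1, E2, and E8. The function name and parameters below are illustrative assumptions, not the authors' implementation.

```python
import math


def margin_of_error(population, respondents, z=1.96, p=0.5):
    """Margin of error (in %) for a proportion, with finite population correction.

    Assumptions (not stated in the slides): z = 1.96 for a 95% confidence
    level and p = 0.5 as the most conservative proportion.
    """
    standard_error = math.sqrt(p * (1 - p) / respondents)
    fpc = math.sqrt((population - respondents) / (population - 1))
    return 100.0 * z * standard_error * fpc


# Stratum sizes and respondent counts taken from the table above.
for name, population, respondents in [
    ("E1", 114_827, 57),
    ("E2", 5_488, 40),
    ("E8", 7_335, 31),
]:
    print(name, round(margin_of_error(population, respondents), 2))
```

The margin shrinks roughly with the square root of the number of respondents, so the strata with the fewest respondents show the widest intervals.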

9
Skill Analysis
“What come into your mind when you think about your
five main skills in software engineering?”

10
Skill Analysis
• 277 participants answered (95.19%)
• 1,320 reported skills
• 325 coded skills
• 88 skill groups

11
Skill Group               Skill Examples                              %
Personal Skill            Creativity, Detailing, Learning, Planning   10.56%
Programming               Algorithms, Programming Languages           8.80%
SW Analysis and Design    OO Design, Design Patterns                  8.25%
Social Skill              Communication, Leadership                   7.78%
SW Testing                Testing, Debugging                          7.71%
Thinking and Reasoning    Abstraction, Analytical Thinking            6.24%
Agile Practices           Refactoring, TDD                            5.05%
Agile Characteristic      Adaptability, Being Collaborative           5.00%
SW Requirements           Req. Analysis, Requirements Elicitation     4.52%
SW Quality                Quality, Quality Assurance                  3.65%
SW Architecture           SW Architecture                             3.63%
Problem Solving           Problem Solving                             3.31%
Agile Methods             Kanban, Scrum, XP                           2.71%
Business Analysis         Business understanding, Business Analysis   2.66%
Project Management        Project Management                          2.21%
Technical Expertise       Technical Knowledge                         2.06%
Configuration Management  Change Management, Release Management       2.01%
Agile                     Agile coaching, Agile thinking, Agility     1.91%
SW Development Process    SW Process Improvement, SW Dev. Life-Cycle  1.27%
SW Development            Development, SW Development                 1.12%

12
Skill Distribution by Strata
Stratum  Personal Skill  Programming  SW Analysis and Design  Social Skill  SW Testing
E1       16.34%          7.48%        11.81%                  22.31%        5.30%
E2       9.81%           10.62%       16.87%                  16.73%        5.71%
E3       12.03%          8.46%        6.94%                   20.49%        17.40%
E4       9.25%           20.56%       14.10%                  14.81%        9.13%
E5       11.77%          2.83%        6.85%                   14.80%        34.36%
E6       9.13%           22.01%       9.21%                   3.88%         20.24%
E7       12.48%          14.36%       12.50%                  0.00%         2.93%
E8       19.20%          13.67%       21.72%                  6.98%         4.93%

13
Skill Distribution
Similarity Analysis
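The slides do not name the similarity measure used for this analysis. As one possible illustration, the sketch below compares strata by the cosine similarity of their skill distributions, using only the five skill groups shown on the previous slide. Cosine similarity, the `SKILLS` dictionary, and the helper names are assumptions made for this example, not the authors' method.

```python
import math

# Skill shares (%) per stratum, in the order: Personal Skill, Programming,
# SW Analysis and Design, Social Skill, SW Testing (from the previous slide).
SKILLS = {
    "E1": [16.34, 7.48, 11.81, 22.31, 5.30],
    "E2": [9.81, 10.62, 16.87, 16.73, 5.71],
    "E3": [12.03, 8.46, 6.94, 20.49, 17.40],
    "E4": [9.25, 20.56, 14.10, 14.81, 9.13],
    "E5": [11.77, 2.83, 6.85, 14.80, 34.36],
    "E6": [9.13, 22.01, 9.21, 3.88, 20.24],
    "E7": [12.48, 14.36, 12.50, 0.00, 2.93],
    "E8": [19.20, 13.67, 21.72, 6.98, 4.93],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two skill vectors (1.0 = same profile)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(stratum):
    """Return the other stratum whose skill profile is closest to `stratum`."""
    others = (name for name in SKILLS if name != stratum)
    return max(others, key=lambda name: cosine_similarity(SKILLS[stratum], SKILLS[name]))


for name in SKILLS:
    print(name, "is closest to", most_similar(name))
```

Strata whose profiles are close under a measure like this are candidates for merging into a single new stratum.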

14
New Strata - St1
• “Agilists”
• Composed of agility groups
• Personal Skills, Social Skills, SW Analysis and Design

15
New Strata - St2
• Testing Professionals
• Composed mainly of LinkedIn groups devoted to
Software Testing
• Software Testing is the most relevant skill group

16
New Strata - St3
• Programmers
• Composed mainly of LinkedIn groups devoted to
agile practices
• Programming is the most relevant skill group

17
New Strata - St4
• Configuration Managers
• Composed of three LinkedIn groups concerned with
Configuration Management (CM)
• CM is the most relevant skill group, closely followed
by Programming and Personal Skills

18
New Strata - St5
• “System Analysts”
• Composed of a single LinkedIn group devoted to
software architecture
• Main skill groups: personal skills and SW analysis and
design

19
Hypothesis Testing
Heterogeneity
• S2 is more heterogeneous than S1
Region         S1 (12 subjects, 9 countries)  S2 (289 subjects, 43 countries)
USA+Canada     41.7%                          38.1%
Europe         33.3%                          41.2%
Asia           25%                            11.8%
Latin America  -                              5.9%
Oceania        16.7%                          2.1%
Africa         -                              0.1%

20
Confidence Level
• S1 and S2 have similar confidence levels

Sample  Size   Mean   Normal (KS)  Student t-Test  Mann-Whitney test
S1      25(1)  0.375  Yes          -               -
S2      291    0.418  Yes          0.055           -
S2-St1  97     0.460  Yes          0.011(0)        -
S2-St2  81(3)  0.382  Yes          0.424(3)        -
S2-St3  57(7)  0.470  Yes          0.003(7)        -
S2-St4  24     0.333  No           -               0.415 (0)
S2-St5  31     0.403  No           -               0.171 (0)

21
Conclusion
• This study and previous studies suggest that we can improve sample quality by following a systematic sampling approach
• It is feasible to better characterize the subject profile through simple, open questions
• We found evidence of heterogeneity among members of the same group on social networks
– However, in the big picture, there is a trend!

