Survey Research in
Software Engineering
Alessio Ferrari, CNR-ISTI, Pisa, Italy

alessio.ferrari@isti.cnr.it
LM Rea and RA Parker, 2014. Designing and conducting survey research: A comprehensive guide
Barbara A. Kitchenham and Shari L. Pfleeger, 2008, https://doi.org/10.1007/978-1-84800-044-5_3
April, 2020
Survey
• A survey is a method to systematically gather qualitative and quantitative data related to
certain constructs of interest from a group of individuals who are representative of a
population of interest

• Constructs of interest: concepts that I want to evaluate, e.g., usability of a certain tool,
developers’ habits, etc.

• Population of interest (target population or population): the group of individuals that
is the focus of the survey, e.g., Python developers, companies in a certain area, Python
developers from University A vs Python developers from University B, potential users

• NOTE: I also collect qualitative data, but I normally aim to present statistics, so
the output is mostly quantitative

• NOTE: In principle, individuals of the population of interest can also be objects, but here
we mainly focus on surveying subjects
In this context, a survey is a synonym for QUESTIONNAIRE
In practice, a survey can be carried out with structured interviews
Why Surveys in
Software Engineering (SE)?
• SE Practice: Surveys are important to gather users’
needs, which are the trigger for any software
development endeavour (e.g., understanding which
linguistic problems citizens typically find in official
documents, and building a tool to prevent these
problems)

• SE Research: Surveys are also important to gather
information about the practice of software engineering,
in a company or across companies, and to build general
theories (e.g., 80% of problems in SE are due to poorly
written requirements)
Here we are mainly concerned with this second type
However, many considerations apply to both cases
[Figure from "The ABC of Software Engineering Research", Fig. 1: the ABC framework, with eight research strategies as categories of research methods for software engineering, illustrated by the metaphors Jungle, Natural Reserve, Flight Simulator, In Vitro Experiment, Courtroom, Referendum, Mathematical Model, and Forecasting System]
Survey in SE: Examples
• In a company: I have identified a set of requirements
issues from interviews, and I want to see how relevant they
are for the whole company (the unit of analysis is the
employee)

• In a cross-domain population: I want to understand the
requirements engineering problems of a large
population; I recruit representatives from different
companies and ask them to fill in the survey about their
company (the unit of analysis is the company)

• In an open-source population: I want to understand the
reasons why developers follow other people; I recruit them
from GitHub, ask open-ended questions, and code the answers
Mostly deductive, but inductive approaches
are needed for open-ended questions
Which Roles to Survey in SE?
• Board of Directors: a group of people, elected by stockholders, who establish corporate policies and make
management decisions (can also be a single person in case the co)
• Managers: three different levels of management may be present in a large company (low, middle, top)

• Top-level managers (e.g., Organisational Managers) are responsible for controlling and overseeing the entire
organization.

• Middle-level managers (e.g., Functional Managers) are responsible for executing organizational plans which
comply with the company’s policies. These managers act as intermediaries between top-level management and
low-level management.

• Low-level managers focus on controlling and directing (e.g., Project Managers). They serve as role models for
the employees they supervise.

• Customers: the ones who buy the system
• Users: the ones who use the system
• Requirements/Business Analysts: the ones that gather requirements from customers and users
• Designers and Architects: the ones that design the system at the high level
• Developers: the ones who code
• Testers: the ones who test the code
The roles may depend on the adopted software process!
Companies may include only a subset of the roles
Some roles may be covered by the same person
Survey Process
[Figure: the survey process, organised in three phases (Planning; Execution and Analysis; Reporting). Boxes shown: Research Questions; Sampling; Characterise Target Population; Sampling Procedure; Define Measures; Design Questionnaire; Pilot Questionnaire; Finalise Questionnaire; Deal with Ethics and GDPR; Recruit and Deliver Questionnaire; Set Deadline for Reply (if online/email); Collect Answers; Data Coding and Editing; Imputation and Adjustments; Data Analysis and Interpretation; Questionnaire Design; Threats to Validity (Validity and Reliability); Results and Analysis; Discussion in Relation to RQs]
Surveys are a hybrid
between qualitative
and quantitative studies
Sampling
Terminology
• Population: the universe of units from which the sample is to be selected. The
term ‘units’ is employed because it is not necessarily people who are being
sampled: the researcher may want to sample from a universe of nations,
cities, regions, firms, etc.

• Sample: the segment of the population that is selected for investigation. It is a
subset of the population. The method of selection may be based on a
probability or a non-probability approach (next slide).

• Sampling frame: the listing of all units in the population from which the
sample will be selected. It is an explicit list of units; sometimes it is not
possible to match it with the actual population, e.g., if the population is “all
Python developers”.

• Representative sample: a sample that reflects the population accurately so
that it is a microcosm of the population.

• Respondents: the subjects who responded to the survey
Sampling
[Figure: from the Population (e.g., developers) to the Sampling Frame (e.g., e-mails), the Sample, and the Respondents]
Probability Sampling:
Sampling Frame
• The optimal sampling frame has the following qualities:

• all units have a logical, numerical identifier

• all units can be found – their contact information, map location or other relevant information is
present

• the frame is organized in a logical, systematic fashion

• the frame has additional information about the units that allows the use of more advanced
sampling designs (e.g., age or expertise of developers to draw stratified samples; this may be
collected afterwards)

• every element of the population of interest is present in the frame (it is not always possible…)

• every element of the population is present only once in the frame

• no elements from outside the population of interest are present in the frame

• the data is 'up-to-date'
https://en.wikipedia.org/wiki/Sampling_frame
Terminology
• Probability sample: a sample that has been selected using random selection so that
each unit in the sampling frame has a known chance of being selected.

• Non-probability sample: a sample that has not been selected using a random
selection method. This implies that some units are more likely to be selected than
others.

• Sampling error: error in the findings deriving from research due to the difference
between a sample and the population from which it is selected.

• Non-sampling error: error in the findings deriving from research due to the
differences between the population and the sample that arise either from deficiencies
in the sampling approach, such as an inadequate sampling frame or non-response
(see below), or from such problems as poor question wording, poor interviewing, or
flawed processing of data.

• Non-response: occurs whenever some members of the sample refuse to
cooperate, cannot be contacted, or for some reason cannot supply the required data
Probability Sampling
• Random sampling: select n units from the sampling
frame at random (e.g., use the =RAND() function in
Excel to assign a random number to each subject, order
the list by that number, and select the first n)

• Stratified sampling: select s units from each identified
stratum (e.g., developers vs testers) of the sampling frame
Typical for market analysis and user studies
Used for large SE studies
Purposive (non-probability) sampling was used for
interviews; here, random sampling is preferred
cf. De Mello and Travassos, 2016 https://doi.org/10.1145/2961111.2962632
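The two selection procedures above can be sketched in Python. This is a minimal illustration, not the procedure of the cited study: `simple_random_sample` mirrors the Excel recipe (assign a random key to every unit, sort, keep the first n), and `stratified_sample` draws s units at random from each stratum. The frame of (e-mail, role) pairs and both function names are hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=None):
    """Random sampling as in the Excel recipe: random key per unit, sort, take first n."""
    rng = random.Random(seed)
    keyed = sorted(frame, key=lambda _unit: rng.random())
    return keyed[:n]


def stratified_sample(frame, stratum_of, s, seed=None):
    """Stratified sampling: draw s units at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for stratum, units in strata.items():
        if len(units) < s:
            raise ValueError(f"stratum {stratum!r} has only {len(units)} units")
        sample.extend(rng.sample(units, s))
    return sample


# Hypothetical sampling frame of (e-mail, role) pairs.
frame = [(f"dev{i}@example.com", "developer") for i in range(60)]
frame += [(f"tester{i}@example.com", "tester") for i in range(20)]

random_sample = simple_random_sample(frame, n=10, seed=42)
strata_sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], s=5, seed=42)
```

Passing a fixed `seed` makes the draw reproducible, which helps when the sampling procedure has to be reported or repeated.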
Probability Sampling: Formula
• Recommended when working with probabilistic sampling designs
$$SS = \frac{Z^2 \cdot p \cdot (1 - p)}{c^2}$$
• SS: sample size
• Z: Z-value, established through a specific table (Z = 2.58 for a 99% confidence level, Z = 1.96 for a 95% confidence level)
• p: sample proportion, i.e., the percentage selecting a choice, expressed as a decimal; the conservative choice is 0.5, since it represents the worst case and leads to the largest SS
• c: desired confidence interval, expressed as a decimal (e.g., 0.04 for ± 4%)
cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
Example
- Confidence level: 95%
- Confidence interval: ± 4%
- If 50% of the respondents answer X, then, with a 95% confidence level, the actual
proportion in the population lies between 46% and 54%.
How to compute
the sample size?
Probability Sampling: Formula
Sample Size Formula, with the correction for a finite population of size pop:
$$SS_{corr} = \frac{SS}{1 + \frac{SS - 1}{pop}}$$
Population   Confidence Level   Confidence Interval   Sample Size
10,000       95%                0.01                  4,899
10,000       95%                0.05                  370
500          95%                0.01                  475
500          95%                0.05                  217
cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
In SE, it may be convenient to increase the
confidence interval, as we can tolerate some imprecision
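As a sanity check, the two formulas above can be coded directly. This sketch assumes the Z-values quoted on the slide and the conservative p = 0.5; the function names are illustrative. Rounding to the nearest integer reproduces the table; many online calculators round up instead, which gives one more unit in some rows.

```python
import math

# Z-values quoted on the slide.
Z_VALUES = {0.95: 1.96, 0.99: 2.58}


def sample_size(confidence_level, c, p=0.5):
    """SS = Z^2 * p * (1 - p) / c^2 (large population)."""
    z = Z_VALUES[confidence_level]
    return z ** 2 * p * (1 - p) / c ** 2


def corrected_sample_size(pop, confidence_level, c, p=0.5):
    """Finite-population correction: SS / (1 + (SS - 1) / pop)."""
    ss = sample_size(confidence_level, c, p)
    return ss / (1 + (ss - 1) / pop)


for pop, c in [(10_000, 0.01), (10_000, 0.05), (500, 0.01), (500, 0.05)]:
    # Nearest-integer rounding reproduces the table: 4899, 370, 475, 217.
    print(pop, c, round(corrected_sample_size(pop, 0.95, c)))
```

For a very large population the correction is negligible: `math.ceil(sample_size(0.95, 0.05))` gives 385, the figure used in the GitHub example.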
Probability Sampling in SE Practice
• Select the population from a certain portal:

• GitHub (for developers)

• check the most active GitHub users here: https://gist.github.com/paulmillr/2657075;

• try to copy-paste this in your browser: https://api.github.com/search/users?q=followers:100+sort:followers&per_page=100 (the GitHub API
can help you to identify users)

• Check GHTorrent project: https://ghtorrent.org 

• LinkedIn (for other types of professionals, you need to enter groups and
contact people personally, or create polls in groups)

• Consider that only about 10% of the contacted subjects will respond (20% on
GitHub), so, to gather enough data, contact as many people as is possible and
reasonable
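A minimal sketch of querying GitHub's user-search REST endpoint from Python, assuming the documented `q`, `sort`, `order`, `per_page`, and `page` parameters; the `followers:>=N` qualifier and the function names are my choices, not taken from the slide. The search API returns at most 1,000 results per query, and unauthenticated requests are subject to strict rate limits. A login is not an e-mail address: contact details are only available where users made them public.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/users"


def build_search_url(min_followers, per_page=100, page=1):
    """URL for GitHub's user search, most-followed users first."""
    params = {
        "q": f"followers:>={min_followers}",
        "sort": "followers",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_logins(payload):
    """Extract user logins from a decoded search response."""
    return [item["login"] for item in payload.get("items", [])]


def fetch_logins(min_followers, per_page=100, page=1):
    """Network call (unauthenticated, hence heavily rate-limited)."""
    request = urllib.request.Request(
        build_search_url(min_followers, per_page, page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_logins(json.load(response))
```

Keeping URL construction and parsing separate from the network call makes both easy to check offline before running a real query.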
Probability Sampling in SE Practice
• My population is the world of developers. 

• …Well, open source developers…Well, open source developers using GitHub. 

• My sampling frame is the open-source developers on GitHub: I can identify their e-mail and contact them.

• I have identified that there are 44,735,158 users on GitHub. I can’t send a questionnaire to all of them.

• I decide to select a sample of the most active users, as I think they represent my population better: HOW
MANY? 

• Go to: https://www.surveymonkey.com/mp/sample-size-calculator/
cf. Blincoe et al. http://kblincoe.github.io/publications/2015_IST_Blincoe.pdf
Probability Sampling in SE Practice
• With a 95% confidence level and a ± 5% confidence interval, the calculator gives a
sample size of 385.

• Since normally just 10% of the people respond, I need to
contact at least 385 × 10 = 3,850 people if I want a representative
sample, so about 4,000 e-mails.

• In the end, I get answers from 800 people (20%), not too bad.
This is my actual sample: 800 instead of 44,000,000. I can say
that it is representative, as it is clearly above 385.

• Actually, with 800 answers I can even reduce my confidence interval to ± 4%
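The arithmetic above can be sketched as two small helpers: how many invitations a target number of replies requires at a given response rate, and which confidence interval the sample size formula yields once the number of answers is known (solving SS = Z² p (1 − p) / c² for c). This is a minimal sketch with illustrative names. It treats the 800 respondents as if they were a simple random sample, which a self-selected set of responders is not.

```python
import math

Z_95 = 1.96  # Z-value for a 95% confidence level, as on the formula slide


def invitations_needed(target_responses, response_rate_percent):
    """People to contact so that the expected replies reach the target.

    Integer ceiling division avoids floating-point rounding with fractional rates.
    """
    return -(-target_responses * 100 // response_rate_percent)


def margin_of_error(n, z=Z_95, p=0.5):
    """Confidence interval half-width c, from SS = Z^2 p (1 - p) / c^2 solved for c."""
    return z * math.sqrt(p * (1 - p) / n)


print(invitations_needed(385, 10))     # 3850
print(f"{margin_of_error(385):.3f}")   # 0.050
print(f"{margin_of_error(800):.3f}")   # 0.035
```

With 800 answers the interval is about ± 3.5%, which is why narrowing it to ± 4% is safe.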
Non-probability Sampling:
Convenience Sampling
• In SE research, it is also typical to have non-probability samples

• Specific expertise is normally required of the respondents (e.g., developers but also domain experts), and it may not be
straightforward to collect a sufficiently large sample, unless you work with GitHub or other networks.

• If you are sampling in a specific company (e.g., to make a survey in a multi-national company, in which the unit of
analysis is the employee), it is unlikely that you have access to the list of all employees

• If you are sampling the companies in a certain area (e.g., to make a survey on startups in Italy or in Tuscany, where the unit of
analysis is the company), it is again unlikely that you have access to the list of startups in the area

• Convenience sampling is often adopted: I gather information from all the people that I can contact through my social and
professional links; I collect relevant demographic information (e.g., age, number of years at company X, role, number of
years in a certain role) together with the responses; I check to which extent the demographic information is related to the
responses

• Often, surveys are performed at specific software engineering conferences, and may not reflect reality: only
companies interested in research may participate, and some sectors may not be covered at all

• Surveys spanning different companies and performed online are more difficult to carry out; an example will be given at
the end of the presentation

• In these cases, you have to rely on your own personal contacts with companies, and on those that your colleagues
(other academics in other areas) have with other companies; still, some companies will never be reached

• Little, biased information is better than NO information at all, if the context is clearly explained
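One way to check to which extent the demographic information is related to the responses is a chi-square test of independence on a contingency table of demographic group vs answer. The sketch below, with hypothetical group labels, computes only Pearson's statistic and its degrees of freedom using the standard library. The p-value would come from a statistics package (for 1 degree of freedom, a statistic above 3.841 is significant at the 5% level). The test assumes that expected counts are not too small; a common rule of thumb is at least 5 per cell.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson's chi-square statistic for (demographic group, answer) pairs.

    Returns (statistic, degrees_of_freedom).
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    group_totals = Counter(group for group, _ in pairs)
    answer_totals = Counter(answer for _, answer in pairs)
    statistic = 0.0
    for group, g_total in group_totals.items():
        for answer, a_total in answer_totals.items():
            expected = g_total * a_total / n
            statistic += (observed[(group, answer)] - expected) ** 2 / expected
    dof = (len(group_totals) - 1) * (len(answer_totals) - 1)
    return statistic, dof


# Hypothetical responses: years at company X vs answer to one question.
responses = [("< 5 years", "Yes")] * 10 + [(">= 5 years", "No")] * 10
print(chi_square(responses))  # (20.0, 1)
```

A large statistic signals that answers depend on the demographic group, which is exactly what a convenience sample should be checked for before generalising.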
Formulating
Questions
What to Ask? Depends on
the Unit of Analysis
•Individuals: experience in the research context,
experience in SE, current professional role, location
and highest academic degree, ...

•Project teams: team size, client/product domain
(avionics, finance, health, telecommunications, etc.)
and physical distribution, ... 

•Organisations: size, industry segment, location, type
(government, private company, university, etc.), ... 

Demographic information
What to Ask? Depends on
your Research Questions
• RQ1: Which are the most frequent requirements defects?

• RQ2: Which requirements defects are more difficult to identify?

• … 

• Question: How frequently do you encounter these types of
requirements defects (Never, Seldom, Sometimes, Often, Very
Often): ambiguity, incompleteness, grammar error, etc.

• Question: How difficult is it to identify these types of defects (Very
Difficult, Moderately Difficult, Neither Easy Nor Difficult, Moderately
Easy, Very Easy): ambiguity, incompleteness, grammar error, etc.
To identify the types of defects, and the choices in general
I need to refer to the literature, or to experts in the field
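Answers to the frequency question above can be coded as ordinal values and summarised per defect type to address RQ1. This sketch, with hypothetical answers, uses the median rather than the mean, a conservative choice for ordinal data; `median_low` keeps the result on the original scale when the number of answers is even.

```python
from statistics import median_low

# Ordinal codes for the frequency scale of the question above.
FREQUENCY = ["Never", "Seldom", "Sometimes", "Often", "Very Often"]
CODE = {label: rank for rank, label in enumerate(FREQUENCY, start=1)}


def median_frequency(answers):
    """Median frequency label per defect type.

    answers: one dict per respondent, mapping defect type -> scale label.
    """
    codes = {}
    for respondent in answers:
        for defect, label in respondent.items():
            codes.setdefault(defect, []).append(CODE[label])
    return {defect: FREQUENCY[median_low(values) - 1] for defect, values in codes.items()}


# Hypothetical answers from three respondents.
answers = [
    {"ambiguity": "Often", "incompleteness": "Very Often", "grammar error": "Seldom"},
    {"ambiguity": "Very Often", "incompleteness": "Often", "grammar error": "Sometimes"},
    {"ambiguity": "Often", "incompleteness": "Sometimes", "grammar error": "Never"},
]
print(median_frequency(answers))
# {'ambiguity': 'Often', 'incompleteness': 'Often', 'grammar error': 'Seldom'}
```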
What to Ask? Organise Focus
Groups and Interviews
• Sometimes it is useful to organise a focus group to identify the
relevant questions (or a draft for them, you will need more time to
revise the formulation…)

• Gather participants with different viewpoints, give them 5-10 minutes
to write a set of relevant questions on a piece of paper, ask them to
read the questions aloud, and brainstorm on the proposals

• Sometimes you can refer to the literature to identify your options (e.g.,
phases of a certain software process), or to experts' opinion

• If you are dealing with an audience you do not know well (e.g., in a specific
domain), it may be useful to first interview people to identify
terminology and relevant questions, and then create the questionnaire
What to Ask? Types of Questions
• Personal factual questions: what is your role in the organisation? How many
years of experience do you have in your current role?
• Factual questions about others: on average, how old are the developers in your
company?
• Informant factual questions: does your company employ external suppliers?
• Questions about attitudes: my job is typically interesting [Disagree…Agree]
(judgments) 

• Questions about beliefs: incorrect requirements tend to result in code errors
[Never … Always] (attitudes and beliefs are different, use different Likert
scales!) 

• Questions about normative standards and values: is it considered
appropriate to have casual dressing in your office?
• Questions about knowledge: which is the most common cause of software
project failure according to research? (rare, to check if the person is informed)
Qualities of a Questionnaire
• Clarity: Will respondents understand the questions? The researchers
may find that certain ambiguities exist that confuse respondents. Are
the response choices sufficiently clear to elicit the desired information?

• Comprehensiveness: Are the questions and response choices
sufficiently comprehensive to cover a reasonably complete range of
alternatives? The researchers may find that certain questions are
irrelevant, incomplete, or redundant and that the stated questions do
not generate all of the important information required for the study. 

• Acceptability: Such potential problems as excessive questionnaire
length or questions that are perceived to invade the privacy of the
respondents, as well as those that may violate ethical or moral
standards, must be identified and addressed by the researchers.
Structure of the
Questionnaire
• Introductory questions: easy to answer, demographic, NOT sensitive

• Sensitive/personal questions: only if needed, and only late in the questionnaire, after
(virtual) rapport has been established

• Related questions: group by topic

• Logical sequence: topics shall be logically connected

• Filter/Screening Questions: questions to qualify or disqualify respondents (to make
them eligible to respond to other questions, or evaluate their confidence)

• Nested Structures: try to avoid large blocks that are answered only by certain
participants; they are very hard to analyse and compare afterwards

• Reliability Checks: for questions whose accurate answers matter most, ask them
again in a reformulated way (Do you like writing code? When thinking about
writing code you feel…)
Types of Questions
• Open-ended Questions: the respondent can write free
text (long or short)

• Close-ended Questions: set of alternatives; multiple
choice (with minimum and maximum choices), exclusive
choices, Likert Scale.
Open-ended vs Close-ended
Criterion | Open-ended | Close-ended
Allow usage of personal words | 🙂 | ☹
Unusual answers can be identified | 🙂 | 😐
Typically not leading | 🙂 | 😐
Useful to explore new areas | 🙂 | ☹
Time effective | ☹ | 🙂
Answers need to be coded | ☹ | 🙂
Clear answers | ☹ | 🙂
Easy to process | ☹ | 🙂
Compatible answers | ☹ | 🙂
Answers clarify questions | ☹ | 🙂
Spontaneous answers | 🙂 | ☹
Exhaustive answers | 🙂 | ☹
Different perception of scales | 🙂 | ☹
Formulating Questions: Tips
• Given a question, how would YOU answer it?

• Given a question, test it with peers (for initial draft)

• Pilot the set of questions with a group of respondents
from which you can get feedback (e.g., colleagues,
subjects from company)

• Remember that you may not know the terminology
typically used by your respondents, so you may have to
perform preliminary unstructured interviews to understand
the typical terminology
PILOT, PILOT, PILOT
Formulating Questions: Tips
• Avoid vague/ambiguous questions and answers:
• Instead of: How often does your group have meetings? [Often…Never]
• Ask: How frequently does your group have meetings? [Once a day, Once per week, …]

• Avoid double negatives: Do you consider not appropriate to avoid testing?

• Avoid long questions: Which types of defects are typically encountered by developers whose
relevance is normally difficult to communicate to managers?
• Avoid general questions: What is the general, physical, intellectual, and moral condition of
men and women employed in your group?
• Avoid double-barrelled questions: How satisfied are you with the space and the colleagues?
• Avoid questions that take something for granted: What testing environment do you normally use? (there could be no testing environment in use)

• Avoid technical terms: What is the Six-sigma Maturity Level of your process?
• Prefer forced choice answers instead of “all that apply” (for each choice: YES, NO)
What Types of Responses? (Questionnaire Design)
• Free-text: open questions; allow coding and content analysis; high effort on data analysis
• Numeric values: open questions; allow a wide range of statistical analyses
• Interval scale: closed questions; intervals are not necessarily equally distributed; significantly restricts statistical analysis
• Ordinal/Likert scale: closed questions; intervals are considered equally distributed; statistical analysis is less restrictive than with the interval scale
• Nominal: closed questions; statistical analysis based on frequency
cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
Response Formats: Examples (Questionnaire Design)
How much experience do you have in Java programming?
a) Very high experience
b) High experience
c) Little experience
d) Very little experience

How much experience do you have in Java programming?
a) Less than one year
b) 1 year to 3 years
c) 3 years to 5 years
d) More than 5 years

How much experience do you have in Java programming?
__5__ years

How much experience do you have in Java programming?
I have been working with Java programming at companies since 2011. Before that, I got my first
Java certification in 2009, when I started working on personal projects. But I have
difficulty with object-oriented parts…_________

Do you have experience in Java programming?
(   ) Yes          (   ) No
cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
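Of the formats above, the numeric one keeps the most information: exact years can be coded into the interval categories afterwards, but not the other way round. Coding also exposes a flaw in the interval options: a respondent with exactly 3 (or 5) years fits two categories. The sketch assumes, as a hypothetical convention, that a value on a shared boundary goes to the lower category.

```python
def to_interval(years):
    """Code a numeric answer (years of Java experience) into the interval options.

    Hypothetical convention: a value on a shared boundary (exactly 3 or 5 years)
    goes to the lower category, e.g., 3 -> "1 year to 3 years".
    """
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    if years < 1:
        return "Less than one year"
    if years <= 3:
        return "1 year to 3 years"
    if years <= 5:
        return "3 years to 5 years"
    return "More than 5 years"


print(to_interval(5))  # 3 years to 5 years
```

Whatever convention you choose, non-overlapping options (e.g., "1 to less than 3 years") avoid leaving the decision to each respondent.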
Tip: Standardised Answers
• When possible, use statements and standardised Likert-
scale answers indicating agreement (more answers can
be gathered): 

• Strongly Agree, Agree, Disagree, Strongly Disagree
Not Just Questions…
• The questionnaire must be accompanied by various administrative information
including:

• An explanation of the purpose of the study.

• A description of who is sponsoring the study (and perhaps why).

• A cover letter using letterhead paper, dated to be consistent with the mail shot

• Provide a contact name and phone number. Personalize the salutation if possible.

• An explanation of how the respondents were chosen and why.

• An explanation of how to return the questionnaire.

• A realistic estimate of the time required to complete the questionnaire. Note that an
unrealistic estimate will be counter-productive.
And privacy issues (later)
Tips for a
Successful Survey
cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
Recruiting
• Send individual but standard invitation messages

• You can expect most individually sent messages to be read

• Avoid a "spreading spree": mailing lists, forum invitation messages, crowdsourcing
tools (such as Amazon Mechanical Turk)

• You will have little or no control over who reads the invitation. So, who was actually
recruited?

• Never allow forwarding (which is different from snowballing)! It would compromise the
sample

• Send each subject an individual questionnaire token

• Set a finite and short period to answer the survey (one to two weeks)
• Offer rewards (raffles, donations, payments, sharing results)
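Individual tokens can be generated with Python's `secrets` module. In this sketch the survey URL is a placeholder, and the acceptance rule (one submission per issued token) is an assumption about how the tokens would be used; duplicate e-mail addresses are dropped so that every subject appears once. The mapping from token to e-mail address makes the answers linkable to a person, so it is personal data.

```python
import secrets
from urllib.parse import urlencode

SURVEY_URL = "https://survey.example.org/take"  # hypothetical survey endpoint


def assign_tokens(emails):
    """Map each distinct invited e-mail address to its own unguessable token."""
    return {email: secrets.token_urlsafe(16) for email in dict.fromkeys(emails)}


def invitation_link(token):
    """Personal link to include in the individual invitation message."""
    return f"{SURVEY_URL}?{urlencode({'token': token})}"


def accept_response(token, valid_tokens, used_tokens):
    """Accept a submission only for an issued token that has not been used yet."""
    if token not in valid_tokens or token in used_tokens:
        return False
    used_tokens.add(token)
    return True
```

Because a forwarded link carries the original subject's token, it can be used only once, which limits the damage forwarding does to the sample.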
Reminding
• Reminders should be used with care.

• Do not remind those who have already participated

• Do not remind anyone more than once

• The invitation message should clearly characterize the
involved researchers, the research context and present the
recruitment parameters

• Include in the invitation message a compliment and an
observation regarding the relevance of subject participation
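The two reminder rules above reduce to a set difference. A minimal sketch, assuming you track invited, responding, and already-reminded subjects (for example by their hypothetical tokens):

```python
def reminder_recipients(invited, responded, already_reminded):
    """Subjects who were invited, have not answered, and were never reminded."""
    return sorted(set(invited) - set(responded) - set(already_reminded))


# Hypothetical tokens.
print(reminder_recipients(["t1", "t2", "t3", "t4"], responded=["t2"], already_reminded=["t3"]))
# ['t1', 't4']
```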
Piloting
• Pilot the population and sampling activities

• Use a (smaller) sample of the sampling frame, reproducing all planned steps.
This will allow you to check the adequacy of the sampling frame for your survey.

• Pilot the questionnaire

• Is it clear and unambiguous? Did you miss any questions?

• Is it too long/too short?

• Pilot the recruitment

• Is it working effectively?

• Pilot the data analysis

• Have you planned the proper data analysis techniques? What data quantity
and quality are necessary?
Privacy Policy and
General Data Protection
Regulation (GDPR)
cf. https://www.slideshare.net/alanmcsweeney/gdpr-context-principles-implementation-operation-impact-on-outsourcing-data-governance-and-data-ethics
General Data Protection Regulation
• General Data Protection Regulation (GDPR) applies to any task dealing with
personal data (not just research surveys)
• Personal Data: means any information relating to an identified or identifiable
natural person ('data subject'); an identifiable natural person is one who can
be identified, directly or indirectly, in particular by reference to an identifier
such as a name, an identification number, location data, an online identifier or
to one or more factors specific to the physical, physiological, genetic, mental,
economic, cultural or social identity of that natural person
• If you distribute your surveys anonymously and you do not process
personal data, you can disregard the GDPR. But, be careful, the GDPR has
an extremely broad view of what personal data is (basically, most
demographic data are personal)!

• If you use contacts or ask for an email address, name or any other personal
data in your surveys, then make sure to read the GDPR, as it imposes a
number of responsibilities on you.
Any individual who can be distinguished from others is considered identifiable.
If you want to ensure that one person answers
one form only, you have to identify them!
General Data Protection Regulation
• If you are creating forms or surveys for a business which is based in the
European Union (EU), or if you collect and process the personal data of
EU citizens, the General Data Protection Regulation (GDPR) affects you.

• The GDPR (General Data Protection Regulation) law basically says
that:

• you must obtain freely given, specific, informed, and unambiguous
consent from your respondents when you collect their personal data.
In other words, you shall not force people to respond to or fill out
your surveys or forms, or somehow trick them to collect their
personal data.

• Additionally, you must explain how you plan to use their personal data, in
a clear and easy-to-understand way.

• Also, as individuals have the right to be forgotten, you must delete
information that you have collected from them if they request it.
Privacy Policy: Content (1)
• What you collect and how
• In your text, explain what type of personal data you are collecting and how. Is it respondents’ email,
name, or IP address? Is it simply by asking them questions, or are you collecting data automatically
(for example their geo-location or IP address)?

• Why you collect
• Your privacy policy text must clarify your reasons for collecting personal data. Explain, for instance, why
you need their email. Do you have good reasons for collecting their name or address?

• How will you use their data
• Are you going to share it with third parties? In that case, say who these third parties are and why you
need to share their data with them. If you ask for their contact info, for instance, are you going to use it
to contact them, or send them something?

• How long will you keep their data
• The GDPR requires you to define a so called “data retention” period, when you collect personal data.
Thus your privacy policy text should explain how long you will retain the data.
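On the data-handling side, the right to be forgotten and a declared retention period might be implemented as below. This is a minimal sketch with a hypothetical record layout; it does not by itself make a survey GDPR-compliant, and answers that contain demographics may still identify a person even after the e-mail address and identifier are removed.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Response:
    respondent_id: Optional[str]  # hypothetical identifier, e.g., the invitation token
    email: Optional[str]          # personal data, kept only for follow-up questions
    answers: tuple                # coded answers


def forget(responses, respondent_id):
    """Right to be forgotten: drop everything stored about one respondent."""
    return [r for r in responses if r.respondent_id != respondent_id]


def enforce_retention(responses, today, retention_end):
    """After the declared retention period, remove identifiers and contact data."""
    if today <= retention_end:
        return list(responses)
    return [replace(r, respondent_id=None, email=None) for r in responses]
```

Running `enforce_retention` on a schedule, with `retention_end` set to the date stated in the privacy policy, keeps what you actually store aligned with what you promised.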
Privacy Policy: Content (2)
• How secure is the data in your possession
• Your privacy policy must also explain what security measures are applied when you collect, export, share, and
store personal data of your respondents: what tools you are using, and whether your data processors are also taking the
security of the data seriously.

• Clarify your respondents’ rights
• The GDPR clearly defines individuals’ rights over their own data. You must also make sure to reflect these rights in
your privacy policy text, and inform your respondents about their rights, which are as follows:

• Right to access, view, and edit their own information in a timely manner

• Right to be forgotten, which means being deleted from your survey results

• Right to opt out of your future messages (e.g., if you use their data to send them ads or
marketing messages)

• Keep in mind that data is owned by the respondents, not you or your company or organization.

• Who to contact
• Under the GDPR (Art. 37), many organizations that process personal data of people in the EU must designate a Data Protection Officer (DPO): a person in the organization who can represent it with respect to data and privacy issues. Including the DPO's contact information in your privacy policy helps respondents who need to ask questions or exercise their rights.
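To make the respondents' rights above concrete, here is a hedged Python sketch of an in-memory store (the class `SurveyStore` and its methods are hypothetical) that keeps email addresses apart from answers, links the two through a keyed hash (HMAC-SHA256 from the standard library), and supports access, editing, and erasure requests. Pseudonymised answers still count as personal data under the GDPR, which is why `forget` deletes them as well; a real deployment would also need persistent, access-controlled storage and careful key management.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class SurveyStore:
    """Hypothetical in-memory store that keeps contact data apart from answers."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # Secret key used to derive pseudonyms; in practice it would be
        # stored separately from the survey data
        self._key = key if key is not None else secrets.token_bytes(32)
        self._contacts: Dict[str, str] = {}  # pseudonym -> email (personal data)
        self._answers: Dict[str, dict] = {}  # pseudonym -> answers (no email)

    def _pseudonym(self, email: str) -> str:
        # Keyed hash (HMAC-SHA256) of the normalised email address
        normalised = email.strip().lower().encode("utf-8")
        return hmac.new(self._key, normalised, hashlib.sha256).hexdigest()

    def add(self, email: str, answers: dict) -> str:
        pid = self._pseudonym(email)
        self._contacts[pid] = email
        self._answers[pid] = dict(answers)
        return pid

    def access(self, email: str) -> Optional[dict]:
        """Right of access: return everything stored about this respondent."""
        pid = self._pseudonym(email)
        if pid not in self._contacts:
            return None
        return {"email": self._contacts[pid], "answers": dict(self._answers[pid])}

    def edit(self, email: str, answers: dict) -> bool:
        """Right to edit: replace the stored answers, if any exist."""
        pid = self._pseudonym(email)
        if pid not in self._answers:
            return False
        self._answers[pid] = dict(answers)
        return True

    def forget(self, email: str) -> bool:
        """Right to be forgotten: delete both the contact data and the answers."""
        pid = self._pseudonym(email)
        found = pid in self._contacts
        self._contacts.pop(pid, None)
        self._answers.pop(pid, None)
        return found
```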
Example: Privacy Notice
What to write in your survey entry page (with a link to the policy):
• Why and How: We want to understand the typical problems of SE students. For this, we need your contribution to this survey. The survey takes 5 to 10 minutes to complete.
• Transparency: Together with your opinions, we will also ask for personal data, such as your email address, so that we can ask you follow-up questions.
• Data Retention: We securely store this data until the end of 2020.
• Share or Sale of Data: We respect your privacy and therefore we will not share your data with any third party.
• Link to Policy: By filling in this form, you agree that we will process your data according to our privacy policy.
• Contact Person: If you have any questions regarding your data, contact our data protection officer: Mr. John Doe, j.doe@survey.com
Threats to Validity in Survey Research
Reliability and Validity
• Reliability and Validity are the two main criteria used in
survey research to evaluate threats to validity

• Reliability is concerned with how well we can reproduce
the survey data, as well as the extent of measurement
error. That is, a survey is reliable if we get the same kinds
and distribution of answers when we administer the
survey to two similar groups of respondents. 

• Validity is concerned with how well the instrument
measures what it is supposed to measure.
Focus groups and pilot tests should be performed to ensure reliability and validity
Reliability Types
• Test-retest (intra-observer) Reliability: how likely is it that a person responds in the same way if surveyed twice?

• How to ensure: during the pilot, survey the same respondents twice; if the correlation between the two sets of answers is greater than 0.7, reliability is good. For some questions, include alternate forms (differently worded questions about the same construct) and check that Cronbach's alpha is greater than 0.7

• Inter-rater Reliability: to what extent do different observers give similar answers when they assess the same situation? (not so common)

• How to ensure: use two pilots with different samples, and check the correlation between the distributions of answers

• Inter-coder Reliability: (in the case of open questions) how reliable is the coding procedure?

• How to ensure: use two coders, who jointly select a master code list and then each apply the master codes to the data; check their agreement with Krippendorff’s alpha
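The checks above can be sketched in Python under a few stated assumptions: the test-retest correlation is computed as Pearson's r (Spearman's rank correlation is a common alternative for ordinal Likert answers), Cronbach's alpha uses sample variances, and Krippendorff's alpha is restricted to the simplest case of two coders who each assign one nominal code per unit, with no missing values. The function names are illustrative; for more coders, missing data, or ordinal codes, an established statistics package is the safer choice. The 0.7 threshold is the rule of thumb from the slide, not something the code enforces.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance
from typing import Hashable, List, Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Test-retest: Pearson correlation between two administrations to the same respondents."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    m1, m2 = mean(first), mean(second)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Internal consistency of k items, each a list of scores from the same n >= 2 respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def krippendorff_alpha_nominal(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Agreement of two coders on nominal codes, one code per unit each, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")
    # Coincidence matrix: each unit contributes the ordered pairs (a, b) and (b, a)
    pairs = Counter()
    for a, b in zip(coder_a, coder_b):
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1
    # Marginal total n_c per code; n is the total number of pairable values
    marginals = Counter()
    for (c, _), count in pairs.items():
        marginals[c] += count
    n = sum(marginals.values())
    observed = sum(count for (c, k), count in pairs.items() if c != k)
    expected = sum(marginals[c] * marginals[k] for c in marginals for k in marginals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is ever used")
    # alpha = 1 - D_o / D_e = 1 - (n - 1) * sum(o_ck) / sum(n_c * n_k), over c != k
    return 1 - (n - 1) * observed / expected
```

For instance, two coders who disagree on one of four units, `krippendorff_alpha_nominal(["x", "x", "y", "y"], ["x", "y", "y", "y"])`, obtain 16/30, about 0.53.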
Validity Types
• Content Validity: how appropriate does the instrument seem to a group of reviewers with knowledge of the subject matter (i.e., a focus group)?

• How to ensure: perform a focus group

• Construct Validity: to what extent are the constructs related to the measured variables?

• How to ensure: provide sound arguments that show the relationship between constructs and questions
Other types of validity should be considered when the survey is repeated
Barbara A. Kitchenham and Shari L. Pfleeger, 2008, https://doi.org/10.1007/978-1-84800-044-5_3
Example Survey in SE: NaPiRE (Naming the Pain in Requirements Engineering)
Contemporary Problems, Causes, and Effects in Practice

cf. http://re-survey.org/#/explore
cf. Méndez Fernández et al. https://arxiv.org/pdf/1611.10288.pdf
NaPiRE
3.1 Research Questions
Our objective is to get a better understanding of which problems practitioners encounter in RE, and how those problems relate to the overall project setting (causes and problems). To this end, we formulate three research questions, shown in Table 2, to steer the design of our study.
Table 2 Research questions.
RQ 1 Which contemporary problems exist in RE?
RQ 2 What are observable patterns of problems and context characteristics?
RQ 3 What are their perceived causes and effects?
The first question aims at understanding which problems practitioners experience in general in their RE and what their criticality is w.r.t. project failure. This more descriptive view is complemented by the second research question, which aims at understanding whether there exist problems that relate to specific context factors, such as the company size or the type of used process model. Once we understand whether there exist specific patterns in the problems, we want to know what their perceived causes and implications are going beyond project failure.
3.2 Instrument
The overall instrument used in NaPiRE constitutes in total 35 questions used to collect data on (a) the demographics, (b) how practitioners elicit and document requirements, (c) how requirements are changed and aligned with tests, (d) what and how RE standards are applied and tailored, (e) how RE is improved, and finally (f) what problems practitioners experience in their RE. In the study at hands, we focus on the problems practitioners experience in their RE while using […]
Table 3 Questions (simplified and condensed excerpt).
No. | Question | Type
Part: Demographics
Q 1 | What is the size of your company? | Closed(SC)
Q 2 | Please describe the main business area and application domain. | Open
Q 3 | Does your company participate in globally distributed projects? | Closed(SC)
Q 4 | In which country are you personally located? | Open
Q 5 | To which project role are you most frequently assigned? | Closed(SC)
Q 6 | How do you rate your experience in this role? | Closed(SC)
Q 7 | Which organisational role does your company take most frequently in your projects? | Closed(MC)
Q 8 | Which process model do you follow (or a variation of it)? | Closed(MC)
Part: Status Quo
Q 9 | How do you elicit requirements? | Closed(MC)
Q 10 | How do you document functional requirements? | Closed(SC)
Q 11 | How do you document non-functional requirements? | Closed(SC)
Q 12 | How do you deal with changing requirements after the initial release? | Closed(SC)
... | ... | ...
Q 16 | What requirements engineering company standard have you established at your company? | Closed(MC)
... | ... | ...
Part: Problems
Q 28 | Considering your personal experiences, how do the following (more general) problems in requirements engineering apply to your projects? | Likert
Q 29 | Considering your personally experienced problems (stated in the previous question), which ones would you classify as the five most critical ones (ordered by their relevance). | Closed
Q 30 | Considering your personally experienced most critical problems (selected in the previous question), which causes do they have? | Open
Q 31 | Considering your personally experienced most critical problems (selected in the previous question), which implications do they have? | Open
Q 32 | Considering your personally experienced most critical problems (selected in the previous question), which mitigations do you define (if at all)? | Open
Q 33 | Considering your personally experienced most critical ... | Closed(MC)
Results
To analyse the influence of the most cited causes on the most cited problems and, in turn, of those problems to project failure (as reported by the survey respondents), we visualise the relationships via an alluvial diagram. This diagram is shown in Figure 3. The decision to relate only the most cited causes to the most cited RE problems was taken to enhance the visualisation.
[Fig. 3 Relation of top 10 causes, top 10 problems, and the project impact. (alluvial diagram)]
Top 10 causes: Communication flaws between project team and the customer; Customer does not know what he wants; Lack of a well-defined RE process; Lack of experience of RE team members; Lack of time; Missing direct communication to customer; Requirements remain too abstract; Too high team distribution; Unclear roles and responsibilities at customer side; Weak qualification of RE team members
Top 10 problems: Communication flaws between project team and the customer; Communication flaws within the project team; Incomplete and / or hidden requirements; Inconsistent requirements; Insufficient support by customer; Moving targets (changing goals, business processes and / or requirements); Stakeholders with difficulties in separating requirements from previously known solution designs; Time boxing / Not enough time in general; Underspecified requirements that are too abstract and allow for various interpretations; Weak access to customer needs and / or (internal) business information
Project impact: Project Completed; Project Failed
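The data behind a diagram like Figure 3 can be prepared with a few counters. The Python sketch below is not taken from the NaPiRE study: it assumes, hypothetically, that each answer has been flattened into a (cause, problem, outcome) triple, that "most cited" means most frequent across those triples, and that ties are broken by first occurrence (the documented behaviour of `Counter.most_common`). It returns the band widths for the cause-to-problem and problem-to-outcome sections of an alluvial or Sankey plot.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical input format: one (cause, problem, outcome) triple per cited link
Link = Tuple[str, str, str]


def alluvial_flows(links: Iterable[Link], top_k: int = 10):
    """Count cause->problem and problem->outcome flows, restricted to the
    top-k most cited causes and the top-k most cited problems."""
    links = list(links)
    cause_counts = Counter(cause for cause, _, _ in links)
    problem_counts = Counter(problem for _, problem, _ in links)
    top_causes = {cause for cause, _ in cause_counts.most_common(top_k)}
    top_problems = {problem for problem, _ in problem_counts.most_common(top_k)}
    # Keep only links whose cause and problem are both among the most cited
    kept = [(c, p, o) for c, p, o in links if c in top_causes and p in top_problems]
    cause_to_problem = Counter((c, p) for c, p, _ in kept)
    problem_to_outcome = Counter((p, o) for _, p, o in kept)
    return cause_to_problem, problem_to_outcome
```

Filtering before counting mirrors the visualisation choice described above: rarely cited causes and problems are dropped so that the remaining bands stay readable.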
Summary
• Surveys are a hybrid method between qualitative and
quantitative research

• Sampling is crucial to have good data

• Piloting is crucial (you have one shot only)

• Clarity of questions and time to answer are key

• Don’t forget about privacy issues

Survey Research In Empirical Software Engineering

  • 1. Survey Research in Software Engineering Alessio Ferrari, CNR-ISTI, Pisa, Italy alessio.ferrari@isti.cnr.it LM Rea and RA Parker, 2014. Designing and conducting survey research: A comprehensive guide Barbara A. Kitchenham and Shari L. Pfleeger, 2008 , https://doi.org/10.1007/978-1-84800-044-5_3 April, 2020
  • 2. Survey • A survey is a method to systematically gather qualitative and quantitative data related to certain constructs of interests from a group of individuals that are representative of a population of interest • Constructs of interest: concepts that I want to evaluate, e.g., usability of a certain tool, developers’ habits, etc. • Population of interest (target population or population): the group of individuals that is the focus of the survey, e.g., Python developers, companies in a certain area, Python developers from University A vs Python developers from University B, potential users • NOTE: I also have qualitative data, but I am normally oriented to present statistics, and therefore the output is normally quantitative • NOTE: In principle, individuals of the population of interest can also be objects, but here we mainly focus on surveying subjects
  • 3. Survey • A survey is a method to systematically gather qualitative and quantitative data related to certain constructs of interests from a group of individuals that are representative of a population of interest • Constructs of interest: concepts that I want to evaluate, e.g., usability of a certain tool, developers’ habits, etc. • Population of interest (target population or population): the group of individuals that is the focus of the survey, e.g., Python developers, companies in a certain area, Python developers from University A vs Python developers from University B, potential users • NOTE: I also have qualitative data, but I am normally oriented to present statistics, and therefore the output is normally quantitative • NOTE: In principle, individuals of the population of interest can also be objects, but here we mainly focus on surveying subjects In this context, a survey is a synonymous of QUESTIONNAIRE
  • 4. In practice, a survey can be carried out with structured interviews
  • 5. Why Surveys in Software Engineering (SE)? • SE Practice: surveys are important to gather users' needs, which are the trigger for any software development endeavour (e.g., understanding the typical linguistic problems that citizens find in official documents, and building a tool to prevent these problems) • SE Research: surveys are also important to gather information about the practice of software engineering, within a company or across companies, and to build general theories (e.g., "80% of problems in SE are due to poorly written requirements") Here we are mainly concerned with this second type. However, many considerations apply to both cases.
  • 6. The ABC of Software Engineering Research [Fig. 1. The ABC framework: eight research strategies as categories of research methods for software engineering: Jungle, Natural Reserve, Flight Simulator, In Vitro Experiment, Courtroom, Referendum, Mathematical Model, Forecasting System]
  • 7. Survey in SE: Examples • In a company: I have identified a set of requirements issues from interviews, and I want to see how relevant they are for the whole company (the unit of analysis is the employee) • In a cross-domain population: I want to understand what the requirements engineering problems are in a large population; I recruit representatives from different companies and ask them to fill in the survey about their company (the unit of analysis is the company) • In an open-source population: I want to understand why developers follow some people; I recruit them from GitHub, ask open-ended questions, and code the answers. Mostly deductive, but inductive approaches are needed for open-ended questions
  • 8. Which Roles to Survey in SE? • Board of Directors: a group of people, elected by stockholders, who establish corporate policies and make management decisions (can also be a single person in case the co…) • Managers: three different levels of management (low, middle, top) may be present in a large company • Top-level managers (e.g., Organisational Managers) are responsible for controlling and overseeing the entire organization. • Middle-level managers (e.g., Functional Managers) are responsible for executing organizational plans in compliance with the company's policies. These managers act as intermediaries between top-level and low-level management. • Low-level managers (e.g., Project Managers) focus on controlling and directing. They serve as role models for the employees they supervise. • Customers: the ones who buy the system • Users: the ones who use the system • Requirements/Business Analysts: the ones who gather requirements from customers and users • Designers and Architects: the ones who design the system at a high level • Developers: the ones who code • Testers: the ones who test the code
  • 9. The roles may depend on the adopted software process! Companies may include only a subset of the roles. Some roles may be covered by the same person.
  • 10. Survey Process • Planning: Research Questions; Characterise Target Population; Sampling Design; Sampling Procedure; Define Measures; Questionnaire Design; Pilot Questionnaire; Finalise Questionnaire; Deal with Ethics and GDPR • Execution and Analysis: Recruit and Deliver Questionnaire; Set Deadline for Reply (if online/email); Collect Answers; Data Coding and Editing; Imputation and Adjustments; Data Analysis and Interpretation • Reporting: Research Questions; Questionnaire Design; Threats to Validity (Validity and Reliability); Results and Analysis; Discussion in Relation to RQs • Surveys are a hybrid between qualitative and quantitative studies
  • 12. Terminology • Population: the universe of units from which the sample is to be selected. The term 'units' is employed because it is not necessarily people who are being sampled: the researcher may want to sample from a universe of nations, cities, regions, firms, etc. • Sample: the segment of the population that is selected for investigation. It is a subset of the population. The method of selection may be based on a probability or a non-probability approach (next slide). • Sampling frame: the listing of all units in the population from which the sample will be selected. It is an explicit list of units; sometimes it is not possible to match it with the actual population, e.g., if the population is "all Python developers". • Representative sample: a sample that reflects the population accurately, so that it is a microcosm of the population. • Respondents: the subjects who responded to the survey
  • 14. Probability Sampling: Sampling Frame • The optimal sampling frame has the following qualities: • all units have a logical, numerical identifier • all units can be found: their contact information, map location, or other relevant information is present • the frame is organized in a logical, systematic fashion • the frame has additional information about the units that allows the use of more advanced sampling designs (e.g., age or expertise of developers, to draw stratified samples; this may be collected afterwards) • every element of the population of interest is present in the frame (this is not always possible…) • every element of the population is present only once in the frame • no elements from outside the population of interest are present in the frame • the data are up to date https://en.wikipedia.org/wiki/Sampling_frame
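A minimal sketch of how some of these qualities can be checked automatically before sampling, in Python. The frame representation (a list of dicts with the hypothetical keys `id` and `email`) is an assumption for illustration; only the presence and uniqueness of identifiers and the presence of contact information are checked, since the other qualities (coverage, absence of out-of-population units, freshness) need knowledge that is not in the frame itself.

```python
from collections import Counter


def frame_issues(frame):
    """Return human-readable problems found in a sampling frame.

    `frame` is assumed to be a list of dicts with the hypothetical keys
    "id" (unique identifier) and "email" (contact information).
    """
    issues = []
    ids = [unit.get("id") for unit in frame]

    # Every unit should have an identifier...
    missing = sum(1 for i in ids if i is None)
    if missing:
        issues.append(f"{missing} unit(s) without an identifier")

    # ...and appear only once in the frame.
    counts = Counter(i for i in ids if i is not None)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        issues.append(f"duplicate identifiers: {duplicates}")

    # Every unit should be reachable.
    unreachable = [unit.get("id") for unit in frame if not unit.get("email")]
    if unreachable:
        issues.append(f"no contact information for: {unreachable}")
    return issues


# Hypothetical frame with one missing id, one duplicate id, one missing email.
frame = [
    {"id": 1, "email": "a@example.org"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "b@example.org"},
    {"email": "c@example.org"},
]
for issue in frame_issues(frame):
    print(issue)
```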
  • 15. Terminology • Probability sample: a sample that has been selected using random selection so that each unit in the sampling frame has a known chance of being selected. • Non-probability sample: a sample that has not been selected using a random selection method. This implies that some units are more likely to be selected than others. • Sampling error: error in the findings deriving from research due to the difference between a sample and the population from which it is selected. • Non-sampling error: error in the findings deriving from research due to the differences between the population and the sample that arise either from deficiencies in the sampling approach, such as an inadequate sampling frame or non-response (see below), or from such problems as poor question wording, poor interviewing, or flawed processing of data. • Non-response: it occurs whenever some members of the sample refuse to cooperate, cannot be contacted, or for some reason cannot supply the required data
  • 16. Probability Sampling • Random sampling: select n units from the sampling frame in a random manner (e.g., use the "=RAND()" function in Excel to assign a random number to each subject, order the list by that number, and select the first n) • Stratified sampling: select s units from each identified stratum (e.g., developers vs testers) of the sampling frame. Typical for market analysis and user studies. Used for large SE studies. Purposive sampling (non-probability) was used in interviews; here, random sampling is preferred. cf. De Mello and Travassos, 2016 https://doi.org/10.1145/2961111.2962632
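The two probability designs can be sketched in Python with the standard `random` module; shuffling the list and taking the first n, as in the Excel recipe, is equivalent to `random.sample`. The function names, the `stratum_of` callback, and the example frame are illustrative assumptions, and this sketch draws the same number of units from every stratum (proportional allocation would be a different, equally valid choice).

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, per_stratum, seed=None):
    """Select `per_stratum` units at random from every stratum.

    `stratum_of` maps a unit to its stratum (e.g., its role). Raises
    ValueError if a stratum has fewer than `per_stratum` units.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):  # deterministic order of strata
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


# Hypothetical frame: (role, identifier) pairs.
frame = [("developer", i) for i in range(10)] + [("tester", i) for i in range(5)]
print(simple_random_sample(frame, 4, seed=42))
print(stratified_sample(frame, lambda unit: unit[0], 2, seed=42))
```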
  • 17. Probability Sampling: Formula • Recommended when working with probabilistic sampling designs • Sample size formula: $SS = \frac{Z^2 \, p \, (1 - p)}{c^2}$ • SS: sample size • Z: Z-value, established through a specific table (Z = 2.58 for a 99% confidence level, Z = 1.96 for a 95% confidence level) • p: expected proportion selecting a choice, expressed as a decimal; 0.5 is the conservative default, since it represents the worst case and leads to the largest SS • c: desired confidence interval (margin of error), expressed as a decimal (e.g., 0.04 for ± 4%) cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering Example - Confidence level: 95% - Confidence interval: ± 4% - If 50% of the respondents answer X, we can be 95% confident that the actual proportion in the population lies between 46% and 54%. How to compute the sample size?
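The sample size formula SS = Z²·p·(1 − p)/c² can be evaluated directly. A small Python sketch, assuming only the two Z-values quoted on the slide (other confidence levels would need their own Z-value from a normal table); the result is rounded up, since a fractional respondent is not possible.

```python
import math

# Z-values quoted on the slide for two common confidence levels.
Z_VALUES = {0.95: 1.96, 0.99: 2.58}


def sample_size(confidence_level, c, p=0.5):
    """SS = Z^2 * p * (1 - p) / c^2, for a large (effectively infinite) population.

    p = 0.5 is the conservative default: it maximises p * (1 - p).
    """
    z = Z_VALUES[confidence_level]
    return z ** 2 * p * (1 - p) / c ** 2


# Round up: rounding down would make the interval slightly wider than requested.
print(math.ceil(sample_size(0.95, 0.05)))  # 385
print(math.ceil(sample_size(0.95, 0.04)))  # 601
```

The ± 4% example on this slide thus calls for 601 respondents, while 385 corresponds to the common 95% / ± 5% setting.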
  • 18. Probability Sampling: Formula • Correction formula for a finite population of size pop: $SS_{adj} = \frac{SS}{1 + \frac{SS - 1}{pop}}$ • Examples (population, confidence level, confidence interval → sample size): 10,000, 95%, 0.01 → 4,899; 10,000, 95%, 0.05 → 370; 500, 95%, 0.01 → 475; 500, 95%, 0.05 → 217 cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering In SE, it may be convenient to increase the confidence interval, as we can tolerate some imprecision
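The finite-population correction, SS_adj = SS / (1 + (SS − 1)/pop), can be checked against the table with a short Python sketch. Note an assumption about rounding: rounding the corrected value to the nearest integer reproduces the four values in the table, whereas rounding up (the more conservative choice) would give 4,900, 370, 476, and 218.

```python
def base_sample_size(z, c, p=0.5):
    """Sample size for a large population: Z^2 * p * (1 - p) / c^2."""
    return z ** 2 * p * (1 - p) / c ** 2


def corrected_sample_size(ss, population):
    """Finite-population correction: SS / (1 + (SS - 1) / population)."""
    return ss / (1 + (ss - 1) / population)


# (population, confidence interval) pairs from the table, at 95% (Z = 1.96).
TABLE = [(10_000, 0.01), (10_000, 0.05), (500, 0.01), (500, 0.05)]

for population, c in TABLE:
    adjusted = corrected_sample_size(base_sample_size(1.96, c), population)
    print(f"population={population:>6}  c={c}  sample size={round(adjusted)}")
```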
  • 19. Probability Sampling in SE Practice • Select the population from a certain portal: • GitHub (for developers) • check the most active GitHub users here: https://gist.github.com/paulmillr/2657075; • try copying and pasting this into your browser: https://api.github.com/search/users?q=followers:100+sort:followers&per_page=100 (the GitHub API can help you identify users) • Check the GHTorrent project: https://ghtorrent.org • LinkedIn (for other types of professionals; you need to join groups and contact people personally, or create polls in groups) • Consider that typically only about 10% of the contacted subjects respond (about 20% on GitHub), so, to gather enough data, contact as many people as is possible and reasonable
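As an illustration of the API route, the following Python sketch builds a user-search URL for the GitHub REST API and extracts logins from a response body. The qualifier `followers:>100` (more than 100 followers), the helper names, and the sample response are assumptions for illustration; the sketch does not perform the HTTP request itself, which could be done with `urllib.request.urlopen`. Keep in mind that the Search API returns at most 1,000 results per query, so larger frames have to be assembled from several narrower queries.

```python
import json
from urllib.parse import urlencode

SEARCH_USERS = "https://api.github.com/search/users"


def build_user_search_url(min_followers, page=1, per_page=100):
    """Build a GitHub user-search URL, sorted by number of followers."""
    params = {
        "q": f"followers:>{min_followers}",  # assumed qualifier: more than N followers
        "sort": "followers",
        "per_page": per_page,
        "page": page,
    }
    return f"{SEARCH_USERS}?{urlencode(params)}"


def logins_from_response(body):
    """Extract the user logins from the JSON body of a search response."""
    return [item["login"] for item in json.loads(body)["items"]]


# Hypothetical, heavily trimmed response body used for illustration.
SAMPLE_BODY = '{"total_count": 2, "items": [{"login": "alice"}, {"login": "bob"}]}'

print(build_user_search_url(100))
print(logins_from_response(SAMPLE_BODY))  # ['alice', 'bob']
```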
  • 20. Probability Sampling in SE Practice • My population is the world of developers. • …Well, open-source developers… Well, open-source developers using GitHub. • My sampling frame is the open-source developers on GitHub: I can identify their email addresses and contact them. • I have identified that there are 44,735,158 users on GitHub. I can't send a questionnaire to all of them. • I decide to select a sample of the most active users, as I think they represent my population better: HOW MANY? • Go to: https://www.surveymonkey.com/mp/sample-size-calculator/ cf. Blincoe et al. http://kblincoe.github.io/publications/2015_IST_Blincoe.pdf
  • 21. Probability Sampling in SE Practice • Since normally just 10% of the people respond, I need to contact at least 385 × 10 people if I want a representative sample, i.e., about 4,000 emails. • In the end, I get answers from 800 people (20%), not too bad. This is my actual sample: 800 instead of 44,000,000. I can say that it is representative, as it is clearly above 385. • In fact, with 800 answers I can even narrow my confidence interval to ± 4%
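The arithmetic on this slide can be sketched in Python. The function names are illustrative; the margin of error is computed as c = Z·√(p(1 − p)/n) with Z = 1.96 and the conservative p = 0.5, which assumes the respondents behave like a random sample, something self-selected non-response may not guarantee.

```python
import math


def invitations_needed(target_responses, response_rate_percent):
    """Invitations to send so that the expected number of answers reaches the target."""
    # Ceiling division on integers avoids floating-point surprises.
    return -(-target_responses * 100 // response_rate_percent)


def margin_of_error(n, z=1.96, p=0.5):
    """Half-width c of the confidence interval for a proportion, given n answers."""
    return z * math.sqrt(p * (1 - p) / n)


print(invitations_needed(385, 10))     # 3850
print(round(margin_of_error(800), 3))  # 0.035
```

With 800 answers the margin is about ± 3.5%, consistent with the claim that a ± 4% confidence interval is now achievable.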
  • 22. Non-probability Sampling: Convenience Sampling • In SE research, non-probability samples are also typical • Respondents normally need specific expertise (e.g., developers but also domain experts), and it may not be straightforward to collect a sufficiently large sample, unless you work with GitHub or other networks. • If you are sampling in a specific company (e.g., to run a survey in a multinational company, in which the unit of analysis is the employee), it is unlikely that you have access to the list of all employees • If you are sampling the companies in a certain area (e.g., to run a survey on startups in Italy or in Tuscany, where the unit of analysis is the company), it is again unlikely that you have access to the list of startups in the area • Convenience sampling is often adopted: I gather information from all the people that I can contact through my social and professional links; I collect relevant demographic information (e.g., age, number of years at company X, role, number of years in a certain role) together with the responses; I check to what extent the demographic information is related to the responses • Often, surveys are performed at specific software engineering conferences, and may not reflect reality: only companies interested in research may participate, and some sectors may not be covered at all • It is more difficult to run online surveys across different companies; an example will be given at the end of the presentation • In these cases, you have to rely on your personal contacts with companies, and on the contacts that your colleagues (other academics in other areas) have with other companies; still, some companies will never be reached • Little, biased information is better than NO information at all, if the context is clearly explained
  • 24. What to Ask? Depends on the Unit of Analysis • Individuals: experience in the research context, experience in SE, current professional role, location and highest academic degree, ... 
 • Project teams: team size, client/product domain (avionics, finance, health, telecommunications, etc.) and physical distribution, ... 
 • Organisations: size, industry segment, location, type (government, private company, university, etc.), ... 
 • Demographic information
  • 25. What to Ask? Depends on your Research Questions • RQ1: Which are the most frequent requirements defects? • RQ2: Which requirements defects are more difficult to identify? • … • Question: How frequently do you encounter these types of requirements defects (Never, Seldom, Sometimes, Often, Very Often): ambiguity, incompleteness, grammar error, etc. • Question: How difficult is it to identify these types of defects (Very Difficult, Moderately Difficult, Neither Easy Nor Difficult, Moderately Easy, Very Easy): ambiguity, incompleteness, grammar error, etc. To identify the types of defects, and the answer choices in general, I need to refer to the literature or to experts in the field
  • 26. What to Ask? Organise Focus Groups and Interviews • Sometimes it is useful to organise a focus group to identify the relevant questions (or a draft of them; you will need more time to revise the formulation…) • Gather participants with different viewpoints, give them 5-10 minutes to write a set of relevant questions on a piece of paper, ask them to read them out, and brainstorm on the proposals • Sometimes you can refer to the literature to identify your options (e.g., the phases of a certain software process), or to experts' opinions • If you are dealing with a somewhat unknown public (e.g., in a specific domain), it may be useful to first interview people to identify terminology and relevant questions, and then create the questionnaire
  • 27. What to Ask? Types of Questions • Personal factual questions: What is your role in the organisation? How many years of experience do you have in your current role? • Factual questions about others: How old, on average, are the developers in your company? • Informant factual questions: Does your company employ external suppliers? • Questions about attitudes: My job is typically interesting [Disagree…Agree] (judgments) • Questions about beliefs: Incorrect requirements tend to result in code errors [Never … Always] (attitudes and beliefs are different: use different Likert scales!) • Questions about normative standards and values: Is it considered appropriate to dress casually in your office? • Questions about knowledge: What is the most common cause of software project failure according to research? (rare; used to check whether the person is informed)
  • 28. Qualities of a Questionnaire • Clarity: Will respondents understand the questions? The researchers may find that certain ambiguities confuse respondents. Are the response choices sufficiently clear to elicit the desired information? • Comprehensiveness: Are the questions and response choices sufficiently comprehensive to cover a reasonably complete range of alternatives? The researchers may find that certain questions are irrelevant, incomplete, or redundant, and that the stated questions do not generate all of the important information required for the study. • Acceptability: Potential problems such as excessive questionnaire length, questions that are perceived to invade the privacy of the respondents, or questions that may violate ethical or moral standards must be identified and addressed by the researchers.
  • 29. Structure of the Questionnaire • Introductory questions: easy to answer, demographic, NOT sensitive • Sensitive/personal questions: only if needed, and only late in the questionnaire, after the (virtual) rapport is established • Related questions: group by topic • Logical sequence: topics shall be logically connected • Filter/Screening questions: questions to qualify or disqualify respondents (to make them eligible to respond to other questions, or to evaluate their confidence) • Nested structures: try to avoid large blocks that are answered only by certain participants; they are very hard to analyse and compare afterwards • Reliability checks: present reformulated versions of the questions whose accurate answer is particularly relevant (Do you like writing code? When thinking about writing code you feel…)
  • 30. Types of Questions • Open-ended Questions: the respondent can write free text (long or short) • Close-ended Questions: set of alternatives; multiple choice (with minimum and maximum choices), exclusive choices, Likert Scale.
  • 31. Open-ended vs Close-ended (rating for open-ended / close-ended questions) • Allow usage of personal words: 🙂 / ☹ • Unusual answers can be identified: 🙂 / 😐 • Typically not leading: 🙂 / 😐 • Useful to explore new areas: 🙂 / ☹ • Time effective: ☹ / 🙂 • Answers need to be coded: ☹ / 🙂 • Clear answers: ☹ / 🙂 • Easy to process: ☹ / 🙂 • Compatible answers: ☹ / 🙂 • Answers clarify questions: ☹ / 🙂 • Spontaneous answers: 🙂 / ☹ • Exhaustive answers: 🙂 / ☹ • Different perception of scales: 🙂 / ☹
  • 32. Formulating Questions: Tips • Given a question, how would YOU answer it? • Given a question, test it with peers (for the initial draft) • Pilot the set of questions with a group of respondents from whom you can get feedback (e.g., colleagues, subjects from the company) • Remember that you may not know the terminology typically used by your respondents, so you may have to perform preliminary unstructured interviews to understand it
  • 33. PILOT, PILOT, PILOT
  • 34. Formulating Questions: Tips • Avoid vague/ambiguous questions and answers: • Vague: How often does your group have meetings? [Often…Never] • Better: How frequently does your group have meetings? [Once a day, Once per week, …] • Avoid double negatives: Do you consider not appropriate to avoid testing? • Avoid long questions: Which types of defects are typically encountered by developers whose relevance is normally difficult to communicate to managers? • Avoid general questions: What is the general, physical, intellectual, and moral condition of men and women employed in your group? • Avoid double-barrelled questions: How satisfied are you with the space and the colleagues? • Avoid questions that presume an answer: What testing environment do you normally use? (there could be no testing environment in use) • Avoid technical terms: What is the Six-sigma Maturity Level of your process? • Prefer forced-choice answers (YES or NO for each choice) to "all that apply"
  • 35. What Types of Responses? (Questionnaire Design) • Free-text: open questions; allow coding and content analysis; high effort on data analysis • Numeric values: open questions; allow a wide range of statistical analyses • Interval scale: closed questions; intervals not necessarily equally distributed; significantly restricts statistical analysis • Ordinal/Likert scale: closed questions; intervals are considered equally distributed; statistical analysis is less restrictive than with an interval scale • Nominal: closed questions; statistical analysis based on frequency cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
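As an example of the frequency-based analysis suited to nominal answers, combined with a conservative ordinal treatment of Likert answers, here is a Python sketch for the four-point agreement scale (Strongly Agree, Agree, Disagree, Strongly Disagree). The numeric codes assigned to the labels are an assumption; `median_low` is used so that the summary is always an actual scale point.

```python
from collections import Counter
from statistics import median_low

# Assumed ordinal coding of the four-point agreement scale.
SCALE = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]
CODE = {label: rank for rank, label in enumerate(SCALE, start=1)}


def summarise_likert(answers):
    """Return the frequency of each scale label and the (low) median label."""
    counts = Counter(answers)
    frequencies = {label: counts.get(label, 0) for label in SCALE}
    # median_low returns an actual scale point, never a value halfway between two.
    median_label = SCALE[median_low([CODE[a] for a in answers]) - 1]
    return frequencies, median_label


answers = ["Agree", "Agree", "Disagree", "Strongly Agree", "Agree"]
frequencies, median_label = summarise_likert(answers)
print(frequencies)
print(median_label)  # Agree
```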
  • 36. Response Formats: Examples (Questionnaire Design) • How much experience do you have in Java programming? a) Very high experience b) High experience c) Little experience d) Very little experience • How much experience do you have in Java programming? a) Less than one year b) 1 year to 3 years c) 3 years to 5 years d) More than 5 years • How much experience do you have in Java programming? __5__ years • How much experience do you have in Java programming? I have been working with Java programming at companies since 2011. Before that, I got my first Java certification in 2009, when I started working on personal projects. But I have difficulties with object-oriented parts…_________ • Do you have experience in Java programming? ( ) Yes ( ) No cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
  • 37. Tip: Standardised Answers • When possible, use statements and standardised Likert-scale answers indicating agreement (more answers can be gathered): • Strongly Agree, Agree, Disagree, Strongly Disagree
  • 38. Not Just Questions… • The questionnaire must be accompanied by various administrative information including: • An explanation of the purpose of the study. • A description of who is sponsoring the study (and perhaps why). • A cover letter using letterhead paper, dated to be consistent with the mail shot • Provide a contact name and phone number. Personalize the salutation if possible. • An explanation of how the respondents were chosen and why. • An explanation of how to return the questionnaire. • A realistic estimate of the time required to complete the questionnaire. Note that an unrealistic estimate will be counter-productive. And privacy issues (later)
  • 39. Tips for a Successful Survey cf. Torchiano et al. https://www.slideshare.net/mendezfe/surveys-in-software-engineering
  • 40. Recruiting • Send individual but standard invitation messages • You can expect most of the individual messages you send to be read • Avoid "spreading sprees": mailing lists, forum invitation messages, crowdsourcing tools (such as Amazon Mechanical Turk) • You will have little or no control over who reads the invitation. So, who was effectively recruited? • Never allow forwarding (which is different from snowballing)! It would violate the sample • Send an individual questionnaire token to each subject • Establish a finite, short period to answer the survey (one to two weeks) • Offer rewards (raffles, donations, payments, sharing results)
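One way to implement individual tokens, sketched in Python with the standard `secrets` module. The survey URL and the `token` query parameter are hypothetical; the idea is that each invitation carries a hard-to-guess token, so that each token can be accepted only once and every response can be traced back to an invited subject. A token linked to an email address makes responses identifiable, which matters for the GDPR.

```python
import secrets


def assign_tokens(emails):
    """Give every invited subject an individual, hard-to-guess survey token."""
    tokens = {}
    used = set()
    for email in emails:
        token = secrets.token_urlsafe(16)
        while token in used:  # collisions are astronomically unlikely, but cheap to rule out
            token = secrets.token_urlsafe(16)
        used.add(token)
        tokens[email] = token
    return tokens


def build_link(base_url, token):
    """Personal invitation link; `base_url` is a hypothetical survey address."""
    return f"{base_url}?token={token}"


tokens = assign_tokens(["ada@example.org", "linus@example.org"])
for email, token in tokens.items():
    print(email, build_link("https://survey.example.org/s", token))
```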
  • 41. Reminding • Reminders should be used with care. • Avoid reminding who already had participated • Avoid reminding more than once • The invitation message should clearly characterize the involved researchers, the research context and present the recruitment parameters • Include in the invitation message a compliment and an observation regarding the relevance of subject participation
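The first two rules can be enforced mechanically. A minimal Python sketch, assuming subjects are identified by hypothetical IDs (e.g., their survey tokens) and that the researcher keeps the lists of respondents and of already-reminded subjects:

```python
def reminder_recipients(invited, responded, already_reminded):
    """Subjects to remind: invited, not yet responded, and never reminded before."""
    responded = set(responded)
    already_reminded = set(already_reminded)
    return [s for s in invited if s not in responded and s not in already_reminded]


print(reminder_recipients(["s1", "s2", "s3", "s4"], responded=["s2"], already_reminded=["s4"]))
# ['s1', 's3']
```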
  • 42. Piloting • Pilot the population and sampling activities • Use a (smaller) sample of the sampling frame, reproducing all planned steps; this will allow you to check the adequacy of the sampling frame for your survey. • Pilot the questionnaire • Is it clear and unambiguous? Did you miss some questions? • Is it too long or too short? • Pilot the recruitment • Is it working effectively? • Pilot the data analysis • Have you planned the proper data analysis techniques? What data quantity and quality are necessary?
  • 43. Privacy Policy and General Data Protection Regulation (GDPR) cf. https://www.slideshare.net/alanmcsweeney/gdpr-context-principles-implementation-operation-impact-on- outsourcing-data-governance-and-data-ethics
  • 44. General Data Protection Regulation • General Data Protection Regulation (GDPR) applies to any task dealing with personal data (not just research surveys) • Personal Data: means any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person • If you distribute your surveys anonymously and you do not process personal data, you can disregard the GDPR. But, be careful, the GDPR has an extremely broad view of what personal data is (basically, most demographic data are personal)!
 • If you use contacts or ask for an email address, name or any other personal data in your surveys, then make sure to read the GDPR, as it imposes a number of responsibilities on you. Any individual who can be distinguished from others is considered identifiable. If you want to ensure that one person answers one form only, you have to identify them!
  • 45. General Data Protection Regulation • If you are creating forms or surveys for a business based in the European Union (EU), or if you collect and process the personal data of EU citizens, the General Data Protection Regulation (GDPR) affects you. • The GDPR basically says that: • you must obtain freely given, specific, informed, and unambiguous consent from your respondents when you collect their personal data. In other words, you shall not force people to respond to or fill out your surveys or forms, or trick them into giving you their personal data. • Additionally, you must explain how you plan to use their personal data, in a clear and easy-to-understand way. • Also, as individuals have the right to be forgotten, you must delete the information that you have collected from them if they request it.
  • 46. Privacy Policy: Content (1) • What you collect and how • In your text, explain what type of personal data you are collecting and how. Is it respondents' email, name, or IP address? Is it simply by asking them questions, 
 or are you collecting data automatically (for example their geo-location or IP address)? • Why you collect • Your privacy policy text must clarify your reasons for collecting personal data. Explain for instance why you need their email. 
 Do you have good reasons for collecting their name or address? • How will you use their data • Are you going to share it with third parties? In that case, say who these 3rd parties are and why you need to share their data with them. 
 If you ask for their contact info, for instance, are you going to use it to contact them or to send them something? • How long will you keep their data • The GDPR requires you to define a so-called “data retention” period when you collect personal data. Thus, your privacy policy text should explain how long you will retain the data.
  • 47. Privacy Policy: Content (2) • How secure is the data in your possession • Your privacy policy must also explain what security measures are applied when you collect, export, share, and store the personal data of your respondents: which tools you are using, and whether your data processors also take the security of the data seriously. • Clarify your respondents' rights • The GDPR clearly defines individuals' rights over their own data. You must reflect these rights in your privacy policy text and inform your respondents about them: • Right to access, view, and edit their own information in a timely manner • Right to be forgotten, which means being deleted from your survey results • Right to opt out from your future messages (e.g., if you use their data to send them ads or marketing messages) • Keep in mind that data is owned by the respondents, not by you or your company or organization. • Who to contact • Many organizations that collect personal data must designate a Data Protection Officer (DPO), e.g., public bodies and organizations whose core activities involve large-scale monitoring of individuals or large-scale processing of sensitive data. The DPO is a person in the organization who can represent it with respect to data and privacy issues. Including the DPO's contact information in your privacy policy helps respondents who need to ask questions or exercise their rights.
  • 48. Example: Privacy Notice. What to write in your survey entry page (with a link to the policy): • Why and How: "We want to understand the typical problems of SE students. For this, we need your contribution to this survey. The survey takes 5 to 10 minutes to complete." • Transparency: "Together with your opinion, we will also ask for personal data, such as your email address, so that we can ask you follow-up questions." • Data Retention: "We securely store this data until the end of 2020." • Share or Sale of Data: "We respect your privacy and therefore we will not share your data with any third party." • Link to Policy: "By filling in this form, you agree that we will process your data according to our privacy policy." • Contact Person: "If you have any questions regarding your data, contact our data protection officer: Mr. John Doe, j.doe@survey.com"
  • 49. Threats to Validity in Survey Research
  • 50. Reliability and Validity • Reliability and Validity are the two main criteria used in survey research to evaluate threats to validity • Reliability is concerned with how well we can reproduce the survey data, as well as the extent of measurement error. That is, a survey is reliable if we get the same kinds and distribution of answers when we administer the survey to two similar groups of respondents. • Validity is concerned with how well the instrument measures what it is supposed to measure. Focus groups and pilot tests shall be performed to ensure reliability and validity
  • 51. Reliability Types • Test-retest (intra-observer) reliability: how likely is it that a person responds in the same way if surveyed twice? • How to ensure: during the pilot, survey the same people twice; if the correlation is greater than 0.7, reliability is good. For some questions, include alternate forms, and ensure that Cronbach's alpha is greater than 0.7 • Inter-rater reliability: to what extent do different observers give similar answers when they assess the same situation? (not so common) • How to ensure: use two pilots with different samples, and check the correlation between the distributions of answers • Inter-coder reliability: (in the case of open questions) how reliable is the coding procedure? • How to ensure: use two coders, jointly select a master code list, and apply the master codes to the data; check agreement with Krippendorff's alpha
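The two coefficients mentioned for test-retest reliability and alternate forms can be computed with a few lines of Python. This is a sketch from the textbook definitions: Pearson's r, and Cronbach's alpha = k/(k − 1) · (1 − Σ item variances / variance of total scores). The data layout (one list of scores per item, aligned on the same respondents) and the example scores are assumptions. Python 3.10+ also offers `statistics.correlation` for Pearson's r.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per question,
    all aligned on the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation, e.g., between test and retest answers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical pilot: 3 alternate-form items answered by 4 respondents.
items = [[2, 3, 4, 5], [3, 3, 5, 5], [2, 4, 4, 5]]
print(round(cronbach_alpha(items), 3))  # 0.939
print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 3))
```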
  • 52. Validity Types • Content validity: how appropriate does the instrument seem to a group of reviewers (i.e., a focus group) with knowledge of the subject matter? • How to ensure: perform a focus group. • Construct validity: to what extent are the constructs related to the measured variables? • How to ensure: provide sound arguments that show the relationship between constructs and questions. • Other types of validity should be considered when the survey is repeated. Barbara A. Kitchenham and Shari L. Pfleeger, 2008, https://doi.org/10.1007/978-1-84800-044-5_3
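One way to keep the construct-validity argument explicit is to record, for each question, which construct it measures and why. The sketch below is a hypothetical bookkeeping aid, not a validity test: it only flags questions that point to an unknown construct or lack a rationale, and constructs that no question measures. The construct names, question IDs, and the function `check_traceability` are illustrative assumptions.

```python
def check_traceability(constructs, question_map):
    """constructs: set of construct names.
    question_map: dict question_id -> (construct, rationale)."""
    problems = []
    covered = set()
    for qid, (construct, rationale) in question_map.items():
        if construct not in constructs:
            problems.append(f"{qid}: unknown construct '{construct}'")
        else:
            covered.add(construct)
        if not rationale.strip():
            problems.append(f"{qid}: missing rationale")
    # Constructs that no question measures cannot be evaluated at all.
    for c in sorted(constructs - covered):
        problems.append(f"construct '{c}' is not measured by any question")
    return problems

# Hypothetical instrument draft.
constructs = {"usability", "developer habits"}
question_map = {
    "Q1": ("usability", "asks how easy the tool is to use for a typical task"),
    "Q2": ("usability", ""),
}
for issue in check_traceability(constructs, question_map):
    print(issue)
# prints:
# Q2: missing rationale
# construct 'developer habits' is not measured by any question
```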
  • 53. Example Survey in SE: Napire (Naming the Pain in Requirements Engineering): Contemporary Problems, Causes, and Effects in Practice. cf. http://re-survey.org/#/explore; cf. Méndez Fernández et al., https://arxiv.org/pdf/1611.10288.pdf
  • 54. Napire
Research Questions
3.1 Research Questions. Our objective is to get a better understanding of which problems practitioners encounter in RE, and how those problems relate to the overall project setting (causes and problems). To this end, we formulate three research questions, shown in Table 2, to steer the design of our study.
Table 2 Research questions.
RQ 1: Which contemporary problems exist in RE?
RQ 2: What are observable patterns of problems and context characteristics?
RQ 3: What are their perceived causes and effects?
The first question aims at understanding which problems practitioners experience in general in their RE and what their criticality is w.r.t. project failure. This more descriptive view is complemented by the second research question, which aims at understanding whether there exist problems that relate to specific context factors, such as the company size or the type of used process model. Once we understand whether there exist specific patterns in the problems, we want to know what their perceived causes and implications are, going beyond project failure.
3.2 Instrument. The overall instrument used in NaPiRE constitutes in total 35 questions used to collect data on (a) the demographics, (b) how practitioners elicit and document requirements, (c) how requirements are changed and aligned with tests, (d) what and how RE standards are applied and tailored, (e) how RE is improved, and finally (f) what problems practitioners experience in their RE. In the study at hand, we focus on the problems practitioners experience in their RE while using …
Questions (Example)
Table 3 Questions (simplified and condensed excerpt). Format: No.: Question (Type).
Demographics
Q 1: What is the size of your company? (Closed(SC))
Q 2: Please describe the main business area and application domain. (Open)
Q 3: Does your company participate in globally distributed projects? (Closed(SC))
Q 4: In which country are you personally located? (Open)
Q 5: To which project role are you most frequently assigned? (Closed(SC))
Q 6: How do you rate your experience in this role? (Closed(SC))
Q 7: Which organisational role does your company take most frequently in your projects? (Closed(MC))
Q 8: Which process model do you follow (or a variation of it)? (Closed(MC))
Status Quo
Q 9: How do you elicit requirements? (Closed(MC))
Q 10: How do you document functional requirements? (Closed(SC))
Q 11: How do you document non-functional requirements? (Closed(SC))
Q 12: How do you deal with changing requirements after the initial release? (Closed(SC))
...
Q 16: What requirements engineering company standard have you established at your company? (Closed(MC))
...
Problems
Q 28: Considering your personal experiences, how do the following (more general) problems in requirements engineering apply to your projects? (Likert)
Q 29: Considering your personally experienced problems (stated in the previous question), which ones would you classify as the five most critical ones (ordered by their relevance)? (Closed)
Q 30: Considering your personally experienced most critical problems (selected in the previous question), which causes do they have? (Open)
Q 31: Considering your personally experienced most critical problems (selected in the previous question), which implications do they have? (Open)
Q 32: Considering your personally experienced most critical problems (selected in the previous question), which mitigations do you define (if at all)? (Open)
Q 33: Considering your personally experienced most critical … (Closed(MC))
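The question types in Table 3 can be mirrored in a small data structure, for example to validate answers before analysis. This is a hypothetical sketch, not the NaPiRE tooling: it assumes SC and MC denote single choice and multiple choice, treats Q 28 as a single 1-to-5 Likert item (the real question rates several problems), and invents the answer options for Q 1 and Q 8.

```python
from dataclasses import dataclass, field
from enum import Enum

class QType(Enum):
    SINGLE_CHOICE = "Closed(SC)"    # exactly one of the options
    MULTIPLE_CHOICE = "Closed(MC)"  # one or more of the options
    OPEN = "Open"                   # free text
    LIKERT = "Likert"               # integer from 1 to scale

@dataclass
class Question:
    qid: str
    text: str
    qtype: QType
    options: list = field(default_factory=list)  # only used by closed questions
    scale: int = 5  # hypothetical number of Likert points

def is_valid(question, answer):
    """Return True if the answer is well-formed for the question's type."""
    if question.qtype is QType.SINGLE_CHOICE:
        return answer in question.options
    if question.qtype is QType.MULTIPLE_CHOICE:
        return (isinstance(answer, (list, tuple, set)) and len(answer) > 0
                and all(a in question.options for a in answer))
    if question.qtype is QType.LIKERT:
        # bool is a subclass of int, so exclude it explicitly.
        return (isinstance(answer, int) and not isinstance(answer, bool)
                and 1 <= answer <= question.scale)
    return isinstance(answer, str)  # OPEN: any free text

# Hypothetical options; the real instrument defines its own.
q1 = Question("Q 1", "What is the size of your company?",
              QType.SINGLE_CHOICE, ["small", "medium", "large"])
q8 = Question("Q 8", "Which process model do you follow (or a variation of it)?",
              QType.MULTIPLE_CHOICE, ["Scrum", "Waterfall", "V-Model"])
q28 = Question("Q 28", "How does this problem apply to your projects?", QType.LIKERT)
q30 = Question("Q 30", "Which causes do they have?", QType.OPEN)

print(is_valid(q1, "medium"), is_valid(q8, []), is_valid(q28, 6))
# prints "True False False"
```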
  • 55. Results To analyse the influence of the most cited causes on the most cited problems and, in turn, of those problems on project failure (as reported by the survey respondents), we visualise the relationships via an alluvial diagram. This diagram is shown in Figure 3. The decision to relate only the most cited causes to the most cited RE problems was taken to enhance the visualisation.
[Fig. 3 Relation of top 10 causes, top 10 problems, and the project impact (alluvial diagram; slide callout: causes vs problems).]
Causes: Communication flaws between project team and the customer; Customer does not know what he wants; Lack of a well-defined RE process; Lack of experience of RE team members; Lack of time; Missing direct communication to customer; Requirements remain too abstract; Too high team distribution; Unclear roles and responsibilities at customer side; Weak qualification of RE team members.
Problems: Communication flaws between project team and the customer; Communication flaws within the project team; Incomplete and / or hidden requirements; Inconsistent requirements; Insufficient support by customer; Moving targets (changing goals, business processes and / or requirements); Stakeholders with difficulties in separating requirements from previously known solution designs; Time boxing / Not enough time in general; Underspecified requirements that are too abstract and allow for various interpretations; Weak access to customer needs and / or (internal) business information.
Project impact: Project Completed; Project Failed.
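The flows behind an alluvial diagram like Fig. 3 are simply counts of cause→problem and problem→outcome links, restricted to the most cited items. The sketch below assumes a simplified, hypothetical data model in which each response maps each critical problem to its already-coded causes (in the spirit of Q 29 and Q 30) and records the project outcome; the responses and the function `alluvial_flows` are illustrative, not the study's actual data or analysis scripts. `Counter.most_common` resolves ties at the cut-off by first occurrence, which is arbitrary.

```python
from collections import Counter

INCOMPLETE = "Incomplete and / or hidden requirements"
MOVING = "Moving targets (changing goals, business processes and / or requirements)"
INCONSISTENT = "Inconsistent requirements"
CUSTOMER = "Customer does not know what he wants"

# Hypothetical responses: critical problem -> coded causes, plus the outcome.
responses = [
    {"outcome": "Project Failed",
     "problems": {INCOMPLETE: ["Lack of time", CUSTOMER], MOVING: [CUSTOMER]}},
    {"outcome": "Project Completed",
     "problems": {INCOMPLETE: ["Lack of time"], MOVING: [CUSTOMER]}},
    {"outcome": "Project Failed",
     "problems": {INCONSISTENT: ["Lack of a well-defined RE process"]}},
]

def alluvial_flows(responses, k=10):
    """Count cause->problem and problem->outcome flows among the top-k items."""
    problem_counts, cause_counts = Counter(), Counter()
    for r in responses:
        for problem, causes in r["problems"].items():
            problem_counts[problem] += 1
            cause_counts.update(set(causes))
    top_problems = {p for p, _ in problem_counts.most_common(k)}
    top_causes = {c for c, _ in cause_counts.most_common(k)}
    cause_to_problem, problem_to_outcome = Counter(), Counter()
    for r in responses:
        for problem, causes in r["problems"].items():
            if problem not in top_problems:
                continue
            problem_to_outcome[(problem, r["outcome"])] += 1
            for cause in set(causes) & top_causes:
                cause_to_problem[(cause, problem)] += 1
    return cause_to_problem, problem_to_outcome

cause_to_problem, problem_to_outcome = alluvial_flows(responses, k=2)
for (cause, problem), n in sorted(cause_to_problem.items()):
    print(f"{n} x {cause} -> {problem}")
```

The two counters can then be passed to any plotting library that draws alluvial or Sankey diagrams, with each pair as a link and each count as its width.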
  • 56. Summary • Surveys are a hybrid method between qualitative and quantitative research • Sampling is crucial to have good data • Piloting is crucial (you have only one shot) • Clarity of questions and time to answer are key • Don't forget about privacy issues