Scientific software engineering methods and their validity
1. Technische Universität München
Philosophy of Science
Scientific Methods and their Validity
Dr. Daniel Méndez Fernández
Prof. Dr. Manfred Broy
Technische Universität München
Institute for Informatics
Software & Systems Engineering
Dr. Antonio Vetrò
2. Goals of the talk
§ Get (back) to a bigger picture
– Start from a general point of view in the philosophy of science
– Drill down to implications for everyday scientific work (projects, publications, PhD thesis, …)
§ Discuss …
– how the presented methods fit into that picture
– the methods in context of a PhD dissertation
– the notion of validity and how to increase it
5. What is science?
Science: Systematically and objectively gaining (and preserving), documenting,
and disseminating knowledge
§ In principle, science tries to be objective by striving for knowledge based on “facts”
(independent of subjective judgment!)
However:
§ Accepting scientific results is a social process (documentation, communication,
following rules).
§ Some elements of science (mathematics, logic) seem to be unbiased – but
nevertheless rely on acceptance by peers and on the ability to apply the
theories.
§ One could also say: “In the end, it is also a matter of beliefs, capability, and
individual and social judgment”
(following some basic principles, rules, and codes)
7. Philosophy and science
Ontology (“Seinslehre”)
§ Ontological questions (“Außenweltproblem”): Is there a world independent of subjectivity?
§ Positions: Idealism, Realism, Solipsism
Epistemology (“Erkenntnislehre”)
§ Epistemological questions (“Erkenntnisproblem”): Where do discoveries come from? From experience?
§ Positions: Rationalism, Empiricism, Scepticism
Ethics (“Verhaltenslehre”)
§ Ethical questions (“Verhaltensproblem”): Where does ethics come from? Is there such a thing as universal ethics?
§ Positions: Normative Ethics, Descriptive Ethics, Everyday Ethics
From: Orkunoglu, 2010
8. What is science?
§ Aristotle (384-322 BC)
– Search for truth
– Search for laws and reasoning for phenomena
– Understanding the nature of phenomena
§ Francis Bacon (1561-1626)
– Progress of knowledge of nature (reality)
– Draw benefits from growing knowledge
§ Era of (French) Enlightenment (Voltaire (1694-1778), Diderot (1713-1784))
– Emancipation from god and beliefs
§ Kant (1724-1804)
– System of Epistemology
§ Constructivism (H. von Foerster (1911-2002), N. Luhmann (1927-1998))
– Subjective construction
From: Orkunoglu, 2010
9. What is science?
Science
§ Theory
– Formal theories
– Deduction
– Models
– Predictions
– Explanations
– …
§ Empiricism
– Observations
– Experiments
– Facts
– …
§ Communication
– Intersubjective evaluations
– Agreement
– …
3 objectives of science:
§ Analyse and explain
§ Predict
§ Design
Engineering approach: developing tools and techniques to solve practical problems by means of existing technology and available knowledge. Is this science?
Adapted From: Orkunoglu, 2010
10. What is the notion of “Truth”?
§ We speak about truth if no subjective interpretation or distortion is possible
§ We could also say: “Whenever I apply my treatment to a certain population, it will
always lead to the same observation”
§ If we have “universal truth”, we can call our results “generalisable” (“externally valid”)
Challenges: Obtaining truth
§ Can we obtain something like “universal truth”?
§ Can we do so in a lifetime? Or even within a PhD?
§ What if my observations/interpretations/analyses depend on human factors?
→ Things can be true for certain contexts only!
Image: Sjøberg, 2011
11. A major challenge: Human factors
Why are human factors important to our field?
§ Software Engineering is an engineering discipline applied by human beings.
§ The value of solutions to practical problems often depends on the people who apply
them.
What implications can we draw from that?
§ The notion of truth is “threatened” by subjectivity.
→ The good: We can make use of that subjectivity
(e.g. “expert opinion”)
→ The bad: We need to be aware of the implications
(e.g. the threats to external validity)
→ The ugly: When relying on subjects, we will never obtain full external validity
… One could also say: “Outside mathematics, there is no certainty.”
12. Truth in science is relative!
The different views onto science
§ Science is created by humans
– sociology of science
– psychology of science (or scientists)
– economy of science
§ Science as knowledge creation (discovery)
– theory of knowledge
– knowledge and insight
– understanding and explanation
§ Science as a means to change the world – creative science
– science and power
– science and technology
– design
14. Big Picture… 1st layer
Layer: Examples
§ Philosophy of science: Epistemology (“Erkenntnistheorie”)
§ Principal ways of working: Empirical methods, Theories
§ Methods and Tools: Case studies, Hypothesis testing
§ Fundamental Theories: Statistics, Logic
15. In Software Engineering, we rely on every layer!
Layers: Philosophy of science, Principal ways of working, Methods and Tools, Fundamental Theories
Setting of Empirical Software Engineering:
§ Methods and tools
§ Support theory building and evaluation
§ Analogy: theoretical and experimental physics
16. What do we usually need (e.g. in a PhD)?
Philosophy of science
Principal ways of working
You are (usually) here
Methods and Tools
Fundamental Theories
17. Big Picture… 2nd layer
[Figure: research cycle linking a theory / system of theories, (tentative) hypotheses, and observations / evaluations of a study population, via theory building, deduction, induction, falsification / support, and pattern building]
Further reading: Runeson et al.
Case Study Research in Software Engineering: Guidelines and Examples
18. Big Picture… 3rd layer: Methods and Tools
§ Each method I can apply…
– has a specific purpose
– relies on a specific data type
Purposes
§ Exploratory (example: Grounded Theory)
§ Descriptive
§ Explanatory
§ Improving
Data Types
§ Qualitative
§ Quantitative
[Figure: study population, qualitative data, and (tentative) hypotheses that are descriptive, exploratory, or explanatory]
19. Big Picture… 3rd layer: Methods and Tools
[Figure: research cycle (theory / system of theories, (tentative) hypotheses, observations / evaluations of a study population, linked by theory building, falsification / support, and pattern building) annotated with methods]
§ Exploratory
– Case / field studies
– Data analysis
– Survey and interview research
– Ethnographic studies
– Folklore gathering
§ Confirmatory
– Case / field studies
– Experiments
– Simulations
§ Further methods shown: Grounded theory, Formal / conceptual analysis
Further reading: Runeson et al.
Case Study Research in Software Engineering: Guidelines and Examples
Note: For now, prototyping is not part of this “method view” (and neither are reference models).
20. How much external validity can I expect from applying the methods we usually apply?
[Figure: methods ordered along a level-of-evidence axis, from the real environment to an artificial environment: Survey Research, Action Research, Field Study Research, Case Study Research, (Lab) Experiment, Simulation]
Note: You shall only get a feeling, please don’t sue us.
21. We distinguish different levels of evidence
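Evidence for a claim (strongest to weakest):
§ Strong evidence
§ Evidence
§ Circumstantial evidence
§ Third-party claim
§ First- or second-party claim
Evidence against a claim (weakest to strongest):
§ First- or second-party claim
§ Third-party claim
§ Circumstantial evidence
§ Evidence
§ Strong evidence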
Further reading: Wohlin
An Evidence Profile for Software Engineering Research and Practice
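To make the scale concrete, here is a minimal, hypothetical sketch: each piece of evidence about one claim is tagged with a direction (for or against) and one of the five levels above, and the "profile" is simply the count per (direction, level) bin. The level names mirror the slide; the numeric ordering, the `evidence_profile` and `strongest` helpers, and the example data are assumptions for illustration, not Wohlin's procedure.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable, Optional, Tuple


class Level(IntEnum):
    """Evidence levels from the slide, ordered from weakest to strongest (assumed ordering)."""
    FIRST_OR_SECOND_PARTY_CLAIM = 1
    THIRD_PARTY_CLAIM = 2
    CIRCUMSTANTIAL_EVIDENCE = 3
    EVIDENCE = 4
    STRONG_EVIDENCE = 5


# One piece of evidence: (supports_claim, level). True means "for", False means "against".
Item = Tuple[bool, Level]


def evidence_profile(items: Iterable[Item]) -> Counter:
    """Hypothetical helper: count the pieces of evidence in each (direction, level) bin."""
    return Counter(items)


def strongest(profile: Counter, supports: bool) -> Optional[Level]:
    """Hypothetical helper: strongest level seen in one direction, or None if that side is empty."""
    levels = [level for (side, level), n in profile.items() if side == supports and n > 0]
    return max(levels) if levels else None


# Made-up evidence gathered about a single claim.
items = [
    (True, Level.THIRD_PARTY_CLAIM),
    (True, Level.EVIDENCE),
    (True, Level.EVIDENCE),
    (False, Level.FIRST_OR_SECOND_PARTY_CLAIM),
]
profile = evidence_profile(items)
print(profile[(True, Level.EVIDENCE)])          # 2
print(strongest(profile, supports=True).name)   # EVIDENCE
print(strongest(profile, supports=False).name)  # FIRST_OR_SECOND_PARTY_CLAIM
```

Keeping both directions separate, rather than summing them into one score, preserves the point of the scale: a claim can have strong support and weak counter-evidence at the same time.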
23. Preliminary remarks:
A PhD thesis can have many contributions
Possible contributions
§ Exploration / evaluation of concepts
and dependencies
§ Identification of problems and / or
deficiencies in existing assumptions
§ Contributions to a precise
terminology
§ New views on existing concepts and
transfer of those concepts to new
fields of application
§ New methods / methodologies
§ New theories
§ …
Important:
§ Identification of scientific contribution
There is no one and only way of writing a “good thesis”
Scientific methods
§ Theories
– Consistent, complete, …
– Validation (of accuracy)
§ Dialectic
§ Empirical methods
– Experiments
– Case/Field Studies
– …
§ Literature analyses
§ …
Important:
§ Scientific evaluation
– Empirical
– Experimental
– Theoretical
– Positioning against state of science
– …
24. What can be the scope of a thesis?
§ Starting points: practical problem, existing theory, scientific methods
§ Possible outcomes:
– Evidently solve a problem (or parts of it)
– Refine theory
– Provide guidance for future research
Inspired by: Shneiderman, keynote at ESEM 2013
25. Problem solving
How it should be
How it often is in reality
Source: http://researchinprogress.tumblr.com
26. Let’s engineer problem discovery and solving
Engineering cycle
§ Implementation evaluation / Problem investigation
– Stakeholders, goals?
– Phenomena? Effects?
– (Lack of) contribution to goals?
§ Treatment design
– Specify requirements!
– Contribution to goals?
– Available treatments?
– Design new ones!
§ Design validation
– Effects of treatment in this context?
– Effects satisfy requirements?
– Trade-offs?
– Sensitivity?
§ Treatment implementation
– Transfer to practice!
Further reading: Wieringa, R.J.:
Relevance and problem choice in design science.
In: Global Perspectives on Design Science Research. Lecture Notes in Computer Science (2010) 61–76
27. In any case, stick to the code of scientific conduct!
Principles in scientific work and behaviour
1. Integrity
2. Honesty
3. Transparency and accuracy
4. Rationalism
Principles of working (and writing)
§ Clearly and objectively outline the goals, methods and contribution of your thesis
– motivation
– relevance
– validity
§ Describe related work, gaps left open, and how you intend to close those gaps
§ Choose appropriate methods (and reflect on them)
§ Work in teams!
28. If working in teams
§ Clarify your own (individual) contributions as soon as possible
– Publish together with clear (predefined) authorship
– Make your work transparent
• Discuss with colleagues from your research group (or from other groups)
• Disseminate your results (and get feedback)
→ In the end, however, be aware: only your individual contribution counts!
§ Dissertations and (funded) research projects
– Dissertation results can (and often should) be part of research projects
– Problems: potentially different goals, time constraints, …
– Instrument:
• Make clear (and discuss) your own contributions
• Publish your results – also in early stages
29. Finally:
There is a formal code of ethics for researchers
The seven principles of the code, intended to guide scientists' actions, are:
§ Act with skill and care in all scientific work. Maintain up to date skills and assist
their development in others.
§ Take steps to prevent corrupt practices and professional misconduct. Declare
conflicts of interest.
§ Be alert to the ways in which research derives from and affects the work of other
people, and respect the rights and reputations of others.
§ Ensure that your work is lawful and justified.
§ Minimize and justify any adverse effect your work may have on people, animals
and the natural environment.
§ Seek to discuss the issues that science raises for society. Listen to the aspirations
and concerns of others.
§ Do not knowingly mislead, or allow others to be misled, about scientific matters.
Present and review scientific evidence, theory or interpretation honestly and
accurately.
Source: David King 2007, the UK government's chief scientific advisor
30. Professional and ethical responsibility
§ Software engineering involves wider responsibilities than simply the application of technical skills
§ Software engineers must behave in an honest and ethically responsible way if they are to be respected as professionals
§ Ethical behaviour is more than simply upholding the law
§ Principles:
Principles:
– Confidentiality
– Competence
– Intellectual property rights
– Refrain from computer misuse
– …
Further reading: M. Broy and B. Berenbach
Professional and Ethical Dilemmas in Software Engineering, IEEE Computer 2009
31. ACM/IEEE Code of Ethics
§ Software engineers shall commit themselves to making the analysis,
specification, design, development, testing and maintenance of software a
beneficial and respected profession. In accordance with their commitment to
the health, safety and welfare of the public, software engineers shall adhere to
the following Eight Principles:
– PUBLIC INTEREST
– CLIENT AND EMPLOYER INTEREST
– PRODUCT
– JUDGEMENT
– MANAGEMENT
– PROFESSION
– COLLEAGUES
– SELF
33. Postulate
§ There are certain rules and principles for doing scientific work
§ Creation of scientific knowledge follows a number of patterns of scientific
method
§ There is a scientific community to judge the quality of scientific work
34. How to judge the quality of scientific contributions?
§ The notion of quality is multi-faceted... (in general).
§ A scientific contribution as well as the methods used can be evaluated w.r.t.:
– Relevance and impact (theoretical and practical)
– Rigorousness
– Novelty
– Appropriateness
– Validity
– Conformance to scientific rules
– …
35. Validity: what is it?
In science and statistics, validity
§ is the extent to which a concept, theory, conclusion, or measurement is well-founded
– well-formedness
– preciseness
– consistency
– scope
– ...
§ and corresponds accurately to the real world.
Source: Adapted from Wikipedia
36. Understanding the validity: Why and what?
§ Increase awareness of potential threats in my study regarding
– Level of objectivity (“External Validity”)
– Appropriateness of design to answer research questions (“Construct Validity”)
– Appropriateness of measurements (“Internal Validity”)
→ Support yourself in designing a study
→ Support others in understanding and potentially replicating your study
→ Support yourself and others in better understanding:
– The context of a study
– The limitations of a study
→ Increase the trustworthiness of the results
37. Types of validity
[Figure: the experiment objective connects theory (cause construct, cause-effect construct, effect construct) with observation (treatment as independent variable, treatment-outcome construct, outcome as dependent variable, experiment operation); numbered areas 1 to 4 mark conclusion, internal, construct, and external validity]
Source: Wohlin et al.
Experimentation in Software Engineering: An Introduction.
38. Types of validity
§ The following classification scheme has been established for empirical SE:
1. Conclusion validity:
“In this study, is there a relationship between treatment and outcome?”
2. Internal validity:
“Assuming there is a relationship in this study, is the relationship a causal one?”
3. Construct validity:
“Assuming that there is a causal relationship in this study, can we claim that the
treatment reflects well our cause construct and that our measure reflects well
our idea of the construct of the measure?”
4. External validity:
“Assuming that there is a causal relationship in this study between the cause and
the effect, can we generalize this effect to other persons, places, or times?”
39. The validity questions are cumulative
§ Validity types build on one another; each question assumes the previous one is answered “yes”:
1. Is there a dependency between the cause and the effect? (conclusion)
2. Is the dependency causal? (internal)
3. Can we generalize to the constructs? (construct)
4. Can we generalize to other persons, places, times? (external)
Adapted from William M.K. Trochim, 2008
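The cumulative reading of the validity questions can be sketched as an ordered checklist: a study can only claim a validity type if every question before it in the chain also holds. This is a hypothetical illustration; the `VALIDITY_CHAIN` list, the `strongest_supported_claim` helper, and the example assessment are assumptions, not part of the cited scheme.

```python
from typing import Dict, List, Optional, Tuple

# The four validity questions, in the order in which they build on each other.
VALIDITY_CHAIN: List[Tuple[str, str]] = [
    ("conclusion", "Is there a relationship between treatment and outcome?"),
    ("internal", "Is the relationship a causal one?"),
    ("construct", "Do treatment and measure reflect the intended constructs?"),
    ("external", "Can the effect be generalized to other persons, places, or times?"),
]


def strongest_supported_claim(answers: Dict[str, bool]) -> Optional[str]:
    """Return the last validity type answered 'yes' before the chain breaks.

    A missing answer counts as 'no', since each question assumes the previous ones hold.
    """
    supported: Optional[str] = None
    for validity_type, _question in VALIDITY_CHAIN:
        if not answers.get(validity_type, False):
            break
        supported = validity_type
    return supported


# Hypothetical assessment of one study: the "yes" for external validity does not count,
# because the construct question already failed.
assessment = {"conclusion": True, "internal": True, "construct": False, "external": True}
print(strongest_supported_claim(assessment))  # internal
```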
40. Validity is not just the last paragraph of a paper!
Validity evaluation is part of research planning!
§ For each threat type, a list of threats is available in [Cook79] and [Campbell63]
– Credibility
– Transferability
– Confirmability
– …
§ Priority among the threats is a matter of optimization
§ Possible ranking in theory testing:
– Internal – construct – conclusion – external
§ Possible ranking in applied research:
– Internal – external – construct – conclusion
41. How can I support validity in general?
In general, we have 2 possibilities:
1. Support the validity by construction (often referred to as “validity procedures”)
2. Increase the validity after the fact
42. Constructively supporting validity
Conclusion Validity
§ Capture and critically discuss statistical assumptions, and estimate the probability of making errors
§ Establish baselines to assess how representative samples are (e.g., in surveys)
Internal Validity
§ Minimise side-effects and confounding factors, e.g., questionnaire wording, effects of the
interviewer, and effects of action research
§ Be unbiased!
§ Refer to method and subject triangulation
Construct Validity
§ Reproducibly define research questions and methods (e.g., by using GQM)
External Validity
§ Observe and explain objects and subjects → qualitative studies
§ Refer to data triangulation
§ Refer to independent replication studies!
Further Tips
§ Define and report the study according to available guidelines
§ Be patient, be flexible
§ Recognise the positive value of checking the threats to validity!
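One way to make “estimate the probability of making errors” concrete is the familywise error rate. Assuming m independent significance tests, each at level α, the probability of at least one false positive is 1 - (1 - α)^m. The sketch below computes this and the Bonferroni per-test threshold α/m as one possible control; Bonferroni is a swapped-in example technique, not one prescribed by the slides, and the helper names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one Type I error) over m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level alpha / m, which keeps the familywise error rate at or below alpha."""
    return alpha / m


alpha = 0.05
for m in (1, 10, 50):
    print(
        f"m={m:>2}: FWER at alpha={alpha} is {familywise_error_rate(alpha, m):.3f}; "
        f"Bonferroni per-test threshold is {bonferroni_threshold(alpha, m):.4f}"
    )
```

With 10 independent tests at α = 0.05, the chance of at least one spurious “significant” result is already about 0.401, and with 50 tests about 0.923. Bonferroni is simple but conservative; whichever correction a study uses, it belongs in the discussion of conclusion validity.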
43. Example
§ Comparing four approaches for technical debt identification,
Nico Zazworka, Antonio Vetro’, Clemente Izurieta, Sunny Wong, Yuanfang Cai,
Carolyn Seaman, Forrest Shull,
Software Quality Journal, 21(2), 2013
§ Large correlational analyses (~100,000 data points) on 13 releases of the Hadoop
open-source software to discover relationships between structural quality metrics
(at code, design, and architectural level) and rework indicators (defect proneness
and change proneness)
Threat | Type | Control strategy
Choice of statistical significance thresholds | Conclusion | Literature-based choice of thresholds
Data transformation [0,N] → [0,1] | Conclusion | Distribution check
Metrics not normalized by class size | Conclusion | Correlation check
Correlations found are incidental | Internal | Effect measured on two outcomes
Class size measured by number of methods | Construct | Correlation check
Defect proneness measured by number of bug fixes | Construct | Checked with three different computation methods
Findings generalizability | External | Aggregation on 13 different releases
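Two of the conclusion-validity threats in this example (the [0,N] → [0,1] transformation and metrics not normalized by class size) can be illustrated with a small sketch. It assumes min-max scaling as the transformation and Spearman rank correlation as the correlation check; the study itself may have used different procedures, and all data below is made up for five hypothetical classes.

```python
import math
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Map values from [min, max] onto [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data for five classes (illustration only).
methods = [3, 8, 12, 20, 25]  # class size as number of methods
smells = [1, 2, 4, 5, 7]      # hypothetical raw structural metric
density = [s / m for s, m in zip(smells, methods)]  # metric normalized by size

print(spearman(smells, methods))                 # 1.0: raw metric tracks size
print(spearman(min_max_scale(smells), methods))  # 1.0: monotone rescaling leaves rho unchanged
print(round(spearman(density, methods), 3))      # -0.316: normalized metric barely tracks size
```

The first line is the “correlation check” in miniature: a raw metric that correlates perfectly with class size may only be measuring size. The second shows why a monotone transformation such as min-max scaling cannot change a rank-based result, which is one reason to pair transformations with a distribution check rather than assume they are harmless.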
44. Increasing the validity after the fact
Independent Confirmation
§ Case study / experimental evaluation of theories by researchers not involved in
developing them
§ Replication of experiments or case studies until reaching saturation
(or retirement)
Challenges
§ What can we expect from a PhD thesis?
Discuss! ☺
45. Some final, but important remarks
§ Don’t focus on the “size” of the problem, but on
– The relevance (the practical, but also the theoretical!)
– The accuracy in the investigation (problem and evaluation research)
§ However: Don’t be afraid to
– aim high!
– be hard-headed!
– (but also accept if things don’t work)
§ When conducting empirical investigations:
– Do not make claims you cannot eventually measure
– The scope / locality … is not the most important thing, as long as:
• The study population is accurately chosen and described
• The validity is carefully outlined
• The conclusions are drawn accordingly
§ Finally: Don’t think in black and white only
– Don’t divide the world into basic and applied research
– Don’t be afraid to also look at other disciplines