Research Activities:
past, present, and future
Marco Torchiano
Public Seminar
Selection procedure for a tenured full professor (first-level) position
S.C. 09/H1 - S.S.D. ING-INF/05 - internal code 03/18/P/O
October 19, 2018
Empirical Software Engineering
Survey, Case study, Case-control, Experiment
Package effsize: 2.2K downloads/month
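The effsize package provides standardized effect-size statistics (such as Cohen's d and Cliff's delta) commonly used when analysing controlled experiments. As an illustration only, and not the package's own code, here is a minimal Java sketch of Cohen's d with a pooled standard deviation, run on invented sample data:

// Illustrative only: Cohen's d with a pooled standard deviation, the kind of
// standardized effect size the effsize package reports (not its implementation;
// the sample data below are invented).
public class EffectSize {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Cohen's d = (mean of A - mean of B) / pooled standard deviation
    static double cohensD(double[] a, double[] b) {
        double ma = mean(a), mb = mean(b);
        double ssA = 0.0, ssB = 0.0;
        for (double v : a) ssA += (v - ma) * (v - ma);
        for (double v : b) ssB += (v - mb) * (v - mb);
        double pooledSd = Math.sqrt((ssA + ssB) / (a.length + b.length - 2));
        return (ma - mb) / pooledSd;
    }

    public static void main(String[] args) {
        double[] treatment = {12, 15, 14, 16, 13}; // e.g., measurements from group A
        double[] control   = {10, 11, 12, 11, 10}; // e.g., measurements from group B
        System.out.printf("Cohen's d = %.2f%n", cohensD(treatment, control));
    }
}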
1. On the Effectiveness of the Test-First Approach to Programming
Hakan Erdogmus, Maurizio Morisio, Marco Torchiano
IEEE Transactions on Software Engineering, vol. 31, no. 3, March 2005
362 citations
Test-Driven Development
• TDD is an agile development practice (see the sketch after this list)
• Made popular by eXtreme Programming
• Adopted widely
• Promises:
• More and better tests
• Higher external quality
• Improved productivity
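A minimal sketch of the test-first rhythm, in JUnit 4 style: the programmer test is written before the production code, and then just enough code is added to make it pass before the next small feature is tackled. Class and method names are purely illustrative and are not taken from the experiment materials.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Step 1: the programmer test is written first (it fails until the feature exists).
public class PriceCalculatorTest {
    @Test
    public void appliesTenPercentDiscountAtOrAboveThreshold() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(90.0, calc.finalPrice(100.0), 0.001);
    }
}

// Step 2: just enough production code is written to make the test pass,
// then the cycle repeats, re-running all tests as regression tests.
class PriceCalculator {
    double finalPrice(double amount) {
        return amount >= 100.0 ? amount * 0.9 : amount;
    }
}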
Test-Driven Development: findings
• Higher number of tests
• Impact on quality and productivity
• No clear difference in terms of quality
• Tasks were too small
• Process conformance
• Slight improvement in productivity
• More focused
• Less rework
2. Overlooked Aspects of COTS-based Development
Marco Torchiano (Norwegian University of Science and Technology), Maurizio Morisio (Politecnico di Torino)
IEEE Software, 2004
158 citations
Development with OTS components
• Development with Off-The-Shelf components
• Systems are built through integration of existing components
• The marketplace influences the evolution of OTS components
• Continuous trade-off between system and component features
• Strong push from DoD since late ‘90s
• Series of conferences driven by SEI since 2002
• As systems become more complex, all software becomes based on OTS components
• Goal: state of the practice of OTS-based development in EU industry
OTS-Based Development: 6 theses
T1: Open source software is often used as closed source.
T2: Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue.
T3: Custom code mainly provides additional functionalities.
T4: Developers seldom use formal selection procedures; familiarity with either the product or the generic architecture is the key factor in selection.
T5: Architecture is more important than requirements for product selection.
T6: Integrators tend to influence the vendor concerning product evolution whenever possible.
Thanks to:
3. Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry
Marco Torchiano, Federico Tomassetti, Filippo Ricca, Alessandro Tiso, Gianna Reggio
The Journal of Systems and Software, 86 (2013) 2110–2126
67 citations
MDD State of the Practice
• Model-Driven Development is not just modeling
• Code generation (see the sketch after this list)
• Model interpretation
• Model transformation
• Often, toolsmithing
• Promised improvements:
• Productivity
• Portability
• Maintainability
• Interoperability
• …
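As an illustration of the code-generation side of MDD (not the tooling examined in the survey), the following minimal sketch turns a tiny hand-written entity "model" (a name plus typed attributes) into Java source text; all names are hypothetical.

import java.util.Map;

// Illustrative only: the simplest form of model-to-code generation, where a
// hand-written entity model is transformed into Java source text.
public class EntityGenerator {

    static String generate(String entity, Map<String, String> attributes) {
        StringBuilder src = new StringBuilder("public class " + entity + " {\n");
        attributes.forEach((name, type) ->
                src.append("    private ").append(type).append(' ').append(name).append(";\n"));
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // The "model": an Invoice entity with two typed attributes.
        System.out.println(generate("Invoice", Map.of("number", "String", "total", "double")));
    }
}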
MDD ≠
MDD Benefits
2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126
Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical
Thinking. Oxford University Press, New York.
Nugroho, A., Chaudron, M.R.,2008. A survey into the rigor of UML use and its
perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE
International Symposium on Empirical Software Engineering and Measurement,
ESEM’08. ACM, New York, NY, USA, pp. 90–99.
Object Management Group, 2011a. Business Process Model and Notation, v. 2.0,
Standard. Object Management Group.
Object Management Group, 2011b. Unified Modeling Language, Superstructure, v.
2.4, Specifications. Object Management Group.
Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys
in software engineering. In: Proceedings of the International Symposium on
Empirical Software Engineering (ISESE), pp. 80–88.
Schmidt, D.C., 2006. Guest editor’s introduction: model-driven engineering. Com-
puter 39, 25–31.
Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20,
19–25.
Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software
engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empir-
ical Software Engineering. Springer, London, pp. 285–311.
Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software
modelling and model driven engineering: a survey in the Italian industry. IET
Seminar Digests 2012, 91–100.
Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration
of information systems in the Italian industry: a state of the practice survey.
Information and Software Technology 53, 71–86.
Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G.,2011b. Preliminary findings
from a survey on the MD* state of the practice. In: International Symposium on
Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375.
van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a
research agenda. Technical Report, TUDelft-SERG.
Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of
Object Technology 8 http://www.jot.fm
Walonick, D.S., 1997. Survival Statistics. StatPac, Inc.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000.
Experimentation in Software Engineering – An Introduction. Kluwer Academic
Publishers.
Marco Torchiano is an associate professor at Politecnico
di Torino, Italy; he has been post-doctoral research fel-
low at Norwegian University of Science and Technology
(NTNU), Norway. He received an MSc and a PhD in Com-
puter Engineering from Politecnico di Torino, Italy. He
is author or coauthor of more than 100 research papers
published in international journals and conferences. He is
the co-author of the book ‘Software Development–Case
studies in Java’ from Addison-Wesley, and co-editor of
the book ‘Developing Services for the Wireless Internet’
from Springer. His current research interests are: design
notations, testing methodologies, OTS-based develop-
ment and software engineering for mobile and wireless
applications. The methodological approach he adopts is that of empirical software
engineering.
Federico Tomassetti is a PhD student in the Dept. of Con-
trol and Computer Engineering at Politecnico di Torino.
He is interested in all the means to lift the level of abstrac-
tion, in particular in model-driven development, domain
specific languages and language workbenches. He has
experience also as developer, consultant and technical
writer.
Filippo Ricca is an ass
of Genova, Italy. He r
puter Science from th
the thesis “Analysis, T
Applications”. In 2011
(Most Influential Pape
and Testing of Web Ap
He is author or co
papers published in i
ences/workshops. At th
h-index reported by G
the number of citation
Chair of CSMR 2013, IC
others, he served in the program committees of
ICST, SCAM, CSMR, WCRE and ESEM.
From 1999 to 2006, he worked with the Soft
(now FBK-irst), Trento, Italy. During this time he
on Reverse engineering, Re-engineering and Soft
His current research interests include: Softwa
Empirical studies in Software Engineering, Web
The research is mainly conducted through empir
controlled experiments and surveys.
Alessandro Tiso is a Ph
University of Genoa. He
applied mathematics i
interests are: model dr
mations and use of mod
last 20 years he also w
industry.
Gianna Reggio is Ass
Genova since 1992. Pre
at the same university,
in 1988. She took part i
them SAHARA and SAL
eral international conf
relevant being ETAPS
been a member of the p
national conferences a
2008, and 2007). Her
on the following topics
concurrent systems m
and semantics of prog
started working on development methods for S
several national and international conference an
2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126
Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical
Thinking. Oxford University Press, New York.
Nugroho, A., Chaudron, M.R.,2008. A survey into the rigor of UML use and its
perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE
International Symposium on Empirical Software Engineering and Measurement,
ESEM’08. ACM, New York, NY, USA, pp. 90–99.
Object Management Group, 2011a. Business Process Model and Notation, v. 2.0,
Standard. Object Management Group.
Object Management Group, 2011b. Unified Modeling Language, Superstructure, v.
2.4, Specifications. Object Management Group.
Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys
in software engineering. In: Proceedings of the International Symposium on
Empirical Software Engineering (ISESE), pp. 80–88.
Schmidt, D.C., 2006. Guest editor’s introduction: model-driven engineering. Com-
puter 39, 25–31.
Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20,
19–25.
Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software
engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empir-
ical Software Engineering. Springer, London, pp. 285–311.
Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software
modelling and model driven engineering: a survey in the Italian industry. IET
Seminar Digests 2012, 91–100.
Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration
of information systems in the Italian industry: a state of the practice survey.
Information and Software Technology 53, 71–86.
Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G.,2011b. Preliminary findings
from a survey on the MD* state of the practice. In: International Symposium on
Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375.
van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a
research agenda. Technical Report, TUDelft-SERG.
Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of
Object Technology 8 http://www.jot.fm
Walonick, D.S., 1997. Survival Statistics. StatPac, Inc.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000.
Experimentation in Software Engineering – An Introduction. Kluwer Academic
Publishers.
Marco Torchiano is an associate professor at Politecnico
di Torino, Italy; he has been post-doctoral research fel-
low at Norwegian University of Science and Technology
(NTNU), Norway. He received an MSc and a PhD in Com-
puter Engineering from Politecnico di Torino, Italy. He
is author or coauthor of more than 100 research papers
published in international journals and conferences. He is
the co-author of the book ‘Software Development–Case
studies in Java’ from Addison-Wesley, and co-editor of
the book ‘Developing Services for the Wireless Internet’
from Springer. His current research interests are: design
notations, testing methodologies, OTS-based develop-
ment and software engineering for mobile and wireless
applications. The methodological approach he adopts is that of empirical software
engineering.
Federico Tomassetti is a PhD student in the Dept. of Con-
trol and Computer Engineering at Politecnico di Torino.
He is interested in all the means to lift the level of abstrac-
tion, in particular in model-driven development, domain
specific languages and language workbenches. He has
experience also as developer, consultant and technical
writer.
Filippo Ricca is an assistant professor at the University
of Genova, Italy. He received his PhD degree in Com-
puter Science from the same University, in 2003, with
the thesis “Analysis, Testing and Re-structuring of Web
Applications”. In 2011 he was awarded the ICSE 2001 MIP
(Most Influential Paper) award, for his paper: “Analysis
and Testing of Web Applications”.
He is author or coauthor of more than 90 research
papers published in international journals and confer-
ences/workshops. At the time of writing (20/03/2013), the
h-index reported by Google Scholar is 24 (13 in Scopus);
the number of citations 2002. Filippo Ricca was Program
Chair of CSMR 2013, ICPC 2011 and WSE 2008. Among the
others, he served in the program committees of the following conferences: ICSM,
ICST, SCAM, CSMR, WCRE and ESEM.
From 1999 to 2006, he worked with the Software Engineering group at ITC-irst
(now FBK-irst), Trento, Italy. During this time he was part of the team that worked
on Reverse engineering, Re-engineering and Software Testing.
His current research interests include: Software modeling, Reverse engineering,
Empirical studies in Software Engineering, Web applications and Software Testing.
The research is mainly conducted through empirical methods such as case studies,
controlled experiments and surveys.
Alessandro Tiso is a PhD student in Computer Science at
University of Genoa. He is also a teacher of informatics and
applied mathematics in a high school. His main research
interests are: model driven development, model transfor-
mations and use of models in software engineering. In the
last 20 years he also worked as an IT consultant in the
industry.
Gianna Reggio is Associate Professor at University of
Genova since 1992. Previously she was Assistant Professor
at the same university, she receive the PhD in Informatics
in 1988. She took part in several research projects, among
them SAHARA and SALADIN (PRIN) and co-organized sev-
eral international conferences and workshops, the most
relevant being ETAPS 2001 and MODELS 2006. She has
been a member of the program committee of several inter-
national conferences and in particular MODELS (in 2009,
2008, and 2007). Her research activity focused mainly
on the following topics: software development methods,
concurrent systems modeling and formal specification,
and semantics of programming languages. Recently she
started working on development methods for SOA applications. She is author of
several national and international conference and journals papers and books.
2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126
Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical
Thinking. Oxford University Press, New York.
Nugroho, A., Chaudron, M.R.,2008. A survey into the rigor of UML use and its
perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE
International Symposium on Empirical Software Engineering and Measurement,
ESEM’08. ACM, New York, NY, USA, pp. 90–99.
Object Management Group, 2011a. Business Process Model and Notation, v. 2.0,
Standard. Object Management Group.
Object Management Group, 2011b. Unified Modeling Language, Superstructure, v.
2.4, Specifications. Object Management Group.
Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys
in software engineering. In: Proceedings of the International Symposium on
Empirical Software Engineering (ISESE), pp. 80–88.
Schmidt, D.C., 2006. Guest editor’s introduction: model-driven engineering. Com-
puter 39, 25–31.
Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20,
19–25.
Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software
engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empir-
ical Software Engineering. Springer, London, pp. 285–311.
Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software
modelling and model driven engineering: a survey in the Italian industry. IET
Seminar Digests 2012, 91–100.
Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration
of information systems in the Italian industry: a state of the practice survey.
Information and Software Technology 53, 71–86.
Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G.,2011b. Preliminary findings
from a survey on the MD* state of the practice. In: International Symposium on
Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375.
van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a
research agenda. Technical Report, TUDelft-SERG.
Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of
Object Technology 8 http://www.jot.fm
Walonick, D.S., 1997. Survival Statistics. StatPac, Inc.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000.
Experimentation in Software Engineering – An Introduction. Kluwer Academic
Publishers.
Marco Torchiano is an associate professor at Politecnico
di Torino, Italy; he has been post-doctoral research fel-
low at Norwegian University of Science and Technology
(NTNU), Norway. He received an MSc and a PhD in Com-
puter Engineering from Politecnico di Torino, Italy. He
is author or coauthor of more than 100 research papers
published in international journals and conferences. He is
the co-author of the book ‘Software Development–Case
studies in Java’ from Addison-Wesley, and co-editor of
the book ‘Developing Services for the Wireless Internet’
from Springer. His current research interests are: design
notations, testing methodologies, OTS-based develop-
ment and software engineering for mobile and wireless
applications. The methodological approach he adopts is that of empirical software
engineering.
Federico Tomassetti is a PhD student in the Dept. of Con-
trol and Computer Engineering at Politecnico di Torino.
He is interested in all the means to lift the level of abstrac-
tion, in particular in model-driven development, domain
specific languages and language workbenches. He has
experience also as developer, consultant and technical
writer.
Filippo Ricca is an assistant professor at the University
of Genova, Italy. He received his PhD degree in Com-
puter Science from the same University, in 2003, with
the thesis “Analysis, Testing and Re-structuring of Web
Applications”. In 2011 he was awarded the ICSE 2001 MIP
(Most Influential Paper) award, for his paper: “Analysis
and Testing of Web Applications”.
He is author or coauthor of more than 90 research
papers published in international journals and confer-
ences/workshops. At the time of writing (20/03/2013), the
h-index reported by Google Scholar is 24 (13 in Scopus);
the number of citations 2002. Filippo Ricca was Program
Chair of CSMR 2013, ICPC 2011 and WSE 2008. Among the
others, he served in the program committees of the following conferences: ICSM,
ICST, SCAM, CSMR, WCRE and ESEM.
From 1999 to 2006, he worked with the Software Engineering group at ITC-irst
(now FBK-irst), Trento, Italy. During this time he was part of the team that worked
on Reverse engineering, Re-engineering and Software Testing.
His current research interests include: Software modeling, Reverse engineering,
Empirical studies in Software Engineering, Web applications and Software Testing.
The research is mainly conducted through empirical methods such as case studies,
controlled experiments and surveys.
Alessandro Tiso is a PhD student in Computer Science at
University of Genoa. He is also a teacher of informatics and
applied mathematics in a high school. His main research
interests are: model driven development, model transfor-
mations and use of models in software engineering. In the
last 20 years he also worked as an IT consultant in the
industry.
Gianna Reggio has been an Associate Professor at the University of Genova since 1992; previously she was an Assistant Professor at the same university, and she received her PhD in Informatics in 1988. She took part in several research projects, among them SAHARA and SALADIN (PRIN), and co-organized several international conferences and workshops, the most relevant being ETAPS 2001 and MODELS 2006. She has been a member of the program committee of several international conferences, in particular MODELS (in 2009, 2008, and 2007). Her research activity focused mainly on the following topics: software development methods, concurrent systems modeling and formal specification, and semantics of programming languages. Recently she started working on development methods for SOA applications. She is the author of several national and international conference and journal papers, as well as books.
Current Research Topics
• Data Quality
• Testing of Mobile App UI
Data Quality
• Data Quality
• Data is a key component in many domains
• Low quality makes the outcomes useless or outright dangerous
• Data is not (anymore) just a part of software
• New ISO quality standards: ISO-25012, ISO-25024
• Challenges (focus on intrinsic characteristics):
• Practical application of quality measures to concrete cases (Open-Data)
• Quality assessment of unstructured data (NoSQL, semantic KB)
Data Quality: practical application
• Open Government Data
• Defined a set of metrics from the literature
• Analyzed + municipal data
• Able to identify quality issues (see the sketch after this slide)
• Different issues depending on disclosure process
• Technical specification: UNI/TS 11725:2018
• Guidelines for the measurement of data quality
• As a member of UNI CT 504
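To give a flavour of what applying such metrics means in practice, the snippet below is a minimal, illustrative sketch, not the exact framework used in the study: it computes a column-level completeness indicator for a tabular open-government dataset with pandas. The file name, the 0.9 threshold, and the idea of flagging columns below it are hypothetical choices made only for this example.

import pandas as pd

def column_completeness(csv_path: str) -> pd.Series:
    # Share of non-missing cells per column: 1.0 means fully complete.
    df = pd.read_csv(csv_path)
    return 1.0 - df.isna().mean()

if __name__ == "__main__":
    # "municipal_dataset.csv" is a placeholder for any open-government CSV export.
    completeness = column_completeness("municipal_dataset.csv")
    print(completeness.sort_values())
    # Flag columns whose completeness falls below an arbitrary 90% threshold.
    print("Potential issues:", list(completeness[completeness < 0.9].index))

Completeness is of course only one of the dimensions considered in the framework described in the paper below; the same per-column bookkeeping extends naturally to other measures.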
Open data quality measurement framework: Definition and application
to Open Government Data
Antonio Vetrò ⁎, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas,
Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Article history: Received 5 March 2015; Received in revised form 1 February 2016; Accepted 3 February 2016; Available online 20 February 2016
Abstract
The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from
practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and nega-
tively affect civic participation. Current approaches to the problem in literature lack a comprehensive theoretical
framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets.
In this work, we address these two limitations and set up a framework of indicators to measure the quality of
Open Government Data on a series of data quality dimensions at most granular level of measurement. We vali-
dated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally
recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of
OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as
in the former case, hence with supposed lower quality.
Starting from measurements based on the quality framework, we were able to verify the difference in quality: the
measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors
that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided
technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, ad-
dressing specific quality aspects.
Keywords: Open Government Data; Open data quality; Government information quality; Data quality measurement; Empirical assessment
© 2016 Elsevier Inc. All rights reserved.
1. Introduction
Open data are data that “can be freely used, modified, and shared by
anyone for any purpose” (Web ref. 1). Compared to proprietary frame-
works, digital commons such as open data are characterized — from
both a legal and a technical point of view — by lower restrictions applied
to their circulation and reuse. This feature is supposed to ultimately fos-
ter collaboration, creativity and innovation (Hofmokl, 2010).
At all administrative levels, the public sector is one of the major pro-
ducers and holders of information, which ranges, e.g., from maps to
companies registers (Aichholzer & Burkert, 2004). During the last
years, the amount and variety of open data released by public adminis-
trations across the world has been tangibly growing (see, e.g., the Open
Data Census by the Open Knowledge Foundation (Web ref. 2)), while
increased political awareness on the subject has been translated in
regulation, including the revision of the EU Directive on Public Sector
Information reuse in 2013, as well as national roadmaps and technical
guidelines. Releasing public sector information as open data can provide
considerable added value, meeting a demand coming from all kinds of
actors, ranging from companies to Non-Governmental Organizations,
from developers to simple citizens. Many suggest that wider and easier
circulation of public datasets could entail interesting (and even unex-
pected) forms of reuse, also for commercial purposes (Vickery, 2011),
and in general improve transparency of public institutions (Stiglitz,
Orszag, & Orszag, 2000; Ubaldi, 2013) and distributed ability to inter-
pret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012).
From the point of view of data reusers, we should take into account
the role of the so-called infomediaries, i.e., players that are able to inter-
pret data and present them effectively to the general public (Mayer-
Schönberger & Zappia, 2011), with a highly diversified set of business
models (Zuiderwijk, Janssen, & Davis, 2014). More in general, open
data consumers manipulate data in many ways, ranging, e.g., from
data integration to classification, also depending on the complementary
assets they hold (Ferro & Osella, 2013). Considering this, legal and
technical openness of datasets is not sufficient, by itself, to create a pro-
lific reuse ecosystem (Helbig, Nakashima, & Dawes, 2012): failures in
providing good quality information might impair not only the reuse of
the data, but also the usage of the institutional portals (Detlor, Hupfer,
Ruhi, & Zhao, 2013). Attempts to increase meaningfulness and reusabil-
ity of public sector information also imply representing and exposing
data so that they can be easily accessed, queried, processed and linked
with other data with no restrictions (Dawes, 2010). Although sharing
Government Information Quarterly 33 (2016) 325–337. http://dx.doi.org/10.1016/j.giq.2016.02.001
Barlett, D. L., & Steele, J. B. (1985). Forevermore: Nuclear waste in America.
Basili, V., Caldiera, G., & Rombach, H. D. (1994). The goal question metric approach. In:
Encyclopedia of software engineering. John Wiley & Sons, 528–532.
Batini, C., & Scannapieco, M. (2006). Data quality, concepts, methodologies and techniques.
Berlin: Springer.
Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), Article 16.
Behkamal, Behshid, Kahani, Mohsen, Bagheri, Ebrahim, & Jeremic, Zoran (May 2014).
A metrics-driven approach for quality assessment of linked open data. Journal of
Theoretical and Applied Electronic Commerce Research, 9(2), 64–79.
Berners-Lee, T. (2006). Linked data — Design issues. Tech. rep., W3C. http://www.w3.org/DesignIssues/LinkedData.html
Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function
approach to assessing overall information quality. International Journal of Intelligent
Systems, 51–74.
Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal
data consumers. World Wide Web, 11(4), 465–484.
Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning:
A pragmatic approach using linked data. International Conference for E-Democracy and
Open Government 2015 (CeDEM15), Krems, Austria.
Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspec-
tive. Archivi Computer, 2.
Detlor, B., Hupfer, M. E., Ruhi, U., & Zhao, L. (2013). Information quality and community municipal portal use. Government Information Quarterly, 30(1), 23–32. http://dx.doi.org/10.1016/j.giq.2012.08.00 (ISSN 0740-624X).
English, L. (1999). Improving data warehouse and business information quality. Wiley &
Sons.
Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality man-
agement. The Journal of Computer Information Systems, 50(2), 127–135.
Ferro, E., & Osella, M. (2013). Eight business model archetypes for PSI re-use. “Open data
on the web” workshop, Google campus, London.
Haug, A., Pedersen, A., & Arlbjørn, J. S. (2009). A classification model of ERP system data
quality. Industrial Management & Data Systems, 109(8), 1053–1068. http://dx.doi.
org/10.1108/02635570910991292.
Heinrich, B. (2002). Datenqualitätsmanagement in Data Warehouse-Systemen. (doctoral
thesis, Oldenburg).
Heinrich, B., Klier, M., & Kaiser, M. (2009). A procedure to develop metrics for cur-
rency and its application in CRM. Journal of Data and Information Quality, 1(1),
5.
Helbig, N., Nakashima, M., & Dawes, S. S. (2012, June 4–7). Understanding the value and limits of government information in policy informatics: A preliminary exploration. Proceedings of the 13th Annual International Conference on Digital Government Research (dg.o2012).
Hofmokl, J. (2010). The Internet commons: toward an eclectic theoretical framework.
International Journal of the Commons, 4(1), 226–250.
Iemma, R., Morando, F., & Osella, M. (2014). Breaking public administrations' data silos.
eJournal of eDemocracy & Open Government, 6(2).
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and
myths of open data and open government. Information Systems Management, 29(4),
258–268.
Vassiliou, Y. (1995). In M. Jarke, M. Lenzerini, & P. Vassiliadis (Eds.), Fundamentals of data
warehouses. Springer Verlag.
Jeusfeld, M., Quix, C., & Jarke, M. (1998). Design and analysis of quality information for
datawarehouses. Proceedings of the 17th International Conference on Conceptual
Modeling.
Kaiser, M., Klier, M., & Heinrich, B. (2007). How to measure data quality? — A metric-based approach. ICIS 2007 Proceedings, Paper 108.
Kim, W. (2002). On three major holes in data warehousing today. Journal of Object
Technology, 1(4), 39–47. http://dx.doi.org/10.5381/jot.2002.1.4.c3.
Kuk, G., & Davies, T. (2011). The roles of agency and artifacts in assembling open data
complementarities. Thirty Second International Conference on Information Systems.
Madnick, S., Wang, R., & Xian, X. (2004). The design and implementation of a corporate
householding knowledge processor to improve data quality. Journal of Management
Information Systems, 20(1), 41–49.
Maurino, A., Spahiu, B., Batini, C., & Viscusi, G. (2014). Compliance with Open Government
Data Policies: an empirical evaluation of Italian local public administrations. Twenty
Second European Conference on Information Systems, Tel Aviv.
Mayer-Schönberger, V., & Zappia, Z. (2011). Participation and power: intermediaries of
open data. 1st Berlin Symposium on Internet and Society.
Moraga, C., Moraga, M., Calero, C., & Caro, A. (2009). SQuaRE-aligned data quality model
for web portals. Quality Software, 2009. QSIC’09. 9th International Conference on
(pp. 117–122).
Naumann, F. (2002). Quality-driven query answering for integrated information systems.
Lecture Notes in Computer Science, 2261.
Redman, T. (1996). Data quality for the information age. Artech House.
Reiche, K., & Hofig, E. (2013). Implementation of metadata quality metrics and application
on public government data. Computer software and applications conference workshops
(COMPSACW), 2013 IEEE 37th annual (pp. 236–241).
Sachs, L. (1982). Applied statistics: A handbook of techniques. New York – Heidelberg – Berlin: Springer-Verlag.
Sande, M. V., Dimou, A., Colpaert, P., Mannens, E., & Van de Walle, R. (2013). Linked data
as enabler for open data ecosystems. Open data on the web 23–24 April 2013, Campus
London, Shoreditch.
Dawes, S. S. (2010). Stewardship and usefulness: Policy principles for information-based transparency. Government Information Quarterly, 27(4), 377–383.
Stiglitz, J. E., Orszag, P. R., & Orszag, J. M. (2000). The role of government in a digital age.
(Commissioned by the Computer & Communications Industry Association).
Tauberer, J. (2012). Open government data: The book.
Ubaldi, B. (2013). Open government data: Towards empirical analysis of open govern-
ment data initiatives. Tech. rep. OECD Publishing.
Umbrich, Jürgen, Neumaier, Sebastian, & Polleres, Axel (March 2015). Towards assessing
the quality evolution of Open Data portals. ODQ2015: Open data quality: From theory
to practice workshop (Munich, Germany).
Van de Sompel, H., Sanderson, R., Nelson, M. L., Balakireva, L. L., Shankar, H., & Ainsworth,
S. (2010). An HTTP-based versioning mechanism for linked data. (arXiv preprint arXiv:
1003.3661).
Vickery, G. (2011). Review of recent studies on PSI re-use and related market developments.
Paris: Information Economics.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological founda-
tions. Communications of the ACM, 39, 11.
Wang, R., & Strong, D. (1996). Beyond accuracy: What data quality means to data con-
sumers. Journal of Management Information Systems, 12, 4.
Whitmore, A. (2014). Using open government data to predict war: A case study of data and systems challenges. Government Information Quarterly, 31(4), 622–630 (ISSN 0740-624X).
Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential ele-
ments of open data ecosystems. Information Policy, 19(1), 17–33.
Zuiderwijk, Anneke, & Janssen, Marijn (2015). Participation and data quality in open data
use: Open data infrastructures evaluated. Proceedings of the 15th European Conference
on eGovernment 2015: ECEG 2015. Academic Conferences Limited.
Web references
Web ref. 1. http://opendefinition.org/ (last visited on November 5, 2015).
Web ref. 2. http://census.okfn.org/ (last visited on November 5, 2015).
Web ref. 3. http://goo.gl/QlUT1d (last visited on November 5, 2015).
Web ref. 4. https://goo.gl/yq4ElP (last visited on November 5, 2015).
Web ref. 5. http://sunlightfoundation.com/blog/2010/06/23/elenas-inbox/ (last visited on November 5, 2015).
Web ref. 6. http://nexa.polito.it/lunch-9 (last visited on November 5, 2015).
Web ref. 7. http://okfnlabs.org/bad-data/ (last visited on November 5, 2015).
Web ref. 8. http://www.dati.gov.it/dataset (last visited on November 5, 2015).
Web ref. 9. https://www.opengovawards.org/Awards_Booklet_Final.pdf (last visited on November 5, 2015).
Web ref. 10. http://en.wikipedia.org/wiki/European_Social_Fund (last visited on November 5, 2015).
Web ref. 11. http://www.dati.gov.it/ (last visited on November 5, 2015).
Web ref. 12. http://blog.okfn.org/2013/07/02/git-and-github-for-data/ (last visited on November 5, 2015).
Web ref. 13. http://dat-data.com/ (last visited on November 5, 2015).
Web ref. 14. https://code.google.com/p/google-refine/ (last visited on November 5, 2015).
Web ref. 15. http://datacleaner.org/ (last visited on November 5, 2015).
Web ref. 16. http://checklists.opquast.com/en/opendata (last visited on November 5, 2015).
Web ref. 17. http://odaf.org/papers/Open%20Data%20and%20Metadata%20Standards.pdf (last visited on November 5, 2015).
Antonio Vetrò is Director of Research and Policy at the Nexa
Center for Internet and Society at Politecnico di Torino
(Italy). Formerly, he was a research fellow in the Soft-
ware and System Engineering Department at Technische
Universität München (Germany) and a junior scientist at
Fraunhofer Center for Experimental Software Engineering
(MD, U.S.A.). He holds a PhD in Information and System En-
gineering from Politecnico di Torino (Italy). He is interested
in studying the impact of technology on society, with a focus
on technology transfer and internet science, and adopting an
empirical epistemological approach. Contact him at antonio.
vetro@polito.it.
Lorenzo Canova holds an MSc in Industrial engineering from
Politecnico di Torino and wrote his Master thesis on Open
Government Data quality. He is currently a research fellow
of the Nexa Center for Internet & Society at Politecnico di To-
rino. His main research interests are in Open Government
Data, government transparency and linked data. Contact
him at lorenzo.canova@polito.it.
Marco Torchiano is an associate professor at Politecnico di
Torino, Italy; he has been a post-doctoral research fellow at
Norwegian University of Science and Technology (NTNU),
Norway. He received an MSc and a PhD in Computer Engi-
neering from Politecnico di Torino, Italy. He is an author or
coauthor of more than 100 research papers published in in-
ternational journals and conferences. He is the co-author of
the book ‘Software Development—Case studies in Java’ from
Addison-Wesley, and the co-editor of the book ‘Developing
Services for the Wireless Internet’ from Springer. His current
research interests are: design notations, testing methodolo-
gies, OTS-based development, and static analysis. The meth-
odological approach he adopts is that of empirical software
engineering. Contact him at marco.torchiano@polito.it.
Camilo O. Minotas holds an MSc from Politecnico di Torino
and wrote his Master thesis on Open Data Quality Assess-
ment. He is currently a software developer in Colombia. His
main research interests are in open data quality and govern-
ment transparency. Contact him at minotas@gmail.com.
Raimondo Iemma holds an MSc in Industrial engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, economics of information and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.
Federico Morando (M.A.S. in Economic theory and econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist, with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectured at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
Project code: UNI 1602662
Standard number: UNI/TS 11725:2018
Italian title: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
English title: System and software engineering - Guidelines for the measurement of data quality
Responsible body: UNI/CT 504 Software Engineering
Co-author
Summary: The technical specification defines how to apply the measurement of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality" by comparing it with current practices and theoretical insights. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality; the definitions of the measures are general, so that they can be applied to a variety of specific cases.
National relations: -----
International relations: -----
Data Quality: unstructured data
• Evolution measures based on simple counts, e.g. the class-instance count query below (a usage sketch follows)
SELECT (COUNT(DISTINCT ?s) AS ?count)
WHERE { ?s a <C> . }
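As a usage sketch only: assuming each knowledge-base release is exposed through its own SPARQL endpoint, a client library such as SPARQLWrapper can run the count query above per entity type. The endpoint URLs and the class IRI below are placeholders, not the endpoints used in the study.

from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

def count_instances(endpoint_url: str, class_iri: str) -> int:
    # Count the distinct subjects typed with the given class in one KB release.
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(f"""
        SELECT (COUNT(DISTINCT ?s) AS ?n)
        WHERE {{ ?s a <{class_iri}> . }}
    """)
    sparql.setReturnFormat(JSON)
    result = sparql.query().convert()
    return int(result["results"]["bindings"][0]["n"]["value"])

# Placeholder endpoints for two consecutive releases of the same knowledge base.
old_n = count_instances("http://example.org/release-1/sparql", "http://example.org/ontology/Place")
new_n = count_instances("http://example.org/release-2/sparql", "http://example.org/ontology/Place")
print(f"Instances of Place: {old_n} -> {new_n}")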
Semantic Web (IOS Press)
A Quality Assessment Approach for Evolving Knowledge Bases
Mohammad Rifat Ahmmad Rashid (a,*), Marco Torchiano (a), Giuseppe Rizzo (b), Nandana Mihindukulasooriya (c) and Oscar Corcho (c)
(a) Politecnico di Torino, Italy. E-mail: mohammad.rashid@polito.it, marco.torchiano@polito.it
(b) Istituto Superiore Mario Boella. E-mail: giuseppe.rizzo@ismb.it
(c) Universidad Politecnica de Madrid, Spain. E-mail: nmihindu@fi.upm.es, ocorcho@fi.upm.es
Abstract
Knowledge bases are nowadays essential components for any task that requires automation with some degrees of intelligence.
Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured informa-
tion, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata
have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation.
Despite its advantages, there are challenges in using the RDF data model, for example, in data quality assessment and validation. In this
paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach
uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our
quality characteristics are based on the KB evolution analysis and we used high-level change detection for measurement func-
tions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness.
Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and
completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed
both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight
releases of 3cixty. The capability of Persistency and Consistency characteristics to detect quality issues varies significantly be-
tween the two case studies. Persistency measure gives observational results for evolving KBs. It is highly effective in case of KB
with periodic updates such as 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95%
precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution
both flexible and scalable.
Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data
1. Introduction
The Linked Data approach consists in exposing and connecting data from different sources on the Web by the means of semantic web technologies. Tim Berners-Lee (http://www.w3.org/DesignIssues/LinkedData.html) refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine readable format, and to meaningfully link them with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD) cloud (http://lod-cloud.net) hosting several Knowledge Bases (KBs) making available billions of RDF triples from different domains such as Geogra-
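To make the intuition behind the Persistency characteristic concrete, here is a deliberately simplified sketch, assuming per-type instance counts have already been extracted from two consecutive releases (for example with the query shown earlier): a type whose count shrinks is flagged as a potential persistency issue. This illustrates the general idea only, not the exact measurement functions defined in the paper.

from typing import Dict, List

def persistency_issues(prev_counts: Dict[str, int], curr_counts: Dict[str, int]) -> List[str]:
    # Flag entity types whose instance count dropped between two consecutive releases.
    issues = []
    for entity_type, prev in prev_counts.items():
        curr = curr_counts.get(entity_type, 0)
        if curr < prev:  # a shrinking type may indicate unintentionally removed data
            issues.append(f"{entity_type}: {prev} -> {curr}")
    return issues

# Hypothetical counts for releases n and n+1 of an evolving KB.
release_n  = {"Place": 12000, "Event": 3400, "Person": 800}
release_n1 = {"Place": 12450, "Event": 3100, "Person": 800}
print(persistency_issues(release_n, release_n1))  # ['Event: 3400 -> 3100']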
Mobile UI Testing
K-9 Mail (versions 2.995 and 3.309)
A Model-Based Approach to Language Integration A Model-Based Approach to Language Integration
A Model-Based Approach to Language Integration
 
On the computation of Truck Factor
On the computation of Truck FactorOn the computation of Truck Factor
On the computation of Truck Factor
 
Language Interaction and Quality Issues: An Exploratory Study
Language Interaction and Quality Issues: An Exploratory StudyLanguage Interaction and Quality Issues: An Exploratory Study
Language Interaction and Quality Issues: An Exploratory Study
 
The impact of process maturity on defect density
The impact of process maturity on defect densityThe impact of process maturity on defect density
The impact of process maturity on defect density
 

Recently uploaded

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 

Recently uploaded (20)

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 

Research Activities: past, present, and future.

  • 1. Research Activities: past, present, and future Marco Torchiano Seminario Pubblico Procedura di selezione per professore universitario di ruolo di I fascia S.C. 09/H1-S.S.D. ING-INF/05 - codice interno 03/18/P/O October 19, 2018
  • 2.
  • 3. Empirical Software Engineering: Survey, Case study, Case-control, Experiment. Package effsize: 2.2K downloads/month
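The effsize package mentioned above provides effect-size statistics (such as Cohen's d and Cliff's delta) that complement significance tests when analysing experiment data. As a purely illustrative sketch, not the package's implementation, the following Java snippet computes Cohen's d with a pooled standard deviation for two hypothetical groups:

// Illustrative sketch only: Cohen's d with pooled standard deviation.
// The effsize R package provides tested implementations of this and other
// effect-size measures (e.g., Cliff's delta); the data below are made up.
public class EffectSizeSketch {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double sampleVariance(double[] xs) {
        double m = mean(xs), sum = 0;
        for (double x : xs) sum += (x - m) * (x - m);
        return sum / (xs.length - 1);
    }

    static double cohensD(double[] a, double[] b) {
        double pooledVar = ((a.length - 1) * sampleVariance(a)
                + (b.length - 1) * sampleVariance(b))
                / (a.length + b.length - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooledVar);
    }

    public static void main(String[] args) {
        double[] treatment = {12, 15, 14, 10, 13};  // hypothetical measurements
        double[] control   = {9, 11, 10, 8, 12};
        System.out.printf("Cohen's d = %.2f%n", cohensD(treatment, control));
    }
}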
  • 4. [First page of the paper] H. Erdogmus, M. Morisio, M. Torchiano, "On the Effectiveness of the Test-First Approach to Programming", IEEE Transactions on Software Engineering, vol. 31, no. 3, March 2005 – 362 citations.
  • 5. Test-Driven Development • TDD is an agile development practice • Made popular by eXtreme Programming • Adopted widely • Promises: more and better tests, higher external quality, improved productivity
  • 6. Test-Driven Development: findings • Higher number of tests • Impact on quality and productivity • No clear difference in terms of quality (too small tasks, process conformance) • Slight improvement in productivity (more focused, less rework)
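As a concrete illustration of the test-first dynamic investigated in the paper, a minimal JUnit 4 sketch follows; the class and test names are hypothetical, not taken from the experiment materials. The test is written first, then just enough production code is added to make it pass, and the whole suite is rerun as a regression check before the next feature.

// Minimal test-first sketch with JUnit 4 (hypothetical names, not from the
// experiment). Step 1: formalize the next small feature as a failing test.
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ShoppingCartTest {
    @Test
    public void emptyCartHasZeroTotal() {
        assertEquals(0.0, new ShoppingCart().total(), 1e-9);
    }

    @Test
    public void totalIsSumOfItemPrices() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(2.50);
        cart.add(1.25);
        assertEquals(3.75, cart.total(), 1e-9);
    }
}

// Step 2: write just enough production code to make the tests pass, then
// rerun the whole suite (regression testing) before adding the next feature.
class ShoppingCart {
    private double total = 0.0;

    void add(double price) {
        total += price;
    }

    double total() {
        return total;
    }
}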
  • 7. [First pages of two papers] "On the Effectiveness of the Test-First Approach to Programming", IEEE Transactions on Software Engineering, March 2005 – 362 citations. M. Torchiano, M. Morisio, "Overlooked Aspects of COTS-based Development", IEEE Software, 2004 – 158 citations.
  • 8. Development with OTS components • Development with Off-The-Shelf components • Systems are built through integration of existing components • The marketplace influences the evolution of OTS components • Continuous trade-off between system and component features • Strong push from DoD since the late ‘90s • Series of conferences driven by SEI since 2002 • As systems become more complex, all software is based on OTS components • Goal: state of the practice of OTS-based development in EU industry
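One common way integrators manage the trade-off between system and component features is to hide the OTS product behind an interface they own, so that the custom "glue" code is the only place depending on the component's API. The sketch below is a generic illustration under that assumption; the component and its API are hypothetical, not taken from the surveyed projects.

// Generic illustration (hypothetical component and API, not from the survey):
// wrapping an off-the-shelf component behind an integrator-owned interface.

// Interface owned by the integrating system.
interface PdfRenderer {
    byte[] render(String html);
}

// Hypothetical OTS component with its own API.
class ThirdPartyPdfEngine {
    byte[] convertHtmlToPdf(String html, boolean landscape) {
        return new byte[0];  // placeholder behaviour
    }
}

// Adapter: the only code that depends on the OTS product, so the component
// can evolve or be replaced with limited ripple effects on the system.
class ThirdPartyPdfAdapter implements PdfRenderer {
    private final ThirdPartyPdfEngine engine = new ThirdPartyPdfEngine();

    @Override
    public byte[] render(String html) {
        return engine.convertHtmlToPdf(html, false);
    }
}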
  • 9. OTS-Based Development: 6 theses • T1 Open source software is often used as closed source. • T2 Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue. • T3 Custom code mainly provides additional functionalities. • T4 Developers seldom use formal selection procedures; familiarity with either the product or the generic architecture is the key factor in selection. • T5 Architecture is more important than requirements for product selection. • T6 Integrators tend to influence the vendor concerning product evolution whenever possible.
  • 10. [First pages of three papers] "On the Effectiveness of the Test-First Approach to Programming", IEEE Transactions on Software Engineering, 2005 – 362 citations. "Overlooked Aspects of COTS-based Development", IEEE Software, 2004 – 158 citations. M. Torchiano, F. Tomassetti, F. Ricca, A. Tiso, G. Reggio, "Relevance, benefits, and problems of software modelling and model driven techniques – A survey in the Italian industry", The Journal of Systems and Software, 86 (2013) 2110–2126 – 67 citations.
  • 11. MDD State of the Practice • Model-Driven Development is not just modeling • Code generation • Model interpretation • Model transformation • Often, toolsmithing • Promised improvements: productivity, portability, maintainability, interoperability, …
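To make the "not just modeling" point concrete, the snippet below sketches the simplest of the techniques listed above, template-based code generation, turning a tiny in-memory "model" into a Java class skeleton. It is only a sketch under the assumption of a trivial entity model; real MDD tool chains generate code from UML or DSL models and usually chain model-to-model transformations as well.

// Minimal sketch of template-based code generation from a tiny "model"
// (entity name and fields are hypothetical; real MDD tool chains start
// from UML/DSL models rather than an in-memory map).
import java.util.Map;

public class SimpleGenerator {

    /** Generates a Java class skeleton from an entity name and its fields. */
    static String generate(String entity, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(entity).append(" {\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            src.append("    private ").append(field.getValue())
               .append(" ").append(field.getKey()).append(";\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate("Invoice",
                Map.of("number", "String", "total", "double")));
    }
}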
  • 12. MDD Benefits The Journal of Systems and Software 86 (2013) 2110–2126 Contents lists available at SciVerse ScienceDirect The Journal of Systems and Software journal homepage: www.elsevier.com/locate/jss Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry Marco Torchianoa,∗ , Federico Tomassettia , Filippo Riccab , Alessandro Tisob , Gianna Reggiob a Dip. Automatica e Informatica, Politecnico di Torino, Italy b DIBRIS, Università di Genova, Italy a r t i c l e i n f o Article history: Received 23 August 2012 Received in revised form 15 March 2013 Accepted 18 March 2013 Available online 1 April 2013 Keywords: Model driven techniques Software modelling Personal opinion survey a b s t r a c t Context: Claimed benefits of software modelling and model driven techniques are improvements in pro- ductivity, portability, maintainability and interoperability. However, little effort has been devoted at collecting evidence to evaluate their actual relevance, benefits and usage complications. Goal: The main goals of this paper are: (1) assess the diffusion and relevance of software modelling and MD techniques in the Italian industry, (2) understand the expected and achieved benefits, and (3) identify which problems limit/prevent their diffusion. Method: We conducted an exploratory personal opinion survey with a sample of 155 Italian software professionals by means of a Web-based questionnaire on-line from February to April 2011. Results: Software modelling and MD techniques are very relevant in the Italian industry. The adoption of simple modelling brings common benefits (better design support, documentation improvement, better maintenance, and higher software quality), while MD techniques make it easier to achieve: improved standardization, higher productivity, and platform independence. We identified problems, some hinder- ing adoption (too much effort required and limited usefulness) others preventing it (lack of competencies and supporting tools). Conclusions: The relevance represents an important objective motivation for researchers in this area. The relationship between techniques and attainable benefits represents an instrument for practitioners planning the adoption of such techniques. In addition the findings may provide hints for companies and universities. © 2013 Elsevier Inc. All rights reserved. 1. Introduction Usually, model-based techniques use models to describe the architecture and design of a system and/or the behaviour of soft- ware artefacts generally through a raise in the level of abstraction (France and Rumpe, 2003). Models can also be used for other purposes, e.g., to describe business workflows or development pro- cesses. They can be used in different phases of the development process as communication artefacts, as points of references against which subsequent implementations are verified, or as the basis for further development. In the latter case, they may become the key elements in the process, from which other artefacts (most notably code) are generated. ∗ Corresponding author at: Dip. Automatica e Informatica, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy. Tel.: +39 011 564 7088; fax: +39 011 564 7099. E-mail address: marco.torchiano@polito.it (M. Torchiano). The term “model” is very general; it is difficult to provide a com- prehensive yet precise definition of a model. 
For us, a model is an artefact realized with the goal to capture and transfer information in a form as pure as possible limiting the syntax noise. Exam- ples of models include UML design models (Object Management Group, 2011b), process models defined through BPMN (Object Management Group, 2011a), Web application models defined through WebML (Ceri et al., 2000) as well as textual Domain Specific Language (DSL) models (Fowler and Parsons, 2011). Given that models are so heterogeneous and the processes involving them so varied, it is actually difficult to define the exact boundaries for modelling. Moreover, models can be used practically in different ways and with different levels of maturity (Tomassetti et al., 2012). While in theory we have a set of best practices, in practice we only found mixed, half baked, and personalized solu- tions. As a consequence, the different uses of modelling are complex and difficult to classify precisely. When models constitute a crucial and operational ingredi- ent of software development then the process is called model driven. There are different model driven techniques: Model Driven 0164-1212/$ –see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jss.2013.03.084 2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126 Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. Oxford University Press, New York. Nugroho, A., Chaudron, M.R.,2008. A survey into the rigor of UML use and its perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM’08. ACM, New York, NY, USA, pp. 90–99. Object Management Group, 2011a. Business Process Model and Notation, v. 2.0, Standard. Object Management Group. Object Management Group, 2011b. Unified Modeling Language, Superstructure, v. 2.4, Specifications. Object Management Group. Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys in software engineering. In: Proceedings of the International Symposium on Empirical Software Engineering (ISESE), pp. 80–88. Schmidt, D.C., 2006. Guest editor’s introduction: model-driven engineering. Com- puter 39, 25–31. Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20, 19–25. Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empir- ical Software Engineering. Springer, London, pp. 285–311. Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software modelling and model driven engineering: a survey in the Italian industry. IET Seminar Digests 2012, 91–100. Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration of information systems in the Italian industry: a state of the practice survey. Information and Software Technology 53, 71–86. Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G.,2011b. Preliminary findings from a survey on the MD* state of the practice. In: International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375. van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a research agenda. Technical Report, TUDelft-SERG. Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of Object Technology 8 http://www.jot.fm Walonick, D.S., 1997. Survival Statistics. StatPac, Inc. 
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000. Experimentation in Software Engineering – An Introduction. Kluwer Academic Publishers.

Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development: Case Studies in Java' from Addison-Wesley, and co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development and software engineering for mobile and wireless applications. The methodological approach he adopts is that of empirical software engineering.

Federico Tomassetti is a PhD student in the Dept. of Control and Computer Engineering at Politecnico di Torino. He is interested in all the means to lift the level of abstraction, in particular in model-driven development, domain specific languages and language workbenches. He also has experience as a developer, consultant and technical writer.
Filippo Ricca is an assistant professor at the University of Genova, Italy. He received his PhD degree in Computer Science from the same University, in 2003, with the thesis "Analysis, Testing and Re-structuring of Web Applications". In 2011 he was awarded the ICSE 2001 MIP (Most Influential Paper) award for his paper "Analysis and Testing of Web Applications". He is author or coauthor of more than 90 research papers published in international journals and conferences/workshops. At the time of writing (20/03/2013), the h-index reported by Google Scholar is 24 (13 in Scopus) and the number of citations is 2002. Filippo Ricca was Program Chair of CSMR 2013, ICPC 2011 and WSE 2008. Among others, he served in the program committees of the following conferences: ICSM, ICST, SCAM, CSMR, WCRE and ESEM.
From 1999 to 2006, he worked with the Software Engineering group at ITC-irst (now FBK-irst), Trento, Italy. During this time he was part of the team that worked on Reverse engineering, Re-engineering and Software Testing. His current research interests include: Software modeling, Reverse engineering, Empirical studies in Software Engineering, Web applications and Software Testing. The research is mainly conducted through empirical methods such as case studies, controlled experiments and surveys.

Alessandro Tiso is a PhD student in Computer Science at the University of Genoa. He is also a teacher of informatics and applied mathematics in a high school. His main research interests are: model driven development, model transformations and the use of models in software engineering. In the last 20 years he also worked as an IT consultant in industry.

Gianna Reggio is Associate Professor at the University of Genova since 1992. Previously she was Assistant Professor at the same university; she received her PhD in Informatics in 1988. She took part in several research projects, among them SAHARA and SALADIN (PRIN), and co-organized several international conferences and workshops, the most relevant being ETAPS 2001 and MODELS 2006. She has been a member of the program committee of several international conferences, in particular MODELS (in 2009, 2008, and 2007). Her research activity focused mainly on the following topics: software development methods, concurrent systems modeling and formal specification, and semantics of programming languages. Recently she started working on development methods for SOA applications. She is author of several national and international conference and journal papers and books.
  • 13. Current Research Topics
    • Data Quality
    • Testing of Mobile App UI
  • 14. Data Quality
    • Data is a key component in many domains
    • Low quality makes the outcomes useless or outright dangerous
    • Data is not (anymore) just a part of software
    • New ISO quality standards: ISO/IEC 25012 and ISO/IEC 25024
    • Challenges (focus on intrinsic characteristics):
      • Practical application of quality measures to concrete cases (Open Data), see the sketch below
      • Quality assessment of unstructured data (NoSQL, semantic KBs)
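To make the quality-measure idea concrete, here is a minimal sketch, assuming a local CSV file and two intrinsic indicators loosely in the spirit of ISO/IEC 25024: completeness (share of non-empty cells) and syntactic accuracy of one column against an expected date format. The file name, column name, and date pattern are illustrative assumptions, not taken from the studies referenced in these slides.

    import csv
    import re

    # Minimal sketch: two intrinsic data-quality indicators for a tabular dataset.
    # File name, column name, and the expected date pattern are assumptions.
    DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # expect ISO-8601 dates

    def quality_indicators(path: str, date_column: str) -> dict:
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        total_cells = sum(len(row) for row in rows)
        filled = sum(1 for row in rows for v in row.values() if v and v.strip())
        dates = [row.get(date_column) or "" for row in rows]
        well_formed = sum(1 for d in dates if DATE_RE.match(d))
        return {
            "completeness": filled / total_cells if total_cells else 0.0,
            "date_syntactic_accuracy": well_formed / len(dates) if dates else 0.0,
        }

    if __name__ == "__main__":
        # hypothetical input file and column name
        print(quality_indicators("municipal_budget.csv", "payment_date"))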
  • 15. Data Quality: practical application
    • Open Government Data
      • Defined a set of metrics from the literature
      • Analyzed + municipal data
      • Able to identify quality issues
      • Different issues depending on the disclosure process
    • Technical specification: UNI/TS 11725:2018
      • Guidelines for the measurement of data quality
      • As a member of UNI/CT 504 (Software Engineering)

Open data quality measurement framework: Definition and application to Open Government Data
Antonio Vetrò, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas, Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Government Information Quarterly 33 (2016) 325–337

Abstract. The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposedly lower quality. Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.

Keywords: Open Government Data; Open data quality; Government information quality; Data quality measurement; Empirical assessment

1. Introduction

Open data are data that "can be freely used, modified, and shared by anyone for any purpose" (Web ref. 1). Compared to proprietary frameworks, digital commons such as open data are characterized, from both a legal and a technical point of view, by lower restrictions applied to their circulation and reuse. This feature is supposed to ultimately foster collaboration, creativity and innovation (Hofmokl, 2010). At all administrative levels, the public sector is one of the major producers and holders of information, which ranges, e.g., from maps to companies registers (Aichholzer & Burkert, 2004). During the last years, the amount and variety of open data released by public administrations across the world has been tangibly growing (see, e.g., the Open Data Census by the Open Knowledge Foundation (Web ref. 2)), while increased political awareness on the subject has been translated into regulation, including the 2013 revision of the EU Directive on Public Sector Information reuse, as well as national roadmaps and technical guidelines.
Releasing public sector information as open data can provide considerable added value, meeting a demand coming from all kinds of actors, ranging from companies to Non-Governmental Organizations, from developers to ordinary citizens. Many suggest that wider and easier circulation of public datasets could entail interesting (and even unexpected) forms of reuse, also for commercial purposes (Vickery, 2011), and in general improve transparency of public institutions (Stiglitz, Orszag, & Orszag, 2000; Ubaldi, 2013) and the distributed ability to interpret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012). From the point of view of data reusers, we should take into account the role of the so-called infomediaries, i.e., players that are able to interpret data and present them effectively to the general public (Mayer-Schönberger & Zappia, 2011), with a highly diversified set of business models (Zuiderwijk, Janssen, & Davis, 2014). More in general, open data consumers manipulate data in many ways, ranging, e.g., from data integration to classification, also depending on the complementary assets they hold (Ferro & Osella, 2013). Considering this, the legal and technical openness of datasets is not sufficient, by itself, to create a prolific reuse ecosystem (Helbig, Nakashima, & Dawe, 2012): failures in providing good quality information might impair not only the reuse of the data, but also the usage of the institutional portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Attempts to increase the meaningfulness and reusability of public sector information also imply representing and exposing data so that they can be easily accessed, queried, processed and linked with other data with no restrictions (Sharon, 2010). Although sharing […]

References
Barlett, D.L., & Steele, J.B. (1985). Forevermore: Nuclear waste in America.
Basili, V., Caldiera, G., & Rombach, H.D. (1994). The goal question metric approach. In: Encyclopedia of Software Engineering. John Wiley & Sons, 528–532.
Batini, C., & Scannapieco, M. (2006). Data quality: Concepts, methodologies and techniques. Berlin: Springer.
Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), Article 16.
Behkamal, B., Kahani, M., Bagheri, E., & Jeremic, Z. (2014). A metrics-driven approach for quality assessment of linked open data. Journal of Theoretical and Applied Electronic Commerce Research, 9(2), 64–79.
Berners-Lee, T. (2006). Linked data: Design issues. Tech. rep., W3C. http://www.w3.org/DesignIssues/LinkedData.html
Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function approach to assessing overall information quality. International Journal of Intelligent Systems, 51–74.
Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal data consumers. World Wide Web, 11(4), 465–484.
Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning: A pragmatic approach using linked data. International Conference for E-Democracy and Open Government 2015 (CeDEM15), Krems, Austria.
Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspective. Archivi Computer, 2.
Detlor, B., Hupfer, M.E., Ruhi, U., & Zhao, L. (2013). Information quality and community municipal portal use. Government Information Quarterly, 30(1), 23–32. http://dx.doi.org/10.1016/j.giq.2012.08.00 (ISSN 0740-624X).
English, L. (1999). Improving data warehouse and business information quality. Wiley & Sons.
Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality management. The Journal of Computer Information Systems, 50(2), 127–135.
Ferro, E., & Osella, M. (2013). Eight business model archetypes for PSI re-use. "Open Data on the Web" workshop, Google Campus, London.
Haug, A., Pedersen, A., & Arlbjørn, J.S. (2009). A classification model of ERP system data quality. Industrial Management & Data Systems, 109(8), 1053–1068. http://dx.doi.org/10.1108/02635570910991292
Heinrich, B. (2002). Datenqualitätsmanagement in Data Warehouse-Systemen. Doctoral thesis, Oldenburg.
Heinrich, B., Klier, M., & Kaiser, M. (2009). A procedure to develop metrics for currency and its application in CRM. Journal of Data and Information Quality, 1(1), 5.
Helbig, N., Nakashima, M., & Dawe, Sharon S. (2012). Understanding the value and limits of government information in policy informatics: A preliminary exploration. Proceedings of the 13th Annual International Conference on Digital Government Research (dg.o 2012), June 4–7, 2012.
Hofmokl, J. (2010). The Internet commons: Toward an eclectic theoretical framework. International Journal of the Commons, 4(1), 226–250.
Iemma, R., Morando, F., & Osella, M. (2014). Breaking public administrations' data silos. eJournal of eDemocracy & Open Government, 6(2).
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and myths of open data and open government. Information Systems Management, 29(4), 258–268.
Vassiliou, Y. (1995). In M. Jarke, M. Lenzerini, & P. Vassiliadis (Eds.), Fundamentals of data warehouses. Springer Verlag.
Jeusfeld, M., Quix, C., & Jarke, M. (1998). Design and analysis of quality information for data warehouses. Proceedings of the 17th International Conference on Conceptual Modeling.
Kaiser, M., Klier, M., & Heinrich, B. (2007). How to measure data quality? A metric-based approach. ICIS 2007 Proceedings, Paper 108.
Kim, W. (2002). On three major holes in data warehousing today. Journal of Object Technology, 1(4), 39–47. http://dx.doi.org/10.5381/jot.2002.1.4.c3
Kuk, G., & Davies, T. (2011). The roles of agency and artifacts in assembling open data complementarities. Thirty Second International Conference on Information Systems.
Madnick, S., Wang, R., & Xian, X. (2004). The design and implementation of a corporate householding knowledge processor to improve data quality. Journal of Management Information Systems, 20(1), 41–49.
Maurino, A., Spahiu, B., Batini, C., & Viscusi, G. (2014). Compliance with Open Government Data policies: An empirical evaluation of Italian local public administrations. Twenty Second European Conference on Information Systems, Tel Aviv.
Mayer-Schönberger, V., & Zappia, Z. (2011). Participation and power: Intermediaries of open data. 1st Berlin Symposium on Internet and Society.
Moraga, C., Moraga, M., Calero, C., & Caro, A. (2009). SQuaRE-aligned data quality model for web portals. Quality Software, 2009 (QSIC'09), 9th International Conference on, pp. 117–122.
Naumann, F. (2002). Quality-driven query answering for integrated information systems. Lecture Notes in Computer Science, 2261.
Redman, T. (1996). Data quality for the information age. Artech House.
Reiche, K., & Hofig, E. (2013). Implementation of metadata quality metrics and application on public government data. Computer Software and Applications Conference Workshops (COMPSACW), 2013 IEEE 37th Annual, pp. 236–241.
Sachs, L. (1982). Applied statistics: A handbook of techniques. New York, Heidelberg, Berlin: Springer-Verlag.
Sande, M.V., Dimou, A., Colpaert, P., Mannens, E., & Van de Walle, R. (2013). Linked data as enabler for open data ecosystems. Open Data on the Web, 23–24 April 2013, Campus London, Shoreditch.
Sharon, Dawes S. (2010). Stewardship and usefulness: Policy principles for information-based transparency. Government Information Quarterly, 27(4), 377–383.
Stiglitz, J.E., Orszag, P.R., & Orszag, J.M. (2000). The role of government in a digital age. Commissioned by the Computer & Communications Industry Association.
Tauberer, J. (2012). Open government data: The book.
Ubaldi, B. (2013). Open government data: Towards empirical analysis of open government data initiatives. Tech. rep., OECD Publishing.
Umbrich, J., Neumaier, S., & Polleres, A. (2015). Towards assessing the quality evolution of Open Data portals. ODQ2015: Open Data Quality: From Theory to Practice workshop, Munich, Germany.
Van de Sompel, H., Sanderson, R., Nelson, M.L., Balakireva, L.L., Shankar, H., & Ainsworth, S. (2010). An HTTP-based versioning mechanism for linked data. arXiv preprint arXiv:1003.3661.
Vickery, G. (2011). Review of recent studies on PSI re-use and related market developments. Paris: Information Economics.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39, 11.
Wang, R., & Strong, D. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12, 4.
Whitmore, A. (2014). Using open government data to predict war: A case study of data and systems challenges. Government Information Quarterly, 31(4), 622–630.
Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential elements of open data ecosystems. Information Policy, 19(1), 17–33.
Zuiderwijk, A., & Janssen, M. (2015). Participation and data quality in open data use: Open data infrastructures evaluated. Proceedings of the 15th European Conference on eGovernment 2015 (ECEG 2015). Academic Conferences Limited.

Web references
Web ref. 1. http://opendefinition.org/ (last visited on November 5, 2015).
Web ref. 2. http://census.okfn.org/ (last visited on November 5, 2015).
Web ref. 3. http://goo.gl/QlUT1d (last visited on November 5, 2015).
Web ref. 4. https://goo.gl/yq4ElP (last visited on November 5, 2015).
Web ref. 5. http://sunlightfoundation.com/blog/2010/06/23/elenas-inbox/ (last visited on November 5, 2015).
Web ref. 6. http://nexa.polito.it/lunch-9 (last visited on November 5, 2015).
Web ref. 7. http://okfnlabs.org/bad-data/ (last visited on November 5, 2015).
Web ref. 8. http://www.dati.gov.it/dataset (last visited on November 5, 2015).
Web ref. 9. https://www.opengovawards.org/Awards_Booklet_Final.pdf (last visited on November 5, 2015).
Web ref. 10. http://en.wikipedia.org/wiki/European_Social_Fund (last visited on November 5, 2015).
Web ref. 11. http://www.dati.gov.it/ (last visited on November 5, 2015).
Web ref. 12. http://blog.okfn.org/2013/07/02/git-and-github-for-data/ (last visited on November 5, 2015).
Web ref. 13. http://dat-data.com/ (last visited on November 5, 2015).
Web ref. 14. https://code.google.com/p/google-refine/ (last visited on November 5, 2015).
Web ref. 15. http://datacleaner.org/ (last visited on November 5, 2015).
Web ref. 16. http://checklists.opquast.com/en/opendata (last visited on November 5, 2015).
Web ref. 17. http://odaf.org/papers/Open%20Data%20and%20Metadata%20Standards.pdf (last visited on November 5, 2015).

Antonio Vetrò is Director of Research and Policy at the Nexa Center for Internet and Society at Politecnico di Torino (Italy). Formerly, he has been a research fellow in the Software and System Engineering Department at Technische Universität München (Germany) and a junior scientist at the Fraunhofer Center for Experimental Software Engineering (MD, U.S.A.). He holds a PhD in Information and System Engineering from Politecnico di Torino (Italy). He is interested in studying the impact of technology on society, with a focus on technology transfer and internet science, adopting an empirical epistemological approach. Contact him at antonio.vetro@polito.it.

Lorenzo Canova holds an MSc in Industrial Engineering from Politecnico di Torino and wrote his Master thesis on Open Government Data quality. He is currently a research fellow of the Nexa Center for Internet & Society at Politecnico di Torino. His main research interests are in Open Government Data, government transparency and linked data. Contact him at lorenzo.canova@polito.it.
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is an author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development: Case Studies in Java' from Addison-Wesley, and the co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and static analysis. The methodological approach he adopts is that of empirical software engineering. Contact him at marco.torchiano@polito.it.

Camilo O. Minotas holds an MSc from Politecnico di Torino and wrote his Master thesis on Open Data Quality Assessment. He is currently a software developer in Colombia. His main research interests are in open data quality and government transparency. Contact him at minotas@gmail.com.

Raimondo Iemma holds an MSc in Industrial Engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, the economics of information and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.

Federico Morando (M.A.S. in Economic Theory and Econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist, with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectures at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
Project code: UNI 1602662
Standard number: UNI/TS 11725:2018
Italian title: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
English title: System and software engineering - Guidelines for the measurement of data quality
Responsible body: UNI/CT 504 Software Engineering
Summary: The technical specification defines how to apply the measurements of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality" through a comparison with current practices and theoretical insights. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality; the measure definitions are general, so that they can be applied to different specific cases.
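As a rough illustration of the dataset-level indicators that such a quality framework can include, here is a minimal sketch, assuming two hypothetical dataset metadata records and two simple dimensions (metadata completeness and currency). The field names, records, and the one-year currency window are assumptions for illustration, not the indicators defined in the study excerpted above or in UNI/TS 11725.

    from datetime import date

    # Illustrative sketch only: two dataset-level indicators loosely inspired by
    # common OGD quality dimensions. Records, field names, and thresholds are made up.
    EXPECTED_METADATA = ("title", "description", "license", "publisher", "modified")

    def metadata_completeness(record: dict) -> float:
        """Share of expected metadata fields that are present and non-empty."""
        filled = sum(1 for field in EXPECTED_METADATA if record.get(field))
        return filled / len(EXPECTED_METADATA)

    def currency(record: dict, today: date, max_age_days: int = 365) -> float:
        """1.0 if updated today, decaying linearly to 0 over max_age_days."""
        age = (today - date.fromisoformat(record["modified"])).days
        return max(0.0, 1.0 - age / max_age_days)

    if __name__ == "__main__":
        central = {"title": "Public contracts", "description": "...", "license": "CC-BY 4.0",
                   "publisher": "National portal", "modified": "2018-09-30"}
        municipal = {"title": "Public contracts", "license": "", "modified": "2016-01-15"}
        for name, record in (("centralized", central), ("municipal", municipal)):
            print(name, round(metadata_completeness(record), 2),
                  round(currency(record, date(2018, 10, 19)), 2))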
  • 16. Data Quality: unstructured data
    • Evolution measures based on simple counts (see the sketch after the excerpt below)

      SELECT (COUNT(DISTINCT ?s) AS ?count)
      WHERE { ?s a <C> . }

A Quality Assessment Approach for Evolving Knowledge Bases
Mohammad Rifat Ahmmad Rashid (Politecnico di Torino, Italy), Marco Torchiano (Politecnico di Torino, Italy), Giuseppe Rizzo (Istituto Superiore Mario Boella, Italy), Nandana Mihindukulasooriya (Universidad Politécnica de Madrid, Spain), Oscar Corcho (Universidad Politécnica de Madrid, Spain)
Semantic Web 0 (0) 1–0, IOS Press

Abstract. Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation. Despite its advantages, there are challenges in using the RDF data model, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our quality characteristics are based on KB evolution analysis, and we used high-level change detection for the measurement functions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. The Persistency and Historical Persistency measures concern the degree of change and the lifespan of any entity type. The Consistency and Completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs; it is highly effective in the case of a KB with periodic updates, such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.

Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data

1. Introduction

The Linked Data approach consists in exposing and connecting data from different sources on the Web by means of semantic web technologies. Tim Berners-Lee (http://www.w3.org/DesignIssues/LinkedData.html) refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine-readable format, and to meaningfully link them with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD) cloud (http://lod-cloud.net), hosting several Knowledge Bases (KBs) making available billions of RDF triples from different domains such as Geography […]
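To illustrate the "simple counts" idea behind these evolution-based measures, here is a minimal sketch, assuming two SPARQL endpoints exposing consecutive KB releases: it runs the class-count query above against both releases and flags classes whose instance count shrinks, a rough proxy for the Persistency check described in the abstract. The endpoint URLs and class IRIs are placeholders, and the SPARQLWrapper library is an assumed dependency, not necessarily what the authors used.

    # Rough sketch of an evolution measure based on simple counts, in the spirit of
    # the Persistency characteristic described above. Endpoint URLs and class IRIs
    # are placeholders; SPARQLWrapper (pip install sparqlwrapper) is an assumed choice.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def count_entities(endpoint: str, class_iri: str) -> int:
        """Count distinct subjects typed with class_iri at the given endpoint."""
        sparql = SPARQLWrapper(endpoint)
        sparql.setQuery(
            f"SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE {{ ?s a <{class_iri}> . }}"
        )
        sparql.setReturnFormat(JSON)
        result = sparql.query().convert()
        return int(result["results"]["bindings"][0]["n"]["value"])

    def persistency_flags(old_endpoint: str, new_endpoint: str, classes: list) -> dict:
        """Flag classes whose instance count decreased between two releases."""
        report = {}
        for class_iri in classes:
            old_n = count_entities(old_endpoint, class_iri)
            new_n = count_entities(new_endpoint, class_iri)
            report[class_iri] = {"old": old_n, "new": new_n, "persistent": new_n >= old_n}
        return report

    if __name__ == "__main__":
        print(persistency_flags(
            "http://example.org/kb/release-1/sparql",  # placeholder endpoints
            "http://example.org/kb/release-2/sparql",
            ["http://schema.org/Event", "http://schema.org/Place"],
        ))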
  • 17. Mobile UI Testing
    • K-9 Mail: version 2.995 vs. version 3.309