Research Activities: past, present, and future

Marco Torchiano
Associate Professor

Public Seminar
Selection procedure for a tenured full professor (professore di ruolo di I fascia)
S.C. 09/H1 - S.S.D. ING-INF/05 - internal code 03/18/P/O
October 19, 2018

Empirical Software Engineering
Survey • Case study • Case-control • Experiment

Package effsize
2.2K downloads/month
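The effsize package provides standardized effect-size measures such as Cohen's d and Cliff's delta for reporting empirical results. The package itself is written in R; the sketch below is a minimal Python illustration of the same pooled-standard-deviation computation of Cohen's d, not the package's actual implementation, and the sample data is invented.

```python
import math

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled
    (unbiased) standard deviation of the two groups."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical scores for a treatment and a control group
treatment = [12.0, 14.0, 13.5, 15.0, 14.5]
control = [10.0, 11.0, 10.5, 12.0, 11.5]
print(round(cohens_d(treatment, control), 2))  # ≈ 2.84, a large effect
```

By Cohen's conventional thresholds, |d| ≈ 0.2 is a small effect, 0.5 medium, and 0.8 large.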
On the Effectiveness of the
Test-First Approach to Programming
Hakan Erdogmus, Maurizio Morisio, Member, IEEE Computer Society, and
Marco Torchiano, Member, IEEE Computer Society
Abstract—Test-Driven Development (TDD) is based on formalizing a piece of functionality as a test, implementing the functionality
such that the test passes, and iterating the process. This paper describes a controlled experiment for evaluating an important aspect of
TDD: In TDD, programmers write functional tests before the corresponding implementation code. The experiment was conducted with
undergraduate students. While the experiment group applied a test-first strategy, the control group applied a more conventional
development technique, writing tests after the implementation. Both groups followed an incremental process, adding new features one
at a time and regression testing them. We found that test-first students on average wrote more tests and, in turn, students who wrote
more tests tended to be more productive. We also observed that the minimum quality increased linearly with the number of
programmer tests, independent of the development strategy employed.
Index Terms—General programming techniques, coding tools and techniques, testing and debugging, testing strategies, productivity,
Software Quality/SQA, software engineering process, programming paradigms.
1 INTRODUCTION
TEST-DRIVEN Development [1], [2], or TDD, is an approach
to code development popularized by Extreme Programming [3], [4]. The key aspect of TDD is that programmers
write low-level functional tests before production code. This
dynamic, referred to as Test-First, is the focus of this paper.
The technique is usually supported by a unit testing tool,
such as JUnit [5], [6].
Programmer tests in TDD, much like unit tests, tend to
be low-level. Sometimes they can be higher level and cross-
cutting, but do not in general address testing at integration
and system levels. TDD relies on no specific or formal
criterion to select the test cases. The tests are added
gradually during implementation. The testing strategy
employed differs from the situation where tests for a
module are written by a separate quality assurance team
before the module is available. In TDD, tests are written one
at a time and by the same person developing the module.
(Cleanroom [7], for example, also leverages tests that are
specified before implementation, but in a very different
way. Tests are specified to mirror a statistical user input
distribution as part of a formal reliability model, but this
activity is completely separated from the implementation
phase.)
TDD can be considered from several points of view:
. Feedback: Tests provide the programmer with
instant feedback as to whether new functionality
has been implemented as intended and whether it
interferes with old functionality.
. Task-orientation: Tests drive the coding activity,
encouraging the programmer to decompose the
problem into manageable, formalized programming
tasks, helping to maintain focus and providing
steady, measurable progress.
. Quality assurance: Having up-to-date tests in place
ensures a certain level of quality, maintained by
frequently running the tests.
. Low-level design: Tests provide the context in which
low-level design decisions are made, such as which
classes and methods to create, how they will be
named, what interfaces they will possess, and how
they will be used.
Although advocates claim that TDD enhances both
product quality and programmer productivity, skeptics
maintain that such an approach to programming would be
both counterproductive and hard to learn. It is easier to
imagine how interleaving coding with testing could
increase quality, but harder to imagine a similar positive
effect on productivity. Many practitioners consider tests as
overhead and testing as exclusively the job of a separate
quality assurance team. But, what if the benefits of
programmer tests outweigh their cost? If the technique
indeed has merit, a reexamination of the traditional coding
techniques could be warranted.
This paper presents a formal investigation of the
strengths and weaknesses of the Test-First approach to
programming, the central component of TDD. We study
this component in the context in which it is believed to be
most effective, namely, incremental development. To
undertake the investigation, we designed a controlled
226 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 31, NO. 3, MARCH 2005
. H. Erdogmus is with the Institute for Information Technology, National
Research Council Canada, Bldg. M-50, Montreal Rd., Ottawa, Ontario
K1A 0R6, Canada. E-mail: Hakan.Erdogmus@nrc-cnrc.gc.ca.
. M. Morisio and M. Torchiano are with the Dipartimento Automatica e
Informatica, Politecnico di Torino, C.so Duca degli Abruzzi, 24, I-10129,
Torino, Italy. E-mail: {Maurizio.Morisio, Marco.Torchiano}@polito.it.
Manuscript received 13 Feb. 2004; revised 6 Dec. 2004; accepted 31 Dec. 2004;
published online 20 Apr. 2005.
Recommended for acceptance by D. Rombach.
For information on obtaining reprints of this article, please send e-mail to:
tse@computer.org, and reference IEEECS Log Number TSE-0026-0204.
0098-5589/05/$20.00 © 2005 IEEE Published by the IEEE Computer Society
362 citations
Test-Driven Development
• TDD is an agile development practice
• Made popular by eXtreme Programming
• Adopted widely
• Promises:
• More and better tests
• Higher external quality
• Improved productivity
Test-Driven Development: findings
• Higher number of tests
• Impact on quality and productivity
• No clear difference in terms of quality
  • tasks were too small
  • process conformance issues
• Slight improvement in productivity
  • more focused work
  • less rework
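The test-first cycle the experiment evaluates can be made concrete with a small example: formalize the next feature as a failing test, write just enough production code to make it pass, then repeat while regression-testing earlier features. The original experiment used Java and JUnit; this sketch uses plain Python asserts to stay self-contained, and the `Account` class is invented purely for illustration.

```python
# Step 1 (test first): specify the next feature as a test before any
# production code exists. Run now, it would fail (Account is undefined).
def test_deposit_increases_balance():
    acc = Account()
    acc.deposit(50)
    assert acc.balance == 50

# Step 2: write just enough production code to make the test pass.
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

# Step 3: run the test suite (the "green" step), then loop back to
# step 1 with the next small feature, keeping all earlier tests passing.
test_deposit_increases_balance()
print("all tests pass")
```

Each iteration leaves behind an executable regression suite, which is the mechanism behind the feedback and quality-assurance views of TDD listed in the paper.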
Overlooked Aspects of COTS-based Development

Marco Torchiano, Norwegian University of Science and Technology
Maurizio Morisio, Politecnico di Torino

Studies on commercial-off-the-shelf-based development often disagree, lack product and project details, and are founded on uncritically accepted assumptions. This empirical study establishes key features of industrial COTS-based development and creates a definition of "COTS products" that captures these features.

Although developing with commercial-off-the-shelf components is gaining more attention from both research and industrial communities,1 most literature on the topic doesn't clearly identify context variables such as the type of products, projects, and systems. In particular, the literature often lacks a definition of "COTS product," or, when a definition is present, it invariably disagrees with other studies (see the sidebar, "What COTS Product Do You Mean?"). A shared definition would improve discourse and enable researchers to meta-analyze published empirical data.

After a speculative effort to define and classify COTS products,2 we decided to obtain this much-needed understanding from the bottom up. We asked people involved in industrial projects to name the key features in COTS-based development and to tell us what they think constitutes a COTS product. Here, we present the interview results in the form of six theses, which contradict widely accepted (or simply undisputed) ideas. We also present a definition of "COTS product" that captures the key features.

The sample of projects we used is limited and can't represent the whole IT industry. Instead, our study primarily considers small and medium enterprises (SMEs). While SMEs produce most software, they're usually neglected by research. Specifically, the COTS-development literature seems to focus on large, mostly governmental projects performed in major companies. Our goal is to empirically explore this key topic for the software industry, collect facts, and derive well-substantiated theses from these facts.

The groundwork
From a methodological viewpoint, we conducted empirical research following two parallel threads that complement each other: a systematic literature investigation and an on-field exploratory study based on structured interviews. Because the industry doesn't have a common understanding of COTS products, we laid

2 IEEE SOFTWARE Published by the IEEE Computer Society 0740-7459/04/$20.00 © 2004 IEEE
158 citations
Development with OTS components
• Development with Off-The-Shelf components
• Systems are built through integration of existing components
• The marketplace influences the evolution of OTS components
• Continuous trade-off between system and component features
• Strong push from the DoD since the late '90s
• Series of conferences driven by the SEI since 2002
• As systems become more complex, all software is based on OTS components
• Goal: state of the practice of OTS-based development in EU industry
OTS-Based Development: 6 theses
T1: Open source software is often used as closed source.
T2: Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue.
T3: Custom code mainly provides additional functionalities.
T4: Developers seldom use formal selection procedures. Familiarity with either the product or the generic architecture is the key factor in selection.
T5: Architecture is more important than requirements for product selection.
T6: Integrators tend to influence the vendor concerning product evolution whenever possible.
Thanks to:
The Journal of Systems and Software 86 (2013) 2110–2126
journal homepage: www.elsevier.com/locate/jss

Relevance, benefits, and problems of software modelling and model driven
techniques—A survey in the Italian industry
Marco Torchiano (a), Federico Tomassetti (a), Filippo Ricca (b), Alessandro Tiso (b), Gianna Reggio (b)
(a) Dip. Automatica e Informatica, Politecnico di Torino, Italy
(b) DIBRIS, Università di Genova, Italy

Article history: received 23 August 2012; revised 15 March 2013; accepted 18 March 2013; available online 1 April 2013.
Keywords: model driven techniques; software modelling; personal opinion survey

Abstract
Context: Claimed benefits of software modelling and model driven techniques are improvements in productivity, portability, maintainability and interoperability. However, little effort has been devoted to collecting evidence to evaluate their actual relevance, benefits and usage complications.
Goal: The main goals of this paper are: (1) assess the diffusion and relevance of software modelling and MD techniques in the Italian industry, (2) understand the expected and achieved benefits, and (3) identify which problems limit/prevent their diffusion.
Method: We conducted an exploratory personal opinion survey with a sample of 155 Italian software professionals by means of a Web-based questionnaire, on-line from February to April 2011.
Results: Software modelling and MD techniques are very relevant in the Italian industry. The adoption of simple modelling brings common benefits (better design support, documentation improvement, better maintenance, and higher software quality), while MD techniques make it easier to achieve improved standardization, higher productivity, and platform independence. We identified problems, some hindering adoption (too much effort required and limited usefulness), others preventing it (lack of competencies and supporting tools).
Conclusions: The relevance represents an important objective motivation for researchers in this area. The relationship between techniques and attainable benefits represents an instrument for practitioners planning the adoption of such techniques. In addition the findings may provide hints for companies and universities.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Usually, model-based techniques use models to describe the architecture and design of a system and/or the behaviour of software artefacts, generally through a raise in the level of abstraction (France and Rumpe, 2003). Models can also be used for other purposes, e.g., to describe business workflows or development processes.
They can be used in different phases of the development process as communication artefacts, as points of reference against which subsequent implementations are verified, or as the basis for further development. In the latter case, they may become the key elements in the process, from which other artefacts (most notably code) are generated.
The term "model" is very general; it is difficult to provide a comprehensive yet precise definition of a model. For us, a model is an artefact realized with the goal to capture and transfer information in a form as pure as possible, limiting the syntax noise. Examples of models include UML design models (Object Management Group, 2011b), process models defined through BPMN (Object Management Group, 2011a), Web application models defined through WebML (Ceri et al., 2000), as well as textual Domain Specific Language (DSL) models (Fowler and Parsons, 2011).
Given that models are so heterogeneous and the processes involving them so varied, it is actually difficult to define the exact boundaries for modelling. Moreover, models can be used practically in different ways and with different levels of maturity (Tomassetti et al., 2012). While in theory we have a set of best practices, in practice we only found mixed, half-baked, and personalized solutions. As a consequence, the different uses of modelling are complex and difficult to classify precisely.
When models constitute a crucial and operational ingredient of software development, then the process is called model driven. There are different model driven techniques: Model Driven

* Corresponding author at: Dip. Automatica e Informatica, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy. Tel.: +39 011 564 7088; fax: +39 011 564 7099. E-mail address: marco.torchiano@polito.it (M. Torchiano).
http://dx.doi.org/10.1016/j.jss.2013.03.084
67 citations
MDD State of the Practice
• Model-Driven Development is not just modeling
• Code generation
• Model interpretation
• Model transformation
• Often, toolsmithing
• Promised improvements:
• Productivity
• Portability
• Maintainability and
• Interoperability
• …
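The gap between "just modeling" and model-driven development can be illustrated with a minimal model-to-text transformation. This is a hypothetical sketch in plain Python (real MDD toolchains such as those covered by the survey operate on full metamodels and template engines); the `model` dictionary and `generate_class` function are illustrative names, not part of any surveyed tool:

```python
# Hypothetical sketch of MDD-style code generation: a declarative model of an
# entity is transformed into source code, rather than the code being hand-written.

model = {
    "entity": "Invoice",
    "fields": [("number", "int"), ("customer", "str"), ("total", "float")],
}

def generate_class(m):
    """Model-to-text transformation: emit a Python class from the model."""
    lines = [f"class {m['entity']}:"]
    params = ", ".join(f"{name}: {typ}" for name, typ in m["fields"])
    lines.append(f"    def __init__(self, {params}):")
    for name, _ in m["fields"]:
        # Each model field becomes an attribute assignment in the constructor.
        lines.append(f"        self.{name} = {name}")
    return "\n".join(lines)

print(generate_class(model))
```

Changing the model and regenerating, instead of editing the emitted code, is what distinguishes model-driven use of models from purely descriptive modeling.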
MDD Benefits
2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126
Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. Oxford University Press, New York.
Nugroho, A., Chaudron, M.R., 2008. A survey into the rigor of UML use and its perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM'08. ACM, New York, NY, USA, pp. 90–99.
Object Management Group, 2011a. Business Process Model and Notation, v. 2.0, Standard. Object Management Group.
Object Management Group, 2011b. Unified Modeling Language, Superstructure, v. 2.4, Specifications. Object Management Group.
Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys in software engineering. In: Proceedings of the International Symposium on Empirical Software Engineering (ISESE), pp. 80–88.
Schmidt, D.C., 2006. Guest editor's introduction: model-driven engineering. Computer 39, 25–31.
Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20, 19–25.
Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empirical Software Engineering. Springer, London, pp. 285–311.
Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software modelling and model driven engineering: a survey in the Italian industry. IET Seminar Digests 2012, 91–100.
Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration of information systems in the Italian industry: a state of the practice survey. Information and Software Technology 53, 71–86.
Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G., 2011b. Preliminary findings from a survey on the MD* state of the practice. In: International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375.
van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a research agenda. Technical Report, TUDelft-SERG.
Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of Object Technology 8. http://www.jot.fm
Walonick, D.S., 1997. Survival Statistics. StatPac, Inc.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000. Experimentation in Software Engineering – An Introduction. Kluwer Academic Publishers.
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development–Case Studies in Java' from Addison-Wesley, and co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and software engineering for mobile and wireless applications. The methodological approach he adopts is that of empirical software engineering.
Federico Tomassetti is a PhD student in the Dept. of Control and Computer Engineering at Politecnico di Torino. He is interested in all the means to lift the level of abstraction, in particular model-driven development, domain specific languages, and language workbenches. He also has experience as a developer, consultant, and technical writer.
Filippo Ricca is an assistant professor at the University of Genova, Italy. He received his PhD degree in Computer Science from the same University, in 2003, with the thesis "Analysis, Testing and Re-structuring of Web Applications". In 2011 he was awarded the ICSE 2001 MIP (Most Influential Paper) award, for his paper: "Analysis and Testing of Web Applications". He is author or coauthor of more than 90 research papers published in international journals and conferences/workshops. At the time of writing (20/03/2013), the h-index reported by Google Scholar is 24 (13 in Scopus); the number of citations 2002. Filippo Ricca was Program Chair of CSMR 2013, ICPC 2011 and WSE 2008. Among the others, he served in the program committees of the following conferences: ICSM, ICST, SCAM, CSMR, WCRE and ESEM. From 1999 to 2006, he worked with the Software Engineering group at ITC-irst (now FBK-irst), Trento, Italy. During this time he was part of the team that worked on Reverse engineering, Re-engineering and Software Testing. His current research interests include: Software modeling, Reverse engineering, Empirical studies in Software Engineering, Web applications and Software Testing. The research is mainly conducted through empirical methods such as case studies, controlled experiments and surveys.
Alessandro Tiso is a PhD student in Computer Science at University of Genoa. He is also a teacher of informatics and applied mathematics in a high school. His main research interests are: model driven development, model transformations and use of models in software engineering. In the last 20 years he also worked as an IT consultant in the industry.
Gianna Reggio is Associate Professor at University of Genova since 1992. Previously she was Assistant Professor at the same university; she received the PhD in Informatics in 1988. She took part in several research projects, among them SAHARA and SALADIN (PRIN), and co-organized several international conferences and workshops, the most relevant being ETAPS 2001 and MODELS 2006. She has been a member of the program committee of several international conferences, in particular MODELS (in 2009, 2008, and 2007). Her research activity focused mainly on the following topics: software development methods, concurrent systems modeling and formal specification, and semantics of programming languages. Recently she started working on development methods for SOA applications. She is author of several national and international conference and journal papers and books.
Current Research Topics
•Data Quality
•Testing of Mobile App UI
Data Quality
• Data is a key component in many domains
• Low quality makes the outcomes useless or outright dangerous
• Data is not (anymore) just a part of software
• New ISO quality standards: ISO 25012, ISO 25024
• Challenges (focus on intrinsic characteristics):
  • Practical application of quality measures to concrete cases (Open Data)
  • Quality assessment of unstructured data (NoSQL, semantic KBs)
Data Quality: practical application
• Open Government Data
  • Defined a set of metrics from the literature
  • Analyzed municipal data
  • Able to identify quality issues
  • Different issues depending on the disclosure process
• Technical specification: UNI/TS 11725:2018
  • Guidelines for the measurement of data quality
  • As a member of UNI CT 504
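Most of the metrics in question are ratio-style indicators computed directly over dataset cells. As a purely illustrative sketch (not the framework's actual implementation; the set of tokens treated as "missing" is an assumption), a cell-level completeness measure in the spirit of ISO/IEC 25024 can be computed as the fraction of non-empty cells:

```python
# Sketch of a cell-level completeness measure in the spirit of
# ISO/IEC 25024 (illustrative, not the framework's actual code).
# The set of tokens treated as "missing" is an assumption.
MISSING = {"", "na", "n/a", "null", "-"}

def completeness(rows):
    """Fraction of cells that carry an actual value, over all cells."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 1.0  # an empty dataset is vacuously complete
    filled = sum(
        1
        for row in rows
        for cell in row
        if str(cell).strip().lower() not in MISSING
    )
    return filled / total

# Example: a tiny municipal-style table with two missing cells.
dataset = [
    ["Torino", "2015", "open"],
    ["Genova", "", "open"],
    ["Milano", "2014", "N/A"],
]
print(completeness(dataset))  # 7 of 9 cells are filled
```

Measures of this shape can be applied per column or per dataset, which is what makes cell-level (most granular) measurement possible.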
Open data quality measurement framework: Definition and application
to Open Government Data
Antonio Vetrò ⁎, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas,
Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Article info: Received 5 March 2015; Received in revised form 1 February 2016; Accepted 3 February 2016; Available online 20 February 2016.
Abstract
The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposed lower quality. Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
© 2016 Elsevier Inc. All rights reserved.
Keywords: Open Government Data; Open data quality; Government information quality; Data quality measurement; Empirical assessment
1. Introduction
Open data are data that "can be freely used, modified, and shared by anyone for any purpose" (Web ref. 1). Compared to proprietary frameworks, digital commons such as open data are characterized, from both a legal and a technical point of view, by lower restrictions applied to their circulation and reuse. This feature is supposed to ultimately foster collaboration, creativity and innovation (Hofmokl, 2010).
At all administrative levels, the public sector is one of the major producers and holders of information, which ranges, e.g., from maps to company registers (Aichholzer & Burkert, 2004). During the last years, the amount and variety of open data released by public administrations across the world has been tangibly growing (see, e.g., the Open Data Census by the Open Knowledge Foundation (Web ref. 2)), while increased political awareness on the subject has been translated into regulation, including the 2013 revision of the EU Directive on Public Sector Information reuse, as well as national roadmaps and technical guidelines. Releasing public sector information as open data can provide considerable added value, meeting a demand coming from all kinds of actors, ranging from companies to Non-Governmental Organizations, from developers to simple citizens. Many suggest that wider and easier circulation of public datasets could entail interesting (and even unexpected) forms of reuse, also for commercial purposes (Vickery, 2011), and in general improve transparency of public institutions (Stiglitz, Orszag, & Orszag, 2000; Ubaldi, 2013) and the distributed ability to interpret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012).
From the point of view of data reusers, we should take into account the role of the so-called infomediaries, i.e., players that are able to interpret data and present them effectively to the general public (Mayer-Schönberger & Zappia, 2011), with a highly diversified set of business models (Zuiderwijk, Janssen, & Davis, 2014). More generally, open data consumers manipulate data in many ways, ranging, e.g., from data integration to classification, also depending on the complementary assets they hold (Ferro & Osella, 2013). Considering this, legal and technical openness of datasets is not sufficient, by itself, to create a prolific reuse ecosystem (Helbig, Nakashima, & Dawe, 2012): failures in providing good quality information might impair not only the reuse of the data, but also the usage of the institutional portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Attempts to increase meaningfulness and reusability of public sector information also imply representing and exposing data so that they can be easily accessed, queried, processed and linked with other data with no restrictions (Sharon, 2010). Although sharing
Government Information Quarterly 33 (2016) 325–337
* Corresponding author at: Nexa Center for Internet & Society, Politecnico di Torino, Via Pier Carlo Boggio, 65/A, 10138 Torino, Italy. E-mail address: antonio.vetro@polito.it (A. Vetrò).
http://dx.doi.org/10.1016/j.giq.2016.02.001
0740-624X/© 2016 Elsevier Inc. All rights reserved.
Barlett, D. L., & Steele, J. B. (1985). Forevermore: Nuclear waste in America.
Basili, V., Caldiera, G., & Rombach, H. D. (1994). The goal question metric approach. In:
Encyclopedia of software engineering. John Wiley & Sons, 528–532.
Batini, C., & Scannapieco, M. (2006). Data quality, concepts, methodologies and techniques.
Berlin: Springer.
Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), Article 16 (52 pp.).
Behkamal, Behshid, Kahani, Mohsen, Bagheri, Ebrahim, & Jeremic, Zoran (May 2014).
A metrics-driven approach for quality assessment of linked open data. Journal of
Theoretical and Applied Electronic Commerce Research, 9(2), 64–79.
Berners-Lee, T. (2006). Linked data-design issues. Tech. rep., W3C http://www.w3.org/
DesignIssues/LinkedData.html
Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function
approach to assessing overall information quality. International Journal of Intelligent
Systems, 51–74.
Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal
data consumers. World Wide Web, 11(4), 465–484.
Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning:
A pragmatic approach using linked data. International Conference for E-Democracy and
Open Government 2015 (CeDEM15), Krems, Austria.
Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspec-
tive. Archivi Computer, 2.
Detlor, B., Hupfer, M. E., Ruhi, U., & Zhao, L. (2013). Information quality and community municipal portal use. Government Information Quarterly, 30(1), 23–32. http://dx.doi.org/10.1016/j.giq.2012.08.00 (ISSN 0740-624X).
English, L. (1999). Improving data warehouse and business information quality. Wiley &
Sons.
Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality man-
agement. The Journal of Computer Information Systems, 50(2), 127–135.
Ferro, E., & Osella, M. (2013). Eight business model archetypes for PSI re-use. “Open data
on the web” workshop, Google campus, London.
Haug, A., Pedersen, A., & Arlbjørn, J. S. (2009). A classification model of ERP system data
quality. Industrial Management & Data Systems, 109(8), 1053–1068. http://dx.doi.
org/10.1108/02635570910991292.
Heinrich, B. (2002). Datenqualitätsmanagement in Data Warehouse-Systemen. (doctoral
thesis, Oldenburg).
Heinrich, B., Klier, M., & Kaiser, M. (2009). A procedure to develop metrics for cur-
rency and its application in CRM. Journal of Data and Information Quality, 1(1),
5.
Helbig, N., Nakashima, M., & Dawe, Sharon S. (June 4–7, 2012). Understanding the value
and limits of government information in policy informatics: A preliminary explora-
tion. Proceedings of the 13th Annual International Conference on Digital Government
Research (dg.o2012).
Hofmokl, J. (2010). The Internet commons: toward an eclectic theoretical framework.
International Journal of the Commons, 4(1), 226–250.
Iemma, R., Morando, F., & Osella, M. (2014). Breaking public administrations' data silos.
eJournal of eDemocracy & Open Government, 6(2).
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and
myths of open data and open government. Information Systems Management, 29(4),
258–268.
Jarke, M., Lenzerini, M., Vassiliadis, P., & Vassiliou, Y. (Eds.) (1995). Fundamentals of data warehouses. Springer Verlag.
Jeusfeld, M., Quix, C., & Jarke, M. (1998). Design and analysis of quality information for
datawarehouses. Proceedings of the 17th International Conference on Conceptual
Modeling.
Kaiser, M., Klier, M., & Heinrich, B. (2007). “How to measure data quality? — A metric-
based approach” (2007). ICIS 2007 Proceedings. Paper 108.
Kim, W. (2002). On three major holes in data warehousing today. Journal of Object
Technology, 1(4), 39–47. http://dx.doi.org/10.5381/jot.2002.1.4.c3.
Kuk, G., & Davies, T. (2011). The roles of agency and artifacts in assembling open data
complementarities. Thirty Second International Conference on Information Systems.
Madnick, S., Wang, R., & Xian, X. (2004). The design and implementation of a corporate
householding knowledge processor to improve data quality. Journal of Management
Information Systems, 20(1), 41–49.
Maurino, A., Spahiu, B., Batini, C., & Viscusi, G. (2014). Compliance with Open Government
Data Policies: an empirical evaluation of Italian local public administrations. Twenty
Second European Conference on Information Systems, Tel Aviv.
Mayer-Schönberger, V., & Zappia, Z. (2011). Participation and power: intermediaries of
open data. 1st Berlin Symposium on Internet and Society.
Moraga, C., Moraga, M., Calero, C., & Caro, A. (2009). SQuaRE-aligned data quality model
for web portals. Quality Software, 2009. QSIC’09. 9th International Conference on
(pp. 117–122).
Naumann, F. (2002). Quality-driven query answering for integrated information systems.
Lecture Notes in Computer Science, 2261.
Redman, T. (1996). Data quality for the information age. Artech House.
Reiche, K., & Hofig, E. (2013). Implementation of metadata quality metrics and application
on public government data. Computer software and applications conference workshops
(COMPSACW), 2013 IEEE 37th annual (pp. 236–241).
Sachs, L. (1982). Applied statistics. A handbook of techniques. New York — Heidelberg —
Berlin: Springer-Verlag (734 pp., 59 figs., DM 118,-).
Sande, M. V., Dimou, A., Colpaert, P., Mannens, E., & Van de Walle, R. (2013). Linked data
as enabler for open data ecosystems. Open data on the web 23–24 April 2013, Campus
London, Shoreditch.
Sharon, Dawes S. (2010). Stewardship and usefulness: Policy principles for information-
based transparency. Government Information Quarterly, 27(4), 377–383.
Stiglitz, J. E., Orszag, P. R., & Orszag, J. M. (2000). The role of government in a digital age.
(Commissioned by the computer& communications industry association).
Tauberer, J. (2012). Open government data: The book.
Ubaldi, B. (2013). Open government data: Towards empirical analysis of open govern-
ment data initiatives. Tech. rep. OECD Publishing.
Umbrich, Jürgen, Neumaier, Sebastian, & Polleres, Axel (March 2015). Towards assessing
the quality evolution of Open Data portals. ODQ2015: Open data quality: From theory
to practice workshop (Munich, Germany).
Van de Sompel, H., Sanderson, R., Nelson, M. L., Balakireva, L. L., Shankar, H., & Ainsworth,
S. (2010). An HTTP-based versioning mechanism for linked data. (arXiv preprint arXiv:
1003.3661).
Vickery, G. (2011). Review of recent studies on PSI re-use and related market developments.
Paris: Information Economics.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological founda-
tions. Communications of the ACM, 39, 11.
Wang, R., & Strong, D. (1996). Beyond accuracy: What data quality means to data con-
sumers. Journal of Management Information Systems, 12, 4.
Whitmore, A. (2014). Using open government data to predict war: A case study of data and systems challenges. Government Information Quarterly, 31(4), 622–630 (ISSN 0740-624X).
Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential ele-
ments of open data ecosystems. Information Policy, 19(1), 17–33.
Zuiderwijk, Anneke, & Janssen, Marijn (2015). Participation and data quality in open data
use: Open data infrastructures evaluated. Proceedings of the 15th European Conference
on eGovernment 2015: ECEG 2015. Academic Conferences Limited.
Web references
Web ref. 1. http://opendefinition.org/ (last visited on November 5, 2015).
Web ref. 2. http://census.okfn.org/ (last visited on November 5, 2015).
Web ref. 3. http://goo.gl/QlUT1d (last visited on November 5, 2015).
Web ref. 4. https://goo.gl/yq4ElP (last visited on November 5, 2015).
Web ref. 5. http://sunlightfoundation.com/blog/2010/06/23/elenas-inbox/ (last visited on November 5, 2015).
Web ref. 6. http://nexa.polito.it/lunch-9 (last visited on November 5, 2015).
Web ref. 7. http://okfnlabs.org/bad-data/ (last visited on November 5, 2015).
Web ref. 8. http://www.dati.gov.it/dataset (last visited on November 5, 2015).
Web ref. 9. https://www.opengovawards.org/Awards_Booklet_Final.pdf (last visited on November 5, 2015).
Web ref. 10. http://en.wikipedia.org/wiki/European_Social_Fund (last visited on November 5, 2015).
Web ref. 11. http://www.dati.gov.it/ (last visited on November 5, 2015).
Web ref. 12. http://blog.okfn.org/2013/07/02/git-and-github-for-data/ (last visited on November 5, 2015).
Web ref. 13. http://dat-data.com/ (last visited on November 5, 2015).
Web ref. 14. https://code.google.com/p/google-refine/ (last visited on November 5, 2015).
Web ref. 15. http://datacleaner.org/ (last visited on November 5, 2015).
Web ref. 16. http://checklists.opquast.com/en/opendata (last visited on November 5, 2015).
Web ref. 17. http://odaf.org/papers/Open%20Data%20and%20Metadata%20Standards.pdf (last visited on November 5, 2015).
Antonio Vetrò is Director of Research and Policy at the Nexa Center for Internet and Society at Politecnico di Torino (Italy). Formerly, he has been a research fellow in the Software and System Engineering Department at Technische Universität München (Germany) and junior scientist at Fraunhofer Center for Experimental Software Engineering (MD, U.S.A.). He holds a PhD in Information and System Engineering from Politecnico di Torino (Italy). He is interested in studying the impact of technology on society, with a focus on technology transfer and internet science, and adopting an empirical epistemological approach. Contact him at antonio.vetro@polito.it.
Lorenzo Canova holds an MSc in Industrial engineering from Politecnico di Torino and wrote his Master thesis on Open Government Data quality. He is currently a research fellow of the Nexa Center for Internet & Society at Politecnico di Torino. His main research interests are in Open Government Data, government transparency and linked data. Contact him at lorenzo.canova@polito.it.
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is an author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development—Case studies in Java' from Addison-Wesley, and the co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and static analysis. The methodological approach he adopts is that of empirical software engineering. Contact him at marco.torchiano@polito.it.
Camilo O. Minotas holds an MSc from Politecnico di Torino and wrote his Master thesis on Open Data Quality Assessment. He is currently a software developer in Colombia. His main research interests are in open data quality and government transparency. Contact him at minotas@gmail.com.
Raimondo Iemma holds an MSc in Industrial engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, economics of information, and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.
Federico Morando (M.A.S. in Economic theory and econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist, with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectures at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
PROJECT CODE: UNI 1602662
STANDARD NUMBER: UNI/TS 11725:2018
ITALIAN TITLE: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
ENGLISH TITLE: System and software engineering - Guidelines for the measurement of data quality
COMPETENT BODY: UNI/CT 504 Software Engineering
CO-AUTHOR
SUMMARY: The technical specification defines how to apply the measurement of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality" through a comparison with current practices and theoretical considerations. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality. The definitions of the measures are general, so that they can be applied to a variety of specific cases.
NATIONAL RELATIONS: -----
INTERNATIONAL RELATIONS: -----
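Measures in the spirit of ISO/IEC 25024 are typically ratios of observed to expected data items. As a hedged illustration (the records and field names below are made up, not taken from the specification), a syntactic completeness ratio can be sketched as:

```python
def completeness(records, fields):
    """Ratio of filled cells to required cells: a simplified sketch of
    an ISO/IEC 25024-style completeness measure."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records
                 for f in fields
                 if r.get(f) not in (None, ""))
    return filled / total if total else 1.0

# Hypothetical records: one missing city out of four required cells.
rows = [{"name": "Alice", "city": "Torino"},
        {"name": "Bob", "city": ""}]
ratio = completeness(rows, ["name", "city"])  # 0.75
```

The general definition in the standard is parameterized over what counts as a "data item"; here a cell of a flat record plays that role.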
Data Quality: unstructured data
• Evolution measures based on simple counts
SELECT (COUNT(DISTINCT ?s) AS ?count)
WHERE { ?s a <C> . }
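The same count-based evolution measure can be sketched without a SPARQL endpoint; a toy example over in-memory (subject, predicate, object) triples, with made-up IRIs standing in for the class <C>:

```python
def count_instances(triples, cls):
    # Distinct subjects typed as cls: the COUNT(DISTINCT ?s) of the query above.
    return len({s for (s, p, o) in triples
                if p == "rdf:type" and o == cls})

# Two toy "releases" of a knowledge base.
old = {("ex:e%d" % i, "rdf:type", "ex:Place") for i in range(5)}
new = {("ex:e%d" % i, "rdf:type", "ex:Place") for i in range(3)}

# A negative delta signals that entities disappeared between releases.
delta = count_instances(new, "ex:Place") - count_instances(old, "ex:Place")  # -2
```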
Semantic Web 0 (0) 1–0. IOS Press
A Quality Assessment Approach for Evolving
Knowledge Bases
Mohammad Rifat Ahmmad Rashid (a,*), Marco Torchiano (a), Giuseppe Rizzo (b), Nandana Mihindukulasooriya (c) and Oscar Corcho (c)
(a) Politecnico di Torino, Italy. E-mail: mohammad.rashid@polito.it, marco.torchiano@polito.it
(b) Istituto Superiore Mario Boella, Italy. E-mail: giuseppe.rizzo@ismb.it
(c) Universidad Politécnica de Madrid, Spain. E-mail: nmihindu@fi.upm.es, ocorcho@fi.upm.es
Abstract
Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence.
Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured informa-
tion, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata
have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation.
Despite its advantages, there are challenges in using the RDF data model, for example, in data quality assessment and validation. In this
paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach
uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our
quality characteristics are based on the KB evolution analysis and we used high-level change detection for measurement func-
tions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness.
Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and
completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed
both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs and is highly effective in the case of KBs with periodic updates, such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.
Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data
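The Persistency idea described in the abstract (an entity type should not lose instances across releases) can be sketched in a few lines; the release counts below are invented for illustration:

```python
def persistency_flags(counts):
    """Given entity counts per release (oldest first), return the indices
    of releases where the count dropped, i.e. where a Persistency-style
    measure would signal a potential quality issue."""
    return [i for i in range(1, len(counts)) if counts[i] < counts[i - 1]]

# Hypothetical counts of one entity type over four consecutive releases:
# release 2 lost entities with respect to release 1, so it is flagged.
flags = persistency_flags([120, 135, 133, 140])  # [2]
```

Historical Persistency, in the paper's terms, aggregates such per-pair observations over the whole lifespan of the KB rather than one pair of releases.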
1. Introduction
The Linked Data approach consists in exposing and
connecting data from different sources on the Web by
the means of semantic web technologies. Tim Berners-Lee¹ refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine readable format, and to meaningfully link them with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD)² cloud hosting several Knowledge Bases (KBs) making available billions of RDF triples from different domains such as Geogra-
* Corresponding author. E-mail: mohammad.rashid@polito.it
¹ http://www.w3.org/DesignIssues/LinkedData.html
² http://lod-cloud.net
1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved
Mobile UI Testing
K-9 Mail
v2.995 vs. v3.309
  • 1. Research Activities: past, present, and future. Marco Torchiano. Public seminar, selection procedure for a tenured full professor position (S.C. 09/H1 - S.S.D. ING-INF/05, internal code 03/18/P/O). October 19, 2018
  • 3. Empirical Software Engineering: Survey, Case study, Case-control, Experiment. Package effsize: 2.2K downloads/month
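The effsize R package computes standardized effect sizes for comparing experimental groups. A minimal Python sketch of Cohen's d with a pooled standard deviation (the common default for two independent samples; the sample data is invented):

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled
    standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * statistics.variance(a) +
                        (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled

# Two made-up samples whose means differ by one pooled standard deviation.
d = cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])  # 1.0, a large effect
```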
  • 4. On the Effectiveness of the Test-First Approach to Programming. Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano. IEEE Transactions on Software Engineering, vol. 31, no. 3, March 2005. (First page of the paper; 362 citations.)
  • 5. Test-Driven Development • TDD is an agile development practice • Made popular by eXtreme Programming • Adopted widely • Promises: • More and better tests • Higher external quality • Improved productivity
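The test-first dynamic the slide refers to can be shown in a few lines; the test is drafted before the production code and initially fails (the function and test names below are made up for illustration):

```python
import unittest

class TestFizz(unittest.TestCase):
    """Written first: these tests fail until fizz() below is implemented."""
    def test_multiple_of_three(self):
        self.assertEqual(fizz(9), "Fizz")
    def test_other_number(self):
        self.assertEqual(fizz(4), "4")

def fizz(n):
    """Written second: just enough production code to make the tests pass."""
    return "Fizz" if n % 3 == 0 else str(n)
```

Each subsequent red-green cycle adds one more failing test and the minimal code that satisfies it, which is the iteration the paper's abstract describes.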
  • 6. Test-Driven Development: findings • Higher number of tests • Impact on quality and productivity • No clear difference in terms of quality • Too small tasks • Process conformance • Slight improvement in productivity • More focused • Less rework
• 7. On the Effectiveness of the Test-First Approach to Programming. Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano. IEEE Transactions on Software Engineering, March 2005 (first page shown above).
362 citations

Overlooked Aspects of COTS-based Development
Marco Torchiano, Norwegian University of Science and Technology; Maurizio Morisio, Politecnico di Torino
IEEE Software, 2004. © 2004 IEEE, published by the IEEE Computer Society.

Studies on commercial-off-the-shelf-based development often disagree, lack product and project details, and are founded on uncritically accepted assumptions. This empirical study establishes key features of industrial COTS-based development and creates a definition of "COTS products" that captures these features.

Although developing with commercial-off-the-shelf components is gaining more attention from both research and industrial communities [1], most literature on the topic doesn't clearly identify context variables such as the type of products, projects, and systems. In particular, the literature often lacks a definition of "COTS product," or, when a definition is present, it invariably disagrees with other studies (see the sidebar, "What COTS Product Do You Mean?"). A shared definition would improve discourse and enable researchers to meta-analyze published empirical data.

After a speculative effort to define and classify COTS products [2], we decided to obtain this much-needed understanding from the bottom up. We asked people involved in industrial projects to name the key features in COTS-based development and to tell us what they think constitutes a COTS product. Here, we present the interview results in the form of six theses, which contradict widely accepted (or simply undisputed) ideas. We also present a definition of "COTS product" that captures the key features.

The sample of projects we used is limited and can't represent the whole IT industry. Instead, our study primarily considers small and medium enterprises (SMEs). While SMEs produce most software, they're usually neglected by research. Specifically, the COTS-development literature seems to focus on large, mostly governmental projects performed in major companies. Our goal is to empirically explore this key topic for the software industry, collect facts, and derive well-substantiated theses from these facts.

The groundwork. From a methodological viewpoint, we conducted empirical research following two parallel threads that complement each other: a systematic literature investigation and an on-field exploratory study based on structured interviews.

158 citations
• 8. Development with OTS components
• Development with Off-The-Shelf (OTS) components
• Systems are built through integration of existing components
• The marketplace influences the evolution of OTS components
• Continuous trade-off between system and component features
• Strong push from the DoD since the late '90s
• Series of conferences driven by the SEI since 2002
• As systems become more complex, all software is based on OTS components
• Goal: state of the practice of OTS-based development in EU industry
• 9. OTS-Based Development: 6 theses
T1: Open source software is often used as closed source.
T2: Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue.
T3: Custom code mainly provides additional functionalities.
T4: Developers seldom use formal selection procedures. Familiarity with either the product or the generic architecture is the key factor in selection.
T5: Architecture is more important than requirements for product selection.
T6: Integrators tend to influence the vendor concerning product evolution whenever possible.
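Thesis T3 can be made concrete with a small sketch: the integrator uses the OTS component as a black box, and custom code only adds functionality around it. Here java.util.Properties merely stands in for an off-the-shelf product, and the TypedConfig wrapper is hypothetical, not from the study.

```java
import java.util.Properties;

// T3 in miniature: the OTS component (java.util.Properties, standing in
// for any off-the-shelf product) is used as-is; the custom code does not
// re-implement it, it only adds functionality the product lacks.
public class TypedConfig {
    private final Properties props = new Properties(); // the OTS component

    public void set(String key, String value) {
        props.setProperty(key, value);
    }

    // Additional functionality provided by custom code: a typed accessor
    // with a default value, which Properties itself does not offer.
    public int getInt(String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) return defaultValue;
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

Note that the integrator never modifies the component itself (consistent with T1: even open source is often used as closed source); its evolution remains in the vendor's hands (T6).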
Thanks to:
• 10. On the Effectiveness of the Test-First Approach to Programming (Erdogmus, Morisio, Torchiano; IEEE TSE, March 2005): 362 citations
Overlooked Aspects of COTS-based Development (Torchiano, Morisio; IEEE Software, 2004): 158 citations

Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry
Marco Torchiano (a), Federico Tomassetti (a), Filippo Ricca (b), Alessandro Tiso (b), Gianna Reggio (b)
(a) Dip. Automatica e Informatica, Politecnico di Torino, Italy; (b) DIBRIS, Università di Genova, Italy
The Journal of Systems and Software 86 (2013) 2110–2126

Article history: received 23 August 2012; received in revised form 15 March 2013; accepted 18 March 2013; available online 1 April 2013.
Keywords: model driven techniques; software modelling; personal opinion survey

Context: Claimed benefits of software modelling and model driven techniques are improvements in productivity, portability, maintainability and interoperability. However, little effort has been devoted to collecting evidence to evaluate their actual relevance, benefits and usage complications.
Goal: The main goals of this paper are: (1) assess the diffusion and relevance of software modelling and MD techniques in the Italian industry, (2) understand the expected and achieved benefits, and (3) identify which problems limit/prevent their diffusion.
Method: We conducted an exploratory personal opinion survey with a sample of 155 Italian software professionals by means of a Web-based questionnaire, on-line from February to April 2011.
Results: Software modelling and MD techniques are very relevant in the Italian industry. The adoption of simple modelling brings common benefits (better design support, documentation improvement, better maintenance, and higher software quality), while MD techniques make it easier to achieve: improved standardization, higher productivity, and platform independence.
We identified problems, some hindering adoption (too much effort required and limited usefulness), others preventing it (lack of competencies and supporting tools).
Conclusions: The relevance represents an important objective motivation for researchers in this area. The relationship between techniques and attainable benefits represents an instrument for practitioners planning the adoption of such techniques. In addition, the findings may provide hints for companies and universities. © 2013 Elsevier Inc. All rights reserved.

1. Introduction
Usually, model-based techniques use models to describe the architecture and design of a system and/or the behaviour of software artefacts, generally through a raise in the level of abstraction (France and Rumpe, 2003). Models can also be used for other purposes, e.g., to describe business workflows or development processes. They can be used in different phases of the development process as communication artefacts, as points of reference against which subsequent implementations are verified, or as the basis for further development. In the latter case, they may become the key elements in the process, from which other artefacts (most notably code) are generated.

The term "model" is very general; it is difficult to provide a comprehensive yet precise definition of a model. For us, a model is an artefact realized with the goal to capture and transfer information in a form as pure as possible, limiting the syntax noise. Examples of models include UML design models (Object Management Group, 2011b), process models defined through BPMN (Object Management Group, 2011a), Web application models defined through WebML (Ceri et al., 2000), as well as textual Domain Specific Language (DSL) models (Fowler and Parsons, 2011). Given that models are so heterogeneous and the processes involving them so varied, it is actually difficult to define the exact boundaries for modelling. Moreover, models can be used practically in different ways and with different levels of maturity (Tomassetti et al., 2012). While in theory we have a set of best practices, in practice we only found mixed, half-baked, and personalized solutions. As a consequence, the different uses of modelling are complex and difficult to classify precisely.

When models constitute a crucial and operational ingredient of software development, the process is called model driven.

67 citations
• 11. MDD State of the Practice
• Model-Driven Development is not just modeling
  • Code generation
  • Model interpretation
  • Model transformation
  • Often, toolsmithing
• Promised improvements:
  • Productivity
  • Portability
  • Maintainability and
  • Interoperability
  • …
MDD ≠
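Code generation, the first of the MD techniques listed above, is essentially a model-to-text transformation. Below is a minimal, hypothetical sketch: real MDD toolchains start from UML or DSL models and use template engines, whereas this EntityGenerator and its map-based "model" are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model-to-text transformation: the "model" is an entity name
// plus an ordered map of field name -> field type; the generator emits
// Java source (a field and a getter per model element) from it.
public class EntityGenerator {
    public static String generate(String entity, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(entity).append(" {\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey(), type = f.getValue();
            src.append("    private ").append(type).append(' ')
               .append(name).append(";\n");
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("name", "String");
        System.out.println(generate("Customer", fields));
    }
}
```

Changing the generator (rather than each generated class) is what yields the promised productivity and portability benefits; it is also where the "toolsmithing" effort noted above goes.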
• 12. MDD Benefits
From "Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry", The Journal of Systems and Software 86 (2013) 2110–2126 (first page shown above).

From the author biographies:

Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book "Software Development – Case Studies in Java" from Addison-Wesley, and co-editor of the book "Developing Services for the Wireless Internet" from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and software engineering for mobile and wireless applications. The methodological approach he adopts is that of empirical software engineering.

Federico Tomassetti is a PhD student in the Dept. of Control and Computer Engineering at Politecnico di Torino. He is interested in all the means to lift the level of abstraction, in particular model-driven development, domain-specific languages, and language workbenches. He also has experience as a developer, consultant, and technical writer.
• 13. Current Research Topics
• Data Quality
• Testing of Mobile App UI
• 14. Data Quality
• Data is a key component in many domains
• Low quality makes the outcomes useless or outright dangerous
• Data is not (anymore) just a part of software
• New ISO quality standards: ISO-25012, ISO-25024
• Challenges (focus on intrinsic characteristics):
• Practical application of quality measures to concrete cases (Open Data)
• Quality assessment of unstructured data (NoSQL, semantic KB)
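To make the idea of "practical application of quality measures" concrete, here is a minimal, hypothetical sketch (not the framework defined in the paper excerpted below): a completeness indicator in the spirit of ISO/IEC 25024, computed as the ratio of non-empty cells to total cells in a tabular dataset. The function name, the sample data, and the treatment of `""`/`None` as missing values are illustrative assumptions.

```python
# Hypothetical sketch of an ISO/IEC 25024-style completeness measure:
# the fraction of cells in a tabular dataset that carry a value.
# "" and None are treated as missing (an assumption for this example).

def completeness(rows):
    """Return the ratio of non-empty cells to total cells (0.0 for empty input)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    filled = sum(1 for row in rows for cell in row if cell not in ("", None))
    return filled / total

# Illustrative open-data-like records: city, reference year, population.
dataset = [
    ["Turin", "2015", "874062"],
    ["Milan", "",     "1345851"],  # missing year
    ["Rome",  "2015", None],       # missing population
]
print(round(completeness(dataset), 2))  # 7 of 9 cells filled -> 0.78
```

A per-column variant of the same ratio would localize which attributes of a dataset drive the quality issues, which is the kind of granular, dataset-level measurement the slide refers to.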
• 15. Data Quality: practical application
• Open Government Data
• Defined a set of metrics from literature
• Analyzed + municipal data
• Able to identify quality issues
• Different issues depending on disclosure process
• Technical specification: UNI/TS 11725:2018
• Guidelines for the measurement of data quality
• As a member of UNI CT 504

Open data quality measurement framework: Definition and application to Open Government Data
Antonio Vetrò ⁎, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas, Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Article history: Received 5 March 2015; Received in revised form 1 February 2016; Accepted 3 February 2016; Available online 20 February 2016
Abstract. The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposed lower quality.
Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, ad- dressing specific quality aspects. © 2016 Elsevier Inc. All rights reserved. Keywords: Open Government Data Open data quality Government information quality Data quality measurement Empirical assessment 1. Introduction Open data are data that “can be freely used, modified, and shared by anyone for any purpose” (Web ref. 1). Compared to proprietary frame- works, digital commons such as open data are characterized — from both a legal and a technical point of view — by lower restrictions applied to their circulation and reuse. This feature is supposed to ultimately fos- ter collaboration, creativity and innovation (Hofmokl, 2010). At all administrative levels, the public sector is one of the major pro- ducers and holders of information, which ranges, e.g., from maps to companies registers (Aichholzer & Burkert, 2004). During the last years, the amount and variety of open data released by public adminis- trations across the world has been tangibly growing (see, e.g., the Open Data Census by the Open Knowledge Foundation (Web ref. 2)), while increased political awareness on the subject has been translated in regulation, including the revised of the EU Directive on Public Sector Information reuse in 2013, as well as national roadmaps and technical guidelines. Releasing public sector information as open data can provide considerable added value, meeting a demand coming from all kinds of actors, ranging from companies to Non-Governmental Organizations, from developers to simple citizens. 
Many suggest that wider and easier circulation of public datasets could entail interesting (and even unex- pected) forms of reuse, also for commercial purposes (Vickery, 2011), and in general improve transparency of public institutions (Stiglitz, Orszag, & Orszag, 2000; Ubaldi, 2013) and distributed ability to inter- pret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012). From the point of view of data reusers, we should take into account the role of the so-called infomediaries, i.e., players that are able to inter- pret data and present them effectively to the general public (Mayer- Schönberger & Zappia, 2011), with a highly diversified set of business models (Zuiderwijk, Janssen, & Davis, 2014). More in general, open data consumers manipulate data in many ways, ranging, e.g., from data integration to classification, also depending on the complementary assets they hold (Ferro & Osella, 2013). Considering this, legal and technical openness of datasets is not sufficient, by itself, to create a pro- lific reuse ecosystem (Helbig, Nakashima, & Dawe, 2012): failures in providing good quality information might impair not only the reuse of the data, but also the usage of the institutional portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Attempts to increase meaningfulness and reusabil- ity of public sector information also imply representing and exposing data so that they can be easily accessed, queried, processed and linked with other data with no restrictions (Sharon, 2010). Although sharing Government Information Quarterly 33 (2016) 325–337 ⁎ Corresponding author at: Nexa Center for Internet & Society, Politecnico di Torino, Via Pier Carlo Boggio, 65/A, 10138 Torino, Italy. E-mail address: antonio.vetro@polito.it (A. Vetrò). http://dx.doi.org/10.1016/j.giq.2016.02.001 0740-624X/© 2016 Elsevier Inc. All rights reserved. Contents lists available at ScienceDirect Government Information Quarterly journal homepage: www.elsevier.com/locate/govinf Barlett, D. 
L., & Steele, J. B. (1985). Forevermore: Nuclear waste in America. Basili, V., Caldiera, G., & Rombach, H. D. (1994). The goal question metric approach. In: Encyclopedia of software engineering. John Wiley & Sons, 528–532. Batini, C., & Scannapieco, M. (2006). Data quality, concepts, methodologies and techniques. Berlin: Springer. Batini, C., Cappiello, C., Francalanci, C., & Maurino A. Methodologies for data quality as- sessment and improvement. ACM Computing Surveys 41, 3, Article 16 (July 2009), (52 pp.). Behkamal, Behshid, Kahani, Mohsen, Bagheri, Ebrahim, & Jeremic, Zoran (May 2014). A metrics-driven approach for quality assessment of linked open data. Journal of Theoretical and Applied Electronic Commerce Research, 9(2), 64–79. Berners-Lee, T. (2006). Linked data-design issues. Tech. rep., W3C http://www.w3.org/ DesignIssues/LinkedData.html Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function approach to assessing overall information quality. International Journal of Intelligent Systems, 51–74. Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal data consumers. World Wide Web, 11(4), 465–484. Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning: A pragmatic approach using linked data. International Conference for E-Democracy and Open Government 2015 (CeDEM15), Krems, Austria. Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspec- tive. Archivi Computer, 2. Detlor, B., Hupfer, Maureen E, Ruhi, U. & Zhao, L. Information quality and community municipal portal use, Government Information Quarterly, Volume 30, Issue 1, January 2013, Pages 23-32, http://dx.doi.org/10.1016/j.giq.2012.08.00, (ISSN 0740-624X). English, L. (1999). Improving data warehouse and business information quality. Wiley & Sons. Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality man- agement. 
Antonio Vetrò is Director of Research and Policy at the Nexa Center for Internet and Society at Politecnico di Torino (Italy). Formerly, he was a research fellow in the Software and System Engineering Department at Technische Universität München (Germany) and a junior scientist at the Fraunhofer Center for Experimental Software Engineering (MD, U.S.A.). He holds a PhD in Information and System Engineering from Politecnico di Torino (Italy). He is interested in studying the impact of technology on society, with a focus on technology transfer and internet science, adopting an empirical epistemological approach. Contact him at antonio.vetro@polito.it.

Lorenzo Canova holds an MSc in Industrial Engineering from Politecnico di Torino and wrote his Master's thesis on Open Government Data quality. He is currently a research fellow of the Nexa Center for Internet & Society at Politecnico di Torino. His main research interests are Open Government Data, government transparency, and linked data. Contact him at lorenzo.canova@polito.it.

A. Vetrò et al. / Government Information Quarterly 33 (2016) 325–337
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is an author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development—Case Studies in Java' from Addison-Wesley and the co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are design notations, testing methodologies, OTS-based development, and static analysis. The methodological approach he adopts is that of empirical software engineering. Contact him at marco.torchiano@polito.it.

Camilo O. Minotas holds an MSc from Politecnico di Torino and wrote his Master's thesis on Open Data Quality Assessment. He is currently a software developer in Colombia. His main research interests are open data quality and government transparency. Contact him at minotas@gmail.com.

Raimondo Iemma holds an MSc in Industrial Engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, economics of information and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.

Federico Morando (M.A.S. in Economic Theory and Econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectures at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
PROJECT CODE: UNI 1602662
STANDARD NUMBER: UNI/TS 11725:2018
ITALIAN TITLE: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
ENGLISH TITLE: System and software engineering - Guidelines for the measurement of data quality
RESPONSIBLE TECHNICAL BODY: UNI/CT 504 Software Engineering
ROLE: CO-AUTHOR
SUMMARY: The technical specification defines how to apply the measurements of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality", comparing them with current practices and theoretical insights. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality; the measure definitions are deliberately general, so that they can be applied to a variety of specific cases.
NATIONAL RELATIONS: -----
INTERNATIONAL RELATIONS: -----
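To make the idea of a data quality measure concrete, the sketch below computes a simple completeness ratio over a tabular dataset. It is a minimal illustration in the spirit of the SQuaRE data quality measures; the function name, the data, and the exact form of the measure are assumptions for this example, not taken from the UNI/TS 11725 or ISO/IEC 25024 text.

```python
# Minimal sketch of a completeness-style data quality measure:
# fraction of non-empty cells over all cells in a tabular dataset.
# Names and the measure's exact form are illustrative only.

def completeness(records):
    """Return the fraction of non-empty cells across all rows."""
    total = sum(len(row) for row in records)
    filled = sum(1 for row in records for value in row
                 if value not in (None, ""))
    return filled / total if total else 1.0

rows = [
    ["Torino", "IT", "886837"],
    ["Milano", "IT", ""],        # missing population value
    ["Lyon", None, "513275"],    # missing country code
]
print(round(completeness(rows), 3))  # 7 filled cells out of 9 -> 0.778
```

A real measurement plan would apply one such ratio per quality characteristic (completeness, accuracy, currentness, ...) and aggregate them per dataset.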
  • 16. Data Quality: unstructured data
  • Evolution measures based on simple counts

SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { ?s a <C> . }

A Quality Assessment Approach for Evolving Knowledge Bases
Mohammad Rifat Ahmmad Rashid and Marco Torchiano (Politecnico di Torino, Italy; mohammad.rashid@polito.it, marco.torchiano@polito.it), Giuseppe Rizzo (Istituto Superiore Mario Boella; giuseppe.rizzo@ismb.it), Nandana Mihindukulasooriya and Oscar Corcho (Universidad Politécnica de Madrid, Spain; nmihindu@fi.upm.es, ocorcho@fi.upm.es). Semantic Web journal, IOS Press.

Abstract: Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a Knowledge Base (KB) is a complex task, as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation. Despite its advantages, there are challenges in using the RDF data model, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our quality characteristics are based on KB evolution analysis, and we used high-level change detection for the measurement functions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. Persistency and historical persistency measures concern the degree of change and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs; it is highly effective in the case of a KB with periodic updates, such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.

Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data

1. Introduction
The Linked Data approach consists in exposing and connecting data from different sources on the Web by means of semantic web technologies. Tim Berners-Lee (http://www.w3.org/DesignIssues/LinkedData.html) refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine-readable format, and to meaningfully link it with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD) cloud (http://lod-cloud.net), hosting several Knowledge Bases (KBs) that make available billions of RDF triples from different domains such as Geography
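The count-based evolution idea above (run a simple count query per class on each release, then compare consecutive releases) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the class counts are invented, and the flagging rule (an instance count that drops between releases signals a possible persistency issue) is an assumption based on the abstract's description.

```python
# Hedged sketch of a count-based evolution measure: compare per-class
# entity counts across two consecutive KB releases and flag classes
# whose count decreased (a possible persistency issue).
# Counts and names below are illustrative, not real DBpedia figures.

def persistency_flags(prev_counts, next_counts):
    """Return {class: (previous_count, next_count)} for classes whose
    instance count decreased between two releases."""
    return {
        cls: (prev_counts[cls], next_counts.get(cls, 0))
        for cls in prev_counts
        if next_counts.get(cls, 0) < prev_counts[cls]
    }

prev_release = {"dbo:Place": 816000, "dbo:Person": 1445000}
next_release = {"dbo:Place": 735000, "dbo:Person": 1650000}
print(persistency_flags(prev_release, next_release))
# → {'dbo:Place': (816000, 735000)}
```

In practice the per-class counts would come from running the SPARQL count query shown above once per release; the comparison itself stays a simple statistical operation, which is what makes the approach scale.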
  • 17. Mobile UI Testing: K-9 Mail, v2.995 vs. v3.309