Research Activities: past, present, and future

Marco Torchiano
Associate Professor

Public Seminar
Selection procedure for a tenured full professor (professore di ruolo di I fascia)
S.C. 09/H1 - S.S.D. ING-INF/05 - internal code 03/18/P/O
October 19, 2018

Empirical Software Engineering
Survey • Case study • Case-control • Experiment

Package effsize
2.2K downloads/month
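The effsize package provides standardized effect-size measures such as Cohen's d and Cliff's delta for reporting empirical results. The package itself is written in R; the sketch below is a minimal Python illustration of the same pooled-standard-deviation computation of Cohen's d, not the package's actual implementation, and the sample data is invented.

```python
import math

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled
    (unbiased) standard deviation of the two groups."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical scores for a treatment and a control group
treatment = [12.0, 14.0, 13.5, 15.0, 14.5]
control = [10.0, 11.0, 10.5, 12.0, 11.5]
print(round(cohens_d(treatment, control), 2))  # ≈ 2.84, a large effect
```

By Cohen's conventional thresholds, |d| ≈ 0.2 is a small effect, 0.5 medium, and 0.8 large.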
On the Effectiveness of the
Test-First Approach to Programming
Hakan Erdogmus, Maurizio Morisio, Member, IEEE Computer Society, and
Marco Torchiano, Member, IEEE Computer Society
Abstract—Test-Driven Development (TDD) is based on formalizing a piece of functionality as a test, implementing the functionality
such that the test passes, and iterating the process. This paper describes a controlled experiment for evaluating an important aspect of
TDD: In TDD, programmers write functional tests before the corresponding implementation code. The experiment was conducted with
undergraduate students. While the experiment group applied a test-first strategy, the control group applied a more conventional
development technique, writing tests after the implementation. Both groups followed an incremental process, adding new features one
at a time and regression testing them. We found that test-first students on average wrote more tests and, in turn, students who wrote
more tests tended to be more productive. We also observed that the minimum quality increased linearly with the number of
programmer tests, independent of the development strategy employed.
Index Terms—General programming techniques, coding tools and techniques, testing and debugging, testing strategies, productivity,
Software Quality/SQA, software engineering process, programming paradigms.
1 INTRODUCTION
TEST-DRIVEN Development [1], [2], or TDD, is an approach
to code development popularized by Extreme Programming [3], [4]. The key aspect of TDD is that programmers
write low-level functional tests before production code. This
dynamic, referred to as Test-First, is the focus of this paper.
The technique is usually supported by a unit testing tool,
such as JUnit [5], [6].
Programmer tests in TDD, much like unit tests, tend to
be low-level. Sometimes they can be higher level and cross-
cutting, but do not in general address testing at integration
and system levels. TDD relies on no specific or formal
criterion to select the test cases. The tests are added
gradually during implementation. The testing strategy
employed differs from the situation where tests for a
module are written by a separate quality assurance team
before the module is available. In TDD, tests are written one
at a time and by the same person developing the module.
(Cleanroom [7], for example, also leverages tests that are
specified before implementation, but in a very different
way. Tests are specified to mirror a statistical user input
distribution as part of a formal reliability model, but this
activity is completely separated from the implementation
phase.)
TDD can be considered from several points of view:
. Feedback: Tests provide the programmer with
instant feedback as to whether new functionality
has been implemented as intended and whether it
interferes with old functionality.
. Task-orientation: Tests drive the coding activity,
encouraging the programmer to decompose the
problem into manageable, formalized programming
tasks, helping to maintain focus and providing
steady, measurable progress.
. Quality assurance: Having up-to-date tests in place
ensures a certain level of quality, maintained by
frequently running the tests.
. Low-level design: Tests provide the context in which
low-level design decisions are made, such as which
classes and methods to create, how they will be
named, what interfaces they will possess, and how
they will be used.
Although advocates claim that TDD enhances both
product quality and programmer productivity, skeptics
maintain that such an approach to programming would be
both counterproductive and hard to learn. It is easier to
imagine how interleaving coding with testing could
increase quality, but harder to imagine a similar positive
effect on productivity. Many practitioners consider tests as
overhead and testing as exclusively the job of a separate
quality assurance team. But, what if the benefits of
programmer tests outweigh their cost? If the technique
indeed has merit, a reexamination of the traditional coding
techniques could be warranted.
This paper presents a formal investigation of the
strengths and weaknesses of the Test-First approach to
programming, the central component of TDD. We study
this component in the context in which it is believed to be
most effective, namely, incremental development. To
undertake the investigation, we designed a controlled
226 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 31, NO. 3, MARCH 2005
. H. Erdogmus is with the Institute for Information Technology, National
Research Council Canada, Bldg. M-50, Montreal Rd., Ottawa, Ontario
K1A 0R6, Canada. E-mail: Hakan.Erdogmus@nrc-cnrc.gc.ca.
. M. Morisio and M. Torchiano are with the Dipartimento Automatica e
Informatica, Politecnico di Torino, C.so Duca degli Abruzzi, 24, I-10129,
Torino, Italy. E-mail: {Maurizio.Morisio, Marco.Torchiano}@polito.it.
Manuscript received 13 Feb. 2004; revised 6 Dec. 2004; accepted 31 Dec. 2004;
published online 20 Apr. 2005.
Recommended for acceptance by D. Rombach.
For information on obtaining reprints of this article, please send e-mail to:
tse@computer.org, and reference IEEECS Log Number TSE-0026-0204.
0098-5589/05/$20.00 © 2005 IEEE Published by the IEEE Computer Society
362 citations
Test-Driven Development
• TDD is an agile development practice
• Made popular by eXtreme Programming
• Adopted widely
• Promises:
• More and better tests
• Higher external quality
• Improved productivity
Test-Driven Development: findings
• Higher number of tests
• Impact on quality and productivity
• No clear difference in terms of quality
  • tasks were too small
  • process conformance issues
• Slight improvement in productivity
  • more focused work
  • less rework
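The test-first cycle the experiment evaluates can be made concrete with a small example: formalize the next feature as a failing test, write just enough production code to make it pass, then repeat while regression-testing earlier features. The original experiment used Java and JUnit; this sketch uses plain Python asserts to stay self-contained, and the `Account` class is invented purely for illustration.

```python
# Step 1 (test first): specify the next feature as a test before any
# production code exists. Run now, it would fail (Account is undefined).
def test_deposit_increases_balance():
    acc = Account()
    acc.deposit(50)
    assert acc.balance == 50

# Step 2: write just enough production code to make the test pass.
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

# Step 3: run the test suite (the "green" step), then loop back to
# step 1 with the next small feature, keeping all earlier tests passing.
test_deposit_increases_balance()
print("all tests pass")
```

Each iteration leaves behind an executable regression suite, which is the mechanism behind the feedback and quality-assurance views of TDD listed in the paper.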
Overlooked Aspects of COTS-based Development

Marco Torchiano, Norwegian University of Science and Technology
Maurizio Morisio, Politecnico di Torino

Studies on commercial-off-the-shelf-based development often disagree, lack product and project details, and are founded on uncritically accepted assumptions. This empirical study establishes key features of industrial COTS-based development and creates a definition of "COTS products" that captures these features.

Although developing with commercial-off-the-shelf components is gaining more attention from both research and industrial communities,1 most literature on the topic doesn't clearly identify context variables such as the type of products, projects, and systems. In particular, the literature often lacks a definition of "COTS product," or, when a definition is present, it invariably disagrees with other studies (see the sidebar, "What COTS Product Do You Mean?"). A shared definition would improve discourse and enable researchers to meta-analyze published empirical data.

After a speculative effort to define and classify COTS products,2 we decided to obtain this much-needed understanding from the bottom up. We asked people involved in industrial projects to name the key features in COTS-based development and to tell us what they think constitutes a COTS product. Here, we present the interview results in the form of six theses, which contradict widely accepted (or simply undisputed) ideas. We also present a definition of "COTS product" that captures the key features.

The sample of projects we used is limited and can't represent the whole IT industry. Instead, our study primarily considers small and medium enterprises (SMEs). While SMEs produce most software, they're usually neglected by research. Specifically, the COTS-development literature seems to focus on large, mostly governmental projects performed in major companies. Our goal is to empirically explore this key topic for the software industry, collect facts, and derive well-substantiated theses from these facts.

The groundwork
From a methodological viewpoint, we conducted empirical research following two parallel threads that complement each other: a systematic literature investigation and an on-field exploratory study based on structured interviews. Because the industry doesn't have a common understanding of COTS products, we laid

2 IEEE SOFTWARE Published by the IEEE Computer Society 0740-7459/04/$20.00 © 2004 IEEE
158 citations
Development with OTS components
• Development with Off-The-Shelf components
• Systems are built through integration of existing components
• The marketplace influences the evolution of OTS components
• Continuous trade-off between system and component features
• Strong push from the DoD since the late '90s
• Series of conferences driven by the SEI since 2002
• As systems become more complex, all software is based on OTS components
• Goal: state of the practice of OTS-based development in EU industry
OTS-Based Development: 6 theses
T1: Open source software is often used as closed source.
T2: Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue.
T3: Custom code mainly provides additional functionalities.
T4: Developers seldom use formal selection procedures. Familiarity with either the product or the generic architecture is the key factor in selection.
T5: Architecture is more important than requirements for product selection.
T6: Integrators tend to influence the vendor concerning product evolution whenever possible.
Thanks to:
The Journal of Systems and Software 86 (2013) 2110–2126
journal homepage: www.elsevier.com/locate/jss

Relevance, benefits, and problems of software modelling and model driven
techniques—A survey in the Italian industry
Marco Torchiano (a), Federico Tomassetti (a), Filippo Ricca (b), Alessandro Tiso (b), Gianna Reggio (b)
(a) Dip. Automatica e Informatica, Politecnico di Torino, Italy
(b) DIBRIS, Università di Genova, Italy

Article history: received 23 August 2012; revised 15 March 2013; accepted 18 March 2013; available online 1 April 2013.
Keywords: model driven techniques; software modelling; personal opinion survey

Abstract
Context: Claimed benefits of software modelling and model driven techniques are improvements in productivity, portability, maintainability and interoperability. However, little effort has been devoted to collecting evidence to evaluate their actual relevance, benefits and usage complications.
Goal: The main goals of this paper are: (1) assess the diffusion and relevance of software modelling and MD techniques in the Italian industry, (2) understand the expected and achieved benefits, and (3) identify which problems limit/prevent their diffusion.
Method: We conducted an exploratory personal opinion survey with a sample of 155 Italian software professionals by means of a Web-based questionnaire, on-line from February to April 2011.
Results: Software modelling and MD techniques are very relevant in the Italian industry. The adoption of simple modelling brings common benefits (better design support, documentation improvement, better maintenance, and higher software quality), while MD techniques make it easier to achieve improved standardization, higher productivity, and platform independence. We identified problems, some hindering adoption (too much effort required and limited usefulness), others preventing it (lack of competencies and supporting tools).
Conclusions: The relevance represents an important objective motivation for researchers in this area. The relationship between techniques and attainable benefits represents an instrument for practitioners planning the adoption of such techniques. In addition the findings may provide hints for companies and universities.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Usually, model-based techniques use models to describe the architecture and design of a system and/or the behaviour of software artefacts, generally through a raise in the level of abstraction (France and Rumpe, 2003). Models can also be used for other purposes, e.g., to describe business workflows or development processes.
They can be used in different phases of the development process as communication artefacts, as points of reference against which subsequent implementations are verified, or as the basis for further development. In the latter case, they may become the key elements in the process, from which other artefacts (most notably code) are generated.
The term "model" is very general; it is difficult to provide a comprehensive yet precise definition of a model. For us, a model is an artefact realized with the goal to capture and transfer information in a form as pure as possible, limiting the syntax noise. Examples of models include UML design models (Object Management Group, 2011b), process models defined through BPMN (Object Management Group, 2011a), Web application models defined through WebML (Ceri et al., 2000), as well as textual Domain Specific Language (DSL) models (Fowler and Parsons, 2011).
Given that models are so heterogeneous and the processes involving them so varied, it is actually difficult to define the exact boundaries for modelling. Moreover, models can be used practically in different ways and with different levels of maturity (Tomassetti et al., 2012). While in theory we have a set of best practices, in practice we only found mixed, half-baked, and personalized solutions. As a consequence, the different uses of modelling are complex and difficult to classify precisely.
When models constitute a crucial and operational ingredient of software development, then the process is called model driven. There are different model driven techniques: Model Driven

* Corresponding author at: Dip. Automatica e Informatica, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy. Tel.: +39 011 564 7088; fax: +39 011 564 7099. E-mail address: marco.torchiano@polito.it (M. Torchiano).
http://dx.doi.org/10.1016/j.jss.2013.03.084
67 citations
MDD State of the Practice
• Model-Driven Development is not just modeling
• Code generation
• Model interpretation
• Model transformation
• Often, toolsmithing
• Promised improvements:
• Productivity
• Portability
• Maintainability and
• Interoperability
• …
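The gap between "just modeling" and model-driven development can be illustrated with a minimal model-to-text transformation. This is a hypothetical sketch in plain Python (real MDD toolchains such as those covered by the survey operate on full metamodels and template engines); the `model` dictionary and `generate_class` function are illustrative names, not part of any surveyed tool:

```python
# Hypothetical sketch of MDD-style code generation: a declarative model of an
# entity is transformed into source code, rather than the code being hand-written.

model = {
    "entity": "Invoice",
    "fields": [("number", "int"), ("customer", "str"), ("total", "float")],
}

def generate_class(m):
    """Model-to-text transformation: emit a Python class from the model."""
    lines = [f"class {m['entity']}:"]
    params = ", ".join(f"{name}: {typ}" for name, typ in m["fields"])
    lines.append(f"    def __init__(self, {params}):")
    for name, _ in m["fields"]:
        # Each model field becomes an attribute assignment in the constructor.
        lines.append(f"        self.{name} = {name}")
    return "\n".join(lines)

print(generate_class(model))
```

Changing the model and regenerating, instead of editing the emitted code, is what distinguishes model-driven use of models from purely descriptive modeling.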
MDD Benefits
2126 M. Torchiano et al. / The Journal of Systems and Software 86 (2013) 2110–2126
Motulsky, H., 2010. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. Oxford University Press, New York.
Nugroho, A., Chaudron, M.R., 2008. A survey into the rigor of UML use and its perceived impact on quality and productivity. In: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM'08. ACM, New York, NY, USA, pp. 90–99.
Object Management Group, 2011a. Business Process Model and Notation, v. 2.0, Standard. Object Management Group.
Object Management Group, 2011b. Unified Modeling Language, Superstructure, v. 2.4, Specifications. Object Management Group.
Punter, T., Ciolkowski, M., Freimut, B., John, I., 2003. Conducting on-line surveys in software engineering. In: Proceedings of the International Symposium on Empirical Software Engineering (ISESE), pp. 80–88.
Schmidt, D.C., 2006. Guest editor's introduction: model-driven engineering. Computer 39, 25–31.
Selic, B., 2003. The pragmatics of model-driven development. IEEE Software 20, 19–25.
Singer, J., Storey, M., Damian, D., 2008. Selecting empirical methods for software engineering research. In: Singer, Shull, Sjoberg (Eds.), Guide to Advanced Empirical Software Engineering. Springer, London, pp. 285–311.
Tomassetti, F., Torchiano, M., Tiso, A., Ricca, F., Reggio, G., 2012. Maturity of software modelling and model driven engineering: a survey in the Italian industry. IET Seminar Digests 2012, 91–100.
Torchiano, M., Di Penta, M., Ricca, F., De Lucia, A., Lanubile, F., 2011a. Migration of information systems in the Italian industry: a state of the practice survey. Information and Software Technology 53, 71–86.
Torchiano, M., Tomassetti, F., Ricca, F., Tiso, A., Reggio, G., 2011b. Preliminary findings from a survey on the MD* state of the practice. In: International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, pp. 372–375.
van Deursen, A., Visser, E., Warmer, J., 2007. Model-driven software evolution: a research agenda. Technical Report, TUDelft-SERG.
Voelter, M., 2009. Best practices for DSLs and model-driven development. Journal of Object Technology 8. http://www.jot.fm
Walonick, D.S., 1997. Survival Statistics. StatPac, Inc.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., Wesslén, A., 2000. Experimentation in Software Engineering – An Introduction. Kluwer Academic Publishers.
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development–Case Studies in Java' from Addison-Wesley, and co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and software engineering for mobile and wireless applications. The methodological approach he adopts is that of empirical software engineering.
Federico Tomassetti is a PhD student in the Dept. of Control and Computer Engineering at Politecnico di Torino. He is interested in all the means to lift the level of abstraction, in particular model-driven development, domain specific languages, and language workbenches. He also has experience as a developer, consultant, and technical writer.
Filippo Ricca is an assistant professor at the University of Genova, Italy. He received his PhD degree in Computer Science from the same University, in 2003, with the thesis "Analysis, Testing and Re-structuring of Web Applications". In 2011 he was awarded the ICSE 2001 MIP (Most Influential Paper) award, for his paper: "Analysis and Testing of Web Applications". He is author or coauthor of more than 90 research papers published in international journals and conferences/workshops. At the time of writing (20/03/2013), the h-index reported by Google Scholar is 24 (13 in Scopus); the number of citations 2002. Filippo Ricca was Program Chair of CSMR 2013, ICPC 2011 and WSE 2008. Among the others, he served in the program committees of the following conferences: ICSM, ICST, SCAM, CSMR, WCRE and ESEM. From 1999 to 2006, he worked with the Software Engineering group at ITC-irst (now FBK-irst), Trento, Italy. During this time he was part of the team that worked on Reverse engineering, Re-engineering and Software Testing. His current research interests include: Software modeling, Reverse engineering, Empirical studies in Software Engineering, Web applications and Software Testing. The research is mainly conducted through empirical methods such as case studies, controlled experiments and surveys.
Alessandro Tiso is a PhD student in Computer Science at University of Genoa. He is also a teacher of informatics and applied mathematics in a high school. His main research interests are: model driven development, model transformations and use of models in software engineering. In the last 20 years he also worked as an IT consultant in the industry.
Gianna Reggio is Associate Professor at University of Genova since 1992. Previously she was Assistant Professor at the same university; she received the PhD in Informatics in 1988. She took part in several research projects, among them SAHARA and SALADIN (PRIN), and co-organized several international conferences and workshops, the most relevant being ETAPS 2001 and MODELS 2006. She has been a member of the program committee of several international conferences, in particular MODELS (in 2009, 2008, and 2007). Her research activity focused mainly on the following topics: software development methods, concurrent systems modeling and formal specification, and semantics of programming languages. Recently she started working on development methods for SOA applications. She is author of several national and international conference and journal papers and books.
Current Research Topics
•Data Quality
•Testing of Mobile App UI
Data Quality
• Data is a key component in many domains
• Low quality makes the outcomes useless or outright dangerous
• Data is not (anymore) just a part of software
• New ISO quality standards: ISO 25012, ISO 25024
• Challenges (focus on intrinsic characteristics):
  • Practical application of quality measures to concrete cases (Open Data)
  • Quality assessment of unstructured data (NoSQL, semantic KBs)
Data Quality: practical application
• Open Government Data
  • Defined a set of metrics from the literature
  • Analyzed municipal data
  • Able to identify quality issues
  • Different issues depending on the disclosure process
• Technical specification: UNI/TS 11725:2018
  • Guidelines for the measurement of data quality
  • As a member of UNI CT 504
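Most of the metrics in question are ratio-style indicators computed directly over dataset cells. As a purely illustrative sketch (not the framework's actual implementation; the set of tokens treated as "missing" is an assumption), a cell-level completeness measure in the spirit of ISO/IEC 25024 can be computed as the fraction of non-empty cells:

```python
# Sketch of a cell-level completeness measure in the spirit of
# ISO/IEC 25024 (illustrative, not the framework's actual code).
# The set of tokens treated as "missing" is an assumption.
MISSING = {"", "na", "n/a", "null", "-"}

def completeness(rows):
    """Fraction of cells that carry an actual value, over all cells."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 1.0  # an empty dataset is vacuously complete
    filled = sum(
        1
        for row in rows
        for cell in row
        if str(cell).strip().lower() not in MISSING
    )
    return filled / total

# Example: a tiny municipal-style table with two missing cells.
dataset = [
    ["Torino", "2015", "open"],
    ["Genova", "", "open"],
    ["Milano", "2014", "N/A"],
]
print(completeness(dataset))  # 7 of 9 cells are filled
```

Measures of this shape can be applied per column or per dataset, which is what makes cell-level (most granular) measurement possible.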
Open data quality measurement framework: Definition and application
to Open Government Data
Antonio Vetrò ⁎, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas,
Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Article info: Received 5 March 2015; Received in revised form 1 February 2016; Accepted 3 February 2016; Available online 20 February 2016.
Abstract
The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposed lower quality. Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
© 2016 Elsevier Inc. All rights reserved.
Keywords: Open Government Data; Open data quality; Government information quality; Data quality measurement; Empirical assessment
1. Introduction
Open data are data that "can be freely used, modified, and shared by anyone for any purpose" (Web ref. 1). Compared to proprietary frameworks, digital commons such as open data are characterized, from both a legal and a technical point of view, by lower restrictions applied to their circulation and reuse. This feature is supposed to ultimately foster collaboration, creativity and innovation (Hofmokl, 2010).
At all administrative levels, the public sector is one of the major producers and holders of information, which ranges, e.g., from maps to company registers (Aichholzer & Burkert, 2004). During the last years, the amount and variety of open data released by public administrations across the world has been tangibly growing (see, e.g., the Open Data Census by the Open Knowledge Foundation (Web ref. 2)), while increased political awareness on the subject has been translated into regulation, including the 2013 revision of the EU Directive on Public Sector Information reuse, as well as national roadmaps and technical guidelines. Releasing public sector information as open data can provide considerable added value, meeting a demand coming from all kinds of actors, ranging from companies to Non-Governmental Organizations, from developers to simple citizens. Many suggest that wider and easier circulation of public datasets could entail interesting (and even unexpected) forms of reuse, also for commercial purposes (Vickery, 2011), and in general improve transparency of public institutions (Stiglitz, Orszag, & Orszag, 2000; Ubaldi, 2013) and the distributed ability to interpret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012).
From the point of view of data reusers, we should take into account the role of the so-called infomediaries, i.e., players that are able to interpret data and present them effectively to the general public (Mayer-Schönberger & Zappia, 2011), with a highly diversified set of business models (Zuiderwijk, Janssen, & Davis, 2014). More generally, open data consumers manipulate data in many ways, ranging, e.g., from data integration to classification, also depending on the complementary assets they hold (Ferro & Osella, 2013). Considering this, legal and technical openness of datasets is not sufficient, by itself, to create a prolific reuse ecosystem (Helbig, Nakashima, & Dawe, 2012): failures in providing good quality information might impair not only the reuse of the data, but also the usage of the institutional portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Attempts to increase meaningfulness and reusability of public sector information also imply representing and exposing data so that they can be easily accessed, queried, processed and linked with other data with no restrictions (Sharon, 2010). Although sharing
Government Information Quarterly 33 (2016) 325–337
* Corresponding author at: Nexa Center for Internet & Society, Politecnico di Torino, Via Pier Carlo Boggio, 65/A, 10138 Torino, Italy. E-mail address: antonio.vetro@polito.it (A. Vetrò).
http://dx.doi.org/10.1016/j.giq.2016.02.001
0740-624X/© 2016 Elsevier Inc. All rights reserved.
Barlett, D. L., & Steele, J. B. (1985). Forevermore: Nuclear waste in America.
Basili, V., Caldiera, G., & Rombach, H. D. (1994). The goal question metric approach. In:
Encyclopedia of software engineering. John Wiley & Sons, 528–532.
Batini, C., & Scannapieco, M. (2006). Data quality, concepts, methodologies and techniques.
Berlin: Springer.
Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys, 41(3), Article 16 (52 pp.).
Behkamal, Behshid, Kahani, Mohsen, Bagheri, Ebrahim, & Jeremic, Zoran (May 2014).
A metrics-driven approach for quality assessment of linked open data. Journal of
Theoretical and Applied Electronic Commerce Research, 9(2), 64–79.
Berners-Lee, T. (2006). Linked data-design issues. Tech. rep., W3C http://www.w3.org/
DesignIssues/LinkedData.html
Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function
approach to assessing overall information quality. International Journal of Intelligent
Systems, 51–74.
Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal
data consumers. World Wide Web, 11(4), 465–484.
Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning:
A pragmatic approach using linked data. International Conference for E-Democracy and
Open Government 2015 (CeDEM15), Krems, Austria.
Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspec-
tive. Archivi Computer, 2.
Detlor, B., Hupfer, M. E., Ruhi, U., & Zhao, L. (2013). Information quality and community municipal portal use. Government Information Quarterly, 30(1), 23–32. http://dx.doi.org/10.1016/j.giq.2012.08.00 (ISSN 0740-624X).
English, L. (1999). Improving data warehouse and business information quality. Wiley &
Sons.
Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality man-
agement. The Journal of Computer Information Systems, 50(2), 127–135.
Ferro, E., & Osella, M. (2013). Eight business model archetypes for PSI re-use. “Open data
on the web” workshop, Google campus, London.
Haug, A., Pedersen, A., & Arlbjørn, J. S. (2009). A classification model of ERP system data
quality. Industrial Management & Data Systems, 109(8), 1053–1068. http://dx.doi.
org/10.1108/02635570910991292.
Heinrich, B. (2002). Datenqualitätsmanagement in Data Warehouse-Systemen. (doctoral
thesis, Oldenburg).
Heinrich, B., Klier, M., & Kaiser, M. (2009). A procedure to develop metrics for cur-
rency and its application in CRM. Journal of Data and Information Quality, 1(1),
5.
Helbig, N., Nakashima, M., & Dawe, Sharon S. (June 4–7, 2012). Understanding the value
and limits of government information in policy informatics: A preliminary explora-
tion. Proceedings of the 13th Annual International Conference on Digital Government
Research (dg.o2012).
Hofmokl, J. (2010). The Internet commons: toward an eclectic theoretical framework.
International Journal of the Commons, 4(1), 226–250.
Iemma, R., Morando, F., & Osella, M. (2014). Breaking public administrations' data silos.
eJournal of eDemocracy & Open Government, 6(2).
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and
myths of open data and open government. Information Systems Management, 29(4),
258–268.
Jarke, M., Lenzerini, M., Vassiliadis, P., & Vassiliou, Y. (Eds.) (1995). Fundamentals of data warehouses. Springer Verlag.
Jeusfeld, M., Quix, C., & Jarke, M. (1998). Design and analysis of quality information for
datawarehouses. Proceedings of the 17th International Conference on Conceptual
Modeling.
Kaiser, M., Klier, M., & Heinrich, B. (2007). “How to measure data quality? — A metric-
based approach” (2007). ICIS 2007 Proceedings. Paper 108.
Kim, W. (2002). On three major holes in data warehousing today. Journal of Object
Technology, 1(4), 39–47. http://dx.doi.org/10.5381/jot.2002.1.4.c3.
Kuk, G., & Davies, T. (2011). The roles of agency and artifacts in assembling open data
complementarities. Thirty Second International Conference on Information Systems.
Madnick, S., Wang, R., & Xian, X. (2004). The design and implementation of a corporate
householding knowledge processor to improve data quality. Journal of Management
Information Systems, 20(1), 41–49.
Maurino, A., Spahiu, B., Batini, C., & Viscusi, G. (2014). Compliance with Open Government
Data Policies: an empirical evaluation of Italian local public administrations. Twenty
Second European Conference on Information Systems, Tel Aviv.
Mayer-Schönberger, V., & Zappia, Z. (2011). Participation and power: intermediaries of
open data. 1st Berlin Symposium on Internet and Society.
Moraga, C., Moraga, M., Calero, C., & Caro, A. (2009). SQuaRE-aligned data quality model
for web portals. Quality Software, 2009. QSIC’09. 9th International Conference on
(pp. 117–122).
Naumann, F. (2002). Quality-driven query answering for integrated information systems.
Lecture Notes in Computer Science, 2261.
Redman, T. (1996). Data quality for the information age. Artech House.
Reiche, K., & Hofig, E. (2013). Implementation of metadata quality metrics and application
on public government data. Computer software and applications conference workshops
(COMPSACW), 2013 IEEE 37th annual (pp. 236–241).
Sachs, L. (1982). Applied statistics. A handbook of techniques. New York — Heidelberg —
Berlin: Springer-Verlag (734 pp., 59 figs., DM 118,-).
Sande, M. V., Dimou, A., Colpaert, P., Mannens, E., & Van de Walle, R. (2013). Linked data
as enabler for open data ecosystems. Open data on the web 23–24 April 2013, Campus
London, Shoreditch.
Sharon, Dawes S. (2010). Stewardship and usefulness: Policy principles for information-
based transparency. Government Information Quarterly, 27(4), 377–383.
Stiglitz, J. E., Orszag, P. R., & Orszag, J. M. (2000). The role of government in a digital age.
(Commissioned by the computer& communications industry association).
Tauberer, J. (2012). Open government data: The book.
Ubaldi, B. (2013). Open government data: Towards empirical analysis of open govern-
ment data initiatives. Tech. rep. OECD Publishing.
Umbrich, Jürgen, Neumaier, Sebastian, & Polleres, Axel (March 2015). Towards assessing
the quality evolution of Open Data portals. ODQ2015: Open data quality: From theory
to practice workshop (Munich, Germany).
Van de Sompel, H., Sanderson, R., Nelson, M. L., Balakireva, L. L., Shankar, H., & Ainsworth,
S. (2010). An HTTP-based versioning mechanism for linked data. (arXiv preprint arXiv:
1003.3661).
Vickery, G. (2011). Review of recent studies on PSI re-use and related market developments.
Paris: Information Economics.
Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological founda-
tions. Communications of the ACM, 39, 11.
Wang, R., & Strong, D. (1996). Beyond accuracy: What data quality means to data con-
sumers. Journal of Management Information Systems, 12, 4.
Whitmore, A. (2014). Using open government data to predict war: A case study of data and systems challenges. Government Information Quarterly, 31(4), 622–630 (ISSN 0740-624X).
Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential ele-
ments of open data ecosystems. Information Policy, 19(1), 17–33.
Zuiderwijk, Anneke, & Janssen, Marijn (2015). Participation and data quality in open data
use: Open data infrastructures evaluated. Proceedings of the 15th European Conference
on eGovernment 2015: ECEG 2015. Academic Conferences Limited.
Web references
Web ref. 1. http://opendefinition.org/ (last visited on November 5, 2015).
Web ref. 2. http://census.okfn.org/ (last visited on November 5, 2015).
Web ref. 3. http://goo.gl/QlUT1d (last visited on November 5, 2015).
Web ref. 4. https://goo.gl/yq4ElP (last visited on November 5, 2015).
Web ref. 5. http://sunlightfoundation.com/blog/2010/06/23/elenas-inbox/ (last visited on November 5, 2015).
Web ref. 6. http://nexa.polito.it/lunch-9 (last visited on November 5, 2015).
Web ref. 7. http://okfnlabs.org/bad-data/ (last visited on November 5, 2015).
Web ref. 8. http://www.dati.gov.it/dataset (last visited on November 5, 2015).
Web ref. 9. https://www.opengovawards.org/Awards_Booklet_Final.pdf (last visited on November 5, 2015).
Web ref. 10. http://en.wikipedia.org/wiki/European_Social_Fund (last visited on November 5, 2015).
Web ref. 11. http://www.dati.gov.it/ (last visited on November 5, 2015).
Web ref. 12. http://blog.okfn.org/2013/07/02/git-and-github-for-data/ (last visited on November 5, 2015).
Web ref. 13. http://dat-data.com/ (last visited on November 5, 2015).
Web ref. 14. https://code.google.com/p/google-refine/ (last visited on November 5, 2015).
Web ref. 15. http://datacleaner.org/ (last visited on November 5, 2015).
Web ref. 16. http://checklists.opquast.com/en/opendata (last visited on November 5, 2015).
Web ref. 17. http://odaf.org/papers/Open%20Data%20and%20Metadata%20Standards.pdf (last visited on November 5, 2015).
Antonio Vetrò is Director of Research and Policy at the Nexa Center for Internet and Society at Politecnico di Torino (Italy). Formerly, he has been a research fellow in the Software and System Engineering Department at Technische Universität München (Germany) and junior scientist at Fraunhofer Center for Experimental Software Engineering (MD, U.S.A.). He holds a PhD in Information and System Engineering from Politecnico di Torino (Italy). He is interested in studying the impact of technology on society, with a focus on technology transfer and internet science, and adopting an empirical epistemological approach. Contact him at antonio.vetro@polito.it.
Lorenzo Canova holds an MSc in Industrial engineering from Politecnico di Torino and wrote his Master thesis on Open Government Data quality. He is currently a research fellow of the Nexa Center for Internet & Society at Politecnico di Torino. His main research interests are in Open Government Data, government transparency and linked data. Contact him at lorenzo.canova@polito.it.
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is an author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development—Case studies in Java' from Addison-Wesley, and the co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and static analysis. The methodological approach he adopts is that of empirical software engineering. Contact him at marco.torchiano@polito.it.
Camilo O. Minotas holds an MSc from Politecnico di Torino and wrote his Master thesis on Open Data Quality Assessment. He is currently a software developer in Colombia. His main research interests are in open data quality and government transparency. Contact him at minotas@gmail.com.
Raimondo Iemma holds an MSc in Industrial engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, economics of information, and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.
Federico Morando (M.A.S. in Economic theory and econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist, with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectures at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
PROJECT CODE: UNI 1602662
STANDARD NUMBER: UNI/TS 11725:2018
ITALIAN TITLE: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
ENGLISH TITLE: System and software engineering - Guidelines for the measurement of data quality
COMPETENT BODY: UNI/CT 504 Software Engineering
CO-AUTHOR
SUMMARY: The technical specification defines how to apply the measurement of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality" through a comparison with current practices and theoretical considerations. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality. The definitions of the measures are general, so that they can be applied to a variety of specific cases.
NATIONAL RELATIONS: -----
INTERNATIONAL RELATIONS: -----
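Measures in the spirit of ISO/IEC 25024 are typically ratios of observed to expected data items. As a hedged illustration (the records and field names below are made up, not taken from the specification), a syntactic completeness ratio can be sketched as:

```python
def completeness(records, fields):
    """Ratio of filled cells to required cells: a simplified sketch of
    an ISO/IEC 25024-style completeness measure."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records
                 for f in fields
                 if r.get(f) not in (None, ""))
    return filled / total if total else 1.0

# Hypothetical records: one missing city out of four required cells.
rows = [{"name": "Alice", "city": "Torino"},
        {"name": "Bob", "city": ""}]
ratio = completeness(rows, ["name", "city"])  # 0.75
```

The general definition in the standard is parameterized over what counts as a "data item"; here a cell of a flat record plays that role.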
Data Quality: unstructured data
• Evolution measures based on simple counts
SELECT (COUNT(DISTINCT ?s) AS ?count)
WHERE { ?s a <C> . }
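The same count-based evolution measure can be sketched without a SPARQL endpoint; a toy example over in-memory (subject, predicate, object) triples, with made-up IRIs standing in for the class <C>:

```python
def count_instances(triples, cls):
    # Distinct subjects typed as cls: the COUNT(DISTINCT ?s) of the query above.
    return len({s for (s, p, o) in triples
                if p == "rdf:type" and o == cls})

# Two toy "releases" of a knowledge base.
old = {("ex:e%d" % i, "rdf:type", "ex:Place") for i in range(5)}
new = {("ex:e%d" % i, "rdf:type", "ex:Place") for i in range(3)}

# A negative delta signals that entities disappeared between releases.
delta = count_instances(new, "ex:Place") - count_instances(old, "ex:Place")  # -2
```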
Semantic Web 0 (0) 1–0. IOS Press
A Quality Assessment Approach for Evolving
Knowledge Bases
Mohammad Rifat Ahmmad Rashid (a,*), Marco Torchiano (a), Giuseppe Rizzo (b), Nandana Mihindukulasooriya (c) and Oscar Corcho (c)
(a) Politecnico di Torino, Italy. E-mail: mohammad.rashid@polito.it, marco.torchiano@polito.it
(b) Istituto Superiore Mario Boella, Italy. E-mail: giuseppe.rizzo@ismb.it
(c) Universidad Politécnica de Madrid, Spain. E-mail: nmihindu@fi.upm.es, ocorcho@fi.upm.es
Abstract
Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence.
Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured informa-
tion, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata
have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation.
Despite its advantages, there are challenges in using the RDF data model, for example, in data quality assessment and validation. In this
paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach
uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our
quality characteristics are based on the KB evolution analysis and we used high-level change detection for measurement func-
tions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness.
Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and
completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed
both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs and is highly effective in the case of KBs with periodic updates, such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.
Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data
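The Persistency idea described in the abstract (an entity type should not lose instances across releases) can be sketched in a few lines; the release counts below are invented for illustration:

```python
def persistency_flags(counts):
    """Given entity counts per release (oldest first), return the indices
    of releases where the count dropped, i.e. where a Persistency-style
    measure would signal a potential quality issue."""
    return [i for i in range(1, len(counts)) if counts[i] < counts[i - 1]]

# Hypothetical counts of one entity type over four consecutive releases:
# release 2 lost entities with respect to release 1, so it is flagged.
flags = persistency_flags([120, 135, 133, 140])  # [2]
```

Historical Persistency, in the paper's terms, aggregates such per-pair observations over the whole lifespan of the KB rather than one pair of releases.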
1. Introduction
The Linked Data approach consists in exposing and
connecting data from different sources on the Web by
the means of semantic web technologies. Tim Berners-Lee¹ refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine readable format, and to meaningfully link them with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD)² cloud hosting several Knowledge Bases (KBs) making available billions of RDF triples from different domains such as Geogra-
* Corresponding author. E-mail: mohammad.rashid@polito.it
¹ http://www.w3.org/DesignIssues/LinkedData.html
² http://lod-cloud.net
1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved
Mobile UI Testing
K-9 Mail
v2.995 vs. v3.309
  • 1. Research Activities: past, present, and future. Marco Torchiano. Public seminar, selection procedure for a tenured full professor position (S.C. 09/H1 - S.S.D. ING-INF/05, internal code 03/18/P/O). October 19, 2018
  • 3. Empirical Software Engineering: Survey, Case study, Case-control, Experiment. Package effsize: 2.2K downloads/month
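The effsize R package computes standardized effect sizes for comparing experimental groups. A minimal Python sketch of Cohen's d with a pooled standard deviation (the common default for two independent samples; the sample data is invented):

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled
    standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * statistics.variance(a) +
                        (nb - 1) * statistics.variance(b)) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled

# Two made-up samples whose means differ by one pooled standard deviation.
d = cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])  # 1.0, a large effect
```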
  • 4. On the Effectiveness of the Test-First Approach to Programming. Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano. IEEE Transactions on Software Engineering, vol. 31, no. 3, March 2005. (First page of the paper; 362 citations.)
  • 5. Test-Driven Development • TDD is an agile development practice • Made popular by eXtreme Programming • Adopted widely • Promises: • More and better tests • Higher external quality • Improved productivity
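The test-first dynamic the slide refers to can be shown in a few lines; the test is drafted before the production code and initially fails (the function and test names below are made up for illustration):

```python
import unittest

class TestFizz(unittest.TestCase):
    """Written first: these tests fail until fizz() below is implemented."""
    def test_multiple_of_three(self):
        self.assertEqual(fizz(9), "Fizz")
    def test_other_number(self):
        self.assertEqual(fizz(4), "4")

def fizz(n):
    """Written second: just enough production code to make the tests pass."""
    return "Fizz" if n % 3 == 0 else str(n)
```

Each subsequent red-green cycle adds one more failing test and the minimal code that satisfies it, which is the iteration the paper's abstract describes.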
  • 6. Test-Driven Development: findings • Higher number of tests • Impact on quality and productivity • No clear difference in terms of quality • Too small tasks • Process conformance • Slight improvement in productivity • More focused • Less rework
• 7. On the Effectiveness of the Test-First Approach to Programming. Hakan Erdogmus, Maurizio Morisio, and Marco Torchiano. IEEE Transactions on Software Engineering, March 2005 (first page shown above).
362 citations

Overlooked Aspects of COTS-based Development
Marco Torchiano, Norwegian University of Science and Technology; Maurizio Morisio, Politecnico di Torino
IEEE Software, 2004. © 2004 IEEE, published by the IEEE Computer Society.

Studies on commercial-off-the-shelf-based development often disagree, lack product and project details, and are founded on uncritically accepted assumptions. This empirical study establishes key features of industrial COTS-based development and creates a definition of "COTS products" that captures these features.

Although developing with commercial-off-the-shelf components is gaining more attention from both research and industrial communities [1], most literature on the topic doesn't clearly identify context variables such as the type of products, projects, and systems. In particular, the literature often lacks a definition of "COTS product," or, when a definition is present, it invariably disagrees with other studies (see the sidebar, "What COTS Product Do You Mean?"). A shared definition would improve discourse and enable researchers to meta-analyze published empirical data.

After a speculative effort to define and classify COTS products [2], we decided to obtain this much-needed understanding from the bottom up. We asked people involved in industrial projects to name the key features in COTS-based development and to tell us what they think constitutes a COTS product. Here, we present the interview results in the form of six theses, which contradict widely accepted (or simply undisputed) ideas. We also present a definition of "COTS product" that captures the key features.

The sample of projects we used is limited and can't represent the whole IT industry. Instead, our study primarily considers small and medium enterprises (SMEs). While SMEs produce most software, they're usually neglected by research. Specifically, the COTS-development literature seems to focus on large, mostly governmental projects performed in major companies. Our goal is to empirically explore this key topic for the software industry, collect facts, and derive well-substantiated theses from these facts.

The groundwork. From a methodological viewpoint, we conducted empirical research following two parallel threads that complement each other: a systematic literature investigation and an on-field exploratory study based on structured interviews.

158 citations
• 8. Development with OTS components
• Development with Off-The-Shelf (OTS) components
• Systems are built through integration of existing components
• The marketplace influences the evolution of OTS components
• Continuous trade-off between system and component features
• Strong push from the DoD since the late '90s
• Series of conferences driven by the SEI since 2002
• As systems become more complex, all software is based on OTS components
• Goal: state of the practice of OTS-based development in EU industry
• 9. OTS-Based Development: 6 theses
T1: Open source software is often used as closed source.
T2: Integration problems result from lack of standard compliance; architectural mismatches constitute a secondary issue.
T3: Custom code mainly provides additional functionalities.
T4: Developers seldom use formal selection procedures. Familiarity with either the product or the generic architecture is the key factor in selection.
T5: Architecture is more important than requirements for product selection.
T6: Integrators tend to influence the vendor concerning product evolution whenever possible.
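Thesis T3 can be made concrete with a small sketch: the integrator uses the OTS component as a black box, and custom code only adds functionality around it. Here java.util.Properties merely stands in for an off-the-shelf product, and the TypedConfig wrapper is hypothetical, not from the study.

```java
import java.util.Properties;

// T3 in miniature: the OTS component (java.util.Properties, standing in
// for any off-the-shelf product) is used as-is; the custom code does not
// re-implement it, it only adds functionality the product lacks.
public class TypedConfig {
    private final Properties props = new Properties(); // the OTS component

    public void set(String key, String value) {
        props.setProperty(key, value);
    }

    // Additional functionality provided by custom code: a typed accessor
    // with a default value, which Properties itself does not offer.
    public int getInt(String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) return defaultValue;
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

Note that the integrator never modifies the component itself (consistent with T1: even open source is often used as closed source); its evolution remains in the vendor's hands (T6).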
Thanks to:
• 10. On the Effectiveness of the Test-First Approach to Programming (Erdogmus, Morisio, Torchiano; IEEE TSE, March 2005): 362 citations
Overlooked Aspects of COTS-based Development (Torchiano, Morisio; IEEE Software, 2004): 158 citations

Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry
Marco Torchiano (a), Federico Tomassetti (a), Filippo Ricca (b), Alessandro Tiso (b), Gianna Reggio (b)
(a) Dip. Automatica e Informatica, Politecnico di Torino, Italy; (b) DIBRIS, Università di Genova, Italy
The Journal of Systems and Software 86 (2013) 2110–2126

Article history: received 23 August 2012; received in revised form 15 March 2013; accepted 18 March 2013; available online 1 April 2013.
Keywords: model driven techniques; software modelling; personal opinion survey

Context: Claimed benefits of software modelling and model driven techniques are improvements in productivity, portability, maintainability and interoperability. However, little effort has been devoted to collecting evidence to evaluate their actual relevance, benefits and usage complications.
Goal: The main goals of this paper are: (1) assess the diffusion and relevance of software modelling and MD techniques in the Italian industry, (2) understand the expected and achieved benefits, and (3) identify which problems limit/prevent their diffusion.
Method: We conducted an exploratory personal opinion survey with a sample of 155 Italian software professionals by means of a Web-based questionnaire, on-line from February to April 2011.
Results: Software modelling and MD techniques are very relevant in the Italian industry. The adoption of simple modelling brings common benefits (better design support, documentation improvement, better maintenance, and higher software quality), while MD techniques make it easier to achieve: improved standardization, higher productivity, and platform independence.
We identified problems, some hindering adoption (too much effort required and limited usefulness), others preventing it (lack of competencies and supporting tools).
Conclusions: The relevance represents an important objective motivation for researchers in this area. The relationship between techniques and attainable benefits represents an instrument for practitioners planning the adoption of such techniques. In addition, the findings may provide hints for companies and universities. © 2013 Elsevier Inc. All rights reserved.

1. Introduction
Usually, model-based techniques use models to describe the architecture and design of a system and/or the behaviour of software artefacts, generally through a raise in the level of abstraction (France and Rumpe, 2003). Models can also be used for other purposes, e.g., to describe business workflows or development processes. They can be used in different phases of the development process as communication artefacts, as points of reference against which subsequent implementations are verified, or as the basis for further development. In the latter case, they may become the key elements in the process, from which other artefacts (most notably code) are generated.

The term "model" is very general; it is difficult to provide a comprehensive yet precise definition of a model. For us, a model is an artefact realized with the goal to capture and transfer information in a form as pure as possible, limiting the syntax noise. Examples of models include UML design models (Object Management Group, 2011b), process models defined through BPMN (Object Management Group, 2011a), Web application models defined through WebML (Ceri et al., 2000), as well as textual Domain Specific Language (DSL) models (Fowler and Parsons, 2011). Given that models are so heterogeneous and the processes involving them so varied, it is actually difficult to define the exact boundaries for modelling. Moreover, models can be used practically in different ways and with different levels of maturity (Tomassetti et al., 2012). While in theory we have a set of best practices, in practice we only found mixed, half-baked, and personalized solutions. As a consequence, the different uses of modelling are complex and difficult to classify precisely.

When models constitute a crucial and operational ingredient of software development, the process is called model driven.

67 citations
• 11. MDD State of the Practice
• Model-Driven Development is not just modeling
  • Code generation
  • Model interpretation
  • Model transformation
  • Often, toolsmithing
• Promised improvements:
  • Productivity
  • Portability
  • Maintainability and
  • Interoperability
  • …
MDD ≠
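Code generation, the first of the MD techniques listed above, is essentially a model-to-text transformation. Below is a minimal, hypothetical sketch: real MDD toolchains start from UML or DSL models and use template engines, whereas this EntityGenerator and its map-based "model" are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model-to-text transformation: the "model" is an entity name
// plus an ordered map of field name -> field type; the generator emits
// Java source (a field and a getter per model element) from it.
public class EntityGenerator {
    public static String generate(String entity, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(entity).append(" {\n");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey(), type = f.getValue();
            src.append("    private ").append(type).append(' ')
               .append(name).append(";\n");
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("name", "String");
        System.out.println(generate("Customer", fields));
    }
}
```

Changing the generator (rather than each generated class) is what yields the promised productivity and portability benefits; it is also where the "toolsmithing" effort noted above goes.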
• 12. MDD Benefits
From "Relevance, benefits, and problems of software modelling and model driven techniques—A survey in the Italian industry", The Journal of Systems and Software 86 (2013) 2110–2126 (first page shown above).

From the author biographies:

Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book "Software Development – Case Studies in Java" from Addison-Wesley, and co-editor of the book "Developing Services for the Wireless Internet" from Springer. His current research interests are: design notations, testing methodologies, OTS-based development, and software engineering for mobile and wireless applications. The methodological approach he adopts is that of empirical software engineering.

Federico Tomassetti is a PhD student in the Dept. of Control and Computer Engineering at Politecnico di Torino. He is interested in all the means to lift the level of abstraction, in particular model-driven development, domain-specific languages, and language workbenches. He also has experience as a developer, consultant, and technical writer.
• 13. Current Research Topics
• Data Quality
• Testing of Mobile App UI
• 14. Data Quality
• Data is a key component in many domains
• Low quality makes the outcomes useless or outright dangerous
• Data is not (anymore) just a part of software
• New ISO quality standards: ISO-25012, ISO-25024
• Challenges (focus on intrinsic characteristics):
• Practical application of quality measures to concrete cases (Open Data)
• Quality assessment of unstructured data (NoSQL, semantic KB)
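To make the idea of "practical application of quality measures" concrete, here is a minimal, hypothetical sketch (not the framework defined in the paper excerpted below): a completeness indicator in the spirit of ISO/IEC 25024, computed as the ratio of non-empty cells to total cells in a tabular dataset. The function name, the sample data, and the treatment of `""`/`None` as missing values are illustrative assumptions.

```python
# Hypothetical sketch of an ISO/IEC 25024-style completeness measure:
# the fraction of cells in a tabular dataset that carry a value.
# "" and None are treated as missing (an assumption for this example).

def completeness(rows):
    """Return the ratio of non-empty cells to total cells (0.0 for empty input)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    filled = sum(1 for row in rows for cell in row if cell not in ("", None))
    return filled / total

# Illustrative open-data-like records: city, reference year, population.
dataset = [
    ["Turin", "2015", "874062"],
    ["Milan", "",     "1345851"],  # missing year
    ["Rome",  "2015", None],       # missing population
]
print(round(completeness(dataset), 2))  # 7 of 9 cells filled -> 0.78
```

A per-column variant of the same ratio would localize which attributes of a dataset drive the quality issues, which is the kind of granular, dataset-level measurement the slide refers to.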
• 15. Data Quality: practical application
• Open Government Data
• Defined a set of metrics from literature
• Analyzed + municipal data
• Able to identify quality issues
• Different issues depending on disclosure process
• Technical specification: UNI/TS 11725:2018
• Guidelines for the measurement of data quality
• As a member of UNI CT 504

Open data quality measurement framework: Definition and application to Open Government Data
Antonio Vetrò ⁎, Lorenzo Canova, Marco Torchiano, Camilo Orozco Minotas, Raimondo Iemma, Federico Morando
Nexa Center for Internet & Society, DAUIN, Politecnico di Torino, Italy
Article history: Received 5 March 2015; Received in revised form 1 February 2016; Accepted 3 February 2016; Available online 20 February 2016
Abstract. The diffusion of Open Government Data (OGD) in recent years kept a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms, rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, hence with supposed lower quality.
Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provided technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, ad- dressing specific quality aspects. © 2016 Elsevier Inc. All rights reserved. Keywords: Open Government Data Open data quality Government information quality Data quality measurement Empirical assessment 1. Introduction Open data are data that “can be freely used, modified, and shared by anyone for any purpose” (Web ref. 1). Compared to proprietary frame- works, digital commons such as open data are characterized — from both a legal and a technical point of view — by lower restrictions applied to their circulation and reuse. This feature is supposed to ultimately fos- ter collaboration, creativity and innovation (Hofmokl, 2010). At all administrative levels, the public sector is one of the major pro- ducers and holders of information, which ranges, e.g., from maps to companies registers (Aichholzer & Burkert, 2004). During the last years, the amount and variety of open data released by public adminis- trations across the world has been tangibly growing (see, e.g., the Open Data Census by the Open Knowledge Foundation (Web ref. 2)), while increased political awareness on the subject has been translated in regulation, including the revised of the EU Directive on Public Sector Information reuse in 2013, as well as national roadmaps and technical guidelines. Releasing public sector information as open data can provide considerable added value, meeting a demand coming from all kinds of actors, ranging from companies to Non-Governmental Organizations, from developers to simple citizens. 
Many suggest that wider and easier circulation of public datasets could entail interesting (and even unex- pected) forms of reuse, also for commercial purposes (Vickery, 2011), and in general improve transparency of public institutions (Stiglitz, Orszag, & Orszag, 2000; Ubaldi, 2013) and distributed ability to inter- pret complex phenomena (Janssen, Charalabidis, & Zuiderwijk, 2012). From the point of view of data reusers, we should take into account the role of the so-called infomediaries, i.e., players that are able to inter- pret data and present them effectively to the general public (Mayer- Schönberger & Zappia, 2011), with a highly diversified set of business models (Zuiderwijk, Janssen, & Davis, 2014). More in general, open data consumers manipulate data in many ways, ranging, e.g., from data integration to classification, also depending on the complementary assets they hold (Ferro & Osella, 2013). Considering this, legal and technical openness of datasets is not sufficient, by itself, to create a pro- lific reuse ecosystem (Helbig, Nakashima, & Dawe, 2012): failures in providing good quality information might impair not only the reuse of the data, but also the usage of the institutional portals (Detlor, Hupfer, Ruhi, & Zhao, 2013). Attempts to increase meaningfulness and reusabil- ity of public sector information also imply representing and exposing data so that they can be easily accessed, queried, processed and linked with other data with no restrictions (Sharon, 2010). Although sharing Government Information Quarterly 33 (2016) 325–337 ⁎ Corresponding author at: Nexa Center for Internet & Society, Politecnico di Torino, Via Pier Carlo Boggio, 65/A, 10138 Torino, Italy. E-mail address: antonio.vetro@polito.it (A. Vetrò). http://dx.doi.org/10.1016/j.giq.2016.02.001 0740-624X/© 2016 Elsevier Inc. All rights reserved. Contents lists available at ScienceDirect Government Information Quarterly journal homepage: www.elsevier.com/locate/govinf Barlett, D. 
L., & Steele, J. B. (1985). Forevermore: Nuclear waste in America. Basili, V., Caldiera, G., & Rombach, H. D. (1994). The goal question metric approach. In: Encyclopedia of software engineering. John Wiley & Sons, 528–532. Batini, C., & Scannapieco, M. (2006). Data quality, concepts, methodologies and techniques. Berlin: Springer. Batini, C., Cappiello, C., Francalanci, C., & Maurino A. Methodologies for data quality as- sessment and improvement. ACM Computing Surveys 41, 3, Article 16 (July 2009), (52 pp.). Behkamal, Behshid, Kahani, Mohsen, Bagheri, Ebrahim, & Jeremic, Zoran (May 2014). A metrics-driven approach for quality assessment of linked open data. Journal of Theoretical and Applied Electronic Commerce Research, 9(2), 64–79. Berners-Lee, T. (2006). Linked data-design issues. Tech. rep., W3C http://www.w3.org/ DesignIssues/LinkedData.html Bovee, M., Srivastava, R., & Mak, B. (2003). Conceptual framework and belief-function approach to assessing overall information quality. International Journal of Intelligent Systems, 51–74. Calero, C., Caro, A., & Piattini, M. (2008). An applicable data quality model for web portal data consumers. World Wide Web, 11(4), 465–484. Canova, L., Basso, S., Iemma, R., & Morando, F. (2015). Collaborative open data versioning: A pragmatic approach using linked data. International Conference for E-Democracy and Open Government 2015 (CeDEM15), Krems, Austria. Catarci, T., & Scannapieco, M. (2002). Data quality under the computer science perspec- tive. Archivi Computer, 2. Detlor, B., Hupfer, Maureen E, Ruhi, U. & Zhao, L. Information quality and community municipal portal use, Government Information Quarterly, Volume 30, Issue 1, January 2013, Pages 23-32, http://dx.doi.org/10.1016/j.giq.2012.08.00, (ISSN 0740-624X). English, L. (1999). Improving data warehouse and business information quality. Wiley & Sons. Even, A., & Shankaranarayanan, G. (2009). Utility cost perspectives in data quality man- agement. 
Antonio Vetrò is Director of Research and Policy at the Nexa Center for Internet and Society at Politecnico di Torino (Italy). Formerly, he was a research fellow in the Software and System Engineering Department at Technische Universität München (Germany) and a junior scientist at the Fraunhofer Center for Experimental Software Engineering (MD, U.S.A.). He holds a PhD in Information and System Engineering from Politecnico di Torino (Italy). He is interested in studying the impact of technology on society, with a focus on technology transfer and internet science, adopting an empirical epistemological approach. Contact him at antonio.vetro@polito.it.

Lorenzo Canova holds an MSc in Industrial Engineering from Politecnico di Torino and wrote his Master's thesis on Open Government Data quality. He is currently a research fellow of the Nexa Center for Internet & Society at Politecnico di Torino. His main research interests are Open Government Data, government transparency, and linked data. Contact him at lorenzo.canova@polito.it.

A. Vetrò et al. / Government Information Quarterly 33 (2016) 325–337
Marco Torchiano is an associate professor at Politecnico di Torino, Italy; he has been a post-doctoral research fellow at the Norwegian University of Science and Technology (NTNU), Norway. He received an MSc and a PhD in Computer Engineering from Politecnico di Torino, Italy. He is an author or coauthor of more than 100 research papers published in international journals and conferences. He is the co-author of the book 'Software Development—Case Studies in Java' from Addison-Wesley and the co-editor of the book 'Developing Services for the Wireless Internet' from Springer. His current research interests are design notations, testing methodologies, OTS-based development, and static analysis. The methodological approach he adopts is that of empirical software engineering. Contact him at marco.torchiano@polito.it.

Camilo O. Minotas holds an MSc from Politecnico di Torino and wrote his Master's thesis on Open Data Quality Assessment. He is currently a software developer in Colombia. His main research interests are open data quality and government transparency. Contact him at minotas@gmail.com.

Raimondo Iemma holds an MSc in Industrial Engineering from Politecnico di Torino (Italy). His research focuses on open data platforms, smart disclosure of consumption data, economics of information and, in general, open government and innovation within (or fostered by) the public sector. Between 2013 and 2015, he served as a Managing Director and research fellow at the Nexa Center for Internet & Society at Politecnico di Torino. He works as a project manager at Seac02, a digital company based in Torino. Contact him at raimondo.iemma@gmail.com.

Federico Morando (M.A.S. in Economic Theory and Econometrics from the Univ. of Toulouse, Ph.D. in Institutional Economics and Law from the Univ. of Turin and Ghent) is an economist with interdisciplinary research interests focused on the intersection between law, economics and technology. He taught intellectual property and competition law at Bocconi University in Milan, and he lectures at the Politecnico di Torino, at the WIPO LL.M. in Intellectual Property and in other post-graduate and doctoral courses. Between 2009 and 2015, he served as a Managing Director and then Director of Research and Policy at the Nexa Center for Internet & Society at Politecnico di Torino. He is currently leading a startup company focused on linked data. Contact him at federico.morando@gmail.com.
© UNI – Ente Nazionale Italiano di Unificazione
PROJECT CODE: UNI 1602662
STANDARD NUMBER: UNI/TS 11725:2018
ITALIAN TITLE: Ingegneria del software e di sistema - Linee guida per la misurazione della qualità dei dati
ENGLISH TITLE: System and software engineering - Guidelines for the measurement of data quality
RESPONSIBLE TECHNICAL BODY: UNI/CT 504 Software Engineering
ROLE: CO-AUTHOR
SUMMARY: The technical specification defines how to apply the measurements of UNI CEI ISO/IEC 25024 "Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality", comparing them with current practices and theoretical insights. UNI CEI ISO/IEC 25024 defines a set of measures to be applied when measuring data quality; the measure definitions are deliberately general, so that they can be applied to a variety of specific cases.
NATIONAL RELATIONS: -----
INTERNATIONAL RELATIONS: -----
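To make the idea of a data quality measure concrete, the sketch below computes a simple completeness ratio over a tabular dataset. It is a minimal illustration in the spirit of the SQuaRE data quality measures; the function name, the data, and the exact form of the measure are assumptions for this example, not taken from the UNI/TS 11725 or ISO/IEC 25024 text.

```python
# Minimal sketch of a completeness-style data quality measure:
# fraction of non-empty cells over all cells in a tabular dataset.
# Names and the measure's exact form are illustrative only.

def completeness(records):
    """Return the fraction of non-empty cells across all rows."""
    total = sum(len(row) for row in records)
    filled = sum(1 for row in records for value in row
                 if value not in (None, ""))
    return filled / total if total else 1.0

rows = [
    ["Torino", "IT", "886837"],
    ["Milano", "IT", ""],        # missing population value
    ["Lyon", None, "513275"],    # missing country code
]
print(round(completeness(rows), 3))  # 7 filled cells out of 9 -> 0.778
```

A real measurement plan would apply one such ratio per quality characteristic (completeness, accuracy, currentness, ...) and aggregate them per dataset.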
  • 16. Data Quality: unstructured data
  • Evolution measures based on simple counts

SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { ?s a <C> . }

A Quality Assessment Approach for Evolving Knowledge Bases
Mohammad Rifat Ahmmad Rashid and Marco Torchiano (Politecnico di Torino, Italy; mohammad.rashid@polito.it, marco.torchiano@polito.it), Giuseppe Rizzo (Istituto Superiore Mario Boella; giuseppe.rizzo@ismb.it), Nandana Mihindukulasooriya and Oscar Corcho (Universidad Politécnica de Madrid, Spain; nmihindu@fi.upm.es, ocorcho@fi.upm.es). Semantic Web journal, IOS Press.

Abstract: Knowledge bases are nowadays essential components for any task that requires automation with some degree of intelligence. Assessing the quality of a Knowledge Base (KB) is a complex task, as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation. Despite its advantages, there are challenges in using the RDF data model, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. Our quality characteristics are based on KB evolution analysis, and we used high-level change detection for the measurement functions. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. Persistency and historical persistency measures concern the degree of change and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases: eleven releases of DBpedia and eight releases of 3cixty. The capability of the Persistency and Consistency characteristics to detect quality issues varies significantly between the two case studies. The Persistency measure gives observational results for evolving KBs; it is highly effective in the case of a KB with periodic updates, such as the 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases. The measures are based on simple statistical operations that make the solution both flexible and scalable.

Keywords: Quality Assessment, Quality Issues, Temporal Analysis, Knowledge Base, Linked Data

1. Introduction
The Linked Data approach consists in exposing and connecting data from different sources on the Web by means of semantic web technologies. Tim Berners-Lee (http://www.w3.org/DesignIssues/LinkedData.html) refers to linked open data as a distributed model for the Semantic Web that allows any data provider to publish its data publicly, in a machine-readable format, and to meaningfully link it with other information sources over the Web. This is leading to the creation of the Linked Open Data (LOD) cloud (http://lod-cloud.net), hosting several Knowledge Bases (KBs) that make available billions of RDF triples from different domains such as Geography
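The count-based evolution idea above (run a simple count query per class on each release, then compare consecutive releases) can be sketched as follows. This is a simplified illustration, not the paper's implementation: the class counts are invented, and the flagging rule (an instance count that drops between releases signals a possible persistency issue) is an assumption based on the abstract's description.

```python
# Hedged sketch of a count-based evolution measure: compare per-class
# entity counts across two consecutive KB releases and flag classes
# whose count decreased (a possible persistency issue).
# Counts and names below are illustrative, not real DBpedia figures.

def persistency_flags(prev_counts, next_counts):
    """Return {class: (previous_count, next_count)} for classes whose
    instance count decreased between two releases."""
    return {
        cls: (prev_counts[cls], next_counts.get(cls, 0))
        for cls in prev_counts
        if next_counts.get(cls, 0) < prev_counts[cls]
    }

prev_release = {"dbo:Place": 816000, "dbo:Person": 1445000}
next_release = {"dbo:Place": 735000, "dbo:Person": 1650000}
print(persistency_flags(prev_release, next_release))
# → {'dbo:Place': (816000, 735000)}
```

In practice the per-class counts would come from running the SPARQL count query shown above once per release; the comparison itself stays a simple statistical operation, which is what makes the approach scale.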
  • 17. Mobile UI Testing: K-9 Mail, v2.995 vs. v3.309