Sharing Advisory Board
Software Sharing Newsletter
Issue 8: April 2013
Editorial (Marton Vucsan, SAB Chairman)
Kodak, Sony, Philips, your local newspaper: it
happened to all of them. The main product they
built their existence on was no longer relevant.
Films, walkmans, light bulbs, encyclopedias, the
list is endless. Whole industries are wiped out to
make room for new ones. Darwin would have
enjoyed this. At the Big Data seminar organized by
the UN in New York you could see the first signals
that for us, too, the bell will ring in due course.
Choices will have to be made.

There is a tremendous opportunity in using the new
information that the planet now produces as a by-
product of all its processes. It opens the door to all
kinds of new statistical products, and to new
processes to make them. Let us for the moment
focus on the processes. Looking at the way our
researchers work with the new data, we see it is
fundamentally different from what we are used to.
First, the amount of data they work with is far too
large to edit; second, the tools they use are alien to
those we are familiar with; third, the workflow is
intermittent because of the time needed for
processing at every step. Aside from setting up rule
sets and creating workflow-type actions, no human
labor is involved in the actual production. The
creation of these workflows is knowledge intensive,
but once they are created very little effort is needed
to produce the result. What we see is a shift from
manual labour in production to manual labour in
design, which will supply the multiplier we need for
sharing to be effective. In the traditional artisanal
production system there is no multiplier: more
statistical output means more people. Sharing and
collaborating outside the office seems of little value
because of the logistical overhead and the
differences in execution and definition. Moreover,
there is no multiplier present either; sharing work
does not make other work unnecessary in this kind
of setup.
In the big data area where the problems are more
uniform, the real work is in the design of the
process. The processes are more formally defined
and these processes represent the production
knowledge for the statistics they produce. It will be
very profitable to share these processes under the
"build one get ten" rule. The reason is that there is
a multiplier present in the form of a formalized,
mostly automated process.
In this Issue:
Open data – taming the tiger
The OECD Open Data Project
Developing software sharing in the
European Statistical System
Improving data collection by soft
computing
Tools for a Sprint
Understanding “Plug and Play”
The knowledge is deployed in the design phase
of the process, and sharing parts of the processes
means getting executable process parts in return,
including the knowledge that created them. In
this setup, sharing does indeed free up
resources.
The threat or opportunity of Big Data will help us
to do two things: It will help us shift the balance
between human labor and machine labor towards
machine labor and it will help us to become a
solution sharing industry that can do much more
with much less money.
In the end it might not only be the new products
that will be our opportunity, but also the new
industrial processes that will underlie them.
Are you Linked?
The LinkedIn group “Business Architecture in
Statistics” aims to share knowledge and ideas
and promote activities, such as those undertaken
by the Sharing Advisory Board, that support
standards-based modernisation and greater
collaboration between statistical organisations.
Join the discussions at:
http://www.linkedin.com/groups?home=&gid=4173055
You can also find out more about SAB activities
and outputs via the MSIS wiki:
www1.unece.org/stat/platform/display/msis

Special Feature: Open Data
The next two articles explore the implications of
open data for official statistics. The first presents
the view from a national statistical organisation
(CSO Ireland). The second gives the perspective
of an international organisation (OECD).
Open data – taming the tiger
Eoin McCuirc (Central Statistics Office, Ireland)
The term “open data” means different things to
different people, though the goals of making
information freely available and easily accessible
online are very clear. I’ll start by looking at Tim
Berners-Lee’s classification of five levels of open
data:
★     Make your data available on the Web under an open license
★★    Make it available as structured data (e.g. an Excel sheet instead of an image scan of a table)
★★★   Use a non-proprietary format (e.g. a CSV file instead of an Excel sheet)
★★★★  Use Linked Data formats (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people's data to provide context
So, there are degrees of “openness” – from simply
putting information up on the web to providing
linked open data. Yes, both are forms of open
data but, though similar in appearance, they are
two completely different animals, as different,
say, as a cat and a tiger. In this article I want to talk about
the tiger: linked open data, the semantic web and
how the CSO is beginning to meet this new
challenge.
In managing the dissemination of statistics we are
guided by international standards. Principles 14
and 15 of the European Statistics Code of Practice
are particularly relevant:
Principle 14 Coherence and Comparability: European
Statistics are consistent internally, over time and
comparable between regions and countries; it is
possible to combine and make joint use of related data
from different sources.

Principle 15 Accessibility and Clarity: European
Statistics are presented in a clear and understandable
form, released in a suitable and convenient manner,
available and accessible on an impartial basis with
supporting metadata and guidance.
Clearly, the opportunities offered by open data will
help statistical offices to deliver outputs which
match these two principles.
The data deluge, or the accumulation of data about
people, places and things, is changing the world in
which statistical offices process and publish
statistics – and is another important driver for open
data. In general, it is getting more and more
difficult to find the information you need – the
problem of the needle in the haystack. Sooner
rather than later we will need machines to trawl
through all available data in order to find the
proverbial needle. But for this to become possible,
data needs to be structured in a particular way – a
challenge to which the semantic web offers a
solution.
The semantic web provides a way of making data
machine-readable, independent of the variety of
technical platforms and software packages in use
throughout the web. A key concept is that of linked
open data. In many ways, linked open data is
similar to open data: An organisation such as a
statistical office decides what information it wants
to publish on the web and makes the necessary
technical choices about hosting, security, domain
names, content management, maintenance, etc.
These choices apply equally to publishing linked
open data.
However, the key difference is one of language. In
linked open data, semantic web objects are named
to indicate all the attributes needed to make the
data machine-readable without human
intervention. For statisticians, this is an opportunity
to use international classifications (e.g. NACE,
ISCO, ISCED etc.) as de facto standards for linked
open data. Indeed, if we don’t take this opportunity,
it’s possible that other early adopters could set a
different standard.
We are starting on a journey and, unfortunately,
there is no clear road map yet and few precedents
to give guidance. So, how do we acquire the
expertise needed to publish statistics as linked
open data?
In 2012 the CSO began a pilot project with the
Digital Enterprise Research Institute (DERI) at the
National University of Ireland, Galway (NUIG), to
publish some of the Census 2011 results as linked
open data. The project has given valuable
experience to the CSO dissemination team and the
following are some of the important lessons so far:
1. For data to be linked across the semantic
web, objects need to be named. Uniform Resource
Identifiers (URIs) are the codes that identify
objects. Official statistics use many standard
classifications to define their data and, as noted
above, this is very useful when creating URIs.
2. Once the objects have been named a
framework is needed to publish this data on the
web. Using the Resource Description Framework
(RDF), which views the world in triples –
(Resource, Attribute, Value) – the information is
published on the web. An example of an RDF
statement is (Population of Ireland 2011, Statistic,
4588252); sketches of building and querying such
data follow the list below.
3. To publish data on the semantic web an
organisation needs to put it in a place and in a
format that a machine will expect. The CSO will
publish its Census 2011 open data on data.cso.ie
not on the CSO website www.cso.ie. In this
scenario machines should get RDF data and users
should get some readable representation of the
data e.g. HTML.
4. Ideally all the URIs an organisation
produces related to a single real world concept –
e.g. Population of Ireland 2011 – should be linked
together.
5. Ideally the URIs would be “cool”, built to last
2000 years or more.
http://www.w3.org/TR/2008/WD-cooluris-20080321/
6. For the Census 2011 pilot it is proposed to
produce a SPARQL (SPARQL Protocol and RDF
Query Language) service to facilitate access to the
data.
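
Point 2's (Resource, Attribute, Value) statement can be made concrete in a
few lines of Python with the rdflib library; the namespaces, property names
and URIs below are illustrative assumptions rather than the project's actual
vocabulary:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    QB = Namespace("http://purl.org/linked-data/cube#")   # RDF Data Cube vocabulary
    CSO = Namespace("http://data.cso.ie/property#")       # illustrative namespace

    g = Graph()
    obs = URIRef("http://data.cso.ie/dataset/population2011#IE")  # illustrative URI
    g.add((obs, RDF.type, QB.Observation))           # the object is an observation
    g.add((obs, CSO.population, Literal(4588252)))   # Population of Ireland 2011
    print(g.serialize(format="turtle"))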
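
Points 3 and 6 can be sketched in the same spirit: one URI serving different
representations through content negotiation, and a standard SPARQL request
against the proposed service. The URI and endpoint address below are
assumptions, not the project's published addresses:

    import requests

    # Point 3: content negotiation - the same URI, two representations.
    uri = "http://data.cso.ie/dataset/population2011"     # illustrative URI
    rdf = requests.get(uri, headers={"Accept": "application/rdf+xml"})
    html = requests.get(uri, headers={"Accept": "text/html"})

    # Point 6: a SPARQL query against a hypothetical endpoint address.
    QUERY = """
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT ?obs WHERE { ?obs a qb:Observation } LIMIT 10
    """
    resp = requests.get("http://data.cso.ie/sparql",
                        params={"query": QUERY},
                        headers={"Accept": "application/sparql-results+json"})
    for binding in resp.json()["results"]["bindings"]:
        print(binding["obs"]["value"])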
The following table sets out the framework which is
planned for Census 2011 results as linked open
data:
Base URI: http://data.cso.ie/

Entity                           URI pattern (relative to base)       RDF class
Classification                   /classification/{id}                 skos:ConceptScheme
Concept in a classification      /classification/{id}#{code}          skos:Concept
Dataset                          /dataset/{id}                        qb:DataSet
Data structure definition        /dataset/{id}#structure              qb:DataStructureDefinition
Observation                      /dataset/{id}#{dim1};{dim2};{dim3}   qb:Observation
Property (dimension, attribute)  /property#{id}                       qb:DimensionProperty, qb:AttributeProperty
Later in 2013, we will publish the outputs from the
project – i.e. Census 2011 results as linked open
data – and, to mark the International Year of
Statistics, we will have a competition for the best
“mash-up” using those statistics. We hope this will
be not only a proof of concept, but also a proof of
the value of linked open data.
The OECD Open Data Project
The OECD is currently undertaking an Open
Data project with the aim of making its statistical
data content machine-readable, retrievable,
indexable and re-usable. The Open Data project
will implement an Application Programming
Interface (API) to provide machine-to-machine
access to the OECD statistical data warehouse
“OECD.Stat” via a number of formats, and will
address the challenges involved in standardising
the statistical content of the 800+ datasets.
In addition an Open Innovation community will be
created to encourage the re-use of OECD data
via external innovation.
The Open Data project is aligned with the
Knowledge and Information Management (KIM)
Ontology Management and Semantic
Infrastructure project to make data accessible via
linked data.
Background to the Open Data Project
Statistics are of strategic importance to the
OECD both as an input for internal analysis and
also as a product for dissemination to a wider
audience in their own right. Following a review of
the OECD Publishing Policy in 2011, a number of
recommendations were proposed to make OECD
statistics “open, accessible and free”. The OECD
Council welcomed this proposal and as a result
the DELTA programme was initiated to
implement these aims.
DELTA Project – Open Data
Openness is one of the key values that guide the
OECD vision for a stronger, cleaner and fairer
economy. Making data open is an important part
of this and, to this end, a number of open data
benchmarks for the project have been defined as
follows:
 Completeness – content should include data,
metadata, sources and methods.
 Primacy – datasets should be primary, not
aggregated, and include details on how the data
was collected.
 Timeliness – data should be automatically
available in trusted third-party repositories upon
publication.
 Ease of access – data made available via a
simple Application Programming Interface (API).
 Machine readability – data and metadata
provided in a machine-readable standard, plus
documentation.
 Non-discrimination – no special permissions
required to access data.
 Use of common standards – stored data can be
accessed without a special software licence.
 Licensing – Creative Commons CC-BY
(licensees may copy, distribute, display and
perform the work, and make derivative works
based on it, only if they credit the author or
licensor in the manner specified).
 Permanence – information made available
remains online, with archiving over time and a
notification mechanism.
 Usage costs – free.
Open Data Project goals
Today, data can be extracted only via downloads
from OECD.Stat. The Open Data Web Services
(ODWS) will make it available to other web sites
directly, for creating custom data visualizations,
live combinations with other data sources, etc.
The goals of the Open Data project are: to make
OECD data machine-readable, retrievable,
indexable and re-usable; to increase the
dissemination and impact of OECD data via open
data services for its statistical data; and to
encourage re-use of OECD data by external
innovation communities.
The Open Data Project has 3 main deliverables: i)
a full set of “Open-ready” data and metadata; ii) a
set of Open Data Web Services and iii) an
interface for managing the OECD Open Innovation
Community.
“Open-Ready” Data and metadata
For data to be considered “Open-ready” the
existing data and metadata content of the OECD
corporate data warehouse OECD.Stat will be
required to meet certain criteria of structure and
content necessary for machine-to-machine access.
To achieve this, data owners will carry out a self-
assessment of all OECD.Stat data content to
gauge the state of open-readiness for each
dataset. This will involve analysing the metadata
content according to the criteria.
Open Data Web Services (ODWS)
In parallel to the data assessment exercise, the
Open Data Web Services will be developed. This
will involve building a set of Web Services to
provide machine-to-machine access to OECD.Stat
data via a number of formats. This will involve
defining the technical standards for data to be
machine-readable that meet the needs of both
expert and non-expert audiences. Application
Programming Interfaces (API) will be developed to
make the data and metadata in OECD.Stat
available to systems outside the organisation via a
number of formats.
These Web Services will be available to other
organisations currently sharing the .Stat data
warehouse software via the OECD Statistical
Information System Collaboration Community
(SIS-CC).
Open Data formats
Data and metadata will be made available to
external users in as many output formats as
possible to maximise data access. The project will
start with formats including SDMX/JSON, a
RESTful API, OData, XLS and CSV. Additional
formats will be added over time as needed. These formats
have been chosen for the reasons described
below.
a) Excel/CSV - Excel and CSV are already widely
used exchange standards so including them as
output formats was a fairly obvious decision.
b) SDMX/JSON - JavaScript Object Notation
(JSON) is a text-based open standard designed for
human-readable data interchange and has
become one of the most widely used open data
formats on web sites today.
The Statistical Data and Metadata eXchange
(SDMX) standard provides a standard model for
statistical data and metadata exchange between
national agencies and international agencies,
within national statistical systems and within
organisations. The OECD is a member of the
SDMX Sponsor Group (together with the Bank for
International Settlements, European Central Bank,
Eurostat, International Monetary Fund, United
Nations Statistics Division and World Bank). SDMX
data extracts from OECD.Stat are already provided
via a web service; this will be adapted as an API
using the SDMX compact version.
c) Open Data (OData) - OData is an open protocol
for sharing data.
Future formats could include Google Data (a
REST-inspired technology), the Google Dataset
Publishing Language (DSPL) or Google KML, a
geospatial file format.
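
As a sketch of what a machine-to-machine call against such an API might look
like (the endpoint, dataset identifier and dimension codes below are purely
illustrative assumptions, not the final service):

    import requests

    # Illustrative endpoint, dataset identifier and dimension filter.
    url = "http://stats.oecd.org/sdmx-json/data/QNA/AUS.GDP.CUR.Q/all"
    resp = requests.get(url, headers={"Accept": "application/json"})
    message = resp.json()  # an SDMX-JSON message: structure plus observations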
Linked Data and the OECD KIM project
The OECD Knowledge and Information
Management (KIM) project has been established to
integrate information and centralise access to all
OECD content (corporate content management,
records management, authoring, etc.). KIM was
launched in parallel with the DELTA project and is
concerned with developing semantic enrichment, a
centralized taxonomy and linked data support.
A long-term goal of the project is to create linked
data sources with the Resource Description
Framework (RDF), using existing vocabularies to
map data to related subjects and generating a
collection of “triples” (consisting of a subject, a
predicate and an object) known as a “triple-store”.
Each component of the triple has a Uniform
Resource Identifier (URI), enabling data to be
linked to related sources.
Creating a triple-store from the OECD.Stat data
warehouse will be a huge task and work
investigating the possibilities has only recently
started (at the time of writing the tools have not yet
been selected), but the long-term goal is to
conform to Tim Berners-Lee’s “5 star” level of
open data.
The vision of the Semantic Web is to extend
principles of the Web from documents to data.
Data should be accessed using the general Web
architecture (e.g. URIs); data should be related to
one another just as documents (or portions of
documents) already are. This also means the
creation of a common framework that
allows data to be shared and reused across
application, enterprise, and community boundaries,
to be processed automatically by tools as well as
manually, including revealing possible new
relationships among data items.

Software inventory
 Over 60 statistical software tools available for sharing
 Find new software, or post information about yours at:
www1.unece.org/stat/platform/display/msis/Software+Inventory
The OECD Open Innovation Community
The Open Innovation Community will consist of an
interface for managing Open Innovation
Community (OIC) content and involves designing,
building and maintaining this interface to provide
the following:
 Information describing the open platform
 Registration services
 Examples of products developed using the
open platform
 Open Services available with associated
technical documentation
 OIC Blog
 FAQ
Developing software sharing in the
European Statistical System
Denis Grofils (Eurostat)
Software represents an important part of the
assets of the European Statistical System (ESS).
In statistical institutions, as in many modern
businesses, the quality and availability of software
is of primordial importance, as it directly affects the
way business processes are executed. While not
every member of the ESS necessarily develops
software, all of them undoubtedly use it. Software
development is usually recognised as costly, at
both the development and maintenance stages.
The simple usage of software may also be costly in
different respects: licensing fees, consultancy,
training, etc.

Software varies in nature and extent; for
example, some types of software could be:
 Data collection systems
 Procedures developed in statistical computing
languages for different purposes (sampling,
imputation, weighting, aggregation,
confidentiality protection, etc.)
 Tools for the management of statistical
metadata
 Web portals for data dissemination
As the level of standardisation grows in the
statistical community, through harmonization at the
international level and through initiatives that
promote the industrialisation of official statistical
production (see the work of the HLG of the
UNECE, or the Joint Strategy and the ESS.VIP
programme at ESS level), the sharing of software
at a wider level becomes easier.
The move toward service-oriented architecture
(SOA) and the development of a so-called "plug
and play" architecture for statistical production
strongly reinforce the potential for sharing.
Platform-independent services allow distributed
architecture models that promote a high level of
reuse of software components. Services can be
developed independently or cooperatively and
shared among partners. Functionalities of existing
software can be offered as services at limited cost
via proper wrapping. All this makes the potential of
software sharing higher than ever.
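
As a toy illustration of such wrapping (all names here are invented for the
example), an existing routine can be exposed as a simple HTTP service, using
only the Python standard library, so that any platform can call it:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def impute(values):
        """Existing functionality: fill gaps with the mean of known values."""
        known = [v for v in values if v is not None]
        mean = sum(known) / len(known)
        return [mean if v is None else v for v in values]

    class Wrapper(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the JSON request body, run the wrapped routine, reply as JSON.
            body = self.rfile.read(int(self.headers["Content-Length"]))
            result = impute(json.loads(body))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps(result).encode())

    HTTPServer(("", 8080), Wrapper).serve_forever()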
The possibility of sharing software among
institutions of the ESS offers several
advantages, notably:
 Increase efficiency and reduce costs by
avoiding multiple developments of virtually the
same products by different organisations
 Increase harmonization and interoperability
through the use of standard software building
blocks
 Improve quality of the data through the use of
widely accepted and validated software
building blocks and improve comparability
among data coming from different countries
 Increase the level of collaboration and resource
sharing between members of the ESS
Several important achievements relating to OSS
have been realised at the European level, notably:
 The European Union Public Licence (EUPL):
The first European OSS licence. It was created
on the initiative of the European Commission,
which has approved it in 22 official languages of
the European Union.
 Joinup: A collaborative platform created by the
European Commission that offers a set of
services to help e-Government professionals
share their experience with interoperability
solutions, and supports them in finding,
choosing, re-using, developing and implementing
open source software and semantic
interoperability assets.
The ESS IT Directors Group (ITDG) mandated the
Statistical Information System Architecture and
Integration working group (SISAI) to launch a task
force dealing with the development of policy and
guidelines supporting ESS software sharing. The
work of this task force started in the autumn of 2012.
The following aspects of software sharing are
tackled:
 Definition of software of interest: In this
context the term ‘software’ is to be understood
in its broadest sense as any set of computer
programs, these being defined as any set of
instructions for computers. Objective criteria for
defining the target of the recommendations are
necessary. Software of interest is defined as
“software used by members of the ESS to
support directly activities of the GSBPM in
order to realise the statistical programme of the
ESS”. It should be noted that this definition is
independent of technological characteristics of
software (web-based, command-line batch,
macros, web-services, etc.).
 Software catalogue: The way a catalogue of
ESS software should be maintained, and which
information should be recorded, is defined. A
distinction is made between unshared software
(used by only one ESS member), for which a
minimal set of information is collected, and
shared software (used by several ESS
members), for which an extensive set of
information is collected.
 Sharing scenarios: Several scenarios are
identified and the applicability of
recommendations per scenario is defined (i.e.
not all recommendations apply to all
scenarios).
 Sharing software use: The federation of
software users through the creation of user
communities is organized. This concerns
software published under any type of licence,
including commercial software.
 Sharing software development:
Recommendations are made for each step of
the development cycle. As an example, it is
recommended to consider several types of
constraints when designing software:
architectural constraints (consistency with
GSBPM & GSIM, links with PnP constraints),
clear documentation of methodological
aspects, data protection constraints specific to
the ESS, support for multilingualism, and a legal
roadmap (particularly intellectual property rights
tracking when developing component-based
applications).
 Software quality evaluation: A template for
software quality assessment is provided.
The recommendations elaborated were evaluated
on real cases, to test the propositions against
reality and to incorporate feedback from these
experiences. Three illustration cases were used:
Blaise, Demetra+ and SDMX-RI.
The set of draft recommendations elaborated by
the task force will be submitted in the coming
weeks to the Statistical Information System
Architecture and Integration working group (SISAI)
and then to the ESS IT Directors Group (ITDG).

This newsletter and previous editions are also
available on-line at:
http://www1.unece.org/stat/platform/display/msis/SAB+Newsletter
Improving data collection by soft
computing
Miroslav Hudec and Jana JuriovĂĄ (Infostat)
The applicability of soft computing (fuzzy logic
and neural networks) as a modern means to
improve the collection and quality of data for
business and trade statistics is one of the topics
of the Blue-ETS project
(http://www.blue-ets.istat.it/).
The main findings which support this line of
development are:
 Large complex administrative and statistical
databases contain valuable information
which can be mined using powerful
methodologies;
 Statisticians possess knowledge of how to
deal with their tasks, but this knowledge
cannot always be expressed by precise
rules.
In order to estimate missing values, relations
between similar respondents are relevant.
Mining the Intrastat database with neural
networks (NNs) shows that this is a rational
option that could provide a solution: NNs find
patterns and relations between similar
respondents. In this way we are able to estimate
items if enough data is available from other
respondents.
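
A minimal sketch of the underlying idea (not the Blue-ETS
implementation itself, and with invented data): estimate a
respondent's missing item from its most similar peers.

    import numpy as np

    # Rows are respondents, columns are reported items (illustrative data).
    reported = np.array([[1.0, 2.0, 3.0],
                         [1.1, 2.1, 2.9],
                         [0.9, 1.9, 3.1]])
    incomplete = np.array([1.05, 2.05, np.nan])  # third item is missing

    known = ~np.isnan(incomplete)
    # Distance to each complete respondent over the known items only.
    dists = np.linalg.norm(reported[:, known] - incomplete[known], axis=1)
    nearest = np.argsort(dists)[:2]
    # Impute the missing item as the mean over the most similar respondents.
    estimate = reported[nearest][:, ~known].mean()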
Fuzzy rules expressed by linguistic terms and
quantifiers reveal levels of similarity between
imputed and surveyed values.
Similar techniques are also promising for
dissemination. People prefer to use natural
language expressions when searching for useful
data and information, for example: select regions
where most municipalities have a low altitude
above sea level. The result is a list of entities
ranked according to their degree of match to the
query condition.
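
The sketch below shows how such a fuzzy quantified query could be scored;
the membership function for "low altitude" and the interpretation of "most"
are illustrative assumptions, as are the region data.

    def low_altitude(metres):
        """Membership in 'low altitude': 1 below 200 m, 0 above 600 m."""
        if metres <= 200:
            return 1.0
        if metres >= 600:
            return 0.0
        return (600 - metres) / 400

    def most(degrees):
        """Interpret 'most' as the mean membership over municipalities."""
        return sum(degrees) / len(degrees)

    regions = {"A": [150, 180, 420], "B": [90, 120, 160], "C": [700, 650, 300]}
    # Rank regions by how well "most municipalities have a low altitude" holds.
    ranked = sorted(regions,
                    key=lambda r: most([low_altitude(m) for m in regions[r]]),
                    reverse=True)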
Modernization of the first and the last stages of
data collection could create a chain reaction of
improvements in data quality. Better data
dissemination (via flexible queries) could
motivate respondents to provide their data in a
more timely and accurate manner, reducing the
frequency of missing values and thereby making
imputation more efficient (fewer missing values
and more powerful neural networks).
Relevant equations, models and experimental
tools have been created in order to evaluate
the pros and cons. The next step is the creation of
fully functional tools and their adaptation to
particular needs.
What does Big data mean
for official statistics?
A new paper prepared by leading international
experts has recently been released by the High-
Level Group for the Modernisation of Statistical
Production and Services:
http://www1.unece.org/stat/platform/pages/v
iewpage.action?pageId=77170614
Tools for a Sprint
Carlo Vaccari (a sprinter)
In Ottawa, from 8 to 13 April 2013, we had a Sprint for the Plug & Play Architecture Project. People from
Australia, Canada, Eurostat, Italy, Mexico, the Netherlands, New Zealand, Sweden, UNECE and the United
Kingdom met to start defining a “common statistical production architecture for the world's official
statistical industry”, as stated by the High-Level Group for the Modernisation of Statistical Production and
Services (see http://www1.unece.org/stat/platform/display/hlgbas).
The objective of the sprint session was to ensure agreement on key principles regarding the architectural
blueprint to build an interoperable statistical industry platform.
We will discuss the documents produced by this meeting over the next few weeks. Here I just want to show
you which tools were used in the Sprint.
Paper, a lot of paper
We wrote a lot on sheets, flip charts, post-its of any color and any shape.
Paper was used to explain, show, collect, store, group and debate ideas.
White-boards
We used many white-boards, writing on them with markers of every type. Often we used cameras/mobile
phones to photograph what was written, so the concepts could be transferred to digital files (we would love
smart boards like: www.youtube.com/watch?v=NZNTgglPbUA)
Mixed
And yes, we used paper and boards together, very useful when you
want to group concepts and keep track of what was done.
Wiki
We inserted documents, presentations, images,
discussions, a glossary and so on in the UNECE
wiki.
Mind Maps
Often we used Mind Maps to capture
brainstorming and discussions: one of the
best ways to avoid losing ideas and to
summarize what has been said in lengthy
discussions.
Presentations
Presentation software was used not only to
prepare slides to present, but also to draw and
develop schemas. Using notes and colors,
presentation software was also used as a
kind of digital dashboard.
Lego bricks
Each participant received three Lego bricks of different colors
from our wonderful facilitators. We had to raise them to
indicate, respectively: “I want to speak”, “Off topic” and “Too much
detail”. A very simple way to force participants to follow the rules
for an efficient exchange of views.
Lollies
Each participant brought sweets (“lollies” in Australian English) from
their country to share with partners. Biscuits, chocolates and sweets of all
kinds were the fuel that provided energy to tired brains.
Understanding “Plug and Play”
Marton Vucsan (Statistics Netherlands)
I have high hopes for the Common Statistical
Production Architecture (CSPA) Project, commonly
known under the more profane title of Plug and
Play. Although it sounds easy, it may prove to be
hard, very hard. From what I hear, “plug and play”
has many interpretations. Some point in the
direction of the feared “mother of all systems”
projects that never work. Understanding what is
really meant by the current CSPA project is
important because CSPA is something completely
different from the big feared projects of the past.
CSPA is about reducing complexity and getting
operating system independence. It is also about
sharing and reducing our efforts while still getting
what we need. To achieve this we have to realize
that our means of production are composed of
different levels of abstraction. There are the
methods, the process descriptions and finally the
applications (I am deliberately keeping it simple
here). Normally, to arrive at a statistical output we
describe a method, create a process and build an
application. All three are normally monolithic in
nature and custom made. Unshareable, un-
reusable, expensive, complex. The stuff called
“legacy apps”, the stuff we should stop making.
CSPA starts with the insight that splitting things up
reduces complexity. The GSBPM does this; it splits
up the statistical process into easy to understand
sub-processes. Thinking and building in sub-
systems reduces complexity and increases
reliability. Many programmers struggle with this,
trying to split up a given solution into meaningful
parts and often failing. In hindsight, the reason for
that failure is obvious; the reduction in complexity
has to be done at a much higher level. The
complexity is often in the methods and the way the
processes were thought up, independent from the
IT implementation. If we really want to reduce
complexity, that is our point of attack: the level
where we specify our statistical recipe.
As a statistical community, we seem to agree that
statistical outputs can be produced by processes
composed of GSBPM sub-processes. With the
right compromises we will be able to use these
sub-processes or components across a broad
range of statistics and agencies; like the engines
on a plane or the motor management system in
your car. Just like the conceptual understanding
that a car and a plane are a collection of functional
sub-systems, we need to understand that a
statistical production system is a collection of
functional sub-processes.
Once we are able to think of our processes as
assemblies of components we can reuse them or
exchange them. Of course it is not that simple, but
there are powerful forces at work to make that
happen. Look what happened in other industries.
When a component, say a motor management
system, is available, most designs gravitate to
using that component because it is much cheaper
than “rolling your own”.
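
The same idea can be put in a toy sketch (component names invented for
illustration): once each GSBPM sub-process honours an agreed interface, a
whole step can be swapped for another agency's component without touching
the rest of the pipeline.

    from typing import Callable, List

    Component = Callable[[List[float]], List[float]]

    def edit(data: List[float]) -> List[float]:
        return [v for v in data if v >= 0]   # toy editing: drop impossible values

    def impute(data: List[float]) -> List[float]:
        mean = sum(data) / len(data)
        return data + [mean]                 # toy imputation step

    def aggregate(data: List[float]) -> List[float]:
        return [sum(data)]                   # toy aggregation step

    def produce(data: List[float], pipeline: List[Component]) -> List[float]:
        for component in pipeline:
            data = component(data)
        return data

    # Swapping in another agency's imputation component changes one line:
    output = produce([1.0, -9.0, 2.0], [edit, impute, aggregate])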
A component can be manufactured separately
from the system it will be used in. Rolls Royce
don’t need planes to manufacture engines. The
key is to do it at the right conceptual level. Others
have done it (look at your phone); we can do it
too!
Many statistical organisations are modernising
using Enterprise Architecture to underpin their
vision and change strategy. This enables them to
develop statistical services in a standard way.
Enterprise architecture creates an environment
which can change and support business goals. It
shows what the business needs are, where the
organisation wants to be, and ensures that IT
strategy aligns with this. It helps to remove silos,
improves collaboration and ensures that
technology is aligned to business needs.
In parallel, the High Level Group for the
Modernization of Statistical Production and
Services (HLG) is developing the CSPA. This will
be a generic architecture for statistical production,
and will serve as an industry architecture for
official statistics. Adopting a common architecture
will make it easier for organisations to standardise
and combine the components of statistical
production, regardless of where the statistical
services are built.
The CSPA also provides a starting point for
concerted developments of statistical
infrastructure and shared investment across
statistical organisations.
Version 0.1 of the CSPA documentation has
just been released for public comment at:
www1.unece.org/stat/platform/x/_ISwB
Your feedback is welcome!

Serena Carota: Open Data nella Regione MarcheSerena Carota: Open Data nella Regione Marche
Serena Carota: Open Data nella Regione Marche
 
Introduzione ai Social network
Introduzione ai Social network  Introduzione ai Social network
Introduzione ai Social network
 
Start up innovative
Start up innovativeStart up innovative
Start up innovative
 
Social network ,ricerca di lavoro e ricerca scientifica
Social network ,ricerca di lavoro e ricerca scientificaSocial network ,ricerca di lavoro e ricerca scientifica
Social network ,ricerca di lavoro e ricerca scientifica
 
Social networks , Job Searching and Research - 1
Social networks , Job Searching and Research - 1Social networks , Job Searching and Research - 1
Social networks , Job Searching and Research - 1
 
Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013Seminario su Open data - UniCam 18.4.2013
Seminario su Open data - UniCam 18.4.2013
 
Turismo e social network
Turismo e social networkTurismo e social network
Turismo e social network
 

Recently uploaded

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 

Recently uploaded (20)

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

Special Feature: Open Data
The next two articles explore the implications of open data for official statistics. The first presents the view from a national statistical organisation (CSO Ireland). The second gives the perspective of an international organisation (OECD).

Open data – taming the tiger
Eoin McCuirc (Central Statistics Office, Ireland)

The term “open data” means different things to different people, though the goals of making information freely available and easily accessible online are very clear. I’ll start by looking at Tim Berners-Lee’s classification of five levels of open data:

★ Make your data available on the Web under an open license
★★ Make it available as structured data (e.g. an Excel sheet instead of an image scan of a table)
★★★ Use a non-proprietary format (e.g. a CSV file instead of an Excel sheet)
★★★★ Use Linked Data formats (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people’s data to provide context

So, there are degrees of “openness” – from simply putting information up on the web to providing linked open data. Yes, both are a form of open data but, though similar in appearance, they are two completely different animals, as different say as a cat to a tiger. In this article I want to talk about the tiger: linked open data, the semantic web, and how the CSO is beginning to meet this new challenge.

In managing the dissemination of statistics we are guided by international standards. Principles 14 and 15 of the European Statistics Code of Practice are particularly relevant:

Principle 14 Coherence and Comparability: European Statistics are consistent internally, over time and comparable between regions and countries; it is possible to combine and make joint use of related data from different sources.

Principle 15 Accessibility and Clarity: European Statistics are presented in a clear and understandable form, released in a suitable and convenient manner, available and accessible on an impartial basis with supporting metadata and guidance.

Clearly, the opportunities offered by open data will help statistical offices to deliver outputs which match these two principles. The data deluge, or the accumulation of data about people, places and things, is changing the world in which statistical offices process and publish statistics – and is another important driver for open data. In general, it is getting more and more difficult to find the information you need – the problem of the needle in the haystack. Sooner rather than later we will need machines to trawl through all available data in order to find the proverbial needle. But for this to become possible, data needs to be structured in a particular way – a challenge to which the semantic web offers a solution.

The semantic web provides a way of making data machine-readable, independent of the variety of technical platforms and software packages in use throughout the web. A key concept is that of linked open data. In many ways, linked open data is similar to open data: an organisation such as a statistical office decides what information it wants to publish on the web and makes the necessary technical choices about hosting, security, domain names, content management, maintenance, etc. These choices apply equally to publishing linked open data. However, the key difference is one of language. In linked open data, semantic web objects are named to indicate all the attributes needed to make the data machine-readable without human intervention. For statisticians, this is an opportunity to use international classifications (e.g. NACE, ISCO, ISCED, etc.) as de facto standards for linked open data. Indeed, if we don’t take this opportunity, it’s possible that other early adopters could set a different standard.

We are starting on a journey and, unfortunately, there is no clear road map yet and few precedents to give guidance. So, how do we acquire the expertise needed to publish statistics as linked open data? In 2012 the CSO began a pilot project with the Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway (NUIG), to publish some of the Census 2011 results as linked open data. The project has given valuable experience to the CSO dissemination team, and the following are some of the important lessons so far:

1. For data to be linked across the semantic web, objects need to be named. Uniform Resource Identifiers (URIs) are the code that identifies an object. Official statistics use many standard classifications to define their data and, as noted above, this is very useful when creating URIs.
2. Once the objects have been named, a framework is needed to publish this data on the web. Using the Resource Description Framework (RDF), which views the world in triples – (Resource, Attribute, Value) – the information is published on the web. An example of an RDF statement is (Population of Ireland 2011, Statistic, 4588252).
3. To publish data on the semantic web an organisation needs to put it in a place and in a format that a machine will expect. The CSO will publish its Census 2011 open data on data.cso.ie, not on the CSO website www.cso.ie. In this scenario machines should get RDF data and users should get some readable representation of the data, e.g. HTML.
4. Ideally all the URIs an organisation produces related to a single real world concept – e.g. Population of Ireland 2011 – should be linked together.
5. Ideally the URIs would be “cool”, built to last 2000 years or more. http://www.w3.org/TR/2008/WD-cooluris-20080321/
6. For the Census 2011 pilot it is proposed to produce a SPARQL (SPARQL Protocol and RDF Query Language) service to facilitate access to the data.

The following table sets out the framework which is planned for Census 2011 results as linked open data:

Base URI: http://data.cso.ie/

Entity                           URI pattern (relative to base)       RDF class
Classification                   /classification/{id}                 skos:ConceptScheme
Concept in a classification      /classification/{id}#{code}          skos:Concept
Dataset                          /dataset/{id}                        qb:DataSet
Data structure definition        /dataset/{id}#structure              qb:DataStructureDefinition
Observation                      /dataset/{id}#{dim1};{dim2};{dim3}   qb:Observation
Property (dimension, attribute)  /property#{id}                       qb:DimensionProperty, qb:AttributeProperty

Later in 2013, we will publish the outputs from the project – i.e. Census 2011 results as linked open data – and, to mark the International Year of Statistics, we will have a competition for the best “mash up” using those statistics. We hope this will not only be a proof of concept, but also a proof of the value of linked open data.
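The RDF tooling for this already exists. Below is a minimal sketch, in Python with the rdflib library, of how an observation such as (Population of Ireland 2011, Statistic, 4588252) could be expressed as triples following the URI patterns in the table above. The dataset identifier, dimension values and measure property are hypothetical illustrations, not the URIs the CSO will actually publish.

    # A minimal sketch using rdflib; all data.cso.ie identifiers below are
    # hypothetical illustrations, not the CSO's published URIs.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    QB = Namespace("http://purl.org/linked-data/cube#")  # W3C Data Cube vocabulary
    BASE = "http://data.cso.ie/"

    g = Graph()
    g.bind("qb", QB)

    dataset = URIRef(BASE + "dataset/census2011")              # /dataset/{id}
    observation = URIRef(BASE + "dataset/census2011#IE;2011")  # /dataset/{id}#{dim1};{dim2}
    population = URIRef(BASE + "property#population")          # /property#{id}

    g.add((dataset, RDF.type, QB.DataSet))
    g.add((observation, RDF.type, QB.Observation))
    g.add((observation, QB.dataSet, dataset))
    # The RDF statement (Population of Ireland 2011, Statistic, 4588252) as a triple
    g.add((observation, population, Literal(4588252, datatype=XSD.integer)))

    print(g.serialize(format="turtle"))

    # Lesson 6 in practice: the same graph can already be queried with SPARQL
    for row in g.query("SELECT ?obs ?v WHERE { ?obs ?p ?v . FILTER(isLiteral(?v)) }"):
        print(row.obs, row.v)

Serialising the graph to a machine-readable format and answering SPARQL queries are exactly the machine-facing behaviours described in lessons 3 and 6 above.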
The OECD Open Data Project

The OECD is currently undertaking an Open Data project with the aim of making its statistical data content machine-readable, retrievable, indexable and re-usable. The Open Data project will implement an Application Programming Interface (API) to provide machine-to-machine access to the OECD statistical data warehouse “OECD.Stat” via a number of formats, and will address the challenges involved in standardising the statistical content from the 800+ datasets. In addition, an Open Innovation community will be created to encourage the re-use of OECD data via external innovation. The Open Data project is aligned with the Knowledge Information Management (KIM) Ontology Management and Semantic Infrastructure project to make data accessible via linked data.

Background to the Open Data Project
Statistics are of strategic importance to the OECD, both as an input for internal analysis and as a product for dissemination to a wider audience in their own right. Following a review of the OECD Publishing Policy in 2011, a number of recommendations were proposed to make OECD statistics “open, accessible and free”. The OECD Council welcomed this proposal and as a result the DELTA programme was initiated to implement these aims.

DELTA Project – Open Data
Openness is one of the key values that guide the OECD vision for a stronger, cleaner and fairer economy. Making data open is an important part of this, and to this end a number of open benchmarks in the project have been defined as follows:
• Completeness – content should include data, metadata, sources and methods.
• Primacy – datasets should be primary, not aggregated, and include details on how the data was collected.
• Timeliness – data should be automatically available in trusted third-party repositories upon publication.
• Ease of access – data made available via a simple Application Programming Interface (API).
• Machine readability – data and metadata provided in a machine-readable standard, plus documentation.
• Non-discrimination – no special permissions required to access data.
• Use of common standards – stored data can be accessed without a special software licence.
• Licensing – Creative Commons CC-BY (licensees may copy, distribute, display and perform the work, and make derivative works based on it, only if they credit the author or licensor in the manner specified).
• Permanence – information made available remains online, with archiving over time, together with a notification mechanism.
• Usage costs – free.

Open Data Project goals
Data today can be extracted only via downloads from OECD.Stat. The Open Data Web Services (ODWS) will make the data available to other web sites directly, for creating custom data visualisations, live combinations with other data sources, etc. The goals of the Open Data project are: to make OECD data machine-readable, retrievable, indexable and re-usable; to increase the dissemination and impact of OECD data via open data services for its statistical data; and to encourage re-use of OECD data by external innovation communities. The Open Data project has three main deliverables: i) a full set of “Open-ready” data and metadata; ii) a set of Open Data Web Services; and iii) an interface for managing the OECD Open Innovation Community.

“Open-ready” data and metadata
For data to be considered “Open-ready”, the existing data and metadata content of the OECD corporate data warehouse OECD.Stat will be required to meet certain criteria of structure and content necessary for machine-to-machine access. To achieve this, data owners will carry out a self-assessment of all OECD.Stat data content to gauge the state of open-readiness of each dataset. This will involve analysing the metadata content according to the criteria.

Open Data Web Services (ODWS)
In parallel to the data assessment exercise, the Open Data Web Services will be developed. This will involve building a set of Web Services to provide machine-to-machine access to OECD.Stat data via a number of formats, and defining the technical standards for data to be machine-readable in ways that meet the needs of both expert and non-expert audiences. Application Programming Interfaces (APIs) will be developed to make the data and metadata in OECD.Stat available to systems outside the organisation via a number of formats. These Web Services will be available to other organisations currently sharing the .Stat data warehouse software via the OECD Statistical Information System Collaboration Community (SIS-CC).

Open Data formats
Data and metadata will be made available to external users in as many output formats as possible, to maximise data access. The project will start with formats including SDMX/JSON, a RESTful API, OData, XLS and CSV. Additional formats will be added as needed over time. These formats have been chosen for the reasons described below.

a) Excel/CSV – Excel and CSV are already widely used exchange standards, so including them as output formats was a fairly obvious decision.

b) SDMX/JSON – JavaScript Object Notation (JSON) is a text-based open standard designed for human-readable data interchange, and has become one of the most popular open data formats on web sites today. The Statistical Data and Metadata eXchange standard (SDMX) provides a standard model for statistical data and metadata exchange between national and international agencies, within national statistical systems and within organisations. The OECD is a member of the SDMX Sponsor Group (together with the Bank for International Settlements, European Central Bank, Eurostat, International Monetary Fund, United Nations Statistics Division and World Bank). SDMX data extracts from OECD.Stat are already provided via a web service; this will be adapted as an API using the SDMX compact version.

c) Open Data (OData) – OData is an open protocol for sharing data.

Future formats could include Google Data (a REST-inspired technology), Google Dataset Publishing Language (DSPL) or Google KML, a geospatial file format.
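To give a feel for the machine-to-machine access the ODWS is meant to provide, here is a minimal sketch in Python using the requests library. The endpoint URL and dataset code are hypothetical illustrations; the project's API had not been published at the time of writing.

    # A minimal sketch of machine-to-machine access to a statistical data
    # warehouse over HTTP. The endpoint and dataset code below are
    # hypothetical illustrations, not the published ODWS interface.
    import requests

    # Hypothetical SDMX-JSON endpoint of the form /data/{dataset}/{filter}
    url = "https://stats.oecd.org/sdmx-json/data/QNA/AUS.GDP/all"
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    message = response.json()
    # Print the top-level sections of the JSON message; an SDMX-JSON message
    # carries the observations together with their structural metadata.
    print(sorted(message.keys()))

The point of such a service is that any external site or tool can consume the response directly, with no manual download step in between.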
Linked Data and the OECD KIM project
The OECD Knowledge and Information Management (KIM) project has been established to integrate information and centralise access to all OECD content (corporate content management, record management, authoring, etc.). KIM was launched in parallel to the DELTA project and is concerned with developing semantic enrichment and centralised taxonomy linked data support. A long-term goal of the project is to create linked data sources with the Resource Description Framework (RDF), using existing vocabularies to map data to related subjects and generating a collection of “triples” (consisting of a subject, a predicate and an object) known as a “triple-store”. Each component of the triple has a Uniform Resource Identifier (URI), enabling data to be linked to related sources. Creating a triple-store from the OECD.Stat data warehouse will be a huge task, and work investigating the possibilities has only recently started (at the time of writing the tools have not yet been selected), but the long-term goal is to conform to the Tim Berners-Lee “5 star” level of open data.

The vision of the Semantic Web is to extend the principles of the Web from documents to data. Data should be accessed using the general Web architecture, using, e.g., URIs; data should be related to one another just as documents (or portions of documents) already are. This also means the creation of a common framework that allows data to be shared and reused across application, enterprise, and community boundaries, to be processed automatically by tools as well as manually, including revealing possible new relationships among data items.

The OECD Open Innovation Community
The Open Innovation Community will consist of an interface for managing Open Innovation Community (OIC) content, and involves designing, building and maintaining this interface to provide the following:
• Information describing the open platform
• Registration services
• Examples of products developed using the open platform
• Open services available with associated technical documentation
• OIC blog
• FAQ

Software inventory
Over 60 statistical software tools are available for sharing. Find new software, or post information about yours, at: www1.unece.org/stat/platform/display/msis/Software+Inventory
Developing software sharing in the European Statistical System
Denis Grofils (Eurostat)

Software represents an important part of the assets of the European Statistical System (ESS). In statistical institutions, as in many modern businesses, the quality and availability of software is of primordial importance, as it directly affects the way business processes are executed. While not every member of the ESS develops software, all of them certainly use it. Developing software is usually recognised as costly, at both the development and maintenance stages. Simply using software may be costly in different respects: licensing fees, consultancy, training, etc.

Software may be of different nature and extent; for example, some types of software could be:
• Data collection systems
• Procedures developed in statistical computing languages for different purposes (sampling, imputation, weighting, aggregation, confidentiality protection, etc.)
• Tools for the management of statistical metadata
• Web portals for data dissemination

As the level of standardisation grows in the statistical community, through harmonisation at the international level and through initiatives that promote the industrialisation of official statistical production (see the work of the HLG of the UNECE, or the Joint Strategy and the ESS.VIP programme at ESS level), sharing software at a wider level becomes easier. The move toward service oriented architecture (SOA) and the development of a so-called "plug and play" architecture for statistical production strongly reinforce the potential of sharing. Platform-independent services allow distributed architecture models that promote a high level of reuse of software components. Services can be developed independently or cooperatively and shared among partners. The functionality of existing software can be offered as a service at limited cost via proper wrapping (see the sketch at the end of this article). All this makes the potential of software sharing higher than ever.

The possibility to share software among institutions of the ESS offers several advantages, notably:
• Increased efficiency and reduced costs, by avoiding multiple developments of virtually the same products by different organisations
• Increased harmonisation and interoperability, through the use of standard software building blocks
• Improved data quality, through the use of widely accepted and validated software building blocks, and improved comparability among data coming from different countries
• An increased level of collaboration and resource sharing between members of the ESS

Several important achievements relating to OSS have been realised at the European level, notably:
• The European Union Public Licence (EUPL): the first European OSS licence. It was created on the initiative of the European Commission and is approved by the European Commission in 22 official languages of the European Union.
• Joinup: a collaborative platform created by the European Commission that offers a set of services to help e-Government professionals share their experience with interoperability solutions, and support them to find, choose, re-use, develop and implement open source software and semantic interoperability assets.

The ESS IT Directors Group (ITDG) mandated the Statistical Information System Architecture and Integration working group (SISAI) to launch a task force dealing with the development of policy and guidelines supporting ESS software sharing. The work of this task force started during fall 2012. The following aspects of software sharing are tackled:
• Definition of software of interest: in this context the term "software" is to be understood in its broadest sense, as any set of computer programs, these being defined as any set of instructions for computers. Objective criteria for defining the target of the recommendations are necessary. Software of interest is defined as "software used by members of the ESS to support directly activities of the GSBPM in order to realise the statistical programme of the ESS". It should be noted that this definition is independent of the technological characteristics of the software (web-based, command-line batch, macros, web-services, etc.).
• Software catalogue: defines how a catalogue of ESS software should be maintained and which information should be recorded. A distinction is made between unshared software (used by only one ESS member), for which a minimal set of information is collected, and shared software (used by several ESS members), for which an extensive set of information is collected.
• Sharing scenarios: several scenarios are identified and the applicability of the recommendations per scenario is defined (i.e. not all recommendations apply to all scenarios).
• Sharing software use: the federation of software users through the creation of user communities is organised. This concerns software published under any type of licence, including commercial software.
• Sharing software development: recommendations are made for each step of the development cycle. As an example, it is recommended to consider several types of constraints when designing software: architectural constraints (consistency with GSBPM and GSIM, link with Plug and Play constraints), clear documentation of methodological aspects, data protection constraints specific to the ESS, support for multilingualism, and a legal roadmap (particularly intellectual property right tracking when developing component-based applications).
• Software quality evaluation: a template for software quality assessment is provided.

Evaluations of the elaborated recommendations on real cases were performed, to check the propositions against reality and incorporate feedback from these experiences. Three illustration cases were used: Blaise, Demetra+ and SDMX-RI. The set of draft recommendations elaborated by the task force will be submitted in the coming weeks to the Statistical Information System Architecture and Integration working group (SISAI) and then to the ESS IT Directors Group (ITDG).
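As a minimal illustration of such wrapping (the sketch referred to in the article above), the snippet below exposes a hypothetical in-house routine as a platform-independent web service, using Python and the Flask library. The route, the payload format and the trimmed-mean routine itself are illustrative assumptions, not an ESS specification.

    # A minimal sketch of wrapping an existing routine as a web service
    # with Flask; the service contract below is a hypothetical illustration.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def trimmed_mean(values, trim=0.1):
        """An existing in-house routine, now exposed as a service."""
        values = sorted(values)
        k = int(len(values) * trim)
        kept = values[k:len(values) - k] or values  # guard against over-trimming
        return sum(kept) / len(kept)

    @app.route("/aggregate", methods=["POST"])
    def aggregate():
        payload = request.get_json()
        result = trimmed_mean(payload["values"], payload.get("trim", 0.1))
        return jsonify({"trimmed_mean": result})

    if __name__ == "__main__":
        app.run(port=5000)

A partner institution, whatever its own platform, could then call the component with an HTTP POST carrying JSON such as {"values": [12, 15, 11, 240]}, without knowing anything about how the routine is implemented.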
Improving data collection by soft computing
Miroslav Hudec and Jana Juriová (Infostat)

The applicability of soft computing (fuzzy logic and neural networks) as a modern means to improve the collection and the quality of data for business and trade statistics is one of the topics of the Blue-ETS project (http://www.blue-ets.istat.it/). The main findings which support this line of development are:
• Large complex administrative and statistical databases contain valuable information which can be mined using powerful methodologies;
• Statisticians possess knowledge on how to deal with their tasks, but this knowledge cannot always be expressed by precise rules.

In order to estimate missing values, relations between similar respondents are relevant. Mining the Intrastat database by neural networks (NN) reveals that this is a rational option that could present a solution. NNs find patterns and relations between similar respondents. In this way we are able to estimate items if we have enough data available from other respondents (the sketch after this article illustrates the idea). Fuzzy rules expressed by linguistic terms and quantifiers reveal levels of similarity between imputed and surveyed values.

Similar techniques are also promising for dissemination. People prefer to use expressions of natural language in searching for useful data and information, for example: select regions where most municipalities have a small altitude above sea level. The result is entities ranked according to their degree of match to the query condition.

Modernisation of the first and the last stages of data collection could create a chain reaction of improvements in data quality. Better data dissemination (by flexible queries) could motivate respondents to provide their own data in a more timely and accurate way, and reduce the frequency of missing values, implying more efficient imputation (fewer missing values and powerful neural networks).

Relevant equations, models and experimental tools have been created in order to evaluate the pros and cons. The next step is the creation of fully functional tools and their adaptation to particular needs.
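The imputation idea can be sketched in a few lines. The following Python fragment, using scikit-learn and entirely synthetic data, trains a small neural network on respondents with complete records and uses it to estimate a missing item for a similar respondent; it illustrates the general approach only, not the Blue-ETS models.

    # A minimal sketch of NN-based imputation on synthetic data; the record
    # layout and relationships are hypothetical illustrations.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Hypothetical Intrastat-style records: [turnover, employees, exports]
    complete = rng.uniform(1.0, 100.0, size=(200, 3))
    complete[:, 2] = 0.3 * complete[:, 0] + 0.1 * complete[:, 1]  # exports depend on the rest

    X, y = complete[:, :2], complete[:, 2]
    scaler = StandardScaler().fit(X)

    # Train on respondents whose records are complete
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    model.fit(scaler.transform(X), y)

    # Estimate the missing 'exports' item for a respondent who did not report it
    respondent = np.array([[55.0, 40.0]])
    print(f"imputed exports: {model.predict(scaler.transform(respondent))[0]:.1f}")

The network effectively learns the patterns that relate similar respondents, so a respondent's missing item is estimated from what comparable respondents reported.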
What does Big data mean for official statistics?
A new paper prepared by leading international experts has recently been released by the High-Level Group for the Modernisation of Statistical Production and Services: http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=77170614

Tools for a Sprint
Carlo Vaccari (a sprinter)

In Ottawa, from 8 to 13 April 2013, we had a Sprint for the Plug & Play Architecture Project. People from Australia, Canada, Eurostat, Italy, Mexico, Netherlands, New Zealand, Sweden, UNECE and the United Kingdom met to start defining a “common statistical production architecture for the world's official statistical industry”, as stated by the High-Level Group for the Modernisation of Statistical Production and Services (see http://www1.unece.org/stat/platform/display/hlgbas). The objective of the sprint session was to ensure agreement on key principles regarding the architectural blueprint to build an interoperable statistical industry platform. We will discuss the documents produced by this meeting over the next few weeks. Here I just want to show you which tools were used in the Sprint.

Paper, a lot of paper
We wrote a lot on sheets, flip charts and post-its of any colour and any shape. Paper was used to explain, show, collect, store, group and debate ideas.

White-boards
We used many white-boards, writing on them with markers of any type – often we used cameras/mobiles to take a picture of what was written, to be able to transfer the concepts to digital files (we would love smart boards like: www.youtube.com/watch?v=NZNTgglPbUA).

Mixed
And yes, we used paper and boards together – very useful when you want to group concepts and keep track of what was done.

Wiki
We inserted documents, presentations, images, discussions, a glossary and so on in the UNECE wiki.

Mind Maps
Often we used Mind Maps to capture brainstorming and discussions: one of the best ways to avoid losing ideas and to summarize what has been said in lengthy discussions.

Presentations
Presentation software was used not only to prepare slides to present, but also to draw and develop schemas. Using notes and colours, presentation software was then used as a kind of digital dashboard.

Lego bricks
Each participant received from our wonderful facilitators three Lego bricks of different colours. We had to raise them to indicate respectively: “I want to speak”, “Off topic”, “Too much detail”. A very simple way to get participants to follow the rules for an efficient exchange of views.

Lollies
Each participant brought sweets (“lollies” in Australian English) from their country, to share with partners. Biscuits, chocolates, sweets of all kinds were the fuel that provided energy to tired brains.
Understanding “Plug and Play”
Marton Vucsan (Statistics Netherlands)

I have high hopes for the Common Statistical Production Architecture (CSPA) Project, commonly known under the more profane title of Plug and Play. Although it sounds easy, it may prove to be hard, very hard. From what I hear, “plug and play” has many interpretations. Some point in the direction of the feared “mother of all systems” projects that never work. Understanding what is really meant by the current CSPA project is important, because CSPA is something completely different from the big feared projects of the past. CSPA is about reducing complexity and gaining operating system independence. It is also about sharing and reducing our efforts while still getting what we need.

To achieve this we have to realize that our means of production are composed of different levels of abstraction. There are the methods, the process descriptions and finally the applications (I am deliberately keeping it simple here). Normally, to arrive at a statistical output we describe a method, create a process and build an application. All three are normally monolithic in nature and custom made. Unshareable, un-reusable, expensive, complex. The stuff called “legacy apps”, the stuff we should stop making.

CSPA starts with the insight that splitting things up reduces complexity. The GSBPM does this; it splits up the statistical process into easy-to-understand sub-processes. Thinking and building in sub-systems reduces complexity and increases reliability. Many programmers struggle with this, trying to split up a given solution into meaningful parts and often failing. In hindsight, the reason for that failure is obvious: the reduction in complexity has to be done at a much higher level. The complexity is often in the methods and the way the processes were thought up, independent of the IT implementation. If we really want to reduce complexity, that is our point of attack: the level where we specify our statistical recipe.

As a statistical community, we seem to agree that statistical outputs can be produced by processes composed of GSBPM sub-processes. With the right compromises we will be able to use these sub-processes or components across a broad range of statistics and agencies, like the engines on a plane or the motor management system in your car. Just like the conceptual understanding that a car and a plane are a collection of functional sub-systems, we need to understand that a statistical production system is a collection of functional sub-processes. Once we are able to think of our processes as assemblies of components, we can reuse them or exchange them (a minimal sketch of this idea follows below). Of course it is not that simple, but there are powerful forces at work to make that happen. Look what happened in other industries. When a component, say a motor management system, is available, most designs gravitate to using that component because it is much cheaper than to “roll your own”. A component can be manufactured separately from the system it will be used in. Rolls-Royce doesn't need planes to manufacture engines. The key is to do it at the right conceptual level. Others have done it (look at your phone); we can do it too!
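To make the component idea concrete, here is a minimal sketch in Python of what interchangeable sub-process components could look like behind a common interface. The interface, the class names and the GSBPM sub-process labels chosen for them are illustrative assumptions, not the CSPA specification.

    # A minimal sketch of "plug and play" components behind one interface;
    # all names below are hypothetical illustrations.
    from abc import ABC, abstractmethod

    class StatisticalService(ABC):
        """A reusable component implementing one GSBPM sub-process."""

        @abstractmethod
        def run(self, data: list[dict]) -> list[dict]:
            ...

    class DeriveVariables(StatisticalService):  # e.g. GSBPM 5.5 "Derive new variables"
        def run(self, data):
            return [{**r, "net": r["gross"] - r["tax"]} for r in data]

    class Aggregate(StatisticalService):        # e.g. GSBPM 5.7 "Calculate aggregates"
        def run(self, data):
            return [{"total_net": sum(r["net"] for r in data)}]

    def pipeline(data, services):
        # A production process as an assembly of interchangeable components
        for service in services:
            data = service.run(data)
        return data

    print(pipeline([{"gross": 100, "tax": 20}, {"gross": 50, "tax": 5}],
                   [DeriveVariables(), Aggregate()]))

The point is not the code but the contract: once every component honours the same interface, a production process becomes an assembly of parts that can be shared or swapped between organisations.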
Many statistical organisations are modernising using Enterprise Architecture to underpin their vision and change strategy. This enables them to develop statistical services in a standard way. Enterprise architecture creates an environment which can change and support business goals. It shows what the business needs are and where the organisation wants to be, and ensures that IT strategy aligns with this. It helps to remove silos, improves collaboration and ensures that technology is aligned to business needs.

In parallel, the High-Level Group for the Modernisation of Statistical Production and Services (HLG) is developing the CSPA. This will be a generic architecture for statistical production, and will serve as an industry architecture for official statistics. Adopting a common architecture will make it easier for organisations to standardise and combine the components of statistical production, regardless of where the statistical services are built. The CSPA also provides a starting point for concerted development of statistical infrastructure and shared investment across statistical organisations.

Version 0.1 of the CSPA documentation has just been released for public comment at: www1.unece.org/stat/platform/x/_ISwB
Your feedback is welcome!

This newsletter and previous editions are also available on-line at: http://www1.unece.org/stat/platform/display/msis/SAB+Newsletter