Most data integration software was built to run data through ETL servers. That worked well at the time for several reasons: there wasn't that much data (1TB was considered a large amount), most data was structured, and the turnaround time for that data was monthly. Even then, daily loads became a problem for most companies. Because of the limitations of the early tools, much of the work was hand-coded, undocumented, and lacked central management.
When Einstein published his ideas, he became a pivotal figure in shifting the way we think about physics, from the Newtonian model to the quantum model. In turn, this changed the way we think about the world and allowed us to develop new ways of engaging with it.
We are at a similar juncture. The development of computational technologies allows us to think about astronomical volumes of data and to make meaning of that data.
The mindshift that occurs is that 'the machine is our friend'. The computer, like all machines, extends our capabilities. As a consequence, the kinds of thinking now required in industry are those that move away from thinking like a computer and toward creative engagement with possibilities. Logical thinking is still necessary, but it starts to be driven by imagination.
Computational thinking and data science change the way we think about defining and solving problems.
We are entering an age of creativity, one whose impact increasingly extends from the arts to business, scientific, technological, entrepreneurial, political, and other contexts.
Data Center Computing for Data Science: an evolution of machines, middleware,... - Paco Nathan
Guest lecture 2013-08-27 at General Assembly in SF for the Data Science program taught by Jacob Bollinger and Thomson Nguyen https://generalassemb.ly/education/data-science/san-francisco
Many thanks to Thomson, Jacob, and the participants in the course. Excellent Q&A!
Received a bottle o' Cardhu (my fave Scotch) in payment for lecture, and since it's Burning Man Week, the city was emptied so we had enough to share with the class :)
Evidence:
https://plus.google.com/u/0/110794698656267747127/posts/GvjhhQ99CTs
Delivering on Standards for Publishing Government Linked Data - 3 Round Stones
Progress report on publishing open government data using Open Web Standards. Delivered by Bernadette Hyland, co-chair W3C Government Linked Data Working Group at the European Data Forum 2013, Dublin, Ireland.
An invited talk by Paco Nathan in the speaker series at the University of Chicago's Data Science for Social Good fellowship (2013-08-12) http://dssg.io/2013/05/21/the-fellowship-and-the-fellows.html
Learnings generalized from trends in Data Science:
a 30-year retrospective on Machine Learning,
a 10-year summary of Leading Data Science Teams,
and a 2-year survey of Enterprise Use Cases.
http://www.eventbrite.com/event/7476758185
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol... - Rida Qayyum
The concept of Big Data has become extensively popular because of its wide use in emerging technologies. Complex and dynamic big data environments generate colossal amounts of data that are impossible to handle with traditional data processing applications. Nowadays, the Internet of Things (IoT) and social media platforms such as Facebook, Instagram, Twitter, WhatsApp, LinkedIn, and YouTube generate data in many formats, which creates a pressing need for technology to store and process this tremendous volume of data. This research outlines the fundamental literature required to understand the concept of big data, including its nature, definitions, types, and characteristics. The primary focus of the study is two fundamental issues: storing an enormous amount of data and processing it quickly. To that end, the paper presents Hadoop as a solution and discusses the Hadoop Distributed File System (HDFS) and the MapReduce programming framework for storing and processing Big Data efficiently. Future research directions are derived from the opportunities and emerging issues in the Big Data domain; they are intended to facilitate exploration of the field and the development of better solutions to Big Data storage and processing problems. The study thereby contributes to the existing body of knowledge by comprehensively addressing the opportunities and emerging issues of Big Data.
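The abstract above names the MapReduce programming framework as the processing side of the solution. As a rough illustration of that model (a minimal sketch in plain Python, not Hadoop's actual Java API; the functions and sample input are invented), a word count can be expressed as a map phase, a shuffle that groups by key, and a reduce phase:

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

# Map phase: emit (key, value) pairs for each input record.
def map_words(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield (word, 1)

# Shuffle phase: group all values by key, as the framework would do between map and reduce.
def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values into one result per key.
def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    return (key, sum(values))

if __name__ == "__main__":
    lines = ["big data needs new tools", "new tools for big data"]  # stand-in for an input split
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    results = [reduce_counts(k, v) for k, v in grouped.items()]
    print(sorted(results))
```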
Linked Water Data For Water Information Management - Edward Curry
The management of water consumption is hindered by low general awareness and the absence of precise historical and contextual information. Effective and efficient management of water resources requires a holistic approach that considers all the stages of water usage. A decision support tool for water management services requires access to a number of different data domains and different data providers. The design of next-generation water information management systems poses significant technical challenges in terms of information management, integration of heterogeneous data, and real-time processing of dynamic data. Linked Data is a set of web technologies that enables integration of different data sources. This work investigates the usage of Linked Data technologies in the Water Management domain, describes the fundamental concepts of the approach, details an architecture, and discusses possible water management applications.
Overview of data collection methods and a deep dive on data (primary vs. secondary, qualitative and quantitative). Bias. Data processing and structured, unstructured, semi-structured data. Database jargon.
Data Science For Social Good: Tackling the Challenge of Homelessness - Anita Luthra
A talk presented at the Champions Leadership Conference Series: leveraging data provided by New York City's Department of Homeless Services, software vendor Tibco partnered with SumAll.Org to help tackle the societal challenge of homelessness in New York City.
Making the invisible visible. Managing the digital footprint of development p... - UNDP Eurasia
Thanks to new technologies, now accessible even in remote places, development work - and development workers - have an increasing digital footprint. Quite literally, what was invisible can now become visible, with major implications for aid effectiveness, transparency and fundraising. Being able to manage that footprint effectively and analyse it to identify emerging trends is going to be a differentiating skill in the Development 2.0 world. This presentation illustrates some key concepts, examples and tools that development organisations can use to analyse and manage their digital footprint.
Big Data, Data Science, Machine Intelligence and Learning: Demystification, T... - Prof. Dr. Diego Kuonen
Keynote presentation given by Prof. Dr. Diego Kuonen, CStat PStat CSci, on March 14, 2017 at Eurostat's international conference `New Techniques and Technologies for Statistics (NTTS) 2017' in Brussels, Belgium.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
A Swiss Statistician's 'Big Tent' View on Big Data and Data Science (Version 10) - Prof. Dr. Diego Kuonen
Keynote talk given by Dr. Diego Kuonen, CStat PStat CSci, on October 21, 2015, at the `Austrian Statistics Days 2015' in Vienna, Austria.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional Swiss statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and for improving the profitability and success of many enterprises. However, many technical challenges described in this paper must be addressed before this potential can be realized fully. The challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, error-handling, privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.
A Statistician's 'Big Tent' View on Big Data and Data Science (Version 9) - Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 1, 2015, at the `Joint SCITAS and Statistics Seminar' of the EPFL in Lausanne, Switzerland.
ABSTRACT
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Big Data, Data-Driven Decision Making and Statistics Towards Data-Informed Po... - Prof. Dr. Diego Kuonen
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on October 20, 2015, at the Swiss Statistical Society's celebration of the `World Statistics Day 2015' in Olten, Switzerland.
Further information is available at https://worldstatisticsday.org/blog.html?c=CHE
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
IT tools for statistics, visualization, open data - Carlo Vaccari
Seminar "Opening Financial Data in Turkey: transparency, accessibility and citizen involvement"
"IT tools for statistics, visualization, open data"
Carlo Vaccari
Ankara, April 19 2012
Presentation slides used during the meetup on Artificial Intelligence and Its Ecosystem organized by Developer Session. In the presentation, I highlighted why open data is one of the key parts of the AI ecosystem and described the state of Open Data in Nepal.
On December 9 & 10, Deloitte hosted over 20 business executives and thought leaders at the Internet of Things (IoT) Grand Challenge Workshop at the Tech Museum of Innovation in San Jose. The objective of the gathering was to work collectively on one of the largely unexplored areas of IoT: revenue-generating IoT use cases. The following report captures what was discussed during this extraordinary event, where an open, collaborative dialogue focused on advancing the field of IoT.
Explore the key findings here or learn more at www2.deloitte.com/us/IoT-challenge.
Putting the L in front: from Open Data to Linked Open Data - Martin Kaltenböck
Keynote presentation of Martin Kaltenböck (LOD2 project, Semantic Web Company) at the Government Linked Data Workshop in the course of the OGD Camp 2011 in Warsaw, Poland: Putting the L in front: from Open Data to Linked Open Data
Innovation med big data - chr. hansens erfaringer - Microsoft
In many places, Big Data is still the new and unknown, and has no top priority with IT, since "we don't have large volumes of data". But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics programme based on Big Data technologies from Microsoft.
The Brussels Data Science community is supported by the European Data Innovation Hub.
Our mission is to educate, inspire and empower scholars and professionals to apply data sciences to address humanity's grand challenges.
What we do
mind the gap
We are the fastest growing community of data scientists in Europe.
We love doing Data4Good.
We promote the value of analytics and organise events, hands-on sessions and trainings to close the gap between academics and business.
Join us if you want to share, learn and have fun with analytical & technological innovation & positive social change.
Presentation given at the conference "open data for impact"
Erasmus+ project "Public Makers"
https://www.linkedin.com/posts/wide-luxembourg_opendata-publicmakers-activity-6818166878473596928-7ImU/
This paper describes the NHS National Innovation Centre's Linked Data initiative. A discussion on the OHIO (Open Health Innovation Ontology) is also provided.
Open Source & Open Data Session report from imaGIne 2014 Conference - GSDI Association
Session report from the imaGIne 2014 Conference held in Berlin, Germany, in October 2014. The session was chaired by Dr. Gabor Remetey-Fulopp of HUNAGI, which was a co-organiser of Session 8C1.
Open Data per il riuso della PSI: l'Europa spinge sull'economia del futuro - Matteo Brunati
Presentation given at the "Hack4Med and roadmap to Veneto Open Data" event to give some perspective on the European question regarding PSI (Public Sector Information), and also to describe some reuse contexts from a business angle - the side that most needs to be encouraged and communicated.
A Guide to Data Innovation for Development - From idea to proof-of-concept - UN Global Pulse
"A Guide to Data Innovation for Development - From idea to proof-of-concept" provides step-by-step guidance for development practitioners to leverage new sources of data. It is the result of a collaboration between UNDP and UN Global Pulse with support from UN Volunteers.
The publication builds on successful case trials of six UNDP offices and on the expertise of data innovators from UNDP and UN Global Pulse who managed the design and development of those projects.
The guide is structured into three sections: (I) Explore the Problem & System, (II) Assemble the Team and (III) Create the Workplan. Each section comprises a series of tools for completing the steps needed to initiate and design a data innovation project, to engage the right partners and to make sure that adequate privacy and protection mechanisms are applied.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects' efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you're in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part "Essentials of Automation" series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here's what you'll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We'll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don't miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What's changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
Sharing Advisory Board newsletter #8
Sharing Advisory Board
Software Sharing Newsletter
Issue 8: April 2013
Editorial (Marton Vucsan, SAB Chairman)
Kodak, Sony, Philips, your local newspaper: it happened to all of them. The main product they built their existence on was no longer relevant. Films, walkmans, light bulbs, encyclopedias - the list is endless. Whole industries are wiped out to make room for new ones. Darwin would have enjoyed this. At the Big Data seminar organized by the UN in New York you could see the first signals that for us, too, the bell will ring in due course. Choices will have to be made...
There is a tremendous opportunity in using the new
information that the planet now produces as a by-product of all its processes. It opens the door to all
kinds of new statistical products and processes to
make them. Let us for this moment focus on the
processes. Looking at the way our researchers
work with the new data we see it is fundamentally
different from what we are used to. First, the amount of data they work with is far too large to edit; second, the tools they use are alien to those we are familiar with; third, the workflow is intermittent because of the time needed for processing at every step. Aside from setting up rule sets and creating
workflow type actions no human labor is involved in
the actual production. The creation of these
workflows is knowledge intensive, and once they
are created very little effort is needed to produce
the result. What we see is a shift from manual
labour in production to manual labour in design
which will supply the multiplier we need for sharing
to be effective. In the traditional artisanal production system there is no multiplier: more statistical output means more people. Sharing and collaborating outside the office seems not very meaningful there, because of the logistical overhead and the differences in execution and definition. Moreover, there is no multiplier present either; sharing work does not make other work unnecessary in that kind of setup.
In the big data area where the problems are more
uniform, the real work is in the design of the
process. The processes are more formally defined
and these processes represent the production
knowledge for the statistics they produce. It will be
very profitable to share these processes under the
"build one get ten" rule. The reason is that there is
a multiplier present in the form of a formalized
mostly automated process.
In this Issue:
- Open data - taming the tiger
- The OECD Open Data Project
- Developing software sharing in the European Statistical System
- Improving data collection by soft computing
- Tools for a Sprint
- Understanding "Plug and Play"
The knowledge is deployed in the design phase
of the process and sharing parts of the processes
means getting executable process parts in return
including the knowledge that created them. In
this setup sharing means indeed freeing up
resources.
The threat or opportunity of Big Data will help us
to do two things: It will help us shift the balance
between human labor and machine labor towards
machine labor and it will help us to become a
solution sharing industry that can do much more
with much less money.
In the end it might not only be the new products
that will be our opportunity but also the new
industrial processes that will underlie these new
products.
Are you Linked?
The LinkedIn group "Business Architecture in Statistics" aims to share knowledge and ideas
and promote activities, such as those undertaken
by the Sharing Advisory Board, that support
standards-based modernisation and greater
collaboration between statistical organisations.
Join the discussions at:
http://www.linkedin.com/groups?home=&gid=4173055
You can also find out more about SAB activities
and outputs via the MSIS wiki:
www1.unece.org/stat/platform/display/msis
[Special Feature: Open Data - The next two articles explore the implications of open data for official statistics. The first presents the view from a national statistical organisation (CSO Ireland). The second gives the perspective of an international organisation (OECD).]

Open data - taming the tiger
Eoin McCuirc (Central Statistics Office, Ireland)
The term "open data" means different things to different people, though the goals of making information freely available and easily accessible online are very clear. I'll start by looking at Tim Berners-Lee's classification of five levels of open data:
★ Make your data available on the Web under an open license
★★ Make it available as structured data (e.g. Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV file instead of an Excel sheet)
★★★★ Use Linked Data formats (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people's data to provide context
So, there are degrees of "openness" - from simply putting information up on the web to providing linked open data. Yes, both are a form of open data but, though similar in appearance, they are two completely different animals, as different say as a cat to a tiger. In this article I want to talk about the tiger: linked open data, the semantic web and how the CSO is beginning to meet this new challenge.
In managing the dissemination of statistics we are
guided by international standards. Principles 14
and 15 of the European Statistics Code of Practice
are particularly relevant:
Principle 14 Coherence and Comparability: European Statistics are consistent internally, over time and comparable between regions and countries; it is possible to combine and make joint use of related data from different sources.
Principle 15 Accessibility and Clarity: European Statistics are presented in a clear and understandable form, released in a suitable and convenient manner, available and accessible on an impartial basis with supporting metadata and guidance.
Clearly, the opportunities offered by open data will
help statistical offices to deliver outputs which
match these two principles.
The data deluge, or the accumulation of data about people, places and things, is changing the world in which statistical offices process and publish statistics - and is another important driver for open data. In general, it is getting more and more difficult to find the information you need - the problem of the needle in the haystack. Sooner rather than later we will need machines to trawl through all available data in order to find the proverbial needle. But for this to become possible, data needs to be structured in a particular way - a challenge to which the semantic web offers a solution.
The semantic web provides a way of making data
machine-readable, independent of the variety of
technical platforms and software packages in use
throughout the web. A key concept is that of linked
open data. In many ways, linked open data is
similar to open data: An organisation such as a
statistical office decides what information it wants
to publish on the web and makes the necessary
technical choices about hosting, security, domain
names, content management, maintenance, etc.
These choices apply equally to publishing linked
open data.
However, the key difference is one of language. In linked open data, semantic web objects are named to indicate all the attributes needed to make the data machine-readable without human intervention. For statisticians, this is an opportunity to use international classifications (e.g. NACE, ISCO, ISCED etc.) as de facto standards for linked open data. Indeed, if we don't take this opportunity, it's possible that other early adopters could set a different standard.
We are starting on a journey and, unfortunately,
there is no clear road map yet and few precedents
to give guidance. So, how do we acquire the
expertise needed to publish statistics as linked
open data?
In 2012 the CSO began a pilot project with the
Digital Enterprise Research Institute (DERI) at the
National University of Ireland, Galway (NUIG), to
publish some of the Census 2011 results as linked
open data. The project has given valuable
experience to the CSO dissemination team and the
following are some of the important lessons so far:
1. For data to be linked across the semantic
web, objects need to be named. Uniform Resource
Identifiers (URIs) are the code that identifies an
object. Official statistics use many standard
classifications to define their data and, as noted
above, this is very useful when creating URIs.
2. Once the objects have been named a
framework is needed to publish this data on the
web. Using the Resource Description Framework (RDF), which views the world in triplets - (Resource, Attribute, Value) - the information is published on the web. An example of an RDF statement is (Population of Ireland 2011, Statistic, 4588252); a short code sketch showing how such a statement can be built and queried follows the table below.
3. To publish data on the semantic web an
organisation needs to put it in a place and in a
format that a machine will expect. The CSO will
publish its Census 2011 open data on data.cso.ie
not on the CSO website www.cso.ie. In this
scenario machines should get RDF data and users
should get some readable representation of the
data e.g. HTML.
4. Ideally all the URIs an organisation produces related to a single real-world concept - e.g. Population of Ireland 2011 - should be linked together.
5. Ideally the URIs would be "cool", built to last 2000 years or more:
http://www.w3.org/TR/2008/WD-cooluris-20080321/
6. For the Census 2011 pilot it is proposed to
produce a SPARQL (SPARQL Protocol and RDF
Query Language) service to facilitate access to the
data.
The following table sets out the framework which is
planned for Census 2011 results as linked open
data:
Base URI: http://data.cso.ie/

Entity / URI pattern (relative to base) / RDF class:
- Classification: /classification/{id} - skos:ConceptScheme
- Concept in a classification: /classification/{id}#{code} - skos:Concept
- Dataset: /dataset/{id} - qb:DataSet
- Data structure definition: /dataset/{id}#structure - qb:DataStructureDefinition
- Observation: /dataset/{id}#{dim1};{dim2};{dim3} - qb:Observation
- Property (dimension, attribute): /property#{id} - qb:DimensionProperty, qb:AttributeProperty
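To make the framework concrete, here is a minimal sketch of how an observation such as (Population of Ireland 2011, Statistic, 4588252) might be expressed as RDF and queried. It uses the Python rdflib library as an assumption (the article does not say what tooling the CSO uses), and the dataset identifier, property name and dimension values are illustrative placeholders that merely follow the URI patterns above; the closing SPARQL query hints at the kind of service mentioned in point 6.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Illustrative namespaces loosely following the URI patterns in the table above.
BASE = Namespace("http://data.cso.ie/")
QB = Namespace("http://purl.org/linked-data/cube#")  # W3C RDF Data Cube vocabulary

g = Graph()
g.bind("qb", QB)

# One observation: total population of Ireland, Census 2011 (hypothetical dataset id).
obs = BASE["dataset/census2011#ireland;2011;total"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, BASE["dataset/census2011"]))
g.add((obs, BASE["property#population"], Literal(4588252)))

# Machines get RDF (here serialised as Turtle); humans would get a readable view instead.
print(g.serialize(format="turtle"))

# The kind of query a public SPARQL service (point 6 above) could answer.
query = """
SELECT ?obs ?population WHERE {
    ?obs a <http://purl.org/linked-data/cube#Observation> ;
         <http://data.cso.ie/property#population> ?population .
}
"""
for row in g.query(query):
    print(row.obs, row.population)
```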
Later in 2013, we will publish the outputs from the project - i.e. Census 2011 results as linked open data - and, to mark the International Year of Statistics, we will have a competition for the best "mash-up" using those statistics. We hope this will not only be a proof of concept, but also a proof of the value of linked open data.
The OECD Open Data Project
The OECD is currently undertaking an Open
Data project with the aim of making its statistical
data content machine-readable, retrievable,
indexable and re-usable. The Open Data project will implement an Application Programming Interface (API) to provide machine-to-machine access to the OECD statistical data warehouse "OECD.Stat" via a number of formats, while addressing the challenges involved in standardising the statistical content from the 800+ datasets.
In addition an Open Innovation community will be
created to encourage the re-use of OECD data
via external innovation.
The Open Data project is aligned to the
Knowledge Information Management (KIM)
Ontology Management and Semantic
Infrastructure project to make data accessible via
linked data.
Background to the Open Data Project
Statistics are of strategic importance to the
OECD both as an input for internal analysis and
also as a product for dissemination to a wider
audience in their own right. Following a review of
the OECD Publishing Policy in 2011 a number of
recommendations were proposed to make OECD
statistics "open, accessible and free". The OECD
Council welcomed this proposal and as a result
the DELTA programme was initiated to
implement these aims.
DELTA Project - Open Data
Openness is one of the key values that guide the
OECD vision for a stronger, cleaner and fairer
economy. Making data open is an important part
of this and to this end a number of open
benchmarks in the project have been defined as
follows:
- Completeness - content should include data, metadata, sources and methods.
- Primacy - datasets should be primary, not aggregated, and include details on how the data was collected.
- Timeliness - data should be automatically available in trusted third-party repositories upon publication.
- Ease of access - data made available via a simple Application Programming Interface (API).
- Machine readability - data and metadata provided in a machine-readable standard plus documentation.
- Non-discrimination - no special permissions required to access data.
- Use of common standards - stored data can be accessed without a special software license.
- Licensing - Creative Commons CC-BY (licensees may copy, distribute, display and perform the work and make derivative works based on it only if they credit the author or licensor in the manner specified).
- Permanence - information made available remains online, with archiving over time together with a notification mechanism.
- Usage costs - free.
Open Data Project goals
Data today can be extracted only via downloads from OECD.Stat. The Open Data Web Services (ODWS) will make the data available to other web sites directly, for creating custom data visualizations, live combinations with other data sources, etc. The goals of the Open Data project are: to make OECD data machine-readable, retrievable, indexable and re-usable; to
increase the dissemination and impact of OECD
data via open data services for its statistical data;
and, to encourage re-use of OECD data by
external innovation communities.
The Open Data Project has 3 main deliverables: i) a full set of "Open-ready" data and metadata; ii) a set of Open Data Web Services; and iii) an interface for managing the OECD Open Innovation Community.
"Open-Ready" Data and metadata
For data to be considered "Open-ready", the existing data and metadata content of the OECD
corporate data warehouse OECD.Stat will be
required to meet certain criteria of structure and
content necessary for machine-to-machine access.
To achieve this, data owners will carry out a self-
assessment of all OECD.Stat data content to
gauge the state of open-readiness for each
dataset. This will involve analysing the metadata
content according to the criteria.
Open Data Web Services (ODWS)
In parallel to the data assessment exercise, the
Open Data Web Services will be developed. This
will involve building a set of Web Services to
provide machine-to-machine access to OECD.Stat
data via a number of formats. This will involve
defining the technical standards for data to be
machine-readable that meet the needs of both
expert and non-expert audiences. Application
Programming Interfaces (API) will be developed to
make the data and metadata in OECD.Stat
available to systems outside the organisation via a
number of formats.
These Web Services will be available to other
organisations currently sharing the .Stat Data
warehouse software via the OECD Statistical
Information System Collaboration Community
(SIS-CC).
Open Data formats
Data and metadata will be made available to
external users in as many output formats as
possible to maximise data access. The project will
start with formats including: SDMX/JSON, Restful
API, OData, XLS and CSV. Additional formats will
be added as needed over time. These formats
have been chosen for the reasons described
below.
a) Excel/CSV - Excel and CSV are already widely
used exchange standards so including them as
output formats was a fairly obvious decision.
b) SDMX/JSON - JavaScript Object Notation
(JSON) is a text-based open standard designed for
human-readable data interchange and has
become one of the most popular industry-used
open data formats on web sites today.
The Statistical Data and Metadata eXchange (SDMX) standard provides a standard model for
statistical data and metadata exchange between
national agencies and international agencies,
within national statistical systems and within
organisations. OECD is a member of the SDMX
Sponsor Group (together with the Bank of
International Settlements, European Central Bank,
Eurostat, International Monetary Fund, United
Nations Statistics Division and World Bank). SDMX
data extracts from OECD.Stat are already provided
via a web service; this will be adapted as an API
using the SDMX compact version.
c) Open Data (OData) - OData is an open protocol
for sharing data
Future formats could include Google Data (a REST-inspired technology), the Google Dataset Publishing Language (DSPL) or Google KML, a geospatial file format.
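As an illustration of the machine-to-machine access these formats are meant to enable, the sketch below shows how a client might fetch an SDMX-JSON message with Python's requests library. The endpoint URL, dataset code and dimension filter are hypothetical placeholders, not the documented OECD API; the real OECD.Stat web services define their own identifiers and URL layout.

```python
import requests

# Hypothetical endpoint and query; the real OECD.Stat services define their own
# dataset identifiers, dimension filters and URL layout.
BASE_URL = "https://example.org/sdmx-json/data"   # placeholder, not the real service
dataset = "EXAMPLE_DATASET"                       # placeholder dataset code
dimensions = "IRL.TOTAL.A"                        # placeholder dimension filter

response = requests.get(f"{BASE_URL}/{dataset}/{dimensions}/all", timeout=30)
response.raise_for_status()
message = response.json()

# In an SDMX-JSON message the observations are typically grouped under "dataSets",
# with the corresponding structure metadata alongside.
observations = message["dataSets"][0].get("observations", {})
print(f"retrieved {len(observations)} observations")
```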
Linked Data and the OECD KIM project
The OECD Knowledge and Information Management (KIM) project has been established to integrate information and centralise access to all
OECD content (corporate content management,
record management, authoring, etc.). KIM was
launched in parallel to the DELTA project and is
concerned with developing semantic enrichment
and centralized taxonomy linked data support.
A long-term goal of the project is to create linked
data sources with the Resource Description
Framework (RDF) using existing vocabularies to
map data to related subjects and generating a
collection of "triples" (consisting of a subject, a predicate and an object) known as a "triple-store". Each component of the triple has a Uniform Resource Identifier (URI), enabling data to be linked to related sources.

[Software inventory: over 60 statistical software tools available for sharing. Find new software, or post information about yours, at: www1.unece.org/stat/platform/display/msis/Software+Inventory]
Creating a triple-store from the OECD.Stat data
warehouse will be a huge task and work
investigating the possibilities has only recently
started (at the time of writing the tools have not yet been selected), but the long-term goal is to conform to Tim Berners-Lee's "5 star" level of open data.
The vision of the Semantic Web is to extend
principles of the Web from documents to data.
Data should be accessed using the general Web architecture, e.g. URIs; data should be
related to one another just as documents (or
portions of documents) are already. This also
means creation of a common framework that
allows data to be shared and reused across
application, enterprise, and community boundaries,
to be processed automatically by tools as well as
manually, including revealing possible new
relationships among data items.
The OECD Open Innovation Community
The Open Innovation Community will consist of an interface for managing Open Innovation Community (OIC) content and involves designing, building and maintaining this interface to provide the following:
- Information describing the open platform
- Registration services
- Examples of products developed using the open platform
- Open Services available with associated technical documentation
- OIC Blog
- FAQ
Developing software sharing in the
European Statistical System
Denis Grofils (Eurostat)
Software represents an important part of the assets of the European Statistical System (ESS). In statistical institutions, as in many modern businesses, the quality and availability of software is of primary importance, as it directly affects the way business processes are executed. While not all members of the ESS necessarily develop software, all of them certainly use it. Developing software is usually recognized as costly, at both the development and maintenance stages, and simply using software may be costly in different respects: licensing fees, consultancy, training, and so on.
Software may differ in nature and extent; for example, some types of software could be:
- Data collection systems
- Procedures developed in statistical computing languages for different purposes (sampling, imputation, weighting, aggregation, confidentiality protection, etc.)
- Tools for the management of statistical metadata
- Web portals for data dissemination
As the level of standardisation grows in the
statistical community through harmonization at
international level and through initiatives that
promote industrialisation of official statistical
production (see the work of the HLG of the
UNECE or the Joint strategy and the ESS.VIP
programme at ESS level), the sharing of software
at a wider level becomes easier.
The move toward service-oriented architecture (SOA) and the development of a so-called "plug and play" architecture for statistical production strongly reinforce the potential for sharing. Platform-independent services allow distributed architecture models that promote a high level of reuse of software components. Services can be developed independently or cooperatively and shared among partners. Functionality of existing software can be offered as a service at limited cost via proper wrapping. All of this makes the potential of software sharing higher than ever.
The possibility of sharing software among institutions of the ESS offers several advantages, notably:
- Increased efficiency and reduced costs, by avoiding multiple developments of virtually the same products by different organisations
- Increased harmonization and interoperability, through the use of standard software building blocks
- Improved data quality, through the use of widely accepted and validated software building blocks, and improved comparability of data coming from different countries
- An increased level of collaboration and resource sharing between members of the ESS
Several important achievements relating to OSS have been realised at the European level, notably:
- The European Union Public Licence (EUPL): the first European OSS licence. It was created on the initiative of the European Commission and is approved by the European Commission in 22 official languages of the European Union.
- Joinup: a collaborative platform created by the European Commission that offers a set of services to help e-Government professionals share their experience with interoperability solutions and supports them in finding, choosing, re-using, developing, and implementing open source software and semantic interoperability assets.
The ESS IT Directors Group (ITDG) mandated the
Statistical Information System Architecture and
Integration working group (SISAI) to launch a task
force dealing with the development of policy and
guidelines supporting ESS software sharing. The
work of this task force started during fall 2012.
The following aspects of software sharing are
tackled:
- Definition of software of interest: In this context the term "software" is to be understood in its broadest sense as any set of computer programs, these being defined as any set of instructions for computers. Objective criteria for defining the target of the recommendations are necessary. Software of interest is defined as "software used by members of the ESS to support directly activities of the GSBPM in order to realise the statistical programme of the ESS". It should be noted that this definition is independent of the technological characteristics of the software (web-based, command-line batch, macros, web-services, etc.).
- Software catalogue: The way a catalogue of ESS software should be maintained, and which information should be recorded, is defined. A distinction is made between unshared software (used by only one ESS member), for which a minimal set of information is collected, and shared software (used by several ESS members), for which an extensive set of information is collected.
• Sharing scenarios: several scenarios are identified and the applicability of the recommendations per scenario is defined (i.e. not all recommendations apply to all scenarios).
• Sharing software use: the federation of software users through the creation of user communities is organised. This concerns software published under any type of licence, including commercial software.
• Sharing software development: recommendations are made for each step of the development cycle. As an example, it is recommended to consider several types of constraints when designing software: architectural constraints (consistency with GSBPM and GSIM, links with PnP constraints), clear documentation of methodological aspects, data protection constraints specific to the ESS, support for multilingualism, and a legal roadmap (particularly the tracking of intellectual property rights when developing component-based applications).
• Software quality evaluation: a template for software quality assessment is provided.
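To make the distinction between the minimal and the extensive catalogue records more concrete, the sketch below shows one possible structure. All field names are assumptions chosen for illustration only, not the metadata set actually defined by the task force.

# Illustrative catalogue records; the field names are assumptions, not the
# actual metadata set defined by the task force.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UnsharedSoftwareRecord:
    """Minimal information for software used by a single ESS member."""
    name: str
    owner_institution: str
    gsbpm_subprocesses: List[str]              # e.g. ["5.4 Edit and impute"]
    licence: Optional[str] = None

@dataclass
class SharedSoftwareRecord(UnsharedSoftwareRecord):
    """Extended information once software is used by several ESS members."""
    user_institutions: List[str] = field(default_factory=list)
    documentation_url: Optional[str] = None
    quality_assessment: Optional[str] = None   # reference to the quality template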
The recommendations elaborated were evaluated on real cases, in order to test the propositions against reality and to incorporate feedback from these experiences. Three illustrative cases were used: Blaise, Demetra+ and SDMX-RI.
The set of draft recommendations elaborated by
the task force will be submitted in the coming
weeks to the Statistical Information System
Architecture and Integration working group (SISAI)
and then to the ESS IT Directors Group (ITDG).
Improving data collection by soft
computing
Miroslav Hudec and Jana Juriová (Infostat)
The applicability of soft computing (fuzzy logic and neural networks) as a modern means to improve the collection and the quality of data for business and trade statistics is one of the topics of the Blue-ETS project (http://www.blue-ets.istat.it/).
The main findings which support this line of development are:
• Large, complex administrative and statistical databases contain valuable information which can be mined using powerful methodologies;
• Statisticians possess knowledge on how to deal with their tasks, but this knowledge cannot always be expressed by precise rules.
In order to estimate missing values, relations between similar respondents are relevant. Mining the Intrastat database with neural networks (NNs) shows this to be a rational option that could provide a solution: NNs find patterns and relations between similar respondents. In this way, items can be estimated as long as enough data is available from other respondents.
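As a purely illustrative sketch of this idea (not the Blue-ETS implementation), a small neural network can be trained on respondents with complete records and then used to estimate the missing item for a similar respondent. The data layout and variable names below are assumptions.

# Illustrative neural-network imputation; the data layout and variable names
# are assumptions, not the Blue-ETS / Intrastat implementation.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical respondent data: turnover, employees, exports, reported trade value.
complete = np.array([[120.0, 15, 30.0, 42.0],
                     [300.0, 40, 80.0, 110.0],
                     [ 90.0, 10, 20.0, 31.0],
                     [250.0, 35, 70.0, 95.0]])
incomplete_features = np.array([[110.0, 12, 25.0]])   # trade value missing

X, y = complete[:, :-1], complete[:, -1]

# Train on respondents with complete records, then estimate the missing item
# from the patterns learned over similar respondents.
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

print("Imputed trade value:", model.predict(incomplete_features)[0])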
Fuzzy rules expressed by linguistic terms and
quantifiers reveal levels of similarity between
imputed and surveyed values.
Similar techniques are also promising for dissemination. People prefer to use expressions of natural language when searching for useful data and information, for example: select regions where most municipalities have a low altitude above sea level. The result is a set of entities ranked according to their degree of match to the query condition.
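A minimal sketch of how such a flexible query could be evaluated is shown below. The membership function for "low altitude", the interpretation of the quantifier "most", and the sample data are all assumptions chosen purely for illustration.

# Illustrative fuzzy "flexible query": rank regions by the degree to which
# "most municipalities have a low altitude". Thresholds and data are assumptions.
def low_altitude(metres: float) -> float:
    """Membership of 'low altitude': 1 below 200 m, 0 above 600 m, linear in between."""
    if metres <= 200:
        return 1.0
    if metres >= 600:
        return 0.0
    return (600 - metres) / 400

def most(proportion: float) -> float:
    """Fuzzy quantifier 'most': 0 below 50 %, 1 above 90 %, linear in between."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.4))

# Hypothetical regions with the altitudes (in metres) of their municipalities.
regions = {
    "Region A": [150, 180, 220, 300],
    "Region B": [450, 520, 610, 700],
    "Region C": [100, 120, 90, 560],
}

# Degree of match: apply the quantifier to the mean membership over the municipalities.
ranking = {
    name: most(sum(low_altitude(a) for a in altitudes) / len(altitudes))
    for name, altitudes in regions.items()
}

for name, degree in sorted(ranking.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: degree of match {degree:.2f}")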
Modernising the first and the last stages of data collection could create a chain reaction of improvements in data quality. Better data dissemination (through flexible queries) could motivate respondents to provide their data in a more timely and accurate way, and reduce the frequency of missing values, implying more efficient imputation (fewer missing values and more powerful neural networks).
Relevant equations, models and experimental tools have been created in order to evaluate the pros and cons. The next step is the creation of fully functional tools and their adaptation to particular needs.
What does Big data mean
for official statistics?
A new paper prepared by leading international
experts has recently been released by the High-
Level Group for the Modernisation of Statistical
Production and Services:
http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=77170614
Tools for a Sprint
Carlo Vaccari (a sprinter)
In Ottawa, from 8 to 13 April 2013, we had a Sprint for the Plug & Play Architecture Project. People from Australia, Canada, Eurostat, Italy, Mexico, the Netherlands, New Zealand, Sweden, UNECE and the United Kingdom met to start defining a "common statistical production architecture for the world's official statistical industry", as stated by the High-Level Group for the Modernisation of Statistical Production and Services (see http://www1.unece.org/stat/platform/display/hlgbas).
The objective of the sprint session was to ensure agreement on key principles regarding the architectural blueprint for building an interoperable statistical industry platform.
We will discuss the documents produced by this meeting over the next few weeks. Here I just want to show you which tools were used in the Sprint.
Paper, a lot of paper
We wrote a lot on sheets, flip charts and post-its of every color and shape. Paper was used to explain, show, collect, store, group and debate ideas.
White-boards
We used many white-boards, writing on them with markers of every type; often we used cameras or mobiles to take a picture of what was written, to be able to transfer the concepts to digital files (we would love smart boards like: www.youtube.com/watch?v=NZNTgglPbUA).
Mixed
And yes, we used paper and boards together, very useful when you
want to group concepts and keep track of what was done.
Wiki
We inserted documents, presentations, images, discussions, a glossary and so on into the UNECE wiki.
Mind Maps
We often used mind maps to capture brainstorming and discussions: one of the best ways to avoid losing ideas and to summarise what has been said in lengthy discussions.
Presentations
Presentation software was used not only to prepare slides for presenting, but also to draw and develop schemas. Using notes and colors, presentation software was then used as a kind of digital dashboard.
Lego bricks
Each participant received three Lego bricks of different colors from our wonderful facilitators. We had to raise them to indicate, respectively: "I want to speak", "Off topic" and "Too much detail". A very simple way to make participants follow the rules for an efficient exchange of views.
Lollies
Each participant brought sweets ("lollies" in Australian English) from their country, to share with partners. Biscuits, chocolates and sweets of all kinds were the fuel that provided energy to tired brains.
Understanding "Plug and Play"
Marton Vucsan (Statistics Netherlands)
I have high hopes for the Common Statistical Production Architecture (CSPA) Project, commonly known under the more profane title of Plug and Play. Although it sounds easy, it may prove to be hard, very hard. From what I hear, "plug and play" has many interpretations. Some point in the direction of the feared "mother of all systems" projects that never work. Understanding what is really meant by the current CSPA project is important, because CSPA is something completely different from the big, feared projects of the past.
CSPA is about reducing complexity and achieving operating-system independence. It is also about sharing and reducing our efforts while still getting what we need. To achieve this we have to realize that our means of production are composed of different levels of abstraction. There are the methods, the process descriptions and finally the applications (I am deliberately keeping it simple here). Normally, to arrive at a statistical output we describe a method, create a process and build an application. All three are normally monolithic in nature and custom made. Unshareable, unreusable, expensive, complex. The stuff called "legacy apps", the stuff we should stop making.
CSPA starts with the insight that splitting things up reduces complexity. The GSBPM does this: it splits up the statistical process into easy-to-understand sub-processes. Thinking and building in sub-systems reduces complexity and increases reliability. Many programmers struggle with this, trying to split up a given solution into meaningful parts and often failing. In hindsight, the reason for that failure is obvious: the reduction in complexity has to be done at a much higher level. The complexity is often in the methods and in the way the processes were thought up, independently of the IT implementation. If we really want to reduce complexity, that is our point of attack: the level where we specify our statistical recipe.
As a statistical community, we seem to agree that statistical outputs can be produced by processes composed of GSBPM sub-processes. With the right compromises we will be able to use these sub-processes, or components, across a broad range of statistics and agencies, like the engines on a plane or the motor management system in your car. Just as a car and a plane are, conceptually, collections of functional sub-systems, we need to understand that a statistical production system is a collection of functional sub-processes.
Once we are able to think of our processes as assemblies of components, we can reuse or exchange them. Of course it is not that simple, but there are powerful forces at work to make that happen. Look at what happened in other industries: when a component, say a motor management system, is available, most designs gravitate towards using that component because it is much cheaper than rolling your own.
A component can be manufactured separately from the system it will be used in. Rolls-Royce doesn't need planes to manufacture engines. The key is to do it at the right conceptual level. Others have done it (look at your phone); we can do it too!
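To make the "assembly of components" idea more concrete, here is a minimal sketch assuming a common interface behind which components can be swapped. It illustrates the principle only, not CSPA itself; the component names and the GSBPM labels in the comments are assumptions.

# Minimal sketch of composing a statistical production process from
# interchangeable components behind a common interface. The component names
# and GSBPM sub-process labels are illustrative assumptions, not CSPA.
from typing import Callable, List

# A "statistical service": any callable that takes a dataset and returns a dataset.
Dataset = List[dict]
Service = Callable[[Dataset], Dataset]

def edit_and_impute(data: Dataset) -> Dataset:
    """Hypothetical GSBPM 5.4-style component: fill missing values with a simple rule."""
    return [{**row, "value": row.get("value", 0.0)} for row in data]

def aggregate(data: Dataset) -> Dataset:
    """Hypothetical GSBPM 5.7-style component: calculate an aggregate from microdata."""
    return [{"total": sum(row["value"] for row in data)}]

def run_pipeline(data: Dataset, services: List[Service]) -> Dataset:
    """Compose a production process from components sharing the same interface."""
    for service in services:
        data = service(data)
    return data

if __name__ == "__main__":
    raw = [{"unit": "A", "value": 10.0}, {"unit": "B"}]   # unit B has a missing value
    print(run_pipeline(raw, [edit_and_impute, aggregate]))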
Many statistical organisations are modernising using Enterprise Architecture to underpin their vision and change strategy. This enables them to develop statistical services in a standard way. Enterprise architecture creates an environment that can change with, and support, business goals. It shows what the business needs are and where the organisation wants to be, and ensures that the IT strategy aligns with this. It helps to remove silos, improves collaboration and ensures that technology is aligned with business needs.
In parallel, the High Level Group for the
Modernization of Statistical Production and
Services (HLG) is developing the CSPA. This will
be a generic architecture for statistical production,
and will serve as an industry architecture for
official statistics. Adopting a common architecture
will make it easier for organisations to standardise
and combine the components of statistical
production, regardless of where the statistical
services are built.
The CSPA also provides a starting point for the concerted development of statistical infrastructure and shared investment across statistical organisations.
Version 0.1 of the CSPA documentation has
just been released for public comment at:
www1.unece.org/stat/platform/x/_ISwB
Your feedback is welcome!