Head of Division, Dissemination 29 July 2008
IAOS, Shanghai 14-16 October 2008
From Data Access to Information Integration
Statistics Denmark continuously focuses on user needs and carries out
user surveys annually. Different aspects of quality are addressed,
including technical functionality, accessibility, comprehensibility
and contents.
Since the mid-1980s statistical data have been made accessible via on-line
databases. Through personal meetings with users, seminars, a hotline,
training courses and user surveys Statistics Denmark has kept quite close
contact with the users and has been exposed to their needs. Although
different users have different needs, it is possible to point out a few
generally expressed wishes.
1980s  Focus on:  Electronic on-line access
                  Content, cross-domain coverage and details
1990s  Focus on:  Internet access
2000s  Focus on:  Presentation, layout
                  Long time span
2008   Focus on:  Response time
                  Search and browse
                  Visualisation, maps and graphics
                  Integration with own systems
The above list is not complete. However, in our contact with users these
were some of the more frequently expressed needs for improvements. It
seems quite evident that fulfilling user needs does not necessarily result
in completely satisfied users. More likely it will result in additional user
The possibility to get on-line access through public data networks to the very
latest published statistics was a revolution some 25 years ago and solved
many problems for our professional users at that time. Today the public
Internet databank, StatBank (www.statbank.dk), contains more than 2,000
large multidimensional tables covering all 17 subject areas in our statistical
production. The geographical dimension, where available, goes down to
municipality and region.
With the launch of the StatBank, access spread to a much larger group of users.
From statistic users to
Gradually the typical Internet user – who differed a lot from the traditional
statistics user – also became aware of the possibilities and treasures of
statistical information. The number of annual table retrievals from the
databank rocketed from 20,000 to 2,000,000.
To support this new generation of users we, as a statistical organisation,
have to present statistics in a comprehensible and appealing way.
Statistics Denmark has the advantage of compiling statistics from all public
administrative registers. Data from different registers can, due to unique
identifiers, be linked together. This gives the possibility of producing a large
variety of information down to detailed geographical levels. However, the
issue of confidentiality is very important and has of course to be taken into
account when publishing. The amount of available statistics is growing
enormously - partly as a result of the huge possibilities of combining data
from the statistical registers.
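As a minimal sketch of the register linkage described above, the following Python example joins two hypothetical registers on a shared person identifier and counts unemployed persons per municipality, withholding small cells. The register contents, the field names and the threshold of 3 are illustrative assumptions only; they are not Statistics Denmark's actual data or disclosure rules.

```python
from collections import Counter

# Hypothetical register extracts, both keyed by a unique person identifier.
population_register = {
    "P1": {"municipality": "Aarhus"},
    "P2": {"municipality": "Aarhus"},
    "P3": {"municipality": "Aarhus"},
    "P4": {"municipality": "Odense"},
}
employment_register = {
    "P1": {"status": "unemployed"},
    "P2": {"status": "unemployed"},
    "P3": {"status": "unemployed"},
    "P4": {"status": "unemployed"},
}

MIN_CELL_SIZE = 3  # assumed threshold, for illustration only


def unemployed_by_municipality(population, employment, min_cell=MIN_CELL_SIZE):
    """Link the two registers on the identifier and count per municipality."""
    counts = Counter()
    for person_id, person in population.items():
        record = employment.get(person_id)  # link via the shared identifier
        if record and record["status"] == "unemployed":
            counts[person["municipality"]] += 1
    # Cells below the threshold are replaced by None so they are not published.
    return {area: (n if n >= min_cell else None) for area, n in counts.items()}


table = unemployed_by_municipality(population_register, employment_register)
```

With the sample data, the Aarhus cell is kept and the single-person Odense cell is withheld, which illustrates why detailed geographical breakdowns and confidentiality have to be handled together.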
Organising the data in a good and logical structure becomes essential.
During the past few years user satisfaction surveys have shown a need for
enhanced documentation and an effective search engine.
The latest survey of the StatBank was carried out in December 2007. The
response rate was 35%, which is remarkably high when compared to many
other web based user surveys.
The five functions and possibilities that users evaluated highest as well as
the five that got the lowest score are listed below.
Five highest scores                      Five lowest scores
(Satisfied and Very satisfied, n=755)
The StatBank in general   94 %           Finding the right table            30 %
Response time             94 %           Help                               29 %
Content                   93 %           Documentation related to a table   23 %
Download possibilities    92 %           Degree of detail                   18 %
Presentation in charts    89 %           Presentation on maps               15 %
Different levels of experience in using statistics are reflected in different
needs for documentation. Professional and recurrent users have had
access to the relevant documentation for some years, as every
statistic is linked to a quality description giving information on source,
contents, time, accuracy, comparability and accessibility.
The less advanced users have – more or less – been left on their own. They
will now be prioritised and their situation will be improved in the year to
A project linking “down-to-earth” definitions of concepts with the data in
the StatBank was inaugurated in 2007. Our investigations show that
recurrent users have fewer problems interpreting the statistics and finding
the needed information. They manage to browse through the structure of
subject areas and they are familiar with the terms used in the statistical
area. Moreover they study the Quality declarations connected to the
statistics, for instance information on reliability, sample size, breaks over
What the users still ask for are definitions of concepts that help clarify to
the not-so-frequent users, which information to select and how to interpret
The challenge is three-fold:
1. selecting the concepts we need to define
2. defining the concepts so that they become meaningful to the general user
3. making the definitions accessible
Selection of concepts
The concepts we are referring to are all concepts that describe the tables:
variable names, values in a list connected to a variable, content of the table.
To give a couple of examples:
− Table 1: Unemployed by ancestry, age and sex.
− Table 2: Immigrated by country of origin, citizenship, age and sex
Statistics Denmark has chosen a pragmatic solution, deciding that not all
concepts need to be defined. For instance, age and sex will not be defined in
this system (while they will be documented in the metadata system
connected to micro data and statistical registers).
The selection task is in the hands of a librarian who is not “grown up” in
statistics but who has profound experience meeting users’ questions and
Listed in scheme 1 are some examples of concepts that are selected to be
Concept      Dimension           Dimension     Frequency
Unemployed   Ancestry            Descendants   Continuous
Immigrated   Country of origin   Western
The StatBank contains approximately 850 different dimensions and
173,000 different dimension members. The first version of the concept
definition database will contain close to 2,000 defined concepts, while the
expected size of the fully developed base will be 8,000 concept definitions.
(Values belonging to an official classification will not be defined separately.
Instead, links will be made to the classification itself.)
So far, the definitions are only in Danish. 80% of visitors to the StatBank
are using the Danish site, and we assume that the percentage among the
less experienced users is even higher. However, as the interpretation of
concepts may vary from country to country, an English translation of the
concepts is far from irrelevant and the system will be built making multiple
languages possible whenever we find the resources to fill it in.
It turned out that many concepts were already defined. Unfortunately,
the definitions existed only as glossaries in different publications, such as
the statistical yearbook. They were not in a database and thus could not be
reused electronically. However, copying and pasting from these documents gave
a good start in filling the base with relevant definitions. Definitions with
similar content but slight differences in wording were often found,
indicating that the text had been written several times and not re-used from
one publication to the other. Here there is an opportunity for
rationalisation and improved communication.
Our intention is to turn the working process around: storing the concepts
and definitions in a database, connecting the definitions of the concepts to
the databank tables, and re-using them in publications and in a glossary on
the web site.
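The intended working process can be sketched as follows: each definition is stored once, linked to the tables it describes, and rendered both as a glossary and as explanatory notes for a single table. This is a minimal in-memory illustration, not the actual data model; the class names, the table identifiers TABLE1 and TABLE2 and the placeholder definition texts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept with a single, centrally stored definition."""
    name: str
    definition: str
    table_ids: list = field(default_factory=list)  # tables the concept describes


class ConceptStore:
    """In-memory stand-in for the concept definition database."""

    def __init__(self):
        self._concepts = {}

    def add(self, concept):
        self._concepts[concept.name] = concept

    def update_definition(self, name, new_text):
        # The text is changed in one place only.
        self._concepts[name].definition = new_text

    def glossary(self):
        """All definitions, sorted by concept name, e.g. for a web glossary."""
        return [f"{c.name}: {c.definition}"
                for c in sorted(self._concepts.values(), key=lambda c: c.name)]

    def notes_for_table(self, table_id):
        """Definitions of the concepts linked to one table."""
        return [f"{c.name}: {c.definition}"
                for c in self._concepts.values() if table_id in c.table_ids]


store = ConceptStore()
store.add(Concept("Unemployed", "placeholder text A", ["TABLE1"]))
store.add(Concept("Immigrated", "placeholder text B", ["TABLE2"]))
store.update_definition("Unemployed", "revised placeholder text")
```

Because both `glossary()` and `notes_for_table()` read from the same stored record, the revised text appears in every output channel without being edited separately in each.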
Scheme 2 gives a few examples of the kind of “down-to-earth” definitions
we have chosen.
Unemployed          Out of work, but seeking work (the last week of
Descendant          Born in Denmark, and neither parent is both born in
                    Denmark and a Danish citizen.
Western countries   All 27 EU countries plus Andorra, Iceland,
                    Liechtenstein, Monaco, Norway, San Marino,
                    Switzerland, the Vatican City, Canada, USA,
                    Australia and New Zealand
Collecting information and displaying it “somewhere” on the Internet is not
good enough. This medium does not (yet) have a long-standing convention for
where users can expect to find explanations, definitions, etc. A printed
publication often contains an appendix with definitions and explanatory
notes at the back. No such convention exists when it
comes to the Internet.
The challenge is to create a solution that seems intuitive to most users.
The StatBank contains 2,000 large, multidimensional tables structured
within 17 subject areas. To some users this is overwhelming and difficult to
get an overview of. We plan to make the definitions available directly in
connection with the databank, at the stage where users are still considering
whether the table they are about to select is the right one. They need some
further information to make that choice.
A draft solution of the display function looks like this:
The idea is to use the definitions in the database wherever they are
needed: on the web site as a glossary, in publications as explanatory notes,
together with ad hoc table deliveries etc.; red i’s point to this additional
A glossary on the website is also planned to be used in
combination with a search for data. Data should be understood as
integrated: they all have the same source but are displayed at different
levels of detail. A concept used in one publication should be defined in
similar terms in another. This will be helpful to the users and it
will save resources in the statistical organisation. Definitions are composed
only once, and a change made once will be displayed everywhere.
The glossary will look something like this:
From the glossary there will be links to tables in the StatBank that use the
actual concept. This is automatically maintained through the data model
built for the concepts. At a later stage it could possibly also include links to
We expect the concept database to be useful and to help less frequent
users find the statistics they need. At the same time it is our expectation
that even the search task will be improved when the direct access to tables
is displayed together with the concept.
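One way to picture how links from concepts to tables could be maintained automatically and used in search is sketched below. The index is derived from table metadata rather than typed in by hand. The table identifiers, the metadata layout and the simple substring matching are assumptions made for illustration, not the StatBank's actual design.

```python
from collections import defaultdict

# Hypothetical table metadata: each table lists the concepts that describe it.
TABLES = {
    "TABLE1": ["Unemployed", "Ancestry"],
    "TABLE2": ["Immigrated", "Country of origin"],
    "TABLE3": ["Unemployed"],
}


def build_concept_index(tables):
    """Derive concept -> table links from the table metadata."""
    index = defaultdict(list)
    for table_id in sorted(tables):
        for concept in tables[table_id]:
            index[concept.lower()].append(table_id)
    return dict(index)


def find_tables(query, index):
    """Return matching concepts together with the tables that use them."""
    needle = query.strip().lower()
    if not needle:
        return {}
    return {concept: ids for concept, ids in index.items() if needle in concept}


index = build_concept_index(TABLES)
hits = find_tables("unemp", index)
```

Since the index is rebuilt from the metadata, a new table that uses an existing concept shows up under that concept without any manual editing of the glossary.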
Usability tests will help us amend the design solution, and user
satisfaction surveys will tell us if it is used and useful.