1.
Open Statistics
Open Belgium 6 March 2017
Statistics Belgium
Lucia Decuyper
Youri Baeyens
2.
Open Statistics – Agenda
Statistics Belgium => Open Data
• Statistics Belgium
• Open Data Start
• Statbel Open Data Portal
• Statistics Belgium in the EU
Open Data => Linked Open Data
• 5*****?
• RDF
• LOD
• Semantic Web
• Ontologies for statisticians
• LOD in the NSIs
• RDF@statbel
Questions
Contact
3.
Open Statistics – Statistics Belgium 1
Statistics Belgium ?
– National Statistical Institute (formerly NIS)
• largest producer of official statistics in Belgium
What do we do?
– Collect data: administrative sources (registers) or surveys
– Process and analyse data:
• common methodology, definitions (national, European)
– Publish data
• => +/- 400 releases per year on Statbel
4.
Open Statistics – Statistics Belgium 2
One of the core tasks consists in making all produced statistics
available to everyone (European Statistics Code of Practice)
– Website Statbel since 1997
– Free (re-)use => cite the source
– ‘open by default’
+/-100 statistics
– The main fields covered are population, society, work,
economy, real estate, construction, mobility and transport.
– Census
5.
Open Statistics – Open Data Start?
Why?
• 2nd PSI Directive
• Belgian Federal Open Data strategy 2015
• Digital agenda (EU)
• Eurostat => EU Open Data Portal
• Crossroads Bank for Enterprises (KBO) company register
• Users
Benefits
6.
Open Statistics – Statbel Open Data Portal 1
Open Data Portal on the Statbel website since Q4 2015 : www.statbel.fgov.be/opendata
– Population & Census
– Labour market &
living conditions
• Fiscal statistics
on income
– Environment
– Prices
• CPI
– Tools
• Geography
• Codes and Classifications
7.
Open Statistics – Statbel Open Data Portal 2
+/- 110 datasets
Formats
• XLSX => Excel pivot tables
• CSV, TXT => R, SAS, …, PostgreSQL
• GML, SHP => QGIS, ArcGIS, …
• JSON, XML, CSV, XLSX => be.STAT, the dynamic databank of Statbel
Special care
– Privacy
– Continuity
Goal : 1 new dataset/month
– Next : population, households, real estate
8.
Open Statistics – Statistics Belgium in the EU
European Statistical System = Eurostat + NSIs
– Key provider of public open data
– Draft Open Data Strategy (Feb 2017)
Statistics Belgium
• Statbel.fgov.be/opendata
Eurostat
• Key contributor to the open data portals
EU Open Data Portal
• Data.europa.eu/euodp
Belgium
• Data.gov.be
European Data Portal
• www.europeandataportal.eu
(Diagram: the portals exchange metadata through metadata harvesting.)
9.
Open Statistics – 5***** ?
Statistics Belgium => Open Data
Statbel: Current situation
Statbel: Ambition
10.
Open Statistics – RDF
Resource description framework (RDF)
11.
Open Statistics – RDF - Uniform resource identifier URI
Use URIs to identify things, so that people can point at your
stuff
– A URI identifies a concept.
– Example of a URI for the Rixensart commune:
http://vocab.belgif.be/refnis/25091#id
– In general, a URI is associated with a web page that documents the
concept. For Rixensart:
http://vocab.belgif.be/refnis/25091
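The relation between the two addresses above can be sketched in a few lines of Python. This assumes the usual "hash URI" pattern, in which a client drops the part after `#` before requesting the page; whether the server behind vocab.belgif.be behaves exactly this way is an assumption, not something the slides state.

```python
from urllib.parse import urldefrag

# URI that identifies the concept "Rixensart" (taken from the slide).
concept_uri = "http://vocab.belgif.be/refnis/25091#id"

# urldefrag splits a URI into the part before '#' and the fragment after it.
document_url, fragment = urldefrag(concept_uri)

print(document_url)  # http://vocab.belgif.be/refnis/25091
print(fragment)      # id
```

The concept and the page that documents it therefore get two distinct identifiers while sharing one address on the web.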
12.
Open Statistics – Resource description framework (RDF)
RDF files store triples of the form “subject-predicate-object”.
In RDF files,
– subjects are URIs.
– predicates are URIs.
– objects are URIs or literals.
Example (nomenclature):
<http://vocab.belgif.be/refnis/25091#id>
<http://www.w3.org/2004/02/skos/core#prefLabel> "Rixensart"@fr .
There are "standard vocabularies" (rules for forming triples). SKOS is one
of them.
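As a rough illustration of the subject-predicate-object model, the sketch below stores the Rixensart example as a plain Python structure and prints it as one N-Triples line. The `Triple` and `Literal` classes and the `to_ntriples` helper are hypothetical names made up for this example; a real project would more likely rely on an RDF library.

```python
from typing import NamedTuple, Optional, Union


class Literal(NamedTuple):
    """A plain text value, optionally tagged with a language."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    subject: str               # always a URI
    predicate: str             # always a URI
    obj: Union[str, Literal]   # a URI (str) or a Literal


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if isinstance(triple.obj, Literal):
        # Escape backslashes and double quotes inside the literal.
        escaped = triple.obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
        if triple.obj.lang:
            obj += f"@{triple.obj.lang}"
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


rixensart_label = Triple(
    "http://vocab.belgif.be/refnis/25091#id",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    Literal("Rixensart", "fr"),
)
line = to_ntriples(rixensart_label)
print(line)
```

The printed line matches the nomenclature example on the slide, with the two URIs in angle brackets and the literal in quotes.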
13.
Open Statistics – Resource description framework (RDF)
It’s possible to use "prefixes" to "abbreviate" URIs in RDF files
Example:
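# In Turtle, URIs in prefix declarations go between < >, and a '#' inside a prefixed name is escaped as '\#'.
@prefix refnis: <http://vocab.belgif.be/refnis/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
refnis:25091\#id skos:prefLabel "Rixensart"@fr .
refnis:25091\#id skos:broader refnis:25000\#id .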
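What a prefix does can be shown with a small Python sketch: the abbreviated name is split at the first colon and the prefix is replaced by the URI it stands for. The `PREFIXES` table and the `expand` function are illustrative names, not part of any Statbel tooling.

```python
# Prefix table copied from the slide.
PREFIXES = {
    "refnis": "http://vocab.belgif.be/refnis/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Turn an abbreviated name such as 'skos:prefLabel' into a full URI."""
    prefix, colon, local = name.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("refnis:25091#id"))  # http://vocab.belgif.be/refnis/25091#id
print(expand("skos:prefLabel"))   # http://www.w3.org/2004/02/skos/core#prefLabel
```

The abbreviated and the full form denote exactly the same resource; prefixes only make the file shorter and easier to read.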
14.
Open Statistics – Resource description framework (RDF)
Sample RDF file to describe a study (metadata):
– ddi:Study_1 a disco:Study.
– ddi:Study_1 dcterms:title "National Population and Housing Census, 1980"@en.
– ddi:Study_1 dcterms:identifier "ARG_1980_PHC_v01_A_IPUMS" .
This description uses the DDI-RDF vocabulary (Disco):
– DDI-RDF is “A vocabulary for publishing metadata about data sets
(research and survey data) into the Web of Linked Data”
– Described here : http://rdf-vocabulary.ddialliance.org/discovery.html
15.
Open Statistics – Resource description framework (RDF)
RDF = forming triples
There are several syntaxes to write them
– Turtle,
– N-Triples,
– RDF/XML,
– …
16.
Open Statistics – Linked Open Data (LOD)
Linked Open Data (LOD)
17.
Open Statistics – Linked Open Data (LOD)
It’s possible to link several RDF sources. This is referred to as Linked
Open Data (LOD).
Examples of LOD sites to link to:
– DBpedia
– Wikidata
– GeoNames
A simple way to link to another DB is to re-use its URIs
18.
Open Statistics – Linked Open Data (LOD)
Example of LOD (nomenclature):
# In Turtle, URIs in prefix declarations go between < >, and a '#' inside a prefixed name is escaped as '\#'.
@prefix refnis: <http://vocab.belgif.be/refnis/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
refnis:25091\#id skos:prefLabel "Rixensart"@fr .
refnis:25091\#id skos:broader refnis:25000\#id .
refnis:25091\#id skos:exactMatch <http://sws.geonames.org/2787990> .
refnis:25091\#id skos:exactMatch <http://www.wikidata.org/entity/Q630478> .
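To make the linking idea concrete, here is a minimal Python sketch that keeps the triples of this example in memory and looks up the `skos:exactMatch` links of Rixensart. The list-of-tuples representation and the `objects` helper are simplifications chosen for illustration; nothing is fetched from GeoNames or Wikidata.

```python
from urllib.parse import urlparse

SKOS = "http://www.w3.org/2004/02/skos/core#"
RIXENSART = "http://vocab.belgif.be/refnis/25091#id"

# The triples from the slide, written with full URIs.
TRIPLES = [
    (RIXENSART, SKOS + "prefLabel", "Rixensart"),
    (RIXENSART, SKOS + "broader", "http://vocab.belgif.be/refnis/25000#id"),
    (RIXENSART, SKOS + "exactMatch", "http://sws.geonames.org/2787990"),
    (RIXENSART, SKOS + "exactMatch", "http://www.wikidata.org/entity/Q630478"),
]


def objects(triples, subject, predicate):
    """Return every object attached to a subject through a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


links = objects(TRIPLES, RIXENSART, SKOS + "exactMatch")
hosts = [urlparse(link).netloc for link in links]

print(links)
print(hosts)  # ['sws.geonames.org', 'www.wikidata.org']
```

Each link points at a URI managed by another organisation, which is what turns a stand-alone nomenclature into Linked Open Data.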
20.
Open Statistics – Semantic web
All the "subject-predicate-object" sentences of the different LODs
form a giant "knowledge graph" whose size increases rapidly
22.
Open Statistics – Ontologies for statisticians
Standard vocabularies
23.
Open Statistics – Standard vocabularies
Classifications
– SKOS: Classifications (nomenclatures)
– XKOS: SKOS extension (for NACE, …)
Document a list of files (catalog)
– DCAT
– StatDCAT-AP
– GeoDCAT-AP
24.
Open Statistics – Standard vocabularies
Metadata:
– Dublin Core
– DDI-RDF
Data:
– RDF Data Cube Vocabulary
25.
Open Statistics – Standard vocabularies
Other interesting vocabularies recommended by Eurostat
– The Organization Ontology
– The PROV ontology
– Time Ontology in OWL
– Dublin Core
– ISA Core Vocabularies in RDF (Person, Public Organisation,
Business, Public Service, Location)
– Vocabulary of Interlinked Datasets (VoID)
26.
Open Statistics – Nomenclatures
Some nomenclatures, "controlled vocabularies" & thesauri
recommended by Eurostat:
– INSPIRE code lists
– EuroVoc thesaurus
– Named Authority Lists (NAL)
27.
Open Statistics – LOD IN THE NSIs
Some NSIs already have LOD:
– Insee: Some code tables + legal population
– Istat
– ONS + Geoportal UK
– Census 2011 in Ireland
28.
Open Statistics – RDF@Statbel
What to publish as LOD?
Priorities for publication as LOD:
– Nomenclatures (create URIs for NACEBEL, REFNIS, … +
create files that expose hierarchies, …)
– Catalog of the data (to let the ‘machines’ all over the world
know that our datasets are available in CSV, …)
– Metadata
– A selection of datasets (For example: legal population of
municipalities)
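A catalog entry of the kind listed above could look roughly like the output of the following Python sketch, which describes one dataset and its CSV distribution with DCAT and Dublin Core terms and prints them as N-Triples. The `example.org` URIs are hypothetical placeholders, not real Statbel identifiers, and the choice of properties is only one possible minimal description.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Hypothetical identifiers, used for illustration only.
dataset = "http://example.org/dataset/legal-population"
distribution = "http://example.org/dataset/legal-population/csv"


def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def text(value: str, lang: str) -> str:
    """Format a language-tagged literal as an N-Triples term."""
    return f'"{value}"@{lang}'


statements = [
    (dataset, RDF_TYPE, uri(DCAT + "Dataset")),
    (dataset, DCT + "title", text("Legal population of municipalities", "en")),
    (dataset, DCAT + "distribution", uri(distribution)),
    (distribution, RDF_TYPE, uri(DCAT + "Distribution")),
    # The media type tells a harvester that this distribution is a CSV file.
    (distribution, DCAT + "mediaType",
     uri("https://www.iana.org/assignments/media-types/text/csv")),
]

lines = [f"<{s}> <{p}> {o} ." for s, p, o in statements]
print("\n".join(lines))
```

A harvester reading these five statements learns that a dataset exists, what it is called and that it can be obtained as CSV, without downloading the data itself.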