OpenCube Workshop at eGov2015 & ePart2015 dual conference
1. Workshop at Dual EGOV 2015 & ePart 2015 conference
2 September 2015, Thessaloniki, Greece
Create, Expand, and Exploit Linked Open Statistical Data
E. Tambouris, E. Kalampokis, K. Tarabanis
2. Data Cubes on the Web
The RDF Data Cube Vocabulary
OpenCube OLAP Browser
OpenCube Mapview
OpenCube Expander
Table of Contents
Dual EGOV2015 & ePart2015 conference
9. We need generic tools that can be reused across different datasets
and sources.
The vision of exploiting Expanded Linked Data Cubes
10. Data Cubes on the Web
The RDF Data Cube Vocabulary
OpenCube OLAP Browser
OpenCube Mapview
OpenCube Expander
Table of Contents
12. A qb:DataSet is a collection of statistical data that corresponds
to a defined structure
Data Set
qb:DataSet
13. A qb:DataStructureDefinition defines the structure of one or
more datasets. In particular, it defines the dimensions, attributes
and measures used in the dataset
Data Structure Definition
qb:DataStructureDefinition
qb:structure
14. The Data Cube vocabulary represents the dimensions, attributes and measures
as RDF properties.
Each is an instance of the abstract qb:ComponentProperty class, which in
turn has sub-classes qb:DimensionProperty, qb:AttributeProperty and
qb:MeasureProperty.
Data Structure Definition
qb:DimensionProperty
15. Each observation is represented as an instance of type
qb:Observation.
Observation
qb:Observation
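The relationship between a data structure definition and its observations can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `DataStructureDefinition` class and the `missing_components` helper are hypothetical names, observations are modelled as plain dictionaries rather than RDF, the values are toy data, and treating every declared dimension and measure as required is a simplification made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataStructureDefinition:
    """Hypothetical in-memory stand-in for a qb:DataStructureDefinition."""
    dimensions: set
    measures: set
    attributes: set = field(default_factory=set)


def missing_components(dsd, observation):
    """Return the declared dimensions and measures the observation has no value for."""
    required = dsd.dimensions | dsd.measures
    return required - set(observation)


# Component names echo the vocabulary used in this deck; the values are toy data.
dsd = DataStructureDefinition(
    dimensions={"timePeriod", "geo", "unit"},
    measures={"obsValue"},
)
complete = {"timePeriod": "2013-01-01", "geo": "A", "unit": "PC_POP", "obsValue": 1.0}
incomplete = {"timePeriod": "2013-01-01", "obsValue": 1.0}

print(sorted(missing_components(dsd, complete)))    # []
print(sorted(missing_components(dsd, incomplete)))  # ['geo', 'unit']
```

An observation that leaves a declared dimension unset cannot be placed in the cube, which is why a check of this kind is a natural first step before any aggregation.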
16. Slices allow us to group subsets of observations together.
This is not intended to represent arbitrary selections from the
observations but uniform slices through the cube in which one
or more of the dimension values are fixed.
Slice
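A slice, in the sense described on this slide, can be sketched in Python as grouping observations by the values of the dimensions that are held fixed. This is an illustration only: `slice_by` is a hypothetical helper, the observations are plain dictionaries with toy values, and nothing here reflects how a particular tool stores slices.

```python
from collections import defaultdict


def slice_by(observations, fixed_dimensions):
    """Group observations into slices; each slice fixes the given dimensions."""
    slices = defaultdict(list)
    for obs in observations:
        key = tuple(obs[d] for d in fixed_dimensions)
        slices[key].append(obs)
    return dict(slices)


# Toy observations over three dimensions and one measure.
observations = [
    {"time": "2012", "geo": "A", "sex": "F", "value": 1},
    {"time": "2013", "geo": "A", "sex": "F", "value": 2},
    {"time": "2012", "geo": "B", "sex": "F", "value": 3},
    {"time": "2013", "geo": "B", "sex": "M", "value": 4},
]

# Fix geo and sex; the remaining free dimension inside each slice is time.
slices = slice_by(observations, ["geo", "sex"])
for key, members in slices.items():
    print(key, [m["value"] for m in members])
```

Each key is one combination of fixed dimension values, so every slice is a uniform cut through the cube rather than an arbitrary selection.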
17. Eurostat – Linked Data
This is NOT an official endeavor of Eurostat
http://eurostat.linked-statistics.org
Every Eurostat data set has been transformed into an RDF data cube, described in RDF files at the following links:
Data (Observations): <http://eurostat.linked-statistics.org/data/EUROSTAT_CODE.rdf>
Data Structure Definition: <http://eurostat.linked-statistics.org/dsd/EUROSTAT_CODE.ttl>
For example:
DATA: <http://eurostat.linked-statistics.org/data/t2020_50.rdf>
DSD: <http://eurostat.linked-statistics.org/dsd/t2020_50.ttl>
20. SPARQL federated query
Get from Eurostat the percentage of people at risk of poverty and from Digital Agenda the eGov indicator per country in 2013
PREFIX …
SELECT DISTINCT (str(?label) as ?labelStr) ?poverty (str(?ucs) as ?userCentricityScore)
FROM <http://eurostat.linked-statistics.org/data/t2020_50.rdf>
FROM <http://eurostat.linked-statistics.org/dsd/t2020_50.ttl>
WHERE {
  # Eurostat: poverty observations for 2013, as a percentage of the population
  ?obs a qb:Observation ;
       sdmx-dimension:timePeriod "2013-01-01"^^xsd:date ;
       property:unit <http://eurostat.linked-statistics.org/dic/unit#PC_POP> ;
       property:geo ?country ;
       sdmx-measure:obsValue ?poverty .
  # English label of each country
  ?country skos:prefLabel ?label .
  FILTER(LANGMATCHES(LANG(?label), "EN"))
  # Digital Agenda: user-centric eGov indicator for 2013, fetched from the remote endpoint
  SERVICE <http://digital-agenda-data.eu/data/sparql> {
    ?observation a qb:Observation ;
                 dad-prop:indicator <http://semantic.digital-agenda-data.eu/codelist/indicator/user_centric_egov> ;
                 dad-prop:time-period <http://reference.data.gov.uk/id/gregorian-year/2013> ;
                 dad-prop:breakdown <http://semantic.digital-agenda-data.eu/codelist/breakdown/all_egov_le> ;
                 dad-prop:ref-area ?country ;
                 sdmx-measure:obsValue ?ucs .
  }
}
22. Data cubes on the Web
The RDF Data Cube Vocabulary
OpenCube OLAP Browser
OpenCube Mapview
Table of Contents
23. It is a proof of concept of the linked data analytics vision.
It enables OLAP operations on top of integrated views of
multiple linked data cubes.
The OpenCube OLAP browser
25. Architecture (Aggregator)
The Aggregator computes aggregations of cells across dimensions or hierarchies.
26. The Aggregator creates 2^n - 1 sub-cubes from a cube of n dimensions.
Compute aggregations across dimensions
[Figure: lattice of aggregations for a cube with the dimensions Time, Geo and Sex]
Three dimensions: Time, Geo, Sex
Two dimensions: Time and Geo; Time and Sex; Geo and Sex
One dimension: Time; Geo; Sex
No dimensions: Total
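The count of 2^n - 1 sub-cubes follows from enumerating every proper subset of the n dimensions, down to the empty set (the grand total). The following Python sketch illustrates that enumeration for the Time, Geo, Sex example. It is not the Aggregator's implementation: the function names are hypothetical, the observations are toy dictionaries, and summation is assumed as the aggregation function, which the slides do not specify.

```python
from collections import defaultdict
from itertools import combinations


def sub_cube_dimension_sets(dimensions):
    """All proper subsets of the dimensions, including the empty set (grand total)."""
    dims = list(dimensions)
    return [subset
            for size in range(len(dims))
            for subset in combinations(dims, size)]


def aggregate(observations, kept_dimensions, measure):
    """Sum the measure over every dimension that is not kept (sum is an assumption)."""
    totals = defaultdict(int)
    for obs in observations:
        key = tuple(obs[d] for d in kept_dimensions)
        totals[key] += obs[measure]
    return dict(totals)


# Toy cube with three dimensions and one measure.
observations = [
    {"time": "2012", "geo": "A", "sex": "F", "value": 1},
    {"time": "2013", "geo": "A", "sex": "F", "value": 2},
    {"time": "2012", "geo": "B", "sex": "F", "value": 3},
    {"time": "2013", "geo": "B", "sex": "M", "value": 4},
]

dimension_sets = sub_cube_dimension_sets(["time", "geo", "sex"])
print(len(dimension_sets))  # 7, i.e. 2**3 - 1

# One aggregated sub-cube per subset of dimensions.
sub_cubes = {dims: aggregate(observations, dims, "value") for dims in dimension_sets}
print(sub_cubes[("geo",)])  # {('A',): 3, ('B',): 7}
print(sub_cubes[()])        # {(): 10}
```

The original cube with all n dimensions is excluded from the enumeration, which is why the count is 2^n - 1 rather than 2^n.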
27. The Aggregator enriches an existing cube with new observations by using a hierarchy.
Compute aggregations across hierarchies
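[Figure: a cube with the dimensions Time, Geo and Sex whose Geo values are city1, city2, city3 and city4, combined with a Geo hierarchy made of country1, region1, region2 and the four cities, gives a cube whose Geo dimension also holds region1, region2 and country1.]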
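Aggregation across a hierarchy can be sketched in Python as rolling child values up to their parents and adding the results to the cube as new observations. This is an illustration under assumptions: `roll_up` is a hypothetical helper, summation is assumed as the aggregation function, and the assignment of cities to regions is invented for the example because the figure does not state it.

```python
from collections import defaultdict


def roll_up(observations, dimension, parent_of, measure):
    """Create observations one level up the hierarchy by summing child values."""
    totals = defaultdict(int)
    for obs in observations:
        parent = parent_of.get(obs[dimension])
        if parent is None:
            continue  # top of the hierarchy, nothing to roll up to
        # Values of all other dimensions identify the target cell.
        rest = tuple(sorted((d, v) for d, v in obs.items()
                            if d not in (dimension, measure)))
        totals[(parent, rest)] += obs[measure]
    return [{**dict(rest), dimension: parent, measure: total}
            for (parent, rest), total in totals.items()]


# Assumed hierarchy: which city belongs to which region is made up for this sketch.
parent_of = {
    "city1": "region1", "city2": "region1",
    "city3": "region2", "city4": "region2",
    "region1": "country1", "region2": "country1",
}
cities = [
    {"time": "2013", "geo": "city1", "value": 1},
    {"time": "2013", "geo": "city2", "value": 2},
    {"time": "2013", "geo": "city3", "value": 3},
    {"time": "2013", "geo": "city4", "value": 4},
]

regions = roll_up(cities, "geo", parent_of, "value")
countries = roll_up(regions, "geo", parent_of, "value")
expanded = cities + regions + countries  # original cube plus the new observations
print(len(expanded))  # 7
```

Applying the same step once per level keeps the sketch independent of the depth of the hierarchy.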
28. Architecture (Compatibility Explorer)
Given a cube in the local store, the Compatibility Explorer
(a) searches the Linked Data Web and identifies cubes that are compatible to expand the initial cube, and
(b) establishes typed links between the local and the compatible cubes.
29. Binary relations that link two cubes that are compatible for integration.
Operators that map these two cubes to a new expanded one.
The framework assumes that a cube can be expanded by increasing the size of one of the sets that define a cube, i.e.:
The set of measures
The set of objects of an attribute of a dimension
The set of attributes of a dimension
The set of dimensions
Theoretical Framework
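To make the idea of a compatibility relation more concrete, here is a Python sketch of two possible checks: one for a cube that would add measures and one for a cube that would add values to a dimension. The exact conditions are assumptions made for this illustration and are not taken from the framework itself; `CubeSummary`, `adds_measures` and `adds_dimension_values` are hypothetical names, and the cube contents are toy data.

```python
from dataclasses import dataclass


@dataclass
class CubeSummary:
    """Hypothetical summary of a cube: its measures and the values of each dimension."""
    measures: frozenset
    dimension_values: dict  # dimension name -> frozenset of values


def adds_measures(base, other):
    """Assumed relation: same dimension values, and `other` offers new measures."""
    return (other.dimension_values == base.dimension_values
            and bool(other.measures - base.measures))


def adds_dimension_values(base, other, dimension):
    """Assumed relation: same measures, and `other` has new values in one dimension."""
    if other.measures != base.measures:
        return False
    if other.dimension_values.keys() != base.dimension_values.keys():
        return False
    if dimension not in base.dimension_values:
        return False
    others_match = all(other.dimension_values[d] == base.dimension_values[d]
                       for d in base.dimension_values if d != dimension)
    new_values = other.dimension_values[dimension] - base.dimension_values[dimension]
    return others_match and bool(new_values)


years = frozenset({"2012", "2013"})
local = CubeSummary(frozenset({"population"}),
                    {"time": years, "geo": frozenset({"A", "B"})})
more_measures = CubeSummary(frozenset({"unemployment"}),
                            {"time": years, "geo": frozenset({"A", "B"})})
more_areas = CubeSummary(frozenset({"population"}),
                         {"time": years, "geo": frozenset({"C"})})

print(adds_measures(local, more_measures))             # True
print(adds_dimension_values(local, more_areas, "geo"))  # True
```

A result of True from either check is what a typed link between two cubes would record in this sketch; the link type tells a later merge step which of the defining sets grows.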
32. Architecture (Expander)
The Expander creates a new expanded cube by merging two compatible ones.
The Expander implements the theoretical framework.
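One way to picture the merge of two compatible cubes is the case where the second cube contributes an additional measure for the same cells. The Python sketch below illustrates only that case and is not the Expander's implementation: `expand_with_measures` is a hypothetical helper, observations are plain dictionaries, and the numbers are toy data.

```python
def expand_with_measures(base_observations, other_observations, dimensions):
    """Copy the measures of matching observations from the second cube into the first."""
    def key(obs):
        return tuple(obs[d] for d in dimensions)

    other_by_key = {key(obs): obs for obs in other_observations}
    expanded = []
    for obs in base_observations:
        match = other_by_key.get(key(obs), {})
        # Only take components the base observation does not already have.
        extra = {name: value for name, value in match.items() if name not in obs}
        expanded.append({**obs, **extra})
    return expanded


# Two toy cubes over the same dimensions, each with a different measure.
population = [
    {"time": "2013", "geo": "A", "population": 10},
    {"time": "2013", "geo": "B", "population": 20},
]
unemployment = [
    {"time": "2013", "geo": "A", "unemployment": 1},
    {"time": "2013", "geo": "B", "unemployment": 2},
]

expanded = expand_with_measures(population, unemployment, ["time", "geo"])
for obs in expanded:
    print(obs)
```

Observations are matched on their dimension values, so a cell without a counterpart in the second cube is kept unchanged rather than dropped.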
33. In our case, the Expander is integrated with the OLAP browser, which enables OLAP operations on top of integrated views of compatible cubes.
Expander
34. Architecture (OLAP Browser)
The linked data OLAP browser exploits the other components of the platform in order to enable OLAP operations on top of expanded cubes.
These may include measures, dimensions, objects, and/or attributes from multiple cubes that reside on disparate sources on the Web.
36. An instance of the developed platform has been deployed at the premises of the Flemish government.
The Flemish government had already opened up statistics by means of linked data cubes.
11 cubes had been transformed to linked data according to the QB vocabulary and stored in a Virtuoso RDF store.
Using the Aggregator, a total of 230 sub-cubes have been created.
250 links have been established from 73 cubes or (sub)cubes to other compatible (sub)cubes.
The Flemish Government
38. The user selects one of the cubes
OpenCube Browser
39. The browser starts with an empty canvas
OpenCube Browser
40. The user can change the language
OpenCube Browser
41. The user can see the dimensions of the cube
OpenCube Browser
42. The user can see the measures of the cube
OpenCube Browser
43. When the user selects at least one measure and one dimension…
OpenCube Browser
The geo dimension has 4 levels
44. When the user selects a second level in a dimension…
OpenCube Browser (Drill-down & roll-up)
2 levels have been selected
45. Keep in mind that you can select at most 2 levels
OpenCube Browser (Drill-down & roll-up)
46. OpenCube Browser (Selecting more measures & dimensions)
We set a fixed value in the other dimensions
Different colors for multiple measures
47. A green message is displayed the whole time
The user can choose to expand the cube shown in the table using data from other cubes
OpenCube Browser (Expander)
51. The RDF Data Cube Vocabulary
Linked Data Cubes on the Web
The OpenCube Toolkit
OpenCube OLAP Browser
OpenCube Mapview
Table of Contents
56. The QB vocabulary is expressive enough and fulfils the requirements for combining two cubes and performing OLAP operations.
Difficulties in combining cubes from different sources:
The QB vocabulary allows data publishers to choose application practices that are best suited to their particular situation.
As a result, different publishers follow different practices, making it difficult to produce generically applicable tools that combine data.
The second obstacle to integrating disparate cubes is the lack of standardised concept schemes and code lists.
Extend both the theoretical framework (join) and the platform (statistical analyses).
Conclusions & Challenges
57. The work presented in the paper is partly funded by
Acknowledgments
http://opencube-project.eu
@OpenCubeProject