Analyzing Statistics
with Background Knowledge
from Linked Open Data

10/22/13 Petar Ristoski, Heiko Paulheim
Background knowledge from Linked Open Data sources, such as DBpedia, Eurostat, and GADM, can be used to create both interpretations and advanced visualizations of statistical data. In this paper, we discuss methods of linking statistical data to Linked Open Data sources and the use of the Explain-a-LOD toolkit. The paper further shows exemplary findings and visualizations created by combining the statistics datasets with Linked Open Data.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,203
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Analyzing Statistics with Background Knowledge from Linked Open Data

  1. Analyzing Statistics with Background Knowledge from Linked Open Data
     10/22/13, Petar Ristoski, Heiko Paulheim
  2. Idea
     • Background knowledge from LOD can help
       – finding explanations
       – creating more sophisticated visualizations
     • Steps taken
       – linking the statistics datasets to LOD datasets
         • DBpedia, Eurostat, GADM, Linked Geo Data
       – extracting features
     • Finding correlations with unemployment rate
       – using only one target variable for demonstration purposes
       – works for arbitrary target variables
  3. Linking to LOD Datasets
     • Linking to DBpedia
       – using DBpedia Lookup
       – restricting results to Place and AdministrativeArea
       – selecting from many results by minimum edit distance
     • Linking to Eurostat
       – using SPARQL to query for labels
       – querying for word 1-grams, 2-grams, … from original labels
       – selecting by minimum edit distance
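The selection step on this slide (build word n-grams from the original label, collect candidate labels, keep the candidate with minimum edit distance) might look roughly like the sketch below. It is not the authors' implementation: the names `levenshtein`, `word_ngrams`, and `best_match` are hypothetical, the candidate labels are assumed to have been fetched already (for example from DBpedia Lookup or a SPARQL label query), and the case-insensitive comparison is an assumption.

```python
from typing import List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_ngrams(label: str) -> List[str]:
    """All contiguous word 1-grams, 2-grams, ... of a label, shortest first."""
    words = label.split()
    return [" ".join(words[i:i + n])
            for n in range(1, len(words) + 1)
            for i in range(len(words) - n + 1)]


def best_match(label: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the candidate closest to `label` (assumption: case is ignored)."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: levenshtein(label.lower(), c.lower()))
```

In such a setup, each n-gram from `word_ngrams` would serve as a query string, and `best_match` would be applied to the union of the returned labels.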
  4. Linking to LOD Datasets
     • Linking to GADM
       – searching by name turned out to be error-prone
       – searching by coordinates (from DBpedia) is precise
         • but suffers from low recall
       – two-stage approach:
         • searching by coordinates
         • searching by average coordinates of all linked objects (in DBpedia)
     • Total figures:
       – France regions: 27/27 DBpedia, 26/27 Eurostat, 27/27 GADM
       – France departments: 101/101 DBpedia, 101/101 GADM
       – Australia states: 8/9 DBpedia, 9/9 GADM
       – Australia SA3/SA4: no satisfying results, discarded
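One possible reading of the two-stage approach is sketched below: try the entity's own coordinates first, and fall back to the average coordinates of its linked objects when that yields nothing. Everything here is an assumption made for illustration: the names `mean_coordinate` and `link_by_coordinates` are hypothetical, the `lookup` callable stands in for whatever coordinate search is run against GADM, and the plain arithmetic mean of latitude and longitude is a simplification that misbehaves near the antimeridian.

```python
from typing import Callable, Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def mean_coordinate(points: Sequence[Coord]) -> Coord:
    """Arithmetic mean of latitudes and longitudes (naive, see lead-in)."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)


def link_by_coordinates(
    own: Optional[Coord],
    linked: Sequence[Coord],
    lookup: Callable[[Coord], Optional[str]],
) -> Optional[str]:
    """Stage 1: search with the entity's own coordinates.
    Stage 2: search with the mean coordinates of its linked objects.
    `lookup` is a hypothetical stand-in for the GADM coordinate search."""
    if own is not None:
        hit = lookup(own)
        if hit is not None:
            return hit
    if linked:
        return lookup(mean_coordinate(linked))
    return None
```

Passing the search as a callable keeps the sketch independent of any particular endpoint, so it can be exercised with a toy function before wiring in a real query.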
  5. Feature Extraction
     • Once the links have been created
       – get polygon shapes from GADM (for visualization)
       – get datatype properties from Eurostat/DBpedia
       – get direct types from DBpedia (incl. YAGO types)
       – get qualified relations from DBpedia
     • Using information from Linked Geo Data
       – extract objects within GADM polygon, aggregate by type (e.g., region contains 125 police stations)
       – spatial queries only possible with rectangles
       – workaround: use minimum enclosing rectangle and filter afterwards
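The rectangle workaround on this slide can be illustrated with a small in-memory sketch: compute the minimum enclosing rectangle of the polygon, keep only objects inside that rectangle (the part a rectangle-only spatial query would do), then filter with an exact point-in-polygon test and aggregate by type. The function names are hypothetical, the ray-casting test is a standard technique chosen here rather than something the slides specify, and coordinates are treated as planar (x, y) pairs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum enclosing axis-aligned rectangle: (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(pt: Point, box: Tuple[float, float, float, float]) -> bool:
    x, y = pt
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count edge crossings of a horizontal ray starting at pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_type(objects: Iterable[Tuple[str, Point]],
                  polygon: List[Point]) -> Counter:
    """Rectangle pre-filter, exact polygon filter, then aggregation by type."""
    box = bounding_box(polygon)
    candidates = [(t, p) for t, p in objects if in_box(p, box)]
    return Counter(t for t, p in candidates if point_in_polygon(p, polygon))
```

In the workflow described on the slide, the rectangle step would presumably be the query sent to Linked Geo Data, with only the polygon filter running locally.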
  6. Visualization with GADM Polygons
     • Polygons from GADM allow for visualization of unemployment on maps
  7. Finding Correlations
     • Using extracted features to find interesting correlations
     • Example correlations for unemployment in France:
       – African islands, Islands in the Indian Ocean, Outermost regions of the EU (positive)
       – GDP (negative)
       – Disposable income (negative)
       – Hospital beds/inhabitants (negative)
       – R&D spending (negative)
       – Energy consumption (negative)
       – Population growth (positive)
       – Casualties in traffic accidents (negative)
       – Fast food restaurants (positive)
       – Police stations (positive)
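Ranking extracted features by how strongly they correlate with a target variable could be sketched as below. The slides do not say which correlation measure is used, so Pearson's r is an assumption here, and `pearson` and `rank_features` are hypothetical names. Features with zero variance are assigned a correlation of 0.0 as a convention of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(features: Dict[str, Sequence[float]],
                  target: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort features by absolute correlation with the target, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

The sign of each coefficient corresponds to the (positive) and (negative) annotations on the slide. As the closing slide points out, a strong correlation found this way does not imply causation.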
  8. Visualizing Correlations
     • e.g., unemployment rate ~ number of police stations
  9. Tools
     • FeGeLOD/Explain-a-LOD (ESWC 2012: best demo award)
  10. Tools
     • RapidMiner Linked Open Data Extension (2013)
  11. Assets
     • New modules for RapidMiner LOD extension
       – DBpedia Lookup linker
       – Label-based linker
     • New links for DBpedia 3.9 release
       – DBpedia – GADM (39,000 links)
  12. Conclusions & Lessons Learned
     • Linked Open Data provides useful background knowledge
       – for finding explanations
       – for creating visualizations
     • Some data sources are more suitable than others
       – official data sources (e.g., Eurostat) provide the best results
  13. Conclusions & Lessons Learned
     • Negative correlation: traffic accident casualties ~ unemployment
       – Fight unemployment by increasing traffic accidents? http://xkcd.com/552/
  14. Analyzing Statistics with Background Knowledge from Linked Open Data
