EDF 2012 Datasets


The presentation shows which datasets have been converted to RDF and interlinked within the LATC EU project. In particular, it walks through the typical conversion process for one example dataset, the EU Financial Transparency System (FTS).


  1. Dataset Presentations. Jens Lehmann, AKSW Group, University of Leipzig. 6 June 2012. Realising and Exploiting the EU Data Cloud, European Data Forum, Copenhagen, Denmark
  2. EU-Level Dataset Development
  3. List of LATC Datasets (http://latc-project.eu/datasets/)
     ● Business: FTS (EU finance), CORDIS (EU projects, finance), Euraxess (EU jobs, companies), EURES (EU jobs), EC Competition (market overview), eSBN (eBusiness solutions)
     ● Legal: Eur-Lex (European law), N-Lex (national law), Taxation & Customs, EU Patent Office, EU Agencies, PreLex (inter-institutional law)
     ● Institutions: EuroStat (statistical data), Institution List, EU Who is Who, EU Barometer, European Election Results, European Parliament
     ● Media, UNODC (drugs & crime statistics), European Central Bank Statistics
     ● Other: Eventseer, Sciencewise
     ● Total: 22 datasets
  4. Financial Transparency System. Step 1: Analysing the Dataset
     ● The Financial Transparency System (FTS) contains information about 110,000+ EU grants
     ● Contains beneficiaries, amount of funding, year, responsible department, country etc.
     ● Covers the years 2007–2010
     ● Originally published in HTML, XML and CSV
  5. Financial Transparency System. Step 2: Modelling the Data in RDF and OWL
     ● Michael Martin, Claus Stadler, Philipp Frischmuth, Jens Lehmann: Increasing the Financial Transparency of European Commission Project Funding. Semantic Web Journal (under review)
  6. Financial Transparency System. Step 3: Converting the Dataset
     ● Java classes are generated automatically from the XML Schema
     ● XML data is accessible as Java objects → script-based transformation
     ● High flexibility for data cleansing and special cases
     ● Pipeline (from the slide diagram): XSD → JAXB → Java classes; XML → JAXB → Java objects → transformation → RDF
     ● Source code of the transformation: https://github.com/AKSW/FTS-EC-2-RDF/
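
The conversion pattern on this slide (XML records become typed objects, which a script then cleans and serialises as RDF) can be illustrated with a minimal sketch. It is written in Python rather than the Java/JAXB stack the slide names, and everything specific in it is an assumption made for illustration: the sample XML layout, the `Commitment` fields, the `http://example.org/fts/` namespace and the property names do not come from the FTS schema or the project's repository.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical sample record; the real FTS XML schema is not shown in the slides.
SAMPLE_XML = """
<commitments>
  <commitment id="c1">
    <year>2009</year>
    <beneficiary> Example University </beneficiary>
    <amount>125 000,50</amount>
  </commitment>
</commitments>
"""

BASE = "http://example.org/fts/"  # placeholder namespace, not the published one
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class Commitment:
    """Typed object standing in for a generated binding class."""
    ident: str
    year: int
    beneficiary: str
    amount: float


def parse_amount(raw: str) -> float:
    """Cleansing step: drop spaces used as thousands separators, accept a decimal comma."""
    return float(raw.replace(" ", "").replace(",", "."))


def load_commitments(xml_text: str) -> List[Commitment]:
    """Step 1: turn XML elements into typed objects."""
    root = ET.fromstring(xml_text)
    return [
        Commitment(
            ident=node.get("id"),
            year=int(node.findtext("year")),
            beneficiary=node.findtext("beneficiary").strip(),
            amount=parse_amount(node.findtext("amount")),
        )
        for node in root.iter("commitment")
    ]


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(c: Commitment) -> List[str]:
    """Step 2: serialise one object as N-Triples lines."""
    subject = f"<{BASE}commitment/{c.ident}>"
    return [
        f'{subject} <{BASE}ontology/year> "{c.year}"^^<{XSD}gYear> .',
        f'{subject} <{BASE}ontology/beneficiaryName> "{escape_literal(c.beneficiary)}" .',
        f'{subject} <{BASE}ontology/amount> "{c.amount:.2f}"^^<{XSD}decimal> .',
    ]
```

Calling `to_ntriples(load_commitments(SAMPLE_XML)[0])` yields three triples for the sample record. Keeping the cleansing logic in small functions such as `parse_amount` is one way to get the flexibility for special cases that the slide mentions.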
  7. Financial Transparency System. Step 4: Publishing the Dataset
     ● Landing page, Linked Data, SPARQL endpoint and browser at http://fts.publicdata.eu via OntoWiki
     ● Metadata: Datahub (http://thedatahub.org)
  8. Financial Transparency System
  9. Financial Transparency System
  10. Financial Transparency System. Step 5: Enriching the Dataset
      ● Linking with LIMES (http://limes.aksw.org)
      ● Link targets: LinkedGeoData (cities); DBpedia (cities, countries, years, schema)
      ● Geo-coding of beneficiaries at city and address level: 45k coordinates
      ● Metadata: author, license, source and statistics, using Dublin Core, VoID and Data Cube
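
The linking step can be pictured as comparing labels from the source dataset with labels from a target dataset and emitting an `owl:sameAs` link when the similarity exceeds a threshold. The sketch below is a simplified stand-in, not LIMES: it uses Python's `difflib` for string similarity and a brute-force comparison of all pairs, and the URIs, labels and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label: str) -> str:
    """Lower-case the label and collapse whitespace before comparing."""
    return " ".join(label.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] based on difflib's matching-block ratio."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def discover_links(
    source: Dict[str, str],
    target: Dict[str, str],
    threshold: float = 0.9,  # assumed acceptance threshold
) -> List[Tuple[str, str, str]]:
    """Link each source URI to its best-matching target URI if it is similar enough."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links
```

For example, a source entry labelled `"  Leipzig"` would be linked to a target entry labelled `"Leipzig"`, while a label with no close counterpart produces no link. A dedicated link-discovery tool would offer configurable metrics and would not need to compare every pair.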
  11. Financial Transparency System. Step 6: Queries, Applications, Visualisation
      The RDF version makes it possible to:
      ● Find the organisations with the highest funding
      ● Compare funding across countries and beneficiaries
      ● Compare funding per year and country (from FTS) with gross domestic product (from DBpedia); see next slide
      → Overall, this increases transparency and may serve as input for research policy strategies
  12. Financial Transparency System
      SELECT * {
        { SELECT ?ftsyear ?ftscountry (SUM(?amount) AS ?funding) {
            ?com rdf:type fts-o:Commitment .
            ?com fts-o:year ?year .
            ?year rdfs:label ?ftsyear .
            ?com fts-o:benefit ?benefit .
            ?benefit fts-o:detailAmount ?amount .
            ?benefit fts-o:beneficiary ?beneficiary .
            ?beneficiary fts-o:country ?country .
            ?country owl:sameAs ?ftscountry .
          } GROUP BY ?ftsyear ?ftscountry
        }
        { SELECT ?dbpcountry ?gdpyear ?gdpnominal {
            ?dbpcountry rdf:type dbp-o:Country .
            ?dbpcountry dbp-p:gdpNominal ?gdpnominal .
            ?dbpcountry dbp-p:gdpNominalYear ?gdpyear .
          }
        }
        FILTER ((?ftsyear = str(?gdpyear)) && (?ftscountry = ?dbpcountry))
      }
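
The logic of this query (sum funding per year and country, then keep only the rows that have a GDP figure for the same year and country) can be mirrored in a few lines of Python over in-memory data. This is only a sketch of the join; the tuple layout and the function name `funding_vs_gdp` are assumptions, and no real FTS or DBpedia figures are used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (year label, country identifier)


def funding_vs_gdp(
    commitments: Iterable[Tuple[str, str, float]],  # (year, country, amount)
    gdp: Dict[Key, float],                          # (year, country) -> nominal GDP
) -> List[Tuple[str, str, float, float]]:
    """Aggregate funding per (year, country) and join it with GDP on the same key."""
    totals: Dict[Key, float] = defaultdict(float)
    for year, country, amount in commitments:
        totals[(year, country)] += amount  # corresponds to SUM(?amount) ... GROUP BY

    rows = []
    for key in sorted(totals):
        if key in gdp:  # corresponds to the FILTER on year and country
            year, country = key
            rows.append((year, country, totals[key], gdp[key]))
    return rows
```

Years are kept as strings here because the query compares the FTS year label with `str(?gdpyear)`.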
  13. Financial Transparency System
  14. European Employment Services
      ● European Employment Services (EURES) is a cooperation network for the free movement of workers in the EU
      ● Publishes 1.2+ million job vacancies, 700,000 CVs and 25,000 employers
      The RDF version can be used to:
      ● Compare geographical and economic information for new jobs (DBpedia, LGD)
      ● Compare salaries relative to standards in the job's region
      ● Check the quality of nearby schools
  15. European Employment Services
      ● Neither an API nor a dump is available → site scraping
      ● Modelling considered existing ontologies
      ● Published using D2R: http://www4.wiwiss.fu-berlin.de/eures/
      ● 7 million triples; classes: Offer, Skill, Employer
      ● 3000 links to DBpedia (cities, regions, countries, languages, currencies), LEXVO (languages) and Eurostat
      ● Updates can be performed by scraping only new pages
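
The last point, updating by scraping only new pages, amounts to remembering which page URLs have already been processed. A minimal sketch under assumptions: the function name, the `seen` set and the injected `fetch` callable are all hypothetical, and no real EURES URLs or page structure are used. Passing `fetch` in as a parameter keeps the example runnable without network access.

```python
from typing import Callable, Dict, Iterable, Set


def update_by_scraping(
    listing: Iterable[str],        # page URLs currently listed on the site
    seen: Set[str],                # URLs processed in earlier runs (updated in place)
    fetch: Callable[[str], str],   # downloads one page and returns its content
) -> Dict[str, str]:
    """Fetch only pages that were not processed before and record them as seen."""
    new_pages: Dict[str, str] = {}
    for url in listing:
        if url in seen:
            continue  # already converted in a previous run
        new_pages[url] = fetch(url)
        seen.add(url)
    return new_pages
```

In practice the `seen` set would be persisted between runs, for instance in a file or a small database, and the returned pages would be passed on to the conversion step.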
  16. Euraxess (http://ec.europa.eu/euraxess/)
      ● Contains research jobs in the EU: 6,400 organisations, 1,700 open jobs, 61,000 registered researchers, 18,000 researcher CVs
      ● Contains information about people, jobs, skills, languages etc.
      ● Links to DBpedia languages and LEXVO languages
  17. Euraxess + EURES Query
      ● Query: aggregates information about jobs and companies in a country from two different sources
      SELECT DISTINCT ?job ?company WHERE {
        SERVICE <http://www4.wiwiss.fu-berlin.de/eures/sparql> {
          ?job eures:country ?countryjob .
          ?countryjob a eures:Country .
          ?countryjob rdfs:label ?n .
        }
        SERVICE <http://www4.wiwiss.fu-berlin.de/euraxess/sparql> {
          ?company euraxess:country ?countrycomp .
          ?countrycomp a euraxess:Country .
          ?countryjob owl:sameAs ?countrycomp .
        }
      }
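
What this federated query does can be sketched as a join over two result sets that are connected through `owl:sameAs` links between country resources. The Python below works on in-memory tuples only; the function name and data layout are assumptions for illustration and it does not contact either endpoint.

```python
from typing import Dict, Iterable, List, Set, Tuple


def jobs_and_companies(
    eures_jobs: Iterable[Tuple[str, str]],          # (job, EURES country)
    euraxess_companies: Iterable[Tuple[str, str]],  # (company, Euraxess country)
    same_as: Set[Tuple[str, str]],                  # (EURES country, Euraxess country)
) -> List[Tuple[str, str]]:
    """Pair every job with every company located in the equivalent country."""
    # Index companies by their Euraxess country for quick lookup.
    companies_by_country: Dict[str, List[str]] = {}
    for company, country in euraxess_companies:
        companies_by_country.setdefault(country, []).append(company)

    pairs = set()  # a set gives the DISTINCT behaviour of the query
    for job, job_country in eures_jobs:
        for eures_country, euraxess_country in same_as:
            if eures_country == job_country:
                for company in companies_by_country.get(euraxess_country, []):
                    pairs.add((job, company))
    return sorted(pairs)
```

The `same_as` set plays the role of the `?countryjob owl:sameAs ?countrycomp` pattern: without those links the two sources would share no common identifiers to join on.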
  18. Summary / Take-Away Messages
      ● Linked Data is increasingly important in EU e-government
      ● Many RDF conversion tools and techniques are available, depending on the source format
      ● Linked Data simplifies data integration; value is added by enrichment, e.g. linking to other datasets or schema creation
      ● The LOD cloud provides rich background information
      Thanks for your attention!