Introduction to the FP7 CODE project @ BDBC

The FP7 CODE project will be presented at the Big Data Benchmarking Community call. A high-level overview introduces CODE's vision and shows the progress made after six months.

Transcript of "Introduction to the FP7 CODE project @ BDBC"

  1. Big Data Benchmarking Community Call
     CODE (Commercial Empowered Linked Open Data Ecosystem in Research)
     Presented by Florian Stegmaier, University of Passau, 2012-10-04
  2. Some basic facts...
     • Budget: €2.4M, funded by the European Commission
     • Started in May 2012 with a runtime of 2 years
  3. Current situation
     • Data is being produced at an immense rate by:
       • Evaluation campaigns (e.g., the CLEF campaign)
       • Benchmarking communities (e.g., TPC)
       • Researchers (e.g., proceedings, journals or slides)
     Most data remains unstructured, and sophisticated access methods are missing!
  4. Why is this a problem?!
     • From a data perspective...
       • ...what is the quality of the data?
       • ...how to deal with missing values?
     • From a user perspective...
       • ...how can I compare this data? Against which baseline?
       • ...are there contradicting facts?
     The semantics of documents must be unleashed to make them accessible and processable!
  5. The long way to knowledge...
  6. Step 1: Analyze data
     • Analysis of documents has to:
       • Find structural elements (TOC, images, etc.)
       • Extract facts and numerical measures
       • Disambiguate facts (from "string" to "object")
     • Automatic annotation is error-prone or incomplete:
       • Crowdsourced annotation of documents
       • A marketplace offers revenue for expert knowledge
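The fact extraction and disambiguation part of Step 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory `ENTITY_INDEX` standing in for a knowledge base and a simple regular-expression heuristic for phrases of the form "<measure> of <number>". The names `ENTITY_INDEX`, `Fact` and `extract_facts` and the example.org URIs are illustrative only and do not come from CODE; a real pipeline would rely on proper information extraction and entity linking.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical index from surface strings to entity URIs (illustrative only;
# a real system would query a knowledge base instead).
ENTITY_INDEX = {
    "clef": "http://example.org/entity/CLEF",
    "tpc": "http://example.org/entity/TPC",
}

# Heuristic: matches phrases like "MAP of 0.42" or "P@10 of 0.61".
MEASURE_PATTERN = re.compile(r"\b([A-Za-z][\w@]*)\s+of\s+(\d+(?:\.\d+)?)")


@dataclass
class Fact:
    subject: str  # disambiguated entity URI: the "object" behind the string
    measure: str  # lower-cased name of the numerical measure
    value: float


def disambiguate(name: str) -> Optional[str]:
    """Map a surface string to an entity URI, or None if it is unknown."""
    return ENTITY_INDEX.get(name.strip().lower())


def extract_facts(sentence: str, entity: str) -> List[Fact]:
    """Extract 'measure of value' pairs and attach them to the disambiguated entity."""
    uri = disambiguate(entity)
    if uri is None:
        return []  # unresolved strings would be left for (crowdsourced) annotation
    return [
        Fact(uri, m.group(1).lower(), float(m.group(2)))
        for m in MEASURE_PATTERN.finditer(sentence)
    ]


facts = extract_facts("The run reached a MAP of 0.42 and a P@10 of 0.61.", "CLEF")
```

With this input, `facts` holds two `Fact` objects (`map` = 0.42 and `p@10` = 0.61) attached to the CLEF URI. An unknown entity yields an empty list, which is the point where crowdsourced annotation would take over.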
  7. Step 2: Lift and extend data
     • Extracted and disambiguated data will be lifted into the Linked Data cloud:
       • Interlink with data already existing in the cloud
       • Enrich data with provenance information (to improve quality estimates)
       • Perform OLAP queries on data cubes (e.g., time series)
     Enriched and aggregated data is exposed as a Linked Data endpoint.
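An OLAP roll-up on a data cube, as mentioned in Step 2, groups observations by a coarser set of dimensions and aggregates the measure within each group. The sketch below assumes observations are plain Python dicts with hypothetical `system`, `month` and `score` fields and rolls the time dimension up from month to year using the mean. It is only a local stand-in for what an aggregate query (e.g., SPARQL `GROUP BY`) would do against a real Linked Data endpoint; `roll_up` and `year_of` are illustrative names.

```python
from collections import defaultdict
from statistics import mean


def year_of(row):
    """Derive a coarser 'year' dimension from a 'YYYY-MM' month value."""
    return {"year": row["month"][:4]}


def roll_up(observations, dims, measure, derive=None, agg=mean):
    """Group observations by `dims` and aggregate `measure` within each group.

    `derive` may add computed dimensions (e.g. year from month) before grouping.
    """
    groups = defaultdict(list)
    for obs in observations:
        row = dict(obs)  # copy so the input observations stay untouched
        if derive is not None:
            row.update(derive(row))
        key = tuple(row[d] for d in dims)
        groups[key].append(row[measure])
    return {key: agg(values) for key, values in sorted(groups.items())}


# Hypothetical benchmark observations: one score per system and month.
observations = [
    {"system": "X", "month": "2011-11", "score": 0.40},
    {"system": "X", "month": "2011-12", "score": 0.50},
    {"system": "X", "month": "2012-01", "score": 0.60},
    {"system": "Y", "month": "2012-01", "score": 0.30},
]

yearly = roll_up(observations, ("system", "year"), "score", derive=year_of)
```

Here `yearly` maps `("X", "2011")` to 0.45, `("X", "2012")` to 0.6 and `("Y", "2012")` to 0.3. Passing a different `agg` (such as `max` or `sum`) changes the aggregation without touching the grouping logic.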
  8. Step 3: Interact with data
     • The query wizard will focus on:
       • Excel-based interaction
       • On-the-fly creation of statistical analyses over a federated dataset
     • A marketplace encourages users to interact with data
     Non-IT (but possibly domain) experts are able to create visual analytics as well as new data cubes.
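The on-the-fly statistical analysis over a federated dataset in Step 3 can be approximated as follows: collect rows from several sources, keep their origin as simple provenance, skip missing values and compute summary statistics. This is a hypothetical Python sketch; `federate`, `summarize` and the two in-memory sources are illustrative assumptions, whereas a real query wizard would pull the rows from remote Linked Data endpoints.

```python
from statistics import mean, stdev


def federate(*sources):
    """Union rows from (name, rows) pairs, tagging each row with its source."""
    merged = []
    for name, rows in sources:
        for row in rows:
            merged.append({**row, "source": name})
    return merged


def summarize(rows, measure):
    """Count, mean and sample standard deviation of `measure`, ignoring missing values."""
    values = [row[measure] for row in rows if row.get(measure) is not None]
    if not values:
        return {"n": 0, "mean": None, "stdev": None}
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values) if len(values) > 1 else 0.0,
    }


# Two hypothetical sources; source_b contains a missing score.
source_a = [{"system": "X", "score": 0.4}, {"system": "Y", "score": 0.6}]
source_b = [{"system": "Z", "score": None}, {"system": "W", "score": 0.8}]

rows = federate(("source_a", source_a), ("source_b", source_b))
stats = summarize(rows, "score")
```

With the sample data, four rows are merged, the missing score is skipped, and the summary reports n = 3, a mean of about 0.6 and a sample standard deviation of about 0.2. The `source` tag keeps track of where each value came from.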
  9. ...does all this actually work?
     The current PDF analysis is able to discover a basic table of contents, the reading direction, and specific objects.
  10. ...how are the users involved?
      One possible way to engage users in annotating data is the Mendeley Desktop (early stage).
  11. ...what about lifting data?
      A basic triplification chain has been established to lift table-based data into a Semantic Web-compatible data cube.
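A triplification chain of the kind described on this slide maps each table row to one observation in a data cube. The minimal Python sketch below emits N-Triples using the W3C RDF Data Cube vocabulary (`qb:DataSet`, `qb:Observation`, `qb:dataSet`). The `http://example.org/cube/` namespace, the column-to-property mapping and the `triplify` function are assumptions for illustration, not CODE's actual chain. A complete cube would also declare a `qb:DataStructureDefinition` and would usually use IRIs rather than plain literals for dimension values; both are omitted here for brevity.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://example.org/cube/"  # hypothetical namespace for this sketch


def iri(value):
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Quote a literal, escaping backslashes and quotes, optionally typed."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def triplify(csv_text, dataset, dimensions, measure):
    """Turn each CSV row into one qb:Observation with dimension and measure properties."""
    ds = iri(BASE + dataset)
    triples = [f"{ds} {iri(RDF_TYPE)} {iri(QB + 'DataSet')} ."]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        obs = iri(f"{BASE}{dataset}/obs{i}")
        triples.append(f"{obs} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .")
        triples.append(f"{obs} {iri(QB + 'dataSet')} {ds} .")
        for dim in dimensions:
            triples.append(f"{obs} {iri(BASE + dim)} {literal(row[dim])} .")
        triples.append(f"{obs} {iri(BASE + measure)} {literal(row[measure], XSD_DECIMAL)} .")
    return triples


# Hypothetical table: one benchmark score per system and year.
sample = "system,year,score\nX,2012,0.42\nY,2012,0.61\n"
triples = triplify(sample, "results", ["system", "year"], "score")
```

For the two-row sample this yields 11 triples: one for the data set and five per observation (type, data set link, two dimensions, one measure). Generating plain strings keeps the sketch dependency-free; an RDF library would handle escaping and serialization more robustly.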
  12. ...how can I find data?
      The first prototype of the query wizard is able to show and interact with retrieved data in an Excel-like manner.
  13. ...and the marketplace?
      The data can be exposed in several ways, for instance as mind maps that help to structure things (the example shows Biggerplate).
  14. Thank you for your attention!
      http://www.code-research.eu/
      https://www.facebook.com/CODEresearchEU
      #CODEresearchEU (Twitter)
      Thanks to Michael Granitzer, Christin Seifert, Kai Schlegel and Sebastian Bayerl for supporting me with input, figures and slide templates, and last but not least to our consortium for the prototype screenshots ;)
