Big Data Benchmarking Community Call

CODE
Commercial Empowered Linked Open Data Ecosystem in Research

presented by Florian Stegmaier
University of Passau
2012-10-04

Some basic facts...
•   Budget: €2.4M, funded by the European Commission (FP7)
•   Started in May 2012 with a runtime of 2 years




Current situation

•    Data is being produced at an immense rate:
    • Evaluation campaigns (e.g., CLEF campaign)
    • Benchmarking communities (e.g., TPC)
    • Researchers (e.g., proceedings, journals or slides)
            Most data remains unstructured and
          sophisticated access methods are missing!

Why is this a problem?!
• From a data perspective...
  • ...what is the quality of data?
  • ...how to deal with missing values?
•  From a user perspective...
  • ...how can I compare this data? What is the baseline?
  • ...are there contradicting facts?
    The semantics of documents must be unlocked
       to make them accessible and processable!
The long way to knowledge...




Step 1: Analyze data
• Analysis of documents has to:
 • Find structural elements (TOC, images, etc.)
 • Extract facts and numerical measures
 • Disambiguate facts (from "string" to "object")
• Automatic annotation is error-prone or incomplete:
 • Crowdsourced annotation of documents
 • Marketplace offers revenue for expert knowledge
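
The "string to object" step can be illustrated with a minimal sketch. It assumes a tiny hand-made lookup table; the entity URIs, the `KNOWN_ENTITIES` table and the function names are hypothetical and are not taken from the CODE prototypes. Mentions that cannot be resolved automatically are collected separately, since those are the ones a human annotator would be asked about.

```python
from typing import Optional

# Hypothetical mini knowledge base: surface form -> entity URI (illustrative only).
KNOWN_ENTITIES = {
    "tpc": "http://example.org/entity/TPC",
    "transaction processing performance council": "http://example.org/entity/TPC",
    "clef": "http://example.org/entity/CLEF",
}


def normalize(mention: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(mention.lower().split())


def disambiguate(mention: str) -> Optional[str]:
    """Return the entity URI for a mention, or None if it is unknown."""
    return KNOWN_ENTITIES.get(normalize(mention))


def annotate(mentions):
    """Split mentions into resolved (string -> object) and unresolved ones."""
    resolved, unresolved = {}, []
    for mention in mentions:
        uri = disambiguate(mention)
        if uri is None:
            unresolved.append(mention)  # candidate for manual annotation
        else:
            resolved[mention] = uri
    return resolved, unresolved


if __name__ == "__main__":
    print(annotate(["TPC", "  Clef ", "Foo"]))
```

A real pipeline would need context-aware matching rather than exact lookup; the sketch only shows the shape of the input and output.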
Step 2: Lift and extend data
• Extracted and disambiguated data will be lifted into the
  Linked Data cloud
 • Interlink with data that already exists in the cloud
 • Enrich data with provenance information (improves
     quality estimates)
 •   Perform OLAP queries on data cubes (e.g., time series)
         Enriched and aggregated data is exposed
                as a Linked Data endpoint.
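
To make "OLAP queries on data cubes" concrete, here is a small sketch of two typical operations, roll-up and slice, over a list of observations. The dimension names, the measure and all values are made up for illustration, and the code uses plain Python rather than whatever query engine the project relies on.

```python
from collections import defaultdict

# Each observation has dimensions (system, year, quarter) and one measure (score).
# All values are invented for illustration.
OBSERVATIONS = [
    {"system": "A", "year": 2011, "quarter": 1, "score": 10.0},
    {"system": "A", "year": 2011, "quarter": 2, "score": 14.0},
    {"system": "A", "year": 2012, "quarter": 1, "score": 20.0},
    {"system": "B", "year": 2011, "quarter": 1, "score": 8.0},
]


def roll_up(observations, keep, measure):
    """Average `measure` over all dimensions that are not listed in `keep`."""
    groups = defaultdict(list)
    for obs in observations:
        key = tuple(obs[dim] for dim in keep)
        groups[key].append(obs[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def slice_cube(observations, **fixed):
    """Keep only observations whose dimensions match the fixed values."""
    return [o for o in observations if all(o[d] == v for d, v in fixed.items())]


if __name__ == "__main__":
    # Roll up quarters into a yearly time series per system.
    print(roll_up(OBSERVATIONS, ["system", "year"], "score"))
    # Slice to system A, then aggregate per year.
    print(roll_up(slice_cube(OBSERVATIONS, system="A"), ["year"], "score"))
```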
Step 3: Interact with data
• Query wizard will focus on:
 • Excel-like interaction
 • On-the-fly creation of statistical analyses on a
     federated dataset
•   Marketplace encourages users to interact with data

Non-IT experts (who may still be domain experts) can create
  visual analytics as well as new data cubes.


...does all this actually work?


The current PDF analysis
can discover a basic
table of contents, the
reading direction, and
specific objects.



...how are the users involved?

One possible way to
engage users in
annotating data is
through Mendeley Desktop.
(early stage)




...what about lifting data?

A basic triplification
chain has been established
to lift table-based data
into a Semantic Web-
compatible data cube.
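
As an illustration of what triplifying a table can look like, the sketch below turns each table row into one observation and emits N-Triples lines. It assumes the W3C RDF Data Cube vocabulary (`qb:Observation`, `qb:dataSet`) as the target; the slides do not say which vocabulary the project's chain uses, and the `http://example.org/cube/` namespace, the column names and the `triplify` function are hypothetical.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/cube/"  # hypothetical namespace, not the project's


def triplify(header, rows, dataset="dataset1", measure="score"):
    """Turn table rows into N-Triples lines, one qb:Observation per row.

    Cell values are expected as strings. Literal escaping is omitted for
    brevity, so values must not contain quotes or backslashes.
    """
    triples = []
    for i, row in enumerate(rows):
        obs = f"<{BASE}{dataset}/obs{i}>"
        triples.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        triples.append(f"{obs} <{QB}dataSet> <{BASE}{dataset}> .")
        for column, value in zip(header, row):
            prop = f"<{BASE}{column}>"
            if column == measure:
                # The measure column becomes a typed numeric literal.
                obj = f'"{value}"^^<{XSD}decimal>'
            else:
                # Dimension columns stay plain string literals in this sketch.
                obj = f'"{value}"'
            triples.append(f"{obs} {prop} {obj} .")
    return triples


if __name__ == "__main__":
    table_header = ["system", "year", "score"]
    table_rows = [["A", "2011", "12.5"], ["B", "2011", "8.0"]]
    print("\n".join(triplify(table_header, table_rows)))
```

A complete data cube would also describe its dimensions and measures in a `qb:DataStructureDefinition`, and dimension values would ideally be disambiguated entity URIs rather than strings; both are left out here.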




...how can I find data?

The first prototype of
the query wizard can
display retrieved data and
lets users interact with it
in an Excel-like manner.




...and the marketplace?

The data can be exposed
in several ways, for example
as mind maps that help to
structure things.
(The example shows Biggerplate.)




Thank you for your attention!

http://www.code-research.eu/
https://www.facebook.com/CODEresearchEU
#CODEresearchEU (Twitter)




Thanks to Michael Granitzer, Christin Seifert, Kai Schlegel and Sebastian Bayerl for supporting me with input, figures and slide templates, and, last but not least, to our consortium for the prototype screenshots ;)

Introduction to the FP7 CODE project @ BDBC
