
I-Know presentation: CODE - Commercially empowered Linked Open Data Ecosystems in Research

Invited talk I gave at I-Know on our recently started FP 7 project CODE (http://code-research.eu/)

Published in: Technology, Education

  1. Commercially empowered Linked Open Data Ecosystems in Research
     Towards unfolding today's and tomorrow's scientific treasures
     Michael Granitzer, University of Passau
     FP 7 STREP No. 296150
  2. nani gigantum humeris insidentes
     Standing on the shoulders of giants
     – Research builds on the past
     – We pass on knowledge to create new knowledge
     Root of (Western) society
  3. Lying under a pile of text documents
     .. with varying quality
     .. with contradicting facts
     .. with missing data
     .. labour-intensive to compare results
     Some examples
     – “Improvements That Don’t Add Up”, Armstrong et al., 2009
     – “Why Most Published Research Findings Are False”, Ioannidis, 2005
     Can we do better?
  4. Yes, we (think) we can...
     Make facts and figures explicit, discoverable and comparable
     Given textually enCODED scientific knowledge, we can
     – Extract facts from research papers
     – Integrate those facts with existing knowledge
     – Make them available for (visual) analysis
     – Crowdsource
     Focus on
     – Empirical observations/facts
     – Linked Open Data
     – Computer Science and Biomedical Domain
  5. That's nice, but how?
     – Extract & Integrate: Text, Linked Data & Experiments
     – Aggregate: Linked Scientific Fact Data Warehouse
     – Analyse & Organise: Visual Analytics & Collaborative mind-mapping
     – Share & Commercialise: Crowdsourcing Marketplace
     [Figure: dependency and frequency analysis graph with example nodes (Machine Learning, Algorithm, CRF, SVM, Biomedical, Data Set 1) and a bar chart over algorithms, data sets, domain and experiments]
  6. Extract & Integrate: Approach and Challenges
     Extracting Structural Elements
     – Tables
     – Figures
     – Sections and sub-sections
     Extracting Facts from Structural Elements
     – Entity extraction (e.g. algorithms, data sets, genes, significance levels)
     – Fact extraction: <Entity, Relation, Measure>
     – Table Triplification
     Crowdsourcing Extraction
     – Extraction quality and domain knowledge remain a key issue
       • Empower users to maintain their own extraction model
       • Allow users to semantically annotate research papers (e.g. entities, facts)
     Result: Semantically annotated scientific data as LOD Endpoint
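A minimal sketch of what "table triplification" into <Entity, Relation, Measure> facts could look like. It assumes the first column of a table names the entity and every other column holds a numeric measure. The function name `triplify_table`, the `Fact` type and the sample table are hypothetical and invented for illustration; they are not taken from the CODE project.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted fact in the <Entity, Relation, Measure> shape."""
    entity: str
    relation: str
    measure: float


def triplify_table(header: list[str], rows: list[list[str]]) -> list[Fact]:
    """Turn each numeric cell into a fact: (row label, column name, value).

    Assumption: column 0 holds the entity (e.g. an algorithm name) and all
    remaining columns hold numeric measures. Non-numeric cells are skipped.
    """
    facts = []
    for row in rows:
        entity = row[0]
        for relation, cell in zip(header[1:], row[1:]):
            try:
                facts.append(Fact(entity, relation, float(cell)))
            except ValueError:
                continue  # missing or textual cell: no fact to extract
    return facts


# Hypothetical table, invented purely for illustration.
header = ["Algorithm", "Precision", "Recall"]
rows = [["SVM", "0.81", "0.77"], ["CRF", "0.84", "n/a"]]
facts = triplify_table(header, rows)
for fact in facts:
    print(fact)
```

The "n/a" cell produces no fact, which mirrors the "missing data" problem raised earlier in the talk: a real extractor would probably record such gaps rather than silently drop them.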
  7. Extract & Integrate: Example
     [Screenshot labels: Numerical Facts; Dimension/Entity; In-Document Context; Ranking Facts]
  8. Extract & Integrate: Current Status
     – TeamBeam: PDF structure extraction
       • Structural elements
       • Focusing now on tables
     – Entity extraction in progress
     – First prototypes for Table2RDFDataCube
     Reference: “TeamBeam - Meta-Data Extraction from Scientific Literature” by Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau
  9. Aggregate: Approach and Challenges
     Representation and Storage
     – Representation using the RDF Data Cube Vocabulary (http://www.w3.org/TR/vocab-data-cube/#ref_qb_measureType)
       • Dimensions (e.g. Algorithms, Genes)
       • Measures (e.g. 0.3, 37) and Attributes (e.g. %, °)
     – Challenge 1: Ensure independence of dimensions
     – Challenge 2: Decentralized querying and aggregation
     SPARQL Data Warehousing Wizard
     – Provide a simple and intuitive wizard for creating aggregation queries
       • Google-like starting point
       • Pivot table creation similar to spreadsheets
     – Store using RDF Data Cube Vocabulary
     Linked Scientific Fact Data Warehouse for non-IT Experts
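To make the dimension/measure/attribute split concrete, here is a rough sketch that serialises one observation as Turtle. Only `qb:Observation` and `qb:dataSet` come from the RDF Data Cube Vocabulary; the `ex:` namespace, the property names and the sample values are assumptions made up for this example, and the helper `observation_to_turtle` is hypothetical. A real cube would also declare its properties in a data structure definition.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/code/"  # hypothetical namespace, for illustration only


def observation_to_turtle(obs_id: str, dataset: str,
                          dimensions: dict[str, str],
                          measure: tuple[str, float],
                          unit: str) -> str:
    """Serialise one observation as Turtle text.

    Dimensions identify the observation, the measure is the observed value,
    and the unit is attached as an attribute of the observation.
    """
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
    ]
    # Dimension properties (hypothetical ex: names, e.g. algorithm, corpus).
    for prop, value in dimensions.items():
        lines.append(f"    ex:{prop} ex:{value} ;")
    # Measure property with its numeric value, then the unit attribute.
    measure_prop, value = measure
    lines.append(f"    ex:{measure_prop} {value} ;")
    lines.append(f'    ex:unit "{unit}" .')
    return "\n".join(lines)


# Invented sample values, not results from any real experiment.
turtle = observation_to_turtle(
    obs_id="obs1",
    dataset="benchmarkResults",
    dimensions={"algorithm": "SVM", "corpus": "DataSet1"},
    measure=("accuracy", 37.0),
    unit="%",
)
print(turtle)
```

Keeping the unit as a separate attribute rather than folding it into the value is what lets observations from different papers be compared or converted later.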
  10. Aggregate: Current Status
      Representation and Storage
      – Data Model implemented
      – Triplification of Benchmarking Data (e.g. CLEF, TPC-H etc.)
      We are looking for data
      SPARQL Data Warehousing Wizard
  11. Analyse: Approach and Challenges
      Visual Analytics for Linked Scientific Facts
      – RDF based description of visualisations
        • Glue between data and single visualisations
        • Make visualisation state explicit
        • Share visualisation state
      – HTML 5 based visualisations and visualisation wizard
  12. Share: Approach and Challenges
      Provenance
      – Who published data?
      – Who modified data?
      Share aggregated data sets and annotation models
      – Build on insights created by others
      – Re-use text annotation models
      Share visual analytics applications
      – Simple visualisations might be misleading
      – Sharing whole states of a visual analysis will reveal more details on certain decisions
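The two provenance questions on this slide (who published, who modified) can be answered by a very small record kept alongside each shared data set. The sketch below is only one possible shape under that assumption; `ProvenanceRecord`, its fields and the user names are hypothetical and not part of the CODE project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Tracks who published a data set and who modified it afterwards."""
    dataset: str
    published_by: str
    # Each entry is (user, UTC timestamp, free-text note).
    modifications: list[tuple[str, datetime, str]] = field(default_factory=list)

    def record_modification(self, user: str, note: str) -> None:
        """Append a modification entry stamped with the current UTC time."""
        self.modifications.append((user, datetime.now(timezone.utc), note))

    def modified_by(self) -> list[str]:
        """Distinct modifiers, in order of their first modification."""
        return list(dict.fromkeys(user for user, _, _ in self.modifications))


# Invented example users and notes.
record = ProvenanceRecord(dataset="benchmark-results", published_by="alice")
record.record_modification("bob", "corrected a unit")
record.record_modification("carol", "added missing rows")
record.record_modification("bob", "merged duplicate entries")
print(record.published_by, record.modified_by())
```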
  13. Why should YOU do it?
      Marketplace concept for research data
      Users (= researchers) will be enabled to “sell” their analysis results (or give them away for free)
      Several concepts to be investigated: revenue chains, roles, models (donations, paid subscription for data feeds, purchase etc.)
      Increased opportunities for researchers and research data
  14. extract & integrate, organise, visualise, crowdsource
      Find us, join us, ask us, help us
      http://code-research.eu/
      http://www.facebook.com/CODEresearchEU
      #CODEresearchEU