
I-Know presentation: CODE - Commercially empowered Linked Open Data Ecosystems in Research


Published on

Invited talk I gave at I-Know on our recently started FP7 project CODE (

Published in: Technology, Education


  1. Commercially empowered Linked Open Data Ecosystems in Research: Towards unfolding today's and tomorrow's scientific treasures. Michael Granitzer, University of Passau. FP7 STREP No. 296150
  2. nani gigantum humeris insidentes: Standing on the shoulders of giants
     – Research builds on the past
     – We pass on knowledge to create new knowledge
     Root of (Western) society
  3. Lying under a pile of text documents
     .. with varying quality
     .. with contradicting facts
     .. with missing data
     .. labour intensive to compare results
     Some examples
     – “Improvements that don’t add up”, Armstrong et al., 2009
     – “Why most published research findings are false”, Ioannidis, 2005
     Can we do better?
  4. Yes, we (think) we can... Make facts and figures explicit, discoverable and comparable
     Given textually enCODED scientific knowledge, we can
     – Extract facts from research papers
     – Integrate those facts with existing knowledge
     – Make them available for (visual) analysis
     – Crowdsource
     Focus on
     – Empirical observations/facts
     – Linked Open Data
     – Computer Science and Biomedical domains
  5. That's nice, but how?
     – Extract & Integrate: text, Linked Data and experiments
     – Aggregate & Organise: Linked Scientific Fact Data Warehouse
     – Analyse: visual analytics and collaborative mind-mapping
     – Share & Commercialise: crowdsourcing and marketplace
     [Figures: dependency and frequency analysis graph; pivot chart of experiments by algorithm (CRF, SVM), domain (Biomedical) and data set (DataSet1, DataSet2)]
  6. Extract & Integrate: Approach and Challenges
     Extracting structural elements
     – Tables
     – Figures
     – Sections and sub-sections
     Extracting facts from structural elements
     – Entity extraction (e.g. algorithms, data sets, genes, significance levels)
     – Fact extraction: <Entity, Relation, Measure>
     – Table triplification
     Crowdsourcing extraction
     – Extraction quality and domain knowledge remain a key issue
       → Empower users to maintain their own extraction model
       → Allow users to semantically annotate research papers (e.g. entities, facts)
     Result: semantically annotated scientific data as a LOD endpoint
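The table triplification step on this slide can be illustrated with a minimal sketch. It assumes the table has already been extracted into a header row and data rows, that the first column names the entity, and that every other column header acts as the relation. The `Fact` type, the `triplify` function and the sample table are hypothetical, made up for illustration, and are not the project's actual implementation.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    entity: str     # e.g. an algorithm named in the first column
    relation: str   # the column header the value sits under
    measure: float  # the numeric cell value


def triplify(header: list[str], rows: list[list[str]]) -> list[Fact]:
    """Turn each numeric cell into one <Entity, Relation, Measure> fact.

    Assumption: column 0 holds the entity, all other columns are relations.
    Cells that do not parse as numbers are skipped.
    """
    facts = []
    for row in rows:
        entity = row[0]
        for relation, cell in zip(header[1:], row[1:]):
            try:
                facts.append(Fact(entity, relation, float(cell)))
            except ValueError:
                continue  # missing or textual cell, nothing to extract
    return facts


# Made-up results table, for illustration only
header = ["Algorithm", "Precision", "Recall"]
rows = [["CRF", "0.81", "0.74"], ["SVM", "0.78", "n/a"]]

facts = triplify(header, rows)
for fact in facts:
    print(fact)
```

Skipping unparseable cells keeps the sketch short; a real extractor would more likely record them for the crowdsourced correction the slide mentions.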
  7. Extract & Integrate: Example
     [Screenshot labels: Numerical Facts; Dimension/Entity; In-Document Context; Ranking Facts]
  8. Extract & Integrate: Current Status
     • TeamBeam: PDF structure extraction
       – Structural elements
       – Focusing now on tables
     • Entity extraction in progress
     • First prototypes for Table2RDFDataCube
     TeamBeam: Meta-Data Extraction from Scientific Literature. By Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau
  9. Aggregate: Approach and Challenges
     Representation and storage
     – Representation using the RDF Data Cube Vocabulary
       • Dimensions (e.g. algorithms, genes)
       • Measures (e.g. 0.3, 37) and attributes (e.g. %, °)
     – Challenge 1: Ensure independence of dimensions
     – Challenge 2: Decentralized querying and aggregation
     SPARQL Data Warehousing Wizard
     – Provide a simple and intuitive wizard for creating aggregation queries
       • Google-like starting point
       • Pivot table creation similar to spreadsheets
     – Store results using the RDF Data Cube Vocabulary
     Result: Linked Scientific Fact Data Warehouse for non-IT experts
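As a rough illustration of the dimension/measure/attribute split and of the spreadsheet-style pivot mentioned on this slide, the sketch below keeps observations as plain dictionaries, prints one of them in Turtle, and averages a measure over two dimensions. The `ex:` names, the `f1` measure and all values are invented for the example; only `qb:Observation` and `sdmx-attribute:unitMeasure` come from the RDF Data Cube and SDMX vocabularies. A real cube would also declare its dimension and measure properties in a data structure definition, which is omitted here.

```python
from collections import defaultdict

# Made-up observations: two dimensions, one measure, one attribute.
observations = [
    {"algorithm": "CRF", "dataset": "DataSet1", "f1": 81.0, "unit": "%"},
    {"algorithm": "CRF", "dataset": "DataSet2", "f1": 77.0, "unit": "%"},
    {"algorithm": "SVM", "dataset": "DataSet1", "f1": 78.0, "unit": "%"},
    {"algorithm": "SVM", "dataset": "DataSet1", "f1": 80.0, "unit": "%"},
]


def to_turtle(obs, index):
    """Serialise one observation as a qb:Observation (prefixes not declared).

    The ex: properties are hypothetical; the unit is kept as a plain
    literal for brevity.
    """
    return (
        f"ex:obs{index} a qb:Observation ;\n"
        f"    ex:algorithm ex:{obs['algorithm']} ;\n"
        f"    ex:dataset ex:{obs['dataset']} ;\n"
        f"    ex:f1 {obs['f1']} ;\n"
        f"    sdmx-attribute:unitMeasure \"{obs['unit']}\" .\n"
    )


def pivot_mean(obs, row_dim, col_dim, measure):
    """Average `measure` for every (row_dim, col_dim) combination."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for o in obs:
        cell = sums[(o[row_dim], o[col_dim])]
        cell[0] += o[measure]
        cell[1] += 1
    table = defaultdict(dict)
    for (row, col), (total, count) in sums.items():
        table[row][col] = total / count
    return dict(table)


print(to_turtle(observations[0], 0))
table = pivot_mean(observations, "algorithm", "dataset", "f1")
print(table)
```

The pivot only gives a meaningful average if the dimensions are independent of each other, which is the first challenge the slide names.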
  10. Aggregate: Current Status
     Representation and storage
     – Data model implemented
     – Triplification of benchmarking data (e.g. CLEF, TPC-H)
     We are looking for data
     SPARQL Data Warehousing Wizard
  11. Analyse: Approach and Challenges
     Visual analytics for Linked Scientific Facts
     – RDF-based description of visualisations
       • Glue between data and single visualisations
       • Make visualisation state explicit
       • Share visualisation state
     – HTML5-based visualisations and visualisation wizard
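One way to picture "making visualisation state explicit" is a small record that can be turned into triples and shared. The sketch below is only an assumption about what such a description might contain: the `VisState` class, its fields and the `vis:` property names are hypothetical and do not come from the talk.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VisState:
    chart: str     # chart type, e.g. "bar"
    dataset: str   # identifier of the data set being shown
    x: str         # dimension mapped to the x axis
    y: str         # measure mapped to the y axis
    filters: dict = field(default_factory=dict)  # dimension -> selected value

    def to_triples(self, subject):
        """Describe the state as (subject, predicate, object) triples."""
        triples = [
            (subject, "vis:chartType", self.chart),
            (subject, "vis:dataset", self.dataset),
            (subject, "vis:xAxis", self.x),
            (subject, "vis:yAxis", self.y),
        ]
        for dim, value in sorted(self.filters.items()):
            triples.append((subject, "vis:filter", f"{dim}={value}"))
        return triples


state = VisState("bar", "ex:benchmarks", x="algorithm", y="f1",
                 filters={"domain": "Biomedical"})

# Sharing: serialise the state, send it, and rebuild the same view elsewhere.
shared = json.dumps(asdict(state))
restored = VisState(**json.loads(shared))

for triple in state.to_triples("ex:view1"):
    print(triple)
```

Because the restored object equals the original, a collaborator opening the shared state would see the same chart type, axis mapping and filters rather than just a static picture.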
  12. Share: Approach and Challenges
     Provenance
     – Who published the data?
     – Who modified the data?
     Share aggregated data sets and annotation models
     – Build on insights created by others
     – Re-use text annotation models
     Share visual analytics applications
     – Simple visualisations might be misleading
     – Sharing whole states of a visual analysis will reveal more details on certain decisions
  13. Why should YOU do it? Marketplace concept for research data
     – Users (= researchers) will be enabled to “sell” their analysis results (or give them away for free)
     – Several concepts to be investigated: revenue chains, roles, models (donations, paid subscription for data feeds, purchase, etc.)
     – Increased opportunities for researchers and research data
  14. integrate, crowdsource, extract & organise, visualise
     Find us, join us, ask us, help us: #CODEresearchEU