I-Know presentation: CODE - Commercially empowered Linked Open Data Ecosystems in Research

Commercially empowered Linked Open Data
Ecosystems in Research
Towards unfolding today’s and tomorrow’s
scientific treasures

Michael Granitzer
University of Passau

FP 7 Strep No. 296150

nani gigantum humeris insidentes
▪ Standing on the shoulders of giants
– Research builds on the past
– We pass on knowledge to create new knowledge

Root of (Western) Society


Lying under a pile of text documents
▪ ... with varying quality
▪ ... with contradictory facts
▪ ... with missing data
▪ ... whose results are labour-intensive to compare
▪ Some examples
– “Improvements that don’t add up” (Armstrong et al., 2009)
– “Why most published research findings are false” (Ioannidis, 2005)

Can we do better?


Yes, we (think) we can...
▪ Make facts and figures explicit, discoverable and comparable

▪ Given textually enCODED scientific knowledge, we can
– Extract facts from research papers
– Integrate those facts with existing knowledge
– Make it available for (visual) analysis
– Crowdsource

▪ Focus on
– Empirical observations/facts
– Linked Open Data
– Computer science and biomedical domains


That’s nice, but how?

[Figure: CODE pipeline with four stages: Extract & Integrate, Aggregate, Analyse & Organise, Share & Commercialise. Flow: Text, Experiments → Linked Data → Linked Scientific Fact Data Warehouse → Visual Analytics & Collaborative mind-mapping → Crowdsourcing & Marketplace. Example panels: dependency and frequency analysis; graph dependencies between Machine Learning, Algorithm, CRF, SVM, Biomedical and Data Set 1; a pivot chart of experiments by algorithm, domain and data set.]

Extract & Integrate: Approach and Challenges
▪ Extracting Structural Elements
– Tables
– Figures
– Sections and sub-sections
▪ Extracting Facts from Structural Elements
– Entity extraction (e.g. algorithms, data sets, genes, significance levels, etc.)
– Fact extraction: <Entity, Relation, Measure>
– Table triplification
▪ Crowdsourcing Extraction
– Extraction quality and domain knowledge remain key issues
→ Empower users to maintain their own extraction models
→ Allow users to semantically annotate research papers (e.g. entities, facts)

→ Result: semantically annotated scientific data as an LOD endpoint
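Table triplification turns each cell of a results table into an <Entity, Relation, Measure> fact. The following is a minimal sketch only, not the TeamBeam or Table2RDFDataCube code; it assumes the first column names the entity (e.g. an algorithm), every other column header names the context of the measurement (e.g. a data set), and missing results are marked with "-". The names `Fact` and `triplify_table` and the sample table are hypothetical.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    """An <Entity, Relation, Measure> fact plus the table context it came from."""
    entity: str     # e.g. an algorithm such as "SVM"
    relation: str   # e.g. "accuracy"
    measure: float  # the numeric cell value
    unit: str       # e.g. "%"
    context: str    # column header, e.g. a data set name


MISSING = {"", "-", "n/a"}  # assumed markers for missing results


def triplify_table(header: List[str], rows: List[List[str]],
                   relation: str, unit: str = "") -> List[Fact]:
    """Turn a results table into one Fact per non-empty numeric cell."""
    facts = []
    for entity, *cells in rows:
        for context, cell in zip(header[1:], cells):
            value = cell.strip()
            if value.lower() in MISSING:
                continue  # missing results are skipped, not stored as zero
            if unit and value.endswith(unit):
                value = value[: -len(unit)]  # strip the unit suffix
            facts.append(Fact(entity, relation, float(value), unit, context))
    return facts


# Hypothetical example table: rows are algorithms, columns are data sets.
header = ["Algorithm", "DataSet1", "DataSet2"]
rows = [["SVM", "81.2%", "77.0%"], ["CRF", "84.5%", "-"]]
facts = triplify_table(header, rows, relation="accuracy", unit="%")
for fact in facts:
    print(fact)
```

Keeping the column header as explicit context is what later allows the same fact to be placed along a "data set" dimension instead of being buried in a string.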


Extract & Integrate: Example
[Figure: example of extracted numerical facts, with labelled regions for Numerical Facts, Dimension/Entity, In-Document Context and Ranking Facts]
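To make the idea of a numerical fact with its dimension/entity and in-document context more concrete, here is a minimal sketch. It is not CODE's extractor: it assumes a hypothetical fixed entity dictionary (`KNOWN_ENTITIES`) in place of a real entity extractor, treats the sentence as the in-document context, and links each number to the nearest entity mentioned before it in the same sentence.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical entity dictionary; a real system would use an entity extractor.
KNOWN_ENTITIES = ["CRF", "SVM"]
ENTITY = re.compile(r"\b(" + "|".join(map(re.escape, KNOWN_ENTITIES)) + r")\b")
# A number not glued to a preceding word (so "DataSet1" is ignored), optional "%".
NUMBER = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)\s*(%?)")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class NumericalFact(NamedTuple):
    entity: Optional[str]  # nearest preceding entity in the same sentence
    value: float
    unit: str              # "%" or "" in this sketch
    context: str           # the sentence the number appeared in


def extract_numerical_facts(text: str) -> List[NumericalFact]:
    facts = []
    for sentence in SENTENCE_END.split(text):
        mentions = [(m.start(), m.group(1)) for m in ENTITY.finditer(sentence)]
        for num in NUMBER.finditer(sentence):
            preceding = [name for pos, name in mentions if pos < num.start()]
            entity = preceding[-1] if preceding else None
            facts.append(NumericalFact(entity, float(num.group(1)),
                                       num.group(2), sentence))
    return facts


text = "The SVM reached 81.2% accuracy on DataSet1. CRF improved this to 84.5%."
facts = extract_numerical_facts(text)
for fact in facts:
    print(fact)
```

The "nearest preceding entity" rule is a deliberately naive heuristic; it is exactly the kind of decision that crowdsourced corrections would need to override.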


Extract & Integrate: Current Status
▪ TeamBeam: PDF structure extraction
– Structural elements
– Now focusing on tables

▪ Entity extraction in progress

▪ First prototypes for Table2RDFDataCube

TeamBeam: Meta-Data Extraction from Scientific Literature
By Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau

Aggregate: Approach and Challenges
▪ Representation and Storage
– Representation using the RDF Data Cube Vocabulary
• Dimensions (e.g. Algorithms, Genes)
• Measures (e.g. 0.3, 37) and Attributes (e.g. %, °)
– Challenge 1: Ensure independence of dimensions
– Challenge 2: Decentralized querying and aggregation
http://www.w3.org/TR/vocab-data-cube/#ref_qb_measureType

▪ SPARQL Data Warehousing Wizard
– Provide a simple and intuitive wizard for creating aggregation queries
• Google-like starting point
• Pivot table creation similar to spreadsheets
– Store using the RDF Data Cube Vocabulary

→ Linked Scientific Fact Data Warehouse for non-IT experts
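A single measurement expressed with the RDF Data Cube Vocabulary is a `qb:Observation` that points to its data set, carries one value per dimension, an attribute such as the unit, and the measure itself. The sketch below only illustrates that shape by emitting Turtle as plain text; the `ex:` namespace, the property names `ex:algorithm`, `ex:evaluatedOn`, `ex:accuracy` and the function `observation_to_turtle` are hypothetical, while `qb:` and `sdmx-attribute:unitMeasure` are the published vocabularies.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_ATTRIBUTE = "http://purl.org/linked-data/sdmx/2009/attribute#"
EX = "http://example.org/code/"  # hypothetical namespace for this sketch


def observation_to_turtle(obs_id, dataset, dimensions, measure, value, unit):
    """Serialise one measurement as a qb:Observation in Turtle."""
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix sdmx-attribute: <{SDMX_ATTRIBUTE}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
    ]
    # Dimensions identify the observation (e.g. algorithm and data set).
    for dimension, member in dimensions.items():
        lines.append(f"    ex:{dimension} ex:{member} ;")
    # The attribute qualifies the measure (here: its unit).
    lines.append(f"    sdmx-attribute:unitMeasure ex:{unit} ;")
    # The measure carries the observed value.
    lines.append(f"    ex:{measure} {value} .")
    return "\n".join(lines)


ttl = observation_to_turtle(
    "obs1", "benchmarkResults",
    {"algorithm": "SVM", "evaluatedOn": "DataSet1"},
    measure="accuracy", value=81.2, unit="percent")
print(ttl)
```

A complete data set would also declare a `qb:DataStructureDefinition` listing the dimension properties as `qb:DimensionProperty` and the measure as `qb:MeasureProperty`; that part is omitted here to keep the sketch short.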


Aggregate: Current Status
▪ Representation and Storage
– Data model implemented
– Triplification of benchmarking data (e.g. CLEF, TPC-H)
We are looking for data

▪ SPARQL Data Warehousing Wizard
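A pivot-table selection (one dimension for rows, one for columns, one measure, one aggregate) maps naturally onto a SPARQL 1.1 `GROUP BY` query over Data Cube observations. The sketch below shows one way a wizard could generate such a query; it is an assumption about the approach, not the CODE wizard itself, and the function name `pivot_query` and the example URIs are hypothetical.

```python
ALLOWED_AGGREGATES = {"AVG", "SUM", "MIN", "MAX", "COUNT"}


def pivot_query(dataset_uri: str, row_dim: str, col_dim: str,
                measure: str, aggregate: str = "AVG") -> str:
    """Build a SPARQL 1.1 query whose result rows fill a pivot table.

    Rows and columns come from two Data Cube dimension properties; each
    cell aggregates the measure over all matching observations.
    """
    aggregate = aggregate.upper()
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?row ?col ({aggregate}(?value) AS ?cell)
WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset_uri}> ;
       <{row_dim}> ?row ;
       <{col_dim}> ?col ;
       <{measure}> ?value .
}}
GROUP BY ?row ?col
ORDER BY ?row ?col"""


EX = "http://example.org/code/"  # hypothetical namespace
query = pivot_query(EX + "benchmarkResults", EX + "algorithm",
                    EX + "evaluatedOn", EX + "accuracy")
print(query)
```

Restricting the aggregate to a fixed whitelist keeps the wizard's output valid SPARQL while hiding the query language from non-IT users.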


Analyse: Approach and Challenges
▪ Visual Analytics for Linked Scientific Facts
– RDF-based description of visualisations
• Glue between data and individual visualisations
• Make visualisation state explicit
• Share visualisation state

– HTML5-based visualisations and a visualisation wizard
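One way to make a visualisation's state explicit and shareable in RDF is to serialise it as JSON-LD, which an HTML5 front end can read directly. The sketch assumes a hypothetical vocabulary under `http://example.org/vis#` (`vis:VisualisationState`, `vis:encoding`, and so on) and a hypothetical `describe_state` helper; CODE's actual visualisation vocabulary is not shown in the slides.

```python
import json

VIS = "http://example.org/vis#"  # hypothetical visualisation vocabulary


def describe_state(chart_type, data_source, encodings, filters):
    """Describe a visualisation's state as a JSON-LD (RDF) document."""
    return {
        "@context": {"vis": VIS},
        "@type": "vis:VisualisationState",
        "vis:chartType": chart_type,
        # Link to the data, e.g. a data set or SPARQL query URI.
        "vis:dataSource": {"@id": data_source},
        # Which dimension is drawn on which visual channel (x axis, colour, ...).
        "vis:encoding": [
            {"vis:channel": channel, "vis:dimension": {"@id": dim}}
            for channel, dim in encodings.items()
        ],
        # Active filters, so others see exactly what was selected.
        "vis:filter": list(filters),
    }


EX = "http://example.org/code/"  # hypothetical data namespace
state = describe_state(
    "bar",
    EX + "benchmarkResults",
    {"x": EX + "algorithm", "colour": EX + "evaluatedOn"},
    ["evaluatedOn = DataSet1"],
)
shared = json.dumps(state, indent=2)  # what would be published and shared
restored = json.loads(shared)         # what a collaborator loads back
print(shared)
```

Because the state is plain data rather than code, a collaborator can restore the exact view, including filters, and see which choices shaped the chart.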


Share: Approach and Challenges
▪ Provenance
– Who published the data?
– Who modified the data?

▪ Share aggregated data sets and annotation models
– Build on insights created by others
– Re-use text annotation models

▪ Share visual analytics applications
– Simple visualisations might be misleading
– Sharing the whole state of a visual analysis reveals more details about the decisions made
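The two provenance questions can be answered if every published or modified data set carries a few W3C PROV-O statements (`prov:wasAttributedTo`, `prov:generatedAtTime`, `prov:wasDerivedFrom`). The sketch keeps triples as plain Python tuples rather than using an RDF store, and stores the timestamp as an untyped string instead of an `xsd:dateTime` literal; the helper names, the `ex:` namespace, the agents and the dates are all hypothetical.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/code/"  # hypothetical namespace


def provenance(entity, agent, when, derived_from=None):
    """Return PROV-O triples stating who produced `entity`, when, and from what."""
    triples = [
        (entity, PROV + "wasAttributedTo", agent),
        # Simplification: a real store would type this as xsd:dateTime.
        (entity, PROV + "generatedAtTime", when.isoformat()),
    ]
    if derived_from is not None:
        triples.append((entity, PROV + "wasDerivedFrom", derived_from))
    return triples


def attributed_to(triples, entity):
    """Answer 'who published this data?'"""
    return [o for s, p, o in triples if s == entity and p == PROV + "wasAttributedTo"]


def modified_by(triples, original):
    """Answer 'who modified this data?' via entities derived from `original`."""
    derived = {s for s, p, o in triples if p == PROV + "wasDerivedFrom" and o == original}
    return sorted(o for s, p, o in triples if s in derived and p == PROV + "wasAttributedTo")


# Hypothetical history: alice publishes results, bob publishes a modified version.
log = (provenance(EX + "results-v1", EX + "alice", datetime(2024, 1, 1, tzinfo=timezone.utc))
       + provenance(EX + "results-v2", EX + "bob", datetime(2024, 1, 8, tzinfo=timezone.utc),
                    derived_from=EX + "results-v1"))
print(attributed_to(log, EX + "results-v1"))
print(modified_by(log, EX + "results-v1"))
```

Modelling a modification as a new entity derived from the old one, rather than overwriting it, keeps earlier versions citable.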


Why should YOU do it?

Marketplace concept for research data
▪ Users (i.e. researchers) will be able to “sell” their analysis results (or give them away for free)
▪ Several concepts to be investigated: revenue chains, roles, models (donations, paid subscriptions for data feeds, purchases, etc.)
▪ Increased opportunities for researchers and research data

[Figure: cycle of extract & integrate, organise, visualise, crowdsource]

Find us, join us, ask us, help us
http://code-research.eu/
http://www.facebook.com/CODEresearchEU
#CODEresearchEU
