Commercially empowered Linked Open Data
Ecosystems in Research
           Towards unfolding today's and tomorrow's
           scientific treasures

           Michael Granitzer
           University of Passau




           FP7 STREP No. 296150
nani gigantum humeris insidentes
   Standing on the shoulders of giants
     – Research builds on the past
     – We pass on knowledge, to create
       new knowledge




     Root of (Western) Society




Lying under a pile of text documents
   .. with varying quality
   .. with contradicting facts
   .. with missing data
   .. labour intensive to compare results
   Some examples
    – “Improvements that don’t add up”
       Armstrong et al., 2009

    – “Why Most Published Research Findings Are False”
       Ioannidis, 2005




       Can we do better?


Yes, we (think) we can...
   Make facts and figures explicit, discoverable and comparable

   Given textually enCODED scientific knowledge, we can
    –   Extract facts from research papers
    –   Integrate those facts with existing knowledge
    –   Make it available for (visual) analysis
    –   Crowdsource


   Focus on
    – Empirical observations/facts
    – Linked Open Data
    – Computer Science and Biomedical Domain



That's nice, but how?

   Extract & Integrate: Text, Linked Data, Experiments
   Aggregate: Linked Scientific Fact Data Warehouse
   Analyse & Organise: Visual Analytics & Collaborative mind-mapping
   Share & Commercialise: Crowdsourcing & Marketplace

   [Figure labels: Dependency and Frequency Analysis; Graph Dependencies; Algorithm; Machine Learning; CRF; SVM; Biomedical; Data Set 1; Data Set 2; Algorithms; Domain; Experiment; chart axis 0 to 20]
Extract & Integrate: Approach and Challenges
   Extracting Structural Elements
     – Tables
     – Figures
     – Sections and sub-sections
   Extracting Facts from Structural Elements
     – Entity extraction (e.g. algorithms, data sets, genes, significance levels etc.)
     – Fact extraction – <Entity, Relation, Measure>
     – Table Triplification
   Crowdsourcing Extraction
     – Extraction quality and domain knowledge remain key issues
       Empower users to maintain their own extraction models
       Allow users to semantically annotate research papers (e.g. entities, facts)


   Result: Semantically annotated scientific data as LOD Endpoint
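
A minimal sketch of table triplification into <Entity, Relation, Measure> facts, under stated assumptions: the first column of a table names the entity, every other column header names a relation, and numeric cells are the measures. The function name `triplify_table` and the sample values are hypothetical illustrations, not part of the CODE prototypes.

```python
from typing import List, Tuple

Fact = Tuple[str, str, float]  # (entity, relation, measure)


def triplify_table(header: List[str], rows: List[List[str]]) -> List[Fact]:
    """Turn each numeric cell into an <Entity, Relation, Measure> fact.

    Assumption: column 0 names the entity (e.g. an algorithm) and each
    remaining column header names a relation (e.g. an evaluation metric).
    """
    facts: List[Fact] = []
    for row in rows:
        entity = row[0]
        for relation, cell in zip(header[1:], row[1:]):
            try:
                facts.append((entity, relation, float(cell)))
            except ValueError:
                # Non-numeric cells (e.g. "n/a") carry no measure; skip them.
                continue
    return facts


# Made-up example values, not results taken from any paper.
header = ["Algorithm", "Precision", "Recall"]
rows = [["CRF", "0.81", "0.74"], ["SVM", "0.78", "n/a"]]
facts = triplify_table(header, rows)
for fact in facts:
    print(fact)
```

Real tables in PDFs need header detection, merged-cell handling and unit parsing first; the sketch only covers the last step, once a clean grid is available.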


Extract & Integrate: Example
   [Figure labels: Numerical Facts; Dimension/Entity; In-Document Context; Ranking Facts]
Extract & Integrate: Current Status
   TeamBeam: PDF structure extraction
     – Structural elements
     – Focusing now on tables
   Entity extraction in progress
   First prototypes for Table2RDFDataCube

   TeamBeam - Meta-Data Extraction from Scientific Literature
   By Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau
Aggregate: Approach and Challenges
   Representation and Storage
     – Representation using the RDF Data Cube Vocabulary
         • Dimensions (e.g. Algorithms, Genes)
         • Measures (e.g. 0.3, 37) and Attributes (e.g. %, °)
     – Challenge 1: Ensure independence of dimensions
     – Challenge 2: Decentralized querying and aggregation
     http://www.w3.org/TR/vocab-data-cube/#ref_qb_measureType
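
As a rough illustration of the representation above, the following sketch serialises one extracted fact as a `qb:Observation` in Turtle. The `qb:` and `sdmx-attribute:` namespaces come from the RDF Data Cube Vocabulary; the `ex:` namespace, the property names and the sample values are hypothetical, and writing the unit as a plain literal is a simplification.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_ATTR = "http://purl.org/linked-data/sdmx/2009/attribute#"
EX = "http://example.org/code/"  # hypothetical namespace for this sketch


def observation_turtle(obs_id, dataset, dimensions, measure, value, unit):
    """Serialise one fact as a qb:Observation in Turtle.

    dimensions maps hypothetical dimension properties to resources,
    measure is a hypothetical measure property, unit is the attribute.
    """
    lines = [
        f"@prefix qb: <{QB}> .",
        f"@prefix sdmx-attribute: <{SDMX_ATTR}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",
    ]
    for prop, resource in dimensions.items():
        lines.append(f"    ex:{prop} ex:{resource} ;")  # dimension values
    lines.append(f"    ex:{measure} {value} ;")          # measured number
    lines.append(f'    sdmx-attribute:unitMeasure "{unit}" .')  # attribute
    return "\n".join(lines)


# Made-up example: accuracy of 37 % for CRF on DataSet1.
ttl = observation_turtle(
    "obs1", "benchmark",
    {"algorithm": "CRF", "dataset": "DataSet1"},
    "accuracy", 37, "%",
)
print(ttl)
```

A full cube would also need a `qb:DataStructureDefinition` declaring the dimensions, measures and attributes; that part is omitted here.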




   SPARQL Data Warehousing Wizard
     – Provide simple and intuitive Wizard for creating aggregation queries
         • Google-like starting point
         • Pivot-table creation similar to spreadsheets
     – Store using RDF Data Cube Vocabulary

 Linked Scientific Fact Data Warehouse for non-IT Experts
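
To make the idea of a wizard-generated aggregation query concrete, here is a small sketch that builds a SPARQL 1.1 `GROUP BY` query over Data Cube observations. It is not the CODE wizard: the function name `build_aggregation_query`, the `ex:` namespace and the property names are assumptions made for illustration only.

```python
ALLOWED_AGGREGATES = {"AVG", "SUM", "MIN", "MAX", "COUNT"}


def build_aggregation_query(dataset, dimension, measure, aggregate="AVG"):
    """Build a SPARQL 1.1 query that aggregates a measure per dimension value.

    dataset, dimension and measure are local names in a hypothetical
    ex: namespace; aggregate must be one of the SPARQL set functions above.
    """
    aggregate = aggregate.upper()
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    return "\n".join([
        "PREFIX qb: <http://purl.org/linked-data/cube#>",
        "PREFIX ex: <http://example.org/code/>",
        f"SELECT ?{dimension} ({aggregate}(?value) AS ?result)",
        "WHERE {",
        "  ?obs a qb:Observation ;",
        f"       qb:dataSet ex:{dataset} ;",
        f"       ex:{dimension} ?{dimension} ;",
        f"       ex:{measure} ?value .",
        "}",
        f"GROUP BY ?{dimension}",
    ])


# One row of a pivot table: mean accuracy per algorithm.
query = build_aggregation_query("benchmark", "algorithm", "accuracy")
print(query)
```

Restricting the aggregate to a fixed set keeps user input from the wizard out of the query keywords; a real implementation would validate the local names as well.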


Aggregate: Current Status
   Representation and Storage
     – Data Model implemented
     – Triplification of Benchmarking Data (e.g. CLEF, TPC-H etc.)
     We are looking for data

   SPARQL Data Warehousing Wizard




Analyse: Approach and Challenges
   Visual Analytics for Linked Scientific Facts
     – RDF based description of visualisations
         • Glue between data and single visualisations
         • Make visualisation state explicit
         • Share visualisation state

     – HTML 5 based visualisations and visualisation wizard
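
One way to picture an explicit, shareable visualisation state is as a small record that can be written out as triples. The sketch below assumes a hypothetical `vis:` vocabulary and `ex:` namespace; the deck does not specify the actual vocabulary, so every property name here is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class VisualisationState:
    """Explicit state of one chart, using hypothetical vis:/ex: terms."""
    state_id: str
    chart_type: str
    dataset: str
    x_dimension: str
    measure: str
    filters: Dict[str, str] = field(default_factory=dict)

    def to_triples(self) -> List[Triple]:
        subject = f"ex:{self.state_id}"
        triples = [
            (subject, "vis:chartType", self.chart_type),
            (subject, "vis:dataSet", f"ex:{self.dataset}"),
            (subject, "vis:xDimension", f"ex:{self.x_dimension}"),
            (subject, "vis:measure", f"ex:{self.measure}"),
        ]
        # Sorted so that two equal states always serialise identically.
        for dim, value in sorted(self.filters.items()):
            triples.append((subject, "vis:filter", f"ex:{dim}=ex:{value}"))
        return triples


state = VisualisationState(
    "state1", "bar", "benchmark", "algorithm", "accuracy",
    filters={"domain": "Biomedical"},
)
for triple in state.to_triples():
    print(triple)
```

Because the state names the data set, dimension, measure and filters, a second user could reload exactly the same view rather than a screenshot of it.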




Share: Approach and Challenges
   Provenance
     – Who published data?
     – Who modified data?

   Share aggregated data sets and annotation models
     – Build on insights created by others
     – Re-use text annotation models

   Share visual analytics applications
     – Simple visualisations might be misleading
     – Sharing whole states of a visual analysis will reveal
       more details on certain decisions




Why should YOU do it?




Marketplace concept for research data
 Users (= researchers) will be enabled to “sell” their analysis results
  (or give them away for free)
 Several concepts to be investigated: revenue chains, roles, models
  (donations, paid subscriptions for data feeds, purchase etc.)
 Increased opportunities for researchers and research data
 [Diagram labels: integrate, crowdsource, extract, organise, visualise]

 Find us, join us, ask us, help us
         http://code-research.eu/
http://www.facebook.com/CODEresearchEU
           #CODEresearchEU

I-Know presentation: CODE - Commercially empowered Linked Open Data Ecosystems in Research
