THE CHALLENGE OF
DEEPER KNOWLEDGE
GRAPHS FOR SCIENCE
PAUL GROTH | @PGROTH | PGROTH.COM
CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
OUTLINE
▸Research Performance
▸Knowledge Graphs
▸Research as a low resource domain
▸Quality
Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research.
Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
WHY?
INFORMATION OVERLOAD
WHY?
IN PRACTICE
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017).
Searching Data: A Review of Observational Data Retrieval Practices.
arXiv preprint arXiv:1707.06937.
Some observations from @gregory_km's
survey & interviews:
• The needs and behaviors of specific user groups (e.g.
early career researchers, policy makers, students) are
not well documented.
• Participants require details about data collection and
handling
• Reconstructing data tables from journal articles,
using general search engines, and making direct
data requests are common.
K Gregory, H Cousijn, P Groth, A Scharnhorst, S Wyatt (2018).
Understanding Data Retrieval Practices: A Social Informatics Perspective.
arXiv preprint arXiv:1801.04971
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
Why shouldn’t a search on an author return information
about the author, including the author’s works? Where
was the author born, when did she live, what is she
known for? … All of this is possible, but only if we can
make some fundamental changes in our approach to
bibliographic description. ... The challenge for us lies in
transforming what we can of our data into
interrelated “things” without overindulging that
metaphor.
Coyle, K. (2016). FRBR, before and after: a look at our bibliographical
models. Chicago: ALA Editions.
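Coyle's point can be made concrete with a minimal sketch, assuming bibliographic records have already been turned into subject-predicate-object triples. All identifiers (e.g. `person:P1`, `work:W1`), predicate names (`creator`, `bornIn`) and the helper `describe_author` are hypothetical; the sketch only shows that once authors, places and works are linked "things", a lookup on an author can return facts about the person alongside their works.

```python
from collections import defaultdict

# Hypothetical toy graph: every identifier and value is illustrative,
# not drawn from any real catalogue.
TRIPLES = [
    ("person:P1", "name", "Example Author"),
    ("person:P1", "bornIn", "place:X1"),
    ("place:X1", "name", "Example City"),
    ("work:W1", "title", "First Example Work"),
    ("work:W1", "creator", "person:P1"),
    ("work:W2", "title", "Second Example Work"),
    ("work:W2", "creator", "person:P1"),
]


def build_index(triples):
    """Index triples by subject, and by (predicate, object) for reverse lookups."""
    by_subject = defaultdict(list)
    by_pred_obj = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
        by_pred_obj[(p, o)].append(s)
    return by_subject, by_pred_obj


def describe_author(author_id, triples):
    """Return the author's own facts, their birthplace name, and the titles
    of all works that list them as creator."""
    by_subject, by_pred_obj = build_index(triples)
    facts = dict(by_subject[author_id])  # assumes one value per predicate
    born_in = facts.get("bornIn")
    # Follow the link to the place entity to get a readable name.
    birthplace = dict(by_subject[born_in]).get("name") if born_in else None
    works = sorted(
        dict(by_subject[w])["title"] for w in by_pred_obj[("creator", author_id)]
    )
    return {"facts": facts, "birthplace": birthplace, "works": works}
```

A real catalogue would use a graph store and a query language rather than in-memory dictionaries; the dictionaries here just keep the idea visible.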
ENTER
KNOWLEDGE
GRAPHS
ERNST, PATRICK, ET AL. "DEEPLIFE: AN ENTITY-
AWARE SEARCH, ANALYTICS AND EXPLORATION
PLATFORM FOR HEALTH AND LIFE SCIENCES."
PROCEEDINGS OF ACL-2016 SYSTEM
DEMONSTRATIONS (2016): 19-24.
Knowledge Graphs: The
Science System
Knowledge Graphs:
Curated Databases
From: Wikidata as a semantic framework for the Gene Wiki initiative
Database (Oxford). 2016;2016. doi:10.1093/database/baw015
RESEARCH IS
DIVERSE
http://knowescape.org/map-of-science-an-update/
Augenstein, Isabelle, et al. "SemEval 2017 Task 10:
ScienceIE-Extracting Keyphrases and Relations from
Scientific Publications." Proceedings of the 11th
International Workshop on Semantic Evaluation
(SemEval-2017). 2017.
SCIENTIFIC TEXT IS CHALLENGING
UNSUPERVISED & DISTANT SUPERVISION
EXAMPLE: UNIVERSAL SCHEMAS AND REVERB
Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion http://www.akbc.ws/2016/papers/3_Paper.pdf
• Successful in predicting new triples
(F1 =~ .7)
• ReVerb’s relations very interesting,
but recall very low
• Was not domain independent
• Matched arguments against a
medical ontology to improve
precision
• Predicted relations were restricted
to relation types from the same
ontology
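The two precision steps in the bullets above (matching arguments against an ontology and restricting relations to the ontology's relation types) can be sketched as a simple filter, together with the precision/recall/F1 arithmetic used to score predicted triples. The ontology terms, relation types and the predicted and gold triples below are hypothetical toy data; this is not the pipeline or data from the cited paper and does not reproduce its F1 figure.

```python
# Hypothetical ontology: entity labels and relation types are illustrative.
ONTOLOGY_ENTITIES = {"aspirin", "headache", "ibuprofen", "fever"}
ONTOLOGY_RELATIONS = {"treats", "causes"}

# Hypothetical system output and gold standard.
PREDICTED = {
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "is sold in", "pharmacies"),
    ("patient", "treats", "headache"),
}
GOLD = {
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "causes", "bleeding"),
}


def filter_by_ontology(triples, entities, relations):
    """Keep triples whose subject and object match ontology entities
    and whose relation is one of the ontology's relation types."""
    return {
        (s, r, o)
        for s, r, o in triples
        if s.lower() in entities and o.lower() in entities and r in relations
    }


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 over exact triple matches."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

On this toy data, filtering drops the two triples with off-ontology arguments or relations, so precision rises from 0.5 to 1.0 while recall stays at 2/3. The same filter also illustrates the cost noted above: it can never surface a relation type the ontology does not already contain.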
OPEN INFORMATION EXTRACTION IN SCIENCE IS
HARD
Open Information Extraction on Scientific Text: An Evaluation.
Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018
Example:
“The patient was treated with Emtricitabine,
Etravirine, and Darunavir”
‣ (The patient :: was treated with :: Emtricitabine,
Etravirine, and Darunavir)
Another possible extraction is:
‣ (The patient :: was treated with :: Emtricitabine)
‣ (The patient :: was treated with :: Etravirine)
‣ (The patient :: was treated with :: Darunavir)
698 unique relation types – 400 relation types
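The difference between the two extraction styles in the example can be illustrated by post-processing the first (one triple with a coordinated object) into the second (one triple per drug). The sketch below is a naive, hypothetical heuristic that splits on commas and a final "and"/"or"; it is not how any particular Open IE system works, and it will wrongly split entity names that themselves contain commas or the word "and".

```python
import re

# Separators: ", and" / ", or", a bare comma, or a standalone " and " / " or ".
_COORD_SEP = re.compile(r",\s*(?:and|or)\s+|,\s*|\s+(?:and|or)\s+")


def split_coordinated_object(triple):
    """Split (subject, relation, "A, B, and C") into one triple per conjunct.
    Objects without a coordination are returned unchanged as a one-item list."""
    subj, rel, obj = triple
    conjuncts = [part.strip() for part in _COORD_SEP.split(obj) if part.strip()]
    return [(subj, rel, c) for c in conjuncts]
```

Applied to `("The patient", "was treated with", "Emtricitabine, Etravirine, and Darunavir")`, it yields the three per-drug triples listed above.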
CROWDS ARE NOT EXPERTS
Use of Internal Testing Data to Help Determine Compensation for
Crowdsourcing Tasks
Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
TRANSFER LEARNING
Sujit Pal @ Elsevier Labs
TRANSFER LEARNING & MACHINE DEPENDENCIES
QUALITY IS
DEPENDENT
ON SOURCES
PROVENANCE
SOURCES AREN’T JUST DATA
Lauruhn, Michael, and Paul Groth. "Sources of
Change for Modern Knowledge Organization
Systems." Knowledge Organization 43, no. 8
(2016).
A MORE TRANSPARENT SUPPLY CHAIN
Groth, Paul, "Transparency and Reliability in the Data Supply Chain," IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, March-April 2013, doi: 10.1109/MIC.2013.41
1) https://www.elsevier.com/connect/how-elsevier-is-breaking-down-barriers-to-reproducibility
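One way to make "quality is dependent on sources" operational is to keep provenance on every assertion, so that when a source is retracted or an extraction tool changes, the affected part of the graph can be found. This is a minimal sketch under that assumption; the field names are only loosely modelled on the W3C PROV terms `prov:wasDerivedFrom` and `prov:wasGeneratedBy`, and all identifiers (`doc:A`, `extractor-v1`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    derived_from: str  # source document (loosely like prov:wasDerivedFrom)
    generated_by: str  # extraction activity or tool (loosely like prov:wasGeneratedBy)


# Hypothetical graph fragment with provenance attached to each assertion.
KG = [
    Assertion("aspirin", "treats", "headache", "doc:A", "extractor-v1"),
    Assertion("ibuprofen", "treats", "fever", "doc:B", "extractor-v1"),
    Assertion("aspirin", "treats", "fever", "doc:B", "extractor-v2"),
]


def affected_by(assertions, source=None, activity=None):
    """Return assertions that depend on the given source and/or activity,
    e.g. to re-check them after a retraction or a tool upgrade."""
    return [
        a
        for a in assertions
        if (source is None or a.derived_from == source)
        and (activity is None or a.generated_by == activity)
    ]
```

For example, `affected_by(KG, source="doc:B")` returns the two assertions extracted from `doc:B`, regardless of which tool produced them.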
REPRODUCIBILITY AS QUALITY?
QUALITY AS MORE AUTOMATION
http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html
“There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994”
RESEARCH QUESTIONS
1. Does basic lab-based
biomedical research reuse
and assemble existing
methods, or is it primarily
focused on the development
of new techniques?
2. What existing methods are
covered by robotic labs?
RESULTS
DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN
ACTIONS
http://www.researchobject.org
https://smart-api.info
CONCLUSIONS
▸Knowledge Graphs are crucial for overcoming information overload in research
▸Research has less redundancy than other domains
▸fewer resources and high diversity
▸challenge: effectively use general knowledge in these domains
▸Quality is central
▸turn towards processes and reproducibility as foundations

Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 

The Challenge of Deeper Knowledge Graphs for Science

  • 1. THE CHALLENGE OF DEEPER KNOWLEDGE GRAPHS FOR SCIENCE. PAUL GROTH | @PGROTH | PGROTH.COM. CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
  • 3. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  • 9. WHY? IN PRACTICE. Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937. Some observations from @gregory_km's survey and interviews: • The needs and behaviors of specific user groups (e.g., early career researchers, policy makers, students) are not well documented. • Participants require details about data collection and handling. • Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common. Gregory, K., Cousijn, H., Groth, P., Scharnhorst, A., & Wyatt, S. (2018). Understanding Data Retrieval Practices: A Social Informatics Perspective. arXiv preprint arXiv:1801.04971
  • 10. THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER. ANSWERS ARE ABOUT THINGS, NOT JUST WORKS. Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor. Coyle, K. (2016). FRBR, before and after: a look at our bibliographic models. Chicago: ALA Editions.
  • 11. ENTER KNOWLEDGE GRAPHS. Ernst, Patrick, et al. "DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences." Proceedings of ACL-2016 System Demonstrations (2016): 19-24.
  • 13. Knowledge Graphs: Curated Databases. From: Wikidata as a semantic framework for the Gene Wiki initiative. Database (Oxford). 2016;2016. doi:10.1093/database/baw015
  • 15. SCIENTIFIC TEXT IS CHALLENGING. Augenstein, Isabelle, et al. "SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications." Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017.
  • 16. UNSUPERVISED & DISTANT SUPERVISION EXAMPLE: UNIVERSAL SCHEMAS AND REVERB. Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion. http://www.akbc.ws/2016/papers/3_Paper.pdf • Successful in predicting new triples (F1 ≈ 0.7) • ReVerb’s relations were very interesting, but recall was very low • Was not domain independent: • Matched arguments against a medical ontology to improve precision • Predicted relations were restricted to relation types from the same ontology
  • 17. OPEN INFORMATION EXTRACTION IN SCIENCE IS HARD. Open Information Extraction on Scientific Text: An Evaluation. Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018. Example: “The patient was treated with Emtricitabine, Etravirine, and Darunavir” ‣ (The patient :: was treated with :: Emtricitabine, Etravirine, and Darunavir) Another possible extraction is: ‣ (The patient :: was treated with :: Emtricitabine) ‣ (The patient :: was treated with :: Etravirine) ‣ (The patient :: was treated with :: Darunavir) 698 unique relation types – 400 relation types
  • 18. CROWDS ARE NOT EXPERTS. Use of Internal Testing Data to Help Determine Compensation for Crowdsourcing Tasks. Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
  • 19. TRANSFER LEARNING Sujit Pal @ Elsevier Labs
  • 20. TRANSFER LEARNING & MACHINE DEPENDENCIES
  • 23. SOURCES AREN’T JUST DATA Lauruhn, Michael, and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  • 24. A MORE TRANSPARENT SUPPLY CHAIN. Groth, Paul, "Transparency and Reliability in the Data Supply Chain," Internet Computing, IEEE, vol. 17, no. 2, pp. 69-71, March-April 2013, doi: 10.1109/MIC.2013.41
  • 26. QUALITY AS MORE AUTOMATION
  • 31. http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html “There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994”
  • 33. RESEARCH QUESTIONS 1. Does basic lab-based biomedical research reuse and assemble existing methods, or is it primarily focused on the development of new techniques? 2. What existing methods are covered by robotic labs?
  • 35. DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN ACTIONS. http://www.researchobject.org https://smart-api.info
  • 36. CONCLUSIONS ▸Knowledge Graphs are crucial for overcoming information overload in research ▸Research has less redundancy than other domains ▸fewer resources and high diversity ▸challenge: effectively using general knowledge in these domains ▸Quality is central ▸turn towards processes and reproducibility as foundations
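Slide 17 contrasts a single Open IE extraction whose object is a coordinated list ("Emtricitabine, Etravirine, and Darunavir") with three finer-grained triples, one per drug. A minimal sketch of that splitting step follows, assuming triples are plain (subject, relation, object) string tuples and that coordination is marked only by commas and "and". The function name `split_coordinated_object` is hypothetical; this is an illustrative heuristic, not the method used in the COLING 2018 evaluation.

```python
import re
from typing import List, Tuple

# A triple is (subject, relation, object), as in (The patient :: was treated with :: X).
Triple = Tuple[str, str, str]

# Assumed coordination markers: a comma (optionally followed by "and"),
# or a bare " and " between conjuncts.
_COORDINATION = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def split_coordinated_object(triple: Triple) -> List[Triple]:
    """Split a coordinated object into one triple per conjunct.

    Naive heuristic (assumption): it also splits entity names that
    contain "and" or commas, which a parser-based approach would avoid.
    """
    subject, relation, obj = triple
    conjuncts = [part.strip() for part in _COORDINATION.split(obj) if part.strip()]
    if len(conjuncts) <= 1:
        # Nothing to split: return the original triple unchanged.
        return [triple]
    return [(subject, relation, conjunct) for conjunct in conjuncts]


if __name__ == "__main__":
    extraction = ("The patient", "was treated with",
                  "Emtricitabine, Etravirine, and Darunavir")
    for s, r, o in split_coordinated_object(extraction):
        print(f"({s} :: {r} :: {o})")
```

Applied to the slide's example, this turns the single extraction into the three per-drug triples. Which granularity is "correct" is exactly the kind of judgment that makes evaluating Open IE on scientific text hard.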

Editor's Notes

  1. Work with DANS. Reviewed 400 papers; deep dive into 114.
  2. Cloud-based labs provide remote access to frequently used experimental equipment and are able to support increasingly complex protocols (e.g., transcriptic.com, Emerald Cloud Lab).