The Challenge of Deeper Knowledge Graphs for Science

Over the past five years, we have seen multiple successes in the development of knowledge graphs for supporting science, in domains ranging from drug discovery to social science. However, to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low-resource domains and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e., experiments) into knowledge graphs.

  1. THE CHALLENGE OF DEEPER KNOWLEDGE GRAPHS FOR SCIENCE
     PAUL GROTH | @PGROTH | PGROTH.COM
     CONTRIBUTIONS: RON DANIEL, MICHAEL LAURUHN & @ELSEVIERLABS TEAM
  2. OUTLINE
     ▸ Research Performance
     ▸ Knowledge Graphs
     ▸ Research as a low resource domain
     ▸ Quality
  3. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  4. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  5. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  6. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  7. Bloom, N., Jones, C. I., Van Reenen, J., & Webb, M. (2017). Are ideas getting harder to find? (No. w23782). National Bureau of Economic Research. Slides: https://web.stanford.edu/~chadj/slides-ideas.pdf
  8. WHY? INFORMATION OVERLOAD
  9. WHY? IN PRACTICE
     Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A., & Wyatt, S. (2017). Searching Data: A Review of Observational Data Retrieval Practices. arXiv preprint arXiv:1707.06937.
     Some observations from @gregory_km survey & interviews:
     • The needs and behaviors of specific user groups (e.g. early career researchers, policy makers, students) are not well documented.
     • Participants require details about data collection and handling.
     • Reconstructing data tables from journal articles, using general search engines, and making direct data requests are common.
     Gregory, K., Cousijn, H., Groth, P., Scharnhorst, A., & Wyatt, S. (2018). Understanding Data Retrieval Practices: A Social Informatics Perspective. arXiv preprint arXiv:1801.04971.
  10. ANSWERS ARE ABOUT THINGS, NOT JUST WORKS
     THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
     "Why shouldn’t a search on an author return information about the author, including the author’s works? Where was the author born, when did she live, what is she known for? … All of this is possible, but only if we can make some fundamental changes in our approach to bibliographic description. ... The challenge for us lies in transforming what we can of our data into interrelated “things” without overindulging that metaphor."
     Coyle, K. (2016). FRBR, before and after: a look at our bibliographical models. Chicago: ALA Editions.
  11. ENTER KNOWLEDGE GRAPHS
     ERNST, PATRICK, ET AL. "DEEPLIFE: AN ENTITY-AWARE SEARCH, ANALYTICS AND EXPLORATION PLATFORM FOR HEALTH AND LIFE SCIENCES." PROCEEDINGS OF ACL-2016 SYSTEM DEMONSTRATIONS (2016): 19-24.
  12. Knowledge Graphs: The Science System
  13. Knowledge Graphs: Curated Databases
     From: Wikidata as a semantic framework for the Gene Wiki initiative. Database (Oxford). 2016;2016. doi:10.1093/database/baw015
  14. RESEARCH IS DIVERSE
     http://knowescape.org/map-of-science-an-update/
  15. SCIENTIFIC TEXT IS CHALLENGING
     Augenstein, Isabelle, et al. "SemEval 2017 Task 10: ScienceIE-Extracting Keyphrases and Relations from Scientific Publications." Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017.
  16. UNSUPERVISED & DISTANT SUPERVISION
     EXAMPLE: UNIVERSAL SCHEMAS AND REVERB
     Groth et al., Applying Universal Schemas for Domain Specific Ontology Expansion. http://www.akbc.ws/2016/papers/3_Paper.pdf
     • Successful in predicting new triples (F1 ≈ 0.7)
     • ReVerb’s relations very interesting, but recall very low
     • Was not domain independent
     • Matched arguments against a medical ontology to improve precision
     • Predicted relations were restricted to relation types from the same ontology
  17. OPEN INFORMATION EXTRACTION IN SCIENCE IS HARD
     Open Information Extraction on Scientific Text: An Evaluation. Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr. COLING 2018
     Example: “The patient was treated with Emtricitabine, Etravirine, and Darunavir”
     ‣ (The patient :: was treated with :: Emtricitabine, Etravirine, and Darunavir)
     Another possible extraction is:
     ‣ (The patient :: was treated with :: Emtricitabine)
     ‣ (The patient :: was treated with :: Etravirine)
     ‣ (The patient :: was treated with :: Darunavir)
     698 unique relation types – 400 relation types
  18. CROWDS ARE NOT EXPERTS
     Use of Internal Testing Data to Help Determine Compensation for Crowdsourcing Tasks. Michael Lauruhn, Paul Groth, Corey Harper, Helena Deus. HUML 2018
  19. TRANSFER LEARNING
     Sujit Pal @ Elsevier Labs
  20. TRANSFER LEARNING & MACHINE DEPENDENCIES
  21. QUALITY IS DEPENDENT ON SOURCES
  22. PROVENANCE
  23. SOURCES AREN’T JUST DATA
     Lauruhn, Michael, and Paul Groth. "Sources of Change for Modern Knowledge Organization Systems." Knowledge Organization 43, no. 8 (2016).
  24. A MORE TRANSPARENT SUPPLY CHAIN
     Groth, Paul, "Transparency and Reliability in the Data Supply Chain," Internet Computing, IEEE, vol.17, no.2, pp.69,71, March-April 2013. doi: 10.1109/MIC.2013.41
  25. REPRODUCIBILITY AS QUALITY?
     1) https://www.elsevier.com/connect/how-elsevier-is-breaking-down-barriers-to-reproducibility
  26. QUALITY AS MORE AUTOMATION
  27. http://blog.booleanbiotech.com/genetic_engineering_pipeline_python.html
     “There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994”
  28. RESEARCH QUESTIONS
     1. Does basic lab-based biomedical research reuse and assemble existing methods, or is it primarily focused on the development of new techniques?
     2. What existing methods are covered by robotic labs?
  29. RESULTS
  30. DIRECTION: GROUNDING KNOWLEDGE GRAPHS IN ACTIONS
     http://www.researchobject.org
     https://smart-api.info
  31. CONCLUSIONS
     ▸ Knowledge Graphs are crucial for overcoming information overload in research
     ▸ Research has less redundancy than other domains
       ▸ less resources and high diversity
       ▸ challenge: effectively use general knowledge in these domains
     ▸ Quality is central
       ▸ turn towards processes and reproducibility as foundations
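
The example on slide 17 shows two ways an Open Information Extraction system might represent the same sentence: one triple with a coordinated object, or one triple per drug. The sketch below illustrates that difference only. It is not the method evaluated in the cited COLING 2018 paper; the function name `split_coordinated_object` and the comma/"and" splitting heuristic are assumptions made for illustration, and a real system would likely rely on a syntactic parser instead.

```python
import re
from typing import List, Tuple

# A (subject, relation, object) triple, as written on slide 17.
Triple = Tuple[str, str, str]


def split_coordinated_object(triple: Triple) -> List[Triple]:
    """Expand a triple whose object is a coordinated list into one triple per item.

    Hypothetical helper: splits the object on commas and the word "and".
    This naive heuristic will mis-split objects that contain commas or
    "and" for other reasons (e.g. names of compounds or institutions).
    """
    subject, relation, obj = triple
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", obj)
    items = [part.strip() for part in parts if part.strip()]
    return [(subject, relation, item) for item in items]


triple = (
    "The patient",
    "was treated with",
    "Emtricitabine, Etravirine, and Darunavir",
)

for expanded in split_coordinated_object(triple):
    print(" :: ".join(expanded))
```

Whether the single triple or the three expanded triples count as correct affects how extractions are scored, which is one reason evaluating Open IE on scientific text is hard.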