This document discusses integrating Covid-19 bioassays into the Open Research Knowledge Graph to make their information more accessible and useful. It describes how bioassays can be manually structured or automatically classified using ontologies to extract key details as semantic triples. An example shows how three bioassays were semantically integrated into the ORKG to generate a comparison table. This helps scientists quickly understand assay findings versus traditional searching. The document invites collaborations on semantically structuring more bioassays and evaluating their hybrid automatic system.
Comparing Covid-19 Bioassays in the Open Research Knowledge Graph
1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/344170585
Integrating Covid-19 Bioassays in the Open Research Knowledge
Graph
Presentation · September 2020
DOI: 10.13140/RG.2.2.10022.55362
4 authors, including:
Marco Anteghini
Jennifer D'Souza
Leibniz Information Centre for Science and Technology University Library
All content following this page was uploaded by Jennifer D'Souza on 14 September 2020.
2. Anita Monteverdi, Marco Anteghini, Jennifer D’Souza¹, Sören Auer¹
¹Technische Informationsbibliothek (TIB)
Welfengarten 1B // 30167 Hannover
Integrating Covid-19 Bioassays in
the Open Research Knowledge Graph
4. ● Bioassays measure "the potency of any stimulus, physical, chemical, or
biological, by means of the reactions that it produces in living matter" [1].
● They are standard biochemical test procedures that determine the concentration or
potency of a substance by its effect on living cells or tissues [2]. They yield
a detailed quantification of the stimulus by observing its effects on
living animals (in vivo) or tissue/cell culture systems (in vitro).
● A typical bioassay applies a stimulus (e.g., chemicals, drug compounds)
to a subject (e.g., animals, tissues, plants), then measures the response (e.g.,
death) that it triggers in the subject.
Bioassays: what are they?
[1] Finney, D. J. 1952a. Statistical Method in Biological Assay. New York: Hafner Publishing.
[2] Hoskins, W. M.; Craig, R. (1962-01-01). "Uses of Bioassay in Entomology". Annual Review of Entomology. 7 (1): 437–464.
doi:10.1146/annurev.en.07.010162.002253. ISSN 0066-4170. PMID 14449182.
5. ● Bioassays were developed during the early twentieth century at
Rothamsted Station in England to study the potency of insecticides.
● Sir R. A. Fisher and other statisticians developed experimental designs and
basic statistical procedures in collaboration with toxicologists and entomologists
(Bliss, 1934a,b; Finney, 1952a; Gaddum, 1933).
● Prior to the 1970s, bioassays were used only to measure the toxicity of particular
chemicals in, for example, medical, pharmacological, or agricultural studies (McKee
and Wolf, 1963; Sprague, 1969).
● During the 1970s, the EPA began to use bioassays to determine the effects of
chemicals and thereby establish water-quality criteria for sewage treatment
(e.g., United States Environmental Protection Agency, 1973, 1986).
● Over the last decade, bioassays and biostatistical analysis have become more
important in effectively controlling the quality of biopharmaceutical development and
manufacturing.
Bioassays: a brief historical context
7. What if ...
● The global scientific knowledge base were more than a document repository
● Scientific information and knowledge were FAIR for machines as well
○ The FAIR data principles are a set of guiding principles for making
scientific data findable, accessible, interoperable, and reusable in the current
digital ecosystem (Wilkinson et al., 2016)
● Currently
○ Findability could be better
○ Assuming open access (OA), accessibility is OK
○ Interoperability and reusability are non-existent
● The problem: The scholarly communication format is stuck in the last century
○ While documents have merely been digitized as PDFs, other areas such as e-commerce
and navigation maps have seen a transformative digitalization with knowledge
graphs.
8. Open Research Knowledge Graph: what is it?
● It is a next-generation digital library (DL) that focuses on ingesting information in
scholarly articles as machine-actionable knowledge graphs (KG).
● In it, an article is represented with both (bibliographic) metadata and semantic
descriptions (as subject-predicate-object triples) of its contributions.
● Scientific knowledge elements in approaches and methods, materials and
results, otherwise buried in document text, are made explicitly
machine-actionable as graph nodes and links.
9. Open Research Knowledge Graph: platform benefits
Using the ORKG platform has several advantages:
1) it enables flexible semantic content modeling (i.e., ontologized or not, depending on
the user or domain);
2) it semantifies contributions at various levels of granularity, from shallow to
fine-grained; and
3) it publishes a persistent KG link for each article contribution that it contains.
11. Covid-19 Bioassays in the Open Research Knowledge Graph
● In the context of the current pandemic, bioassays are a key pillar of disease
investigation and drug discovery.
○ Worldwide, billions are invested in such Covid-19 research.
● Problem: Bioassays are documented as text.
● An early 20th-century scientist could have read the whole bioassay
canon when the field was new. With the burgeoning volume of bioassays, this is
no longer feasible, even for narrow studies.
● Intelligent computational tools must be made available to scientists, particularly
in times of crisis, to assist them in massive knowledge ingestion scenarios so that
they can quickly grasp just the highlights of findings (or bioassay results),
thereby enabling rapid discoveries.
● This is precisely the computational support targeted in our work: the
ORKG Digital Library enables it over bioassays represented as
knowledge graphs instead of text, which highlights the benefits
both of digitalizing bioassays and of the ORKG DL platform.
12. Covid-19 Bioassays in the Open Research Knowledge Graph:
how can bioassays be integrated?
● Manual Digitalization Workflow: structuring key information units from bioassay description
text as subject-predicate-object triples based on the Bioassay Ontology (BAO) [1].
Examples of triples: (CONTRIBUTION, HAS ASSAY FORMAT, TISSUE-BASED FORMAT),
(CONTRIBUTION, HAS ASSAY METHOD, REPORTER GENE), among others.
● Recommended Step: associating each ontologized resource (i.e., a subject, a predicate, an
object) with a URI as its defining class in the original ontology, which for bioassays is the BAO.
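The manual workflow and the recommended URI step can be sketched roughly as follows. This is a minimal illustration only: the two triples are the ones quoted on the slide, while the `Triple` type, the `to_uri` helper, and the `http://example.org/...` namespace are hypothetical placeholders, not real BAO identifiers or any ORKG API.

```python
from typing import NamedTuple

# Hypothetical placeholder namespace (assumption); real BAO class IRIs
# must be looked up in the ontology itself.
EXAMPLE_NS = "http://example.org/bao-placeholder/"


class Triple(NamedTuple):
    """One subject-predicate-object statement about a contribution."""
    subject: str
    predicate: str
    obj: str


def to_uri(label: str) -> str:
    """Map a label to a placeholder URI (not a real BAO identifier)."""
    return EXAMPLE_NS + label.lower().replace(" ", "_").replace("-", "_")


# The two example triples given on the slide.
triples = [
    Triple("CONTRIBUTION", "HAS ASSAY FORMAT", "TISSUE-BASED FORMAT"),
    Triple("CONTRIBUTION", "HAS ASSAY METHOD", "REPORTER GENE"),
]

# Recommended step: associate every resource in a triple with a URI.
uri_triples = [Triple(*(to_uri(part) for part in t)) for t in triples]

for t in uri_triples:
    print(t.subject, t.predicate, t.obj)
```

In practice the lookup would resolve each label against the ontology rather than derive a URI from the label text.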
[1] Visser, U., Abeyruwan, S., Vempati, U., Smith, R.P., Lemmon, V., Schürer, S.C.: Bioassay ontology (BAO): a semantic description of bioassays and high-throughput screening
results. BMC Bioinformatics 12(1), 257 (2011)
13. ● The example can be browsed in the ORKG DL at the following
persistent link: https://www.orkg.org/orkg/paper/R48178,
or found by searching for its title
● This example bioassay [1] was semantified with eight
properties based on the BAO
○ A bioassay can have 30 or more properties,
depending on its content
Covid-19 Bioassays in the Open Research Knowledge Graph:
EXAMPLE OF A MANUALLY SEMANTIFIED BIOASSAY IN ORKG
[1] National Center for Biotechnology Information (2020). PubChem Bioassay Record for AID 2522, Source: Broad Institute. Retrieved September 8, 2020
from https://pubchem.ncbi.nlm.nih.gov/bioassay/2522.
14. Covid-19 Bioassays in the Open Research Knowledge Graph:
how can bioassays be integrated?
● Hybrid Digitalization Workflow (WIP): given a new bioassay text as input, a two-step
workflow is applied: 1) an automated semantifier; and 2) human-in-the-loop curation of
the predicted labels, either by the assay author or by a dedicated curator.
● Advantage: Unlike the manual workflow, this presents a much easier and less time-intensive
task for the human.
● The human merely selects the correctly predicted triples, deletes the incorrect ones,
or defines new ones as needed. Assuming a well-trained machine learning module, the
latter two steps may be omitted entirely.
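The two-step hybrid workflow might look roughly like the sketch below. Everything here is an assumption made for illustration: `predict_triples` is a hard-coded stand-in for the automated semantifier (the actual system is work in progress and is not described on the slides), and the "CELL-BASED FORMAT" prediction is an invented example of an incorrect label.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def predict_triples(assay_text: str) -> List[Triple]:
    """Stand-in for step 1, the automated semantifier (hypothetical).

    A real system would classify the text; this stub returns fixed,
    illustrative predictions, one of which is deliberately wrong.
    """
    return [
        ("CONTRIBUTION", "HAS ASSAY FORMAT", "TISSUE-BASED FORMAT"),
        ("CONTRIBUTION", "HAS ASSAY FORMAT", "CELL-BASED FORMAT"),  # invented error
        ("CONTRIBUTION", "HAS ASSAY METHOD", "REPORTER GENE"),
    ]


def curate(
    predicted: List[Triple],
    accepted: Set[int],
    additions: Iterable[Triple] = (),
) -> List[Triple]:
    """Step 2: keep accepted predictions, drop the rest, add new triples."""
    kept = [t for i, t in enumerate(predicted) if i in accepted]
    return kept + [t for t in additions if t not in kept]


predicted = predict_triples("...bioassay description text...")
# The curator accepts predictions 0 and 2 and rejects prediction 1.
curated = curate(predicted, accepted={0, 2})

for triple in curated:
    print(triple)
```

With a well-trained model the curator would accept every prediction, so the deletion and addition paths would go unused.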
15. Covid-19 Bioassays in the Open Research Knowledge Graph:
AS A SOLUTION TO THE MASSIVE BIOASSAY INFORMATION INGESTION HURDLE
● Premise: We need an intelligent computational processing tool that
biomedical practitioners can use to quickly comprehend bioassays' key properties.
● The ORKG Digital Library has a computational feature to generate and publish surveys in the
form of tabulated comparisons from the semantified articles’ Knowledge Graph nodes.
● To demonstrate this feature, we integrated three separate bioassays, semantified via the
manual workflow, into the ORKG.
● When we apply the comparison feature over the three assay KGs, we get a tabulated comparison.
16. Covid-19 Bioassays in the Open Research Knowledge Graph
SOLVING THE INFORMATION INGESTION HURDLE: COMPARISON SURVEYS ACROSS
KG-BASED BIOASSAYS
● The assay graph nodes are aggregated in tabulated
comparisons.
● This computation aligns closely with the notion of
the traditional survey article, except that it is fully
automated and operates on machine-actionable
knowledge elements.
● The BAO-semantified assays are compared side-by-side
on their graph nodes.
● Thus, tracking progress on bioassays can be
eased from a task of several days to one of a few minutes,
saving the precious time and labor that the
scientist would have spent in the current traditional
search paradigm on identifying the key findings of
individual bioassays and then having to recall them
comparatively.
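The side-by-side comparison can be approximated with a short sketch. It is not the ORKG's implementation: the `compare` function, the assay names "Assay A" to "Assay C", and the way property values are distributed across them are assumptions chosen only to show how per-assay graph nodes line up in a table.

```python
from typing import Dict, List


def compare(assays: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Align assays on their properties; '-' marks a missing value."""
    # Collect properties in first-seen order across all assays.
    properties: List[str] = []
    for props in assays.values():
        for name in props:
            if name not in properties:
                properties.append(name)

    header = ["Property"] + list(assays)
    rows = [
        [prop] + [props.get(prop, "-") for props in assays.values()]
        for prop in properties
    ]
    return [header] + rows


# Illustrative data only (assumption): three hypothetical assays.
assays = {
    "Assay A": {
        "HAS ASSAY FORMAT": "TISSUE-BASED FORMAT",
        "HAS ASSAY METHOD": "REPORTER GENE",
    },
    "Assay B": {"HAS ASSAY FORMAT": "TISSUE-BASED FORMAT"},
    "Assay C": {"HAS ASSAY METHOD": "REPORTER GENE"},
}

table = compare(assays)
for row in table:
    print(" | ".join(row))
```

Each row corresponds to one property and each column to one assay, so gaps and agreements between assays are visible at a glance.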
17. ● The discovery of cures during pandemics such as the current Covid-19 pandemic can be greatly
expedited if scientists are given intelligent information access tools.
● To this end, we propose "Covid-19 Bioassays in the Open Research Knowledge Graph".
● We outlined two separate workflows for integrating bioassay text as knowledge graphs in the
ORKG DL.
● We demonstrated advanced computational functionality (e.g., automatic survey generation) that
scholarly knowledge represented as knowledge graphs enables.
● We invite collaborations on the following topics: 1) semantically structuring bioassays in the
ORKG; 2) user evaluation of our hybrid system for automatically structuring bioassay data.
● If interested, please contact me at jennifer.dsouza@tib.eu.
Conclusion: Takeaways
18. Questions?
Thank you for your attention!
Call For Participation: NLPContributionGraph Shared Task 11
at SemEval 2021 (https://ncg-task.github.io/)
• Structuring Scholarly NLP Contributions in the ORKG automatically
• Runs from 1st October until February next year