JCDL2020 full paper.
Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.
doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf
1. Towards Knowledge Maintenance in Scientific
Digital Libraries with the Keystone Framework
Yuanxi Fu & Jodi Schneider
School of Information Sciences
University of Illinois at Urbana-Champaign
Presentation for JCDL 2020, Virtual, 2020-08-02
2. Motivation: Need for KNOWLEDGE MAINTENANCE
• When you’re not an expert, how do you judge papers?
– Recency
– Citation count
– Other heuristics
• Literature becomes obsolete, but usually that’s not explicit.
• Invalid or obsolete findings can be passed on & can lead to misinterpretations.
– Most research needs information from cognate fields.
3. How Big Is the Problem? Retraction & Citation of Retracted Papers
• The Retraction Watch Database lists over 19,000 retracted publications.
• Over 600,000 articles directly cite a retracted paper.
• In biomedicine, 94% of retracted papers have received at least one citation, with an average citation count of 35 (Dinh et al., 2019).
4. Motivating Questions
• Are papers citing a retracted paper
necessarily wrong?
• Does it matter when citing authors make
use of a paper whose findings are no longer
considered valid?
• When DOES the citation matter?
• Could we selectively alert authors who cite a
retracted or abandoned paper?
6. Under our framework:
1) A scientific research paper puts forward at
least one main finding, along with a
logical argument, giving reasons and
evidence to support the main finding.
2) The main finding is accepted (or not) on
the basis of the logical argument.
3) Evidence from earlier literature may be
incorporated into the argument by citing a
paper and presenting it as support, using a
citation context.
8. Applying the Framework:
Step 2, find the citation contexts that contribute to the argument
[Diagram: Citing Article → Cited Article]
• Citing article: our paper, containing the citation context “Many papers with known validity problems are highly cited [3].”
• Cited article: [3] = Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles.
9. Applying the Framework:
Step 3, analyze the citation context:
How many items are cited?
• Singleton citation context: “Many papers with known validity problems are highly cited [3].”
• Cluster citation context: “Digital library applications of argumentation theory include argument-based retrieval [21, 30]”
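The singleton/cluster distinction in Step 3 can be approximated mechanically for numeric bracket-style references. The sketch below is an illustration only, not the authors' tooling, and assumes markers like ‘[3]’ or ‘[21, 30]’:

```python
import re

def classify_citation_context(context: str) -> str:
    """Classify a citation context as 'singleton' or 'cluster' by
    counting the reference items in its bracketed citation markers.
    Assumes numeric bracket style, e.g. '[3]' or '[21, 30]'."""
    markers = re.findall(r"\[([\d,\s]+)\]", context)
    # Collect every cited item number across all markers in the context.
    items = [n for m in markers
             for n in re.split(r"[,\s]+", m.strip()) if n]
    if not items:
        return "no citation"
    return "singleton" if len(items) == 1 else "cluster"

print(classify_citation_context(
    "Many papers with known validity problems are highly cited [3]."))
# singleton
print(classify_citation_context(
    "Digital library applications of argumentation theory "
    "include argument-based retrieval [21, 30]"))
# cluster
```

Author–year styles such as ‘(DeKosky and Scheff, 1990; Scheff and Price, 2006)’ would need a different pattern; the singleton/cluster decision itself is the same count.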
10. Applying the Framework:
Step 4, analyze the cited article:
What kind of support does it give?
• Citation context: “Many papers with known validity problems are highly cited [3].”
• Cited article: [3] = Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles.
• The cited article’s main findings support the citation context (main-findings support).
12. Case Study 1: Citing Non-reproducible Code
Assess the real impact of
citing an unreliable paper
13. Case Study 2: Citations Supporting One Paper’s Argument
Curate a high-impact paper (citation count > 1000) to find its keystone citations (de Calignon et al., 2012).
14. Experts & Non-experts
Workflow for Assessing the Citations of an Unreliable Paper
Step 1 (domain expert): Develop a generalized argument model.
Step 2 (domain expert): Develop a list of screening questions.
Step 3 (experts, non-experts, or text mining tools): Screen target articles using the checklist.
Step 4: Flag those articles that are potentially impacted.
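The screening and flagging steps of this workflow could be sketched as a simple pipeline. Everything below is an invented illustration, assuming the expert-written checklist from Step 2; the paper names and answers are hypothetical:

```python
# Step 2 (illustrative): checklist written by a domain expert.
screening_questions = [
    "Did the paper directly follow the protocol of the unreliable paper?",
    "Do calculations based on that protocol support claims in the "
    "abstract or conclusion section?",
]

# Step 3 (illustrative): each target article is screened against the
# checklist; one boolean answer per question. Answers are invented.
screened = {
    "paper_A": [False, False],  # cited only as background information
    "paper_B": [True, True],    # followed the protocol; claims depend on it
}

# Step 4: flag articles whose answers indicate potential impact.
flagged = [paper for paper, answers in screened.items() if all(answers)]
print(flagged)
# ['paper_B']
```

In Case Study 1, this is the logic by which 4 of 10 citing papers were flagged as potentially affected.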
15. Workflow for Targeted Curation of Important Papers (Experts)
Step 1: Identify main findings of the article from the abstract, conclusion section, and section titles.
Step 2: Construct an argument diagram for each main finding.
Step 3: Align citations and citation contexts to components in the argument diagrams.
Step 4: Identify and categorize keystone citation contexts and keystone citations.
16. Results from Case Study 1
• Unaffected by the code glitch (6 papers)
  – Why? They didn’t directly use the protocol; they cited it either to support decisions they made in their calculations or as background information.
  – What to do? Nothing!
• Affected by the code glitch (4 papers)
  – Why? They followed the protocol, and calculations supported claims that went into abstracts or the conclusion section.
  – What to do? Authors of the 4 potentially affected papers should double-check their results and either amend their claims or document how the claims are sustained despite the code glitch.
17. Results from Case Study 2
• 51 citations in the whole paper
• 5 singleton keystone citations:
  • 3 main-finding singleton keystone citations support the choice of experimental materials
  • 1 pass-through keystone citation supports the choice of experimental materials
  • 1 main-finding singleton keystone citation supports the interpretation of data
• One keystone citation cluster with 3 reference items supports the choice of experimental method.
18. An ad-hoc literature review from
pass-through keystone citation 1
was used to support the choice of
PSD-95 as a synaptic marker.
19. Research Agenda
• Scale up: Large-scale identification of keystone citations
– Test argument-based curation approaches that have already been
automated (e.g. rhetorical-based approaches, Teufel & Kan, 2009)
– Develop text mining tools to aid manual curation and screening
• Understand citation behaviors, esp. pass-through citations
• Develop a taxonomy of validity for indicating the confidence a reader can have in relying on or reusing the methods and findings of a paper
20. References
Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles. Scientometrics. 116, 3 (Jun.
2018), 1771–1783. https://doi.org/10.1007/s11192-018-2802-y
Clark, T., Ciccarese, P., & Goble, C.A. (2014). Micropublications: a semantic model for claims, evidence,
arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5, 28.
https://doi.org/10.1186/2041-1480-5-28
Dinh, L., Sarol, J., Cheng, Y., Hsiao, T., Parulian, N.N., & Schneider, J. (2019). Systematic Examination of Pre- and
Post-Retraction Citations. Proceedings of ASIST, 56(1), 390–394. https://doi.org/10.1002/pra2.35
Green, N.L. (2017). Argumentation Mining in Scientific Discourse. CMNA@ICAIL.
http://ceur-ws.org/Vol-2048/paper02.pdf
Greenberg, S.A. (2009). How citation distortions create unfounded authority: analysis of a citation network.
The BMJ, 339, b2680. https://doi.org/10.1136/bmj.b2680
Teufel, S., & Kan, M. (2009). Robust Argumentative Zoning for Sensemaking in Scholarly Documents.
NLP4DL/AT4DL, 154-170. https://doi.org/10.1007/978-3-642-23160-5_10
de Calignon, A., Polydoro, M., Suárez-Calvet, M., William, C., Adamowicz, D. H., Kopeikina, K. J., Pitstick, R.,
Sahara, N., Ashe, K. H., Carlson, G. A., Spires-Jones, T. L., & Hyman, B. T. (2012). Propagation of tau
pathology in a model of early Alzheimer's disease. Neuron, 73(4), 685–697.
https://doi.org/10.1016/j.neuron.2011.11.033
23. Three Types of Keystone Citation Contexts Observed
Properties of the keystone citation context:
(a) Removing the citation context would weaken the argument supporting a main finding.
(b) Only one paper is cited.
(c) The main findings of the cited paper(s) provide evidence to support the argument.
• Singleton, main-findings support (a: +, b: +, c: +) → main-finding keystone citation
• Cluster, main-findings support (a: +, b: −, c: +) → main-finding keystone citation cluster
• Singleton, pass-through support (a: +, b: +, c: −) → pass-through keystone citation
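The three observed types follow mechanically from the three properties. As a sketch of that classification (not the authors' implementation), the decision table could be written as:

```python
def keystone_type(weakens_argument: bool,
                  single_item: bool,
                  main_findings_support: bool) -> str:
    """Map the three observed properties of a citation context to its
    keystone citation context type (a sketch of the decision table
    above, not the authors' implementation)."""
    if not weakens_argument:
        # Removing it would not weaken the argument for a main finding.
        return "not a keystone citation context"
    if single_item and main_findings_support:
        return "main-finding keystone citation"
    if not single_item and main_findings_support:
        return "main-finding keystone citation cluster"
    if single_item and not main_findings_support:
        return "pass-through keystone citation"
    # Cluster without main-findings support: not observed in the case studies.
    return "unclassified"

print(keystone_type(True, True, True))   # main-finding keystone citation
print(keystone_type(True, False, True))  # main-finding keystone citation cluster
print(keystone_type(True, True, False))  # pass-through keystone citation
```

The fourth combination (cluster, no main-findings support) is left open because the table lists only the three types observed.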
25. The fourth main-finding
keystone citation supports
the interpretation of
experimental data.
Main-finding keystone citation to
support data interpretation
26. Main-finding keystone citation cluster to support material
27. A main-finding keystone citation
cluster supports the choice of
experimental method.
Main-finding keystone citation cluster
to support method
29. Concepts in the Framework
• Keystone statement: any statement whose unreliability
threatens the argument for a main finding of a paper.
• Keystone citation context: citation contexts supporting
keystone statements.
– singleton vs. cluster citation context.
• A singleton citation context cites one item, e.g. ‘[2]’
• A cluster citation context cites multiple items, e.g., ‘[2, 16]’ or ‘(DeKosky and
Scheff, 1990; Scheff and Price, 2006; Terry et al., 1991)’.
31. Concepts in the Framework
• Keystone statement: any statement whose unreliability threatens
the argument for a main finding of a paper.
• Keystone citation context: citation contexts supporting keystone
statements.
– Strength of support
• Main-findings support, if the citation context closely relates to a main finding of the
cited item.
• Pass-through support, if support can be found within the cited item but only in an
unsupported statement or a statement referencing one or more other work(s).
• No clear support, if the citation context does not clearly relate to the cited item, neither to its main findings nor to any other statement it makes.
32. Concepts in the Framework
• Keystone Statement (KS): any statement whose unreliability threatens the argument for a main finding of a paper.
[Argument diagram: Data and Method support the Main Finding “Tau pathology results in neurodegeneration”; the Keystone Statement “XXX is a good synaptic biomarker for detecting neurodegeneration” supports the Method (choice of synaptic marker).]
33. Concepts in the Framework
• Keystone Citation Context (KCC): a citation context that supports a keystone statement.
[Diagram: a Keystone Citation Context supports the Keystone Statement “XXX is a good synaptic biomarker for detecting neurodegeneration”.]
34. Keystone Citation Contexts Continued
• How many items are cited?
  – Singleton
  – Cluster
• Do the cited item’s main findings support the citation context?
  – Main-finding support
  – Pass-through support
  – No support
35. Distinction between KS and KCC
• KSs are summations of KCCs: domain experts can distill the same KS from several KCCs found in different papers.
• Greenberg (2009) found that several citation contexts can be distilled to a statement about β amyloid accumulation, but the cited papers mentioned the statement only as a “hypothesis” or not at all.
Statement: The accumulation of β amyloid occurs early and precedes other abnormalities.
– Citing article (as fact): “The appearance of Aβ-positive, noncongophilic deposits precedes vacuolization in IBM muscle fibers.”
– Cited article (as hypothesis): “Some muscle fibers had Aβ-positive accumulations,… Those muscle fibers,…, may represent early changes of IBM.”
37. Rhetoric-based Approaches (Teufel & Kan, 2009)
• Extract information based on rhetorical features
• Limited need for domain knowledge
• Provide a coarser argumentative structure
• Well automated
38. Argument-scheme-based Approaches (Green, 2017)
Example scheme:
Premises:
• A group of individuals G have atypical phenotype P.
• All of the individuals in G have atypical genotype M.
• Another group of individuals (controls) do not have P.
• None of the controls have M.
Conclusion: M may be the cause of P (in G).
Characteristics:
• Require deep domain analysis
• Show the logic of how a discipline justifies its findings
• Identify potential weaknesses through critical questions
• Currently depend on manual curation, but potentially automatable through text mining
39. Provenance-based Method
(Clark et al., 2014)
• Model scientists’ work process
• Most suitable for modeling
empirical research articles
• Require manual curation