This is a quick, high-level tour of some ideas from evidence-based medicine and from citation-related ontologies for argumentation and evidence curation in biomedicine.
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017-05-22
1. What WikiCite can learn from
biomedical citation networks
Jodi Schneider
WikiCite, 2017-05-22
jschneider@pobox.com
http://jodischneider.com/jodi.html
@jschneider
2. My hats
• Librarian
now: Professor of future librarians and info managers
• Ontologist
• Journal editor/founder (not quite a publisher!)
co-founder of open access Code4Lib Journal c. 2007
• Researcher
scholarly communication, Biomedical informatics, Linked Data,
argumentation, Computer-Supported Collaboration, Wikipedia
• Wikipedian
mainly EN.WP, EN.WikiQuote
• User
acawiki.org (Community Manager c. 2009)
bibliographic management software
articles, books, etc. in many, many fields
5. Hierarchy of Evidence
Figure credit: SUNY Downstate Medical Center. Medical Research
Library of Brooklyn. Evidence Based Medicine Course. A Guide to
Research Methods: The Evidence Pyramid:
http://library.downstate.edu/EBM2/2100.htm
8. Approaches include:
• Appraisal
• Aggregation
Figure credit: Forest plot from Underhill, Kristen, Paul
Montgomery, and Don Operario. "Sexual abstinence only
programmes to prevent HIV infection in high income countries:
systematic review." BMJ 335.7613 (2007): 248.
9. Figure credit: Duke University Medical Center Library. Introduction to
Evidence-based Practice. What is Evidence-Based Practice (EBP)?
http://guides.mclibrary.duke.edu/c.php?g=158201&p=1036021
Approaches include:
• Appraisal
• Aggregation
• Contextualization
12. What supports it? What holds it up?
By Biochem1 (Own work) CC BY-SA 3.0 via Wikimedia
Commons
https://commons.wikimedia.org/wiki/File:.جنگا_ست_یکJPG
13. Can it be shored up?
By Biochem1 (Own work) CC BY-SA 3.0 via Wikimedia Commons
https://commons.wikimedia.org/wiki/Category:Jenga#/media/File:
.جنگاJPG
18. “This” work agrees with…
• “This is in accordance with earlier studies
in the ambulatory surgical setting [3]” -
PMC1637100
Jodi Schneider, Graciela Rosemblat, Shabnam Tafreshi and Halil Kilicoglu.
Rhetorical moves and audience considerations in the discussion sections
of Randomized Controlled Trials of health interventions. To be presented
at European Conference on Argumentation, June 2017
19. Definitions and background info
• “Self-efficacy, which may relate to
motivation, is the perceived confidence in
one's ability to accomplish a specific task
[19].” - PMC2194735
20. Presenting a range of evidence
• “Except in one study [20], short-term
administration of GH transiently worsened
insulin resistance [19,53] and increased
fasting glucose levels [53].” - PMC1865086
21. Clause-level changes in meaning
• “Two of four randomised clinical trials
…have found a difference in admission
rate [12,19] and two have not [22,23].” -
PMC1142326
22. A single citation can support a
whole paragraph
• “Dutton and colleagues [8] described a series
of 81 coagulopathic trauma patients treated
with rFVIIa. Of these, 20 received rFVIIa for
treatment of coagulopathy related to TBI. Six
of these patients had additional polytrauma.
The outcome of these patients was poor and
15 of 20 patients died. The authors attributed
this high mortality rate to the severity of brain
injury. None of the 81 trauma patients in this
series had any clinical indication of TE
events.”
23. Discussing treatments, outcomes,
other authors’ conclusions
• “Dutton and colleagues [8] described a series
of 81 coagulopathic trauma patients treated
with rFVIIa. Of these, 20 received rFVIIa for
treatment of coagulopathy related to TBI. Six
of these patients had additional polytrauma.
The outcome of these patients was poor and
15 of 20 patients died. The authors
attributed this high mortality rate to the
severity of brain injury. None of the 81 trauma
patients in this series had any clinical
indication of TE events.”
24. Sometimes several parallel
paragraphs.
• “Dutton and colleagues [8] described a
series of 81 …patients treated with rFVIIa”
• “Zaaroor and Bar-Lavie [23] reported the
first series of five patients …”
• “Morenski and colleagues [24] described
…three pediatric … cases”
25. Multiple citations in a paragraph
• “Berger et al. [42] compared the efficacy of
hypertonic saline and mannitol to reduce ICP after
a combination of two different neuronal injuries.
Initially, ….The authors demonstrated that …After
…. It is remarkable that … An accumulation
…These different effects … [42]. Furthermore,
Prough et al. observed a higher regional cerebral
blood flow in dogs with induced intracerebral
hemorrhage after hypertonic saline without any
increase of the CPP [43].” - PMC1297608
26. Avoiding a 1-sentence paragraph?
• “Berger et al. [42] compared the efficacy of
hypertonic saline and mannitol to reduce ICP after
a combination of two different neuronal injuries.
Initially, ….The authors demonstrated that …After
…. It is remarkable that … An accumulation
…These different effects … [42]. Furthermore,
Prough et al. observed a higher regional cerebral
blood flow in dogs with induced intracerebral
hemorrhage after hypertonic saline without any
increase of the CPP [43].” - PMC1297608
27. “[Y]ou can transform a fact into
fiction or a fiction into fact just by
adding or subtracting references”
- Bruno Latour
28. As claims get cited, they become facts:
Voorhoeve et al., Cell, 2006:
Hypothesis: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...
Implication: Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,
Yabuta, JBioChem 2007:
Cited Implication: ... two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
Raver-Shapira et al., JMolCell 2007:
Fact: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)
Slide credit: Anita DeWaard: 'Stories that persuade with data' - talk at CENDI meeting January 9 2014
https://www.slideshare.net/anitawaard/stories-that-persuade-with-data-talk-at-cendi-meeting-january-9-2014/6
29. “The conversion of hypothesis to
fact through citation alone.”
- Steven Greenberg
30. Greenberg, Steven A.
"Understanding belief using
citation networks." Journal of
Evaluation in Clinical
Practice 17.2 (2011): 389-393.
http://dx.doi.org/10.1111/j.1365-2753.2011.01646.x
31. “The conversion of hypothesis to fact
through citation alone.”
- Steven Greenberg
Greenberg, Steven A. "How citation distortions create unfounded
authority: analysis of a citation network." BMJ 339 (2009): b2680.
https://doi.org/10.1136/bmj.b2680
32. Funded grants with citation bias &
citation distortion.
Greenberg, Steven A. "How citation distortions create unfounded
authority: analysis of a citation network." BMJ 339 (2009): b2680.
https://doi.org/10.1136/bmj.b2680
34. SEPIO – evidence lines
Brush, Matthew, Kent Shefchek, and Melissa Haendel. "SEPIO: a
semantic model for the integration and analysis of scientific
evidence." International Conference on Biomedical Ontology and
BioCreative. 2016. http://ceur-ws.org/Vol-1747/IT605_ICBO2016.pdf
“A proposition has_evidence
one or more evidence lines, which have_supporting_data
one or more data items used in evaluation of the
proposition’s truth.”
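The quoted sentence describes a small data structure: a proposition points to evidence lines, and each evidence line points to data items. A minimal sketch of that shape follows. The class names and the `data_items` helper are hypothetical and chosen only for illustration; only the two relation names (`has_evidence`, `has_supporting_data`) come from the quote, and this is not SEPIO's actual OWL model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItem:
    """A piece of data used in evaluating a proposition (hypothetical class)."""
    description: str


@dataclass
class EvidenceLine:
    """Groups the data items that together bear on a proposition."""
    label: str
    has_supporting_data: List[DataItem] = field(default_factory=list)


@dataclass
class Proposition:
    """A statement whose truth is being evaluated."""
    text: str
    has_evidence: List[EvidenceLine] = field(default_factory=list)

    def data_items(self) -> List[DataItem]:
        """Flatten every data item reachable through the evidence lines."""
        return [d for line in self.has_evidence for d in line.has_supporting_data]


# Placeholder content only; no real study data is encoded here.
prop = Proposition(text="example proposition")
line = EvidenceLine(label="E1", has_supporting_data=[DataItem("example data item")])
prop.has_evidence.append(line)
print(len(prop.data_items()))  # 1
```

Keeping evidence lines as a separate layer between the proposition and the data is what lets different curators group the same data differently.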
35. SEPIO – evidence lines example
Brush, Matthew, Kent Shefchek, and Melissa Haendel. "SEPIO: a
semantic model for the integration and analysis of scientific
evidence." International Conference on Biomedical Ontology and
BioCreative. 2016. http://ceur-ws.org/Vol-1747/IT605_ICBO2016.pdf
“A simplified account of existing evidence related to this proposition is presented below,
presenting summaries of five evidence lines (E1-E5) from five studies relevant to the
classification of the variant for Fabry Disease:
E1. Six affected individuals with the variant were found to have reduced GLA enzyme
activity.
E2. The variant was absent from 528 unaffected controls.
E3. The variant is predicted to cause abnormal splicing that inserts additional sequence.
E4. Pedigree analyses showed Fabry Disease phenotypes segregating with the variant.
E5. Population databases show high frequency of individuals homozygous for the variant.”
38. SEE
Bölling, Christian, Michael Weidlich, and Hermann-Georg Holzhütter.
"SEE: structured representation of scientific evidence in the biomedical
domain using Semantic Web techniques." Journal of Biomedical
Semantics 5.1 (2014): 1.
41. Micropublications
Clark, Tim, Paolo N. Ciccarese, and Carole A. Goble.
"Micropublications: a semantic model for claims, evidence, arguments
and annotations in biomedical communications." Journal of Biomedical
Semantics 5.28 (2014). http://dx.doi.org/10.1186/2041-1480-5-28
42. Jodi Schneider, Paolo Ciccarese, Tim Clark, Richard D. Boyce. “Using the Micropublications ontology and the
Open Annotation Data Model to represent evidence within a drug-drug interaction knowledge base.” Linked
Science at ISWC 2014 http://ceur-ws.org/Vol-1282/lisc2014_submission_8.pdf
44. Cataloging evidence types for
knowledge bases.
Boyce, R.D.: A Draft Evidence Taxonomy and Inclusion Criteria for the
Drug Interaction Knowledge Base (DIKB),
http://purl.net/net/drug-interaction-knowledge-base/evidence-types-and-inclusion-criteria
45. Biological Expression Language
Rastegar-Mojarad, Majid, Ravikumar Komandur Elayavilli, and
Hongfang Liu. "BELTracker: evidence sentence retrieval for BEL
statements." Database 2016 (2016).
See also: http://openbel.org
47. Voorhoeve et al. (116) employed a novel
strategy by combining an miRNA vector library
and corresponding bar code array…
miR-372 and miR-373 were consequently found
to permit proliferation and tumorigenesis of
these primary cells carrying both oncogenic RAS
and wild-type p53,
probably through direct inhibition of the
expression of the tumor-suppressor LATS2 and
subsequent neutralization of the p53 pathway.
to identify miRNAs that when overexpressed
could substitute for p53 loss and allow continued
proliferation in the context of Ras activation
TAC Corpus:
Curated Collection of 500 Citing > 50 Cited Papers
Voorhoeve et al. (2006), A Genetic
Screen …
In mammals, a near-perfect complementarity between
miRNAs and protein coding genes almost never exists, making
it difficult to directly pinpoint relevant downstream targets of
a miRNA. Several algorithms were developed that predict
miRNA targets, most notably TargetScanS, PicTar, and
miRanda (John et al., 2004, Lewis et al., 2005 and Robins et al.,
2005).
These programs predict dozens to hundreds of target genes
per miRNA, making it difficult to directly infer the cellular
pathways affected by a given miRNA. Furthermore, the
biological effect of the downregulation depends greatly on the
cellular context, which exemplifies the need to deduce miRNA
functions by in vivo genetic screens in well-defined model
systems.
The cancerous process can be modeled by in vitro neoplastic
transformation assays in primary human cells (Hahn et al.,
1999). Using this system, sets of genetic elements required for
transformation were identified. For example, the joint
expression of the telomerase reverse transcriptase subunit
(hTERT),
oncogenic H-RASV12, and SV40-small t antigen combined with
the suppression of p53 and p16INK4A were sufficient to
render primary human fibroblasts tumorigenic (Voorhoeve
and Agami, 2003).
Goal
Method
Result
Conclusion
Citing Papers
Reference Paper
Slide credit: Anita DeWaard:
Argumentation in biology papers
https://www.slideshare.net/anitawaard/argumentation-in-biology-papers/27
48. Take away messages
• Biomedicine has evolved multiple
approaches for managing and appraising
individual papers and bodies of “facts”.
• Citations come in many shapes and sizes.
• Citations may support “facts” – as part of a
larger scientific fabric that includes data,
evidence, arguments.
• Powermove: identify the critical supports
(think Jenga)
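One way to make the Jenga idea concrete is to treat a "fact" as resting on a graph of citations and ask which single items cannot be pulled out without cutting it off from primary data. The sketch below assumes a toy network; all node names, the `PRIMARY` set, and both function names are hypothetical, and real citation data would need far more care (for example, deciding what counts as primary data).

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy network: each key cites (rests on) the listed items.
SUPPORTS: Dict[str, List[str]] = {
    "claim": ["review_A", "paper_B"],
    "review_A": ["paper_C"],
    "paper_B": ["paper_C"],
    "paper_C": ["dataset_D"],
}
PRIMARY: Set[str] = {"dataset_D"}  # items assumed to hold primary data


def reaches_primary(start: str, removed: Set[str]) -> bool:
    """Breadth-first search from start to any primary item, skipping removed ones."""
    if start in removed:
        return False
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node in PRIMARY:
            return True
        for nxt in SUPPORTS.get(node, []):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return False


def critical_supports(claim: str) -> List[str]:
    """Items whose removal alone cuts the claim off from all primary data."""
    candidates = {n for targets in SUPPORTS.values() for n in targets}
    return sorted(n for n in candidates if not reaches_primary(claim, {n}))


print(critical_supports("claim"))  # ['dataset_D', 'paper_C']
```

In this toy case the review and paper_B are interchangeable routes, while paper_C and the dataset are the blocks that hold the tower up.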
49. Take away messages
• Network analysis may help identify
problematic citation practices.
• Modeling arguments can help identify the
robustness of a claimed “fact”.
• Semantic models could enable inference-
based reasoning and citation network
querying.
• Relevant citation corpora exist.
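As a rough illustration of querying a citation network for problematic practices, the sketch below measures how citations are split between primary papers that support a claim and those that are critical of it. The paper IDs, stance labels, and the `citation_share` function are all hypothetical; this is a simple indicator in the spirit of the citation-bias analysis cited earlier in the deck, not a reproduction of that method.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical labels for primary-data papers: does each support or criticize the claim?
STANCE: Dict[str, str] = {"P1": "supportive", "P2": "supportive", "P3": "critical"}

# Hypothetical citation edges as (citing paper, cited paper).
CITATIONS: List[Tuple[str, str]] = [
    ("R1", "P1"), ("R1", "P2"),
    ("R2", "P1"),
    ("R3", "P1"), ("R3", "P3"),
]


def citation_share(citations: List[Tuple[str, str]], stance: Dict[str, str]) -> Dict[str, float]:
    """Fraction of citations to labelled primary papers that go to each stance."""
    counts = Counter(stance[cited] for _, cited in citations if cited in stance)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


share = citation_share(CITATIONS, STANCE)
print(share)  # {'supportive': 0.8, 'critical': 0.2}
```

Here two of three primary papers are supportive but they receive four of five citations. A gap like that would be a prompt for closer reading, not proof of bias on its own.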
54. Anita’s insight:
Detect and Track Metadiscourse
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition of the expression of the tumor
suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373
were found to allow proliferation of primary human cells that express
oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor
LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and-373,
function as potential novel oncogenes in testicular germ cell tumors by
inhibition of LATS2 expression, which suggests that Lats2 is an important
tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly
inhibit the expression of Lats2, thereby allowing tumorigenic growth in the
presence of p53 (Voorhoeve et al., 2006).”
Slide credit: Based on Anita DeWaard: How to persuade with data
https://www.slideshare.net/anitawaard/stories-thatpersuadev-4/18
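A very simple way to start tracking this drift is to look for hedging cues in each citing sentence. The sketch below runs a tiny, assumed cue list over the four sentences quoted on this slide (bracketed asides and citation markers trimmed). The cue list and the `find_hedges` function are hypothetical; a serious detector would need a curated lexicon and would have to work out what each hedge actually modifies.

```python
import re
from typing import Dict, List

# Assumed, deliberately tiny cue list for illustration only.
HEDGE_CUES = ["possibly", "probably", "potential", "suggests", "may", "might"]

# Sentences quoted on the slide, keyed by the paper they come from.
STATEMENTS: Dict[str, str] = {
    "Voorhoeve et al., 2006": (
        "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
        "direct inhibition of the expression of the tumor suppressor LATS2."
    ),
    "Kloosterman and Plasterk, 2006": (
        "In a genetic screen, miR-372 and miR-373 were found to allow proliferation "
        "of primary human cells that express oncogenic RAS and active p53, possibly "
        "by inhibiting the tumor suppressor LATS2."
    ),
    "Yabuta et al., 2007": (
        "two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in "
        "testicular germ cell tumors by inhibition of LATS2 expression, which "
        "suggests that Lats2 is an important tumor suppressor."
    ),
    "Okada et al., 2011": (
        "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression "
        "of Lats2, thereby allowing tumorigenic growth in the presence of p53."
    ),
}


def find_hedges(text: str) -> List[str]:
    """Return the hedge cues that occur as whole words, in cue-list order."""
    lowered = text.lower()
    return [cue for cue in HEDGE_CUES if re.search(rf"\b{re.escape(cue)}\b", lowered)]


for paper, sentence in STATEMENTS.items():
    print(paper, find_hedges(sentence))
```

On these four sentences the 2011 citation is the only one with no cue at all, which matches the slide's point. Note that the cues found in the 2007 sentence do not hedge the LATS2 inhibition itself, so cue counting alone is too coarse.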
55. Metadiscourse: some progress
• Hedging cues, speculative language, modality/negation:
• Light et al [6]: finding speculative language
• Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and
directionality
• Thompson et al (Sophia) [8]: level of speculation, type/source
of the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9], among many others):
• Holder of the opinion, strength, polarity as ‘mathematical
function’ acting on main propositional content
• Can make this part of the semantic web: (e.g., Ontology for
Reasoning, Certainty and Attribution, ORCA [10]):
• Value (Presumed True, Probable, Possible, Unknown)
• Source (Author, Named Other, Unknown)
• Basis (Data, Reasoning, Unknown)
Slide credit: Anita DeWaard: How to persuade with data
https://www.slideshare.net/anitawaard/stories-thatpersuadev-4/19
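The three ORCA dimensions listed on this slide map naturally onto enumerations. The sketch below is a plain-Python stand-in, not the ontology itself: the class names and the `AnnotatedStatement` record are hypothetical, and the labels attached to the example phrase are an illustrative guess, not an annotation taken from the ORCA paper.

```python
from dataclasses import dataclass
from enum import Enum


class Value(Enum):
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class AnnotatedStatement:
    """A text span tagged along the three dimensions (hypothetical record type)."""
    text: str
    value: Value
    source: Source
    basis: Basis


# Illustrative labelling only; the basis is left Unknown on purpose.
example = AnnotatedStatement(
    text="possibly through direct inhibition of the expression of the tumor suppressor LATS2",
    value=Value.POSSIBLE,
    source=Source.AUTHOR,
    basis=Basis.UNKNOWN,
)
print(example.value.value, "/", example.source.value, "/", example.basis.value)
```

Fixed vocabularies like these make it possible to compare how the same claim is labelled across citing papers.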
56. Anita’s citations
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts,
speculations, and statements in between. BioLINK 2004: Linking Biological
Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical
text annotations: definitions, guidelines and corpus construction. BMC
Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S.
(2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp
Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions.
Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Schneider, J. (2012) An Ontology of Reasoning,
Certainty and Attribution (ORCA), ISWC 2012,
http://ceur-ws.org/Vol-930/p2.pdf
Editor's Notes
Is Wikipedia a gateway to biomedical research? (Lauren Maggio)
Which evidence do we take into account for a given purpose?
How trustworthy and valid is a given scientific “fact”?
What supports it? What line of research established it? How many pieces can we remove before it tumbles?
Greenberg, Steven A. "How citation distortions create unfounded authority: analysis of a citation network." BMJ 339 (2009): b2680. https://doi.org/10.1136/bmj.b2680
Latour, Bruno. Science in action: How to follow scientists and engineers through society. Harvard University Press, 1987. p33
“As is commonly the case, different evidence is used by each lab - either because certain data were not accessible, or some labs judged certain data to be unreliable or irrelevant to the claim, or some labs interpreted the same data in different ways. SEPIO translates this scenario into the following narrative and set of instances to be represented in its formal modeling of the data.”
“A model of the evidence for and against the assertion escitalopram does not inhibit CYP2D6. This is based on the Micropublications ontology, and reuses the evidence taxonomy (dikbEvidence), terms (dikb), and data from the DIKB. The Drug Ontology (DRON) and Protein Ontology (PRO) are reused in semantic qualifiers. A more detailed view of Method Me1 is shown in Figure 1.”
“The Biological Expression Language (BEL) is a language for representing scientific findings in the life sciences in a computable form. BEL is designed to represent scientific findings by capturing causal and correlative relationships in context, where context can include information about the biological and experimental system in which the relationships were observed, the supporting publications cited and the curation process used.” http://openbel.org