Once Upon a Time...
- There was too much scientific information
(43,848 papers on p53)
- And it was all written in stories...
- Find a structure for research articles
that allows computer-aided access to knowledge
- Start with Research Articles in Cell Biology
- Expand to other genres/domains?
- How do we extract this structure?
- How do we use this structure?
Pragmatic
1. Colloquial: practical, vs. theoretical
2. Linguistic: 'meaning of linguistic messages in
their context of use' (per/il/locutionary goals):
speech acts, conversational maxims, face principles, deixis, …
(English 306A; Harris 5)
3. Pragmatic web: 'quality of goal-oriented
discourse in communities'
Genre + Discourse Studies
- Science is written in text, as a story
- Text is created by humans to persuade
other humans (peers) that claims are facts
- To tell the computer how we encode our
knowledge, we need to understand:
=> How do humans tell stories?
=> How do stories make sense?
Work on corpus
- Corpus of 14 coherent (citing, cited)
articles in Cell Biology, based around …
- Hand-modeled ASCII text; created XML
- Manual annotation (by me, with small-scale user validation)
1st Attempt: Classical rhetoric
Aristotle | Part | Quintilian | Description | Cell | APA Style
prooimion | Introduction | exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience. | Introduction | Introduction
prothesis | Statement of Facts | narratio | The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge. | Introduction | Introduction
 | Summary | propositio | Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Abstract | Abstract
 | Division/outline | partitio | Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what has been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory. | Table of Contents | Article Outline
pistis | Proof | confirmatio | Following the division/outline, or partitio, comes the main body of the speech, where one offers logical arguments as proof. The appeal to logos is … | Results | Methods, Results
 | Refutation | refutatio | The refutation follows the confirmatio, or section on proof, in a classical oration. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Discussion | Discussion
epilogos | | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion | Discussion
2nd Attempt: Story Grammar
The Story of Goldilocks and the Three Bears | Story Grammar | Paper | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with …
Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
a little girl named Goldilocks | Setting: Characters | Objects | of the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ …
She went for a walk in the forest. Pretty soon, she came upon a house. | Setting: Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of …
"This porridge is too hot!" she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1 (Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the …
So, she tasted the porridge from the second bowl. | | Data | (data not shown),
"This porridge is too cold," she said. | Outcome | Results | at 28 days both genotypes show many large holes and loss of cell integrity
So, she tasted the last bowl of porridge. | | Data | (Figures 1B-1D).
3rd Attempt: Discourse Segments
- “A text is made up of Discourse Segments
and the relations between them” - Grosz and
Sidner, Mann and Thompson, Marcu, Swales
- Discourse Segment: element that has a consistent
rhetorical/pragmatic goal (its Discourse Segment Purpose).
- Define these for biological research articles
Discourse Segments In Biology
To examine miRNA expression from the miR-Vec system,
a miR-24 minigene-containing virus was transduced into
human cells. Expression was determined using an RNase
protection assay (RPA) with a probe designed to identify
both precursor and mature miR-24 (Figure 1B).
Figure 1C shows that cells transduced with miR-Vec-24
clearly express high levels of mature miR-24,
whereas little expression was detected in control-…
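To make "segments plus the relations between them" concrete for a computer, a small data model is enough. The Python sketch below is a hypothetical illustration, not the system behind this work: the class names (Segment, Relation, DiscourseText), the relation labels ('achieved-by', 'then', 'yields') and the Goal/Method/Results labelling of the miR-Vec passage are our own assumptions. The segment types are the ones used in the marker table of this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Segment types as used in the marker table of this deck.
SEGMENT_TYPES = {"Fact", "Problem", "Goal", "Method", "Results", "Implication", "Hypothesis"}


@dataclass
class Segment:
    seg_type: str  # one of SEGMENT_TYPES
    text: str

    def __post_init__(self) -> None:
        # Reject labels outside the assumed taxonomy.
        if self.seg_type not in SEGMENT_TYPES:
            raise ValueError(f"unknown segment type: {self.seg_type}")


@dataclass
class Relation:
    source: int  # index of the earlier segment
    target: int  # index of the later segment
    label: str   # hypothetical relation name


@dataclass
class DiscourseText:
    segments: List[Segment] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add(self, seg_type: str, text: str, relation: Optional[str] = None) -> int:
        """Append a segment; optionally relate it to the previous segment."""
        self.segments.append(Segment(seg_type, text))
        idx = len(self.segments) - 1
        if relation is not None and idx > 0:
            self.relations.append(Relation(idx - 1, idx, relation))
        return idx

    def boundaries(self) -> List[Tuple[str, str]]:
        """(from_type, to_type) for each pair of adjacent segments."""
        return [(a.seg_type, b.seg_type) for a, b in zip(self.segments, self.segments[1:])]


# Illustrative labelling of the miR-Vec passage (our assumption).
doc = DiscourseText()
doc.add("Goal", "To examine miRNA expression from the miR-Vec system,")
doc.add("Method", "a miR-24 minigene-containing virus was transduced into human cells.", "achieved-by")
doc.add("Method", "Expression was determined using an RNase protection assay (RPA)", "then")
doc.add("Results", "Figure 1C shows that cells transduced with miR-Vec-24 clearly express "
        "high levels of mature miR-24", "yields")
print(doc.boundaries())
# prints [('Goal', 'Method'), ('Method', 'Method'), ('Method', 'Results')]
```

Keeping relations separate from segments leaves room for the link types discussed under Links (methods, fact, agree/disagree), including links that point into other documents.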
Discourse: A Fact(ory)
[Figure: the research article as a fact factory. Recoverable labels: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience (incongruity or ignorance); realm of models; 'we'; introduction, method, discussion; fact, fact, fact, implication, present; shared view vs. own view.]
Links (Under Construction)
- The segment type at each end of a link makes a difference:
methods link, fact link, agree/disagree link
- Not clear where to link into: is the claim truly in the referenced
document? How do we locate it?
- The main proof is usually in Results (Methods) segments: the system
needs to allow multimedia elements!
- Many taxonomies: RST, Hovy, Sanders, ClaiMaker
- Identify textual coherence/argumentation...
Boundary markers by segment type (columns: Fact, Problem, Goal, Method, Results, Implication, Hypothesis)
- Fact: in animals (3x) | however (2x), though, on average, under our conditions | to, we examined | we fused, we utilised | in contrast, we found (5x) | our data suggest, we propose that, consistent with | suggesting that (2x)
- Problem: we fused | in this paper
- Goal: we isolated | we showed
- Method: we found (2x), but while, as seen | suggests, we predicted
- Results: in addition, in contrast (2x), while (2x), second (2x), third (2x), finally (2x), thereafter, in our study | we utilised, we used | interestingly (2x), since (3x), suggesting that (8x), implicating, consistent with (2x), …g that (3x) | (strongly) also suggests/ | we propose, suggesting that
- Implication: to verify, to confirm | we however, first replaced, we fused, we tried | also (2x), interestingly (2x), consistent with, in our … | in theory
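Once segments have been annotated by hand, a table like this can be tallied automatically. The sketch below assumes, as a simplification, that the candidate marker is just the first few words of each segment; the function name boundary_marker_table, the n_words parameter and the toy clauses are invented for illustration and do not come from the corpus.

```python
from collections import Counter, defaultdict


def boundary_marker_table(segments, n_words=2):
    """Count candidate markers at each boundary between adjacent segments.

    segments: list of (segment_type, text) pairs in document order.
    n_words:  number of leading words treated as the candidate marker
              (an illustrative assumption, not a tested setting).
    Returns a dict mapping (previous_type, current_type) to a Counter of markers.
    """
    table = defaultdict(Counter)
    for (prev_type, _), (cur_type, text) in zip(segments, segments[1:]):
        # Lowercase and strip simple punctuation so "However," counts as "however".
        words = [w.strip(",.;:()").lower() for w in text.split()]
        marker = " ".join(words[:n_words])
        table[(prev_type, cur_type)][marker] += 1
    return table


# Toy, invented clauses; a real run would use the hand-annotated corpus.
segments = [
    ("Fact", "In animals, the gene is expressed in neurons."),
    ("Problem", "However, its function is not understood."),
    ("Goal", "To test its function, we examined mutant flies."),
    ("Method", "We fused the domain to a reporter."),
    ("Results", "We found degeneration in the eye."),
    ("Results", "In contrast, we found no change at day 2."),
]

table = boundary_marker_table(segments)
for (prev_type, cur_type), markers in table.items():
    print(f"{prev_type} -> {cur_type}: {dict(markers)}")
# Fact -> Problem: {'however its': 1}
# Problem -> Goal: {'to test': 1}
# Goal -> Method: {'we fused': 1}
# Method -> Results: {'we found': 1}
# Results -> Results: {'in contrast': 1}
```

With n_words=2, a single-word marker picks up the following word ('however its'), which is one reason a hand-curated marker list is still needed alongside counts like these.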
1 'To' infinitive appears as marker of Goal moves: +
2 Sequential connectives appear within the same segment type: -
3 'though', 'however', 'therefore' (causal connectives) occur at all -> Problem and -> Hypothesis boundaries: 0
4 'suggests' occurs at the Results -> Implication/Hypothesis boundary: +/0
5 'we found' / 'we observed' / 'we showed' occur at the -> Result boundary: +/0
6 'we + other verb' occurs at the -> Method boundary: 0
7 Contrast/correspondence in Fact <-> Result <-> Implication moves: +!
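Hypotheses 1 and 3-6 can be read as simple surface rules. The Python sketch below encodes them as ordered regular expressions where the first match wins; the rule order, the exact patterns, the combined labels ('Implication/Hypothesis', 'Problem/Hypothesis') and the 'Unknown' fallback are illustrative assumptions, not a validated parser. Hypothesis 2 (scored -) and hypothesis 7 are not encoded.

```python
import re

# Hypothetical rule set: (label, pattern) pairs, tried in order.
# 'we found' must be tested before the generic 'we + verb' rule.
MARKER_RULES = [
    ("Implication/Hypothesis", re.compile(r"\bsuggest(s|ing)?\b", re.IGNORECASE)),  # hypothesis 4
    ("Results", re.compile(r"\bwe (found|observed|showed)\b", re.IGNORECASE)),       # hypothesis 5
    ("Problem/Hypothesis", re.compile(r"\b(though|however|therefore)\b", re.IGNORECASE)),  # hypothesis 3
    ("Goal", re.compile(r"^\s*to [a-z]+\b", re.IGNORECASE)),  # hypothesis 1: clause-initial 'to' infinitive
    ("Method", re.compile(r"\bwe [a-z]+\b", re.IGNORECASE)),   # hypothesis 6: 'we + other verb'
]


def tag_clause(clause: str) -> str:
    """Return the label of the first rule whose marker occurs in the clause."""
    for label, pattern in MARKER_RULES:
        if pattern.search(clause):
            return label
    return "Unknown"


for clause in [
    "To examine miRNA expression from the miR-Vec system,",
    "we utilised the GAL4/UAS system",
    "Expression was determined using an RNase protection assay",
]:
    print(tag_clause(clause), "<-", clause)
```

Passive clauses with no marker, such as the RNase protection assay clause, fall through to 'Unknown', so rules of this kind can only be a first pass before manual checking.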
Research Goals fulfilled?
allow computer-aided access to knowledge:
> need to identify whether they cover this genre
> need to finalize a structure of relations
> investigate more than cell biology
how do we extract this structure?
> collaborative attempts to identify segment markers/
relationships - next step
how do we use this structure? : [ DEMO ]
> possible collaborations with sensemaking systems?
- Science is created in text
- The goal of a text is to convince peers that its claims (backed
by data) belong to the canon of facts
- Text convinces humans through rhetorical/narrative
- Text creates meaning in the human mind
- Discourse parsing could allow access to knowledge
- More work needed: collaborations?
Bio-informatics | Style Guides | Shum et al. | Harmsze | Swales | RST | Teufel | Collier et al.
Sections x x x
Moves x x x x!
Entities x x
Embedding x x
Discourse relations x x x
Argumentational x x
* Need a complete model for a multi-document collection: markup
* Unique role as a publisher: can apply/mandate at the source
Clause Classification Test
Results of the clause assignment test (8 tests handed in, avg. 38 clauses each):
51 No disagreement
10 Method/Result
3 Problem/Goal
2 Fact/Interpretation
Comments on classification:
• Incomplete sentences are unclear and hard to classify
• Add a 'Hypothesis' category, e.g. clauses 8, 33, 74a, 77, …
• Other possible categories: Assumption, Observation, …
Nr, Section: Introduction, Results, Discussion
A1 Agami, Results 4
A2 Agami, Discussion ½ 2 ½
A3 Agami, Introduction 3
S1 Serrano, Results 2
S2 Serrano, Discussion 1 1
S3 Serrano, Introduction 2
V1 Voorhoeve, Results 2
V2 Voorhoeve, 3
V3 Voorhoeve, 1 2
• Austin, J.L. How to Do Things with Words. J.O. Urmson, ed. Oxford: Clarendon Press, 1962.
• Bazerman, Charles. Shaping Written Knowledge: The Genre and Activity of the Experimental Article in Science. Madison, Wisconsin: University of Wisconsin Press, 1988.
• Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and generalisations. Artificial Intelligence and Law 11 (2003), 125-165.
• Buckingham Shum, Simon J., V. Uren, et al. Modelling Naturalistic Argumentation in Research Literatures: Representation and Interaction Design Issues. Tech Report kmi-04-28, December 2004.
• Harmsze, Frédérique. A modular structure for scientific articles in an electronic environment. PhD Thesis, February 9, 2000.
• Hovy, E. Automated discourse generation using discourse structure relations. Artificial Intelligence 63(1-2), 1993, 341-386.
• Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation 54(2), March 1998, 210-235.
• Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
• Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press, …
• Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, …