ICPW2007.deWaard
ICPW2007.deWaard Presentation Transcript

  • Science Beyond the Facts: A Pragmatic Structure for Research Articles Anita de Waard Elsevier Labs, Disruptive Technologies Utrecht University
  • Introduction
  • Once Upon a Time.... - There was too much scientific information (43,848 Papers on p53) - And it was all written in stories.... [demo Papers]
  • Research Goal - Find a structure for research articles, that allows computer-aided access to knowledge elements - Start with Research Articles in Cell Biology - Expand to other genres/domains? - How do we extract this structure? - How do we use this structure?
  • Pragmatic
    1. Colloquial: practical, vs. theoretical
    2. Linguistic: ‘meaning of linguistic messages in their context of use’ (per/il/locutionary goals)
    3. Pragmaticweb: ‘quality of goal-oriented discourse in communities’
    Speech acts, conversational maxims, face principles, deixis, …
    Meaning:
    Semantics          Pragmatics
    Propositions       Utterances
    Truth/falsity      Appropriateness
    Context-free       Context-dependent
    Language-in-vitro  Language-in-vivo
    English 306A; Harris 5
  • Method
  • Genre + Discourse Studies - Science is written in text, as a story - Text is created by humans to persuade other humans (peers, that claims are facts) - To tell the computer how we encode our knowledge, we need to understand: => How do humans tell stories? => How do stories make sense?
  • Work on corpus - Corpus of 14 coherent (citing, cited) articles in Cell Biology, based around (Voorhoeve, 2006) - Hand-modeled ascii text; created XML - Manual (by me + small user validation)
  • (Preliminary) Results
  • 1st Attempt: Classical rhetoric
    Each part of the classical oration is listed with its Aristotle and Quintilian names and the matching section in Cell and in the APA Style Guide.
    - Introduction (Aristotle: prooimion; Quintilian: exordium). The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience. Cell: Introduction. APA Style Guide: Introduction.
    - Statement of Facts (Aristotle: prothesis; Quintilian: narratio). The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge. Cell: Introduction. APA Style Guide: Introduction.
    - Summary (Quintilian: propositio). Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. Cell: Abstract. APA Style Guide: Abstract.
    - Division/outline (Quintilian: partitio). Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory. Cell: Table of Contents. APA Style Guide: Article Outline.
    - Proof (Aristotle: pistis; Quintilian: confirmatio). Following the division/outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. Cell: Results. APA Style Guide: Methods, Results.
    - Refutation (Quintilian: refutatio). Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. Cell: Discussion. APA Style Guide: Discussion.
    - (Aristotle: epilogos; Quintilian: peroratio). Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below). Cell: Discussion. APA Style Guide: Discussion.
  • 2nd Attempt: Story Grammar
    Story: The Story of Goldilocks and the Three Bears
    Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
    - Story grammar: Setting, Time. Paper element: Background.
      Story: Once upon a time
      Paper: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
    - Story grammar: Setting, Characters. Paper element: Objects of study.
      Story: a little girl named Goldilocks
      Paper: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
    - Story grammar: Setting, Location. Paper element: Experimental setup.
      Story: She went for a walk in the forest. Pretty soon, she came upon a house.
      Paper: studied and compared in vivo effects and interactions to those of the human protein
    - Story grammar: Theme, Goal. Paper element: Research goal.
      Story: She knocked and, when no one answered,
      Paper: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
    - Story grammar: Attempt. Paper element: Hypothesis.
      Story: she walked right in.
      Paper: Atx-1 may play a role in the regulation of gene expression
    - Story grammar: Episode 1, Name. Paper element: Name.
      Story: At the table in the kitchen, there were three bowls of porridge.
      Paper: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
    - Story grammar: Subgoal. Paper element: Subgoal.
      Story: Goldilocks was hungry.
      Paper: test the function of the AXH domain
    - Story grammar: Attempt. Paper element: Method.
      Story: She tasted the porridge from the first bowl.
      Paper: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
    - Story grammar: Outcome. Paper element: Results.
      Story: This porridge is too hot! she exclaimed.
      Paper: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cell (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
    - Story grammar: (none). Paper element: Data.
      Story: So, she tasted the porridge from the second bowl.
      Paper: (data not shown),
    - Story grammar: Outcome. Paper element: Results.
      Story: This porridge is too cold, she said
      Paper: both genotypes show many large holes and loss of cell integrity at 28 days
    - Story grammar: (none). Paper element: Data.
      Story: So, she tasted the last bowl of porridge.
      Paper: (Figures 1B-1D).
  • 3rd Attempt: Discourse Segments - “A text is made up of Discourse Segments and the relations between them” - Grosz and Sidner, Mann-Thompson, Marcu, Swales - Discourse Segment Purpose: element that has a consistent rhetorical/pragmatic goal. - Define for Biological Research Article
  • Discourse Segments In Biology
    <Goal> To examine miRNA expression from the miR-Vec system, </Goal>
    <Method> a miR-24 minigene-containing virus was transduced into human cells. Expression was determined using an RNase protection assay (RPA) with a probe designed to identify both precursor and mature miR-24 (Figure 1B). </Method>
    <Result> Figure 1C shows that cells transduced with miR-Vec-24 clearly express high levels of mature miR-24, whereas little expression was detected in control-transduced cells. </Result>
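
The tagged passage on the "Discourse Segments In Biology" slide can be read back into (segment type, text) pairs with a standard XML parser. The following is a minimal sketch, not the tooling used for the corpus: the segment texts are abbreviated, and the enclosing `<article>` element and the `parse_segments` helper are assumptions introduced only to make the fragment well-formed and easy to inspect.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the tagged passage from the slide.
SNIPPET = (
    "<Goal>To examine miRNA expression from the miR-Vec system,</Goal>"
    "<Method>a miR-24 minigene-containing virus was transduced into human cells.</Method>"
    "<Result>Figure 1C shows that cells transduced with miR-Vec-24 clearly "
    "express high levels of mature miR-24.</Result>"
)


def parse_segments(markup):
    """Return (segment_type, text) pairs in document order."""
    # The slide shows sibling tags with no enclosing element, so a
    # hypothetical <article> root is added to obtain well-formed XML.
    root = ET.fromstring("<article>" + markup + "</article>")
    return [(child.tag, (child.text or "").strip()) for child in root]


segments = parse_segments(SNIPPET)
for seg_type, text in segments:
    print(f"{seg_type:7s} {text}")
```

Keeping the segments as an ordered list preserves the Goal, Method, Result sequence, which is what the later segment-order counts are built on.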
  • Segments vs. Sections
    Segment      Introduction  Method  Results  Discussion  Total
    Fact                   63       0      104          37    204
    Problem                20       0       10          15     45
    Goal                    2       0       72           6     80
    Method                  2     all      129           6    137
    Result                 10       0      230          44    284
    Implication            14       0      100          36    150
    Hypothesis             10       0       33          26     69
    Total                 121       0      678         170    969
  • Segment Tense
    Tense            Fact        Problem    Goal       Method      Result      Implication  Hypothesis
    Present active   72 (46%)    27 (60%)   15 (23%)   7 (7%)      37 (16%)    69 (51%)     38 (55%)
    Present passive  5 (3%)      2 (4%)     2 (3%)     1 (1%)      1 (0%)      11 (8%)      1 (1%)
    Past active      18 (11%)    5 (11%)    11 (17%)   48 (47%)    122 (54%)   16 (12%)     8 (12%)
    Past passive     25 (16%)    2 (4%)     1 (2%)     17 (17%)    21 (9%)     1 (1%)       5 (7%)
    Future           2 (1%)      3 (7%)     0 (0%)     0 (0%)      1 (0%)      0 (0%)       0 (0%)
    Imperfect: "to"  13 (8%)     2 (4%)     32 (50%)   2 (2%)      20 (9%)     14 (10%)     7 (10%)
    Gerund ("ing")   22 (14%)    4 (9%)     3 (5%)     28 (27%)    23 (10%)    24 (18%)     10 (14%)
    Total            157 (100%)  45 (100%)  64 (100%)  103 (100%)  225 (100%)  135 (100%)   69 (100%)
  • Segment order
    Rows give the preceding segment, columns the segment that follows.
    From \ To    Fact  Hypothesis  Problem  Goal  Method  Result  Implication  End  Total
    Start          18           3        1     8       2       2            4    0     38
    Fact           83          22       13    17       9      31           12    1    188
    Hypothesis     20           5        3     7       6       2            6    3     52
    Problem         9           7        7     2       3       5            3    3     39
    Goal            7           0        2     4      46       6            0    0     65
    Method         13           2        3    10      25      54            3    0    110
    Result         23           9        4     6      16      85           78    6    227
    Implication    13           6        4    12      11      30           12   25    113
    Total         186          54       37    61     118     215          118   38    827
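
One way to read the "Segment order" counts is as row-normalised transition frequencies: for each segment type, the share of times each other type follows it. The sketch below assumes that reading. Treating the table as a first-order transition model is an assumption for illustration, not something the slides state, and the names `TARGETS`, `COUNTS`, `transition_probabilities` and `most_likely_next` are hypothetical.

```python
# Column order of the "Segment order" table (the segment that follows).
TARGETS = ["Fact", "Hypothesis", "Problem", "Goal",
           "Method", "Result", "Implication", "End"]

# Counts copied from the slide; each key is the preceding segment.
COUNTS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def transition_probabilities(counts):
    """Divide every row by its total to get relative frequencies."""
    return {
        source: [count / sum(row) for count in row]
        for source, row in counts.items()
    }


def most_likely_next(source, counts=COUNTS):
    """Return the segment type that most often follows `source`."""
    row = counts[source]
    return TARGETS[row.index(max(row))]


PROBS = transition_probabilities(COUNTS)
for source in COUNTS:
    follower = most_likely_next(source)
    share = PROBS[source][TARGETS.index(follower)]
    print(f"{source:12s} -> {follower:12s} {share:.2f}")
```

Read this way, Goal is followed by Method in 46 of 65 cases and Method by Result in 54 of 110, which matches the goal, method, result progression shown on the "Discourse Segments In Biology" slide.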
  • Discourse: A Fact(ory)
    [Diagram. Realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of experience: past; realm of models: present. Sections: introduction, method, results, discussion. Segments: fact, problem, goal, hypothesis, result, implication. Connecting phrases: "to", "we", "resulting in", "suggests that", "incongruity or ignorance". Views: Shared view, Own view.]
  • Links (Under Construction)
    To references:
    - From/to segment type makes difference: methods link, fact link, agree/disagree link
    - Not clear where to link into: is claim truly in referred document? How to locate?
    To figures/tables:
    - Usually main proof in results (methods) segments: need to allow multi-media elements in system!
    Discourse relations:
    - Many taxonomies: RST, Hovy, Sanders, ClaiMaker
    - Identify textual coherence/argumentation...
  • Coherence Markers Fact Problem Goal Method Results Implication Hypothesis Fact in animals however to, we we fused, we in contrast, we our data suggesting (3x) examined utilised found (5x), suggest, we that (2x) (2x) though, on propose that, average, under consistent our conditions with Problem we fused in this paper Goal we isolated we showed Method we found (2x), but suggests we while, as seen predicted Results in addition, we utilised, interestingly (2x), (strongly) we in contrast we used since (3x), also suggests/ propose, (2x), while (2x), suggesting suggesting second (2x), third that (8x), that (2x), finally (2x), implicating subsequent, (2x), thereafter, in our consistent study with (2x), demonstrating that (3x) Implication to verify, to we however, first also in theory confirm replaced, we (2x), interestingly fused, we (2x), consistent tried with, in our analysis, strikingly, neither
  • Preliminary Hypotheses (outcome after each: +, -, 0, +/0, +!)
    1. 'To' infinitive appears as marker of Goal moves: +
    2. Sequential connectives appear within same segment type: -
    3. 'though', 'however', 'therefore' - causal connectives occur at all -> Problem and -> Hypothesis boundaries: 0
    4. 'suggests' occurs at Results -> Implication/Hypothesis boundary: +/0
    5. 'we found' / 'we observed' / 'we showed' -> Result boundary: +/0
    6. 'we + other verb' occurs at -> method boundary: 0
    7. Contrast/correspondence in Fact <-> Result <-> Implication moves: +!
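
Hypotheses 1, 4, 5 and 6 on the "Preliminary Hypotheses" slide can be turned into a very rough marker-based guesser. This is only a sketch under stated assumptions: the regular expressions, the rule order and the `guess_segment` helper are hypothetical, and the slide itself reports mixed support for these markers, so the output should be read as a hint rather than a classification.

```python
import re

# Ordered rules; the first matching pattern wins. The order is an
# assumption: specific "we found/observed/showed" must be tried before
# the generic "we + other verb" rule.
RULES = [
    ("Goal", re.compile(r"^to\s+\w+", re.IGNORECASE)),                 # hypothesis 1
    ("Implication", re.compile(r"\bsuggest(s|ing)?\b", re.IGNORECASE)),  # hypothesis 4
    ("Result", re.compile(r"\bwe (found|observed|showed)\b", re.IGNORECASE)),  # hypothesis 5
    ("Method", re.compile(r"\bwe \w+", re.IGNORECASE)),               # hypothesis 6
]


def guess_segment(clause):
    """Return a guessed segment type, or None when no marker matches."""
    text = clause.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None


examples = [
    "To examine miRNA expression from the miR-Vec system,",
    "we found that both genotypes show many large holes",
    "our data suggest that Atx-1 may regulate gene expression",
    "we fused the AXH domain to a reporter",
    "Expression was determined using an RNase protection assay",
]
for clause in examples:
    print(guess_segment(clause), "|", clause)
```

The last example has no marker and returns `None`; passive Method clauses of this kind are one reason surface markers alone would not be enough, in line with the "0" outcomes noted on the slide. The example clauses other than the first and last are invented for illustration.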
  • Discussion
  • Research Goals fulfilled?
    - Allow computer-aided access to knowledge: yes, but:
      > need to identify if they do cover this genre
      > need to finalize a structure of relations
    - Other genres/domains?
      > investigate more than cell biology
    - How do we extract this structure?
      > collaborative attempts to identify segment markers/relationships - next step
    - How do we use this structure? [ DEMO ]
      > possible collaborations with sensemaking systems?
  • Preliminary Conclusions - Science is created in text - Goal of text is to convince peers that claims (backed by data) belong to fact canon - Text convinces humans through rhetorical/narrative discourse structure - Text creates meaning in the human mind - Discourse parsing could allow access to knowledge structure - More work needed: collaborations?
  • Questions? anita@cs.uu.nl http://people.cs.uu.nl/anita
  • Appendix
  • Related work
    Approaches compared (columns): Bio-informatics; Style Guides; Shum et al.; Harmsze; Swales; RST; Teufel; Collier et al.
    Features (rows) and marks:
    Sections                   x x x
    Moves                      x x x x!
    Entities                   x x
    Embedding                  x x
    Discourse relations        x x x
    Argumentational relations  x x
    * Need complete model for multidocument collection: markup content elements and relationships
    * Unique role as a publisher: can apply/mandate at the source
  • Total
    From \ To    Fact  Problem  Goal  Method  Result  Implication  Hypothesis  End  Total
    Start          18        1     8       2       2            4           3    0     38
    Fact           83       13    17       9      31           12          22    1    188
    Problem         9        7     2       3       5            3           7    3     39
    Goal            7        2     4      46       6            0           0    0     65
    Method         13        3    10      25      54            3           2    0    110
    Result         23        4     6      16      85           78           9    6    227
    Implication    13        4    12      11      30           12           6   25    113
    Hypothesis     20        3     7       6       2            6           5    3     52
    Total         186       37    61     118     215          118          54   38    827
    Selfs: 221   Model: 399   19   % in Model: 65.84%
  • Clause Classification Test
    Columns: Test Nr; Section; Introduction; Results; Discussion
    A1  Agami, Results            4
    A2  Agami, Discussion         ½  2  ½
    A3  Agami, Introduction       3
    S1  Serrano, Results          2
    S2  Serrano, Discussion       1  1
    S3  Serrano, Introduction     2
    V1  Voorhoeve, Results        2
    V2  Voorhoeve, Discussion     3
    V3  Voorhoeve, Introduction   1  2
    Results: Clause assignment test (8 tests handed in, avg. 38 clauses each):
    114  Clauses
     51  No Disagreement
     13  Fact/Result
     11  Fact/Problem
     10  Method/Result
      7  Result/Implication
      4  Goal/Method
      3  Problem/Goal
      2  Goal/Result
      2  Problem/Interpretation
      2  Fact/Interpretation
      1  Problem/Result
    Comments on classification:
    - Incomplete sentences are unclear, hard to classify
    - Add ‘Hypothesis’ category, exx. clauses 8, 33, 74a, 77, 78b.
    - Other possible categories: Assumption, Observation, “Given that...”
  • References
    - Austin, J.L. How to do things with words, J.O. Urmson, ed. Oxford: Clarendon Press, 1962.
    - Bazerman, Charles. Shaping written knowledge: the genre and activity of the experimental article in science. Madison, Wisconsin: Univ. of Wisconsin Press, 1988.
    - Bex, F.J., H. Prakken, C. Reed & D.N. Walton. Towards a formal account of reasoning about evidence: argumentation schemes and generalisations. Artificial Intelligence and Law 11 (2003), 125-165.
    - Buckingham Shum, Simon J., Uren, V. et al. Modelling Naturalistic Argumentation in Research literatures: Representation and Interaction Design Issues. Tech Report kmi-04-28, December 2004.
    - Harmsze, Frédérique. A modular structure for scientific articles in an electronic environment (HTML & PDF). PhD Thesis, February 9, 2000.
    - Hovy, E. Automated discourse generation using discourse structure relations. Artificial Intelligence 63(1-2), 1993, 341-386.
    - Kircz, Joost G. Modularity: the next form of scientific information presentation? Journal of Documentation, vol. 54, no. 2, March 1998, pp. 210-235.
    - Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962.
    - Latour, B. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, Ma.: Harvard University Press, 1987.
    - Latour, Bruno, Steve Woolgar, Jonas Salk. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, 1986.