Epistemics

  1. Categorizing Epistemic Segment Types in Biology Research Articles
     Anita de Waard
     Elsevier Labs, Amsterdam; UiL-OTS, Utrecht University
     Thursday, September 17, 2009
  2. Introduction
  3. Why Study Biological Discourse?
     - There is too much of it!
     - Text mining and ‘fact extraction’ techniques are gaining ground to tame this tangle
     - Emerging area of biological natural language processing (BioNLP): a subfield of computational linguistics
     - Main focus: identifying biological entities (genes, proteins, drugs) and their relationships
  4. Example state of the art: MEDIE
     Without some idea of the status of the sentence, it cannot be interpreted!
     - ‘Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric […]’
     - ‘Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.’
  5. How can linguistics help?
     Underlying model of text mining systems:
     - A scientific paper is a ‘statement of pertinent facts’
     - So: finding entities and relationships will give you a summary of the knowledge within the paper
     - However, information extracted this way is not very useful...
     Proposed approach: treat the scientific paper as a persuasive text, a specific genre with its own characteristics and allowed persuasive techniques:
     - ‘these results suggest’ (depersonification)
     - ‘as fig. 2a shows’ (evidence is in the data)
     - ‘oncogenes produce a stress response [Serrano, 2003]’
     References and data form a “folded array of successive defense lines, behind which scientists ensconce themselves” [Latour, 1988]
  6. Modality Dropping
     - Fact creation occurs through social acceptance: “[Y]ou can transform .. fiction into fact just by adding or subtracting references” [Latour, 1988]
     - When references are cited, the modality is dropped:
       - A: ‘these results suggest/demonstrate/imply that’ X
       - B: ‘A et al. have shown that X [A, 2009]’
       - C: ‘X [2009]’
       - D: ‘Since X, we investigated the possibility that Y’
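The four stages A to D can be approximated with surface patterns. The following is a minimal sketch under stated assumptions, not the method used in the talk: the regular expressions and the function name `modality_stage` are illustrative inventions, and real text would need much richer cues.

```python
import re
from typing import Optional

# Illustrative patterns only (assumptions, not taken from the talk).
# Order matters: explicit hedges and attributions are tested before the
# bare-citation pattern, because stage B sentences also end in a citation.
_STAGES = [
    ("A", re.compile(r"\bthese results (suggest|demonstrate|imply) that\b", re.I)),
    ("B", re.compile(r"\bet al\.? (have|has) shown that\b", re.I)),
    ("D", re.compile(r"^\s*since\b", re.I)),
    ("C", re.compile(r"\[[^\]]*\d{4}\]\s*$")),
]

def modality_stage(sentence: str) -> Optional[str]:
    """Return the first matching stage label (A-D), or None if no cue matches."""
    for label, pattern in _STAGES:
        if pattern.search(sentence):
            return label
    return None
```

Applied to the four example forms on the slide, this sketch returns A, B, C and D respectively; a sentence with no matching cue returns `None`.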
  7. Overall Research Questions
     I. (How) can we add epistemic value to results from a text mining system?
     II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
     III. Can we identify a rhetorically successful text (and help authors create them)?
  8. Present work:
     Perform discourse analysis on a few selected texts in biology:
     1. Parse text into discourse segments (EDUs) containing a single rhetorical move (if possible...)
     2. Determine categories or types of discourse segments that have similar rhetorical/pragmatic properties
     3. Look at a number of linguistic characteristics and see whether segments of the same type share them
  9. Present research questions:
     i. Can these segments indeed be grouped by linguistic characteristics (verb tense, verb registry, metadiscourse markers)?
     ii. Does this offer a useful version of the structure of a paper?
     iii. Is this useful for enabling automated epistemic markup?
     iv. Can this help us to trace the evolution of a hypothesis?
  10. Methods
  11. Method
      1. Parse text into Discourse Segments (EDUs) according to syntactic criteria
      2. Define set of semantic segment types
      3. Identify semantic type for each segment
      4. Specify linguistic and structural properties for each segment
      5. Identify correlations between semantic type and structural/syntactic properties
      6. Trace a hypothesis through the process of fact creation
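Step 5 of the method amounts to cross-tabulating segment type against a property such as verb tense. Below is a minimal sketch of that tabulation. The annotated pairs are invented toy values for illustration, not results from the study, and the function names are hypothetical.

```python
from collections import Counter

# Toy annotations (invented for illustration, NOT data from the talk):
# one (segment_type, verb_tense) pair per annotated segment.
annotated = [
    ("fact", "present"),
    ("result", "past"),
    ("result", "past"),
    ("implication", "present"),
    ("method", "past"),
    ("fact", "present"),
]

def cross_tabulate(pairs):
    """Count how often each (segment type, property value) pair occurs."""
    return Counter(pairs)

def row_proportions(table):
    """Convert counts into proportions within each segment type."""
    totals = Counter()
    for (segment_type, _value), count in table.items():
        totals[segment_type] += count
    return {
        (segment_type, value): count / totals[segment_type]
        for (segment_type, value), count in table.items()
    }

table = cross_tabulate(annotated)
proportions = row_proportions(table)
```

The same two functions work unchanged for any other property from the list (section, verb class, marker type), since they only rely on the pairs being hashable.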
  12. Segmentation Criteria
      Goal: ‘one new thought per segment’:
      Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
      a. Figure 4A shows that
      b. following RASV12 stimulation
      c. p53 was stabilized and activated
      d. and its target gene, p21cip1, was induced in all cases,
      e. indicating an intact p53 pathway in these cells.
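One property of this segmentation is that the segments partition the sentence: nothing is dropped and nothing overlaps. A small sketch of how that could be checked is shown here. The helper name `is_partition` is hypothetical, and the check assumes segments are joined by single spaces with punctuation kept inside each segment.

```python
sentence = (
    "Figure 4A shows that following RASV12 stimulation, p53 was stabilized "
    "and activated, and its target gene, p21cip1, was induced in all cases, "
    "indicating an intact p53 pathway in these cells."
)

# Segments a-e from the slide, with trailing punctuation kept in the segment.
segments = [
    "Figure 4A shows that",
    "following RASV12 stimulation,",
    "p53 was stabilized and activated,",
    "and its target gene, p21cip1, was induced in all cases,",
    "indicating an intact p53 pathway in these cells.",
]

def is_partition(sentence: str, segments: list) -> bool:
    """True if the segments, joined by single spaces, rebuild the sentence."""
    return " ".join(segments) == sentence
```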
  13. Segmentation Criteria (summary)
      Finite/Non-finite  Grammatical role                                        Segment?  Example
      Finite/Non-finite  Subject                                                 N         The extent to which miRNAs specifically affect metastasis
      Finite/Non-finite  Direct Object                                           Y         these miRNAs are potential novel oncogenes
      Nonfinite          Phrase-level adjunct (restrictive and non-restrictive)  N         spanning a given miRNA genomic region
      Nonfinite          Clause-level adjunct                                    Y         by cloning eight miR-Vec plasmids
      Finite             Non-restrictive phrase-level adjunct                    Y         which is only active when tamoxifen is added (De Vita et al, 2005) […]
      Finite             Restrictive phrase-level adjunct                        N         that we examined
      Finite             Clause-level adjunct                                    Y         which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
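The summary table can be read as a small decision rule. The sketch below encodes it as a function. The names `Role` and `is_segment` are hypothetical, and the sketch assumes that a parser has already supplied the grammatical role, finiteness and restrictiveness of each clause.

```python
from enum import Enum

class Role(Enum):
    SUBJECT = "subject"
    DIRECT_OBJECT = "direct object"
    PHRASE_ADJUNCT = "phrase-level adjunct"
    CLAUSE_ADJUNCT = "clause-level adjunct"

def is_segment(role: Role, finite: bool, restrictive: bool = False) -> bool:
    """Decide whether a clause becomes its own segment, per the summary table."""
    if role is Role.SUBJECT:
        return False  # subjects are never split off, finite or not
    if role in (Role.DIRECT_OBJECT, Role.CLAUSE_ADJUNCT):
        return True   # split off regardless of finiteness
    # Phrase-level adjuncts: only finite, non-restrictive ones are split off.
    return finite and not restrictive
```

For example, the restrictive relative clause ‘that we examined’ is `is_segment(Role.PHRASE_ADJUNCT, finite=True, restrictive=True)`, which is false, matching the N in the table.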
  14. Basic Segment Types
      Segment      Description                                                                 Example
      Fact         a known fact, generally without explicit citation                           mature miR-373 is a homolog of miR-372
      Hypothesis   a proposed idea, not supported by evidence                                  This could for instance be a result of high mdm2 levels
      Problem      unresolved, contradictory, or unclear issue                                 However, further investigation is required to demonstrate the exact mechanism of LATS2 action
      Goal         research goal                                                               To identify novel functions of miRNAs,
      Method       experimental method                                                         Using fluorescence microscopy and luciferase assays,
      Result       a restatement of the outcome of an experiment                               all constructs yielded high expression levels of mature miRNAs
      Implication  an interpretation of the results, in light of earlier hypotheses and facts  our procedure is sensitive enough to detect mild growth differences
  15. Two Types of Derived Segment Types
      ‘Other-segments’, related to (referenced) other work:
      - other-result: ‘they are also found in the FCX and other cortical structures [Sokoloff et al., 1990]’
      - other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’
      - other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’
      Regulatory segments, acting as matrix sentences framing other segments:
      - reg-hypothesis: ‘we hypothesized that’
      - reg-implication: ‘These observations suggest that’
      - intratextual: ‘Fig 4 shows that’
      - intertextual: ‘reviewed in (Serrano, 1997)’
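The basic and derived segment types form a small closed vocabulary, and the regulatory segments have recognizable cue phrases. The sketch below shows one possible encoding. The enum, the cue patterns and the function `regulatory_type` are illustrative assumptions built from the examples on the slides, not the annotation scheme's actual implementation.

```python
import re
from enum import Enum
from typing import Optional

class SegmentType(Enum):
    # Basic types
    FACT = "fact"
    HYPOTHESIS = "hypothesis"
    PROBLEM = "problem"
    GOAL = "goal"
    METHOD = "method"
    RESULT = "result"
    IMPLICATION = "implication"
    # Derived types: segments about other (referenced) work
    OTHER_RESULT = "other-result"
    OTHER_GOAL = "other-goal"
    OTHER_IMPLICATION = "other-implication"
    # Derived types: regulatory segments framing other segments
    REG_HYPOTHESIS = "reg-hypothesis"
    REG_IMPLICATION = "reg-implication"
    INTRATEXTUAL = "intratextual"
    INTERTEXTUAL = "intertextual"

# Cue patterns generalized from the slide's examples (an assumption).
_REG_CUES = [
    (re.compile(r"\bwe hypothesi[sz]ed that\b", re.I), SegmentType.REG_HYPOTHESIS),
    (re.compile(r"\bthese (observations|results|data) suggest that\b", re.I),
     SegmentType.REG_IMPLICATION),
    (re.compile(r"\bfig(ure|\.)?\s*\d+\w*\s+shows that\b", re.I),
     SegmentType.INTRATEXTUAL),
    (re.compile(r"\breviewed in\b", re.I), SegmentType.INTERTEXTUAL),
]

def regulatory_type(segment: str) -> Optional[SegmentType]:
    """Return the regulatory segment type whose cue matches, else None."""
    for pattern, segment_type in _REG_CUES:
        if pattern.search(segment):
            return segment_type
    return None
```

Segments without a regulatory cue, such as the Fact example ‘mature miR-373 is a homolog of miR-372’, return `None` and would need a different classifier.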
  16. My categories vs. Latour (1979)
  17. Linguistic and structural properties
      1. Position in text
         - Section of the paper (Introduction, Results, Discussion)
         - Beginning/middle/end of section
         - First/second/third part of sentence
      2. Verb:
         - Tense, aspect, voice
         - Verb class (idiosyncratic)
         - Lexicon
      3. Metadiscourse markers [Hyland, 2003]:
         - Connectives
         - Endophorics, Evidentials
         - Hedges, Boosters
         - Person markers
  18. Verb class
      Two types of entities interact in biology texts:
      - Thing:
        - Thing -> : increase, die, etc.
        - Thing - thing: affect, stimulate, etc.
      - People:
        - People -> Thing:
          - Examine (Goal)
          - Operate (Method)
          - Observe (Result)
          - Implicate (Implication)
        - People - people: Report
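The People -> Thing verb classes each point to a segment type, which suggests a simple lookup from verb to likely segment type. This is a sketch under stated assumptions: the tiny lexicon contains only verbs named on the slide (treating each class name as a member of its own class, which is an assumption), and the function name `suggested_segment_type` is hypothetical.

```python
from typing import Optional

# People -> Thing verb classes and the segment type each is tied to on the slide.
CLASS_TO_SEGMENT = {
    "examine": "goal",
    "operate": "method",
    "observe": "result",
    "implicate": "implication",
}

# Illustrative lexicon (an assumption): only verbs that appear on the slide.
VERB_TO_CLASS = {
    "increase": "thing",
    "die": "thing",
    "affect": "thing-thing",
    "stimulate": "thing-thing",
    "examine": "examine",
    "operate": "operate",
    "observe": "observe",
    "implicate": "implicate",
    "report": "report",
}

def suggested_segment_type(lemma: str) -> Optional[str]:
    """Map a verb lemma to the segment type its class suggests, if any."""
    verb_class = VERB_TO_CLASS.get(lemma.lower())
    # Thing verbs, 'report', and unknown verbs carry no segment-type hint here.
    return CLASS_TO_SEGMENT.get(verb_class)
```

A realistic lexicon would be far larger and would need lemmatization first; the lookup structure is the point of the sketch.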
  19. Results
  20. Two texts
      1. Voorhoeve, 2006: Cell
         - Cell biology text, written by a group in Amsterdam
         - Dealing with microRNAs, a hot topic
         - 290 citations in Google Scholar: successful paper!
      2. Louiseau, 2008: European Neuropsychopharmacology
         - Text on schizophrenia
         - Prompted by interest from a pharma company
         - Adjacent subfield of biology (neuropharmacology)
  21. Segment vs. Section
  22. Segment vs. Verb Type
  23. Segment vs. verb tense
  24. Segments vs. markers
  25. Segment Order
  26. Discussion
  27. Interpretation: 3 Realms of Science
      - Conceptual realm:
        (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase.
        (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
      - Between the conceptual and experimental realms:
        (2a) Indeed,
        (4a) Altogether, these data show that
      - Experimental realm:
        (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase
        (3a) Consistent with the cell growth assay,
        (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
      - Data realm (Figures):
        (2c) (Figures 2G and 2H).
  28. Tense 1: Concepts vs. Experiment
      - Concept realm:
        (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase.
        (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
      - Between the concept and experimental realms:
        (2a) Indeed,
        (4a) Altogether, these data show that
      - Experimental realm (personal, past):
        (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase
        (3a) Consistent with the cell growth assay,
        (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
      - Data realm (Figures) (nonverbal):
        (2c) (Figures 2G and 2H).
  29. Tense 2: Referral
      [Diagram; recoverable labels]
      - Time axis: past, present, future
      - Own paper: Introduction, Current work (= Results section), Discussion
      - Other papers: Other Work
      - After other work: present
      - Before current work: past
      - After current work: past
  30. Tense 1 + 2 = 3
      [Diagram; recoverable labels]
      - Claim, fact
      - Conceptual, Experiment, Experiential
      - Time axis: past, present, future
      - Reading time
  31. Discourse Fact-ory
      [Diagram; recoverable labels]
      - hypothetical realm: (might, would)
      - realm of activity: (to test, to see)
      - realm of experience: past
      - realm of models: present
      - Other labels: introduction, problem, hypothesis, goal, method, results, result, implication, fact, discussion
      - Linking phrases: ‘we’, ‘to’, ‘resulting in’, ‘suggests that’
      - Shared view vs. Own view
  32. Citation and fact creation:
      - Voorhoeve, 2006:
        - ‘To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...’
        - ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,’
      - Yabuta, JBioChem 2007:
        - ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’
      - Raver-Shapira et al., JMolCell 2007:
        - ‘two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’
      [Diagram; recoverable labels: Concepts (KnownFact, Hypothesis, Implication, Fact); Experiment 1 and Experiment 2 (each: Goal, Method, Result, Data)]
  33. Answers to current research questions:
      i. Can these segments indeed be identified?
         ✓ yes, adequate evidence, probably ok segments
         ‣ need more annotators!
      ii. Does this offer a useful version of the structure of a paper?
         ✓ yes, offers insight, and a possible model
         ‣ needs to be validated: does this structure hold over more papers and different subcategories?
      iii. Is this useful for enabling automated epistemic markup?
         ✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help
         ‣ ongoing research! (Sandor, XRCE; Buitelaar, DERI)
      iv. Can this help us to trace the evolution of a hypothesis?
         ✓ anecdotal: promising
         ‣ need to scale up!
  34. Where are we on overall research questions?
      I. (How) can we add epistemic value to results from a text mining system?
         ‣ Segment types help; need to expand and verify
      II. How is a scientific fact created, as it moves from a hedged claim to a fact through successive citations?
         ‣ Model is developing; there is also a spurt of other work!
      III. Can we identify a rhetorically successful text (and help authors create them)?
         ‣ Not addressed yet; verb tense and hedging seem important.
  35. Work on (biological) scientific discourse
      - Is a growing field of interest!
      - Several projects are developing that go ‘beyond the facts’
      - Epistemic modality is becoming a term bioinformaticians are exploring
      - Room for people who know about discourse analysis!