Identifying Biological Knowledge:
    Three Possible Strategies
             Anita de Waard
     Disruptive Technologies D...
Overview
Problem: too much discourse, tools are not yet good
enough...
1. First attempt: allow authors to validate entitie...
Why Study Biological Discourse?

-   There is too much of it!

-   Text mining and ‘fact
    extraction’ techniques are
  ...
Example state of the art: MEDIE


                              Add this knowledge during authoring?

Alteration of nm23, ...
First attempt: allow authors
     to validate entities
Improve time + quality of knowledgebase entry

 - For database curators: save time and money
 - For authors: lower the thr...
expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the ...
How? Word Plugin
-   Okkam4MsW: a Microsoft Word plugin that interacts with Web Services performing NLP
    and semantic technol...
OKKAM Entity Editor in MS Word
http://sig.ma
Second attempt:
discourse analysis
What else is wrong with MEDIE?

                          without some idea of the status of the
                         ...
Discourse Analysis
Underlying model of text mining systems:

    -   Scientific paper is ‘statement of pertinent facts’
   ...
Overall Research Questions
i. How can we model the discourse/suasive moves in a
   biological paper?
ii. Can this model he...
Discourse analysis
Segmentation and classification:
1. Parse text into discourse segments (EDUs) containing a
   single rh...
Segmentation
Goal: ‘one new thought per segment’:
Figure 4A shows that following RASV12 stimulation, p53
was stabilized an...
Segment Types
Segment       Description                          Example
Fact          a known fact, generally without    ...
Linguistic and structural properties
1. Position in text

      -    Section of the paper (Introduction, Results, Discussi...
Results: Section and Sequence
1. Voorhoeve, 2006: Cell - 427 segments

2. Louiseau, 2008: European Neuropsychopharmacology...
Results: Verb Tense

-   Realm of the Present:
    Fact (82%), Hypothesis (71%), Implication (62%)

-   Realm of the Past:
...
Results: Verb Type

-   Thing - Thing: high in experimental (Method, Result)
    and conceptual (Problem, Hypothesis, Fact...
Results: Metadiscourse Markers

-   Causative: high in Implications (therefore, thus),

-   Comparison: high in Results (w...
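
As a rough illustration of how such markers could be detected, the sketch below looks for the two causative markers named on this slide ("therefore", "thus") in a segment. The function name and the whole-word matching rule are assumptions made for this example, not the method used in the study, and the comparison markers are left out because the slide text is truncated.

```python
import re

# The two causative markers given as examples on the slide.
CAUSATIVE_MARKERS = ("therefore", "thus")

# Whole-word, case-insensitive match (an assumption of this sketch).
_CAUSATIVE = re.compile(r"\b(" + "|".join(CAUSATIVE_MARKERS) + r")\b", re.IGNORECASE)


def causative_markers(segment):
    """Return the causative markers found in a segment, lower-cased, in order."""
    return [m.lower() for m in _CAUSATIVE.findall(segment)]


hits = causative_markers("Thus, the p53 pathway appears intact in these cells.")
```

A segment with at least one hit would be a candidate Implication under the tendency reported above; it is a hint, not a classification.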
i. How can we model the discourse moves in a biological paper?

            Discourse as a Fact-ory
   hypothetical realm:...
ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’
    ...
KnownFact   KnownFact

Concepts
Voorhoeve, 2006
    To investigate the possibility that
   miR-372 and miR-373 suppress the
       expression of LATS2, we...
Yabuta, JBioChem 2007

                                          Voorhoeve, 2006                                   miR-372...
Fact creation vs. Latour (1986)
Future research:

‣   Need co-annotators to verify semantic types
‣   Need to scale up with more (types of) texts!
I. How ...
Third attempt:
collaboration!
Improve ‘what is claimed about an entity’
insulin ::: maintaining   glucose       ... diabetes defect) to overcome
GB00084...
Improve ‘what is claimed about an entity’
insulin ::: maintaining   glucose       ... diabetes defect) to overcome    When...
A network of hypotheses and evidence

                   PHC        undergo Growth arrest



Paper A:                     ...
For Example: SWAN
HypER Working Group:
-     Goal: Align and expand existing efforts on detection and
      analysis of Hypotheses, Evidence...
HypER Activities: http://hyper.wik.is

Current activities:

   -   Aligning discourse ontologies: joint task with W3C HCLS...
Conclusion
Problem: too much discourse, tools are not yet good
enough...
1. First attempt: allow authors to validate entit...
Questions?
       a.dewaard@elsevier.com
http://elsatglabs.elsevier.com/labs/anita
References
Hyland, K. (2004). Disciplinary Discourses: Social Interactions in Academic
Writing, Addison Wesley Publishing ...
Segmentation Criteria (summary)
  Finite/
                          Grammatical role                 Segment?             ...
Basic Segment Types
Segment              Description                                     Example

                a known ...
Two Types of Derived Segment Types
‘Other-segments’, related to (referenced) other work:

-   other-result: ‘they are also...
My categories vs. Latour (1979)
Linguistic and structural properties
1. Position in text

      -    Section of the paper (Introduction, Results, Discussi...
Verb class
Two types of entities interact in biology texts:
-   Thing:
    -   Thing -> Increase, die, etc
    -   Thing-t...
Interpretation: 3 Realms of Science:
  Conceptual realm
  Experimental realm
  Data realm
Interpretation: 3 Realms of Science:
                       (1) Oncogene-induced senescence is            (4b) transductio...
Tense 1: Concepts vs. Experiment
(1) Oncogene-induced senescence is            (4b) transduction with either




         ...
Tense 2: Referral

                past                                present                                  future
   ...
Tense 1+ 2 = 3:


                          Claim,
                           fact
Conceptual




                        ...

Transcript of "Xerox2009"

  1. Identifying Biological Knowledge: Three Possible Strategies
     Anita de Waard
     Disruptive Technologies Director, Elsevier Labs, Amsterdam
     Casimir Researcher, UiL-OTS, Utrecht University
     XRCE, Grenoble, 24 September 2009
  6. Overview
     Problem: too much discourse, tools are not yet good enough...
     1. First attempt: allow authors to validate entities
     2. Second attempt: discourse analysis
     3. Third attempt: collaboration to identify hypotheses
  12. Why Study Biological Discourse?
      - There is too much of it!
      - Text mining and ‘fact extraction’ techniques are gaining ground to tame this tangle
      - Emerging area of biological natural language processing (BioNLP): subfield of computational linguistics
      - Main focus: identifying biological entities (genes, proteins, drugs) and their relationships
  16. Example state of the art: MEDIE
      Add this knowledge during authoring?
      Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric
      Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.
  17. First attempt: allow authors to validate entities
  22. Improve time + quality of knowledgebase entry
      - For database curators: save time and money
      - For authors: lower the threshold to submitting papers with metadata
      - Structured Digital Abstract: an editorial experiment to increase the reach of online published articles
      - SDA encodes in a schema information contained in the article
  23. expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.
      MINT-6168263: Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)
      MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)
      MINT-6167930: Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
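
The SDA entries on slide 23 follow a regular surface pattern: one or more MINT identifiers, a colon, then protein names with UniProt accessions and MI-coded terms. A minimal sketch of pulling those pieces out of one entry is shown below. The function name `parse_sda_entry` and the regular expressions are assumptions made for illustration; they are fitted to the entries shown on the slide and are not an official SDA or MINT parser.

```python
import re

# "Name (uniprotkb:ACCESSION)" pairs, as written on the slide.
PROTEIN = re.compile(r"(\w+) \(uniprotkb:(\w+)\)")
# Four-digit MI codes such as "MI:0403", as written on the slide.
MI_CODE = re.compile(r"MI:\d{4}")


def parse_sda_entry(entry):
    """Split one SDA line into MINT ids, (name, accession) pairs, and MI codes."""
    # Everything before the first ": " is the comma-separated list of MINT ids.
    ids_part, _, statement = entry.partition(": ")
    return {
        "mint_ids": [s.strip() for s in ids_part.split(",")],
        "proteins": PROTEIN.findall(statement),
        "mi_codes": MI_CODE.findall(statement),
    }


entry = (
    "MINT-6168204, MINT-6168178: Gsg1 (uniprotkb:Q8R1W2) and TPAP "
    "(uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)"
)
parsed = parse_sda_entry(entry)
```

The sketch keeps only the identifiers; mapping each MI code back to its label would need the controlled vocabulary itself, which is not reproduced here.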
  30. How? Word Plugin
      - Okkam4MsW: a Microsoft Word plugin that interacts with Web Services performing NLP and semantic technologies to detect entities and contextual information
      - The OKKAM repository is queried to get the right OKKAM id and alternative ids (UniProt in this case)
  31. OKKAM Entity Editor in MS Word
  35. http://sig.ma
  36. Second attempt: discourse analysis
  40. What else is wrong with MEDIE?
      Without some idea of the status of the sentence, it cannot be interpreted!
      Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric
      Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.
  43. Discourse Analysis
      Underlying model of text mining systems:
      - Scientific paper is ‘statement of pertinent facts’
      - So: finding entities and relationships will give you a summary of the knowledge within the paper
      - However, information extracted this way is not very useful...
      Proposed approach: treat scientific paper as a persuasive text: specific genre, with genre characteristics and allowed persuasive techniques:
      - ‘these results suggest’ (depersonification)
      - ‘as fig. 2a shows’ (evidence is in the data)
      - ‘oncogenes produce a stress response [Serrano, 2003]’
      References and data form a “folded array of successive defense lines, behind which scientists ensconce themselves” (Latour, 1986)
  47. Overall Research Questions
      i. How can we model the discourse/suasive moves in a biological paper?
      ii. Can this model help enable automated epistemic markup?
      iii. Can it improve knowledge representations of collections of papers?
  52. Discourse analysis
      Segmentation and classification:
      1. Parse text into discourse segments (EDUs) containing a single rhetorical move (if possible...)
      2. Determine categories or types of discourse segments that have similar semantic/pragmatic properties
      3. Look at a number of linguistic characteristics and see if these segment types share those characteristics.
  58. Segmentation
      Goal: ‘one new thought per segment’:
      Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
      a. Figure 4A shows that                                       Intratextual
      b. following RASV12 stimulation                               Method
      c. p53 was stabilized and activated                           Result
      d. and its target gene, p21cip1, was induced in all cases,    Result
      e. indicating an intact p53 pathway in these cells.           Implication
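
One way to hold the segmented sentence from slide 58 in code is sketched below: each segment is a small record carrying its text and its type, so that segments of one type can be pulled out of a sentence. The class and function names (`SegmentType`, `Segment`, `of_type`) are hypothetical, chosen for this example; only the segment texts and their labels come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentType(Enum):
    """Segment types that occur in the slide 58 example."""
    INTRATEXTUAL = "Intratextual"
    METHOD = "Method"
    RESULT = "Result"
    IMPLICATION = "Implication"


@dataclass(frozen=True)
class Segment:
    text: str
    type: SegmentType


# The five segments (a to e) of the example sentence, in order.
EXAMPLE = [
    Segment("Figure 4A shows that", SegmentType.INTRATEXTUAL),
    Segment("following RASV12 stimulation", SegmentType.METHOD),
    Segment("p53 was stabilized and activated", SegmentType.RESULT),
    Segment("and its target gene, p21cip1, was induced in all cases,", SegmentType.RESULT),
    Segment("indicating an intact p53 pathway in these cells.", SegmentType.IMPLICATION),
]


def of_type(segments, wanted):
    """Return the texts of all segments with the given type, in document order."""
    return [s.text for s in segments if s.type is wanted]


results = of_type(EXAMPLE, SegmentType.RESULT)
```

Keeping the type on the segment rather than on the sentence is what lets the single sentence above contribute two Results and one Implication separately.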
  62. 62. Segment Types
- Fact: a known fact, generally without explicit citation. Example: ‘mature miR-373 is a homolog of miR-372’
- Hypothesis: a proposed idea, not supported by evidence. Example: ‘This could for instance be a result of high mdm2 levels’
- Problem: unresolved, contradictory, or unclear issue. Example: ‘However, further investigation is required to demonstrate the exact mechanism of LATS2 action’
- Goal: research goal. Example: ‘To identify novel functions of miRNAs,’
- Method: experimental method. Example: ‘Using fluorescence microscopy and luciferase assays,’
- Result: a restatement of the outcome of an experiment. Example: ‘all constructs yielded high expression levels of mature miRNAs’
- Implication: an interpretation of the results, in light of earlier hypotheses and facts. Example: ‘our procedure is sensitive enough to detect mild growth differences’
‘Other-segments’, related to (referenced) other work:
Regulatory segments, acting as matrix sentences framing other segments:
  67. 67. Linguistic and structural properties
1. Position in text:
- Section of the paper (Introduction, Results, Discussion)
- Beginning/middle/end of section
- First/second/third part of sentence
2. Verb:
- Tense, aspect, voice
- Verb class: Thing (increase), Thing-Thing (inhibit), Person-Thing (examine, observe, operate, implicate), Person: Report
- Lexicon
3. Metadiscourse markers [Hyland, 2003]:
- Connectives
- Endophorics, Evidentials
- Hedges, Boosters
- Person markers
  73. 73. Results: Section and Sequence
1. Voorhoeve, 2006: Cell - 427 segments
2. Louiseau, 2008: European Neuropsychopharmacology - 281 segments
- Introduction (90): Other-Result (24), Other-Implication (11), Problem (9), Fact (8)
- Result (334): Goal (26) -> Method (68) -> Result (105) -> Reg-Implication (23) -> Implication (50)
- Discussion (187): Implication (27), Result (21), Other-Result (24), Hypothesis (19), Problem (17)
  78. 78. Results: Verb Tense
- Realm of the Present: Fact (82%), Hypothesis (71%), Implication (62%)
- Realm of the Past: Result (82%), Method (76%) - 50% Passive, of Method 50% Past Perfect
- Realm of the Modal: 44% in Hypothesis
- Realm of the To-Infinitive: 50% is Goal, 75% of Goal is to-infinitive (Purpose Clause)
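One way such tense figures could feed a classifier is as a simple lookup from tense to candidate segment types. The sketch below is hypothetical: it assumes each percentage is the share of that segment type found in the given tense (the slide does not state this explicitly), and it leaves out the modal figure because its direction is ambiguous. The names `TENSE_SHARE` and `likely_types` are illustrative only.

```python
# Assumed reading: share of each segment type that occurs in the given tense.
TENSE_SHARE = {
    "present": {"Fact": 0.82, "Hypothesis": 0.71, "Implication": 0.62},
    "past": {"Result": 0.82, "Method": 0.76},
    "to-infinitive": {"Goal": 0.75},
}


def likely_types(tense):
    """Return candidate segment types for a tense, strongest association first."""
    shares = TENSE_SHARE.get(tense, {})
    return sorted(shares, key=shares.get, reverse=True)
```

Tense alone only narrows the candidates; the other properties (position, verb class, markers) would be needed to choose among them.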
  83. 83. Results: Verb Type
- Thing - Thing: high in experimental (Method, Result) and conceptual (Problem, Hypothesis, Fact, Implication) segments:
  ‣ Need to differentiate between ‘concept’ things and ‘experimental’ things!
- Person - Implicate: high in Hypothesis, Implication, Problem
- Person - Operate: high in Methods (90%)
- Person - Examine: high in Goal (87%)
  85. 85. Results: Metadiscourse Markers
- Causative: high in Implications (therefore, thus)
- Comparison: high in Results (whereas, in contrast)
- Temporality: high in Methods (next, subsequently)
- Person markers: high in Methods (50%) and Results
- Boosters: high in Results (indeed, surprisingly, interestingly)
- Hedges: high in Implication, Reg-Implication (raises the possibility that, explains at least in part) - but modals and ‘suggest’ verbs are left out
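The marker lists above could be turned into a very rough cue counter. This is a minimal sketch, not the method used in the study: the phrases are the ones named on the slide, hedges are mapped to Implication only (Reg-Implication is folded in as a simplification), person markers are left out, and `MARKERS` and `marker_votes` are names invented for this example.

```python
import re

# Cue phrases copied from the slide, grouped by the segment type they favour.
MARKERS = {
    "Implication": ["therefore", "thus", "raises the possibility that",
                    "explains at least in part"],
    "Result": ["whereas", "in contrast", "indeed", "surprisingly", "interestingly"],
    "Method": ["next", "subsequently"],
}


def marker_votes(segment):
    """Count, per segment type, how many of its cue phrases occur in the segment."""
    text = segment.lower()
    votes = {}
    for kind, phrases in MARKERS.items():
        hits = sum(
            1 for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", text)
        )
        if hits:
            votes[kind] = hits
    return votes
```

A segment without any listed marker gets no votes, so a counter like this could only complement tense and verb-class cues, never replace them.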
  97. 97. i. How can we model the discourse moves in a biological paper?
Discourse as a Fact-ory
[Diagram]
- Nodes: fact (x3), problem, hypothesis, goal, method, result, implication
- Connecting phrases: ‘to’ (goal), ‘we’ (method), ‘resulting in’ (result), ‘suggests that’ (implication)
- Sections: introduction, results, discussion
- Hypothetical realm (might, would): hypothesis
- Realm of activity (to test, to see): goal
- Realm of experience (past): method, result
- Realm of models (present): fact, implication
- Shared view / Own view
  106. 106. ii. Is this useful for enabling automated epistemic markup?
✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:
6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.
TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)
‣ issue: segment parsing is difficult!
‣ issue: verb tense is not always accessible
‣ bionlp: not that much work on full text, since commercial publishers are difficult :-)!
‣ possible challenge at biolink 2011: watch this space...
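To illustrate how simple markers can attach epistemic information to an extracted triple, here is a small sketch that looks for modals, ‘suggest’ verbs, and connectives in the source sentence. The word lists are assumptions chosen for this example (they are not the lexicon of the parser shown on the slide), and `epistemic_markers` is a hypothetical helper.

```python
import re

# Illustrative word lists; a real lexicon would be larger and curated.
MODALS = {"may", "might", "could", "would"}
SUGGEST_VERBS = {"suggest", "suggests", "suggested",
                 "indicate", "indicates", "indicated"}
CONNECTIVES = {"thus", "therefore"}


def epistemic_markers(sentence):
    """Return the modals, 'suggest' verbs and connectives found in a sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {
        "modal": sorted(tokens & MODALS),
        "suggest_verb": sorted(tokens & SUGGEST_VERBS),
        "connective": sorted(tokens & CONNECTIVES),
    }


SENTENCE = (
    "It is thus emerging that A_1-42-induced memory deficits may involve "
    "subtler neuronal alternations leading to synaptic deficits, prior to "
    "frank neurodegeneration in AD brains."
)
found = epistemic_markers(SENTENCE)
```

For the sentence on the slide this finds the modal ‘may’ and the connective ‘thus’, which is the kind of signal that would mark the extracted triple as a hedged claim instead of a plain fact.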
  114. 114. [Diagram: a claim moving from hypothesis to fact across papers]
- Voorhoeve, 2006 (Goal/Hypothesis): ‘To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...’
- Voorhoeve, 2006 (Implication): ‘Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,’
- Raver-Shapira et al., JMolCell 2007: ‘two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).’
- Yabuta, JBioChem 2007 (Fact): ‘miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)’
- Diagram labels: Concepts (KnownFact, KnownFact, Hypothesis, Implication, Fact); Experiment 1 and Experiment 2 (each: Goal, Method, Result, Data)
  115. 115. Fact creation vs. Latour (1986)
  122. 122. Future research:
‣ Need co-annotators to verify semantic types
‣ Need to scale up with more (types of) texts!
I. How is a scientific fact created, as it moves from a hedged claim to a fact throughout successive citations?
II. Can we identify a rhetorically successful text, using these segments and characteristics?
III. Can we help authors create such texts (guidelines, tools)?
  123. 123. Third attempt: collaboration!
  125. 125. Improve ‘what is claimed about an entity’
insulin ::: GB000841
Each entry: verb, object, and the abstract excerpt in which the claim occurs.
- maintaining / glucose homeostasis: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues. Insulin resistance and glucose intolerance has been well recognized in patients with advanced chronic kidney diseases (CKD).
- improve / glucose homeostasis: ... Incretin metabolism is abnormal in T2D, evidenced by a decreased incretin effect, reduction in nutrient-mediated secretion of GIP and GLP-1 in T2D, and resistance to GIP. GLP-1, on the other hand, when administered intravenously in T2D is able to increase insulin secretion and improve glucose homeostasis.
- improves / glucose homeostasis: SIRT1, a NAD(+)-dependent protein deacetylase that regulates transcription factors involved in key cellular processes, has been implicated as a mediator of the beneficial effects of calorie restriction. In a recent issue of Nature, Milne et al. (2007) describe novel potent activators of SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.
- is capable / glucose homeostasis: S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.... However, the mechanisms behind the insulin-sensitizing effect of S15511 are unknown. The aim of our study was to explore whether S15511 improves insulin sensitivity in skeletal muscles. S15511 treatment was associated with an increase in insulin-stimulated glucose transport in type IIb fibers, while type I fibers were unaffected.
- maintains / glucose homeostasis: Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis. Although it has long been thought that differentiated beta-cells are nearly static, recent studies have shown that beta-cell mass dynamically changes throughout the lifetime. In this article, recent progress of regenerative medicine of the pancreas is reviewed.
- may be involved / glucose homeostasis: ... Our results showed that glucose up-regulated PANDER mRNA and protein levels in a time- and dose-dependent manner in MIN6 cells and pancreatic islets. ...Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.
- participates / glucose homeostasis: Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis. ... Our data identify miR124a and miR96 as novel regulators of the expression of proteins playing a critical role in insulin exocytosis and in the release of other hormones and neurotransmitters.
  136. 136. A network of hypotheses and evidence
[Diagram: the claim ‘PHC undergo Growth arrest’ sits above Paper A and Paper B. Each paper is drawn as goal, method, results, facts, and implication; Paper A rests on data 1, data 2, data 3 and Paper B on data 4, data 5, data 6. Links in the builds are labeled ‘underpinning’ and ‘method link’.]
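A network like this can be sketched as a directed graph of ‘underpins’ links, so that a claim can be traced back to the data behind it. The example below is a hypothetical illustration: the class, the node names, and the exact wiring (data underpins results, results underpin the implication, the implication underpins the claim) are assumptions made here, not a format defined in the talk.

```python
from collections import defaultdict, deque


class EvidenceGraph:
    """Directed 'underpins' links between data, paper segments, and claims."""

    def __init__(self):
        self._supporters = defaultdict(list)  # node -> nodes that underpin it

    def underpin(self, supporter, supported):
        self._supporters[supported].append(supporter)

    def trace(self, node):
        """Return everything that directly or indirectly underpins `node`."""
        seen = []
        queue = deque(self._supporters.get(node, []))
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.append(current)
            queue.extend(self._supporters.get(current, []))
        return seen


CLAIM = "PHC undergo growth arrest"
graph = EvidenceGraph()
for paper, data_ids in (("A", (1, 2, 3)), ("B", (4, 5, 6))):
    graph.underpin(f"Paper {paper}: implication", CLAIM)
    graph.underpin(f"Paper {paper}: results", f"Paper {paper}: implication")
    for i in data_ids:
        graph.underpin(f"data {i}", f"Paper {paper}: results")

# All data items that ultimately support the claim, across both papers.
evidence = [n for n in graph.trace(CLAIM) if n.startswith("data")]
```

Tracing from the claim reaches the data of both papers, which is the point of the diagram: a hypothesis is judged by the evidence underneath it, independent of which paper reported it.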
  137. 137. For Example: SWAN
  147. 147. HypER Working Group:
- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships
- Partners:
  - Harvard/MGH: SWAN, ARF
  - Open University: Cohere
  - Oxford University: CiTO, eLearning/Rhetoric
  - DERI: SALT, aTags
  - University of Trento: LiquidPub
  - Xerox Research: XIP hypothesis identifier
  - U Tilburg: ML for Science
  - Elsevier, UUtrecht: Discourse analysis of biology
Example:
Hypothesis 22: Intramembranous Aβ dimer may be toxic.
Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the membrane lipid bilayer after they are generated, but instead exert their toxic effects by competing with and compromising the functions of intramembranous segments of membrane-bound proteins that serve many critical functions.
  150. 150. HypER Activities: http://hyper.wik.is
Current activities:
- Aligning discourse ontologies: joint task with W3C HCLSSig
- Aligning architectures to exchange hypotheses + evidence
- Format for a rhetorical conference paper (SALT + abcde)
- Parser test of hypothesis identification tools on pharmacology corpus
Further interests:
- Better structure of evidence: MyExperiment, KeFeD, ...
- Granularity of annotation/access: entity, hypothesis, discussion?
  155. 155. Conclusion
Problem: too much discourse, tools are not yet good enough...
1. First attempt: allow authors to validate entities - pursue
2. Second attempt: discourse analysis - any help is great!
3. Third attempt: collaboration to identify hypotheses: do join!
  156. 156. Questions? a.dewaard@elsevier.com http://elsatglabs.elsevier.com/labs/anita
  157. 157. References
      Hyland, K. (2004). Disciplinary Discourses: Social Interactions in Academic Writing. Addison Wesley Publishing Company.
      Latour, B., and Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. 2nd ed. Princeton, NJ: Princeton University Press. ISBN: 9780691028323.
      Latour, B. (1987). Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.
  158. 158. Segmentation Criteria (summary)
      Finite/Non-finite | Grammatical role | Segment? | Example
      Finite/Non-finite | Subject | N | The extent to which miRNAs specifically affect metastasis
      Finite/Non-finite | Direct Object | Y | these miRNAs are potential novel oncogenes
      Nonfinite | Phrase-level adjunct (restrictive and non-restrictive) | N | spanning a given miRNA genomic region
      Nonfinite | Clause-level adjunct | Y | by cloning eight miR-Vec plasmids
      Finite | Non-restrictive Phrase-level adjunct | Y | which is only active when tamoxifen is added (De Vita et al, 2005) […]
      Finite | Restrictive Phrase-level adjunct | N | that we examined
      Finite | Clause-level adjunct | Y | which correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)
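The segmentation criteria above amount to a lookup from (finiteness, grammatical role) to a yes/no decision. The following is a minimal sketch of that lookup, assuming a parser has already determined the finiteness and grammatical role of each clause. The names `SEGMENT_RULES` and `is_segment` are hypothetical and do not come from the talk.

```python
# Hypothetical encoding of the segmentation criteria from the slide.
# "any" marks rows that apply to finite and non-finite clauses alike.
SEGMENT_RULES = {
    ("any", "subject"): False,
    ("any", "direct object"): True,
    ("nonfinite", "phrase-level adjunct"): False,  # restrictive and non-restrictive
    ("nonfinite", "clause-level adjunct"): True,
    ("finite", "non-restrictive phrase-level adjunct"): True,
    ("finite", "restrictive phrase-level adjunct"): False,
    ("finite", "clause-level adjunct"): True,
}


def is_segment(finiteness: str, role: str) -> bool:
    """Return True if a clause with this finiteness and role forms its own segment."""
    # Try the specific rule first, then the rule that ignores finiteness.
    for key in ((finiteness, role), ("any", role)):
        if key in SEGMENT_RULES:
            return SEGMENT_RULES[key]
    raise KeyError(f"no segmentation rule for {finiteness!r} / {role!r}")


# "that we examined": finite, restrictive phrase-level adjunct, so not a segment.
print(is_segment("finite", "restrictive phrase-level adjunct"))
# "by cloning eight miR-Vec plasmids": non-finite clause-level adjunct, so a segment.
print(is_segment("nonfinite", "clause-level adjunct"))
```

Keeping the criteria in a table rather than in nested conditionals makes it easy to compare the code against the slide row by row.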
  159. 159. Basic Segment Types
      Segment | Description | Example
      Fact | a known fact, generally without explicit citation | mature miR-373 is a homolog of miR-372
      Hypothesis | a proposed idea, not supported by evidence | This could for instance be a result of high mdm2 levels
      Problem | unresolved, contradictory, or unclear issue | However, further investigation is required to demonstrate the exact mechanism of LATS2 action
      Goal | research goal | To identify novel functions of miRNAs,
      Method | experimental method | Using fluorescence microscopy and luciferase assays,
      Result | a restatement of the outcome of an experiment | all constructs yielded high expression levels of mature miRNAs
      Implication | an interpretation of the results, in light of earlier hypotheses and facts | our procedure is sensitive enough to detect mild growth differences
  169. 169. Two Types of Derived Segment Types ‘Other-segments’, related to (referenced) other work: - other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990])’ - other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’ - other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’ Regulatory segments, acting as matrix sentences framing other segments: - reg-hypothesis: ‘we hypothesized that’ - reg-implication: ‘These observations suggest that’ - intratextual: ‘Fig 4 shows that’ - intertextual: ‘reviewed in (Serrano, 1997)’
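The regulatory segments listed above are short, formulaic matrix clauses, so a first approximation could match them with cue patterns. The sketch below is only an illustration built from the four example phrases on the slide. The patterns, the slight generalisations in them (for example accepting "data" or "results" next to "observations"), and the names `REGULATORY_CUES` and `tag_regulatory` are assumptions, not the method used in the talk.

```python
import re

# Hypothetical cue patterns derived from the slide's example phrases.
REGULATORY_CUES = [
    ("reg-hypothesis", re.compile(r"\bwe hypothesi[sz]ed that\b", re.IGNORECASE)),
    ("reg-implication",
     re.compile(r"\bthese (observations|data|results) suggest that\b", re.IGNORECASE)),
    ("intratextual",
     re.compile(r"\bfig(ure|\.)?\s*\d+\w?\s+shows that\b", re.IGNORECASE)),
    ("intertextual", re.compile(r"\breviewed in\b", re.IGNORECASE)),
]


def tag_regulatory(segment: str):
    """Return the first regulatory label whose cue matches, or None."""
    for label, pattern in REGULATORY_CUES:
        if pattern.search(segment):
            return label
    return None


for text in ["we hypothesized that", "These observations suggest that",
             "Fig 4 shows that", "reviewed in (Serrano, 1997)"]:
    print(text, "->", tag_regulatory(text))
```

A segment that matches none of the cues returns `None` and would fall through to whatever classifier handles the basic segment types.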
  170. 170. My categories vs. Latour (1979)
  185. 185. Linguistic and structural properties 1. Position in text - Section of the paper (Introduction, Results, Discussion) - Beginning/middle/end of section - First/second/third part of sentence 2. Verb: - Tense, aspect, voice - Verb class (idiosyncratic) - Lexicon 3. Metadiscourse markers [Hyland, 2003]: - Connectives - Endophorics, Evidentials - Hedges, Boosters - Person markers
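The properties listed above can be read as a feature record attached to each segment. The following sketch shows one possible shape for such a record; the class name `SegmentFeatures`, the field names, and the example values are all hypothetical, and the talk does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SegmentFeatures:
    # 1. Position in text
    section: str            # e.g. "Introduction", "Results", "Discussion"
    section_position: str   # "beginning", "middle" or "end" of the section
    sentence_position: str  # which part of the sentence the segment occupies
    # 2. Verb
    tense: str
    aspect: str
    voice: str
    verb_class: str
    # 3. Metadiscourse markers found in the segment, keyed by marker type
    #    (connectives, endophorics, evidentials, hedges, boosters, person markers)
    markers: Dict[str, List[str]] = field(default_factory=dict)

    def flatten(self) -> Dict[str, object]:
        """Return one flat dict, with a count per metadiscourse marker type."""
        flat: Dict[str, object] = {
            "section": self.section,
            "section_position": self.section_position,
            "sentence_position": self.sentence_position,
            "tense": self.tense,
            "aspect": self.aspect,
            "voice": self.voice,
            "verb_class": self.verb_class,
        }
        for kind, words in self.markers.items():
            flat[f"marker_{kind}"] = len(words)
        return flat


# Illustrative values only; they are not annotations from the talk.
example = SegmentFeatures(
    section="Discussion", section_position="middle", sentence_position="first",
    tense="present", aspect="simple", voice="active", verb_class="implicate",
    markers={"hedges": ["may"], "person_markers": ["we"]},
)
print(example.flatten())
```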
  187. 187. Verb class Two types of entities interact in biology texts:
      - Thing:
          - Thing -> increase, die, etc.
          - Thing-thing: affect, stimulate, etc.
      - People:
          - People -> Thing:
              - Examine (Goal)
              - Operate (Method)
              - Observe (Result)
              - Implicate (Implication)
          - People-people: Report
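The People -> Thing verb classes above each point to a segment type, which suggests a simple two-step lookup: verb to verb class, then verb class to segment type. In the sketch below, the class-to-segment mapping is taken from the slide, while the small verb lexicon and the names `VERB_LEXICON` and `segment_type_for_verb` are assumptions made for illustration.

```python
# Verb class -> segment type, as listed on the slide.
VERB_CLASS_TO_SEGMENT = {
    "examine": "Goal",
    "operate": "Method",
    "observe": "Result",
    "implicate": "Implication",
}

# Hypothetical mini-lexicon assigning verb lemmas to classes (an assumption;
# a real lexicon would have to be built from an annotated corpus).
VERB_LEXICON = {
    "examine": "examine",
    "investigate": "examine",
    "clone": "operate",
    "transduce": "operate",
    "observe": "observe",
    "detect": "observe",
    "implicate": "implicate",
    "suggest": "implicate",
}


def segment_type_for_verb(lemma: str):
    """Return the segment type suggested by a verb lemma, or None if unknown."""
    verb_class = VERB_LEXICON.get(lemma.lower())
    return VERB_CLASS_TO_SEGMENT.get(verb_class)


print(segment_type_for_verb("clone"))
print(segment_type_for_verb("increase"))
```

Thing and Thing-thing verbs such as "increase" or "affect" are deliberately left out of the lexicon, as is "report", because the slide does not tie them to a single segment type.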
  190. 190. Interpretation: 3 Realms of Science:
      - Conceptual realm: (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase. (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
      - Between the conceptual and experimental realms: (2a) Indeed, (4a) Altogether, these data show that
      - Experimental realm: (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase (3a) Consistent with the cell growth assay, (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
      - Data realm (Figures): (2c) (Figures 2G and 2H).
  191. 191. Tense 1: Concepts vs. Experiment
      - Concept realm: (1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-β-Galactosidase. (4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.
      - Between the concept and experimental realms: (2a) Indeed, (4a) Altogether, these data show that
      - Experimental realm (personal, past): (2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA-β-Galactosidase (3a) Consistent with the cell growth assay, (3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.
      - Data realm (Figures) (nonverbal): (2c) (Figures 2G and 2H).
  192. 192. Tense 2: Referral [Diagram: a past/present/future axis relating the own paper (Introduction, current work = Results section, Discussion) to other papers (Other Work). Labels: After other work: present; Before current work: past; Current work (= Results section); After current work: past]
  193. 193. Tense 1 + 2 = 3 [Diagram: Claim, fact (Conceptual) and Experiment (Experiential) placed along a reading-time axis: past, present, future]