Executing the Research Paper

Talk held at STM Innovations Workshop, London, UK, December 2, 2011
http://www.stm-assoc.org/events/stm-innovations-seminar-2011/

Transcript

  • 1. How to Execute the Research Paper. Anita de Waard, Disruptive Technology Director, Elsevier Labs. http://elsatglabs.com/labs/anita
  • 2. How to execute a research paper
    - Why? Three use cases for linked, integrated knowledge
    - What? Three technologies for enabling this linking and execution
    - How? Three tools for annotation, storage and access
    - What next? Force11 and ideas about the future
  • 3. Three Use Cases
  • 4. Use case #1: Claim-Evidence Network in Medicine
  • 5-7. Use case #1: Claim-Evidence Network in Medicine
    Background: Proper implementation of clinical decision support systems (CDS) can:
    - Reduce errors in medical care
    - Bring research results faster to the front-line clinician
    - Significantly improve patient outcome.
    Requirements: To that end, such systems need to:
    - Be able to answer complex questions
    - Aggregate data from multiple sources, combining complex patient-specific data with information from external sources
    - Be semantically aware
    - Be continually updated with the latest validated research results.
    Components: To develop such semantically aware systems, we need:
    - Flexible frameworks supporting the development of such applications
    - Seamless integration of relevant content
    - Content sources with high quality content
    - Tools enabling the extraction and aggregation of such content.
  • 8-10. Use case #1: Claim-Evidence Network in Medicine
    Sources shown:
    - A. Philips’ Electronic Patient Records
    - B. Elsevier-published Clinical Guideline
    - C. Elsevier (or other publisher’s) Research Report or Data
    Step 1: Patient data + diagnosis link to Guideline recommendation
    Step 2: Guideline recommendation links to evidence in report or data
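The two linking steps on these slides can be sketched as a small graph walk. This is only an illustration under stated assumptions: the `Node` class, the `kind` labels, and the identifiers `A1`, `B1`, `C1`, `C2` are invented for the example and do not describe any Philips or Elsevier system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in the network (hypothetical structure for this sketch)."""
    node_id: str
    kind: str   # "patient_record", "guideline" or "evidence" (illustrative labels)
    text: str
    links: list = field(default_factory=list)  # ids of the nodes this one points to


def evidence_chain(nodes, start_id):
    """Follow links depth-first from start_id and return the ids in visiting order."""
    chain, seen, stack = [], set(), [start_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        # Push links in reverse so the first link is visited first.
        stack.extend(reversed(nodes[current].links))
    return chain


# Invented placeholder data: A links to B (step 1), B links to C (step 2).
nodes = {
    "A1": Node("A1", "patient_record", "Patient data + diagnosis", ["B1"]),
    "B1": Node("B1", "guideline", "Guideline recommendation", ["C1", "C2"]),
    "C1": Node("C1", "evidence", "Research report"),
    "C2": Node("C2", "evidence", "Research data"),
}

print(evidence_chain(nodes, "A1"))
```

Starting from the patient record, the walk reaches the guideline recommendation first and then the evidence items behind it, which mirrors the order of the two steps.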
  • 11. Use case #2: Updating Drug-Drug Interactions
  • 12-14. Use case #2: Updating Drug-Drug Interactions
    Background:
    - Drug-drug interactions (DDIs) are a significant source of preventable adverse effects
    - Factors contributing to the occurrence of preventable DDIs include:
      - a lack of knowledge of the patient’s concurrent medications
      - inaccurate or inadequate knowledge of interactions by health care providers
    Requirements: We (HCLS SciDiscourse group: Elsevier, DERI, Pittsburgh, EBI) will:
    - Manually mark up a diverse collection of content with DDIs
    - Develop/train NLP tools to recognize these
    - Create a triple store to maintain the relationships between drugs, DDIs and content
    Components: To develop this system, we need:
    - Scientific discourse ontologies to mark up relevant statements and seed NLP
    - Natural language processing to identify relevant DDIs
    - Linked Data architecture to enable storage of and access to this information
  • 15-18. Use case #2: Updating Drug-Drug Interactions
    Step 1: Manually identify DDIs and drug names in wide collection of content sources
    Step 2: Develop a model of Drug-Drug Interaction and define candidates
    Step 3: Automate this process and store as Linked Data
    Images from: Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism, Bioinformatics Vol. 26 ECCB 2010, pages i547–i553, doi:10.1093/bioinformatics/btq382
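A triple store of the kind mentioned in the requirements can be approximated in a few lines. The sketch below is an in-memory stand-in rather than a real RDF store, and every name in it (`TripleStore`, the predicates `hasParticipant` and `assertedIn`, and the identifiers `drugX`, `ddi1`, `doc1`) is a placeholder chosen for illustration, not part of the project described in the talk.

```python
class TripleStore:
    """Minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def interactions_for(store, drug):
    """Ids of the interactions in which the given drug takes part."""
    return [s for s, _, _ in store.match(predicate="hasParticipant", obj=drug)]


def sources_for(store, ddi):
    """Ids of the content items in which the interaction is asserted."""
    return [o for _, _, o in store.match(subject=ddi, predicate="assertedIn")]


store = TripleStore()
# Placeholder identifiers only: these are not real drugs or documents.
store.add("ddi1", "hasParticipant", "drugX")
store.add("ddi1", "hasParticipant", "drugY")
store.add("ddi1", "assertedIn", "doc1")
store.add("ddi2", "hasParticipant", "drugX")
store.add("ddi2", "assertedIn", "doc2")

print(interactions_for(store, "drugX"))
print(sources_for(store, "ddi1"))
```

Keeping the interaction as its own node, rather than linking two drugs directly, lets each DDI point back to the content where it was found.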
  • 19. Use Case #3: Review and share code
  • 20-22. Use Case #3: Review and share code
    Background:
    - Core of computational papers is the software
    - If code is not part of the paper, hard to assess quality
    - Code reuse can reduce waste of time and (taxpayer’s) money
    Requirements:
    - Provide a way to create, share and review code
    - Integrate this with the research paper
    - Enable integration with publisher’s system
    Components:
    - Integration between workflow and text authoring
    - Code authoring tools and standards that allow reuse
    - User environment that allows access to disparate results types
  • 23-25. Use Case #3: Review and share code
    Step 1: Develop Virtual Machine environment for creating code
    Step 2: Create authoring/review environment to allow VM evaluation
    Step 3: Allow access to integrated environment through SciVerse App store
    Pieter Van Gorp, Steffen Mazanek, SHARE: a web portal for creating and sharing executable research papers, Procedia Computer Science 00 (2011) 1–6
  • 26. Three Technologies
  • 27. Technology #1: Discourse Annotation - at text level
  • 28-29. Technology #1: Discourse Annotation - at text level
    Columns: Aristotle | Quintilian | Description | Scientific Paper
    - prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
    - prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
    - (none) | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
    - pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
    - (none) | Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
    - epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.
  • 30. Technology #1: Discourse Annotation - at paragraph level
    Columns: The Story of Goldilocks and the Three Bears | Story Grammar | Paper grammar | Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
    - Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
    - a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
    - She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
    - She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
    - she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
    - At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name | dAtX-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
    - Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
    - She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
    - This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1 [82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
    - So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown),
    - This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
    - So, she tasted the last bowl of porridge. | Activity | Data | (Figures 1B-1D).
    - Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
    - she ate it all up. | (none) | Data | (Figure 1F),
  • 31. Technology #1: Discourse Annotation - at clause level
    Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.
  • 32-41. Technology #1: Discourse Annotation - at clause level
    The passage from slide 31, split into clauses, with one label added per build step:
    - Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
    - Goal: To exclude that
    - Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
    - Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
    - Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
    - Reg-Implication: suggesting that
    - Implication: miR-371-3 expression is a selective event during tumorigenesis.
    Grouping labels on the final builds: Conceptual knowledge (shown alongside Fact) and Experimental Evidence (shown alongside Method and Result).
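Clause-level annotation of this kind amounts to pairing each text segment with one label. A minimal sketch, assuming a flat list of segments is enough: the segment texts and labels are taken from the slides, while the `Segment` type and the helper functions are invented for the example.

```python
from collections import namedtuple

# Hypothetical container type: one clause plus its discourse label.
Segment = namedtuple("Segment", ["text", "label"])

segments = [
    Segment("Both seminomas and the EC component of nonseminomas "
            "share features with ES cells.", "Fact"),
    Segment("To exclude that", "Goal"),
    Segment("the detection of miR-371-3 merely reflects its expression "
            "pattern in ES cells,", "Hypothesis"),
    Segment("we tested by RPA miR-302a-d, another ES cells-specific "
            "miRNA cluster (Suh et al, 2004).", "Method"),
    Segment("In many of the miR-371-3 expressing seminomas and nonseminomas, "
            "miR-302a-d was undetectable (Figs S7 and S8),", "Result"),
    Segment("suggesting that", "Reg-Implication"),
    Segment("miR-371-3 expression is a selective event during "
            "tumorigenesis.", "Implication"),
]


def with_label(segments, label):
    """Return the text of every segment carrying the given label."""
    return [s.text for s in segments if s.label == label]


def plain_text(segments):
    """Rebuild the running text by joining the segments in order."""
    return " ".join(s.text for s in segments)


print(with_label(segments, "Implication"))
```

Because the segments keep their original order, the running text can always be rebuilt, so the annotation adds information without altering the passage.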
  • 42-48. Technology #1: Discourse Annotation - across texts
    Voorhoeve et al, Cell, 2006:
    - Hypothesis: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...
    - Implication: Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,
    Raver-Shapira et al., JMolCell 2007:
    - Cited Implication: ... two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
    Yabuta, JBioChem 2007:
    - Fact: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)
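The way one claim is restated across papers can be recorded as a list of (source, label) mentions. The sketch below uses the sources and labels from the slides; the numeric ordering in `STRENGTH` is an assumption made for this example only, as is the helper `strongest_status`.

```python
# Mentions of the LATS2 claim, in the order shown on the slides.
mentions = [
    ("Voorhoeve et al, Cell, 2006", "Hypothesis"),
    ("Voorhoeve et al, Cell, 2006", "Implication"),
    ("Raver-Shapira et al., JMolCell 2007", "Cited Implication"),
    ("Yabuta, JBioChem 2007", "Fact"),
]

# Assumed ranking from most tentative to most settled (not from the talk).
STRENGTH = {"Hypothesis": 0, "Implication": 1, "Cited Implication": 2, "Fact": 3}


def strongest_status(mentions):
    """Map each source to the most settled label it uses for the claim."""
    result = {}
    for source, label in mentions:
        if source not in result or STRENGTH[label] > STRENGTH[result[source]]:
            result[source] = label
    return result


for source, label in strongest_status(mentions).items():
    print(f"{source}: {label}")
```

Read top to bottom, the output shows the same claim moving from an implication in the original paper to a plain fact in a later citing paper.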
  • 49. Technology #1: Towards automated Discourse Annotation: CoreSC
  • 50-51. Technology #1: Towards automated Discourse Annotation: CoreSC
    - Classified with Support Vector Machines (SVM)
    - Sequence labelling by Conditional Random Fields (CRF)
    - F-score between 18% (motivation) and 76% (experimental methods)
    - ‘We plan to use CoreSC annotated papers in biology to guide information extraction and retrieval, characterise extracted events and relations and facilitate inference from hypotheses to conclusions in scientific papers.’
    Maria Liakata, Shyamasree Saha, Simon Dobnik, Colin Batchelor and Dietrich Rebholz-Schuhmann, Automatic recognition of conceptualisation zones in scientific articles to aid biological information extraction, Bioinformatics 2011 (Accepted)
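For readers unfamiliar with the metric, a per-category F-score can be computed as below. This assumes the slide refers to the usual balanced F-score (the harmonic mean of precision and recall); the counts in the example are invented and are not CoreSC results.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts for one category: 3 correct labels, 1 spurious, 2 missed.
print(round(f_score(3, 1, 2), 3))
```

A category that is rarely predicted correctly scores low even when the classifier does well overall, which is why the slide reports a range across categories.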
  • 52. Technology #2: Linked Data
  • 53-54. Technology #2: Linked Data
    1. Use URIs to name things
    2. Use HTTP URIs so they can be looked up
    3. Return useful data when things are looked up
    4. Include links to other things in the returned data
    “Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”
    Tennison J, 2010. Why Linked Data for data.gov.uk? http://www.jenitennison.com/blog/node/140
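The four principles can be illustrated with a small description builder. This is a sketch under assumptions: the `example.org` URIs, the `cites` relation and the function names are placeholders, and the `@id` key is borrowed loosely from JSON-LD without claiming the output is a complete JSON-LD document.

```python
import json
from urllib.parse import urlparse


def is_http_uri(value):
    """Principles 1 and 2: the name must be an HTTP(S) URI that can be looked up."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def describe(uri, label, links):
    """Principles 3 and 4: build useful data that includes links to other things."""
    if not is_http_uri(uri):
        raise ValueError(f"not an HTTP URI: {uri}")
    for target in links.values():
        if not is_http_uri(target):
            raise ValueError(f"not an HTTP URI: {target}")
    description = {"@id": uri, "label": label}
    description.update({relation: {"@id": target} for relation, target in links.items()})
    return description


# Placeholder URIs on example.org, used only for illustration.
paper = describe(
    "http://example.org/paper/1",
    "An example paper",
    {"cites": "http://example.org/paper/2"},
)
print(json.dumps(paper, indent=2))
```

A client that receives this description can look up the `cites` target in turn, which is what makes the data linked rather than merely published.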
  • 55. Technology # 3: Workflow integration A. de Waard, The Future of the Journal? Integrating research data with scientific discourse http://precedings.nature.com/documents/4742/version/1
  • 61. Technology #3: Workflow integration
      1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.
      2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
      3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.
      4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper gets updated until it is validated.
      5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.
      6. User applications: Distributed applications run on this ‘exposed data’ universe.
      [Figure: sample paper text, “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”; labels: metadata, Review, Revise, Edit, Some other publisher]
      A. de Waard, The Future of the Journal? Integrating research data with scientific discourse http://precedings.nature.com/documents/4742/version/1
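The idea that every data item carries metadata and provenance, and that a published paper's heritage can be traced back through the workflow system, might look roughly like the sketch below. The class names, field names and item identifiers are hypothetical; the talk does not specify a data model.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One item in the (hypothetical) lab-owned workflow system."""
    item_id: str
    kind: str
    metadata: dict = field(default_factory=dict)
    derived_from: list = field(default_factory=list)  # ids of the items this one was built from


class WorkflowSystem:
    def __init__(self):
        self.items = {}

    def add(self, item):
        self.items[item.item_id] = item
        return item

    def heritage(self, item_id):
        """Return the ids of everything the item derives from, nearest first."""
        trail, pending = [], list(self.items[item_id].derived_from)
        while pending:
            current = pending.pop(0)
            if current not in trail:
                trail.append(current)
                pending.extend(self.items[current].derived_from)
        return trail


lab = WorkflowSystem()
lab.add(DataItem("raw-1", "measurement", {"created_by": "lab member A"}))
lab.add(DataItem("fig-2", "figure", {"created_by": "lab member B"}, ["raw-1"]))
lab.add(DataItem("paper-1", "paper", {"status": "published"}, ["fig-2"]))
trail = lab.heritage("paper-1")
```

Here `trail` lists the figure and then the raw measurement, mirroring the "click on fig 2 to see underlying data" interaction shown on the slide.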
  • 62. Technology #3: Workflow integration [Diagram, (C) Dave De Roure. Labels: QTL, Workflow 16, Workflow 13, Results, Logs, Metadata, Slides, Paper, Common pathways]
  • 64. Technology #3: Workflow integration [Diagram, (C) Dave De Roure. Labels: QTL, Workflow 16, Workflow 13, Results, Logs, Metadata, Slides, Paper, Common pathways. Relation labels: produces, Included in, Published in, Feeds into]
  • 65. Three Tools
  • 66. Tool #1: DOMEO annotation tool http://purl.org/swan/af e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2224208/?tool=pubmed Paolo Ciccarese, Marco Ocana, Tim Clark, DOMEO: a web-based tool for semantic annotation of online documents, Bioontologies, 2011
  • 67. Tool #1: DOMEO annotation tool http://purl.org/swan/af e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2224208/?tool=pubmed
      - Allows manual annotation, automated annotation, or both
      - Now linked to the NCBO text mining tool, expanding to all UIMA
      - Standoff annotations are stored in Annotation Ontology (an RDF format) and can be exported
      Paolo Ciccarese, Marco Ocana, Tim Clark, DOMEO: a web-based tool for semantic annotation of online documents, Bioontologies, 2011
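A standoff annotation lives outside the document and points into it, rather than being embedded in the text. The sketch below shows that general idea only; it is not the Annotation Ontology schema or DOMEO's data model, and the field names, URIs and `creator` values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    target: str   # URI of the annotated document (hypothetical)
    start: int    # character offset where the annotated span begins
    end: int      # character offset just past the span
    exact: str    # the quoted text, kept so that drift can be detected
    body: str     # URI of the ontology term attached to the span (hypothetical)
    creator: str  # e.g. "manual" or the name of a text mining service


def annotate(doc_uri, text, phrase, term_uri, creator):
    """Create an annotation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return StandoffAnnotation(doc_uri, start, start + len(phrase), phrase, term_uri, creator)


def resolve(text, ann):
    """Return the annotated span, or None if the text changed underneath it."""
    span = text[ann.start:ann.end]
    return span if span == ann.exact else None


text = "Rats were subjected to two grueling tests."
ann = annotate("http://example.org/doc/1", text, "Rats",
               "http://example.org/term/rat", "manual")
```

Because the document itself is never modified, manual and automated annotations from different sources can coexist over the same text and be exported independently.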
  • 69. Tool #2: Linked Data Repository
  • 71. Tool #2: Linked Data Repository
      - Dublin Core and SKOS
      - SWAN’s PAV (Provenance, Authoring and Versioning) ontology
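As a rough illustration of how these vocabularies might be combined, the sketch below describes one item with Dublin Core, SKOS and PAV properties as plain subject/predicate/object triples. The `example.org` resources are invented, the choice of properties is an assumption rather than the repository's actual schema, and the PAV namespace shown is the later standalone one, which may differ from the SWAN-era namespace used at the time of the talk.

```python
# Namespace prefixes; the PAV namespace here is an assumption (see lead-in).
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "pav": "http://purl.org/pav/",
}

ITEM = "http://example.org/item/1"        # hypothetical data item
CONCEPT = "http://example.org/concept/pain"  # hypothetical SKOS concept

TRIPLES = [
    (ITEM, "dcterms:title", "Example data item"),              # descriptive metadata
    (ITEM, "dcterms:subject", CONCEPT),                        # link to a controlled term
    (CONCEPT, "skos:prefLabel", "pain"),                       # label of that term
    (ITEM, "pav:version", "1"),                                # versioning
    (ITEM, "pav:createdBy", "http://example.org/person/a"),    # provenance
]


def expand(curie):
    """Turn a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def describe(subject):
    """Collect all properties of one subject, keyed by full predicate URI."""
    return {expand(p): o for s, p, o in TRIPLES if s == subject}


record = describe(ITEM)
```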
  • 72. Tool #3: ScienceDirect app store
  • 73. Tool #3: ScienceDirect app store
      - Eclipse SDK platform accessing all ScienceDirect/Scopus content
      - Build applications on top of content
      - Offer to users in marketplace
  • 74. What next?
  • 75. Force11 http://force11.org Force11 (Future of Research Communication and e-Scholarship, 2011) is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing.
  • 77. Force11 http://force11.org Force11 (Future of Research Communication and e-Scholarship, 2011) is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing.
      Individually and collectively, we aim to bring about a change in scholarly communication through the effective use of information technologies.
      Next step: work on these issues. We need more publishers on board!
  • 78. Some thoughts about the future:
  • 81. Some thoughts about the future:
      - Let’s think in terms of use cases, not technologies:
          - Identify where knowledge exists, within and outside of the article
          - Identify what the information needs are, and which components need to be connected
          - Only if our content plays well with others does it get to stay in the game!
      - Work with scientists, grant agencies, libraries, software developers big and small and... each other!
      - For instance, let’s collectively look at enabling:
          - Standoff annotation formats
          - Research data and workflow standards/integration
          - Claim-evidence networks and discourse annotation:
  • 82.
      - Which discourse annotation schemes are most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains and different genres (research papers, reviews, patents, etc.)?
      - How can we compare annotations, and how can we decide which features, approaches or techniques work best? What are the most topical use cases? How can we evaluate performance, and what are the most appropriate tasks?
      - What corpora are currently available for comparing and contrasting discourse annotation, and how can we improve and increase these?
      - How applicable are these efforts to improving methods of publishing, detecting and correcting authors’ errors at the discourse level, or summarizing scholarly text? How close are we to implementing them at a production scale?
  • 83. Thank you!
      - Tim Clark, Paolo Ciccarese, Harvard, Cambridge, USA
      - Eduard Hovy, Gully Burns, Cartic Ramakrishnan, ISI/USC, Los Angeles, USA
      - Phil Bourne, Maryann Martone, UCSD, USA
      - Sophia Ananiadou, NaCTeM, Manchester, UK
      - Dave DeRoure, Oxford eScience Center, UK
      - Maria Liakata, EBI, Cambridge, UK
      - Paul Groth, Frank van Harmelen, Vrije Universiteit, Amsterdam, Netherlands
      - Henk Pander Maat, Ted Sanders, Universiteit Utrecht, Netherlands
      - The Force11 members
      More information:
      - Data2Semantics: http://www.data2semantics.org
      - W3C group on Discourse Structure: http://www.w3.org/wiki/HCLSIG/SWANSIOC
      - Executable Paper Challenge: http://www.executablepapers.com
      - Parsing rhetoric: http://elsatglabs.com/labs/anita/
      - Sapienta: http://www.sapientaproject.com/
      - SciVerse: http://developer.sciverse.com
      - Force11: http://force11.org
      - DSSD2012: http://www.nactem.ac.uk/dssd/
      Or contact me: Anita de Waard, a.dewaard@elsevier.com