What’s wrong with research papers - and (how) can we fix it?

Talk given at DERI in Galway on May 2, 2012

  1. What’s wrong with research papers - and (how) can we fix it?
     Anita de Waard, Disruptive Technologies Director, Elsevier Labs
     a.dewaard@elsevier.com
     http://elsatglabs.com/labs/anita
  5. The Big Problem:
     1) There are too many papers
     2) We have too little time to read them
  8. To address this problem, we make:
     • databases
     • text mining tools
     • nanopublications
     • data publications
     • wiki publications
     • ontologies; ontology integration tools
     • workflow/data integration systems
     • executable components
     • ... and write emails/grants/papers/blogs about this ...
     • ... and we end up with:
       1) Even more papers!!
       2) Even less time to read them!!
  13. What problems are we solving?
     • We’re mostly improving the format of the research article.
     • This talk: aspects of the format that are being improved (and some examples of work to improve them):
       A. Issues with the paper format
       B. Issues pertaining to habits of writing
       C. Issues inherent to language
       D. Issues in trying to create connected content
     • Do any of these address the Big Problem?
     • What shall we do about it?
  18. A. Issue: the paper format
     A1: Paper is two-dimensional
     A2: Paper is linear
     A3: Paper is not interactive
  22. A1: Issue: paper is two-dimensional
     • Some experiments in representing interactive figures: Wolfram Alpha, Utopia, Chem-3d
     • Lack of experimentation with formats: the internet is multi-dimensional, so why do we still need page limits?
  26. A2: Issue: paper is linear
     • Read from front to back (although research suggests readers quickly skim to the core parts, linearity helps them do that)
     • References are at the end, so your reading is not interrupted
     • Headers are sequential, and not directly accessible
  30. A2: (Old) Experiment: ABCDE
     • LaTeX stylesheet:
       – Annotation
       – Background
       – Contribution
       – Discussion
       – Entities (references, projects, terms in ontologies, etc.) in RDF
       – Core sentences create a structured abstract
     • E.g. in proceedings: collect all core Contribution components
     • I still have the stylesheets, if anyone’s interested :-)!
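Collecting the core Contribution sentences of a whole proceedings volume can be sketched in a few lines of Python. This is a minimal sketch, not the original stylesheet: it assumes a hypothetical `\contribution{...}` macro that marks core sentences and contains no nested braces, and the sample sources are invented.

```python
import re

# Hypothetical macro marking a core Contribution sentence; the real ABCDE
# stylesheet may use other names. Assumes no nested braces inside the macro.
CORE_CONTRIBUTION = re.compile(r"\\contribution\{([^{}]*)\}")


def core_contributions(latex_source: str) -> list[str]:
    """Return every core Contribution sentence marked in one paper's source."""
    return [m.strip() for m in CORE_CONTRIBUTION.findall(latex_source)]


def proceedings_abstract(papers: dict[str, str]) -> dict[str, list[str]]:
    """Collect the core contributions of all papers in a proceedings volume."""
    return {title: core_contributions(src) for title, src in papers.items()}


if __name__ == "__main__":
    # Invented sources for illustration only.
    papers = {
        "Paper 1": r"\background{Prior work ...} \contribution{We present a stylesheet.}",
        "Paper 2": r"\contribution{We annotate entities in RDF.} \discussion{...}",
    }
    for title, sentences in proceedings_abstract(papers).items():
        print(title, sentences)
```

Concatenating the per-paper lists gives a proceedings-wide structured abstract built only from the sentences the authors marked as core.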
  32. A3: Paper is not interactive
     • Experiment: executable papers:
       – Run code within a paper
       – Experiments: R, SPSS, VisTrails
       – Re-render code within a paper: change the algorithm and see the effect; run a different dataset
       – How do you archive software? Satyanarayanan at CMU: Olive, an ‘Internet ecosystem of curated virtual machine image collections’
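An executable figure can be thought of as a function of the data and the algorithm rather than a fixed image, so a reader can rerun it with other choices. The sketch below is hypothetical throughout: `figure_2`, the algorithm registry and the sample values are invented for illustration and say nothing about how R, SPSS or VisTrails embed code.

```python
from statistics import mean, median
from typing import Callable, Sequence

# Hypothetical: the algorithms a reader is allowed to swap into the figure.
ALGORITHMS: dict[str, Callable[[Sequence[float]], float]] = {"mean": mean, "median": median}


def figure_2(dataset: Sequence[float], algorithm: str = "mean") -> float:
    """Recompute the value plotted in a (hypothetical) Figure 2 from raw data."""
    try:
        summarize = ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
    return summarize(dataset)


if __name__ == "__main__":
    published = [1.0, 2.0, 3.0, 10.0]       # illustrative data, not from any paper
    print(figure_2(published))              # as published
    print(figure_2(published, "median"))    # change the algorithm, see the effect
    print(figure_2([4.0, 5.0, 6.0]))        # run a different dataset
```

The three calls print 4.0, 2.5 and 5.0: the published value, the effect of swapping the algorithm, and the value for a different dataset.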
  36. B. Issue: habits of writing
     B1: Cite a paper, not a claim
     B2: No precision in describing entities
     B3: We write post-mortems (stories :-)!)
  41. B1: Citations create facts:
     - Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.”
     - Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
     - Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
     - Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
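The drift from “possibly” to “directly” can be made visible by flagging speculative cue words in each sentence of the chain. This sketch assumes a tiny, hypothetical cue list; real hedge detection needs much more than a word list (it cannot tell which clause a cue applies to), but it already shows that the 2011 sentence keeps none of the original hedges. The sentences are excerpts of the quotes above.

```python
import re

# A deliberately tiny, hypothetical list of hedge cues.
HEDGE_CUES = ("possibly", "potential", "suggests", "may", "might", "likely")


def hedge_cues(sentence: str) -> list[str]:
    """Return the hedge cues that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return [cue for cue in HEDGE_CUES if re.search(rf"\b{cue}\b", lowered)]


CITATION_CHAIN = [
    ("Voorhoeve, 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
     "direct inhibition of the expression of the tumor-suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "... possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006)."),
    ("Yabuta et al., 2007",
     "... function as potential novel oncogenes in testicular germ cell tumors by "
     "inhibition of LATS2 expression, which suggests that Lats2 is an important "
     "tumor suppressor (Voorhoeve et al., 2006)."),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression "
     "of Lats2, thereby allowing tumorigenic growth in the presence of p53 "
     "(Voorhoeve et al., 2006)."),
]

if __name__ == "__main__":
    for source, sentence in CITATION_CHAIN:
        print(f"{source}: {hedge_cues(sentence) or 'no hedge cues'}")
```

Under this cue list, the Voorhoeve and Kloosterman sentences are flagged for “possibly”, Yabuta et al. for “potential” and “suggests”, and Okada et al. for nothing.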
  45. B1: TAC2012: Add author’s text to citation
     Voorhoeve, P. M.; le Sage, C. et al. (2006). A Genetic Screen Implicates miRNA-372 and miRNA-373 As Oncogenes in Testicular Germ Cell Tumors. Cell 124 (6), pp. 1169-1181.
     Citing goal: “To perform genetic screens for novel functions of miRNAs,”
       − in order to identify miRNAs functionally associated with carcinogenesis
       − to identify miRNAs that when overexpressed could substitute for p53 loss and allow continued proliferation in the context of Ras activation
     Citing method: “We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).”
       − Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array
       − using a retroviral expression library of miRNAs,
       − Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen
     Citing result: “we identified miR-372-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wildtype p53.”
       − miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,
       − Voorhoeve et al. (2006) identified miR-372 and miR-373
       − miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53,
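Attaching the cited author’s own text to a citation means deciding which sentence of the cited paper a citing sentence refers to. A naive baseline, sketched here, picks the reference sentence with the highest word-overlap (Jaccard) score. This is an assumed technique for illustration, not the method of any TAC 2012 participant, and the stopword list is hypothetical; the sentences are the ones quoted on the slide.

```python
import re

# Hypothetical stopword list, just large enough for these sentences.
STOPWORDS = {"a", "an", "and", "both", "by", "each", "for", "found", "in", "into",
             "of", "our", "that", "the", "to", "using", "we", "were", "with"}


def content_words(text: str) -> set[str]:
    """Lowercased tokens (hyphenated names such as miR-372 kept whole) minus stopwords."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())) - STOPWORDS


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_facet(citance: str, reference: dict[str, str]) -> str:
    """Name of the reference sentence (goal/method/result) most similar to the citance."""
    words = content_words(citance)
    return max(reference, key=lambda facet: jaccard(words, content_words(reference[facet])))


# Sentences from the cited paper, as quoted on the slide.
REFERENCE = {
    "goal": "To perform genetic screens for novel functions of miRNAs,",
    "method": "We subsequently created a human miRNA expression library (miR-Lib) by cloning "
              "almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).",
    "result": "we identified miR-372-373, each permitting proliferation and tumorigenesis of "
              "primary human cells that harbor both oncogenic RAS and active wildtype p53.",
}

if __name__ == "__main__":
    citances = [
        "in order to identify miRNAs functionally associated with carcinogenesis",
        "using a retroviral expression library of miRNAs,",
        "miR-372 and miR-373 were found to allow proliferation of primary human cells "
        "that express oncogenic RAS and active p53,",
    ]
    for c in citances:
        print(best_facet(c, REFERENCE), "<-", c)
```

On these three citing sentences the baseline recovers the goal, method and result groupings shown on the slide; with paraphrases that share few words it would fail, which is why richer matching is needed.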
  48. B2: Issue: entities in papers are not exact
     • Midfrontal cortex tissue samples from neurologically unimpaired subjects (n = 9) and from subjects with AD (n = 11) were obtained from the Rapid Autopsy Program
     • Immunoblot analysis and antibodies
     • The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000, Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8 mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356, 1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000, Santa Cruz Biotechnology)…
     • 95 antibodies were identified in 8 articles
     • 52 did not contain enough information to determine the antibody used
     Maryann Martone, Jan 2012: 2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012)
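Whether a reported reagent can be pinned down can be sketched as a required-fields check. The required fields below (target, clonality, supplier, catalog number) are an assumption chosen for illustration, not the criteria used in the study cited on the slide; the two records paraphrase entries from the quoted methods text, which gives no catalog numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AntibodyReport:
    """An antibody as reported in a methods section (None where not stated)."""
    name: str
    target: Optional[str] = None
    clonality: Optional[str] = None
    supplier: Optional[str] = None
    catalog_number: Optional[str] = None


# Assumed minimum for identifying the exact reagent (illustrative only).
REQUIRED = ("target", "clonality", "supplier", "catalog_number")


def missing_fields(report: AntibodyReport) -> list[str]:
    """Names of required fields the report leaves empty."""
    return [name for name in REQUIRED if not getattr(report, name)]


if __name__ == "__main__":
    reports = [
        AntibodyReport("T46", target="tau 404-441", clonality="monoclonal",
                       supplier="Invitrogen"),
        # PHF-1 was a gift, so no commercial supplier is named.
        AntibodyReport("PHF-1", target="phospho-tau Ser396/Ser404", clonality="monoclonal"),
    ]
    for r in reports:
        print(r.name, "missing:", missing_fields(r))
```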
  51. B3: Issue: methods are written post-mortem
     • Yolanda Gil at ISI modeled a Bourne et al. paper in Wings
     • Anecdotal evidence: Phil Bourne couldn’t remember most of this, even after digging through emails!
  58. B3: So why not write the data first and wrap the paper around it??
     1. Research: Each item in the system has metadata (including metadata provenance) and relations to other data items added to it.
     2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.
     3. Authoring: A paper is written in an authoring tool that can pull data, with provenance, from the workflow tool into the document in the appropriate representation.
     4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper is updated until it is validated.
     5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data items, and its heritage can be traced.
     6. User applications: distributed applications run on this ‘exposed data’ universe.
     [Diagram: data items, each carrying metadata, feed a draft (“Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”) that cycles through Review, Revise and Edit; some other publisher also draws on the exposed data.]
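Steps 1 to 3 (data items carrying metadata and relations, a lab workflow store, and an authoring tool that pulls data in with its provenance) can be sketched with two small Python classes. All class, field and identifier names are hypothetical; a real system would use a provenance standard and persistent identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Step 1: a data item with metadata and relations to other items (hypothetical)."""
    item_id: str
    values: list[float]
    metadata: dict[str, str] = field(default_factory=dict)
    derived_from: list[str] = field(default_factory=list)


class WorkflowStore:
    """Step 2: a (lab-owned) workflow store that every data item is added to."""

    def __init__(self) -> None:
        self._items: dict[str, DataItem] = {}

    def add(self, item: DataItem) -> None:
        self._items[item.item_id] = item

    def lineage(self, item_id: str) -> list[str]:
        """Trace an item's heritage through its derived_from links, depth-first."""
        seen: list[str] = []
        stack = [item_id]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._items[current].derived_from)
        return seen

    def embed(self, item_id: str) -> str:
        """Step 3: text an authoring tool could insert, keeping a link to the data."""
        item = self._items[item_id]
        return f"{item.metadata.get('caption', item_id)} [data: {item_id}]"


if __name__ == "__main__":
    store = WorkflowStore()
    store.add(DataItem("raw-1", [3.1, 2.9], {"instrument": "hypothetical"}))
    store.add(DataItem("fig-2", [3.0], {"caption": "Fig. 2"}, derived_from=["raw-1"]))
    print(store.embed("fig-2"))    # Fig. 2 [data: fig-2]
    print(store.lineage("fig-2"))  # ['fig-2', 'raw-1']
```

Because the figure keeps its `derived_from` links, its heritage can still be traced after publication (step 5), which is the point of writing the data first.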
  63. C. Issue: language
     C1: Language is coherent
     C2: Language is narrative
     C3: Language is abstract
  67. C1: Language is coherent: adding drug-drug interactions to DIKB
     • Drug-Interaction Knowledge Base: clinically oriented, evidence-based knowledge base designed to support adding data to product inserts
     • Contains quantitative and qualitative assertions about drug mechanisms and pharmacokinetic drug-drug interactions (DDI) for over 60 drugs
     • HCLS SIG: currently working on expanding the DIKB with more content and making a “mash-up” view of package inserts adding up-to-date information
       View project: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
       SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/directory/Drugs
  75. C1: Coherent language is hard to parse
     • Self-reference: R-CT and its metabolites, studied using the same procedures, had properties very similar to those of the corresponding S-enantiomers.
     • Reference to external data sources: Average relative in vivo abundances equivalent to the relative activity factors, were estimated using methods described in detail previously (Crespi, 1995; Venkatakrishnan et al., 1998 a,c, 1999, 2000, 2001; von Moltke et al., 1999 a,b; Störmer et al., 2000).
     • Ways of describing meant for human eyes: Based on established index reactions, S-CT and S-DCT were negligible inhibitors (IC50 > 100 µM) of CYP1A2, -2C9, -2C19, -2E1, and -3A, and weakly inhibited CYP2D6 (IC50 = 70–80 µM)
     • Many statements wrapped into one: S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).
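The last example packs three enzyme kinetics assertions into one sentence. A sketch of splitting it into separate (substrate, product, enzyme, Km) records with regular expressions; the patterns are assumptions tuned to this one sentence shape, and real extraction would need a proper parser.

```python
import re
from typing import NamedTuple


class KmAssertion(NamedTuple):
    substrate: str
    product: str
    enzyme: str
    km_um: float  # Michaelis constant in µM


SENTENCE = ("S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), "
            "CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).")

# Patterns fitted to this sentence only (an assumption); accepts micro sign or Greek mu.
HEAD = re.compile(r"(?P<substrate>\S+) was transformed to (?P<product>\S+) by")
ENZYME_KM = re.compile(r"(CYP\w+) \(Km = (\d+(?:\.\d+)?) [µμ]M\)")


def split_assertions(sentence: str) -> list[KmAssertion]:
    """One record per enzyme, each repeating the shared substrate and product."""
    head = HEAD.search(sentence)
    if head is None:
        return []
    return [KmAssertion(head["substrate"], head["product"], enzyme, float(km))
            for enzyme, km in ENZYME_KM.findall(sentence)]


if __name__ == "__main__":
    for assertion in split_assertions(SENTENCE):
        print(assertion)
```

Each record now stands alone, so it can be added to a knowledge base such as the DIKB without a reader resolving the shared subject by hand.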
  80. C2: Issue: Language is narrative
     • ‘The truth can only be told in stories’
     • Complex knowledge, such as scientific theories, findings and conclusions, has a narrative/rhetorical structure
     • Typical pattern: claim/hypothesis, discussion of experimental findings, recap of claim, rebuttals, recap of claim
     • Roughly the same claim appears 4 or 5 times in a paper
  81. C2: Experiment: ‘Claimed Knowledge Updates’
  83. C3: Issue: Language is abstract
     “These results are consistent with those obtained by RPA and demonstrate that AhR ligands suppress IL-6 mRNA levels by approximately 40–60%.”
     “Data presented in Figure 5A extend previous studies performed with monocytes by demonstrating that LPS induces NF-κB-DNA binding in bone marrow stromal cells.”
     “An added incentive for these studies was provided by the observation that the IL-6 gene promoter contains an NF-κB binding site which plays a major role in regulating LPS-induced IL-6 transcription [55-57].”
     • Purple = deictic/anaphoric markers, pointing to current text
     • Blue = metalanguage/epistemic evaluation
     • Green = experimental method
     • Red = conceptual claim
     • Orange = claim referred to in other work
  84. C3: Formal language: Biological Expression Language (BEL)
     Text: “In a screen for miRNAs that cooperate with oncogenes in cellular transformation, we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53.”
     • Increased abundance of miR-372 increases cell proliferation:
       r(MIR:miR-372) -> bp(GO:"Cell Proliferation")
     • Increased abundance of miR-372 increases tumorigenesis:
       r(MIR:miR-372) -> bp(GO:Tumorigenesis)
     Text: “We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.”
     • Increased abundance of miR-372 decreases activity of TP53:
       r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
     • Context: cancer. Activity of TP53 decreases cell growth:
       SET Disease = "Cancer"
       tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
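BEL statements have a subject, relationship, object shape; in BEL, `->` means increases and `-|` means decreases. A minimal sketch that splits one statement on its top-level relationship and peels off a term's outer function. It handles only these two relationships and is not a substitute for a real BEL parser.

```python
import re
from typing import NamedTuple

RELATIONS = {"->": "increases", "-|": "decreases"}


class BelStatement(NamedTuple):
    subject: str
    relation: str
    obj: str


# Subject and object are separated from the relationship by whitespace.
STATEMENT = re.compile(r"(?P<subject>.+?)\s+(?P<rel>->|-\|)\s+(?P<object>.+)")


def parse_statement(text: str) -> BelStatement:
    match = STATEMENT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple BEL statement: {text!r}")
    return BelStatement(match["subject"], RELATIONS[match["rel"]], match["object"])


def outer_function(term: str) -> tuple[str, str]:
    """Split a term like 'tscript(p(HUGO:Trp53))' into ('tscript', 'p(HUGO:Trp53)')."""
    match = re.fullmatch(r"(\w+)\((.*)\)", term)
    if match is None:
        raise ValueError(f"not a BEL term: {term!r}")
    return match[1], match[2]


if __name__ == "__main__":
    stmt = parse_statement("r(MIR:miR-372) -| tscript(p(HUGO:Trp53))")
    print(stmt.relation, outer_function(stmt.obj))
```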
  86. C3: Experiment: add epistemic evaluation/knowledge attribution to BEL
     For a proposition P, an epistemically marked clause E is an evaluation of P, E_{V,B,S}(P), with:
     - V = Value: 3 = assumed true, 2 = probable, 1 = possible, 0 = unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
     - B = Basis: Reasoning, Data
     - S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
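The evaluation E_{V,B,S}(P) can be stored as a small record attached to each proposition. A sketch, assuming Python enums for basis and source and a range check on V; the two example annotations (the hedged original and a later unhedged restatement) are illustrative readings, not published annotations.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    A = "speaker is author A, explicit"
    IA = "speaker is author A, implicit"
    N = "other author N, explicit"
    NN = "other author NN, implicit"


VALUE_LABELS = {3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
                -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue"}


@dataclass(frozen=True)
class Evaluation:
    """E_{V,B,S}(P): value V, basis B and source S for proposition P."""
    proposition: str
    value: int
    basis: Basis
    source: Source

    def __post_init__(self) -> None:
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be an integer in -3..3, got {self.value!r}")


if __name__ == "__main__":
    claim = "miR-372 and miR-373 inhibit LATS2"
    # Illustrative readings: the original hedges ("possibly"); a later paper
    # states the same proposition as fact, attributing it to other authors.
    original = Evaluation(claim, 1, Basis.DATA, Source.A)
    later = Evaluation(claim, 3, Basis.REASONING, Source.N)
    for e in (original, later):
        print(VALUE_LABELS[e.value], e.basis.value, e.source.name)
```

Recording V and S explicitly would let a tool notice when a claim’s value rises from “possible” to “assumed true” as it passes from author to citing author, the drift shown in B1.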
  90. D. Collections of papers
     D1: Can’t search papers easily
     D2: Can’t connect papers well
     D3: Can’t combine knowledge from different papers
  96. D1: Searching collections of papers
     • It is relatively easy to find a paper you are looking for: Google Scholar, Google, ..., Scopus... (in that order?)
     • But it is very hard to find out whether anything has been done on a certain topic (e.g. ‘citances’)
     • And it’s impossible to know if nothing was done on a topic
     • Why aren’t more people working on this?
     • What happened to the semantic desktop??
  99. D2: How do we connect papers?
     • Papers exist within a con-text: preceding knowledge, succeeding knowledge, knowledge in your head or on your computer
     • How can we annotate these relations, maintain connections, and explore ones that others have made?
  103. D2: Experiment: Annotation in SWAN using DOMEO
     [Diagram: RDF annotation graph (nodes G1, G5, G6) using rdf:type, dct:title, swanrel:referencesAsSupportiveEvidence and pav:contributedBy]
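The annotation on this slide links a statement to its evidence with typed relations such as swanrel:referencesAsSupportiveEvidence and pav:contributedBy. A minimal sketch stores such annotations as (subject, predicate, object) triples and queries them in both directions. The `ex:` identifiers and the title are hypothetical, and prefixes are left as compact names rather than expanded to guessed namespace IRIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical annotation; "ex:" identifiers are invented for illustration.
GRAPH = [
    Triple("ex:annotation1", "rdf:type", "ex:ClaimAnnotation"),
    Triple("ex:annotation1", "dct:title", '"miR-372 and miR-373 inhibit LATS2"'),
    Triple("ex:annotation1", "swanrel:referencesAsSupportiveEvidence", "ex:Voorhoeve2006"),
    Triple("ex:annotation1", "pav:contributedBy", "ex:curator1"),
]


def objects(graph: list[Triple], subject: str, predicate: str) -> list[str]:
    """All objects linked from `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def supported_by(graph: list[Triple], evidence: str) -> list[str]:
    """Explore connections in reverse: which annotations cite `evidence` as support?"""
    return [t.subject for t in graph
            if t.predicate == "swanrel:referencesAsSupportiveEvidence" and t.obj == evidence]


def turtle_lines(graph: list[Triple]) -> list[str]:
    """Render triples in a Turtle-like form (prefix declarations omitted)."""
    return [f"{t.subject} {t.predicate} {t.obj} ." for t in graph]


if __name__ == "__main__":
    print("\n".join(turtle_lines(GRAPH)))
    print(supported_by(GRAPH, "ex:Voorhoeve2006"))
```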
  106. D3: Tracing the heritage of a statement
     • On paper, you can’t see whether a claim or a recommendation is valid
     • E.g. required to check for clinical recommendations:
       – Is this statistically valid?
       – Was it shown for my patient?
       – Are there other things I need to know (side effects, funding, etc.)?
  109. 109. D3: Experiment: Linking Clinical Guidelines to Evidence [Diagram: A. Philips’ Electronic Patient Records → Step 1: patient data + diagnosis link to a guideline recommendation → B. Elsevier-published Clinical Guideline → Step 2: guideline recommendation links to a research report/data → C. Elsevier (or other publisher’s) Research Report or Data] 29
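The two linking steps can be sketched as two lookups chained together. This is a minimal Python sketch, assuming each step is a plain index; the diagnosis label, recommendation IDs and report identifiers are hypothetical and do not reflect the Philips or Elsevier systems. A recommendation that maps to an empty evidence list is the gap the following slide runs into.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Patient:
    patient_id: str
    diagnosis: str  # hypothetical diagnosis label taken from the patient record


# Step 1 (hypothetical index): diagnosis -> guideline recommendation IDs.
GUIDELINE_INDEX: Dict[str, List[str]] = {
    "diagnosis-X": ["rec-1", "rec-2"],
}

# Step 2 (hypothetical index): recommendation ID -> research reports or datasets.
EVIDENCE_INDEX: Dict[str, List[str]] = {
    "rec-1": ["report-A", "dataset-B"],
    "rec-2": [],  # a recommendation with no linked evidence
}


def trace(patient: Patient) -> Dict[str, List[str]]:
    """Follow patient -> recommendation (step 1) -> evidence (step 2)."""
    recommendations = GUIDELINE_INDEX.get(patient.diagnosis, [])
    return {rec: EVIDENCE_INDEX.get(rec, []) for rec in recommendations}


if __name__ == "__main__":
    # {'rec-1': ['report-A', 'dataset-B'], 'rec-2': []}
    print(trace(Patient("p-001", "diagnosis-X")))
```

Keeping the two indexes separate mirrors the slide: the patient record only needs to know about recommendations, and each recommendation carries its own links to the literature.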
  110. 110. D3: The reality of linking evidence:
Recommendation in guideline | Level | Evidence (in the text) | Ref | Recommendation in reference
5.1. Laboratory tests should include a CBC count with differential leukocyte count and platelet count; | A-III | No evidence in text | No reference |
5.2. measurement of serum levels of creatinine and blood urea nitrogen; | A-III | CBC counts and determination of the levels of serum creatinine and urea nitrogen are needed to plan supportive care and to monitor for the possible occurrence of drug toxicity. | No reference |
5.3. and measurement of electrolytes, hepatic transaminase enzymes, and total bilirubin (A-III). | A-III | No evidence in text | No reference |
Not mentioned: GET ENOUGH BLOOD, IN TWO SEPARATE BOTTLES | | The total volume of blood cultured is a crucial determinant of detecting a bloodstream infection [47]. (A “set” consists of 1 venipuncture or catheter access draw of 20 mL of blood divided into 1 aerobic and 1 anaerobic blood culture bottle.) | [47] | Our data, together with an analysis of previous studies, show that the yield of blood cultures in adults increases approximately 3% per millilitre of blood cultured.
Not mentioned: REPEAT TESTS | | These tests should be done at least every 3 days during the course of intensive antibiotic therapy. At least weekly monitoring of serum transaminase levels is advisable for … | |
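The gaps in the table above can be checked mechanically. A minimal Python sketch, assuming each guideline row is reduced to its evidence grade, whether the guideline text gives a justification, and its cited references. The rows are transcribed from the slide; the labels "blood-culture-volume" and "repeat-tests", the field names and the three audit categories are illustrative choices made here, not part of any guideline standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GuidelineRow:
    row_id: str             # recommendation number, or a short label
    level: Optional[str]    # evidence grade such as "A-III"; None if not graded
    text_evidence: bool     # does the guideline text justify the statement?
    refs: Tuple[str, ...]   # references cited for the statement


# Rows transcribed from the slide; the last two labels are illustrative.
ROWS = [
    GuidelineRow("5.1", "A-III", False, ()),
    GuidelineRow("5.2", "A-III", True, ()),
    GuidelineRow("5.3", "A-III", False, ()),
    GuidelineRow("blood-culture-volume", None, True, ("[47]",)),
    GuidelineRow("repeat-tests", None, True, ()),
]


def audit(rows: List[GuidelineRow]) -> Dict[str, List[str]]:
    """Group rows by the kind of gap between a statement and its evidence."""
    return {
        # graded as a recommendation, but nothing to follow up in the literature
        "graded_without_reference": [r.row_id for r in rows if r.level and not r.refs],
        # no justification at all in the guideline text
        "no_text_evidence": [r.row_id for r in rows if not r.text_evidence],
        # referenced evidence that never becomes a graded recommendation
        "referenced_but_ungraded": [r.row_id for r in rows if r.refs and not r.level],
    }


if __name__ == "__main__":
    for gap, ids in audit(ROWS).items():
        print(gap, ids)
```

Even on these five rows, every graded recommendation lands in the first category, which is the inconsistency the summary slide lists for D3.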
  111. 111. In summary:
Type | Problems | Experiments | Issues
A. Paper format | | |
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits | | |
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language | | |
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers | | |
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
  113. 113. Have we solved the Big Problem? 1) Too many papers? • Do not make publication counts a factor in evaluation • Do not make conference attendance contingent on publication • Write fewer papers! Limit yourself to writing only what is significant and profound (and entertaining!) 2) Too little time to read? • Collectively: change expectations of how much work fits into a day • Make the grant process less of a waste of time and talent • Reduce the burden of administration on (senior) scientists: reinstate departmental administrators! • Teach administration as a class: Lethbridge journal incubator • Make time to read some new (or old!) interesting work! 32
  114. 114. So how do we tackle all this? • DERI-Elsevier collaboration: define research projects? • Perhaps under the aegis of Force11? • Dagstuhl Workshop in August 2011: 35 invited attendees from different parts of science, industry, funding agencies and data centers • Goal: map the main obstacles preventing new models of science publishing and develop ways to overcome them • Just received funding from the Sloan Foundation to: – Start an online community – Hold the next workshop – Collaboratively work on next steps • Any thoughts? 33
  115. 115. Acknowledgements/collaborations:
1. Executable papers: Juliana Freire, NYU & Matthias Troyer, ETH Zurich (VisTrails); Micah Altman, Harvard IQSS (R); Gloriana St. Clair & Mahadev Satyanarayanan, CMU (Olive) (pending IMLS grant)
2. Citance summaries: Lucy Vanderwende, Microsoft Research; Hoa Trang Dang, NIST; Eduard Hovy, ISI/USC
3. NIF antibodies: Maryann Martone, NIF/UCSD
4. Data-centric publishing: Phil Bourne, UCSD; Yolanda Gil, ISI/USC (funded in part by Elsevier Labs)
5. DIKB: Rich Boyce, U Pittsburgh; Jodi Schneider, DERI; Maria Liakata, EBI (looking for funding opportunities!)
6. CKUs: Agnes Sandor, Xerox Research Centre Europe
7. BEL/knowledge attribution: Dexter Pratt, Selventa; Henk Pander Maat, Utrecht University (funded in part by NWO)
8. DOMEO/SWAN: Paolo Ciccarese & Tim Clark, Harvard/MGH (funded in part by Elsevier Labs)
9. Evidence-based guidelines: Paul Groth, Rinke Hoekstra, Frank van Harmelen, VU; Richard Vdovjak, Philips Research (funded by STW)
10. Force11: Phil Bourne, UCSD; Eduard Hovy, ISI/USC; Tim Clark, Harvard/MGH; Cameron Neylon, PLoS; Ivan Herman, W3C (funded in part by Sloan Foundation) 34
  116. 116. Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format | | |
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits | | |
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language | | |
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers | | |
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
 | Writing less and reading more | Force11, perhaps? | Social/political/personal!
  123. 123. What about writing completely differently? • Internet of things (Bleecker [1]): interact with ‘objects that blog’, or ‘Blogjects’, that: track where they are and where they’ve been; have histories of their encounters and experiences; have agency, an assertive voice on the social web [1] • Research Objects (Bechhofer et al. [2]): create semantically rich aggregations of resources that can possess some scientific intent or support some research objective • Networked Knowledge (Neylon [3]): “If we care about taking advantage of the web and internet for research then we must tackle the building of scholarly communication networks. These networks will have two critical characteristics: scale and a lack of friction.” [3]
[1] Bleecker, J. ‘A Manifesto for Networked Objects — Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’. http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) ‘Research Objects: Towards Exchange and Reuse of Digital Knowledge’. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA. http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’. http://cameronneylon.net/blog/network-enabled-research/
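As a rough illustration of a Research Object in the sense of Bechhofer et al. [2], a semantically rich aggregation of resources bound to a research objective, here is a minimal Python sketch. The class and field names, the resource kinds and the example identifiers are hypothetical; this is not the Research Object model or its vocabulary, only the core idea: one stated intent, many aggregated resources, and annotations that must point at something inside the aggregation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Resource:
    uri: str   # hypothetical identifier
    kind: str  # e.g. "paper", "dataset", "workflow"


@dataclass
class ResearchObject:
    intent: str  # the scientific intent or research objective
    aggregates: List[Resource] = field(default_factory=list)
    annotations: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        """Aggregate a resource once; repeated adds are ignored."""
        if resource not in self.aggregates:
            self.aggregates.append(resource)

    def annotate(self, uri: str, note: str) -> None:
        """Attach a note, but only to a resource this object aggregates."""
        if all(r.uri != uri for r in self.aggregates):
            raise ValueError(f"{uri} is not part of this research object")
        self.annotations.append((uri, note))

    def of_kind(self, kind: str) -> List[str]:
        """List the identifiers of aggregated resources of one kind."""
        return [r.uri for r in self.aggregates if r.kind == kind]


if __name__ == "__main__":
    ro = ResearchObject(intent="Test whether claim C holds")
    ro.add(Resource("ex:paper1", "paper"))
    ro.add(Resource("ex:data1", "dataset"))
    ro.add(Resource("ex:wf1", "workflow"))
    ro.annotate("ex:wf1", "reproduces Figure 2 of ex:paper1")
    print(ro.of_kind("dataset"))  # ['ex:data1']
```

Rejecting annotations on resources outside the aggregation is one way to keep the object self-describing: everything it says is about something it actually contains.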
  124. 124. Networked science in action: • Galaxy Zoo: citizen science; classify galaxies from the comfort of your own home, like Hanny! • Tim Gowers, Polymath: “This is to normal research as driving is to pushing a car” • MathOverflow: a virtual network of mathematicians working collectively to answer big/small, clear/fuzzy questions • Jean-Claude Bradley: ‘short-form chemistry’: tweet/blog about an experiment, then Storify it into a narrative • Read Cameron Neylon’s blog on networked science! 37
  125. 125. Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format | | |
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits | | |
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language | | |
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers | | |
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
 | Networked science | MathOverflow, Bradley | But is it science?
 | Writing less and reading more | Force11, perhaps? | Social/political/personal!
