How To Execute
The Research Paper
Anita de Waard
Disruptive Technologies Director
Elsevier Labs
Pittsburgh, April 2012
Outline
• Ten people who are changing scholarly publishing:
– New forms
– Workflow/data integration
– New models of business/attribution
• So what does this mean?
• Some projects to help us move towards these new
models:
– Claim-evidence networks
– Workflow/data integration and executable papers
– Creating a community of practice
Theme 1: New forms of publication
• Main issue: the format of the scientific paper comes
from a time when our communication was paper-centric
• Solution: Rethink the unit and form of the scholarly
publication from the ground (i.e., the experiment) up
• Three projects doing that:
Steve Pettifer, U Manchester
• Utopia: ‘Everything you always wanted to do
with a PDF….’: interactive, sharable
• Working on integration with DOMEO to
add/share annotations
• Final goal: don’t ‘reconstruct the cow from a
hamburger’: include workflows and models
Gully Burns, USC ISI
• KEfED: model of research as an activity
• Map out dependent/independent variables within an experiment and model them
• Start: appendix to paper; later: precede paper, graft paper on top of model.
Tim Clark, Harvard/MGH
[Figure: RDF graph of a claim. The node <http://tinyurl.com/4h2am3a> has rdf:type swande:Claim and dct:title ‘Intramembranous Aβ behaves as chaperones of other membrane proteins’. Edges labelled pav:contributedBy and swanrel:referencesAsSupportiveEvidence link it to <http://example.info/person/1> and <http://example.info/citation/1>. Named graphs shown: G1, G5, G6.]
• Annotation ontology allows you to trace claims
• DOMEO offers interface to do both automated
entity markup + manual mark up of
claim/evidence networks
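The claim graph on this slide can be read as a handful of subject-predicate-object triples. The following is a minimal sketch in plain Python, assuming the pairing of predicates and objects suggested by the figure labels; the helper names `objects_of` and `supporting_evidence` are hypothetical and are not part of DOMEO or the annotation ontology.

```python
# Triples are (subject, predicate, object); prefixed names are kept as strings.
CLAIM = "http://tinyurl.com/4h2am3a"

triples = [
    (CLAIM, "rdf:type", "swande:Claim"),
    (CLAIM, "dct:title",
     "Intramembranous Aβ behaves as chaperones of other membrane proteins"),
    # Assumed pairing: the person is the contributor, the citation is evidence.
    (CLAIM, "pav:contributedBy", "http://example.info/person/1"),
    (CLAIM, "swanrel:referencesAsSupportiveEvidence",
     "http://example.info/citation/1"),
]


def objects_of(subject, predicate, graph):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def supporting_evidence(claim, graph):
    """Trace a claim to the citations it references as supportive evidence."""
    return objects_of(claim, "swanrel:referencesAsSupportiveEvidence", graph)


if __name__ == "__main__":
    print(supporting_evidence(CLAIM, triples))
```

Tracing a claim then amounts to following one predicate from the claim node, which is the kind of lookup a claim/evidence network is meant to support.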
Theme 2: data and workflow integration
• Issues:
– Format of the research paper hard to integrate within a
scientific/clinical workflow
– Hard to deduce what methods were used and what data was created
for a piece of research, which makes reproduction or even review
difficult
• Some solutions for sharing workflows and data:
Dave De Roure, Oxford e-Research Centre
[Figure: research object diagram. Nodes: Results, Logs, Metadata, Paper, Slides, Workflow 16, Workflow 13, Common pathways, QTL. Edge labels: feeds into, produces, included in, published in.]
• Research objects: consist of all
academic output, including:
- Papers
- Workflows
- Data
- Talks, lectures
- Blogs
• Move towards executable work:
- Execute periodically to validate
- Run automatically when data updates – by self or others!
- Notify researchers of new results
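The "execute periodically to validate" idea can be illustrated with a small sketch: a research object stores a workflow together with example input and output, and re-running it shows whether the stored result still reproduces. This is an illustrative sketch only; the `ResearchObject` class, its fields, and the `notify` helper are hypothetical and do not reflect any actual research object implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchObject:
    """Hypothetical aggregate: a workflow plus example input and output."""
    name: str
    workflow: Callable[[List[float]], float]
    example_input: List[float]
    expected_output: float
    related: List[str] = field(default_factory=list)  # e.g. papers, slides

    def validate(self) -> bool:
        """Re-run the workflow on the stored input and compare the result."""
        return self.workflow(self.example_input) == self.expected_output


def notify(ro: ResearchObject) -> str:
    """Build the message a researcher would receive after a validation run."""
    status = "still reproduces" if ro.validate() else "no longer reproduces"
    return f"{ro.name}: workflow {status} its stored result"


def mean(values: List[float]) -> float:
    """Stand-in workflow used only for this example."""
    return sum(values) / len(values)


ro = ResearchObject("demo-pack", mean, [1.0, 2.0, 3.0], 2.0, ["paper", "slides"])

if __name__ == "__main__":
    print(notify(ro))
```

A scheduler or a data-update hook could call `validate` and send the `notify` message; that wiring is left out here because the slide does not specify it.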
Phil Bourne, UCSD
• Big need: keep track of the data in my lab!
• Other need: know what I did/what other people
did. Yolanda Gil made a workflow representation;
it was hard to remember what we did…
• Need: better ways to record, share, archive what
we did.
• New role for the publisher >
Deborah McGuinness, RPI
• Future Web:
• ‘if everything is everywhere, how do we find
it/know what we want?’
• Internet, Web, Grid, Cloud, Semantic Grid
Middleware
• Xinformatics:
• Where X = geo, eco, econo…
• Linked Data to Semantics
• Semantic Foundations:
• Pushing the boundaries of
Semantic Web standards
• Ontology evolution
Theme 3: New Models for Access/Attribution
• Issues:
– User-created content and crowdsourcing mean (scientific)
impact is measured very differently from the past
– Need new models for copyright/IP
– Citizen scientists participate as well
• Some efforts to address this:
Paul Groth, VU Amsterdam
Altmetrics: “the creation and study of
new metrics based on the Social Web
for analyzing and informing scholarship.”
Including:
- Downloads
- Where readers read
- Data citation
- Social network diffusion
- Slide reuse
- Peer review contributions
- YouTube views
Leslie Chan, U. Toronto Scarborough
• ElPub conference series that focuses
on globally connecting information scientists
• Bioline International system: “a not-for-profit
scholarly publishing cooperative committed to
providing open access to quality research journals
published in developing countries”
John Wilbanks, Kauffman/CC
• As data becomes more accessible, need:
• raw metadata
• standards processes
• consensus processes
• document submission standards
• data archives
• Ways of governing access:
• Privacy vs. IP vs. policies
• Technology only helps so much…
• This is mostly a social/policy issue
Cameron Neylon, Cambridge
• Main arguments for Open Access:
• Citizen science is becoming more important
• Science changes when it is crowdsourced:
Tim Gowers: ‘This is to normal research as
driving is to pushing a car’
• Three principles:
• Scale and connectivity
• Reduced friction to access
• Demand-side filters
In summary, scientists are working on:
• Tools for knowledge…
– Visualisation (Steve Pettifer)
– Modeling (Gully Burns)
– Annotation (Tim Clark)
• Ways to link to
– Workflows (Dave De Roure)
– Lab data (Phil Bourne)
– Linked research data (Deborah McGuinness)
• And models for
– Attribution/credit (Paul Groth)
– Allowing new players to participate (Leslie Chan)
– Copyright/IP rights (John Wilbanks)
– Networked science (Cameron Neylon).
New roles for publishers and libraries
• Technically, there is no reason to publish in a
journal, or for that matter, to publish a paper at all:
• Perhaps a good blog post linked to workflows and
data with some validation from peers and good
download statistics might work just as well?
• Is publishing in journals mostly a habit?
“Publishers have been thinking we’re going out of
business for 20 years, what has suddenly changed?”
The internet! Not the technical web, but the social web….
‘The value of a […] network is proportional to the square of
the number of users of the system (n²)’ Metcalfe’s Law
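One common way to motivate the n² in Metcalfe's Law is to count the distinct pairwise connections possible among n users, n(n-1)/2, which grows roughly with the square of n. A minimal sketch, with the function name `pairwise_links` chosen here purely for illustration:

```python
def pairwise_links(n: int) -> int:
    """Number of distinct user-to-user connections in a network of n users."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for users in (10, 100, 1000):
        print(users, pairwise_links(users))
```

Multiplying the number of users by ten multiplies the possible connections by roughly a hundred, which is the point being made about the social web.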
1990s: Big Player
2000s: Medium Participant
2015: Irrelevant!
What do we need?
Internet of things: (Bleecker, [1])
Interact with ‘objects that blog’ or ‘Blogjects’, that:
• track where they are and where they’ve been;
• have histories of their encounters and experiences;
• have agency: an assertive voice on the social web [1]
Research Objects: (Bechhofer et al., [2])
Create semantically rich aggregations of resources,
that can possess some scientific intent or support
some research objective
Networked Knowledge: (Neylon, [3])
If we care about taking advantage of the web and
internet for research then we must tackle the building
of scholarly communication networks.
These networks will have two critical characteristics:
scale and a lack of friction. [3]
[1] Bleecker, J. ‘A Manifesto for Networked Objects: Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’,
http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) ‘Research Objects: Towards Exchange and
Reuse of Digital Knowledge’. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA.
http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’,
http://cameronneylon.net/blog/network-enabled-research/
Some examples of networked science:
• Galaxy Zoo: citizen science: classify galaxies in the
comfort of your own home – like Hanny!
• Tim Gowers, Polymath: “…the real contributors will be
the process owners and project leaders that are able to
provide horizontal leadership. To support this shift,
organizations will need to reward and recognize
horizontal contributions as much, if not more, than
hierarchical positions.”
• MathOverflow: virtual network of mathemagicians
working collectively to answer big, small, clear and
fuzzy questions
Executable Papers
• E.g.:
http://www.vistrails.org/index.php/User:Tohline/CPM/Levels2and3
Wrapping a story around your data:
Concept developed with Ed Hovy, Phil Bourne,
Gully Burns and Cartic Ramakrishnan
1. Research: Each item in the system has metadata (including
provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a
(lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull
data with provenance from the workflow tool in the appropriate
representation into the document.
4. Editing and review: Once the co-authors agree, the paper is
‘exposed’ to the editors, who in turn expose it to reviewers.
Reports are stored in the authoring/editing system, the paper gets
updated, until it is validated.
5. Publishing and distribution: When a paper is published, a
collection of validated information is exposed to the world. It
remains connected to its related data item, and its heritage can
be traced.
6. User applications: distributed applications run on this
‘exposed data’ universe.
[Figure: data items each tagged ‘metadata’; a Review / Edit / Revise loop; a box labelled ‘Some other publisher’; a sample paper fragment reading ‘Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-’.]
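The steps on this slide can be sketched as a tiny data model in which a statement in a paper keeps a link to the lab data item it was pulled from, so its heritage can be traced after publication. This is a sketch under stated assumptions: the `DataItem` and `Paper` classes, the `workflow_system` dictionary, and all values in it are hypothetical illustrations and are not taken from any system described in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataItem:
    """A lab data item with metadata, including provenance (steps 1 and 2)."""
    item_id: str
    value: float
    provenance: str


@dataclass
class Paper:
    """A paper whose statements stay linked to the data behind them (step 3)."""
    title: str
    statements: List[str] = field(default_factory=list)
    sources: Dict[int, DataItem] = field(default_factory=dict)
    validated: bool = False

    def cite_data(self, text: str, item: DataItem) -> None:
        """Add a statement and remember which data item supports it."""
        self.sources[len(self.statements)] = item
        self.statements.append(text)

    def heritage(self, index: int) -> str:
        """Trace a statement back to the provenance of its data (step 5)."""
        return self.sources[index].provenance


# Hypothetical lab-owned workflow system holding one made-up data item.
workflow_system = {"fig2": DataItem("fig2", 0.42, "lab notebook entry, run 7")}

paper = Paper("Example paper")
paper.cite_data("See fig 2 for the underlying data.", workflow_system["fig2"])
paper.validated = True  # stands in for the review/edit/revise loop (step 4)

if __name__ == "__main__":
    print(paper.heritage(0))
```

User applications (step 6) would then query published papers and follow `heritage` links back to the exposed data.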
Creating claim-evidence networks:
• DOMEO: connect to ScienceDirect
• Rich Boyce’s Drug-drug interactions: tracing
heritage of claims
• Foundation for that: linguistic markers for identifying
cited/own knowledge:
How a claim becomes a fact:
• Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
Working on ontology:
1. Add to formal knowledge representations, e.g. Biological
Expression Language add {V = 3, S = N, B = 0}:
• SET Evidence = "Arterial cells are highly susceptible to oxidative stress, which can induce
both necrosis and apoptosis (programmed cell death) [1,2]"
• biologicalProcess(GO:"response to oxidative stress") increases
biologicalProcess(GO:"apoptotic process")
• biologicalProcess(GO:"response to oxidative stress") increases
biologicalProcess(GO:necrosis)
2. Improve triple search engines, e.g. compare in iHop:
• The Lats2 tumor suppressor protein has been implicated earlier in promoting p53
activation in response to mitotic apparatus stress {V = 2, S = NN, B = 0}
• Our findings reveal that miR-373 would be a potential oncogene and it participates in
the carcinogenesis of human esophageal cancer {V = 1/2?, S = A, B = D}
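The `{V = …, S = …, B = …}` annotations attached to the statements on this slide can be made machine-readable with a small parser. The sketch below treats the three fields as opaque strings, since the slide does not define their value ranges; the function names `parse_tag` and `annotate` are hypothetical.

```python
import re
from typing import Dict

# Matches 'KEY = value' pairs inside a '{...}' annotation.
TAG_PATTERN = re.compile(r"(\w+)\s*=\s*([^,}]+)")


def parse_tag(text: str) -> Dict[str, str]:
    """Parse a '{V = 3, S = N, B = 0}' annotation into a dict of strings."""
    return {key: value.strip() for key, value in TAG_PATTERN.findall(text)}


def annotate(statement: str, tag: str) -> Dict[str, str]:
    """Attach the parsed annotation fields to a statement."""
    return {"statement": statement, **parse_tag(tag)}


if __name__ == "__main__":
    print(parse_tag("{V = 2, S = NN, B = 0}"))
    print(parse_tag("{V = 1/2?, S = A, B = D}"))
```

Keeping the values as strings preserves uncertain markings such as `1/2?` exactly as written, so a search engine could filter on them without losing the annotator's hedge.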
Application: Elsevier/Philips Use Case:
3 Content Sources, 2 Link Steps
A. Philips’ Electronic Patient Records
B. Elsevier-published Clinical Guideline
C. Elsevier (or other publisher’s) Research Report or Data
Step 1: Patient data + diagnosis link to Guideline recommendation
Step 2: Guideline recommendation links to research report/data
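The two link steps can be pictured as two lookups chained together. The following sketch uses made-up identifiers throughout; the dictionaries, the record layout, and the function `evidence_for_patient` are hypothetical and do not describe the actual Elsevier/Philips systems.

```python
from typing import Dict, List, Optional, Tuple

# All identifiers below are hypothetical placeholders.
guideline_for_diagnosis: Dict[str, str] = {"diagnosis-A": "guideline-rec-1"}
reports_for_guideline: Dict[str, List[str]] = {
    "guideline-rec-1": ["research-report-1", "dataset-1"],
}


def evidence_for_patient(record: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Follow both link steps from a patient record to research reports."""
    # Step 1: patient data + diagnosis link to a guideline recommendation.
    recommendation = guideline_for_diagnosis.get(record["diagnosis"])
    if recommendation is None:
        return None, []
    # Step 2: the guideline recommendation links to research reports/data.
    return recommendation, reports_for_guideline.get(recommendation, [])


if __name__ == "__main__":
    print(evidence_for_patient({"patient_id": "p1", "diagnosis": "diagnosis-A"}))
```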
Application: Find ‘Claimed Knowledge Updates’
Work done with Agnes Sandor,
Xerox Research Europe
FORCE11 Community of Practice
• Workshop in August of 2011: 35 invited attendees from different
parts of science, industry, funding agencies, data centers
• Goal: map main obstacles preventing new models of science
publishing and develop ways to overcome them
• Just received funding from the Sloan Foundation to:
• Start online community
• Hold next workshop
• Collaboratively work on new efforts
Summary:
• Ten people who are changing scholarly publishing:
• We (publishers, editors, libraries, etc.) need to revisit whether
and how we are needed
• Some projects are underway to help us move towards
these new models:
– Networked science
– Workflow/data integration
– Identifying claim-evidence trails
• ….happy to collaborate on others!
http://elsatglabs.com/labs/anita
a.dewaard@elsevier.com

Editor's Notes

  • #9 This is reflected in a third distinctive: the pack. This is Paul Fisher's pack from the Tryps example. Some packs contain example input and output data so workflows can be checked for “decay” (they don’t actually rot, but the world changes round them). While others are looking at semantically enhanced publication, we are asking “what is the shared artefact of future research?” We come at the same problem from the other side. We have it surrounded! Our approach relieves us of the paper mindset, so, for example, a Research Object could contain information for many audiences and purposes, with a commonly interpreted core (social scientists will recognise the idea of a “boundary object”).