The end of
the scientific paper
as we know it
(in 4 easy steps)
Frank van Harmelen
(+ Paul Groth)
VU Amsterdam
Reports on
the death of
the scientific paper
have been greatly
exaggerated
Frank van Harmelen
(+ Paul Groth)
VU Amsterdam
And how the Semantic Web
makes it possible
Semsci 2017 workshop
• 7/10 papers about data
• 3/10 are about papers
and they are about papers written by & for people
Thanks (in order of appearance) to:
• Paul Groth
• Tobias Kuhn
• Jan Velterop
• Barend Mons
• Anita de Waard
• Carole Goble
Scientific publishing hasn’t changed
in 350 years
• Letter from Christiaan Huygens (1652)
• Writing to his prof in Mathematics
• Citing (and complaining about)
work of Descartes
• One of 3000 letters by Huygens
2017: Only superficial changes
• Different format & style
• Different medium
(Web, PDF)
• Different speed
(PubMed = 2 papers/min)
Section 1: Related work
Section 2: Research question
Section 3: Experimental design
Section 4: Experimental findings
Section 5: Interpretation, conclusions
And our papers still follow
this storyline:
Step 1: Study & interpret literature
Step 2: Formulate hypothesis
Step 3: Design experiment
Step 4: Execute experiment
Step 5: Publish results
This storyline is important,
but readable only by people,
not by machines
How to make our papers more usable?
“We only need information extraction
because we first did information burial” (Barend Mons)
“A journal paper
is a state-funeral
for your results”
(Hans Akkermans)
Step 1: explicit rhetorical structure
Capture the roles of blocks of text &
make these roles explicit
1 paper = 1 Network of blocks
N papers = 1 Network of blocks
[Figure: block diagrams of "One paper" and "Another paper", each with blocks labelled Problem, Method, Results, Interpretations, Conclusions]
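A minimal sketch of Step 1, assuming each block carries one of the roles named in the diagram. The class name, the link labels ("supports", "builds-on"), the paper identifiers and the block texts are all hypothetical, chosen only to show how blocks from several papers end up in one network.

```python
from dataclasses import dataclass

# Roles taken from the slide's diagram; everything else here is illustrative.
ROLES = {"Problem", "Method", "Results", "Interpretations", "Conclusions"}


@dataclass(frozen=True)
class Block:
    paper: str  # hypothetical paper identifier
    role: str   # one of ROLES
    text: str   # the block of text itself


def make_block(paper, role, text):
    """Create a block, rejecting roles that are not in ROLES."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return Block(paper, role, text)


# Made-up blocks from two made-up papers.
a_results = make_block("paper-A", "Results", "Effect observed in the treated group.")
a_interp = make_block("paper-A", "Interpretations", "The effect depends on dose.")
b_problem = make_block("paper-B", "Problem", "Does the effect depend on dose in vivo?")

# Links are (source block, label, target block); the labels are assumptions.
links = [
    (a_results, "supports", a_interp),
    (b_problem, "builds-on", a_interp),
]


def papers_in_network(links):
    """Return the set of papers touched by the link network."""
    return {block.paper for source, _, target in links for block in (source, target)}


print(sorted(papers_in_network(links)))
```

Because links point at blocks rather than at whole papers, the second link crosses the paper boundary without any special handling, which is the sense in which N papers form one network.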
Step 2: explicit fine-grained
rhetorical structure
Locate individual knowledge items
and their relationships
Example: ScholOnto, ClaiMaker [Buckingham Shum]
Paper = set of claims
Claim = text – relation – text
Relation = causes, predicts, prevents; addresses, solves
equals, is-similar-to; proves, supports, challenges
1 paper = 1 fine-grained network of relations
N papers = 1 fine-grained network of relations
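The "Claim = text – relation – text" structure can be sketched as follows. This is not the ScholOnto or ClaiMaker data model, only a guess at the shape the slide describes; the field names, the `paper` attribute and the sample claims are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

# Relation names follow the list on the slide.
RELATIONS = {
    "causes", "predicts", "prevents", "addresses", "solves",
    "equals", "is-similar-to", "proves", "supports", "challenges",
}


class Claim(NamedTuple):
    source: str    # a piece of text
    relation: str  # one of RELATIONS
    target: str    # another piece of text
    paper: str     # hypothetical identifier of the paper making the claim


def index_by_text(claims):
    """Map every piece of text to the claims it takes part in."""
    index = defaultdict(list)
    for claim in claims:
        if claim.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {claim.relation}")
        index[claim.source].append(claim)
        index[claim.target].append(claim)
    return index


# Made-up claims from two made-up papers that share one piece of text.
claims = [
    Claim("method M", "solves", "problem P", paper="paper-A"),
    Claim("result R", "challenges", "method M", paper="paper-B"),
]

index = index_by_text(claims)
print(sorted({claim.paper for claim in index["method M"]}))
```

The shared text "method M" is what merges the two per-paper networks into one: looking it up returns claims from both papers.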
Step 3: do away with the paper altogether.
• Any fact is a relation between two things (“triple”)
• Count each fact as a nano-publication
• Together, these nano-publications form a
huge very fine-grained network of relations,
a web of knowledge,
a “semantic web”
• Computers as colleagues,
not (only) tools
Just publish the facts
What is a Nanopublication
“A nanopublication is the smallest unit of
publishable information: an assertion about
anything that can be uniquely identified and
attributed to its author”
http://nanopub.org
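A rough sketch of what such a unit could look like in code, assuming the common split of a nanopublication into an assertion, its provenance, and publication info. The `Nanopub` class and the `http://example.org/` names are hypothetical; only the Dublin Core `creator` and PROV `wasDerivedFrom` URIs are real vocabulary terms.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]

DCT_CREATOR = "http://purl.org/dc/terms/creator"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary


@dataclass
class Nanopub:
    uri: str                  # unique identifier of the nanopublication
    assertion: List[Triple]   # the fact(s) being published
    provenance: List[Triple]  # where the assertion comes from
    pubinfo: List[Triple]     # who published it

    def is_publishable(self) -> bool:
        """Check the two requirements in the definition: identified and attributed."""
        has_assertion = len(self.assertion) > 0
        has_author = any(
            s == self.uri and p == DCT_CREATOR for s, p, _ in self.pubinfo
        )
        return bool(self.uri) and has_assertion and has_author


np1 = Nanopub(
    uri=EX + "np1",
    assertion=[(EX + "mosquito", EX + "transmits", EX + "malaria")],
    provenance=[(EX + "np1#assertion", PROV_DERIVED, EX + "paper-A")],
    pubinfo=[(EX + "np1", DCT_CREATOR, EX + "researcher-1")],
)
anonymous = Nanopub(uri=EX + "np2", assertion=np1.assertion, provenance=[], pubinfo=[])

print(np1.is_publishable(), anonymous.is_publishable())
```

The same assertion without an author fails the check, which mirrors the "attributed to its author" part of the definition.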
Step 4: turning context into a
1st class citizen
• Link to all the stuff that goes on before publication:
– Datasets, workflows
– Open Lab books
– Open peer reviewing
• Link to all the stuff that goes on after publication:
– Websites
– Blogs
– Emails
– Tweets
– Give web-addresses to objects (URIs)
– Use the web to link between the objects
– Provide meaning in a form that computers can handle (RDF)
These principles embodied
in already deployed technology
We can build this using
semantic web technology
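The three principles (URIs for objects, links between them, meaning in RDF) can be illustrated with a few lines that write links as N-Triples. This is a simplified sketch: the `http://example.org/` URIs and predicate names are placeholders, and the literal escaping covers only backslashes and double quotes, not the full N-Triples grammar.

```python
EX = "http://example.org/"  # placeholder namespace for illustration


def term(value):
    """Render a URI as <...> and anything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# Made-up links to things before and after publication.
context = [
    (EX + "paper-A", EX + "usesDataset", EX + "dataset-7"),
    (EX + "review-3", EX + "reviews", EX + "paper-A"),
    (EX + "tweet-42", EX + "discusses", EX + "paper-A"),
    (EX + "paper-A", EX + "title", 'A "toy" paper'),
]

print(to_ntriples(context))
```

Datasets, reviews and tweets all become nodes with their own address, so a link to a tweet has the same form as a link to a dataset.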
So now we have…
No longer a set of
disconnected monolithic PDFs
A network of facts, reviews,
evidence, opinions, data
The story so far…
• Publishing hasn't changed for 300+ years
• The structure and format of our papers
is still based on this
• Deconstruct the scientific paper
– from monolithic block of text
– to a network of computer readable facts & context
• All of this made possible by the semantic web
But…….
Pragmatic infeasibility
Previous experiments in formalising (social) science
turned out to be very hard:
• Hannan and Freeman's theory of organizational inertia
in first-order logic
American Sociological Review 59(4):571-593, August 1994
• Carroll & Hannan’s resource partitioning theory
in first order logic
Computational & Mathematical Organization Theory 7,
87–111, 2001.
Pragmatic (in)feasibility
Many sciences are quantitative,
but I guess this is still possible in RDF + MathML:
Pragmatic infeasibility
Science is a social activity, which includes persuasion,
rhetoric, deliberate ambiguity, etc.
Issue #3: hedging
CACM, Vol. 22, No. 5, May 1979
“A proof doesn't settle a mathematical argument.
Contrary to what its name suggests,
a proof is only one step in the direction of confidence.
We believe that, in the end,
it is a social process that determines whether
mathematicians feel confident about a theorem.”
Jech, T. J., The Axiom of Choice, North-Holland, Amsterdam, 1973
(a historical review of independence results in set theory)
Technical infeasibility: Scalability
#statements/year =
#statements/nanopub × #nanopubs/paper × #papers/year
= 30 × N × 1.5M = N × 45M/yr
Let’s hope N ≈ O(10)….
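The estimate is easy to reproduce. The figures of 30 statements per nanopublication and 1.5M papers per year come from the slide; the values tried for N (nanopublications per paper) are arbitrary.

```python
STATEMENTS_PER_NANOPUB = 30   # figure from the slide
PAPERS_PER_YEAR = 1_500_000   # figure from the slide


def statements_per_year(nanopubs_per_paper):
    """#statements/nanopub x #nanopubs/paper x #papers/year."""
    return STATEMENTS_PER_NANOPUB * nanopubs_per_paper * PAPERS_PER_YEAR


for n in (1, 10, 100):
    print(f"N = {n:>3}: {statements_per_year(n):,} statements/yr")
```

With N = 10 this gives 450,000,000 statements per year, ten times the 45M obtained for N = 1.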
Technical infeasibility: expressivity
• RDF hopelessly simple
• Needs at least DL:
“Mosquitoes transmit malaria”
All? no.
Some? yes.
Only? probably.
∃transmit.Malaria ⊑ Mosquitoes
Many? Most?
• Beyond DL:
Probabilities, fuzziness, inconsistencies
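The different readings of the sentence can be checked against a toy model. The individuals and the `transmits` pairs below are invented for illustration; the point is only that "all", "some" and "only" are three different set conditions, and that "most" is a proportion rather than a subset test.

```python
# Toy model: invented individuals, not real data.
mosquitoes = {"anopheles_1", "anopheles_2", "culex_1"}
transmits = {
    ("anopheles_1", "malaria"),
    ("anopheles_2", "malaria"),
    ("culex_1", "west_nile"),
}


def transmitters_of(disease):
    """Everything that transmits the given disease in the toy model."""
    return {who for who, what in transmits if what == disease}


malaria_transmitters = transmitters_of("malaria")

# "All mosquitoes transmit malaria": Mosquito is a subset of the transmitters.
all_reading = mosquitoes <= malaria_transmitters
# "Some mosquitoes transmit malaria": the two sets overlap.
some_reading = bool(mosquitoes & malaria_transmitters)
# "Only mosquitoes transmit malaria": the transmitters are a subset of Mosquito.
only_reading = malaria_transmitters <= mosquitoes
# "Most mosquitoes ...": a proportion, which a subset test cannot express.
share = len(mosquitoes & malaria_transmitters) / len(mosquitoes)

print(all_reading, some_reading, only_reading, round(share, 2))
```

A bare triple (mosquito, transmits, malaria) does not say which of these readings is meant, which is the sense in which plain RDF is too simple here.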
Technical (in)feasibility:
Argumentation graphs
“Escitalopram does not inhibit CYP2D6”
Micropublications, Clark, Ciccarese, Goble, 2013
Technical (in)feasibility:
Argumentation graphs
Argumentation graphs require:
• Defeasible logic
• Modal logic
• Higher-order logic
• ….
at scale of 450M statements/yr
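Why defeasible reasoning is needed can be shown on a tiny graph. The sketch below uses Dung-style grounded semantics over an attack relation, which is one standard formalisation and not necessarily the one used in the Micropublications work. The argument names are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Accept an argument once all of its attackers are defeated.

    `attacks` is a set of (attacker, target) pairs. An argument is defeated
    when at least one accepted argument attacks it.
    """
    attackers = {a: {s for s, t in attacks if t == a} for a in arguments}
    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            undecided = a not in accepted and a not in defeated
            if undecided and attackers[a] <= defeated:
                accepted.add(a)
                changed = True
        for a in arguments:
            undecided = a not in accepted and a not in defeated
            if undecided and attackers[a] & accepted:
                defeated.add(a)
                changed = True
    return accepted


# Hypothetical arguments: a claim, a challenge to it, and a rebuttal of the challenge.
before = grounded_extension({"claim", "challenge"}, {("challenge", "claim")})
after = grounded_extension(
    {"claim", "challenge", "rebuttal"},
    {("challenge", "claim"), ("rebuttal", "challenge")},
)

print(sorted(before), sorted(after))
```

Adding the rebuttal brings the claim back, so conclusions can change as statements are added. This non-monotonic behaviour is what classical, monotonic inference over a growing set of triples does not provide.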
Should we give up on computers
as scientific colleagues?
• A more modest role for nano-publications?
– Annotations of datasets?
– Very approximate annotations of papers?
• Make them speak our language
instead of us speaking theirs?

The end of the scientific paper as we know it (or not...)


Editor's Notes

  • #6 Use circular diagram?
  • #21 Move this to start of talk: this workshop does mostly before & after, not papers themselves.
  • #23 Overall conclusion:
    – Publishing hasn't changed for 300+ years
    – The storyline of our papers is still based on this
    – Deconstruct the scientific paper from monolithic block to a network of facts & context
    – All of this made possible by the semantic web
    Consequences for science mapping:
    – Science maps will get better
    – Science maps will be more needed