brian.hole@ubiquitypress.com www.ubiquitypress.com / @ubiquitypress
Brian Hole
Copyright & Research and Innovation Policy meeting,
European Parliament, Brussels, 12 November 2013
Text and data mining
The Social Contract of Science
• Validation
• Dissemination
• Further development
Scientific Malpractice
Text and data mining
The ideal situation:
• Open access and open data (with CC BY and CC0 licenses)
would mean that all research text and data were available
for mining, reuse and analysis
• But legacy publishers are resisting open practices
For other cases:
• A fair dealing exception is required that allows for
academic (and arguably other, e.g. commercial) mining of
both text and data
Exceptions also required for:
• Research in general
• Teaching
Licensing
Additional licensing is not a suitable solution:
• TDM involves multiple, highly heterogeneous sources, not
only in journals and books, but anywhere on the Internet.
Licensing cannot practically scale to cover this.
• TDM is simply reading of content, a right researchers
already have. Copyright was never intended to cover such
use. This is temporary copying for reading, not creative use.
Copyright should therefore be amended, rather than
additional licenses being imposed that perpetuate the problem.
• TDM licenses would prevent progress, prevent efficient
use of taxpayer money, prevent growth of SMEs, and
block work that prevents deaths.
False objections
‘Server overload’:
• TDM is not a highly frequent activity, and involves
touching each resource only once.
• This is a much lower load than normal user behaviour and
crawling by other services.
• Any higher level of load could be easily and cost-effectively
managed; the benefits of additional use and citation far
outweigh this.
The need to control access to ‘our content’:
• Scientific facts and information are not your content.
• This results in building a reputation of being against open
science and scientific progress.


Editor's Notes

  • #3 This is for Stuart from the Royal Society