TEXT AND DATA MINING IN
PUBLIC RESEARCH
Rob Johnson – 13/12/2016
1
2
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Study aims
Assess economic impact of TDM
on public research in France via:
• Case studies (France, UK, Europe)
• Analysis of the relevance of a
copyright exception for TDM
3
http://adbu.fr/etude-tdm/
6-fold return
€6 contribution to EU economy for each €1 directly
generated by research universities (source: Biggar
Economics)
20% per annum
Estimated rate of return to public investment in
science and innovation (source: Frontier Economics)
€16 billion
Value of R&D performed within French universities
and public research bodies (source: Eurostat)
4
2.4 million
Scientific articles per annum
Zero
Number of researchers who can keep up
2.5 quintillion
bytes
Data produced each day
5
Any automated analytical technique
aiming to analyse text and data in
digital form in order to generate
information such as patterns, trends
and correlations.
European Commission. Proposal for a Directive of the European Parliament
and of the Council on copyright in the Digital Single Market
6
What is TDM?
BASE CAMP
Where are we now, and how did we get here?
7
…countries, in which academic researchers must
acquire the express consent of rights holders to
conduct lawful data mining, exhibit a
significantly lower share of data mining
research output relative to total research
output
Handke, Guibault and Vallbé, Is Europe Falling Behind in Data Mining? (2015)
8
What is the problem?
The European ecosystem for engaging
in text and data mining remains
highly problematic… The end result:
Europe is being leapfrogged by rising
interest in other regions, notably
Asia.
Filippov, S. & Hofheinz, P. Text and Data Mining for Research and Innovation
(2016)
9
What is the result?
Legislative options
10
2014 2017?
Option 1: Industry self-regulation
Option 1.5?: Loi pour une République Numérique (Loi Lemaire), 28 September 2016
Mandatory exceptions to copyright:
Option 2: Non-commercial research only
Option 3: Commercial research, beneficiaries restricted
Option 4: Commercial research purpose, beneficiaries unrestricted
Restriction France
No lawful access
Not scientific
literature
-
Not public research
Commercial
purpose
Conservation not by
designated body
Using a TDM exception
11
1.
ACHIEVING LEGAL CLARITY
12
Copyright exception
(Base Camp)
Camp 1:
Legal clarity
EC Directive
Camp 2: Access
to content
Camp 3: Technical
infrastructure
Camp 4: Skills
and support
Summit: Researchers
embrace TDM
The exception has made
a massive difference...
Petr Knoth, Open University, UK
14
…the definition of commercial
and non-commercial research
is creating uncertainty
Petr Knoth, Open University, UK
15
EC Proposed
Directive
• Consistent with the existing EU
copyright legal framework
• Could help resolve uncertainty over
commercial partnerships
• Currently out for consultation
Source: http://www.comodinicachia.com/timeline.html
What needs to happen?
• Communicate legal provisions for TDM with
certainty and clarity
• Clarify the exception’s scope where public
researchers collaborate with commercial partners
• Monitor the interaction of the copyright exception
with digital rights management (DRM), licensing and
other relevant legal regimes
17
Any questions?
18
2.
SECURING ACCESS
19
I scaled down my TDM research,
and had to exclude two
publishers… I couldn’t do what I
set out to do
Chris Hartgerink, Tilburg University, Netherlands
20
I had to ask too many publishers for the
right to download … it takes a lot of time
and … the publishers’ servers frequently
block us.
Mathieu Andro, INRA, France
21
What is the problem
with access?
• Technical protection measures (TPMs)
• Crawler traps
• Restricted access to application programming
interfaces (APIs)
22
• Incorporate TDM clauses into model licence
agreements
• Educate researchers on their rights
• Maintain dialogue with publishers
• Improve access through better infrastructure…
23
What needs to happen?
3.
INFRASTRUCTURE & TOOLS
24
Image: National Geographic
…Every time you have a new project or
data source… you hit issues about how
the documents are structured, oddities
of formatting, and so on.
Mark Greenwood, GATE, UK
25
The TDM Landscape
26
Source: OpenMinTED
• Invest in TDM infrastructure
• Make TDM accessible to non-specialists
• Streamline access
• Open standards and harmonised data formats
27
What needs to happen?
4.
SKILLS & SUPPORT
28
…We have algorithms to
answer questions, but we do
not have algorithms to ask
questions
François Rioult, GREYC Laboratory, Université de
Caen, France
29
30
What is the role of the librarian?
Photo: REUTERS
The library needs to be able to say: ‘If
you’ve got a question about TDM,
come to us’
Danny Kingsley, Head of Scholarly Communications,
University of Cambridge, UK
31
Library support for TDM
• Advocacy
• Copyright advice
• Access to legal expertise
• Skills development and training
• Advice on data sources and tools
32
5.
EMBRACING TDM
33
34
"Because it's there"
35
Why?
There are so many obstructions in the
way of doing this research, and doing it
well. It is just too hard and so people do
other things
Ross Mounce, University of Cambridge, UK
36
• Endorsement by senior research leaders
• Funding and incentives linked to TDM
• Alignment with moves to open science
37
What needs to happen?
38
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Why does TDM matter?
Public research is valuable
39
TDM makes research more efficient
TDM is worth investing in
40
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Copyright exception
(Base Camp)
Camp 1:
Legal clarity
EC Directive
Camp 2: Access
to content
Camp 3: Technical
infrastructure
Camp 4: Skills
and support
Summit: Researchers
embrace TDM
42
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
43
Libraries
• Monitor researchers’ experiences
• Develop case studies and guidance
• Involve the national library
• Invest in TDM support
• Incorporate TDM clauses into licence agreements
Making TDM a reality
44
Legislators
• Provide certainty
• Enable public/private partnerships
• Monitor interaction with other
legislation (e.g. DRM)
Institutions/research leaders
• Endorse TDM
• Invest in library services
• Explore knowledge exchange
opportunities
Research funders
• Invest in infrastructure
• Forum to improve access
• Link TDM to Open Science
Publishers & providers
• Cloud services for TDM
• Streamline access
• Open, harmonised standards
Making TDM a reality
Rob Johnson
Thank you
rob.johnson@research-consulting.com
www.research-consulting.com
45
Full report available at: http://adbu.fr/etude-tdm/

Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 

Text and data mining in UK and France (ADBU - 13 Dec 16)

  • 1. TEXT AND DATA MINING IN PUBLIC RESEARCH. Rob Johnson – 13/12/2016
  • 2. (1) Why does TDM matter? (2) Why isn’t it used more widely in public research? (3) How do we change this?
  • 3. Study aims: Assess economic impact of TDM on public research in France via: • Case studies (France, UK, Europe) • Analysis of the relevance of a copyright exception for TDM. http://adbu.fr/etude-tdm/
  • 4. 6-fold return: €6 contribution to EU economy for each €1 directly generated by research universities (source: Biggar Economics). 20% per annum: estimated rate of return to public investment in science and innovation (source: Frontier Economics). €16 billion: value of R&D performed within French universities and public research bodies (source: Eurostat).
  • 5. 2.4 million: scientific articles per annum. Zero: number of researchers who can keep up. 2.5 quintillion bytes: data produced each day.
  • 6. What is TDM? “Any automated analytical technique aiming to analyse text and data in digital form in order to generate information such as patterns, trends and correlations.” European Commission, Proposal for a Directive of the European Parliament and of the Council on copyright in the Digital Single Market
  • 7. BASE CAMP: Where are we now, and how did we get here?
  • 8. What is the problem? “…countries, in which academic researchers must acquire the express consent of rights holders to conduct lawful datamining, exhibit a significantly lower share of data mining research output relative to total research output” Handke, Guibault and Vallbé, Is Europe Falling Behind in Data Mining? (2015)
  • 9. What is the result? “The European ecosystem for engaging in text and data mining remains highly problematic… The end result: Europe is being leapfrogged by rising interest in other regions, notably Asia.” Filippov, S. & Hofheinz, P., Text and Data Mining for Research and Innovation (2016)
  • 10. Legislative options (2014 to 2017?): (1) Industry self-regulation. Mandatory exceptions to copyright: (2) Non-commercial research only; (3) Commercial research, beneficiaries restricted; (4) Commercial research purpose, beneficiaries unrestricted. Loi pour une République Numérique (Loi Lemaire), 28 September 2016: 1.5?
  • 11. Using a TDM exception. Restrictions (France): No lawful access; Not scientific literature; Not public research; Commercial purpose; Conservation not by designated body
  • 13. Copyright exception (Base Camp); Camp 1: Legal clarity (EC Directive); Camp 2: Access to content; Camp 3: Technical infrastructure; Camp 4: Skills and support; Summit: Researchers embrace TDM
  • 14. “The exception has made a massive difference...” Petr Knoth, Open University, UK
  • 15. “…the definition of commercial and non-commercial research is creating uncertainty” Petr Knoth, Open University, UK
  • 16. EC Proposed Directive • Consistent with the existing EU copyright legal framework • Could help resolve uncertainty over commercial partnerships • Currently out for consultation Source: http://www.comodinicachia.com/timeline.html
  • 17. What needs to happen? • Communicate legal provisions for TDM with certainty and clarity • Clarify the exception’s scope where public researchers collaborate with commercial partners • Monitor the interaction of the copyright exception with digital rights management (DRM), licensing and other relevant legal regimes
  • 20. “I scaled down my TDM research, and had to exclude two publishers… I couldn’t do what I set out to do” Chris Hartgerink, Tilburg University, Netherlands
  • 21. “I had to ask too many publishers for the right to download … it takes a lot of time and … the publishers’ servers frequently block us.” Mathieu Andro, INRA, France
  • 22. What is the problem with access? • Technical protection measures (TPMs) • Crawler traps • Restricted access to application programming interfaces (APIs)
  • 23. What needs to happen? • Incorporate TDM clauses into model licence agreements • Educate researchers on their rights • Maintain dialogue with publishers • Improve access through better infrastructure…
  • 24. 3. INFRASTRUCTURE & TOOLS (Image: National Geographic)
  • 25. “…Every time you have a new project or data source… you hit issues about how the documents are structured, oddities of formatting, and so on.” Mark Greenwood, GATE, UK
  • 27. What needs to happen? • Invest in TDM infrastructure • Make TDM accessible to non-specialists • Streamline access • Open standards and harmonised data formats
  • 29. “…We have algorithms to answer questions, but we do not have algorithms to ask questions” François Rioult, GREYC Laboratory, Université de Caen, France
  • 30. What is the role of the librarian? (Photo: REUTERS)
  • 31. “The library needs to be able to say: ‘If you’ve got a question about TDM, come to us’” Danny Kingsley, Head of Scholarly Communications, University of Cambridge, UK
  • 32. Library support for TDM • Advocacy • Copyright advice • Access to legal expertise • Skills development and training • Advice on data sources and tools
  • 36. “There are so many obstructions in the way of doing this research, and doing it well. It is just too hard and so people do other things” Ross Mounce, University of Cambridge, UK
  • 37. What needs to happen? • Endorsement by senior research leaders • Funding and incentives linked to TDM • Alignment with moves to open science
  • 38. (1) Why does TDM matter? (2) Why isn’t it used more widely in public research? (3) How do we change this?
  • 39. Why does TDM matter? Public research is valuable; TDM makes research more efficient; TDM is worth investing in
  • 40. (1) Why does TDM matter? (2) Why isn’t it used more widely in public research? (3) How do we change this?
  • 41. Copyright exception (Base Camp); Camp 1: Legal clarity (EC Directive); Camp 2: Access to content; Camp 3: Technical infrastructure; Camp 4: Skills and support; Summit: Researchers embrace TDM
  • 42. (1) Why does TDM matter? (2) Why isn’t it used more widely in public research? (3) How do we change this?
  • 43. Making TDM a reality. Libraries: • Monitor researchers’ experience • Develop case studies and guidance • Involve the national library • Invest in TDM support • Incorporate TDM clauses into licence agreements
  • 44. Making TDM a reality. Legislators: • Provide certainty • Enable public/private partnerships • Monitor interaction with other legislation (e.g. DRM). Institutions/research leaders: • Endorse TDM • Invest in library services • Explore knowledge exchange opportunities. Research funders: • Invest in infrastructure • Forum to improve access • Link TDM to Open Science. Publishers & providers: • Cloud services for TDM • Streamline access • Open, harmonised standards
  • 45. Thank you. Rob Johnson, rob.johnson@research-consulting.com, www.research-consulting.com. Full report available at: http://adbu.fr/etude-tdm/

Editor's Notes

  1. France: €6.4 billion R&D in government sector, €10 billion in HE. UK: €3 billion in government, €9 billion
  2. A number of studies indicate that TDM can increase the efficiency of research: increase coverage of literature reviews; cut down manual work; automate information retrieval; accelerate drug discovery.
  3. Note: Conservation requirements could be a positive in terms of reproducibility
  4. A many-to-many problem
  5. Edmund Hillary and Tenzing Norgay
  6. Advocacy for the benefits of TDM at all levels of the organisation; copyright advice on using the TDM exception; access to legal expertise; skills development (indexing and metadata curation) and access to technical training (coding and high performance computing); advice on data sources and tools