Coordination and Support Action
GARRI-3-2014 Scientific Information in the Digital Age: Text and Data Mining (TDM)
Project number: 665940
Supporting the Uptake of TDM
Reducing Barriers and Increasing Uptake of Text and Data Mining for Research Environments
using a Collaborative Knowledge and Open Information Approach
5 July 2017
Activities in 2017
• FutureTDM Policy Framework
• Future applications and economics of TDM
• Guidelines for TDM stakeholders and practitioners
• Case studies: Best practices to support TDM
• Roadmap for increasing uptake of TDM
FutureTDM Stakeholder Guidelines
• Aimed at giving concrete, practical advice
• Legal guidelines
• Is my TDM project lawful? Do I need expert advice?
• Licensee guidelines
• What makes a TDM licence reasonable and proportionate to user needs?
• Data management guidelines
• What do machine ‘readers’ need for TDM?
• Guidelines for university policy
• How can universities support TDM strategically?
Legal uncertainty around TDM
• What laws apply, and to which TDM activities?
• Copyright, database rights, data protection…
• Reading, processing, copying, publishing excerpts…
• Who benefits from legal exceptions?
• What is “non-commercial”?
• What is a “research organisation”?
FutureTDM Legal Guidelines
• Breakdown of when legal considerations are relevant
• Intellectual property: Do I need a licence?
• Data protection: What do I need to consider?
• When to seek expert legal advice
Licensing content for TDM
• TDM requires large datasets
• Potentially from hundreds or thousands of sources
• Uncertainty around who may benefit from exceptions to copyright
• Many existing licences are unclear or silent on TDM
Licensing considerations for TDM: Reasonable and Proportionate?
• Does it make practical sense to distinguish between ‘commercial’ and ‘non-commercial’ research?
• Does intrusive usage and activity monitoring affect researchers’ academic freedoms?
• Can researchers reproduce reasonable, illustrative excerpts of content with their analysis?
• Is it practical for researchers to attribute credit to every piece of content used in analysis?
• Do technical protection measures prevent
researchers from carrying out TDM at reasonable
Data Management in the Context of TDM
• Human and machine ‘readers’ have different needs
• Open access does not necessarily mean content is practically accessible for TDM
• Machine ‘readers’ need:
• Machine-readable file format
• Machine-readable metadata
• Bulk access to content
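As a rough illustration of these three needs, the sketch below checks a hypothetical repository record against them. The field names (`file_format`, `has_metadata`, `bulk_download_url`), the function `tdm_gaps`, and the set of formats treated as machine-readable are all assumptions made for illustration, not part of any FutureTDM specification.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, non-exhaustive set of formats treated as machine-readable in this sketch.
MACHINE_READABLE_FORMATS = {"xml", "json", "csv", "txt"}


@dataclass
class RepositoryRecord:
    # Hypothetical fields describing how a piece of content is offered.
    file_format: str
    has_metadata: bool
    bulk_download_url: Optional[str] = None


def tdm_gaps(record: RepositoryRecord) -> list:
    """Return the machine-'reader' needs that this record does not meet."""
    gaps = []
    if record.file_format.lower() not in MACHINE_READABLE_FORMATS:
        gaps.append("machine-readable file format")
    if not record.has_metadata:
        gaps.append("machine-readable metadata")
    if not record.bulk_download_url:
        gaps.append("bulk access to content")
    return gaps


# An open access PDF with no metadata and no bulk endpoint: readable by humans,
# but not practically accessible for TDM.
pdf_only = RepositoryRecord(file_format="PDF", has_metadata=False)
print(tdm_gaps(pdf_only))
```

The example makes the point of the slide concrete: content can be open access and still fail every one of the three checks.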
What do Machine ‘Readers’ Need?
• Key (machine-readable) metadata:
• Permanent identifiers
• What licences or rights apply to content
• Data type, format, size
• Any specific tools needed to work with the data
• Data provenance and rights-holder information
• Data changes and versioning
• Other domain-specific metadata
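A minimal sketch of a machine-readable metadata record covering the fields listed above, serialised as JSON. The key names, the DOI-style identifier, and every example value are hypothetical; a real repository would normally follow an established metadata schema rather than this ad hoc layout.

```python
import json

# Keys mirroring the metadata listed above (names are assumptions for this sketch).
REQUIRED_KEYS = {
    "identifier", "licence", "data_type", "format", "size_bytes",
    "required_tools", "provenance", "rights_holder", "version",
}

record = {
    "identifier": "doi:10.1234/example-dataset",         # hypothetical permanent identifier
    "licence": "CC-BY-4.0",                               # licence/rights applying to the content
    "data_type": "text corpus",
    "format": "application/xml",
    "size_bytes": 52_428_800,
    "required_tools": [],                                 # no special tools beyond an XML parser
    "provenance": "Harvested from journal X, 2010-2016",  # hypothetical provenance note
    "rights_holder": "Example Publisher Ltd.",            # hypothetical rights holder
    "version": "1.2",
    "changes": ["1.2: removed duplicate articles"],       # hypothetical change log
    "domain": {"language": "en"},                         # other domain-specific metadata
}


def missing_keys(rec: dict) -> set:
    """Return the required metadata keys absent from a record."""
    return REQUIRED_KEYS - set(rec)


serialised = json.dumps(record, indent=2)
print(serialised)
```

Because the record is plain JSON, a TDM tool can decide from the `licence` and `format` fields alone whether it may process the content and how, without a human reading a landing page.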
The Role of Universities
• Universities are involved in all aspects of the TDM value chain
• Content creation
• Content dissemination
• Development and use of TDM tools
• Sharing of new knowledge and insights
• …but very few European universities have strategic policy approaches to TDM
Encouraging Development of Strategic TDM Policy
• Demonstrate need
• Involve stakeholders
• Understand your institution
• Consolidate information
• Identify promoters and early adopters
• Introduce incentives
• Share your progress!
How can Libraries Support TDM?
• Help connect researchers with expert legal advice where needed
• Consider researcher needs in the context of TDM when negotiating licences
• Ensure repositories have machine-readable metadata to make TDM practically feasible
• Work within universities or institutions to develop strategic approaches to supporting TDM