FutureTDM: Increasing Uptake of Text and Data Mining in the EU
1. OpenDataMonitor
Horizon 2020
Coordination and Support Action
GARRI-3-2014 Scientific Information in the Digital Age: Text and Data Mining (TDM)
Project number: 665940
Increasing Uptake of Text and Data Mining in the EU
FutureTDM
Reducing Barriers and Increasing Uptake of Text and Data Mining for Research Environments
using a Collaborative Knowledge and Open Information Approach
Brian Hole, Ubiquity Press, Open Repositories workshop, Brisbane, Australia
27 June 2017
2. Workshop overview
• Introduction
• Defining Text and Data Mining (Brian Hole)
• An overview of the FTDM project and its aims (Brian Hole)
• Results of the FTDM project (Freyja van den Boom)
• Related projects: OpenMinted and CORE (Petr Knoth)
• Discussion of workshop expectations
• Hands on TDM (Petr Knoth)
Tea
• Experimenting with TDM
• Q&A
• Wrap up (Freyja van den Boom)
3. What is TDM?
“the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999)
• 16 trillion gigabytes of data by 2020 (236% growth)
• Doubles every 2 years (Moore's Law, 1965)
• Over 80% of EU citizens have internet access (Eurostat 2014)
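The Hearst definition quoted on this slide ("automatically extracting and relating information from different resources") can be illustrated with a very small sketch. It assumes a hypothetical three-sentence corpus and a hypothetical fixed term list; a real TDM pipeline would typically use named-entity recognition and far larger collections. The sketch counts which terms are mentioned together, so that a link between two terms can surface even when no single text states it directly.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus standing in for text drawn from different resources.
DOCUMENTS = [
    "Mutations in p53 are associated with lung cancer.",
    "Smoking is a major risk factor for lung cancer.",
    "Loss of p53 function is frequently observed after smoking damages DNA.",
]

# Hypothetical vocabulary of terms of interest (assumption for illustration).
TERMS = ["p53", "lung cancer", "smoking"]


def find_terms(text, terms):
    """Extraction step: return the set of known terms mentioned in a text."""
    lowered = text.lower()
    return {
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def cooccurrences(documents, terms):
    """Relation step: count how often each pair of terms shares a document."""
    counts = Counter()
    for doc in documents:
        found = sorted(find_terms(doc, terms))
        counts.update(combinations(found, 2))
    return counts


if __name__ == "__main__":
    for pair, n in cooccurrences(DOCUMENTS, TERMS).most_common():
        print(pair, n)
```

Co-occurrence counting is only one of many possible relation-finding techniques; it is used here because it fits in a few lines.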
4. Potential of TDM
• Addressing grand challenges such as climate change and global
epidemics
• Improving population health, wealth and development
• Creating new jobs and employment
• Exponentially increasing the speed and progress of science through new
insights and greater efficiency of research
• Increasing transparency of governments and their actions
• Fostering innovation and collaboration and boosting the impact of open
science
• Creating tools for education and research
• Providing new and richer cultural insights
• Speeding economic and social development in all parts of the globe
(The Hague Declaration on Knowledge Discovery)
5. The challenge
TDM is not a homogeneous, self-contained, scientific domain, but rather a diverse and complex set of methods and technologies deployed in the framework of diverse disciplines and business activities
6. FutureTDM - the opportunity
The FutureTDM project seeks to improve uptake of text and data mining
(TDM) in the EU by actively engaging with stakeholders such as
researchers, developers, publishers and SMEs.
The use of content mining is significantly lower in Europe
than in some American and Asian countries.
The partners in the FutureTDM consortium share the ambition behind the
EC’s call to develop policy and legal frameworks that reduce the barriers to
TDM uptake and, in doing so, promote awareness of TDM opportunities
across Europe.
7. FutureTDM
Main Objectives of FutureTDM
• ASSESS existing studies, legal regulations and policies on TDM within the European Union
• ANALYSE current application areas and trends in TDM including statistics and key figures, collect relevant research and industrial projects and best practices
• INVOLVE all key stakeholders to identify practices, requirements, and specific challenges in the field of TDM
• ELABORATE a legal and policy framework for future TDM, define policy priorities, specify a research agenda to foster the spread of TDM in various research fields within the EU
• BUILD a Collaborative Knowledge Base and an Open Information Hub combined on a web-based platform including intuitive tools
• INCREASE awareness of TDM to attract new target groups and science domains
8. Impact
• Remove existing legal, technological and skill barriers that prevent TDM technology from being adopted within the EU.
• Increase awareness about the social, economic and scientific benefits of TDM.
• Increase the Union’s competitiveness with other high-tech economies (like Japan, South Korea, US) by enhancing TDM adoption.
• Foster the adoption of TDM in science and economy.
• Lead to Research & Innovation policy that is more relevant and responsive to society.
9. TDM from a publisher’s experience
We were involved in the “Licenses for Europe” consultation by the European Commission in 2013.
Legacy publishers present were lobbying against free access to TDM.
As an unconditionally open publisher, Ubiquity Press is committed to making content
available in all forms, for consumption by anyone and for any means. Allowing TDM on
our platform is therefore standard practice, and we fully encourage it.
Our position was that copyright reform with a clear exception for TDM was the best way
to go forward, and that we would not support a licensing solution.
The EC backed down from imposing a licensing solution through legislation.
The EC committed to taking all positions into account when reviewing the EU Copyright
framework.
10. TDM from a publisher’s experience
• Automatic download of all article XML, or a subset
Features under development
• Automatic deposit of XML to repositories
• Journal- and press-wide resources for TDM, e.g. tool profiles
• Integration with TDM tools, e.g. ContentMine QuickScrape
• Investigating features to track TDM usage and use cases
• Investigating hosting of managed, TDM-optimized Hydra/Samvera repositories
● The number of wireless sensors and actuators worldwide has exceeded 24 million, an increase of 553% between 2011 and 2016 [6].
● By 2020 there will be more than 16 zettabytes of useful data (16 trillion GB) [7].
● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator [8].
● “Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, 500 million tweets per day and around 200 billion tweets per year” [9].
● As of 30 May 2016, 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook [10].
● As of 11 May 2016, there were over 1 billion websites and 3.36 billion internet users [11].
● On average, a new scientific article is published every 30 seconds [12].
● There are 60,000 publications on a single gene, p53, in the literature [13].
Instead of reading through these, maybe pick out one or two and give examples of TDM in practice?
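The quoted Twitter and data-volume figures can be cross-checked with a few lines of arithmetic. This is a rough sketch that assumes a constant rate of 6,000 tweets per second, a 365-day year, and decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes); the variable names are illustrative only.

```python
# Assumed constant rate taken from the quoted figure.
TWEETS_PER_SECOND = 6_000

per_minute = TWEETS_PER_SECOND * 60   # quoted as "over 350,000"
per_day = per_minute * 60 * 24        # quoted as "500 million"
per_year = per_day * 365              # quoted as "around 200 billion"

# 16 trillion gigabytes expressed in zettabytes, using decimal units.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21
zettabytes = 16 * 10**12 * BYTES_PER_GB // BYTES_PER_ZB

if __name__ == "__main__":
    print(f"tweets per minute: {per_minute:,}")
    print(f"tweets per day:    {per_day:,}")
    print(f"tweets per year:   {per_year:,}")
    print(f"16 trillion GB =   {zettabytes} ZB")
```

Under these assumptions the per-second rate gives 360,000 tweets per minute, about 518 million per day and about 189 billion per year, so the quoted round numbers are consistent with each other, and 16 trillion GB does equal 16 zettabytes.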