The Future is All Mine
Text and Data Mining Projects in Europe
@openminted_eu @futuretdm
Funded by:
Text and data mining is the future
“Text and data mining (TDM) is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns.”
Source: JISC
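The three steps in this definition (copy, extract, recombine) can be illustrated with a minimal sketch. The toy corpus, the function names, and the choice of word counts per year as the "pattern" are all assumptions made for illustration; they are not taken from the slides or from any project tooling.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs standing in for
# "large quantities of material" that have been copied for mining.
CORPUS = [
    (1900, "The telegraph office sent a telegraph to London."),
    (1950, "The telephone rang while the telegraph sat idle."),
    (2000, "She checked the internet and then used the telephone."),
]


def extract_tokens(text):
    """Extract the data: lowercase word tokens from raw text."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts_by_year(corpus):
    """Recombine the extracted tokens into one frequency table per year."""
    counts = {}
    for year, text in corpus:
        counts.setdefault(year, Counter()).update(extract_tokens(text))
    return counts


def trend(counts, term):
    """Identify a pattern: the term's count per year, in chronological order."""
    return [(year, counts[year][term]) for year in sorted(counts)]


counts = term_counts_by_year(CORPUS)
print(trend(counts, "telegraph"))
print(trend(counts, "telephone"))
```

Real TDM work operates on far larger collections and richer linguistic features, but the shape is the same: gather text, pull out units of data, and aggregate them until a pattern becomes visible.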
Text and data mining helps us understand the past
Mining historical books: the evolution of language
Source: http://www.sciencemag.org/content/331/6014/176 (Baylor College of Medicine, Houston)
Text and data mining predicts the future
Mining newspapers: predicting revolutions
Source: http://journals.uic.edu/ojs/index.php/fm/article/view/3663/3040 (University of Illinois)
Text and data mining saves the future
Mining scientific publications about diseases: saving lives
Source: http://dl.acm.org/citation.cfm?id=2623667 (Baylor College of Medicine, Houston)
Text mining – it seems so easy:
STAGE 1: Information Retrieval
STAGE 2: Linguistic Analysis: Entity Recognition
STAGE 3: Information Extraction
STAGE 4: Data Mining, Knowledge Discovery
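A toy version of the four stages named on this slide might look like the sketch below. The stage order (retrieval, linguistic analysis, extraction, mining), the sample documents, the entity dictionary, and every function name are assumptions for illustration only; production pipelines use trained models and large corpora rather than keyword matching.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-collection; a real project would pull from large corpora.
DOCUMENTS = [
    "Aspirin reduces Fever in most patients.",
    "Fever is a common symptom of Influenza.",
    "The weather was pleasant in spring.",
    "Aspirin is sometimes given for Influenza and Fever.",
]

# Hypothetical entity dictionary used for recognition.
ENTITIES = {"aspirin": "DRUG", "fever": "SYMPTOM", "influenza": "DISEASE"}


def retrieve(documents, keywords):
    """Information retrieval: keep documents that mention any keyword."""
    return [d for d in documents if any(k in d.lower() for k in keywords)]


def recognise_entities(document):
    """Linguistic analysis: tokenise the text and tag dictionary entities."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [(t, ENTITIES[t]) for t in tokens if t in ENTITIES]


def extract_pairs(entities):
    """Information extraction: pairs of distinct entities in one document."""
    names = sorted({name for name, _ in entities})
    return list(combinations(names, 2))


def mine(pairs_per_document):
    """Data mining: count how often each pair co-occurs across documents."""
    counts = Counter()
    for pairs in pairs_per_document:
        counts.update(pairs)
    return counts


relevant = retrieve(DOCUMENTS, ["fever", "influenza"])
pair_counts = mine(extract_pairs(recognise_entities(d)) for d in relevant)
print(pair_counts.most_common())
```

Even in this small sketch each stage hides a real decision: where the documents come from, how entities are recognised, and which counting method counts as a finding.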
But it actually poses many challenges…
Where do I find data?
How do I make my texts readable by machines?
Which mining method to use?
Current Barriers in Europe
Awareness across Institutions & Stakeholders
• Lack of awareness among research communities
• Lack of guidance to uncover TDM potential
Skills and Tools
• Availability and accessibility across disciplines
• Gap in skills across various sectors
Licensing & Open Access
• License proliferation and interoperability issues
• License barriers to transparent open access
Copyright and Data Protection
• TDM activities infringing current copyright laws
• Legal and policy limitations and barriers for TDM
EU PROJECTS on TDM
FutureTDM: Identify TDM barriers and policy solutions
OpenMinTeD: Build a TDM eInfrastructure
Main Objectives of FutureTDM
• ELABORATE a legal and policy framework for future TDM and specify a research agenda to foster the spread of TDM
• BUILD a website: a Collaborative Knowledge Base and an Open Information Hub combined
• ANALYSE current application areas and best practices in TDM
• ASSESS existing studies, legal regulations and policies on TDM
• INVOLVE all key stakeholders to identify practices, requirements, and specific challenges
• INCREASE awareness of TDM to attract new target groups and science domains
This project has received funding from the European Union’s Horizon 2020
Research and Innovation Programme under Grant Agreement No 665940.
FutureTDM
Bottom-up approach: stakeholder workshops and knowledge cafes throughout Europe
Objective of OpenMinTeD
[Figure: layered interoperability architecture]
• Layer 1: Interoperability of text mining services (platforms or components)
• Layer 2: Interoperability of language resources & corpora
• Layer 3: Interoperability to shared storage and computing resources
Elements shown: mining platforms (including proprietary architectures); platform services (Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting); a language resources and corpora registry service; language resources; text corpora (publisher, OpenAIRE/CORE, PMC, other); data centres, including in the public cloud.
OpenMinTeD brings together:
• ACCESSIBLE CONTENT: Via standardised programmatic interfaces and access rules
• DISCOVERABLE SERVICES: Easily discoverable text mining services and workflows which process, analyse and annotate text
• EFFICIENT PROCESSING: Operate on public e-Infrastructures via standardised APIs
• TDM COMMUNITIES: Different scientific communities have different challenges
• VALUE ADDED APPS: Community-driven applications to illustrate the value of the infrastructure. Engage with industry.
OPENMINTED = The Open Mining Infrastructure for Text and Data
Become involved
Follow us on Twitter for the latest updates and blogs
@openminted_eu
@futuretdm
Follow our websites
www.openminted.eu
www.futuretdm.eu
THANK YOU
PARTNERS OPENMINTED
• Athena RIC
• Univ. of Manchester (NaCTeM)
• Univ. of Darmstadt
• INRA
• EMBL-EBI
• Agro-Know
• LIBER
• Univ. of Amsterdam
• Open University UK
• EPFL
• CNIO
• Univ. of Sheffield (GATE)
• GESIS
• GRNET
• Frontiers
• Univ. of Stirling
PARTNERS FUTURETDM
• SYNYO GmbH (SYNYO)
• LIBER Europe
• Open Knowledge Foundation LBG (OK/CM)
• Radboud Univ. Nijmegen
• The British Library Board
• Univ. of Amsterdam
• Athena RIC
• Ubiquity Press
• Fundacja Projekt: Polska (FPP)
