Overview
LIBER Conference
5 July 2017, Patras, Greece
Natalia Manola
Athena Research and Innovation Centre
The problem
PART I
A few sobering facts on content production
LIBER conference - PATRAS, 5 July 2017
● 1.8 billion websites and 3.46 billion internet users as of 25 September 2016.
● 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016).
● 16 zettabytes of useful data (16 trillion GB) by 2020.
● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.
● On average, around 6,000 tweets are sent on Twitter every second, which corresponds to over 350,000 tweets per minute, more than 500 million per day and around 200 billion per year.
● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, as of 30 May 2016.
… And some facts on scientific literature
The global research community generates ~2.5 million new scholarly
articles per year (English only)
The STM report (2015)
… some 90% of papers … are never cited (82% in the humanities)
… of those articles that are cited, only 20 percent have actually been read
… 50% of papers are never read by anyone other than their authors, referees and journal editors
Lokman I. Meho, The rise and rise of citation analysis, 2007
… one paper published every 12 seconds
… 70,000 papers published on a single protein, the tumor suppressor p53
Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014
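The publication rate quoted above can be cross-checked against the annual figure with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a 365-day year and takes the figure of roughly 2.5 million articles per year as given.

```python
# Back-of-the-envelope check: how often does a new article appear
# if about 2.5 million are published per year?
# Assumption: a 365-day year; the annual figure is taken as given.
ARTICLES_PER_YEAR = 2_500_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

seconds_per_article = SECONDS_PER_YEAR / ARTICLES_PER_YEAR
print(f"One article every {seconds_per_article:.1f} seconds")
```

The result falls between 12 and 13 seconds, in line with the "one paper every 12 seconds" figure.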
How can we make sense of this data?
PART II
TDM - An emerging solution
Machine reading
process textual sources, organise and classify in various dimensions, extract
main (indexical) information items,
… and “understanding”
identify and extract entities and relations between entities, facilitate the
transformation of unstructured textual sources into structured data
… and predicting
enable the multidimensional analysis of structured data to extract meaningful
insights and improve the ability to predict
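As a rough illustration of the steps above, the following Python sketch turns two sample sentences into structured records and then aggregates them. Everything in it is an assumption made for illustration: the tiny dictionary, the relation pattern and the sample sentences are hypothetical stand-ins, and real TDM systems use far richer models than a lookup table and a regular expression.

```python
import re
from collections import Counter

# Toy dictionary: surface form -> entity type (illustrative, not a real resource).
GAZETTEER = {"p53": "PROTEIN", "MDM2": "PROTEIN", "apoptosis": "PROCESS"}
# Toy relation pattern: "<entity> <verb> <entity>".
RELATION = re.compile(r"(\w+) (inhibits|regulates|induces) (\w+)")


def read(sentence):
    """Turn one sentence into a structured record of entities and relations."""
    entities = [(tok, GAZETTEER[tok])
                for tok in re.findall(r"\w+", sentence) if tok in GAZETTEER]
    relations = [m.groups() for m in RELATION.finditer(sentence)
                 if m.group(1) in GAZETTEER and m.group(3) in GAZETTEER]
    return {"entities": entities, "relations": relations}


# Made-up sample text standing in for unstructured sources.
corpus = [
    "MDM2 inhibits p53 in many tumours.",
    "p53 induces apoptosis after DNA damage.",
]
records = [read(sentence) for sentence in corpus]

# Prediction is out of scope here; a simple aggregate over the structured
# records stands in for the analysis step.
verb_counts = Counter(rel[1] for rec in records for rel in rec["relations"])
print(records[0]["relations"], dict(verb_counts))
```

Once the text is in this form, it can be queried and counted like any other dataset, which is the point of moving from unstructured sources to structured data.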
However, …
Multitude of solutions catering for different
Text types: newswire, scientific literature, tweets/blogs, patents, clinical/medical records, textbooks, monographs, online forums, …
Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …
Tasks: translation, information extraction, semantic search, question answering, sentiment analysis, summarization, knowledge discovery, …
Domains: finance/business, health, biology, social sciences, humanities, …
Creating a fragmented landscape
A complex and fragmented Landscape
Text Mining Researchers
Computing Infrastructures
Content Providers
End Users
The components
PART III
1. Share content
• Document literature content
• Share in a meaningful way: what does Open Access really mean?
IPR and licensing
• Study IPR restrictions on the reuse of sources, as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
Challenges
• Rights statements vs. open licenses (for repositories)
• No access to full text: we live in a metadata world
• No standard protocols, formats and APIs for access and retrieval
• No capacity to handle extra traffic
Proposed solution: make TDM-enabled hubs
[Diagram of TDM-enabled hubs. Labels: Literature Repositories, OA Journals, Data Repositories, Aggregators, Archives; Metadata, Full text, Data; OpenAIRE, CORE, PMC Europe, …; Guidelines, APIs; TDM, Research networks, Wikipedia/Media/Research, …; Open Data, Open Protocols; OpenAIRE + OpenMinTeD]
2. Share TDM Services
• Document language processing/text mining services and workflows in a meaningful way for domain discipline researchers
• Document language/knowledge resources, data categories, taxonomies, provenance information
Interoperable services
• Common way of presenting annotated results
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
IPR and licensing
• Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions
Challenges
• Bring text miners close to the researchers' problems and needs
• Semantic interoperability (not just technical)
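To make the idea of a common annotation format and composable services more concrete, here is a minimal Python sketch. It is not the OpenMinTeD specification: the `Annotation` record, the component signature, the two toy components and the two-entry protein dictionary are all hypothetical, chosen only to show how a shared result format lets independent services be chained into a workflow.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """Stand-off annotation: character offsets into the unchanged text."""
    begin: int
    end: int
    label: str
    source: str  # which component (or human annotator) produced it


# Every component shares one signature, so components can be chained freely.
Component = Callable[[str, List[Annotation]], List[Annotation]]


def tokenizer(text: str, annotations: List[Annotation]) -> List[Annotation]:
    """Add one 'Token' annotation per whitespace-separated word."""
    out, pos = list(annotations), 0
    for word in text.split():
        begin = text.index(word, pos)
        pos = begin + len(word)
        out.append(Annotation(begin, pos, "Token", "tokenizer"))
    return out


def protein_tagger(text: str, annotations: List[Annotation]) -> List[Annotation]:
    """Label tokens found in a tiny, made-up protein dictionary."""
    known = {"p53", "MDM2"}
    hits = [Annotation(a.begin, a.end, "Protein", "protein_tagger")
            for a in annotations
            if a.label == "Token" and text[a.begin:a.end] in known]
    return annotations + hits


def run_workflow(text: str, components: List[Component]) -> List[Annotation]:
    """Run the components in order, each seeing the previous annotations."""
    annotations: List[Annotation] = []
    for component in components:
        annotations = component(text, annotations)
    return annotations


text = "MDM2 binds p53"
result = run_workflow(text, [tokenizer, protein_tagger])
proteins = [text[a.begin:a.end] for a in result if a.label == "Protein"]
print(proteins)
```

Because annotations only point into the text and record their source, a manual or crowdsourced annotation could be added to the same list as just another `Annotation` with a different `source` value. Agreeing on what a label such as "Protein" means across services is the semantic part of the interoperability problem, which this sketch does not address.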
3. Use/Share computing resources
• Capacities and capabilities
Interoperable services at the lower level
• Common way of deploying operations/jobs
• Authentication and authorisation services: Single Sign-On (SSO)
• Accounting
Challenges
• Legal, organisational, …
The OpenMinTeD platform
PART III
OpenMinTeD framework & focus
OpenMinTeD sets out to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content.
…
Our Services
Content/Corpora | Services/tools | Annotated corpora
Register and discover TDM services and tools
Link to content hubs, share corpora
Run a TDM job
Store, document, publish and share results (annotated corpora)
Build your own service – combine components into a workflow and share
Key goals apart from interoperability
Recognise that the results of TDM, i.e., annotations, are valuable research data that should be preserved, shared and re-used.
Scientific publications are data, and should abide by the FAIR data principles.
Who is OpenMinTeD for?
PART IV
End users as consumers
Domain specific researchers & research communities
Relatively novice users who want to find end-to-end, off-the-shelf services that meet their needs. (>100,000)
Application developers / RI data scientists
Understand the basic usage of NLP and TDM services, but not the details. They know how to connect components and which content they must work on to get the required results. They need to develop end-to-end applications. (>10,000)
Infrastructure operators
Agnostic to the internal specifics of TDM, but they need to integrate TDM services into daily workflows and operate them. (<100)
Content and services contributors
For content
Publishers and repository managers (research libraries). (<1,000)
For services
Language technology experts who use specific technologies and frameworks to develop and enhance their services. (<500)
Developers who are not NLP experts, creating TDM modules based on off-the-shelf libraries and tools (e.g. Python, Jupyter). They are not familiar with NLP frameworks and terminology but are eager to publish their small services. (<5,000)
Challenges
PART V
Priorities
Interoperability: at which level of component. A technical issue that requires consensus building.
Policies: policies & rules of engagement for content/service providers and consumers. EOSC compatible.
Legal issues: legal issues at all levels (IPRs, liabilities). Go Open!
Beta release in August 2017
Real time
Building corpora: OpenAIRE, CORE
Uploading own corpora
Registering a service
Running a service
Viewing annotations
Storing results in Zenodo
Sustainability?
What is the role of the libraries of (OA) publishers in TDM?
How does Open Access translate to Open Science?
How can they help researchers achieve the best in their knowledge extraction endeavour?
What is the role of e-Infrastructures like OpenAIRE and OpenMinTeD?
Join us at the Open Science Fair
Athens, 6-8 September 2017
www.opensciencefair.eu
THANK YOU!
Questions?
Natalia Manola
natalia@di.uoa.gr

Best Call Girls In Sector 29 Gurgaonā¤ļø8860477959 EscorTs Service In 24/7 Delh...lizamodels9
Ā 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
Ā 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
Ā 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
Ā 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)itwameryclare
Ā 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
Ā 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
Ā 
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹ć€ŠQueenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹rnrncn29
Ā 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
Ā 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
Ā 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
Ā 
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)Columbia Weather Systems
Ā 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -INandakishor Bhaurao Deshmukh
Ā 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
Ā 
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
Ā 

Recently uploaded (20)

Call Girls in Majnu Ka Tilla Delhi šŸ”9711014705šŸ” Genuine
Call Girls in Majnu Ka Tilla Delhi šŸ”9711014705šŸ” GenuineCall Girls in Majnu Ka Tilla Delhi šŸ”9711014705šŸ” Genuine
Call Girls in Majnu Ka Tilla Delhi šŸ”9711014705šŸ” Genuine
Ā 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
Ā 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Ā 
User Guide: Pulsarā„¢ Weather Station (Columbia Weather Systems)
User Guide: Pulsarā„¢ Weather Station (Columbia Weather Systems)User Guide: Pulsarā„¢ Weather Station (Columbia Weather Systems)
User Guide: Pulsarā„¢ Weather Station (Columbia Weather Systems)
Ā 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
Ā 
Best Call Girls In Sector 29 Gurgaonā¤ļø8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaonā¤ļø8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaonā¤ļø8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaonā¤ļø8860477959 EscorTs Service In 24/7 Delh...
Ā 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
Ā 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Ā 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Ā 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
Ā 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
Ā 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
Ā 
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹ć€ŠQueenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹
怊Queenslandęƕäøšę–‡å‡­-ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•ć€‹
Ā 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
Ā 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Ā 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
Ā 
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)
User Guide: Orionā„¢ Weather Station (Columbia Weather Systems)
Ā 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
Ā 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
Ā 
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIƊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
Ā 

Overview of Text and Data Mining Challenges and Solutions

  • 1. Overview. LIBER Conference, 5 July 2017, Patras, Greece. Natalia Manola, Athena Research and Innovation Centre
  • 3. A few sobering facts on content production: ● 1.8 billion websites and 3.46 billion internet users on 25 September 2016. ● 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016). ● 16 zettabytes of useful data (16 trillion GB) by 2020. ● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator. ● Every second, on average, around 6,000 tweets are sent on Twitter, which corresponds to over 350,000 tweets per minute, more than 500 million tweets per day and around 200 billion tweets per year. ● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, on 30 May 2016.
  • 4. … And some facts on scientific literature: The global research community generates ~2.5 million new scholarly articles per year, English only (The STM Report, 2015). … some 90% of papers … are never cited (82% in the humanities); … of those articles that are cited, only 20 percent have actually been read; … 50% of papers are never read by anyone other than their authors, referees and journal editors (Lokman I. Meho, The rise and rise of citation analysis, 2007). … one paper published every 12 seconds; … 70,000 papers published on a single protein, the tumor suppressor p53 (Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014).
  • 5. How can we make sense of this data? PART II
  • 6. TDM: an emerging solution. Machine reading: process textual sources, organise and classify them in various dimensions, extract main (indexical) information items; … and “understanding”: identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data; … and predicting: enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict.
  • 7. However, … a multitude of solutions catering for different: Text types (newswire, scientific literature, tweets/blogs, patents, clinical/medical records, textbooks, monographs, online forums, …); Languages (English, French, German, Spanish, Portuguese, Italian, Polish, …); Tasks (translation, information extraction, semantic search, question answering, sentiment analysis, summarization, knowledge discovery, …); Domains (finance/business, health, biology, social sciences, humanities, …); creating a fragmented landscape.
  • 8. A complex and fragmented landscape: text mining researchers, computing infrastructures, content providers, end users.
  • 10. 1. Share content: • Document literature content • Share in a meaningful way: what does Open Access really mean? IPR and licensing: • Study IPR restrictions for reuse of sources as well as possible exceptions • Promote clarity and standardisation of legal rights and obligations. Challenges: • Rights statements vs. open licenses (for repositories) • No access to full text: we live in a metadata world • No standard protocols, formats and APIs for access and retrieval • No capacity to handle extra traffic.
  • 11. Proposed solution: make TDM-enabled hubs. [Diagram labels: literature repositories, OA journals, data repositories, aggregators, archives; metadata, full text, data; OpenAIRE, CORE, PMC Europe, …; guidelines, APIs; TDM; research networks; Wikipedia/media/research, …; open data, open protocols; OpenAIRE + OpenMinTeD]
  • 12. 2. Share TDM services: • Document language processing/text mining services and workflows in a meaningful way for domain discipline researchers • Document language/knowledge resources, data category taxonomies, provenance information. Interoperable services: • Common way of presenting annotated results • Combine services into workflows • Combine content and language resources with services and workflows • Combine automatic and manual/crowdsourcing annotation services. IPR and licensing: • Translate the legal and policy aspects into specifications for lawful user-to-service and service-to-service interactions. Challenges: • Bring text miners close to the researchers' problems and needs • Semantic interoperability (not just technical).
  • 13. 3. Use/share computing resources: • Capacities and capabilities. Interoperable services at the lower level: • Common way of deploying operations/jobs • Authentication and authorisation services: Single Sign-On (SSO) • Accounting. Challenges: • Legal, organisational, …
  • 15. OpenMinTeD framework & focus: OpenMinTeD sets out to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. … [Diagram labels: content/corpora, services/tools, annotated corpora]
  • 16. Our services: register and discover TDM services and tools; link to content hubs and share corpora; run a TDM job; store, document, publish and share results (annotated corpora); build your own service: combine components into a workflow and share.
  • 17. Key goals apart from interoperability: Recognise that the results of TDM, i.e., annotations, are valuable research data that should be preserved, shared and re-used. Scientific publications are data, and should abide by the FAIR principles of data.
  • 18. Who is OpenMinTeD for? PART IV
  • 19. End users as consumers. Domain-specific researchers & research communities: rather novice users who want to find end-to-end services that fill their needs off the shelf (>100,000). Application developers / RI data scientists: understand basic usage of NLP and TDM services, but not the details; they know how to connect components and which content they must work on to get the required results; they need to develop end-to-end applications (>10,000). Infrastructure operators: agnostic to the internal specifics of TDM, but they need to integrate and operate TDM services in daily workflows (<100).
  • 20. Content and services contributors. For content: publishers and repository managers (research libraries) (<1,000). For services: expert language-technology-oriented people, who use specific technologies and frameworks to develop and enhance their services (<500); non-NLP-expert developers, creating TDM modules based on off-the-shelf libraries and tools (e.g. Python, Jupyter), who are not familiar with NLP frameworks and terminology but are eager to publish their small services (<5,000).
  • 22. Priorities: Interoperability (at which level of component? A technical issue that requires consensus building); Legal issues (legal issues at all levels: IPRs, liabilities. Go Open!); Policies (policies & rules of engagement for content/service providers and consumers; EOSC compatible).
  • 23. Beta release in August 2017. Real time: building corpora (OpenAIRE, CORE); uploading own corpora; registering a service; running a service; viewing annotations; storing results in Zenodo.
  • 24. Sustainability? What is the role of the libraries of (OA) publishers in TDM? How does Open Access translate to Open Science? How can they help researchers achieve the best in their knowledge extraction endeavour? What is the role of e-Infrastructures like OpenAIRE and OpenMinTeD?
  • 25. Join us at the Open Science Fair, Athens, 6-8 September 2017: www.opensciencefair.eu
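
The interoperability idea on slides 12 and 16 (a common way of presenting annotated results, and combining components into a workflow) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Annotation` and `Document` classes, the `regex_tagger` and `workflow` helpers, and the two toy taggers are hypothetical and are not part of OpenMinTeD or any of its APIs. The point is only that components which agree on one stand-off annotation format can be chained freely.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: character offsets plus a label."""
    begin: int
    end: int
    label: str
    source: str  # name of the component that produced the annotation


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# A component is any callable that takes a Document and returns a Document.
Component = Callable[[Document], Document]


def regex_tagger(pattern: str, label: str, source: str) -> Component:
    """Build a toy component that annotates every match of `pattern`."""
    compiled = re.compile(pattern)

    def run(doc: Document) -> Document:
        for match in compiled.finditer(doc.text):
            doc.annotations.append(
                Annotation(match.start(), match.end(), label, source)
            )
        return doc

    return run


def workflow(*components: Component) -> Component:
    """Chain components; this works because all share one annotation format."""

    def run(doc: Document) -> Document:
        for component in components:
            doc = component(doc)
        return doc

    return run


# Two illustrative components (assumed, not real services).
protein_tagger = regex_tagger(r"\bp53\b", "Protein", "toy-protein-tagger")
year_tagger = regex_tagger(r"\b(?:19|20)\d{2}\b", "Year", "toy-year-tagger")

pipeline = workflow(protein_tagger, year_tagger)
doc = pipeline(Document("By 2014, about 70,000 papers had been published on p53."))

for a in doc.annotations:
    print(a.label, repr(doc.text[a.begin:a.end]), a.begin, a.end, a.source)
```

Because each annotation records which component produced it, the annotated document carries basic provenance, which is one prerequisite for treating annotations as shareable research data in the sense of slide 17.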