OpenMinTeD, LIBER conference 2017

Overview
LIBER Conference
5 July 2017, Patras, Greece
Natalia Manola
Athena Research and Innovation Centre
The problem
PART I
A few sobering facts on content production
LIBER conference - PATRAS, 5 July 2017
● 1,8 billion websites & 3,46 billion internet users, on 25 September 2016.
● 24 million wireless sensors and actuators worldwide (553% up, between 2011 and
2016)
● 16 zettabytes of useful data (16 Trillion GB) by 2020
● YouTube claims to upload 24 hours of video every minute, making the site a
hugely significant data aggregator.
● Every second, on average, around 6,000 tweets are tweeted on Twitter, which
corresponds to over 350,000 tweets sent per minute, >500 million tweets per day
and around 200 billion tweets per year.
● 74,200,000 pages existed on Facebook, with 7 million apps and websites
integrated with Facebook on 30/5/2016
3
… And some facts on scientific literature
LIBER conference - PATRAS, 5 July 2017
The global research community generates ~2.5 million new scholarly
articles per year (English only)
The STM report (2015)
… some 90% of papers … are never cited (82% in the humanities)
… of those articles that are cited, only 20 percent have actually been read
… 50% of papers are never read by anyone other than their authors,
referees and journal editors
Lokman I. Meho, The rise and rise of citation analysis, 2007
… one paper published every 12seconds
… 70,000 papers published on a single protein, the tumor suppressor p53
Spangler et al, Automated Hypothesis Generation based on Mining Scientific
Literature, 2014
4
How can we make sense of this data?
5
PART II
TDM - AN Emerging solution
Machine reading
process textual sources, organise and classify in various dimensions, extract
main (indexical) information items,
… and “understanding”
identify and extract entities and relations between entities, facilitate the
transformation of unstructured textual sources into structured data
… and predicting
enable the multidimensional analysis of structured data to extract meaningful
insights and improve the ability to predict
LIBER conference - PATRAS, 5 July 2017
6
However, …
Multitude of solutions catering for different
Text Types
Newswire
Scientific Literature
Tweets/blogs
Patents
Clinical/medical records
Textbooks, monographs
Online forums
….
Languages
English
French
German
Spanish
Portuguese
Italian
Polish
….
Tasks
Translation
Information Extraction
Semantic Search
Question Answering
Sentiment Analysis
Summarization
Knowledge Discovery
….
Domains
Finance/Business
Health
Biology
Social Sciences
Humanities
….
Creating a fragmented landscape
LIBER conference - PATRAS, 5 July 2017
7
A complex and fragmented Landscape
LIBER conference - PATRAS, 5 July 2017
Text Mining Researchers
Computing Infrastructures
Content Providers
End Users
8
The components
9
PART III
1. Share content
• Document literature content
• Share in a meaningful way: what does Open Access really mean?
IPR and licensing
• Study IPR restrictions for reuse of sources as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
Challenges
• Rights statement vs. Open licenses (for repositories)
• No access to full text. We live in a metadata world
• No standard protocols, formats and APIs for access and retrieval
• No capacity to handle extra traffic
LIBER conference - PATRAS, 5 July 2017
10
Proposed solution : Make TDM enabled hubs
LIBER conference - PATRAS, 5 July 2017
11
Literature
Repositories
OA Journals
Data
Repositories
Aggregators
Archives
Metadata
Full text
Data
OpenAIRE
CORE
PMC Europe
…
Guidelines APIs
TDM
Research
networks
WIkiPedia/
Media/Research
…
Open Data
Open Protocols
OpenAIRE +
OpenMinTeD
2. Share TDM Services
• Document language processing/text mining services and workflows in a
meaningful way for domain discipline researchers
• Document language/knowledge resources, data categories taxonomies,
provenance information
Interoperable services
• Common way of presenting annotated results
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
IPR and licensing
• Translate the legal & policy aspects into specifications for lawful user-to-
service and service-to-service interactions
Challenges
• Bring text miners close to the researcher problems and needs
• Semantic interoperability (not just technical)
LIBER conference - PATRAS, 5 July 2017
12
3. Use/Share computing resources
• Capacities and capabilities
Interoperable services at the lower level
• Common way of deploying operations/jobs
• Authentication and Authorisation services: Single Sign On (SSO)
• Accounting
Challenges
• Legal, organisational, …
LIBER conference - PATRAS, 5 July 2017
13
The OpenMinted platform
14
PART III
OpenMinted framework & focus
LIBER conference - PATRAS, 5 July 2017
15
OpenMinted sets out to create an open,
service-oriented e-Infrastructure for Text
and Data Mining (TDM) of scientific and
scholarly content.
…
Content/Corpora Services/tools Annotated corpora
Register and Discover TDM Services and tools
Link to Content hubs - Share corpora
Run a TDM job
Store, document, Publish and Share results (ANNOTATED CORPORA)
Our Services
16
LIBER conference - PATRAS, 5 July 2017
Build your own service – Combine components into a
Workflow and SHARE
key goalsapart from interoperability
Recognise that the results
of TDM, i.e., annotations, are
valuable research data that
should be preserved, shared,
re-used.
Scientific publications are
data, and should abide to the
FAIR principles of data.
LIBER conference - PATRAS, 5 July 2017
who is openminted for
PART IV
End users as consumers
Domain specific researchers & research communities
Rather novice users and who want to find services (end to end) that fill their
needs in an off the shelf type of situation. (>100.000)
Application developers / RI data scientists
Understand basic usage of NLP and TDM services, but not the details. They
know how to connect components, which content they must work on to get the
required results. They need to develop end to end applications. (>10.000)
Infrastructure operators
agnostic to the internal specifics of TDM, but they need to integrate and
operate TDM services into daily workflows. (<100)
LIBER conference - PATRAS, 5 July 2017
content and services contributors
FOR Content
Publishers and repository managers (research libraries). (<1000)
For services
Expert language technology oriented people, who are using specific
technologies and frameworks to develop and enhance their services. (< 500)
Non NLP expert developers, creating TDM modules based on off the shelf
libraries and tools (e.g. Python, Jupyter). Not familiar with NLP frameworks
and terminology but are eager to publish their small services. (<5.000)
LIBER conference - PATRAS, 5 July 2017
challenges
PART v
LIBER conference - PATRAS, 5 July 2017
interoperability
At which level of
component
A technical issue that
requires consensus
building
Legal issues at all
levels (IPRs,
liabilities)
Go Open!
Policies & rules of
engagement for
content /service
providers and
consumers
EOSC compatible
priorities
policies Legal issues
2 31
LIBER conference - PATRAS, 5 July 2017
Beta release in AUGUST 2017
REAL TIME Building
corpora:
OpenAIRE
CORE
Uploading OWN
corpora
Registering a
service
Running a service
Viewing annotations
Storing results in
zenodo
sustainability?
What is the role of the libraries
of (OA) publishers in TDM?
How does Open Access
translate to Open Science?
How can they help researchers
achieve the best in their
knowledge extraction
endeavour?
What is the role of e-
Infrastructures like OpenAIRE
and OpenMinTeD?
LIBER conference - PATRAS, 5 July 2017
Join us in the Openscience fair
athens, sept 6-8, 2017
www.opensciencefair.eu
THANK YOU!
Questions?
natalia manola
natalia@di.uoa.gr
1 of 26

Recommended

Supporting the uptake of TDM by
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDMopenminted_eu
521 views23 slides
Connecting the dots - e-Infra services for open science by
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceOpenAIRE
1.2K views20 slides
What have we learned from talking with the TDM community? by
What have we learned from talking with the TDM community?What have we learned from talking with the TDM community?
What have we learned from talking with the TDM community?FutureTDM
432 views12 slides
So where are we now? The TDM landscape by
So where are we now? The TDM landscapeSo where are we now? The TDM landscape
So where are we now? The TDM landscapeFutureTDM
333 views8 slides
Rajendra Akerkar - LeMO Project by
Rajendra Akerkar - LeMO ProjectRajendra Akerkar - LeMO Project
Rajendra Akerkar - LeMO ProjectBigData_Europe
1.1K views12 slides
The legal factors by
The legal factorsThe legal factors
The legal factorsFutureTDM
436 views4 slides

More Related Content

What's hot

An overview of piv initiatives(papaloi,gouscos)final21.5 by
An overview of piv initiatives(papaloi,gouscos)final21.5An overview of piv initiatives(papaloi,gouscos)final21.5
An overview of piv initiatives(papaloi,gouscos)final21.5Danube University Krems, Centre for E-Governance
1.4K views35 slides
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model by
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity modelOpenAIRE
450 views24 slides
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European... by
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...OpenAIRE
1.8K views22 slides
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie... by
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...BigData_Europe
519 views14 slides
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol... by
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...European Data Forum
5.2K views19 slides

What's hot(20)

20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model by OpenAIRE
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity model
OpenAIRE450 views
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European... by OpenAIRE
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...
Horizon 2020 Open Research Data Pilot, Jean-Claude Burgelman, DG RTD European...
OpenAIRE1.8K views
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie... by BigData_Europe
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
BigData_Europe519 views
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol... by European Data Forum
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
European Data Forum5.2K views
20190527_Helena Cousijn _ FREYA by OpenAIRE
20190527_Helena Cousijn _ FREYA20190527_Helena Cousijn _ FREYA
20190527_Helena Cousijn _ FREYA
OpenAIRE213 views
OpenAIRE implementing open science by Jisc
OpenAIRE implementing open scienceOpenAIRE implementing open science
OpenAIRE implementing open science
Jisc289 views
Open access to publications in Horizon 2020 by Jisc
Open access to publications in Horizon 2020Open access to publications in Horizon 2020
Open access to publications in Horizon 2020
Jisc953 views
European open science cloud by Jisc
European open science cloudEuropean open science cloud
European open science cloud
Jisc309 views
OpenAIRE-connect: Services for open science by Jisc
OpenAIRE-connect: Services for open scienceOpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open science
Jisc294 views
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ... by European Data Forum
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
European Data Forum2.6K views
20190527_Diego Chialva_ Research evaluation: the unseized opportunities ... by OpenAIRE
20190527_Diego Chialva_ Research evaluation: the unseized opportunities ...20190527_Diego Chialva_ Research evaluation: the unseized opportunities ...
20190527_Diego Chialva_ Research evaluation: the unseized opportunities ...
OpenAIRE461 views
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A... by European Data Forum
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
European Data Forum2.6K views
OpenAIRE-RDM@healthdata by OpenAIRE
OpenAIRE-RDM@healthdataOpenAIRE-RDM@healthdata
OpenAIRE-RDM@healthdata
OpenAIRE767 views
FAIR data principles and data management plans - 31 Oct 2017 by ARDC
FAIR data principles and data management plans - 31 Oct 2017FAIR data principles and data management plans - 31 Oct 2017
FAIR data principles and data management plans - 31 Oct 2017
ARDC471 views
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A... by BigData_Europe
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
BigData_Europe423 views

Similar to OpenMinTeD, LIBER conference 2017

Toward FAIR Semantic Resources by
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic ResourcesEUDAT
52 views30 slides
OpenMinteD Project - building a TDM infrastructure by
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureFutureTDM
242 views17 slides
Text Mining: the next data frontier. Beyond Open Access by
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Accessopenminted_eu
417 views27 slides
Linked Open Data about Springer Nature conferences. The story so far by
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farAliaksandr Birukou
312 views43 slides
Smart cities no ai without ia by
Smart cities   no ai without iaSmart cities   no ai without ia
Smart cities no ai without iaFredric Landqvist
1.2K views68 slides
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta... by
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE
838 views41 slides

Similar to OpenMinTeD, LIBER conference 2017(20)

Toward FAIR Semantic Resources by EUDAT
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic Resources
EUDAT52 views
OpenMinteD Project - building a TDM infrastructure by FutureTDM
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
FutureTDM242 views
Text Mining: the next data frontier. Beyond Open Access by openminted_eu
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
openminted_eu417 views
Linked Open Data about Springer Nature conferences. The story so far by Aliaksandr Birukou
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
Aliaksandr Birukou312 views
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta... by OpenAIRE
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE838 views
NordForsk Open Access Reykjavik 14-15/8-2014:Rda by NordForsk
NordForsk Open Access Reykjavik 14-15/8-2014:RdaNordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk Open Access Reykjavik 14-15/8-2014:Rda
NordForsk281 views
CINECA webinar slides: FAIR software tools by CINECAProject
CINECA webinar slides: FAIR software toolsCINECA webinar slides: FAIR software tools
CINECA webinar slides: FAIR software tools
CINECAProject43 views
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera... by Georg Rehm
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
Georg Rehm297 views
Lider Reference Model ld4lt session March, 3rd, 2015 by Sebastian Hellmann
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015
Sebastian Hellmann1.2K views
TING.concept ELAG conference presentation 2010-06-09 by hernvall
TING.concept ELAG conference presentation  2010-06-09 TING.concept ELAG conference presentation  2010-06-09
TING.concept ELAG conference presentation 2010-06-09
hernvall521 views
Language Resources for Multilingual Europe by Georg Rehm
Language Resources for Multilingual EuropeLanguage Resources for Multilingual Europe
Language Resources for Multilingual Europe
Georg Rehm1.6K views
Session 4.2 unleash the triple: leveraging a corporate discovery interface.... by semanticsconference
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Open government data portals: from publishing to use and impact by Elena Simperl
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
Elena Simperl173 views
FAIR workshop Vienna by Sarah Jones
FAIR workshop ViennaFAIR workshop Vienna
FAIR workshop Vienna
Sarah Jones329 views
OpenMinTeD: Making Sense of Large Volumes of Data by openminted_eu
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
openminted_eu481 views

More from openminted_eu

Resource sync overview and real-world use cases for discovery, harvesting, an... by
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
218 views67 slides
Seamless access to the world's open access research papers via resources sync by
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncopenminted_eu
434 views19 slides
Webinar slides: Interoperability between resources involved in TDM at the lev... by
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...openminted_eu
238 views33 slides
Legal issues Text and Data Mining by
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Miningopenminted_eu
1.6K views10 slides
How can repositories support the text mining of their content and why? by
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?openminted_eu
1.1K views20 slides
Tentative steps in mining UK theses by
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK thesesopenminted_eu
598 views20 slides

More from openminted_eu(15)

Resource sync overview and real-world use cases for discovery, harvesting, an... by openminted_eu
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
openminted_eu218 views
Seamless access to the world's open access research papers via resources sync by openminted_eu
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
openminted_eu434 views
Webinar slides: Interoperability between resources involved in TDM at the lev... by openminted_eu
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
openminted_eu238 views
Legal issues Text and Data Mining by openminted_eu
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
openminted_eu1.6K views
How can repositories support the text mining of their content and why? by openminted_eu
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
openminted_eu1.1K views
Tentative steps in mining UK theses by openminted_eu
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
openminted_eu598 views
OpenMinTeD - Repositories in the centre of new scientific knowledge by openminted_eu
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
openminted_eu573 views
Jisc Text Mining Capabilities by openminted_eu
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
openminted_eu356 views
OpenMinted: It's Uses and Benefits for the Social Sciences by openminted_eu
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
openminted_eu1.4K views
OpenMinTeD - Une infrastructure text-mining au service des scientifiques by openminted_eu
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
openminted_eu638 views
Infrastructure crossroads... and the way we walked them in DKPro by openminted_eu
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
openminted_eu505 views
Experiences of Text Mining; the National Library of Austria perspective by openminted_eu
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
openminted_eu554 views
Text and Data Mining at the Royal Library in the Netherlands by openminted_eu
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
openminted_eu679 views
The Breakdown: What is OpenMinTeD? by openminted_eu
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
openminted_eu467 views

Recently uploaded

application of genetic engineering 2.pptx by
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptxSankSurezz
12 views12 slides
ELECTRON TRANSPORT CHAIN by
ELECTRON TRANSPORT CHAINELECTRON TRANSPORT CHAIN
ELECTRON TRANSPORT CHAINDEEKSHA RANI
9 views16 slides
Ecology by
Ecology Ecology
Ecology Abhijith Raj.R
10 views10 slides
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... by
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...ILRI
5 views1 slide
Metatheoretical Panda-Samaneh Borji.pdf by
Metatheoretical Panda-Samaneh Borji.pdfMetatheoretical Panda-Samaneh Borji.pdf
Metatheoretical Panda-Samaneh Borji.pdfsamanehborji
16 views29 slides
Disinfectants & Antiseptic by
Disinfectants & AntisepticDisinfectants & Antiseptic
Disinfectants & AntisepticSanket P Shinde
52 views36 slides

Recently uploaded(20)

application of genetic engineering 2.pptx by SankSurezz
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptx
SankSurezz12 views
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... by ILRI
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
ILRI5 views
Metatheoretical Panda-Samaneh Borji.pdf by samanehborji
Metatheoretical Panda-Samaneh Borji.pdfMetatheoretical Panda-Samaneh Borji.pdf
Metatheoretical Panda-Samaneh Borji.pdf
samanehborji16 views
Open Access Publishing in Astrophysics by Peter Coles
Open Access Publishing in AstrophysicsOpen Access Publishing in Astrophysics
Open Access Publishing in Astrophysics
Peter Coles1K views
PRINCIPLES-OF ASSESSMENT by rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro12 views
별헤는 사람들 2023년 12월호 전명원 교수 자료 by sciencepeople
별헤는 사람들 2023년 12월호 전명원 교수 자료별헤는 사람들 2023년 12월호 전명원 교수 자료
별헤는 사람들 2023년 12월호 전명원 교수 자료
sciencepeople50 views
CSF -SHEEBA.D presentation.pptx by SheebaD7
CSF -SHEEBA.D presentation.pptxCSF -SHEEBA.D presentation.pptx
CSF -SHEEBA.D presentation.pptx
SheebaD714 views
Light Pollution for LVIS students by CWBarthlmew
Light Pollution for LVIS studentsLight Pollution for LVIS students
Light Pollution for LVIS students
CWBarthlmew7 views
Distinct distributions of elliptical and disk galaxies across the Local Super... by Sérgio Sacani
Distinct distributions of elliptical and disk galaxies across the Local Super...Distinct distributions of elliptical and disk galaxies across the Local Super...
Distinct distributions of elliptical and disk galaxies across the Local Super...
Sérgio Sacani33 views
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance... by InsideScientific
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
A Ready-to-Analyze High-Plex Spatial Signature Development Workflow for Cance...
InsideScientific67 views
Experimental animal Guinea pigs.pptx by Mansee Arya
Experimental animal Guinea pigs.pptxExperimental animal Guinea pigs.pptx
Experimental animal Guinea pigs.pptx
Mansee Arya28 views
Conventional and non-conventional methods for improvement of cucurbits.pptx by gandhi976
Conventional and non-conventional methods for improvement of cucurbits.pptxConventional and non-conventional methods for improvement of cucurbits.pptx
Conventional and non-conventional methods for improvement of cucurbits.pptx
gandhi97620 views
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl... by GIFT KIISI NKIN
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
GIFT KIISI NKIN28 views

OpenMinTeD, LIBER conference 2017

  • 1. Overview LIBER Conference 5 July 2017, Patras, Greece Natalia Manola Athena Research and Innovation Centre
  • 3. A few sobering facts on content production LIBER conference - PATRAS, 5 July 2017 ● 1,8 billion websites & 3,46 billion internet users, on 25 September 2016. ● 24 million wireless sensors and actuators worldwide (553% up, between 2011 and 2016) ● 16 zettabytes of useful data (16 Trillion GB) by 2020 ● YouTube claims to upload 24 hours of video every minute, making the site a hugely significant data aggregator. ● Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, >500 million tweets per day and around 200 billion tweets per year. ● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook on 30/5/2016 3
  • 4. … And some facts on scientific literature LIBER conference - PATRAS, 5 July 2017 The global research community generates ~2.5 million new scholarly articles per year (English only) The STM report (2015) … some 90% of papers … are never cited (82% in the humanities) … of those articles that are cited, only 20 percent have actually been read … 50% of papers are never read by anyone other than their authors, referees and journal editors Lokman I. Meho, The rise and rise of citation analysis, 2007 … one paper published every 12seconds … 70,000 papers published on a single protein, the tumor suppressor p53 Spangler et al, Automated Hypothesis Generation based on Mining Scientific Literature, 2014 4
  • 5. How can we make sense of this data? 5 PART II
  • 6. TDM - AN Emerging solution Machine reading process textual sources, organise and classify in various dimensions, extract main (indexical) information items, … and “understanding” identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data … and predicting enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict LIBER conference - PATRAS, 5 July 2017 6
  • 7. However, … Multitude of solutions catering for different Text Types Newswire Scientific Literature Tweets/blogs Patents Clinical/medical records Textbooks, monographs Online forums …. Languages English French German Spanish Portuguese Italian Polish …. Tasks Translation Information Extraction Semantic Search Question Answering Sentiment Analysis Summarization Knowledge Discovery …. Domains Finance/Business Health Biology Social Sciences Humanities …. Creating a fragmented landscape LIBER conference - PATRAS, 5 July 2017 7
  • 8. A complex and fragmented Landscape LIBER conference - PATRAS, 5 July 2017 Text Mining Researchers Computing Infrastructures Content Providers End Users 8
  • 10. 1. Share content • Document literature content • Share in a meaningful way: what does Open Access really mean? IPR and licensing • Study IPR restrictions for reuse of sources as well as possible exceptions • Promote clarity and standardisation of legal rights and obligations Challenges • Rights statement vs. Open licenses (for repositories) • No access to full text. We live in a metadata world • No standard protocols, formats and APIs for access and retrieval • No capacity to handle extra traffic LIBER conference - PATRAS, 5 July 2017 10
  • 11. Proposed solution : Make TDM enabled hubs LIBER conference - PATRAS, 5 July 2017 11 Literature Repositories OA Journals Data Repositories Aggregators Archives Metadata Full text Data OpenAIRE CORE PMC Europe … Guidelines APIs TDM Research networks WIkiPedia/ Media/Research … Open Data Open Protocols OpenAIRE + OpenMinTeD
  • 12. 2. Share TDM Services • Document language processing/text mining services and workflows in a meaningful way for domain discipline researchers • Document language/knowledge resources, data categories taxonomies, provenance information Interoperable services • Common way of presenting annotated results • Combine services into workflows • Combine content and language resources with services and workflows • Combine automatic and manual/crowdsourcing annotation services IPR and licensing • Translate the legal & policy aspects into specifications for lawful user-to- service and service-to-service interactions Challenges • Bring text miners close to the researcher problems and needs • Semantic interoperability (not just technical) LIBER conference - PATRAS, 5 July 2017 12
  • 13. 3. Use/Share computing resources • Capacities and capabilities Interoperable services at the lower level • Common way of deploying operations/jobs • Authentication and Authorisation services: Single Sign On (SSO) • Accounting Challenges • Legal, organisational, … LIBER conference - PATRAS, 5 July 2017 13
  • 15. OpenMinted framework & focus LIBER conference - PATRAS, 5 July 2017 15 OpenMinted sets out to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. … Content/Corpora Services/tools Annotated corpora
  • 16. Register and Discover TDM Services and tools Link to Content hubs - Share corpora Run a TDM job Store, document, Publish and Share results (ANNOTATED CORPORA) Our Services 16 LIBER conference - PATRAS, 5 July 2017 Build your own service – Combine components into a Workflow and SHARE
  • 17. key goalsapart from interoperability Recognise that the results of TDM, i.e., annotations, are valuable research data that should be preserved, shared, re-used. Scientific publications are data, and should abide to the FAIR principles of data. LIBER conference - PATRAS, 5 July 2017
  • 18. who is openminted for PART IV
  • 19. End users as consumers Domain specific researchers & research communities Rather novice users and who want to find services (end to end) that fill their needs in an off the shelf type of situation. (>100.000) Application developers / RI data scientists Understand basic usage of NLP and TDM services, but not the details. They know how to connect components, which content they must work on to get the required results. They need to develop end to end applications. (>10.000) Infrastructure operators agnostic to the internal specifics of TDM, but they need to integrate and operate TDM services into daily workflows. (<100) LIBER conference - PATRAS, 5 July 2017
  • 20. content and services contributors FOR Content Publishers and repository managers (research libraries). (<1000) For services Expert language technology oriented people, who are using specific technologies and frameworks to develop and enhance their services. (< 500) Non NLP expert developers, creating TDM modules based on off the shelf libraries and tools (e.g. Python, Jupyter). Not familiar with NLP frameworks and terminology but are eager to publish their small services. (<5.000) LIBER conference - PATRAS, 5 July 2017
  • 22. LIBER conference - PATRAS, 5 July 2017 interoperability At which level of component A technical issue that requires consensus building Legal issues at all levels (IPRs, liabilities) Go Open! Policies & rules of engagement for content /service providers and consumers EOSC compatible priorities policies Legal issues 2 31
  • 23. LIBER conference - PATRAS, 5 July 2017 Beta release in AUGUST 2017 REAL TIME Building corpora: OpenAIRE CORE Uploading OWN corpora Registering a service Running a service Viewing annotations Storing results in zenodo
  • 24. sustainability? What is the role of the libraries of (OA) publishers in TDM? How does Open Access translate to Open Science? How can they help researchers achieve the best in their knowledge extraction endeavour? What is the role of e- Infrastructures like OpenAIRE and OpenMinTeD? LIBER conference - PATRAS, 5 July 2017
  • 25. Join us in the Openscience fair athens, sept 6-8, 2017 www.opensciencefair.eu