Text Mining: the next data frontier
Beyond Open Access
#openminted_eu
Natalia Manola
Athena Research & Innovation Centre
OpenCon Satellite Berlin, 25 Nov 2016
A few sobering facts on content production
● 1.8 billion websites and 3.46 billion internet users as of 25 September 2016.
● 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016).
● 16 zettabytes of useful data (16 trillion GB) by 2020.
● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.
● Every second, on average, around 6,000 tweets are posted on Twitter, which corresponds to over 350,000 tweets per minute, more than 500 million tweets per day and around 200 billion tweets per year.
● As of 30 May 2016, 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with it.
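As a quick sanity check of the Twitter figures above, the per-minute, per-day and per-year totals can be derived from the quoted average of about 6,000 tweets per second. This is only an illustrative calculation; the constant name is made up for the example and a 365-day year is assumed.

```python
# Average tweets per second, as quoted on the slide (an approximation).
TWEETS_PER_SECOND = 6_000

per_minute = TWEETS_PER_SECOND * 60   # 60 seconds in a minute
per_day = per_minute * 60 * 24        # 1,440 minutes in a day
per_year = per_day * 365              # assumes a 365-day year

print(f"{per_minute:,} tweets per minute")  # 360,000
print(f"{per_day:,} tweets per day")        # 518,400,000
print(f"{per_year:,} tweets per year")      # 189,216,000,000
```

The results (about 360,000 per minute, 518 million per day and 189 billion per year) are consistent with the rounded figures on the slide.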
… And some facts on scientific literature
The global research community generates ~2.5 million new scholarly
articles per year (English only)
The STM report (2015)
… some 90% of papers … are never cited (82% in the humanities)
… of those articles that are cited, only 20 percent have actually been read
… 50% of papers are never read by anyone other than their authors,
referees and journal editors
Lokman I. Meho, The rise and rise of citation analysis, 2007
… one paper published every 12 seconds
… 70,000 papers published on a single protein, the tumor suppressor p53
Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014
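The "one paper every 12 seconds" figure can be cross-checked against the ~2.5 million articles per year quoted earlier on this slide. The two numbers come from different sources, so the sketch below only shows that they are of the same order; the variable names are illustrative and a 365-day year is assumed.

```python
# Rough cross-check of two figures quoted on the slide.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 (365-day year assumed)
ARTICLES_PER_YEAR = 2_500_000           # ~2.5 million new articles per year

seconds_per_article = SECONDS_PER_YEAR / ARTICLES_PER_YEAR
print(f"One article roughly every {seconds_per_article:.1f} seconds")  # 12.6
```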
How can we make sense
of this data?
Emerging solutions
Machine reading
process textual sources, organise and classify in various dimensions, extract
main (indexical) information items,
… and “understanding”
identify and extract entities and relations between entities, facilitate the
transformation of unstructured textual sources into structured data
… and predicting
enable the multidimensional analysis of structured data to extract meaningful
insights and improve the ability to predict
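To make the "understanding" step more concrete, here is a minimal sketch of turning an unstructured sentence into structured entity and relation records. It is a toy, not a description of any real TDM system: the lexicon, the relation verbs, the `Relation` class and the function names are all hypothetical, and the matching is a naive dictionary lookup where production tools would use trained models and curated knowledge resources.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon and relation verbs, for illustration only.
ENTITY_LEXICON = {"p53": "PROTEIN", "MDM2": "PROTEIN"}
RELATION_VERBS = {"inhibits", "activates"}


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def extract_entities(sentence):
    """Return (token, type) pairs for tokens found in the lexicon."""
    return [(t, ENTITY_LEXICON[t]) for t in tokenize(sentence) if t in ENTITY_LEXICON]


def extract_relations(sentence):
    """Find 'ENTITY verb ENTITY' patterns among adjacent tokens."""
    tokens = tokenize(sentence)
    relations = []
    for i in range(1, len(tokens) - 1):
        left, verb, right = tokens[i - 1], tokens[i], tokens[i + 1]
        if verb in RELATION_VERBS and left in ENTITY_LEXICON and right in ENTITY_LEXICON:
            relations.append(Relation(left, verb, right))
    return relations


sentence = "MDM2 inhibits p53 in many tumours."
entities = extract_entities(sentence)
relations = extract_relations(sentence)
print(entities)
print(relations)
```

The structured records produced this way are what the "predicting" step would then analyse across many documents.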
However, …
Multitude of solutions catering for different
Text Types
Newswire
Scientific Literature
Tweets/blogs
Patents
Clinical/medical records
Textbooks, monographs
Online forums
….
Languages
English
French
German
Spanish
Portuguese
Italian
Polish
….
Tasks
Translation
Information Extraction
Semantic Search
Question Answering
Sentiment Analysis
Summarization
Knowledge Discovery
….
Domains
Finance/Business
Health
Biology
Social Sciences
Humanities
….
Creating a fragmented landscape
A glimpse of the TDM landscape
Resource: FutureTDM project (www.futuretdm.eu)
What can we do?
1. Share content
• Document literature content
• Share in a meaningful way: what does Open Access really mean?
IPR and licensing
• Study IPR restrictions for reuse of sources as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
Challenges
• Rights statement vs. Open licenses (for repositories)
• No access to full text. We live in a metadata world
• No standard protocols, formats and APIs for access and retrieval
• No capacity to handle extra traffic
Proposed solution: create TDM-enabled hubs
[Diagram labels]
Literature Repositories, OA Journals, Data Repositories, Aggregators, Archives
Metadata, Full text, Data
OpenAIRE, CORE, PMC Europe, …
Guidelines, APIs, TDM
Research networks, WikiPedia/Media/Research, …
2. Share TDM Services
• Document language processing/text mining services and workflows in a
meaningful way for domain discipline researchers
• Document language/knowledge resources, data categories, taxonomies,
provenance information
Interoperable services
• Common way of presenting annotated results
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
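The interoperability points above can be sketched in a few lines: if every service reads and writes the same annotation structure, services can be chained into a workflow without bespoke glue code. The sketch below is an assumption-laden illustration, not the OpenMinTeD design; `Annotation`, `Document`, `sentence_splitter`, `keyword_tagger` and `run_workflow` are hypothetical names invented for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """A common, stand-off way of presenting annotated results."""
    start: int    # character offset where the span begins
    end: int      # character offset just past the span
    label: str    # annotation type, e.g. "Sentence"
    source: str   # provenance: which component produced it


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Every component takes a Document and returns a Document.
Component = Callable[[Document], Document]


def sentence_splitter(doc: Document) -> Document:
    """Naive splitter: a sentence ends at each full stop."""
    start = 0
    for i, ch in enumerate(doc.text):
        if ch == ".":
            doc.annotations.append(Annotation(start, i + 1, "Sentence", "sentence_splitter"))
            start = i + 1
            # Skip whitespace so the next sentence starts on a character.
            while start < len(doc.text) and doc.text[start].isspace():
                start += 1
    return doc


def keyword_tagger(doc: Document) -> Document:
    """Tag every occurrence of a fixed keyword (case-insensitive)."""
    for m in re.finditer(r"text mining", doc.text, flags=re.IGNORECASE):
        doc.annotations.append(Annotation(m.start(), m.end(), "Keyword", "keyword_tagger"))
    return doc


def run_workflow(doc: Document, components: List[Component]) -> Document:
    """Apply the components in order; each adds to the shared annotations."""
    for component in components:
        doc = component(doc)
    return doc


doc = run_workflow(
    Document("Text mining scales reading. Researchers share text mining workflows."),
    [sentence_splitter, keyword_tagger],
)
for a in doc.annotations:
    print(a.label, a.source, repr(doc.text[a.start:a.end]))
```

Because each annotation records its source, results from automatic components and from manual or crowdsourced annotators could sit side by side in the same document.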
IPR and licensing
• Translate the legal & policy aspects into specifications for lawful
user-to-service and service-to-service interactions
Challenges
• Bring text miners closer to researchers' problems and needs
• Semantic interoperability (not just technical)
OpenMinTeD
Establish an open and sustainable Text and Data Mining (TDM) platform and
infrastructure where researchers can discover, collaboratively create, share
and re-use knowledge from a wide range of text-based scientific and
scholarly-related sources.
A step from Open Access to Open Science
HIGH LEVEL ARCHITECTURE
Policies &
guidelines
Our Services
• Register and discover TDM services and tools
• Link to content hubs
• Run a TDM job and share results
• Get people's knowledge: crowdsourced annotation
• Build your own service: combine components into a workflow and share
Our Users
End users
• Researchers, database curators, Research Infrastructure operators
• Novice: use services to advance their science
• Advanced: combine TDM components into complex workflows
Content and service providers
- Publishers, libraries, scientific database centres, …
- TDM researchers
- SMEs
Community Driven
Scholarly Comm.
• Feature extraction
• Data citation
• Research analytics
Life Sciences
• Curation of databases and lexica in Chembolomics & neuroinformatics
Agriculture
• Extracting information from tables for food safety alerts
Social Sciences
• Data citation
From the very beginning…
Requirements, content, barriers, expected outcomes.
… to the very end
Create applications, validate and evaluate the results.
Examples of OpenAIRE TDM services
we want to share
@openaire_eu
Discover research in context
Research Trends and correlations
Text and data mining with
domain specific knowledge
Interactive visualization for
drill-down information
…
Trends in science
Correlations of funding programmes
Within a funder, or
across countries
What will it look like?
The OpenMinTeD registry
Browse TDM resources & tools/services
Register, document, share tools
Create your corpus, annotate, share
How does this all bind together?
[Diagram labels]
OpenAIRE, CORE, CrossRef, …
Repositories, (OA) Journals, other textual resources (e.g. medical records, PSI)
OpenMinTeD Registry, OpenMinTeD Workflows, TDM tools
Language resources: CLARIN, META-SHARE, …
How does Open Science help?
What’s next
Participate with your ideas
• Give us your feedback on our pending guidelines and APIs
• Provide us with your TDM requirements; we have the
experts to advise you
• Register your TDM services
• Test out the system when it goes live (spring)
Watch out for
• OpenAIRE’s datathons, tenders and challenges (60K in total)
• OpenMinTeD’s tenders and challenges (240K in total)
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
THANK YOU!
Natalia Manola
natalia@di.uoa.gr

More Related Content

What's hot

FutureTDM Roadmap
FutureTDM RoadmapFutureTDM Roadmap
FutureTDM Roadmap
FutureTDM
 
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE
 
Technologies and infrastructures supporting text and data analytics: Challeng...
Technologies and infrastructures supporting text and data analytics: Challeng...Technologies and infrastructures supporting text and data analytics: Challeng...
Technologies and infrastructures supporting text and data analytics: Challeng...
FutureTDM
 
Zenodo - The catch-all repository
Zenodo - The catch-all repository Zenodo - The catch-all repository
Zenodo - The catch-all repository
OpenAccessBelgium
 
LIBER on the path towards Open Science: Libraries as enablers
LIBER on the path towards Open Science:  Libraries as enablers LIBER on the path towards Open Science:  Libraries as enablers
LIBER on the path towards Open Science: Libraries as enablers
LIBER Europe
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
openminted_eu
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
BigData_Europe
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
BigData_Europe
 
Belgium webinar - openAIRE Research Graph
Belgium webinar - openAIRE Research GraphBelgium webinar - openAIRE Research Graph
Belgium webinar - openAIRE Research Graph
OpenAccessBelgium
 
OpenAIRE – The path from OpenAIRE to EOSC in Belgium
OpenAIRE – The path from OpenAIRE to EOSC in BelgiumOpenAIRE – The path from OpenAIRE to EOSC in Belgium
OpenAIRE – The path from OpenAIRE to EOSC in Belgium
OpenAccessBelgium
 
Open science policy in flanders
Open science policy in flanders Open science policy in flanders
Open science policy in flanders
OpenAccessBelgium
 
A general introduction to the Europeana Cloud project
A general introduction to the Europeana Cloud project A general introduction to the Europeana Cloud project
A general introduction to the Europeana Cloud project
TU Delft, Netherlands
 
OpenAIRE Broker Service and the Dashboard for Content Providers
OpenAIRE Broker Service and the Dashboard for Content ProvidersOpenAIRE Broker Service and the Dashboard for Content Providers
OpenAIRE Broker Service and the Dashboard for Content Providers
OpenAIRE
 
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
dri_ireland
 
OpenAIRE@info day_amsterdam_jan_2016
OpenAIRE@info day_amsterdam_jan_2016OpenAIRE@info day_amsterdam_jan_2016
OpenAIRE@info day_amsterdam_jan_2016
OpenAIRE
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
OpenAIRE
 
Towards a European Research Information Infrastructure
Towards a European Research Information InfrastructureTowards a European Research Information Infrastructure
Towards a European Research Information Infrastructure
OpenAIRE
 
Toward FAIR Semantic Resources
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic Resources
EUDAT
 
Open content opens up new avenues of research
Open content opens up new avenues of researchOpen content opens up new avenues of research
Open content opens up new avenues of research
Felix Lohmeier
 
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
Felix Lohmeier
 

What's hot (20)

FutureTDM Roadmap
FutureTDM RoadmapFutureTDM Roadmap
FutureTDM Roadmap
 
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
 
Technologies and infrastructures supporting text and data analytics: Challeng...
Technologies and infrastructures supporting text and data analytics: Challeng...Technologies and infrastructures supporting text and data analytics: Challeng...
Technologies and infrastructures supporting text and data analytics: Challeng...
 
Zenodo - The catch-all repository
Zenodo - The catch-all repository Zenodo - The catch-all repository
Zenodo - The catch-all repository
 
LIBER on the path towards Open Science: Libraries as enablers
LIBER on the path towards Open Science:  Libraries as enablers LIBER on the path towards Open Science:  Libraries as enablers
LIBER on the path towards Open Science: Libraries as enablers
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScie...
 
Belgium webinar - openAIRE Research Graph
Belgium webinar - openAIRE Research GraphBelgium webinar - openAIRE Research Graph
Belgium webinar - openAIRE Research Graph
 
OpenAIRE – The path from OpenAIRE to EOSC in Belgium
OpenAIRE – The path from OpenAIRE to EOSC in BelgiumOpenAIRE – The path from OpenAIRE to EOSC in Belgium
OpenAIRE – The path from OpenAIRE to EOSC in Belgium
 
Open science policy in flanders
Open science policy in flanders Open science policy in flanders
Open science policy in flanders
 
A general introduction to the Europeana Cloud project
A general introduction to the Europeana Cloud project A general introduction to the Europeana Cloud project
A general introduction to the Europeana Cloud project
 
OpenAIRE Broker Service and the Dashboard for Content Providers
OpenAIRE Broker Service and the Dashboard for Content ProvidersOpenAIRE Broker Service and the Dashboard for Content Providers
OpenAIRE Broker Service and the Dashboard for Content Providers
 
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection Rebecca Grant - DRI Training Series: 1. Organising Your Collection
Rebecca Grant - DRI Training Series: 1. Organising Your Collection
 
OpenAIRE@info day_amsterdam_jan_2016
OpenAIRE@info day_amsterdam_jan_2016OpenAIRE@info day_amsterdam_jan_2016
OpenAIRE@info day_amsterdam_jan_2016
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
 
Towards a European Research Information Infrastructure
Towards a European Research Information InfrastructureTowards a European Research Information Infrastructure
Towards a European Research Information Infrastructure
 
Toward FAIR Semantic Resources
Toward FAIR Semantic ResourcesToward FAIR Semantic Resources
Toward FAIR Semantic Resources
 
Open content opens up new avenues of research
Open content opens up new avenues of researchOpen content opens up new avenues of research
Open content opens up new avenues of research
 
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
(Big) bibliographic data @ ScaDS project meeting - 2015-06-12
 

Similar to Text Mining: the next data frontier. Beyond Open Access

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
openminted_eu
 
20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph 20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph
OpenAIRE
 
OpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
FutureTDM
 
OpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open scienceOpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open science
Jisc
 
Information search tools for engineers
Information search tools for engineersInformation search tools for engineers
Information search tools for engineers
Biblioteca del Campus Terrassa
 
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Pedro Príncipe
 
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
OpenAIRE
 
A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...
OpenAIRE
 
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
ScienceOpen
 
Connecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open science
OpenAIRE
 
Facilitate Research Communities Adoption of Open Science Publishing Principle...
Facilitate Research Communities Adoption of Open Science Publishing Principle...Facilitate Research Communities Adoption of Open Science Publishing Principle...
Facilitate Research Communities Adoption of Open Science Publishing Principle...
OpenAIRE
 
Does DH Scholarship Take Place in the Lab?
Does DH Scholarship Take Place in the Lab?Does DH Scholarship Take Place in the Lab?
Does DH Scholarship Take Place in the Lab?Shawn Day
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
Angelo Salatino
 
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
OpenAIRE
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
heila1
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
OpenAIRE
 
Online promises beyond the policies: what's under the skin
Online promises beyond the policies: what's under the skin Online promises beyond the policies: what's under the skin
Online promises beyond the policies: what's under the skin
Nicolaie Constantinescu
 
An open science introduction. Olinfer 18, La havana, Cuba 12-14 nov 2018
An open science introduction.  Olinfer 18, La havana, Cuba 12-14 nov 2018An open science introduction.  Olinfer 18, La havana, Cuba 12-14 nov 2018
An open science introduction. Olinfer 18, La havana, Cuba 12-14 nov 2018
pascal aventurier
 
Open Science as-a-Service for research communities: preliminary results and u...
Open Science as-a-Service for research communities: preliminary results and u...Open Science as-a-Service for research communities: preliminary results and u...
Open Science as-a-Service for research communities: preliminary results and u...
OpenAIRE
 

Similar to Text Mining: the next data frontier. Beyond Open Access (20)

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph 20191119_The OpenAIRE Research Graph
20191119_The OpenAIRE Research Graph
 
OpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
 
OpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open scienceOpenAIRE-connect: Services for open science
OpenAIRE-connect: Services for open science
 
Information search tools for engineers
Information search tools for engineersInformation search tools for engineers
Information search tools for engineers
 
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
 
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
Infraestructuras, recursos y servicios de OpenAIRE. OpenAIRE Workshop Spain, ...
 
A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...
 
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
ScienceOpen Presentation at COASP Paris 17-18 Sept 2014
 
Connecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open scienceConnecting the dots - e-Infra services for open science
Connecting the dots - e-Infra services for open science
 
OKFN_OpenDataMx
OKFN_OpenDataMxOKFN_OpenDataMx
OKFN_OpenDataMx
 
Facilitate Research Communities Adoption of Open Science Publishing Principle...
Facilitate Research Communities Adoption of Open Science Publishing Principle...Facilitate Research Communities Adoption of Open Science Publishing Principle...
Facilitate Research Communities Adoption of Open Science Publishing Principle...
 
Does DH Scholarship Take Place in the Lab?
Does DH Scholarship Take Place in the Lab?Does DH Scholarship Take Place in the Lab?
Does DH Scholarship Take Place in the Lab?
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
 
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
OpenAIRE support and training activities (flash talk at the #DI4R2017 - sessi...
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
Online promises beyond the policies: what's under the skin
Online promises beyond the policies: what's under the skin Online promises beyond the policies: what's under the skin
Online promises beyond the policies: what's under the skin
 
An open science introduction. Olinfer 18, La havana, Cuba 12-14 nov 2018
An open science introduction.  Olinfer 18, La havana, Cuba 12-14 nov 2018An open science introduction.  Olinfer 18, La havana, Cuba 12-14 nov 2018
An open science introduction. Olinfer 18, La havana, Cuba 12-14 nov 2018
 
Open Science as-a-Service for research communities: preliminary results and u...
Open Science as-a-Service for research communities: preliminary results and u...Open Science as-a-Service for research communities: preliminary results and u...
Open Science as-a-Service for research communities: preliminary results and u...
 

More from openminted_eu

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
openminted_eu
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
openminted_eu
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
openminted_eu
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
openminted_eu
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
openminted_eu
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
openminted_eu
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
openminted_eu
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
openminted_eu
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
openminted_eu
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
openminted_eu
 

More from openminted_eu (10)

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 



Text Mining: the next data frontier. Beyond Open Access

  • 1. Text Mining: the next data frontier, beyond Open Access. Natalia Manola, Athena Research & Innovation Centre. OpenCon Satellite Berlin, 25 Nov 2016. #openminted_eu
  • 2. A few sobering facts on content production
      ● 1.8 billion websites and 3.46 billion internet users as of 25 September 2016.
      ● 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016).
      ● 16 zettabytes of useful data (16 trillion GB) by 2020.
      ● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.
      ● Every second, on average, around 6,000 tweets are sent on Twitter, which corresponds to over 350,000 tweets per minute, more than 500 million per day and around 200 billion per year.
      ● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, as of 30/5/2016.
  • 3. … And some facts on scientific literature
      ● The global research community generates ~2.5 million new scholarly articles per year (English only). Source: The STM report (2015)
      ● … some 90% of papers … are never cited (82% in the humanities) … of those articles that are cited, only 20 percent have actually been read … 50% of papers are never read by anyone other than their authors, referees and journal editors. Source: Lokman I. Meho, The rise and rise of citation analysis, 2007
      ● … one paper published every 12 seconds … 70,000 papers published on a single protein, the tumor suppressor p53. Source: Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014
  • 4. How can we make sense of this data?
  • 5. Emerging solutions
      ● Machine reading: process textual sources, organise and classify in various dimensions, extract main (indexical) information items
      ● … and “understanding”: identify and extract entities and relations between entities, facilitate the transformation of unstructured textual sources into structured data
      ● … and predicting: enable the multidimensional analysis of structured data to extract meaningful insights and improve the ability to predict
  • 6. However, … a multitude of solutions caters for different
      ● Text types: newswire, scientific literature, tweets/blogs, patents, clinical/medical records, textbooks, monographs, online forums, …
      ● Languages: English, French, German, Spanish, Portuguese, Italian, Polish, …
      ● Tasks: translation, information extraction, semantic search, question answering, sentiment analysis, summarization, knowledge discovery, …
      ● Domains: finance/business, health, biology, social sciences, humanities, …
      … creating a fragmented landscape
  • 7. A glimpse of the TDM landscape. Resource: FutureTDM project (www.futuretdm.eu)
  • 8. What can we do?
  • 9. 1. Share content
      • Document literature content
      • Share in a meaningful way: what does Open Access really mean?
      IPR and licensing
      • Study IPR restrictions for reuse of sources as well as possible exceptions
      • Promote clarity and standardisation of legal rights and obligations
      Challenges
      • Rights statement vs. open licenses (for repositories)
      • No access to full text: we live in a metadata world
      • No standard protocols, formats and APIs for access and retrieval
      • No capacity to handle extra traffic
  • 10. Proposed solution: make TDM-enabled hubs. Diagram labels: literature repositories, OA journals, data repositories, aggregators, archives; metadata, full text, data; OpenAIRE, CORE, PMC Europe, …; guidelines, APIs; TDM; research networks; Wikipedia/Media/Research, …
  • 11. 2. Share TDM Services
      • Document language processing/text mining services and workflows in a meaningful way for domain discipline researchers
      • Document language/knowledge resources, data categories, taxonomies, provenance information
      Interoperable services
      • Common way of presenting annotated results
      • Combine services into workflows
      • Combine content and language resources with services and workflows
      • Combine automatic and manual/crowdsourcing annotation services
      IPR and licensing
      • Translate the legal & policy aspects into specifications for lawful user-to-service and service-to-service interactions
      Challenges
      • Bring text miners close to the researchers' problems and needs
      • Semantic interoperability (not just technical)
  • 12. OpenMinTeD: establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can discover, collaboratively create, share and re-use knowledge from a wide range of text-based scientific and scholarly sources. A step from Open Access to Open Science.
  • 13. HIGH LEVEL ARCHITECTURE. Policies & guidelines
  • 14. Our Services
      ● Register and discover TDM services and tools
      ● Link to content hubs
      ● Run a TDM job and share results
      ● Get people’s knowledge: crowdsourced annotation
      ● Build your own service: combine components into a workflow and share
  • 15. Our Users
      End users
      • Researchers, database curators, Research Infrastructure operators
      • Novice: use services to advance their science
      • Advanced: combine TDM components into complex workflows
      Content and service providers
      • Publishers, libraries, scientific database centres, …
      • TDM researchers
      • SMEs
  • 16. Community Driven
      ● Scholarly Comm.: feature extraction, data citation, research analytics
      ● Life Sciences: curation of databases and lexica in Chembolomics & neuroinformatics
      ● Agriculture: extracting information from tables for food safety alerts
      ● Social Sciences: data citation
      From the very beginning (requirements, content, barriers, expected outcomes) … to the very end (create applications, validate and evaluate the results).
  • 17. Examples of OpenAIRE TDM services we want to share. @openaire_eu
  • 18. Discover research in context
  • 19. Research trends and correlations: text and data mining with domain-specific knowledge; interactive visualization for drill-down information; … Trends in science; correlations of funding programmes, within a funder or across countries.
  • 20. What will it look like?
  • 21. The OpenMinTeD registry
  • 22. Browse TDM resources & tools/services
  • 23. Register, document, share tools
  • 24. Create your corpus, annotate, share
  • 25. How does this all bind together? How does Open Science help? Diagram labels: OpenAIRE, CORE, CrossRef, …; OpenMinTeD Registry; CLARIN, META-SHARE; OpenMinTeD Workflows; TDM tools; repositories, (OA) journals; other textual resources, e.g. medical records, PSI; language resources, …
  • 26. What’s next
      Participate with your ideas
      • Give us your feedback on our pending guidelines and APIs
      • Provide us with your TDM requirements: we have the experts to advise you
      • Register your TDM services
      • Test out the system when it goes live (spring)
      Watch out for
      • OpenAIRE’s datathons, tenders and challenges (60K in total)
      • OpenMinTeD’s tenders and challenges (240K in total)
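
The slides on "understanding" (entities and relations turned into structured data) and on interoperable services (a common way of presenting annotated results, services combined into workflows) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `Annotation` format, the toy `LEXICON`, and the two components are hypothetical and are not the OpenMinTeD API or any real annotation standard. The point is only that two independent components can be chained once they agree on one annotation format.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One annotated span in a shared (hypothetical) exchange format."""
    kind: str   # "entity" or "relation"
    label: str  # e.g. "PROTEIN" or "co_occurs_with"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


# Toy lexicon standing in for a real language/knowledge resource.
LEXICON = {"p53": "PROTEIN", "MDM2": "PROTEIN", "cancer": "DISEASE"}


def tag_entities(doc):
    """Component 1: dictionary lookup that adds entity annotations."""
    for term, label in LEXICON.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", doc.text):
            doc.annotations.append(
                Annotation("entity", label, m.start(), m.end(), m.group())
            )
    doc.annotations.sort(key=lambda a: a.start)
    return doc


def link_entities(doc):
    """Component 2: relate neighbouring entities with no full stop between them."""
    entities = [a for a in doc.annotations if a.kind == "entity"]
    for left, right in zip(entities, entities[1:]):
        if "." not in doc.text[left.end:right.start]:
            doc.annotations.append(
                Annotation("relation", "co_occurs_with", left.start, right.end,
                           f"{left.text} -> {right.text}")
            )
    return doc


def run_workflow(text, components):
    """Chain components; each one reads and extends the same Document."""
    doc = Document(text)
    for component in components:
        doc = component(doc)
    return doc


def to_rows(doc):
    """Flatten annotations into structured rows (kind, label, text)."""
    return [(a.kind, a.label, a.text) for a in doc.annotations]


if __name__ == "__main__":
    sample = ("The tumor suppressor p53 is regulated by MDM2. "
              "Loss of p53 is common in cancer.")
    for row in to_rows(run_workflow(sample, [tag_entities, link_entities])):
        print(row)
```

Because both components only read and write `Annotation` records, either could be swapped for a different tool (or a manual/crowdsourced annotation step) without changing the rest of the workflow, which is the kind of interoperability the slides argue for. A real relation extractor would of course need far more than sentence co-occurrence.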