1
TDM: Unlocking the hidden potential from scholarly content
2
Until recently, text mining has mostly been
restricted to post-publication PDFs and has
proved slow and difficult. The focus for scholarly
content has often been limited to metadata and
abstracts.
TDM is evolving to extract a wealth of
information that can support the entire scholarly
community – from authors to publishers.
Making sense of unstructured
content
3
Landscape
4
6% year-on-year growth in manuscript submissions
42% of authors post a preprint before journal submission
300% increase in the number of preprint servers since 2015
The research keeps growing
Published work and preprints
5
Too many manuscripts. Not enough time.
Submission to publication time expanding.
Screening: 48 hours
First review round: 13 weeks
Submission to publication: 400 days
6
XML is often made available for Open Access articles, but not all publishers make XML available to TDM services (API).
The rise of preprint servers, and the number of journals inviting article submission via these servers, increases the need to mine non-XML content.
Most authors still submit manuscripts to publishers and preprint servers in Word or PDF.
Some servers convert content into XML, but the majority of platforms only allow the preprint to be downloaded in the format in which it was uploaded.
The format challenge
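To illustrate why XML availability matters for mining, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a simplified, JATS-style fragment; the function name `extract_from_xml` and the exact nesting are illustrative assumptions, not any particular publisher's schema.

```python
import xml.etree.ElementTree as ET


def extract_from_xml(xml_text: str) -> dict:
    """Read title and abstract from a simplified JATS-style article (assumed layout)."""
    root = ET.fromstring(xml_text)
    # Structured markup lets us address fields by name instead of guessing from layout.
    title = root.findtext(".//article-title", default="")
    paragraphs = [
        "".join(p.itertext()).strip() for p in root.iterfind(".//abstract/p")
    ]
    return {"title": title.strip(), "abstract": " ".join(paragraphs)}


if __name__ == "__main__":
    sample = (
        "<article><front><article-meta>"
        "<title-group><article-title>A sample article</article-title></title-group>"
        "<abstract><p>First point.</p><p>Second point.</p></abstract>"
        "</article-meta></front></article>"
    )
    print(extract_from_xml(sample))
```

The same fields in a Word or PDF file would have to be inferred from fonts, positions or heuristics, which is the harder problem this slide describes.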
7
Software used by authors
Word still the preferred format
Writing software used by authors submitting to bioRxiv.
Source: Sever et al. (2019) bioRxiv: the preprint server for biology. https://dx.doi.org/10.1101/833400
8
Format shouldn’t matter
9
Extracting structured content from any document
Dixon WG, Beukenhorst AL, Yimer BB et al. 2019. doi:10.1038/s41746-019-0180-3
Content extracted to a structured format
10
Distilling research into headlines and key information
Rosyadi S, Haryanto A. 2019. doi:10.31124/advance.9989639.v1
Distillation to unified format
11
Opportunities
12
Manuscript submission
Manuscript screening
Peer review
Promotion
TDM: What are the opportunities?
TDM can work at any stage of the publishing process, opening up a huge number of opportunities from
manuscript drafting and screening to promoting the published article.
13
• Metadata extraction to automate population of the submission system (title, authors, affiliations, abstract, keywords).
• Reduces author friction and duplication of effort.
• Previous work in this area has focused on the biomedical domain, but the opportunity applies to any domain.
Automating submissions process
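The metadata extraction described on this slide could look roughly like the sketch below. It is a toy under stated assumptions, not a production extractor: it assumes a hypothetical plain-text layout (title on the first non-empty line, comma-separated authors on the second, then `Abstract` and `Keywords:` labels), and the function name `extract_metadata` is illustrative.

```python
import re


def extract_metadata(text: str) -> dict:
    """Pull title, authors, abstract and keywords from plain manuscript text.

    Assumed (hypothetical) layout: title on line 1, authors on line 2,
    followed by an 'Abstract' label and a 'Keywords:' line.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    authors = [a.strip() for a in lines[1].split(",")] if len(lines) > 1 else []

    # Abstract runs from its label up to the keywords line (or end of text).
    abstract_match = re.search(
        r"abstract[:\s]+(.*?)(?=\n\s*keywords\s*:|\Z)", text, re.I | re.S
    )
    keywords_match = re.search(r"keywords\s*:\s*(.+)", text, re.I)

    abstract = " ".join(abstract_match.group(1).split()) if abstract_match else ""
    keywords = []
    if keywords_match:
        keywords = [
            k.strip() for k in re.split(r"[;,]", keywords_match.group(1)) if k.strip()
        ]
    return {
        "title": title,
        "authors": authors,
        "abstract": abstract,
        "keywords": keywords,
    }
```

The returned dictionary maps directly onto the form fields an author would otherwise retype; real manuscripts vary far more in layout, which is where trained extraction models come in.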
14
• Data extraction for manuscript screening (key methods, results, sample size, participants, ethical compliance, etc.).
• Clear article context and overview for reviewers.
• One-click access to cited sources and their main findings.
• Table extraction for analysis of statistical calculations.
Speeding up peer review
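As a rough illustration of screening-stage extraction, the sketch below looks for reported sample sizes and for any mention of ethical compliance. The regular expressions and the function name `screen_manuscript` are assumptions chosen for the example; a real screening tool would need far broader patterns or a trained model.

```python
import re

# Hypothetical patterns for illustration only.
SAMPLE_SIZE = re.compile(r"\b[nN]\s*=\s*(\d[\d,]*)")
PARTICIPANTS = re.compile(
    r"\b(\d[\d,]*)\s+(?:participants|patients|subjects)\b", re.I
)
ETHICS = re.compile(
    r"\b(?:ethic(?:s|al)\s+(?:approval|committee|review)"
    r"|institutional review board|informed consent)\b",
    re.I,
)


def screen_manuscript(text: str) -> dict:
    """Return candidate sample sizes and whether ethics wording appears."""
    raw = SAMPLE_SIZE.findall(text) + PARTICIPANTS.findall(text)
    # Strip thousands separators before converting to integers.
    sizes = sorted({int(value.replace(",", "")) for value in raw})
    return {
        "sample_sizes": sizes,
        "mentions_ethics": bool(ETHICS.search(text)),
    }
```

A flag such as `mentions_ethics` only tells an editor where to look; it does not confirm that compliance was actually met.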
15
Surfacing cited sources & their main findings
Krohn L, Ruskey JA, Rudakou U et al. 2019. doi:10.1101/19010991
Cited sources and their main findings surfaced
16
• Extract, parse and link citations from archives dating back hundreds of years.
• Large-scale reference population of open citation networks (BMJ case study).
• Improve exposure and discovery of older research.
Exposing more content through
citation networks
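The "extract, parse and link" step can be sketched for citations written in the short style used elsewhere in this deck (authors, year, DOI). The pattern and the function name `parse_citation` are illustrative assumptions; historical archives use many more citation styles and often lack DOIs altogether.

```python
import re
from typing import Optional

# Assumed citation shape: "<authors>. <year>. doi:<DOI>"
CITATION = re.compile(
    r"^(?P<authors>.+?)\.\s+(?P<year>\d{4})\.\s+doi:\s*(?P<doi>10\.\d{4,9}/\S+)$"
)


def parse_citation(raw: str) -> Optional[dict]:
    """Parse one citation string into structured fields, or None if it does not fit."""
    match = CITATION.match(raw.strip())
    if match is None:
        return None
    authors = match.group("authors")
    et_al = authors.endswith(" et al")
    if et_al:
        authors = authors[: -len(" et al")]
    doi = match.group("doi")
    return {
        "authors": [a.strip() for a in authors.split(",")],
        "et_al": et_al,
        "year": int(match.group("year")),
        "doi": doi,
        # Linking step: DOIs resolve through the doi.org resolver.
        "url": "https://doi.org/" + doi,
    }
```

Once each reference is reduced to a DOI, the citing and cited articles can be stored as edges in a citation network.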
17
What’s needed?
18
How publishers can help.
1. Make XML, rather than just the final PDF, available for text mining of all Open Access articles.
2. Enrich citation networks with additional content (e.g. abstract, highlights) in a machine-readable format.
3. Make all cited sources more easily verifiable for authors and researchers.
4. Convert articles and preprints into a universally structured format for more effective TDM. Allow authors to write articles natively in a machine-readable format.
19
And finally…
…equal rights for friendly bots!

Text Data Mining: Unlocking the hidden potential from scholarly content.

  • 1. TDM: Unlocking the hidden potential from scholarly content
  • 2. Making sense of unstructured content: Until recently, text mining has mostly been restricted to post-publication PDFs and has proved slow and difficult. The focus for scholarly content has often been limited to metadata and abstracts. TDM is evolving to extract a wealth of information that can support the entire scholarly community, from authors to publishers.
  • 4. The research keeps growing (published work and preprints): 6% year-on-year growth in manuscript submissions; 42% of authors post their preprint before journal submission; 300% increase in the number of preprint servers since 2015.
  • 5. Too many manuscripts. Not enough time. Submission to publication time is expanding. Durations shown for screening, first review round, and submission to publication: 48 hours, 13 weeks, 400 days.
  • 6. The format challenge: XML is often made available for Open Access articles, but not all publishers make XML available to TDM services (API). The rise of preprint servers, and of journals inviting article submission via these servers, increases the need to mine non-XML content. Most authors still submit manuscripts to publishers and preprint servers in Word or PDF. Some servers convert content into XML, but the majority of platforms only allow the preprint to be downloaded in the format in which it was uploaded.
  • 7. Software used by authors: Word is still the preferred format. Figure: writing software used by authors submitting to bioRxiv. Source: Sever et al. (2019), bioRxiv: the preprint server for biology. https://dx.doi.org/10.1101/833400
  • 9. Extracting structured content from any document. Example: Dixon WG, Beukenhorst AL, Yimer BB et al. 2019. doi:10.1038/s41746-019-0180-3. Figure: content extracted to a structured format.
  • 10. Distilling research into headlines and key information. Example: Rosyadi S, Haryanto A. 2019. doi:10.31124/advance.9989639.v1. Figure: distillation to a unified format.
  • 12. TDM: What are the opportunities? TDM can work at any stage of the publishing process, opening up a huge number of opportunities from manuscript drafting and screening to promoting the published article. Stages shown: manuscript submission, manuscript screening, peer review, promotion.
  • 13. Automating the submissions process: metadata extraction to automate population of the submission system (title, authors, affiliations, abstract, keywords); reduces author friction and duplication of effort; previous work in this area has focused on the biomedical domain, but this opportunity can apply to any domain.
  • 14. Speeding up peer review: data extraction for manuscript screening (key methods, results, sample size, participants, ethical compliance, etc.); clear article context and overview for reviewers; one-click access to cited sources and their main findings; table extraction for analysis of statistical calculations.
  • 15. Surfacing cited sources and their main findings. Example: Krohn L, Ruskey JA, Rudakou U et al. 2019. doi:10.1101/19010991. Figure: cited sources and their main findings surfaced.
  • 16. Exposing more content through citation networks: extract, parse and link citations from archives dating back hundreds of years; large-scale reference population of open citation networks (BMJ case study); improve exposure and discovery of older research.
  • 18. How publishers can help: make XML available for all Open Access articles, rather than just the final PDF, for text mining; enrich citation networks with additional content (e.g. abstract, highlights) in a machine-readable format; make all cited sources more easily verifiable for authors and researchers; convert articles and preprints into a universally structured format for more effective TDM; allow authors to write articles natively in a machine-readable format.
  • 19. And finally… equal rights for friendly bots!
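
The metadata extraction on slide 13 and the citation extraction on slide 16 can be illustrated with a minimal Python sketch. It is not the presenters' method, which the slides do not describe. It assumes a hypothetical plain-text manuscript layout (title on the first line, comma-separated authors on the second, then lines labelled "Abstract:" and "Keywords:"), and it finds DOIs with a common regular-expression heuristic. The function names and the sample text are illustrative only; the one DOI in the sample is the bioRxiv source cited on slide 7.

```python
import re

# Heuristic DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption that covers typical modern DOIs, not every valid one.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text):
    """Return the DOIs found in text, in order of first appearance, deduplicated."""
    found = []
    for match in DOI_RE.finditer(text):
        # Trailing sentence punctuation is almost never part of the DOI itself.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


def extract_front_matter(manuscript):
    """Pull title, authors, abstract and keywords from the assumed layout."""
    lines = [ln.strip() for ln in manuscript.splitlines() if ln.strip()]
    fields = {"title": None, "authors": [], "abstract": None, "keywords": []}
    if lines:
        fields["title"] = lines[0]
    if len(lines) > 1:
        fields["authors"] = [a.strip() for a in lines[1].split(",") if a.strip()]
    for ln in lines[2:]:
        lower = ln.lower()
        if lower.startswith("abstract:"):
            fields["abstract"] = ln[len("abstract:"):].strip()
        elif lower.startswith("keywords:"):
            raw = ln[len("keywords:"):]
            fields["keywords"] = [k.strip() for k in re.split(r"[;,]", raw) if k.strip()]
    return fields


# Hypothetical manuscript text used only to exercise the two functions.
SAMPLE = """\
An example manuscript title
A. Author, B. Writer
Abstract: One-sentence placeholder abstract.
Keywords: text mining; preprints
References
1. Placeholder reference. doi:10.1101/833400.
2. Placeholder reference. https://dx.doi.org/10.1101/833400
"""

if __name__ == "__main__":
    print(extract_front_matter(SAMPLE))
    print(extract_dois(SAMPLE))
```

Real manuscripts arrive as Word or PDF files with far less regular layouts, so a production system would need format conversion and more robust parsing; the sketch only shows the shape of the structured output that could pre-populate a submission form or feed a citation network.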