SlideShare a Scribd company logo
1 of 19
Download to read offline
1
TDM: Unlocking the hidden
potential from scholarly
content
2
Until recently, text mining has mostly been
restricted to post-publication PDFs and has
proved slow and difficult. The focus for scholarly
content has often been limited to metadata and
abstracts.
TDM is evolving to extract a wealth of
information that can support the entire scholarly
community – from authors to publishers.
Making sense of unstructured
content
3
Landscape
4
6% YoY growth in manuscript submissions
42% authors post their preprint before
journal submission
300% increase in the number of preprint
servers since 2015
The research keeps growing
Published work and preprints
6%
300%
42%
5
Too many manuscripts. Not enough time.
Submission to publication time expanding.
48 Hours
First review
round
Submission to
publication
Screening
13 Weeks 400 Days
6
XML often made available for Open Access articles, but not all publishers make XML
available to TDM services (API).
Rise of preprint servers and number of journals inviting article submission via these
servers increases need to mine non-XML content.
Most authors still submit manuscripts to publishers & preprint servers in Word or
PDF.
Some servers convert content into XML, but majority of platforms only allow for the
preprint to be downloaded in the same format it was uploaded in.
The format challenge
7
Software used by authors
Word still the preferred format
Writing software used by authors submitting to bioRxiv.
Source: Sever et al (2019) bioRxiv: the preprint server for biology. https://dx.doi.org/10.1101/833400
8
Format shouldn’t matter
9
Extracting structured content from any document
Dixon WG, Beukenhorst AL, Yimer BB et al. 2019. doi:10.1038/s41746-019-
0180-3
Content extracted to a structured format
10
Distilling research into headlines and key information
Rosyadi S, Haryanto A. 2019. doi:10.31124/advance.9989639.v1 Distillation to unified format
11
Opportunities
12
Manuscript
submission
Manuscript
screening
Peer review
Promotion
TDM: What are the opportunities?
TDM can work at any stage of the publishing process, opening up a huge number of opportunities from
manuscript drafting and screening to promoting the published article.
13
• Metadata extraction to automate
population of submissions system (Title,
author, affiliations, abstract, keywords).
• Reduces author friction / duplication of
effort.
• Previous work in this area has focused on
the biomedical domain, but this
opportunity can apply to any domain.
Automating submissions process
14
• Data extraction for manuscript screening
(key methods, results, sample size,
participants, ethical compliance etc.)
• Clear article context/overview for
reviewers.
• One-click access of cited sources & main
findings.
• Table extraction for analysis of statistical
calculations.
Speeding up peer review
15
Surfacing cited sources & their main findings
Krohn L, Ruskey JA, Rudakou U et al. 2019. doi:10.1101/19010991 Cited sources and their main findings surfaced
16
• Extract, parse and link citations from
archives dating back hundreds of years.
• Large scale reference population of open
citation networks (BMJ Case study)
• Improve exposure/discovery of older
research.
Exposing more content through
citation networks
17
What’s needed?
18
How publishers can help.
Make XML available for all Open Access articles rather than just the final
PDF for text mining.
Enrich citation networks with additional content (e.g. abstract,
highlights) in a machine-readable format.
Make all cited sources more easily verifiable for authors and
researchers.
Converting articles & preprints into a universally structured format for
more effective TDM. Allow authors to write articles natively in a
machine-readable format.
1
2
3
4
19
…equal rights for friendly bots!
And finally…

More Related Content

What's hot

Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersBrian Hole
 
Data availability and feasibility of validation – A genomics case study
Data availability and feasibility of validation – A genomics case studyData availability and feasibility of validation – A genomics case study
Data availability and feasibility of validation – A genomics case studyVerena139
 
Citation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureCitation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureBalachandar Radhakrishnan
 
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...UKSG: connecting the knowledge community
 
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...NASIG
 
COVID-19 and Changing Paradigm in Scholarly communication
COVID-19 and Changing Paradigm in Scholarly communication COVID-19 and Changing Paradigm in Scholarly communication
COVID-19 and Changing Paradigm in Scholarly communication Vasantha Raju N
 
How Accessible Is Our Collection? Performing an E-Resources Accessibility Review
How Accessible Is Our Collection? Performing an E-Resources Accessibility ReviewHow Accessible Is Our Collection? Performing an E-Resources Accessibility Review
How Accessible Is Our Collection? Performing an E-Resources Accessibility ReviewNASIG
 
Advancing the International Plant Names Index (IPNI)
Advancing the International Plant Names Index (IPNI) Advancing the International Plant Names Index (IPNI)
Advancing the International Plant Names Index (IPNI) nickyn
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panelRavi Madduri
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13DataDryad
 
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...NASIG
 
2 flash presentations for annual meeting tdm and cross check final
2 flash presentations for annual meeting tdm and cross check final2 flash presentations for annual meeting tdm and cross check final
2 flash presentations for annual meeting tdm and cross check finalCrossref
 

What's hot (20)

Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for Publishers
 
Data availability and feasibility of validation – A genomics case study
Data availability and feasibility of validation – A genomics case studyData availability and feasibility of validation – A genomics case study
Data availability and feasibility of validation – A genomics case study
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
Citation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online LiteratureCitation Analysis for the Free, Online Literature
Citation Analysis for the Free, Online Literature
 
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...
UKSG 2018 Breakout - Trouble(shooting) with a capital T: how categorising and...
 
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...
Capturing and Analyzing Publication, Citation and Usage Data for Contextual C...
 
Where you should publish
Where you should publishWhere you should publish
Where you should publish
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
Data Metadata and Data Citation - Emma Ganley (PLoS)
Data Metadata and Data Citation - Emma Ganley (PLoS)Data Metadata and Data Citation - Emma Ganley (PLoS)
Data Metadata and Data Citation - Emma Ganley (PLoS)
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
COVID-19 and Changing Paradigm in Scholarly communication
COVID-19 and Changing Paradigm in Scholarly communication COVID-19 and Changing Paradigm in Scholarly communication
COVID-19 and Changing Paradigm in Scholarly communication
 
Oct 14 NISO Webinar: Cloud and Web Services for Libraries
Oct 14 NISO Webinar: Cloud and Web Services for LibrariesOct 14 NISO Webinar: Cloud and Web Services for Libraries
Oct 14 NISO Webinar: Cloud and Web Services for Libraries
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
How Accessible Is Our Collection? Performing an E-Resources Accessibility Review
How Accessible Is Our Collection? Performing an E-Resources Accessibility ReviewHow Accessible Is Our Collection? Performing an E-Resources Accessibility Review
How Accessible Is Our Collection? Performing an E-Resources Accessibility Review
 
Advancing the International Plant Names Index (IPNI)
Advancing the International Plant Names Index (IPNI) Advancing the International Plant Names Index (IPNI)
Advancing the International Plant Names Index (IPNI)
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panel
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13
 
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...
The Intersection of InterLibrary Loan and Acquisition Models: A review of rec...
 
Biosharing sansone-dryad-may13
Biosharing sansone-dryad-may13Biosharing sansone-dryad-may13
Biosharing sansone-dryad-may13
 
2 flash presentations for annual meeting tdm and cross check final
2 flash presentations for annual meeting tdm and cross check final2 flash presentations for annual meeting tdm and cross check final
2 flash presentations for annual meeting tdm and cross check final
 

Similar to Text Data Mining: Unlocking the hidden potential from scholarly content.

OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...Open Science Fair
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...LEARN Project
 
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)Frank Oellien
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?Anita de Waard
 
ALAMW14 Altmetrics Panel: Redefining Research Impact
ALAMW14 Altmetrics Panel: Redefining Research ImpactALAMW14 Altmetrics Panel: Redefining Research Impact
ALAMW14 Altmetrics Panel: Redefining Research ImpactWilliam Gunn
 
Elsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryElsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryAntonio Gulli
 
A scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for microA scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for microaman341480
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Lucy McKenna
 
CrossRef Text and Data Mining
CrossRef Text and Data MiningCrossRef Text and Data Mining
CrossRef Text and Data MiningCrossref
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
Supporting the ref5
Supporting the ref5Supporting the ref5
Supporting the ref5lshavald
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining Chris Shillum
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open DataBrian Hole
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data PublishingBrian Hole
 
Simons orcid forum canberra 2018-PIDs in research
Simons orcid forum canberra 2018-PIDs in researchSimons orcid forum canberra 2018-PIDs in research
Simons orcid forum canberra 2018-PIDs in researchARDC
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation Research Data Alliance
 

Similar to Text Data Mining: Unlocking the hidden potential from scholarly content. (20)

OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...OSFair2017 training | Machine accessibility of Open Access scientific publica...
OSFair2017 training | Machine accessibility of Open Access scientific publica...
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...
 
UKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
UKSG 2018 Breakout - Setting your cites to open I4OC - MaccallumUKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
UKSG 2018 Breakout - Setting your cites to open I4OC - Maccallum
 
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
Text Mining - Techniques & Limitations (A Pharmaceutical Industry Viewpoint)
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
ALAMW14 Altmetrics Panel: Redefining Research Impact
ALAMW14 Altmetrics Panel: Redefining Research ImpactALAMW14 Altmetrics Panel: Redefining Research Impact
ALAMW14 Altmetrics Panel: Redefining Research Impact
 
Elsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing IndustryElsevier - Smart Data and Algorithms for the Publishing Industry
Elsevier - Smart Data and Algorithms for the Publishing Industry
 
A scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for microA scalable hybrid research paper recommender system for micro
A scalable hybrid research paper recommender system for micro
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
 
CrossRef Text and Data Mining
CrossRef Text and Data MiningCrossRef Text and Data Mining
CrossRef Text and Data Mining
 
NISO April 30th RA21 Webinar
NISO April 30th RA21 WebinarNISO April 30th RA21 Webinar
NISO April 30th RA21 Webinar
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Supporting the ref5
Supporting the ref5Supporting the ref5
Supporting the ref5
 
A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining A Pragmatic Approach to Facilitating Text and Data Mining
A Pragmatic Approach to Facilitating Text and Data Mining
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open Data
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data Publishing
 
Simons orcid forum canberra 2018-PIDs in research
Simons orcid forum canberra 2018-PIDs in researchSimons orcid forum canberra 2018-PIDs in research
Simons orcid forum canberra 2018-PIDs in research
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation OpenAIRE and Eudat services and tools to support FAIR DMP implementation
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 

Text Data Mining: Unlocking the hidden potential from scholarly content.

  • 1. 1 TDM: Unlocking the hidden potential from scholarly content
  • 2. 2 Until recently, text mining has mostly been restricted to post-publication PDFs and has proved slow and difficult. The focus for scholarly content has often been limited to metadata and abstracts. TDM is evolving to extract a wealth of information that can support the entire scholarly community – from authors to publishers. Making sense of unstructured content
  • 4. 4 6% YoY growth in manuscript submissions 42% authors post their preprint before journal submission 300% increase in the number of preprint servers since 2015 The research keeps growing Published work and preprints 6% 300% 42%
  • 5. 5 Too many manuscripts. Not enough time. Submission to publication time expanding. 48 Hours First review round Submission to publication Screening 13 Weeks 400 Days
  • 6. 6 XML often made available for Open Access articles, but not all publishers make XML available to TDM services (API). Rise of preprint servers and number of journals inviting article submission via these servers increases need to mine non-XML content. Most authors still submit manuscripts to publishers & preprint servers in Word or PDF. Some servers convert content into XML, but majority of platforms only allow for the preprint to be downloaded in the same format it was uploaded in. The format challenge
  • 7. 7 Software used by authors Word still the preferred format Writing software used by authors submitting to bioRxiv. Source: Sever et al (2019) bioRxiv: the preprint server for biology. https://dx.doi.org/10.1101/833400
  • 9. 9 Extracting structured content from any document Dixon WG, Beukenhorst AL, Yimer BB et al. 2019. doi:10.1038/s41746-019- 0180-3 Content extracted to a structured format
  • 10. 10 Distilling research into headlines and key information Rosyadi S, Haryanto A. 2019. doi:10.31124/advance.9989639.v1 Distillation to unified format
  • 12. 12 Manuscript submission Manuscript screening Peer review Promotion TDM: What are the opportunities? TDM can work at any stage of the publishing process, opening up a huge number of opportunities from manuscript drafting and screening to promoting the published article.
  • 13. 13 • Metadata extraction to automate population of submissions system (Title, author, affiliations, abstract, keywords). • Reduces author friction / duplication of effort. • Previous work in this area has focused on the biomedical domain, but this opportunity can apply to any domain. Automating submissions process
  • 14. 14 • Data extraction for manuscript screening (key methods, results, sample size, participants, ethical compliance etc.) • Clear article context/overview for reviewers. • One-click access of cited sources & main findings. • Table extraction for analysis of statistical calculations. Speeding up peer review
  • 15. 15 Surfacing cited sources & their main findings Krohn L, Ruskey JA, Rudakou U et al. 2019. doi:10.1101/19010991 Cited sources and their main findings surfaced
  • 16. 16 • Extract, parse and link citations from archives dating back hundreds of years. • Large scale reference population of open citation networks (BMJ Case study) • Improve exposure/discovery of older research. Exposing more content through citation networks
  • 18. 18 How publishers can help. Make XML available for all Open Access articles rather than just the final PDF for text mining. Enrich citation networks with additional content (e.g. abstract, highlights) in a machine-readable format. Make all cited sources more easily verifiable for authors and researchers. Converting articles & preprints into a universally structured format for more effective TDM. Allow authors to write articles natively in a machine-readable format. 1 2 3 4
  • 19. 19 …equal rights for friendly bots! And finally…