Mining the scientific literature for plants and chemistry

Mining science from the plant literature
ContentMine
Open Plant Forum, Norwich, UK, 2016-07-27
Peter Murray-Rust
[1] University of Cambridge [2] TheContentMine
10,000 scholarly publications every day.
How many relate to plants?

(2x digital music industry!)
Non-profit
Downloading several thousand papers per day and making search results open for everyone
http://contentmine.org
Downloadable, open source

MozFest 2015
ContentMine + TGAC / hack

Terpinome Phytochemists!
Salvia officinalis
Salvia microphylla
Origanum vulgare
Ocimum basilicum
Laurus nobilis [1]
[1] Lauraceae

We can search for
• Plants
• Compounds
• Other species
• Diseases
• Places
• Important terms
We’ll need: sources, dictionaries, software

Europe PubMedCentral
Over 1 million biomedical papers

CONTENTMINE SOFTWARE (pipeline diagram, latest 20150908)
• Sources: daily crawl of EuPMC, arXiv, CORE, HAL, (university repositories); ToC services; Crossref; publisher sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…)
• Retrieval: catalogue; getpapers (query); quickscrape (URLs, DOIs)
• Input formats: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS
• norma: Normalizer, Structurer, Semantic Tagger; handles text, data, figures (abstract, methods, references, captioned figures, HTML tables)
• Semantic ScholarlyHTML (W3C community group); 100,000 pages/day
• ami: search, lookup; plugins: Chem, Phylo, Trials, Crystal, Plants
• Community: scrapers, queries, taggers, plugins; visualization and analysis
• Output: Facts

What plants produce Carvone?
https://en.wikipedia.org/wiki/Carvone

WIKIDATA

Carvone in Wikidata
Also SPARQL endpoint
Labelled on the entry: WP identifier, chemical type, chemical identifier
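The question "What plants produce carvone?" can be put to the Wikidata SPARQL endpoint. The following is a minimal sketch, not the ContentMine implementation. It assumes the compound can be matched by its English label, and that the relevant Wikidata statements use P703 ("found in taxon") and P225 ("taxon name"); the function names are hypothetical. It only builds the query and the request URL, so nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_taxon_query(compound_label: str, limit: int = 50) -> str:
    """Return a SPARQL query listing taxa in which a compound is recorded.

    Assumption: the compound item carries exactly this English label.
    """
    # Escape backslashes and double quotes so the label is a valid string literal.
    safe = compound_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?taxon ?taxonName WHERE {\n"
        f'  ?compound rdfs:label "{safe}"@en ;\n'
        "            wdt:P703 ?taxon .\n"       # P703: found in taxon
        "  ?taxon wdt:P225 ?taxonName .\n"      # P225: taxon name
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})
```

Calling build_request_url(build_taxon_query("carvone")) gives a URL that can be opened in a browser or fetched with any HTTP client. Results depend on what has been curated in Wikidata, so an empty answer does not mean that no plant produces the compound.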

ARTICLES and FACETS
Facets: gene, disease, drug, phytochem, species, genus, words

Suggest the title of this article

Facets shown: species, words, drug, phytochem, disease

(2x digital music industry!)
Downloading and searching
Several thousand papers per day and
making the results open for everyone

Mining for phytochemicals
• getpapers -q carvone -o carvone -x -k 100
Search “carvone”, output to carvone/, format XML, limit 100 hits
• cmine carvone
Normalize papers;
search locally for species, sequences, diseases, drugs
Results in dataTables.html
and results/…/results.xml (includes W3C annotation)
• python cmhypy.py carvone/ -u petermr <key>
send annotations -> hypothes.is
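The first two steps above can be scripted. The following is a rough sketch, assuming getpapers and cmine are installed and on the PATH; it uses only the flags shown on the slide (-q, -o, -x, -k). The helper names are hypothetical, and the cmhypy.py step is left out because it needs a personal key.

```python
import shlex
import subprocess


def getpapers_argv(query: str, outdir: str, limit: int = 100) -> list:
    """Build the getpapers command: query, output directory, XML, hit limit."""
    return ["getpapers", "-q", query, "-o", outdir, "-x", "-k", str(limit)]


def cmine_argv(project_dir: str) -> list:
    """Build the cmine command that normalizes and searches a project directory."""
    return ["cmine", project_dir]


def pipeline(query: str, outdir: str, limit: int = 100) -> list:
    """Return the commands in the order they should run."""
    return [getpapers_argv(query, outdir, limit), cmine_argv(outdir)]


def as_shell(argv: list) -> str:
    """Render a command as a shell-quoted string for logging or copy-paste."""
    return shlex.join(argv)


def run_pipeline(query: str, outdir: str, limit: int = 100) -> None:
    """Run each step in turn, stopping at the first failure."""
    for argv in pipeline(query, outdir, limit):
        print("running:", as_shell(argv))
        subprocess.run(argv, check=True)
```

run_pipeline("carvone", "carvone") would reproduce the slide's example. Passing the arguments as a list rather than a single string avoids shell quoting problems when a query contains spaces.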

Annotation (entity in context)
prefix
surface
label
location
suffix
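The fields of an annotation can be illustrated with a small dictionary search. This is a minimal sketch, not how ami does it: the Annotation class, the annotate function and the width parameter are hypothetical, "surface" is taken to mean the matched text as it appears in the paper, "label" the dictionary category, and "location" the character offset of the match.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    prefix: str    # text immediately before the match
    surface: str   # the matched text exactly as it appears
    label: str     # dictionary category, e.g. "species"
    location: int  # character offset of the match in the text
    suffix: str    # text immediately after the match


def annotate(text: str, dictionary: dict, width: int = 20) -> list:
    """Find dictionary terms in text and return each hit with its context."""
    if not dictionary:
        return []
    # Case-insensitive lookup table from term to label.
    lookup = {term.lower(): label for term, label in dictionary.items()}
    # Try longer terms first so multi-word names win over their parts.
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        start, end = match.span()
        hits.append(
            Annotation(
                prefix=text[max(0, start - width):start],
                surface=match.group(0),
                label=lookup[match.group(0).lower()],
                location=start,
                suffix=text[end:end + width],
            )
        )
    return hits
```

For example, annotate("Carvone was isolated from Mentha spicata leaves.", {"carvone": "compound", "Mentha spicata": "species"}) yields two annotations, one per dictionary term. Keeping the prefix and suffix lets an annotation be re-anchored in the text even if character offsets shift.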

Mining for phytochemicals
• getpapers -q carvone -o carvone -x -k 100
Search “carvone”, output to carvone/, format XML, limit 100 hits
• cmine carvone
Normalize papers;
search locally for species, sequences, diseases, drugs
Results in dataTables.html
and results/…/results.xml (includes W3C annotation)
• python cmhypy.py carvone/ -u petermr <key>
send IUCN redlist plant annotations -> hypothes.is

of plant-microbe interactions. Richard Smith-Unna, PhD student, Plant Sci
Cambridge. Peter Murray-Rust, a (retired but highly active) chemist in Cam
University.
Report of 2-day workshop (hack) held at TGAC 2016-03-10/11
The workshop centered on novel methods for discovering information ab
from the existing literature (“Content Mining”). We prepared ContentMin
specifically for the workshop on the basis that “anyone can run it and get
“. Everyone was asked to install the software on whatever platform they c
used (Mac, Windows, Unix). There were few problems and most people w
within an hour. A typical example was “find all you can about diseases of o
EuropePubMedCentral (with over 1 million Open Access papers). This retr
500 papers, which were further filtered for chemicals, diseases, species, e
displayed within a minute or two, significantly increasing the speed of kno
driven scientific discovery. We also jointly made considerable improveme
software and have agreed to meet regularly to take this forward.
