Mining the scientific literature for plants and chemistry

ContentMine can read the daily scientific literature and extract facts. This talk was given to the OpenPlant project, with which ContentMine collaborates, at a meeting in Norwich on 2016-07-25/27. Examples of extracted facts are given.

Published in: Science

  1. 1. Mining science from the plant literature. ContentMine. Open Plant Forum, Norwich, UK, 2016-07-27. Peter Murray-Rust, [1] University of Cambridge, [2] TheContentMine. 10,000 scholarly publications every day. How many relate to plants?
  2. 2. (2x digital music industry!) Non-profit. Downloading several thousand papers per day and making search results open for everyone. http://contentmine.org Downloadable. Open source.
  3. 3. MozFest 2015 ContentMine + TGAC / hack
  4. 4. Terpinome. Phytochemists! Salvia officinalis, Salvia microphylla, Origanum vulgare, Ocimum basilicum, Laurus nobilis [1]. [1] Lauraceae
  5. 5. We can search for: • Plants • Compounds • Other species • Diseases • Places • Important terms. We’ll need: sources, dictionaries, software.
  6. 6. Europe PubMed Central: over 1 million biomedical papers
  7. 7. Dictionaries! Diseases (WHO)
  8. 8. [Diagram: ContentMine software, latest 20150908.] Sources: daily crawl of EuPMC, arXiv, CORE, HAL, (UNIV repos), ToC services, Crossref; publisher sites (PLoS ONE, BMC, PeerJ…; Nature, IEEE, Elsevier…). Inputs: URLs and DOIs; formats PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS. Tools: catalogue; getpapers (queries); quickscrape (scrapers); norma (normalizer, structurer, semantic tagger; taggers); ami (search, lookup; plugins: Chem, Phylo, Trials, Crystal, Plants; community plugins). Content: text, data, figures; abstract, methods, references, captioned figures, HTML tables. Outputs: Semantic ScholarlyHTML (W3C community group), facts, visualization and analysis. Throughput: 100,000 pages/day.
  9. 9. What plants produce Carvone? https://en.wikipedia.org/wiki/Carvone
  10. 10. https://en.wikipedia.org/wiki/Carvone WIKIDATA
  11. 11. Carvone in Wikidata (also SPARQL endpoint): WP identifier, chemical type, chemical identifier
  12. 12. Articles. Facets: gene, disease, drug, phytochem, species, genus, words
  13. 13. Suggest the title of this article
  14. 14. species, words, drug, phytochem, disease
  15. 15. species, words, drug, phytochem, disease, disease
  16. 16. (2x digital music industry!) Downloading and searching several thousand papers per day and making the results open for everyone
  17. 17. end
  18. 18. Mining for phytochemicals • getpapers -q carvone -o carvone -x -k 100 : search “carvone”, output to carvone/, format XML, limit 100 hits • cmine carvone : normalize papers; search locally for species, sequences, diseases, drugs. Results in dataTables.html and results/…/results.xml (includes W3C annotation) • python cmhypy.py carvone/ -u petermr <key> : send annotations -> hypothes.is
  19. 19. Annotation (entity in context): prefix, surface, label, location, suffix
  20. 20. Search for carvone
  21. 21. Mining for phytochemicals • getpapers -q carvone -o carvone -x -k 100 : search “carvone”, output to carvone/, format XML, limit 100 hits • cmine carvone : normalize papers; search locally for species, sequences, diseases, drugs. Results in dataTables.html and results/…/results.xml (includes W3C annotation) • python cmhypy.py carvone/ -u petermr <key> : send IUCN redlist plant annotations -> hypothes.is
  22. 22. Annotation (entity in context): prefix, surface, label, location, suffix
  23. 23. of plant-microbe interactions. Richard Smith-Unna, PhD student, Plant Sci Cambridge. Peter Murray-Rust, a (retired but highly active) chemist in Cam University. Report of 2-day workshop (hack) held at TGAC 2016-03-10/11 The workshop centered on novel methods for discovering information ab from the existing literature (“Content Mining”). We prepared ContentMin specifically for the workshop on the basis that “anyone can run it and get “. Everyone was asked to install the software on whatever platform they c used (Mac, Windows, Unix). There were few problems and most people w within an hour. A typical example was “find all you can about diseases of o EuropePubMedCentral (with over 1 million Open Access papers). This retr 500 papers, which were further filtered for chemicals, diseases, species, e displayed within a minute or two, significantly increasing the speed of kno driven scientific discovery. We also jointly made considerable improveme software and have agreed to meet regularly to take this forward.
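
The "annotation (entity in context)" idea from slides 19 and 22 can be illustrated with a minimal Python sketch. This is not ContentMine's own code: the `Annotation` class, the `annotate` function, the `context` parameter, the toy dictionary and the sample sentence are all assumptions made up for illustration. Only the five field names (prefix, surface, label, location, suffix) come from the slides.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """One dictionary hit with its surrounding text (hypothetical structure)."""
    prefix: str    # text immediately before the match
    surface: str   # the matched text exactly as it appears
    label: str     # which dictionary the term came from
    location: int  # character offset of the match in the text
    suffix: str    # text immediately after the match


def annotate(text, dictionaries, context=20):
    """Find every dictionary term in text and record it in context.

    dictionaries maps a label (e.g. "species") to a list of terms.
    context is the number of characters kept on each side of a match.
    """
    found = []
    for label, terms in dictionaries.items():
        for term in terms:
            # Whole-word, case-insensitive match of the literal term.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start, end = match.span()
                found.append(Annotation(
                    prefix=text[max(0, start - context):start],
                    surface=match.group(0),
                    label=label,
                    location=start,
                    suffix=text[end:end + context],
                ))
    # Report hits in reading order.
    return sorted(found, key=lambda a: a.location)


# Made-up sample sentence and toy dictionaries, for illustration only.
text = "Carvone is abundant in the essential oil of Mentha spicata."
dictionaries = {"phytochem": ["carvone"], "species": ["Mentha spicata"]}
hits = annotate(text, dictionaries, context=12)

for hit in hits:
    print(hit)
```

Storing the prefix and suffix alongside the offset lets an annotation be re-anchored in the text even if the offset alone no longer lines up, which is presumably why the slides list all five fields.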