
ContentMine + EPMC: Finding Zika!

Use of ContentMine tools on the Open Access subset of Europe PubMed Central (EuropePMC) to discover new knowledge about the Zika virus.
Three slides have embedded movies. These do not show in SlideShare; a first pass can be seen as a single file at https://vimeo.com/154705161
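
The query step described above can be sketched against the public Europe PMC REST search endpoint. This is a minimal illustration, not how getpapers itself is implemented (getpapers is a separate tool from contentmine.org); the helper names `build_search_url` and `fetch_results`, the page size, and the use of the `OPEN_ACCESS:y` filter are assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC REST search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(term, page_size=100):
    """Build a search URL restricted to Open Access records (hypothetical helper)."""
    params = {
        "query": f"{term} AND OPEN_ACCESS:y",  # assumed filter for the OA subset
        "format": "json",
        "pageSize": page_size,
        "resultType": "lite",
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def fetch_results(term, page_size=100):
    """Fetch one page of results as a parsed JSON dict (needs network access)."""
    with urlopen(build_search_url(term, page_size)) as response:
        return json.load(response)
```

Calling `fetch_results("zika")` returns the first page of matching records as a dictionary; paging through the full result set and downloading the fulltext XML is left out of this sketch.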

Published in: Health & Medicine

  1. ContentMine + Europe PMC: Finding Zika! Peter Murray-Rust, ContentMine.org and University of Cambridge. London, UK, 2016-02-08. getpapers and AMI download and analyze papers from the EuropePMC API; both are F/OSS tools from contentmine.org. Images from Wikimedia, CC-BY-SA.
  2. "The Right to Read is the Right to Mine" (Peter Murray-Rust, 2011). http://contentmine.org
  3. Semantic Fulltext
     • EuropePMC: coherent Open Access
     • getpapers: query, download (through API).
     • AMI filters, checks[1], transforms facts in papers.
     • sequences, species, genera, genes, dictionaries
     [0] All operations shown run in a total of <3 minutes. [1] Dictionaries and lookup. [2] Usable from home by anyone.
     Image: Zika endemic areas, Wikimedia CC-BY-SA.
  4. [Diagram: CONTENTMINE, "Complete OPEN Platform for Mining Scientific Literature". Inputs: catalogue, getpapers query, daily crawl (EPMC, arXiv, CORE, HAL, university repositories), ToC services, and publisher sites (PLoS ONE, BMC, PeerJ…; Nature, IEEE, Elsevier…) via crawl, quickscrape, scrapers and queries. Formats: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS, URLs, DOIs. Processing: norma (normalizer, structurer, semantic tagger) over text, data and figures, producing semantic ScholarlyHTML (abstract, methods, references, captioned figures, HTML tables); ami (search, lookup, taggers) producing facts. Content mining community plugins: Chem, Phylo, Trials, Crystal, Plants. Visualization and analysis. Throughput: 30,000 pages/day.]
  5. Download all Open Access "Zika" papers from EuropePMC in 10 seconds (click below for movie). Image: Aedes aegypti, Wikimedia CC-BY-SA. Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  6. Downloaded all Open Access "Zika" papers from EuropePMC in 10 seconds. Final download screen.
  7. Eyeballing 20 of the 120 Zika papers (click below for movie). Image: Yellow Fever Virus, Wikimedia CC-BY-SA. Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  8. Commonest words in 120 Zika papers: 3011 virus; 1939 Ae./Aedes; 1212 dengue; 901 mosquito/es; 894 species; 791 ZIKV; 721 using; 716 DENV; 567 detection; 513 aegypti; 484 infection; 442 RNA; 428 protein; 401 albopictus; 360 viral. Image: Mosquito spp., Wikimedia CC-BY-SA.
  9. Filtering local files for sequences and viruses with AMI (part of the ContentMine software) (click below for movie). Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  10. DNA primers in running text: "…the sodium channel voltage dependent gene (Nav). Primers used to amplify this fragment were AaNaA 5’-ACAATGTGGATCGCTTCCC-3’ and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8). The primers amplify a fragment of approximately 472…" Snippet (quotable under the 2014 UK Statutory Instrument, "Hargreaves") from ~/PMC4654492/results/sequence/dnaprimer/results.xml. W3C Annotation: [PREFIX] [MATCH] (link to target) [SUFFIX]. Labels: CMine structure, plugin, option. Image: DNA double stranded fragment, Wikimedia CC-BY-SA.
  11. Commonest species in 120 Zika papers: 423 Ae./Aedes aegypti; 333 Ae./Aedes albopictus; 63 Ae. bromeliae; 58 Ae. lilii; 46 Ae. hensilli; 42 Glossina pallidipes; 40 Plasmodium vivax; 35 Ae. luteocephalus; 28 Ae. vittatus; 25 Ae. furcifer; 22 Plasmodium falciparum; 21 Drosophila melanogaster. Example result: pre="fever (DHF), are caused by the world's most prevalent mosquito-borne virus. 37 DENV is carried by " exact="Aedes aegypti" post=" mosquito, which is strongly affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
  12. Commonest genera in Zika papers: 183 Wolbachia; 70 Aedes; 69 Flavivirus/Flaviviridae; 30 Glossina; 17 Culex. Example result: pre="…-negative endosymbiotic bacterium, is a promising tool against diseases transmitted by mosquitoes. " exact="Wolbachia" post=" can be found worldwide in numerous arthropod species. More than 65% of all insect species are natu…" Image: Wolbachia in insect cell, Wikimedia CC-BY-SA.
  13. Commonest genes in 120 Zika papers: 38 ITS; 20 MHC2TA; 19 COI; 14 CYPJ92; 5 CYP6BB2; 4 CYP9J28; 3 MHC.
  14. • microcephaly: 400/2400 papers; 2 mins; commonest genes: 203 MCPH1; 86 MECP2; 54 SOX2; 49 E2F1; 47 SNAP29; 40 IKBKG; 40 NDE1. Image: N-terminal domain of microcephalin, Wikimedia CC-BY-SA.
  15. CM Future
      • Hypothes.is use ContentMine results for annotation
      • (with Cambridge Univ Library) extracting daily scientific facts from open and closed literature.
      • with EBI, Cochrane Collaboration, JISC, OKF, LIBER, TGAC/John Innes, DNADigest.
      • Running workshops, hackdays.
      • Planned outreach: MEPs, EC, Slashdot, Reddit, Kickstarter, geekdom
      • http://contentmine.org (OpenLock non-profit)
  16. "The Right to Read is the Right to Mine" (Peter Murray-Rust, 2011). http://contentmine.org
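
The primer slide and the species and genera slides all report matches as a prefix, an exact match, and a suffix. A minimal sketch of that idea for DNA primers is shown here. It is not AMI's implementation: the regular expression, the `find_primers` helper, the `context` width, and the dictionary keys are assumptions chosen to mirror the pre/exact/post shape on the slides.

```python
import re

# Assumed pattern: a 5'-...-3' primer written with straight or curly apostrophes.
PRIMER = re.compile(r"5['’]-([ACGT]+)-3['’]")


def find_primers(text, context=40):
    """Return each primer match with up to `context` characters either side."""
    hits = []
    for match in PRIMER.finditer(text):
        hits.append({
            "pre": text[max(0, match.start() - context):match.start()],
            "exact": match.group(0),      # full match, e.g. 5’-ACGT-3’
            "sequence": match.group(1),   # bases only
            "post": text[match.end():match.end() + context],
        })
    return hits


# Sentence quoted on the primer slide.
SNIPPET = (
    "Primers used to amplify this fragment were "
    "AaNaA 5’-ACAATGTGGATCGCTTCCC-3’ and "
    "AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8)."
)

for hit in find_primers(SNIPPET):
    print(hit["sequence"])
```

Keeping the surrounding text with each match is what makes the result usable as an annotation: the prefix and suffix locate the match in the paper without copying more than a short snippet.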
