Virtual Open Publishing Fest, 2020-05-28
openVirus
Knowledge for citizens in the time of COVID-19
Peter Murray-Rust
TheContentMine and collaborators
citizens
knowledgebase
Images from ContentMine CC BY and Wikimedia CC BY-SA
pm286@cam.ac.uk
peter@contentmine.org
ContentMine is OpenLocked Non-Profit http://contentmine.org
The Right to Read is the Right to Mine
openVirus collaborators
Remko Popma,
Lezan Hawizy, Tim Voronov,
Andy Jackson,
Clyde Davies,
Thomas Shafee,
Priya JK , Kareena Singh,
Simon Worthington,
ContentMine Workshops on Mining
Chris Kittel, CM, at Mozfest 2015
Stefan Kasberger, CM
… knowledge can prevent viral epidemics …
The Ebola outbreak in Liberia was predicted in 1984 …
… and forgotten. https://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html
11,310 dead in
West Africa 2014
All: 426,613
Open: 21,919
5% is Open to citizens
Is this article relevant to policy makers?
openVirus will give YOU tools to test
how much vital info is PAYWALLED
PREPRINTS!!
Crossref
EuropePMC
Wikidata
getpapers
AMI
We can change all that!
We can do everything ourselves!
Delhi, IN
Priya and Kareena are 3rd year interns on openVirus
PMR
Gitanjali Yadav
Mining!
• build scrapers for Openly readable sources.
• User queries for scraping
• download raw content
• clean and semantify
• annotate with dictionaries.
• analyze, display.
Scrape -> Clean -> Annotate -> Display
Open sources publish
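The four stages above can be sketched as a chain of small functions. This is a minimal illustration only, not the openVirus implementation: the records, the dictionary, and every function name here are hypothetical, and the "scraper" reads from an in-memory sample instead of a live source.

```python
import html
import re
from collections import Counter

# Hypothetical raw records; a real run would download these from an open source.
RAW_DOCS = {
    "doc1": "<p>Viral   epidemic in Liberia &amp; Guinea.</p>",
    "doc2": "<p>No relevant terms here.</p>",
}

# Hypothetical dictionary mapping a term to the dictionary it belongs to.
DICTIONARY = {"epidemic": "disease", "liberia": "country", "guinea": "country"}


def scrape(query):
    """Stand-in scraper: keep raw records whose text mentions the query."""
    return {k: v for k, v in RAW_DOCS.items() if query.lower() in v.lower()}


def clean(raw):
    """Strip tags, decode HTML entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def annotate(text):
    """Return (term, label) pairs for dictionary terms found as whole words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(w, DICTIONARY[w]) for w in words if w in DICTIONARY]


def display(annotations):
    """Summarise annotations as a count per dictionary label."""
    return Counter(label for _, label in annotations)


def run(query):
    """Scrape -> Clean -> Annotate -> Display for one query."""
    return {
        doc_id: display(annotate(clean(raw)))
        for doc_id, raw in scrape(query).items()
    }
```

Calling `run("epidemic")` on the sample keeps only `doc1` and reports how many disease and country terms it contains. Each stage takes plain data and returns plain data, so any one of them could be swapped for a real tool without touching the others.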
Sources
https://ethos.bl.uk/Home.do
https://www.redalyc.org/
100,000 Theses
4,700,000 abstracts
50,000 preprints
https://doaj.org https://biorxiv.org
https://medrxiv.org
Mexico, Latin America
https://europepmc.org
And your archive?
“(virus OR viral) AND
epidemic”
45 hits
DOAJ
Directory of Open Access Journals
100,000 abstracts
Only 4.6 million
more to go 0.05%
20 GB total
Clyde Davies
Complete repo would yield
> 2000 articles
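The estimate above follows from scaling the sample: 45 hits in 100,000 abstracts, applied to all 4,700,000. A minimal sketch of both steps is below. It assumes the query is applied as a whole-word match on lower-cased text, which may differ from how DOAJ evaluates it; the function names are hypothetical.

```python
import re


def matches(text):
    """True if text satisfies (virus OR viral) AND epidemic.

    Assumption: whole-word, case-insensitive matching, so "viruses"
    does not count as "virus".
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & {"virus", "viral"}) and "epidemic" in words


def extrapolate(hits, sampled, total):
    """Scale a hit count from a sample up to the full collection."""
    return round(hits * total / sampled)


# Figures from the slides: 45 hits in 100,000 of 4,700,000 abstracts.
estimated_hits = extrapolate(45, 100_000, 4_700_000)
```

With the slide's figures `estimated_hits` is 2115, consistent with "> 2000 articles". The scaling assumes the first 100,000 abstracts are representative of the rest.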
UK Theses (EThOS)
A full-text search API to find relevant
theses, built from EThOS service data and
the tools of the UK Web Archive.
1: Searching eTheses for the openVirus
project
2: Bringing Metadata & Full-text Together
This notebook illustrates how to use the
API
Andy Jackson
framework: ami + CProject data
scrapers: getpapers, Ferret, curl, scrapy
cleaners: PDFBox, Tidy/Jsoup, Grobid, etc.
transformers: xml2html, ami ocr, KNIME
dictionaries: ami dictionary
indexing and annotation: Solr, ami
analysis and display: R, KNIME
openVirus Tools
scrape clean annotate display
Dictionaries
disease.xml
country.xml
Generous support from
Annotation
Dictionary -> ARTICLE
Cooccurrence
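Co-occurrence between two dictionaries can be sketched as counting, per article, every pair of terms where one comes from each dictionary. The XML layout below (a `dictionary` element holding `entry` elements with a `term` attribute) is an assumption made for illustration and may not match the real `disease.xml` and `country.xml`; the terms and function names are hypothetical too.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product

# Assumed dictionary layout; the real ami dictionary files may differ.
DISEASE_XML = """<dictionary title="disease">
  <entry term="ebola"/>
  <entry term="influenza"/>
</dictionary>"""

COUNTRY_XML = """<dictionary title="country">
  <entry term="liberia"/>
  <entry term="mexico"/>
</dictionary>"""


def load_terms(xml_text):
    """Read the lower-cased terms from a dictionary in the assumed layout."""
    root = ET.fromstring(xml_text)
    return {entry.get("term").lower() for entry in root.iter("entry")}


def cooccurrence(docs, terms_a, terms_b):
    """Count articles in which a term from each dictionary appears together.

    Uses simple case-insensitive substring matching, so a term also
    matches inside longer words.
    """
    counts = Counter()
    for text in docs:
        lowered = text.lower()
        found_a = sorted(t for t in terms_a if t in lowered)
        found_b = sorted(t for t in terms_b if t in lowered)
        counts.update(product(found_a, found_b))
    return counts


# Hypothetical article snippets.
DOCS = [
    "Ebola was reported in Liberia.",
    "Influenza surveillance in Mexico and Liberia.",
    "Ebola vaccine trials.",
]

pairs = cooccurrence(DOCS, load_terms(DISEASE_XML), load_terms(COUNTRY_XML))
```

Here `pairs` holds one count each for (ebola, liberia), (influenza, liberia) and (influenza, mexico); the third snippet contributes nothing because it names no country.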
bioRxiv in
Citizen Health Search (CHS)
A proposal to Wellcome Trust
(Open Research in Health call) with
ContentMine, Cochrane and UCL-EPPI (CCU)
CHS puts semantic search on the desktop
of the searcher. We index all the visible
medical literature, normalize, section
and index against a bank of user-chosen
dictionaries.
CHS takes input from EPMC, bioRxiv and
emerging community sources such as
Crossref, unpaywall and outputs to Zenodo,
Wikidata and CM-Science Source.
Citizen Dashboard
5 million Open scientific articles (0.5
TB), indexed by ContentMine.
Disk: 30 GBP. Raspberry Pi 3: 50 GBP.
CC BY, PeterMR
Disk
Raspberry Pi
Power
CONTAINERISATION!
TESTERS!!
GRAPHICS
DOCUMENTING
QUERIES
SCRAPERS
SOFTWARE
Contentmine.org. Join us! We need…
http://github.com/petermr/openVirus
The world’s existential problems need
knowledge
2019* “Open Climate Knowledge” (OCK)
to build tools to mine the scientific articles
about climate change.
Why?
50-90% of all published science is PAYWALLED.
The rest is very hard to find…
*Simon Worthington and PMR
BUT COVID-19 hit …
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to
[peer-reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(Budapest Open Access Initiative, 2002)

openVirus - tools for discovering literature on viruses
