Virtual, 2020-05-28
openVirus
Scientific Knowledge for citizens in the time of COVID
TheContentMine and Others
Presented by Peter Murray-Rust
A new knowledgebase by/for citizens
Images from ContentMine CC BY and Wikimedia CC BY-SA
pm286@cam.ac.uk
peter@contentmine.org
INYAS NIPGR
KARYA
1982 scientific paper predicts 2015 Liberian Ebola outbreak
TIGR2ESS 2019 Delhi
Food Security FEB 2019
Priya and Kareena 3rd year
interns on openVirus
JUNE 2020
Reusable
• Technology
• Strategy
• Management
5 core facets:
- country
- drug
- funder
- disease
- virus
8 miniprojects
Corpus (950 docs)
Dictionary (3000 terms)
3 complex topics:
- zoonosis (animal hosts)
- non-pharmaceutical (masks, social distancing, etc.)
- test and trace
5 core facets:
- country
- disease
- drug
- funder
- virus
SOME TOOLS
framework: ami + CProject data
scrapers: getpapers, Ferret, curl, scrapy
cleaners: PDFBox, Tidy/Jsoup, Grobid, etc.
transformers: xml2html, ami ocr, KNIME
dictionaries: ami dictionary
indexing and annotation: Solr, ami
analysis and display: R, KNIME
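The dictionary and annotation steps in the tool list can be illustrated with a minimal sketch. This is not how ami works internally; it only shows the general idea of dictionary-based faceted annotation, in which each facet (country, disease, virus, ...) has a term list and every document is scanned for hits. The tiny term lists, the `DICTIONARIES` name, and the functions `compile_facet` and `annotate` are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries, one per facet (assumption: the real
# openVirus dictionaries hold thousands of terms and richer metadata).
DICTIONARIES = {
    "country": ["India", "Liberia"],
    "disease": ["COVID-19", "Ebola"],
    "virus": ["SARS-CoV-2", "coronavirus"],
}


def compile_facet(terms):
    """Build one case-insensitive regex that matches any term as a whole token."""
    # Longest terms first, so a longer term wins over a shorter prefix of it.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def annotate(text, dictionaries=DICTIONARIES):
    """Return {facet: Counter(term -> number of hits)} for one document."""
    results = {}
    for facet, terms in dictionaries.items():
        pattern = compile_facet(terms)
        # Map each lower-cased match back to the dictionary's spelling.
        canonical = {term.lower(): term for term in terms}
        results[facet] = Counter(
            canonical[match.group(0).lower()] for match in pattern.finditer(text)
        )
    return results


if __name__ == "__main__":
    sample = "Ebola in Liberia; coronavirus (SARS-CoV-2) causes COVID-19 in India and liberia."
    for facet, hits in annotate(sample).items():
        print(facet, dict(hits))
```

Running `annotate` over every document in a corpus and summing the counters per facet would give the kind of term-frequency table that could then be indexed or plotted with the analysis tools listed above.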
challenges
- connectivity
- language
- equipment (laptops, mobiles)
skills learnt
- science/medicine - viruses and disease
- data science - tools, quality, validation, repositories
- machine learning, NLP
- management - planning, recording, collaboration
Achievement 2020-08-10 Interns Jitu, Urja, Pooja, Simraleen, Omprakash, Dheeraj
Slides from Peter Murray-Rust, CC BY; images NIPGR, Wikimedia, PM-R