OMDI2021, 2021-10-05
Scientific Ontologies in the Digital Age
Peter Murray-Rust
University of Cambridge
Collaborators:
Matthew Dunstan (Cambridge),
Shweata N Hegde (NIPGR)
Images from ContentMine CC BY and Wikimedia CC BY-SA
pm286@cam.ac.uk
peter@contentmine.org
Talk will show a range of ontologies, demos, software, and suggestions.
Ontologies support a
Computable
Reusable
Object
Narrative
• Ontologies are relevant to all fields, not just
materials, so this talk is multidisciplinary
• The software and techniques shown are very
widely applicable
• All ontologies must aim to be FAIR, and
frictionless (helpful, no login).
• Ontologies need software. They are
computable and merge into objects and
declarative programs.
Notes
What do these mean?
Drug Drug
Receptor
Computable Reusable Object Narrative
What’s an Ontology? And its purpose?
Purposes:
• ICD-10: national reporting of every disease/condition
• ICD-10-CM diagnosis and insurance (USA)
We use Dictionary == Ontology
Why ontologies?
• Explanation, translation, to humans
• Data validation e.g. CIF, crystallography
• Data transformation , CML[1] comp. chemistry
• Linking to Linked Data Cloud
• Mining documents by words, e.g. lithium battery
• Mining documents by data, e.g. cell dimension
• Contractual process (ICD-10-CM) US insurance
• Sociopolitical (DSM); linguistic research, e.g. on gender
[1] Chemical Markup Language (Computable
Reusable Object Narrative)
Diagnostic and Statistical Manual of Mental Disorders
Ontologies constrain/enhance the way we think and talk
Where do ontologies come from?
• Authoritative bodies
– Government and NGO (CERN, NIH, EBI, Brookhaven)
– Learned societies (IUCr)
– Major labs
• Industry
• Community
– Researchers [1]
– Wikidata
Long-term ontologies come from years of dedicated human effort, especially
as support is needed.
Adoption requires consistency, running code, tutorials, support, users
[1] In our current projects (e.g. battery materials, or terpene synthases) we
use multiple rapid small linked dictionaries and Wikidata
Ontologies in practice
http://chemicaltagger.ch.cam.ac.uk/
• Typical chemical synthesis
This contains several small annotating and parsing ontologies
Materials discourse
The syntheses of NiO and LiNi0.4Mn0.4Co0.18Ti0.02O2 (NMC) were performed according to
previously developed protocols 56. In short, NiO was synthesized using a solvothermal method aided
with an alcohol pseudo-supercritical drying technique. NMC was synthesized using a co-precipitation
method followed by high-temperature annealing with LiOH. 2032 Coin cells were fabricated using
composites of NiO or NMC as working electrodes and lithium metal foils as counter electrodes. The NiO
working electrodes were composed of 80 wt.% active material, 10 wt.% polyvinylidene fluoride (Kureha
Chemical Ind. Co. Ltd) and 10 wt.% acetylene carbon black (Denka, 50% compressed) and loadings were
typically 12 mg/cm2 of active material. To make the electrodes, these solids were mixed into N-methyl-
2-pyrrolidinone and the resulting slurry cast onto copper current collectors and dried. NMC working
electrodes were prepared similarly and contained 84 wt.% active material, 8 wt.% polyvinylidene
fluoride, 4 wt.% acetylene carbon black and 4 wt.% SFG-6 synthetic graphite on carbon-coated
aluminum current collectors, with typical active material loadings of 67 mg/cm2. The coin cells were
assembled in a helium-filled glove box using Celgard 2400 separators and 1 M LiPF6 electrolyte in 1:2
w/w ethylene carbonate/dimethyl carbonate (Ferro Corporation). Battery testing was performed on a
computer controlled VMP3 potentiostat/galvanostat (BioLogic). NiO and NMC electrodes were cycled at
C/2 and C/20 rates, respectively. 1C was defined as fully discharging or charging an electrode in 1 h,
corresponding to specific current densities of 718 mA/g and 280 mA/g for NiO and NMC materials,
respectively.
SCIENTIFIC REPORTS | 4 : 5694 | DOI: 10.1038/srep05694
Cut-n-paste into http://chemicaltagger.ch.cam.ac.uk/
Written ca 2007 by Lezan Hawizy
Daniel Lowe wrote OPSIN (name to structure)
ChemicalTagger “out of the box” on Materials discourse
The unmarked fields need ontologies!
Crystallography: CIF, where much of this began
Hall SR, Allen FH, Brown ID (1991). "The Crystallographic Information File (CIF):
a new standard archive file for crystallography". Acta Crystallographica Section A. 47 (6):
655–685. doi:10.1107/S010876739101067X.
CIF: crystallographic ontology – a model beyond formats
30 years!
In the late [1970s] the IUCr Commissions […] promoted the
development of the Standard Crystallographic File Structure
“Framework” (model) , not “file”
[late 1980s] IUCr promoted the submission of data
in machine-readable form
CIF Supports:
• Editing
• Checking
• Transformation
• Human discourse
Unique_id
Datatype
Classification
Error limits
Allowed range
Mandated units
Typical CIF data entry
Data can be mixed with text (LaTeX)
Container of Name-value pairs
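The checking role of a dictionary can be sketched in a few lines. The following is a minimal illustration, not CIF software: the `DictionaryEntry` class, the two entries, and the `validate` function are assumptions made up for this example, and real CIF dictionaries carry much more (classification, error limits, and so on).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DictionaryEntry:
    """One hypothetical dictionary definition for a CIF-style data name."""
    unique_id: str
    datatype: type
    units: Optional[str] = None
    allowed_range: Optional[Tuple[float, float]] = None


# Illustrative entries only; real CIF dictionaries are far richer.
DICTIONARY = {
    "_cell_length_a": DictionaryEntry(
        "_cell_length_a", float, "angstrom", (0.0, float("inf"))),
    "_cell_angle_alpha": DictionaryEntry(
        "_cell_angle_alpha", float, "degree", (0.0, 180.0)),
}


def parse_value(raw: str) -> float:
    """Strip a trailing uncertainty such as '5.431(2)' before converting."""
    return float(raw.split("(")[0])


def validate(name: str, raw: str) -> list:
    """Return a list of problems found for one name-value pair."""
    entry = DICTIONARY.get(name)
    if entry is None:
        return [f"{name}: not defined in dictionary"]
    try:
        value = parse_value(raw)
    except ValueError:
        return [f"{name}: '{raw}' is not a valid {entry.datatype.__name__}"]
    problems = []
    if entry.allowed_range is not None:
        low, high = entry.allowed_range
        if not (low <= value <= high):
            problems.append(
                f"{name}: {value} outside allowed range {low}..{high}")
    return problems
```

An empty list from `validate` means the pair passed every check the dictionary entry defines; the same entry could also drive editing and unit conversion.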
Computable Reusable Object
Bibliography. YOU don’t need to invent your own
Communal ontologies
• DBpedia – a dataset containing extracted data from
Wikipedia; it contains about 3.4 million concepts
described by 1 billion triples, including abstracts in 11
different languages
• GeoNames – provides RDF descriptions of more than
7,500,000 geographical features worldwide.
• Wikidata – a collaboratively-created linked dataset that
acts as central storage for the structured data of
its Wikimedia Foundation sister projects
• Global Research Identifier Database (GRID) – an
international database of 89,506 institutions engaged
in academic research
Gene Ontology (GO) and browser links Species, Genes and Proteins
Maize
Proteins
Gene Product
Wikidata in Linked OpenData cloud
https://en.wikipedia.org/wiki/Linked_data
DBpedia
PDB
Drugbank
UniProt
ChEMBL
Gene Ontology
PubChem
DBPedia 2007
Collection of
links in Wikipedia
https://query.wikidata.org/
https://query.wikidata.org/#%23A%20network%20of%20D
#A network of Drug-disease interactions on infectious diseases (Source: Disease Ontology, NDF-RT and ChEMBL)
#defaultView:Graph
SELECT DISTINCT ?item ?itemLabel ?rgb ?link
WHERE
{
VALUES ?toggle { true false }
?disease wdt:P699 ?doid;
wdt:P279+ wd:Q18123741;
wdt:P2176 ?drug.
?drug rdfs:label ?drugLabel.
FILTER(LANG(?drugLabel) = "en").
?disease rdfs:label ?diseaseLabel.
FILTER(LANG(?diseaseLabel) = "en").
BIND(IF(?toggle,?disease,?drug) AS ?item).
BIND(IF(?toggle,?diseaseLabel,?drugLabel) AS ?itemLabel).
BIND(IF(?toggle,"FFA500","7FFF00") AS ?rgb).
BIND(IF(?toggle,"",?disease) AS ?link).
}
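Queries like this can also be sent to the Wikidata SPARQL endpoint from a script. The sketch below uses only the Python standard library; the small `EXAMPLE_QUERY` is an assumption chosen for brevity (it is not the query on the slide), and the helper names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# A deliberately small query (not the one on the slide): three items that are
# instances (P31) of chemical element (Q11344), with English labels.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q11344 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 3
"""


def build_query_url(sparql: str) -> str:
    """Encode the query text and ask for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return WIKIDATA_ENDPOINT + "?" + params


def run_query(sparql: str) -> list:
    """Send the query (needs network access) and return the result rows."""
    request = urllib.request.Request(
        build_query_url(sparql),
        # A descriptive User-Agent; this value is a placeholder.
        headers={"User-Agent": "ontology-demo/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(EXAMPLE_QUERY)` returns a list of rows, each a mapping from variable name (`item`, `itemLabel`) to a small record whose `value` field holds the URI or label.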
Search, annotation, mining
Unsupervised extraction of phrases from 100 papers (YAKE, SciSpacy)
LLZNO ceramic
Active material
Graphene
Ceramic pellets
Open circuit
DMF molecules
Voltage plateau
(Shweata N Hegde, Mysore)
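To give a feel for unsupervised phrase extraction, here is a much simpler stand-in: counting adjacent word pairs that contain no stopword. This is not YAKE or SciSpacy, which use richer statistics and linguistic models; the stopword list and function names are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "was", "were",
             "is", "with", "at", "for", "on"}


def candidate_phrases(text: str) -> list:
    """Return adjacent word pairs that contain no stopword."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9-]*", text.lower())
    return [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]


def top_phrases(documents: list, n: int = 5) -> list:
    """Count candidate phrases over all documents, most frequent first."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_phrases(doc))
    return counts.most_common(n)
```

Phrases that recur across many papers, such as "active material", are natural candidates for a new dictionary entry.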
We make a simple dictionary for materials
<dictionary title="materials">
<entry term="anode" wikipedia="anode"
wikidata="Q181232"
description="electrode through which
conventional current flows into a
polarized electrical device"/>
<entry term="cathode"
wikidata="Q175233"
description="electrode from which …"/>
<entry term="current density"
wikidata="Q77680811"/> …
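A dictionary in this form is easy to load with standard XML tools. In the sketch below the fragment from the slide is closed with `</dictionary>` so that it parses (an assumption, since the slide elides the rest), and the `load_dictionary` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The entries from the slide, closed off so the fragment is well-formed XML.
# The truncated cathode description is kept exactly as elided on the slide.
DICTIONARY_XML = """
<dictionary title="materials">
  <entry term="anode" wikipedia="anode" wikidata="Q181232"
         description="electrode through which conventional current flows into a polarized electrical device"/>
  <entry term="cathode" wikidata="Q175233" description="electrode from which …"/>
  <entry term="current density" wikidata="Q77680811"/>
</dictionary>
"""


def load_dictionary(xml_text: str) -> dict:
    """Map each lower-cased term to its Wikidata identifier (or None)."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("term").lower(): entry.get("wikidata")
        for entry in root.iter("entry")
    }


terms = load_dictionary(DICTIONARY_XML)
```

The resulting mapping links each search term to a Wikidata item, which is what connects a small local dictionary to the Linked Open Data cloud.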
Annotation using Dictionaries
file="liion/PMC4062906/methods/search/
elements/results.xml">
<result pre="foil and dried at
80°C under vacuum for 5 h."
exact="Lithium" post="sheet was served
as counter and reference …"/>
methods = section
PMC4062906 = reference
elements = dictionary
(annotation to W3C spec)
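The pre/exact/post structure of a result can be reproduced with a small search routine. This is a sketch under assumptions, not the ami implementation: the `annotate` and `to_xml` functions, the 40-character context window, and the two-term list are all made up for illustration.

```python
import re
from xml.sax.saxutils import quoteattr


def annotate(text: str, terms: list, context: int = 40) -> list:
    """Find whole-word, case-insensitive hits with surrounding context."""
    # Longest terms first so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        hits.append({
            "pre": text[max(0, match.start() - context):match.start()],
            "exact": match.group(0),
            "post": text[match.end():match.end() + context],
        })
    return hits


def to_xml(hit: dict) -> str:
    """Serialise one hit as a <result/> element with escaped attributes."""
    return "<result pre={} exact={} post={}/>".format(
        quoteattr(hit["pre"]), quoteattr(hit["exact"]), quoteattr(hit["post"])
    )


sentence = ("foil and dried at 80°C under vacuum for 5 h. "
            "Lithium sheet was served as counter and reference")
hits = annotate(sentence, ["lithium", "nickel"])
```

Storing the text before and after the match makes each hit locatable in the source without character offsets, which is why the same shape suits web annotation.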
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.01113
03&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
getpapers -q "lithium-ion battery" -n
info: Searching using eupmc API
info: Found 3305 open access results
framework: ami + CProject data
scrapers: getpapers, Ferret, curl, scrapy
cleaners: PDFBox, Tidy/Jsoup, etc. Grobid
transformers: xml2html, ami ocr, KNIME
dictionaries: ami dictionary
indexing and annotation: Solr, ami
Analysis and display: R, KNIME
ContentMine Tools
scrape clean annotate display
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
Dict A
Dict B
Image
Caption
Table
Caption
MINING
with sections
and dictionaries
[W3C Annotation / https://hypothes.is/ ]
Dashboard of 200 articles searched with 5 dictionaries
biblio country element funder magnetism battery
Downstream Computation:
Co-occurrence of words
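Co-occurrence counting is a short computation once each article has been reduced to the set of dictionary terms found in it. A minimal sketch follows; the `articles` data is invented for illustration and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_terms: list) -> Counter:
    """Count, for each unordered pair of terms, the documents with both."""
    counts = Counter()
    for terms in doc_terms:
        # sorted(set(...)) gives each pair once per document, in a fixed order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical annotation output: the dictionary terms found in each article.
articles = [
    ["lithium", "nickel", "cathode"],
    ["lithium", "cathode"],
    ["nickel", "magnetism"],
]
pairs = cooccurrence(articles)
```

Pairs are stored in alphabetical order, so `pairs[("cathode", "lithium")]` gives the number of articles mentioning both terms.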
Mining images with ontologies
Can we get anything useful with automatic tools?
Work with Matthew Dunstan, Cambridge –
Cyclic voltammetry of battery materials
Raw materials
Ball milling at 800 rpm for 6 h.
Drying at 70 °C for 14h.
Mixture powder
Calcined at 950 °C for 12 h.
LLZNO powder
Attrition milling at 1000 rpm for 2 h and
drying at 70 °C for 14 h.
Submicron LLZNO powder
Pressed into pellets with 19 mm diameter at
200 MPa for 3 min.
Green pellets
Sintering without mother powder.
LLZNO ceramics
Mining text from images with Tesseract
Figure Extracted text
Mining data from plots
Text extracted from the plot (OCR errors kept as extracted):
Legend: Current density: 3 Ag?
x-axis: Cycle numbers (ticks 0 to 1800)
y-axis: Specific capacity (mAh g! (ticks 0 to 1600)
An ontology with units would easily fix the errors
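One way such a repair could work is sketched below, assuming the OCR keeps the letters of a unit but mangles the superscript and spacing. The list of known units and the matching rule are assumptions for illustration, not part of the ami tools.

```python
import re
from typing import Optional

# Hypothetical mini-ontology of units expected in battery cycling plots.
KNOWN_UNITS = ["A g^-1", "mA g^-1", "mAh g^-1"]


def letters_only(text: str) -> str:
    """Reduce a unit string to its letters, dropping digits and symbols."""
    return re.sub(r"[^A-Za-z]", "", text)


# Index each known unit by its letters, e.g. "mAhg" -> "mAh g^-1".
UNIT_BY_KEY = {letters_only(unit): unit for unit in KNOWN_UNITS}


def repair_unit(ocr_text: str) -> Optional[str]:
    """Return the known unit whose letters match the OCR output, if any."""
    return UNIT_BY_KEY.get(letters_only(ocr_text))
```

With this rule `repair_unit("Ag?")` yields `"A g^-1"` and `repair_unit("mAh g!")` yields `"mAh g^-1"`. The match is only safe because the ontology says which units to expect for this kind of plot; without that context "Ag" could equally be read as silver.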
Extraction of data from diagrams
ami-image
ami-pixel
Force fields need a computable ontology
Chemical Markup Language (CML) + dictionaries supports
chemistry as a computable ontology
• Reactions
• Spectra
• Crystallography and nano
• Polymers
• Computational chemistry
CompChem Logfile (VASP, CASTEP, etc.)
Free Document Sources
https://ethos.bl.uk/Home.do
https://www.redalyc.org/
100,000 Theses
4,700,000 abstracts
50,000 preprints
https://doaj.org https://biorxiv.org
https://medrxiv.org
Mexico, Latin America
https://europepmc.org
And your archive?
UK Theses (EThOS)
Data from the EThOS service and the tools of the UK Web Archive are combined
into a full-text search API to find relevant theses.
1: Searching eTheses for the openVirus
project
2: Bringing Metadata & Full-text Together
This notebook illustrates how to use the
API
Andy Jackson
All tools (mining, ontologies, etc.) are Open.
Happy to collaborate.
pm286@cam.ac.uk
https://github.com/petermr : many repositories
Thanks:
Matthew Dunstan: Batteries
Shweata N Hegde: word extractions
Ayush Garg: pygetpapers
Lezan Hawizy: ChemicalTagger
Daniel Lowe: OPSIN
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age

More Related Content

What's hot

ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?petermurrayrust
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literaturepetermurrayrust
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017petermurrayrust
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? TheContentMine
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europepetermurrayrust
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literaturepetermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature TheContentMine
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!petermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literaturepetermurrayrust
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in CambridgeTheContentMine
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)TheContentMine
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureTheContentMine
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migrationpetermurrayrust
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistrypetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literaturepetermurrayrust
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literaturepetermurrayrust
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of FoodBenjamin Good
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Open Tree of Life @NSF
Open Tree of Life @NSFOpen Tree of Life @NSF
Open Tree of Life @NSFKaren Cranston
 

What's hot (20)

ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
 
Biovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literatureBiovision2017 Accessing the scientific literature
Biovision2017 Accessing the scientific literature
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
 
Content Mining of Science in Europe
Content Mining of Science in EuropeContent Mining of Science in Europe
Content Mining of Science in Europe
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literature
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in Cambridge
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Can machines understand the scientific literature
Can machines understand the scientific literatureCan machines understand the scientific literature
Can machines understand the scientific literature
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Open Tree of Life @NSF
Open Tree of Life @NSFOpen Tree of Life @NSF
Open Tree of Life @NSF
 

Similar to Omdi2021 Ontologies for (Materials) Science in the Digital Age

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxChris Mungall
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeChris Mungall
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...Dr. Haxel Consult
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
 
FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCPChris Southan
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Valery Tkachenko
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSAksw Group
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsAnubhav Jain
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...Anubhav Jain
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyBarry Smith
 
biotechnology of aminophenol PhD defenseppt.ppt
biotechnology of aminophenol PhD defenseppt.pptbiotechnology of aminophenol PhD defenseppt.ppt
biotechnology of aminophenol PhD defenseppt.pptmisgana18
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsNina Jeliazkova
 
DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...Núria Queralt Rosinach
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astrowebuploader
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityChris Southan
 

Similar to Omdi2021 Ontologies for (Materials) Science in the Digital Age (20)

MADICES Mungall 2022.pptx
MADICES Mungall 2022.pptxMADICES Mungall 2022.pptx
MADICES Mungall 2022.pptx
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
NRNB EAC Meeting 2012
NRNB EAC Meeting 2012NRNB EAC Meeting 2012
NRNB EAC Meeting 2012
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 
The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to Terminology
 
FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...
 
ICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials ProjectICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials Project
 
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGSEVOLUTION OF ONTOLOGY-BASED MAPPINGS
EVOLUTION OF ONTOLOGY-BASED MAPPINGS
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
 
Liquid Chromatography
Liquid ChromatographyLiquid Chromatography
Liquid Chromatography
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
 
biotechnology of aminophenol PhD defenseppt.ppt
biotechnology of aminophenol PhD defenseppt.pptbiotechnology of aminophenol PhD defenseppt.ppt
biotechnology of aminophenol PhD defenseppt.ppt
 
On chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurementsOn chemical structures, substances, nanomaterials and measurements
On chemical structures, substances, nanomaterials and measurements
 
DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astro
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 

More from petermurrayrust

Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practicepetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literaturepetermurrayrust
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusespetermurrayrust
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
Omdi2021 Ontologies for (Materials) Science in the Digital Age

  • 1. OMDI2021 , 2021-10-05 Scientific Ontologies in the Digital Age Peter Murray-Rust University of Cambridge collaborators Matthew Dunstan (Cambridge) , Shweata N Hegde (NIPGR) Images from ContentMine CC BY and Wikimedia CC BY-SA pm286@cam.ac.uk peter@contentmine.org Talk will show a range of ontologies, demos, software, and suggestions.
  • 3. • Ontologies are relevant to all fields, not just materials, so this talk is multidisciplinary • The software and techniques shown are very widely applicable • All ontologies must aim to be FAIR, and frictionless (helpful, no login). • Ontologies need software. They are computable and merge into objects and declarative programs. Notes
  • 4. What do these mean? Drug Drug Receptor Computable Reusable Object Narrative
  • 5. What’s an Ontology? And its purpose? Purposes: • ICD-10 for national reporting of every disease/condition • ICD-10-CM for diagnosis and insurance (USA) We use Dictionary == Ontology
  • 6. Why ontologies? • Explanation, translation, to humans • Data validation e.g. CIF, crystallography • Data transformation , CML[1] comp. chemistry • Linking to Linked Data Cloud • Mining documents by words, e.g. lithium battery • Mining documents by data, e.g. cell dimension • Contractual process (ICD-10-CM) US insurance • Sociopolitical (DSM). Linguistic research gender [1] Chemical Markup Language (Computable Reusable Object Narrative)
  • 7. Diagnostic and Statistical Manual of Mental Disorders Ontologies constrain/enhance the way we think and talk
  • 8. Where do ontologies come from? • Authoritative bodies – Government and NGO (CERN, NIH, EBI, Brookhaven) – Learned societies (IUCr) – Major labs • Industry • Community – Researchers [1] – Wikidata Long-term ontologies come from years of dedicated human effort, especially as support is needed. Adoption requires consistency, running code, tutorials, support, users [1] In our current projects (e.g. battery materials, or terpene synthases) we use multiple rapid small linked dictionaries and Wikidata
  • 10. http://chemicaltagger.ch.cam.ac.uk/ • Typical chemical synthesis. This contains several small annotating and parsing ontologies.
  • 12. Materials discourse. The syntheses of NiO and LiNi0.4Mn0.4Co0.18Ti0.02O2 (NMC) were performed according to previously developed protocols (ref. 56). In short, NiO was synthesized using a solvothermal method aided with an alcohol pseudo-supercritical drying technique. NMC was synthesized using a co-precipitation method followed by high-temperature annealing with LiOH. 2032 Coin cells were fabricated using composites of NiO or NMC as working electrodes and lithium metal foils as counter electrodes. The NiO working electrodes were composed of 80 wt.% active material, 10 wt.% polyvinylidene fluoride (Kureha Chemical Ind. Co. Ltd) and 10 wt.% acetylene carbon black (Denka, 50% compressed) and loadings were typically 12 mg/cm² of active material. To make the electrodes, these solids were mixed into N-methyl-2-pyrrolidinone and the resulting slurry cast onto copper current collectors and dried. NMC working electrodes were prepared similarly and contained 84 wt.% active material, 8 wt.% polyvinylidene fluoride, 4 wt.% acetylene carbon black and 4 wt.% SFG-6 synthetic graphite on carbon-coated aluminum current collectors, with typical active material loadings of 67 mg/cm². The coin cells were assembled in a helium-filled glove box using Celgard 2400 separators and 1 M LiPF6 electrolyte in 1:2 w/w ethylene carbonate/dimethyl carbonate (Ferro Corporation). Battery testing was performed on a computer controlled VMP3 potentiostat/galvanostat (BioLogic). NiO and NMC electrodes were cycled at C/2 and C/20 rates, respectively. 1C was defined as fully discharging or charging an electrode in 1 h, corresponding to specific current densities of 718 mA/g and 280 mA/g for NiO and NMC materials, respectively. Source: SCIENTIFIC REPORTS | 4 : 5694 | DOI: 10.1038/srep05694. Cut-n-paste into http://chemicaltagger.ch.cam.ac.uk/ (written ca. 2007 by Lezan Hawizy). Daniel Lowe wrote OPSIN (name to structure).
  • 13. ChemicalTagger “out of the box” on Materials discourse. The unmarked fields need ontologies!
  • 14. Crystallography: CIF, where much of this began
  • 15. CIF: crystallographic ontology, a model beyond formats. 30 years! In the late [1970s] the IUCr Commissions […] promoted the development of the Standard Crystallographic File Structure. “Framework” (model), not “file”. [late 1980s] IUCr promoted the submission of data in machine-readable form. Reference: Hall SR, Allen FH, Brown ID (1991). "The Crystallographic Information File (CIF): a new standard archive file for crystallography". Acta Crystallographica Section A. 47 (6): 655–685. doi:10.1107/S010876739101067X.
  • 16. CIF supports: • Editing • Checking • Transformation • Human discourse. Typical CIF data entry (labels on the figure): Unique_id, Datatype, Classification, Error limits, Allowed range, Mandated units. Data can be mixed with text (LaTeX). Container of name-value pairs.
  • 17. CIF supports: • Editing • Checking • Transformation • Human discourse. Typical CIF data entry (labels on the figure): Unique_id, Datatype, Classification, Error limits, Allowed range, Mandated units. Data can be mixed with text (LaTeX). Container of name-value pairs. Computable Reusable Object
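The "Checking" role of a dictionary entry (datatype, allowed range, mandated units) can be sketched in a few lines of Python. This is only an illustration: `DictionaryEntry`, `CELL_DICTIONARY`, `parse_number` and `check_item` are hypothetical names, the two entries are a hand-written stand-in for a real CIF dictionary, and only the common `value(su)` notation for error limits is handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """A hypothetical, much-reduced dictionary definition for one data name."""
    name: str
    units: str
    minimum: float
    maximum: float


# Assumed mini-dictionary for illustration; a real CIF dictionary is far richer.
CELL_DICTIONARY = {
    "_cell_length_a": DictionaryEntry("_cell_length_a", "angstrom", 0.0, float("inf")),
    "_cell_angle_alpha": DictionaryEntry("_cell_angle_alpha", "degree", 0.0, 180.0),
}

# A number optionally followed by a standard uncertainty in parentheses, e.g. 5.4321(5)
NUMBER_WITH_SU = re.compile(r"^([+-]?\d+(?:\.\d+)?)(?:\((\d+)\))?$")


def parse_number(text):
    """Return (value, standard_uncertainty or None) for a string such as '5.4321(5)'."""
    match = NUMBER_WITH_SU.match(text.strip())
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    mantissa, su_digits = match.group(1), match.group(2)
    value = float(mantissa)
    if su_digits is None:
        return value, None
    # The uncertainty applies to the last quoted decimal places of the value.
    decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    return value, int(su_digits) * 10 ** (-decimals)


def check_item(name, raw_value, dictionary=CELL_DICTIONARY):
    """Return a list of problems found for one name-value pair (empty if none)."""
    entry = dictionary.get(name)
    if entry is None:
        return [f"{name}: not defined in dictionary"]
    try:
        value, _ = parse_number(raw_value)
    except ValueError as exc:
        return [f"{name}: {exc}"]
    if not (entry.minimum <= value <= entry.maximum):
        return [f"{name}: {value} {entry.units} outside allowed range "
                f"[{entry.minimum}, {entry.maximum}]"]
    return []


if __name__ == "__main__":
    for item in [("_cell_length_a", "5.4321(5)"), ("_cell_angle_alpha", "190.0")]:
        print(item, check_item(*item))
```

The point of the design is that the rules live in the dictionary, not in the checking code: adding a new data name means adding an entry, not writing a new function.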
  • 18.
  • 19. Bibliography. YOU don’t need to invent your own
  • 21. • DBpedia – a dataset containing extracted data from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages • GeoNames – provides RDF descriptions of more than 7,500,000 geographical features worldwide. • Wikidata – a collaboratively-created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sister projects • Global Research Identifier Database (GRID) – an international database of 89,506 institutions engaged in academic research
  • 22. Gene Ontology (GO) and browser links Species, Genes and Proteins Maize Proteins Gene Product
  • 23. Wikidata in Linked OpenData cloud
  • 27. https://query.wikidata.org/ https://query.wikidata.org/#%23A%20network%20of%20D
    #A network of Drug-disease interactions on infectious diseases (Source: Disease Ontology, NDF-RT and ChEMBL)
    #defaultView:Graph
    SELECT DISTINCT ?item ?itemLabel ?rgb ?link
    WHERE {
      VALUES ?toggle { true false }
      ?disease wdt:P699 ?doid;
               wdt:P279+ wd:Q18123741;
               wdt:P2176 ?drug.
      ?drug rdfs:label ?drugLabel.
      FILTER(LANG(?drugLabel) = "en").
      ?disease rdfs:label ?diseaseLabel.
      FILTER(LANG(?diseaseLabel) = "en").
      BIND(IF(?toggle,?disease,?drug) AS ?item).
      BIND(IF(?toggle,?diseaseLabel,?drugLabel) AS ?itemLabel).
      BIND(IF(?toggle,"FFA500","7FFF00") AS ?rgb).
      BIND(IF(?toggle,"",?disease) AS ?link).
    }
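A query of this kind can also be sent from a program rather than the web form. The sketch below assumes the public SPARQL endpoint at `https://query.wikidata.org/sparql` with `format=json`; the shortened query reuses only the disease and drug pattern from the slide, and `build_url`, `run_query` and the User-Agent string are illustrative names, not part of any ContentMine tool.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Reduced version of the slide's query: infectious diseases and the drugs
# recorded as treating them. wd:/wdt: prefixes are predefined by the service.
QUERY = """
SELECT ?disease ?drug WHERE {
  ?disease wdt:P279+ wd:Q18123741;
           wdt:P2176 ?drug.
}
LIMIT 10
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request asking for JSON results."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def run_query(query, user_agent="ontology-demo/0.1 (hypothetical contact)"):
    """Send the query and return the list of result bindings (needs network access)."""
    request = urllib.request.Request(build_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row["disease"]["value"], row["drug"]["value"])
```

Building the URL separately from sending it keeps the encoding step testable without a network connection.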
  • 28.
  • 29.
  • 31. Unsupervised extraction of phrases from 100 papers (YAKE, SciSpacy): LLZNO ceramic; Active material; Graphene; Ceramic pellets; Open circuit; DMF molecules; Voltage plateau (Shweata N Hegde, Mysore)
  • 33. We make a simple dictionary for materials <dictionary title="materials"> <entry term="anode" wikipedia="anode" wikidata="Q181232" description="electrode through which conventional current flows into a polarized electrical device"/> <entry term="cathode" wikidata="Q175233" description="electrode from which …"/> <entry term="current density" wikidata="Q77680811"/> …
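A dictionary in this form is ordinary XML, so it can be read with a standard parser. The sketch below is a minimal illustration and not the `ami` implementation: `load_dictionary` is a hypothetical helper, and the elided description of "cathode" is left out rather than guessed.

```python
import xml.etree.ElementTree as ET

# The three entries shown on the slide; the truncated "cathode" description is omitted.
DICTIONARY_XML = """
<dictionary title="materials">
  <entry term="anode" wikipedia="anode" wikidata="Q181232"
         description="electrode through which conventional current flows into a polarized electrical device"/>
  <entry term="cathode" wikidata="Q175233"/>
  <entry term="current density" wikidata="Q77680811"/>
</dictionary>
"""


def load_dictionary(xml_text):
    """Map each lower-cased term to its Wikidata identifier (None if absent)."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("term").lower(): entry.get("wikidata")
        for entry in root.iter("entry")
        if entry.get("term")
    }


if __name__ == "__main__":
    for term, qid in load_dictionary(DICTIONARY_XML).items():
        print(f"{term}: https://www.wikidata.org/wiki/{qid}")
```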
  • 34. Annotation using Dictionaries file="liion/PMC4062906/methods/search/elements/results.xml"> <result pre="foil and dried at 80°C under vacuum for 5 h." exact="Lithium" post="sheet was served as counter and reference …"/> Key: methods = section; PMC4062906 = reference; elements = dictionary (annotation to W3C spec)
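The pre / exact / post shape of a result can be reproduced with a simple term matcher. The following is a sketch under stated assumptions, not the `ami` search code: `annotate` and `to_xml` are hypothetical helpers, the term list stands in for an "elements" dictionary, and matching is plain case-insensitive whole-word search.

```python
import re
import xml.etree.ElementTree as ET


def annotate(text, terms, context=60):
    """Find dictionary terms in text and keep the surrounding context of each hit."""
    hits = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), {
                "pre": text[max(0, match.start() - context):match.start()],
                "exact": match.group(0),
                "post": text[match.end():match.end() + context],
            }))
    hits.sort(key=lambda hit: hit[0])  # report hits in document order
    return [attributes for _, attributes in hits]


def to_xml(hits):
    """Serialise hits as <result pre=... exact=... post=.../> elements."""
    root = ET.Element("results")
    for attributes in hits:
        ET.SubElement(root, "result", attributes)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sentence fragment taken from the slide; the term list is an assumption.
    sample = ("foil and dried at 80°C under vacuum for 5 h. "
              "Lithium sheet was served as counter and reference")
    print(to_xml(annotate(sample, ["lithium", "copper"])))
```

Keeping the context on both sides of the match is what lets a reader, or a later program, judge whether the hit is really the concept in the dictionary.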
  • 35.
  • 36. What is “Content”? http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY. Content types: SECTIONS, MAPS, TABLES, CHEMISTRY, TEXT, MATH. contentmine.org tackles these.
  • 38. getpapers -q "lithium-ion battery" -n info: Searching using eupmc API info: Found 3305 open access results
  • 39. framework: ami + CProject data scrapers: getpapers, Ferret, curl, scrapy cleaners: PDFBox, Tidy/Jsoup, etc. Grobid transformers: xml2html, ami ocr, KNIME dictionaries: ami dictionary indexing and annotation: Solr, ami Analysis and display: R, KNIME ContentMine Tools scrape clean annotate display
  • 40. MINING with sections and dictionaries [W3C Annotation / https://hypothes.is/ ]. Diagram labels: abstract, methods, references, Captioned Figures (Fig. 1), HTML tables; Dict A, Dict B; Image Caption, Table Caption.
  • 41. Dashboard of 200 articles searched with 5 dictionaries. Column labels: biblio, country, element, funder, magnetism, battery
  • 44. Mining images with ontologies
  • 45. Can we get anything useful with automatic tools? Work with Matthew Dunstan, Cambridge – Cyclic voltammetry of battery materials
  • 46. Mining text from images with Tesseract (figure and extracted text). Extracted text: Raw materials -> Ball milling at 800 rpm for 6 h. Drying at 70 °C for 14 h. -> Mixture powder -> Calcined at 950 °C for 12 h. -> LLZNO powder -> Attrition milling at 1000 rpm for 2 h and drying at 70 °C for 14 h. -> Submicron LLZNO powder -> Pressed into pellets with 19 mm diameter at 200 MPa for 3 min. -> Green pellets -> Sintering without mother powder. -> LLZNO ceramics
  • 47. Mining data from plots. Extracted text, with its OCR errors: "Current density: 3 Ag?"; x-axis "Cycle numbers" (ticks 0 300 600 900 1200 1500 1800); y-axis "Specific capacity (mAh g!" (ticks 1600 1400 600 400 200 0). An ontology with units would easily fix the errors.
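One way an ontology with units could repair such labels is to look up the unit expected for the quantity and accept the OCR string when only the non-letter characters differ. This is a sketch only: the `UNITS` table, the canonical spellings `A g^-1` and `mAh g^-1`, and `repair_unit` are assumptions made for illustration, based on the likely reading of the mangled superscripts.

```python
import re

# Hypothetical units ontology: quantity name -> canonical unit string.
UNITS = {
    "current density": "A g^-1",
    "specific capacity": "mAh g^-1",
}


def letters_only(text):
    """Drop everything except letters; OCR tends to mangle superscripts and signs."""
    return re.sub(r"[^A-Za-z]", "", text)


def repair_unit(quantity, ocr_unit, units=UNITS):
    """Return the canonical unit if the OCR string plausibly matches it, else None."""
    expected = units.get(quantity.strip().lower())
    if expected is None:
        return None
    if letters_only(ocr_unit) == letters_only(expected):
        return expected
    return None


if __name__ == "__main__":
    print(repair_unit("Current density", "Ag?"))
    print(repair_unit("Specific capacity", "mAh g!"))
```

The comparison is deliberately strict on letters and their case, so an unrelated unit is rejected rather than silently "corrected".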
  • 48. Extraction of data from diagrams
  • 50. Force fields need a computable ontology
  • 51. Chemical Markup Language (CML) + dictionaries supports chemistry as a computable ontology • Reactions • Spectra • Crystallography and nano • Polymers • Computational chemistry
  • 52. CML + dictionaries supports chemistry as a computable ontology • Reactions • Spectra • Crystallography and nano • Polymers • Computational chemistry CompChem Logfile (VASP, CASTEP, etc…
  • 53.
  • 54.
  • 55.
  • 56. Free Document Sources https://ethos.bl.uk/Home.do https://www.redalyc.org/ 100,000 Theses 4,700,000 abstracts 50,000 preprints https://doaj.org https://biorxiv.org https://medrxiv.org Mexico, Latin America https://europepmc.org And your archive?
  • 57. UK Theses (EThOS). Data from the EThOS service and the tools of the UK Web Archive -> a full-text search API to find relevant theses. 1: Searching eTheses for the openVirus project. 2: Bringing Metadata & Full-text Together. This notebook illustrates how to use the API (Andy Jackson).
  • 58. All tools (mining, ontologies, etc.) are Open. Happy to collaborate. pm286@cam.ac.uk https://github.com/petermr : many repositories. Thanks: Matthew Dunstan: batteries; Shweata N Hegde: word extractions; Ayush Garg: pygetpapers; Lezan Hawizy: ChemicalTagger; Daniel Lowe: OPSIN