The Semantic Web in Physical Science/Engineering
Peter Murray-Rust
University of Cambridge
Open Knowledge Foundation
Culham Laboratory, 2013-09-11, UK
Themes
To make the complete scientific literature
accessible to machines and humans
• The Semantic Web
• The power and need for Open
• Building Communities
• Multidisciplinarity

Funding includes JISC, Unilever, EPSRC.
The Semantic Web
"The Semantic Web is an extension of the
current web in which information is given well-defined meaning, better enabling computers
and people to work in cooperation."
Tim Berners-Lee, James Hendler, Ora Lassila, The
Semantic Web, Scientific American, May 2001
The scientist’s amanuensis
• "The bane of my life is doing things I know computers could do
for me" (Dan Connolly, W3C)
Example: A semantic amanuensis could
• Give me a daily digest of zeolite papers
• Extract all the crystal structures from them
• Compute physical properties with GULP and NWChem
• Compare the results statistically
• Preserve and distribute the complete operation
• Prepare the results for publication

The semantic web is having a personal amanuensis
Linked Open Data – the world’s knowledge
[Figure: Linked Open Data cloud diagram (RDF triples), with regions labelled Music, Social, Art, Literature; Knowledge bases; DBPedia; Lib; GOV.uk; Comp; PDB; GOV; Ontologies; BIO]
Very little physical science
http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
Linked Open Data from Wikipedia

“Which Rivers flow into the Rhine and are longer
than 50 kilometers?” or “Which Skyscrapers
in China have more than 50 floors and have
been constructed before the year 2000?”
Open Crystallography?
“Which countries where tropical diseases are
endemic have published structures of chiral
natural products?”
CC-BY-SA from Wikipedia
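
Questions like the river example above become simple filters once facts are stored as subject-predicate-object triples. The following is a minimal in-memory sketch in Python, not DBpedia's vocabulary or query API: the predicate names (flowsInto, lengthKm) are hypothetical, the lengths are rounded illustrative values, and the last tributary is made up.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Predicate names and values are illustrative assumptions, not DBpedia terms.
TRIPLES = [
    ("Moselle", "flowsInto", "Rhine"),
    ("Moselle", "lengthKm", 544),
    ("Main", "flowsInto", "Rhine"),
    ("Main", "lengthKm", 525),
    ("ExampleBrook", "flowsInto", "Rhine"),  # made-up short tributary
    ("ExampleBrook", "lengthKm", 12),
    ("Inn", "flowsInto", "Danube"),
    ("Inn", "lengthKm", 518),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o for which (subject, predicate, o) is a known triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def tributaries_longer_than(river, min_km, triples=TRIPLES):
    """Rivers that flow into `river` and have a recorded length above `min_km`."""
    candidates = [s for s, p, o in triples if p == "flowsInto" and o == river]
    return sorted(
        name for name in candidates
        if any(length > min_km for length in objects(name, "lengthKm", triples))
    )


result = tributaries_longer_than("Rhine", 50)
print(result)
```

The crystallography question would need the same kind of join, but across datasets (countries, diseases, structures), which is why linking crystallographic data into the cloud matters.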
Semantics: (Things Take Time)*
• 1994 1st WWW Conference
• 1994 Chemical MIME, Chemical Markup
Language (Henry Rzepa, PMR)
• 2001 UK eScience programme, eMinerals
• 2005 Materials Grid (Martin Dove group)
• 2006 Blue Obelisk (Open Source chemistry)
• 2011 PNNL (US) meetings and visit
• 2012 Semantic Physical Science (Cambridge)
*TTT: Piet Hein
Componentised approach liberates

Individual, manual,
unreusable, flaky

Commodity, standard,
reliable, re-usable
Representing Semantics
Interoperating approaches:
Markup Languages (“hardcoded objects”): MathML, G(eo)ML, CellML, S(ys)B(io)ML
• CML (Chemistry and numeric science):
1. Molecules (atoms, bonds, coordinates)
2. Reactions
3. Spectra
4. Solid state
5. Computation

RDF (relationships, annotations, linking)
Ontologies (Dictionaries)
Humans and machines use different
languages
Scalable Vector Graphics (SVG)
Human-friendly

Automatic!

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1280" height="640" viewBox="0 0 30240 15120">
<defs id="defs6">
<polygon points="0,-9 1.735535,-3.6038755 7.0364833,-5.6114082 3.8997116,-0.89008374 8.7743512,2.0026884 3.1273259,2.4939592 3.9049537,8.1087198 0,4 -3.9049537,8.1087198 -3.1273259,2.4939592 -8.7743512,2.0026884 -3.8997116,-0.89008374 -7.0364833,-5.6114082 -1.735535,-3.6038755 0,-9 " id="Star7"/>
</defs>
<path d="M 0,0 L 30240,0 L 30240,15120 L 0,15120 L 0,0 z" style="fill:#00008b"/>
<use transform="matrix(252,0,0,252,7560,11340)" id="Commonwealth_Star" style="fill:#fff" xlink:href="#Star7"/>
<use transform="matrix(120,0,0,120,22680,12600)" id="Star_Alpha_Crucis" style="fill:#fff" xlink:href="#Star7"/>
<!-- snipped -->
217,2520 L 10080,2520 L 15120,0 z" id="Red_Diagonals" style="fill:red"/>
<use transform="matrix(-1,0,0,-1,15120,7560)" id="Red_Diagonals_Rotated" style="fill:red" xlink:href="#Red_Diagonals"/>
</svg>

Machine-friendly
MathML

Mathematics Markup Language
Energy of c.c.p lattice of argon

Automatic!

Human-friendly

4 pages clipped

Many editors and tools exist
We used MathWeaver

Machine-friendly
CML (Chemical Markup Language)

Automatic!

Human-friendly

Machine-friendly
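
The machine-friendly side of a CML molecule can be sketched with Python's standard XML library. This is an illustration, not output from the author's tools: the element and attribute names (molecule, atomArray, atom, elementType, x3/y3/z3, bondArray, bond, atomRefs2, order) and the namespace are used here as they are understood from the CML schema, and the water coordinates are approximate values chosen for the example.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def water_as_cml():
    """Build a small CML-style <molecule> element for water."""
    ET.register_namespace("", CML_NS)  # serialize CML as the default namespace
    mol = ET.Element(f"{{{CML_NS}}}molecule", {"id": "water"})

    # Atoms: id, element symbol and approximate 3D coordinates in angstroms.
    atoms = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for atom_id, element, (x, y, z) in [
        ("a1", "O", (0.000, 0.000, 0.000)),
        ("a2", "H", (0.757, 0.586, 0.000)),
        ("a3", "H", (-0.757, 0.586, 0.000)),
    ]:
        ET.SubElement(atoms, f"{{{CML_NS}}}atom", {
            "id": atom_id, "elementType": element,
            "x3": f"{x:.3f}", "y3": f"{y:.3f}", "z3": f"{z:.3f}",
        })

    # Bonds refer to atoms by id, so a program never has to guess connectivity.
    bonds = ET.SubElement(mol, f"{{{CML_NS}}}bondArray")
    for a, b in [("a1", "a2"), ("a1", "a3")]:
        ET.SubElement(bonds, f"{{{CML_NS}}}bond",
                      {"atomRefs2": f"{a} {b}", "order": "1"})
    return mol


cml_text = ET.tostring(water_as_cml(), encoding="unicode")
print(cml_text)
```

A renderer can turn the same element tree into the human-friendly picture automatically, which is the point of the "Automatic!" arrow on the slide.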
Current scientific information flow
… is broken for data-rich science
[Diagram labels: Non-semantic data; PDF; Lineprinter output; Human input; Text files; Data extraction difficult and incomplete; Human readers]
Semantic network closes the loop
[Diagram labels: Measurement; Computation; Semantic Authoring; Analysis; Community; Data available for e-science and reuse; Data mined from document]
The network grows autonomously

Human-machine

Human-human

Machine-human

Machine-machine
• Example: Materials 2012, 5, 27-46;
doi:10.3390/ma5010027
CHEMICAL STRUCTURES
REACTIONS

ABBREVIATIONS
“… electron donor (ED), such as an electron rich,
metal-based light absorber (LA), and electron
acceptor (EA) sites.”
SPECTRA
TABLES
PROPERTIES (NAME-VALUE-UNITS)

[Figure: extracted properties annotated with N (Name), V (Value) and U (Units) labels]

Note: CML supports value ranges and errors
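
A name-value-units property, with the optional error and range mentioned in the note above, can be sketched as a small record type. This is an illustration in Python, not the CML schema itself: the class name, the field names and the example values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    """One extracted fact: a named quantity with units and optional uncertainty."""
    name: str
    units: str
    value: Optional[float] = None    # single reported value
    error: Optional[float] = None    # uncertainty on `value`, in the same units
    minimum: Optional[float] = None  # lower bound when a range is reported
    maximum: Optional[float] = None  # upper bound when a range is reported

    def __post_init__(self):
        has_value = self.value is not None
        has_range = self.minimum is not None and self.maximum is not None
        if has_value == has_range:
            raise ValueError("give either a value or a (minimum, maximum) range")
        if has_range and self.minimum > self.maximum:
            raise ValueError("minimum must not exceed maximum")

    def describe(self) -> str:
        """Render the property as 'name = value ± error units' or as a range."""
        if self.value is not None:
            err = f" ± {self.error:g}" if self.error is not None else ""
            return f"{self.name} = {self.value:g}{err} {self.units}"
        return f"{self.name} = {self.minimum:g} to {self.maximum:g} {self.units}"


# Hypothetical values, for illustration only.
melting = Property("melting point", "K", value=432.5, error=0.5)
window = Property("absorption band", "nm", minimum=420, maximum=480)
print(melting.describe())
print(window.describe())
```

Keeping units as an explicit field, rather than part of a free-text string, is what lets a machine compare values from different papers.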
Mathematics

CML is being integrated with
computable (content) MathML
Materials Search Challenge
• What would you like a “Google for materials”
to find for you in the scientific literature?
Tim Berners-Lee’s Open Data
http://5stardata.info
★ make your stuff available on the Web (whatever format) under an OPEN license
★★ make it available as structured data (i.e. NOT PDF)
★★★ use non-proprietary formats (e.g., CSV)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Crystallographic examples marked against the stars on the slide: CIFDIC, ACS, IUCr, CRYSTALEYE, COD
• http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
Creating semantic content
1. Authoring tools for humans
2. Program output
3. Chemical databases
4. Content mining and Natural Language Processing (Text) (NLP)
5. Community
Semantic authoring IUCr
• http://blogs.ch.cam.ac.uk/pmr/2012/01/23/brian-mcmahonpublishing-semantic-crystallography-every-science-data-publishershould-watch-this-all-the-way-through/
• 1:08 CIF
• 3:36 CIF Syntax and dataTypes
• 4:30 Publishing with CIF
• 6:41 Demonstration: CheckCIF
• 12:02 Interactive Chemical validation
• 14:42 Linking data to journal article and search for novelty of data
• 15:08 Jmol display applet
• 21:03 Supplementary data
• 21:47 PublCIF a tool to merge data and text and annotate them
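
CIF stores each datum as a data name beginning with an underscore, followed by its value; numeric values may carry a standard uncertainty in parentheses. The sketch below reads only simple one-line items and deliberately ignores loops and multi-line text fields, so it is not a substitute for a real CIF parser. The sample block is made up for illustration.

```python
import re

SAMPLE_CIF = """\
data_example
_cell_length_a    5.4307(2)
_cell_length_b    5.4307(2)
_cell_angle_alpha 90
_symmetry_space_group_name_H-M 'F d -3 m'
"""

# A number optionally followed by a standard uncertainty in parentheses,
# e.g. 5.4307(2) means 5.4307 with an uncertainty of 2 in the last digit.
NUMBER_WITH_SU = re.compile(r"^(-?\d+(?:\.(\d+))?)(?:\((\d+)\))?$")


def parse_simple_cif(text):
    """Parse one-line `_name value` items of a CIF data block into a dict."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, comments, loop_ keywords, blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # a bare data name (as inside a loop_) has no inline value
        name, raw = parts[0], parts[1].strip()
        match = NUMBER_WITH_SU.match(raw)
        if match:
            value = float(match.group(1))
            decimals = len(match.group(2) or "")
            su = int(match.group(3)) * 10 ** -decimals if match.group(3) else None
            items[name] = (value, su)  # numeric value and its uncertainty
        else:
            items[name] = raw.strip("'\"")  # plain or quoted string
    return items


cell = parse_simple_cif(SAMPLE_CIF)
print(cell["_cell_length_a"])
```

Because every data name is defined in a CIF dictionary, the parsed names can be mapped to shared definitions rather than guessed from context.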
Semanticizing Logfiles: JumboConverters

LOGFILE
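
Semanticizing a logfile amounts to turning free-text output lines into named, typed values with units. The sketch below is written in Python and is not the JumboConverters code; the log format, the property names and the numbers are all hypothetical, and the output of real programs is laid out differently.

```python
import re

# Hypothetical log excerpt; not the output format of any real program.
LOGFILE = """\
Starting geometry optimisation
Total energy = -76.026760 Hartree
Dipole moment = 1.8546 Debye
Converged in 7 cycles
"""

# Lines of the form "<name> = <number> <units>" become records.
LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z ]+?)\s*=\s*(?P<value>-?\d+(?:\.\d+)?)\s+(?P<units>\S+)\s*$"
)


def semanticize(log_text):
    """Extract name-value-units records from a free-text log."""
    records = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            records.append({
                "name": match["name"].lower().replace(" ", "_"),
                "value": float(match["value"]),
                "units": match["units"],
            })
    return records


records = semanticize(LOGFILE)
for record in records:
    print(record)
```

Lines that do not match the pattern are skipped here; a production converter would need a template per program and per output section.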
QUIXOTE: Semantic KnowledgeBase for
Computational Chemistry
Content Mining of Chemistry

Typical chemical synthesis

http://wwmm.ch.cam.ac.uk/chemicaltagger
Automatic semantic markup of chemistry

Could be used for analytical, crystallization, etc.
Open Content Mining of FACTS

Machines can interpret chemical reactions

We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
Open Content Mining of FACTS
Machines can interpret phylogenetic trees

Unusable
FACT
Re-usable
FACT

>100,000 diagrams in literature; cost 1,000,000,000 hours
Crowdcrafting and Hackathons
Crowdcrafting for AEgIS/CERN
Does antimatter fall down or up?
Help the AEgIS experiment at CERN to work out how antimatter is affected by
gravity. Just join the dots!
Antimatter
The observable universe is composed almost entirely of matter but we can
produce stuff called antimatter in the lab. Antimatter is material composed of
antiparticles.
Antiparticles have the same mass as normal matter particles but the opposite
charge. When an antiparticle collides with an ordinary matter particle they both
annihilate - producing a burst of other particles and radiation.
Antiparticles should interact gravitationally just like particles of ordinary matter
because Einstein's weak equivalence principle states that gravity doesn't depend
on composition. But if they don't then gravity is much more complicated than our
current understanding indicates.

http://crowdcrafting.org http://crowdcrafting.org/antimatter
Crowdcrafting for CERN/AEgIS
RCUK, Wellcome, ERC, NSF … require fully OPEN
[Neelie Kroes, EU Vice-President, at the Research Data Alliance:] we are entering a new “era of open science”, which will be “good
for citizens, good for scientists and good for society”.
She explicitly highlighted the transformative potential of open access, open data, open
software and open educational resources – mentioning the EU’s policy requiring open access
to all publications and data resulting from EU funded research.
http://blog.okfn.org/2013/03/21/we-are-entering-an-era-of-open-science-says-eu-vp-neeliekroes/#sthash.3SWDXDE6.dpuf
Open Definition
• “A piece of data or content is open if anyone is
free to use, reuse, and redistribute it —
subject only, at most, to the requirement to
attribute and/or share-alike.”
OPEN                     NOT OPEN
PDB, COD, Crystaleye     CCDC, ICSD
RSC/ACS/IUCr CIFs        Elsevier/Wiley/Springer CIFs
Acta Cryst E             Acta Cryst ABCD (default)
CIF dictionaries
From Saulius Grazulis
Crystaleye
• A database of 200,000 crystal structures scraped
from the CIF supplemental information of publications
• CML molecules and name-value pairs
• Re-usable as fragment base
Nick Day, Jim Downing, Sam Adams, N. W. England
and Peter Murray-Rust*
J. Appl. Cryst. (2012). 45, 316–323,
doi:10.1107/S0021889812006462
http://wwmm.ch.cam.ac.uk/crystaleye
Supplemental Information (CIFs) harvested from publications: ACS, IUCr, RSC, ELS
As-Cl bond lengths
[Figure: As-Cl bond lengths, with regions labelled Short and Long and a link to the journal]
COD Letter to Editors 2012
[We] have become aware of growing concerns regarding the publication,
preservation and quality maintenance of crystallographic data. …However,
we believe that completely open deposition of data and multiple checks can
ensure the quality and wide availability of scientific data
[Please] recommend to your authors that, they also deposit their
supplementary crystallographic data into the COD when they submit
scientific papers to your journals.
Being open by its design, the COD enables the creation of multiple mirrors
and backup copies. It provides, thus, archival storage of scientific data with
adequate reliability. … services for reviewers and editors to facilitate the
peer-review. …since our database follows the Open Access model, all
material deposited into the COD is available to other databases. The COD
team actually encourages the use of our data collection for any possible
scientific or industrial application by putting the database into the public
domain
Recommendations for Open
Crystallography
• Require Open Crystal Data for all publications
• Deposition of Open Data in COD
• Integrate CIF dictionaries as RDF into Linked
Open Data
• Integrate COD into Linked Open Data Cloud
• CCDC/ICSD to publish RAW author CIFs Openly
Most “Open Access” is not re-usable
[Chart: price per article (USD, 0 to 6000) by licence: CC-BY / reusable versus restricted by licence or lack of clarity (CC-NC, CC-ND, nothing/unclear)]
Ross Mounce, Panton Fellow, 2012
Panton Principles for Open Data in Science
Why? Wanted to avoid the mess in OA
• Peter Murray-Rust, Cameron Neylon, Rufus Pollock, John Wilbanks
• 2008 -> 2010 (launch) at Panton Arms
[Photos: Peter Murray-Rust, Jordan Hatcher, John Wilbanks, Jenny Molloy, Rufus Pollock, Cameron Neylon]
“Licence STM Data as CC0”
Panton Fellowships (2012): Ross Mounce & Sophie Kershaw
(Support from Open Society Foundations)
• Data should be open
• Make your wishes clear
• Use an appropriate licence
Open Mining Manifesto
1. Define ‘open content mining’ in a broad and useful
manner
‘Open Content Mining’ means the unrestricted right of subscribers to extract, process and
republish content manually or by machine in whatever form (without prior specific
permissions and subject only to community norms of responsible behaviour in the electronic
age).
Text
Numbers
Tables
Diagrams
Graphical representations of relationships between variables
Images and video and audio when it is the means of expressing a fact.
Semantics (XML, RDF)
2. Urge publishers and institutional repositories to adhere to the following principles:

Principle 1: Right of Legitimate Accessors to Mine
We assert that there is no legal, ethical or moral reason to refuse to allow legitimate
accessors of research content (OA or otherwise) to use machines to analyse the published
output of the research community. Researchers expect to access and process the full
content of the research literature with their computer programs and should be able to use
their machines as they use their eyes.

The right to read is the right to mine

Principle 2: Lightweight Processing Terms and Conditions
Mining by legitimate subscribers should not be prohibited by contractual or other legal
barriers. Publishers should add clarifying language in subscription agreements that content
is available for information mining by download or by remote access. Where access is
through researcher-provided tools, no further cost should be required.

Users and providers should encourage machine processing
Principle 3: Use
Researchers can and will publish facts and excerpts which they discover by reading and
processing documents. They expect to disseminate and aggregate statistical results as facts
and context text as fair use excerpts, openly and with no restrictions other than attribution.
Publisher efforts to claim rights in the results of mining further retard the advancement of
science by making those results less available to the research community; such claims should
be prohibited.

Facts don’t belong to anyone.
3. Strategies

Assert the above rights by:
Educating researchers and librarians about the potential of
content mining and the current impediments to doing so,
including alerting librarians to the need not to cede any of the
above rights when signing contracts with publishers
Compiling a list of publishers and indicating what rights they
currently permit, in order to highlight the gap between the
rights here being asserted and what is currently possible
Urging governments and funders to promote and aid the
enjoyment of the above rights.
Take-away messages
• Lost/unused STM* data costs 30-100 billion/yr [1]
• Licence: DATA as CCZero and TEXT as CC-BY
• Content Mining for DATA is a RIGHT
• Apathy is our worst enemy
• Trust and empower young people

“A piece of content or data is open if anyone is free to
use, reuse, and redistribute it — subject only, at most,
to the requirement to attribute and/or share-alike.”
(From the French version: “A piece of data is open if everyone is free to use it,
reuse it and redistribute it.”)
*Scientific Technical Medical

[1] PMR: submission to UK Hargreaves process
To make the complete scientific literature
accessible to machines and humans

Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on viruses
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcare
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 

Recently uploaded

Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 

Recently uploaded (20)

Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 

Semantic Web in Physical Science

  • 1. The Semantic Web in Physical Science/Engineering Peter Murray-Rust University of Cambridge Open Knowledge Foundation Culham Laboratory, 2013-09-11, UK
  • 2. Themes: To make the complete scientific literature accessible to machines and humans • The Semantic Web • The power and need for Open • Building Communities • Multidisciplinarity. Funding includes JISC, Unilever, EPSRC.
  • 3. The Semantic Web "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
  • 4. The scientist’s amanuensis • "The bane of my life is doing things I know computers could do for me" (Dan Connolly, W3C) Example: A semantic amanuensis could • Give me a daily digest of zeolite papers • Extract all the crystal structures from them • Compute physical properties with GULP and NWChem • Compare the results statistically • Preserve and distribute the complete operation • Prepare the results for publication The semantic web is having a personal amanuensis
  • 5. Linked Open Data – the world’s knowledge. [Figure: LOD cloud diagram of RDF triples, knowledge bases and ontologies, with labelled regions Music, Social, Art, Literature, DBPedia, Lib, GOV.uk, Comp, PDB, GOV, BIO.] Very little physical science. http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
  • 6. Linked Open data from Wikipedia “Which Rivers flow into the Rhine and are longer than 50 kilometers?” or “Which Skyscrapers in China have more than 50 floors and have been constructed before the year 2000?” Open Crystallography? “Which countries where tropical diseases are endemic have published structures of chiral natural products?” CC-BY-SA from Wikipedia
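The questions on slide 6 are queries over subject-predicate-object triples. A minimal sketch of the idea follows, using a toy in-memory triple list rather than a real RDF store or SPARQL endpoint. The river names, the predicates `flowsInto` and `lengthKm`, and all numbers are hypothetical placeholders, not data from DBpedia or Wikipedia.

```python
# Toy triple store. Every name and number here is an illustrative placeholder.
TRIPLES = [
    ("RiverA", "flowsInto", "Rhine"),
    ("RiverA", "lengthKm", 120),
    ("RiverB", "flowsInto", "Rhine"),
    ("RiverB", "lengthKm", 30),
    ("RiverC", "flowsInto", "Danube"),
    ("RiverC", "lengthKm", 200),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o for which (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def tributaries_longer_than(river, min_km, triples=TRIPLES):
    """Subjects that flow into `river` and have a lengthKm above `min_km`."""
    candidates = [s for s, p, o in triples if p == "flowsInto" and o == river]
    return sorted(
        s for s in candidates
        if any(length > min_km for length in objects(s, "lengthKm", triples))
    )


print(tributaries_longer_than("Rhine", 50))
```

The same two-pattern join (one triple for the relationship, one for the numeric property) is what a linked-data query language would express declaratively. The open crystallography question on the slide would need triples from several sources joined in the same way.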
  • 7. Semantics: (Things Take Time)* • 1994 1st WWW Conference • 1994 Chemical MIME, Chemical Markup Language (Henry Rzepa, PMR) • 2001 UK eScience programme, eMinerals • 2005 Materials Grid (Martin Dove group) • 2006 Blue Obelisk (Open Source chemistry) • 2011 PNNL (US) meetings and visit • 2012 Semantic Physical Science (Cambridge) *TTT: Piet Hein
  • 8. Componentised approach liberates. Individual, manual, unreusable, flaky versus commodity, standard, reliable, re-usable
  • 9. Representing Semantics. Interoperating approaches: • Markup Languages (“hardcoded objects”): MathML, G(eo)ML, CellML, S(ys)B(io)ML • CML (Chemistry and numeric science): 1. Molecules (atoms, bonds, coordinates) 2. Reactions 3. Spectra 4. Solid state 5. Computation • RDF (relationships, annotations, linking) • Ontologies (Dictionaries)
  • 10. Humans and machines use different languages
  • 11. Scalable Vector Graphics (SVG) Human-friendly Automatic! <?xml version="1.0" encoding="UTF-8"?> <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1280" height="640" viewBox="0 0 30240 15120"> <defs id="defs6"> <polygon points="0,-9 1.735535,-3.6038755 7.0364833,-5.6114082 3.8997116,-0.89008374 8.7743512,2.0026884 3.1273259,2.4939592 3.9049537,8.1087198 0,4 -3.9049537,8.1087198 -3.1273259,2.4939592 -8.7743512,2.0026884 -3.8997116,-0.89008374 -7.0364833,5.6114082 -1.735535,-3.6038755 0,-9 " id="Star7"/> </defs> <path d="M 0,0 L 30240,0 L 30240,15120 L 0,15120 L 0,0 z" style="fill:#00008b"/> <use transform="matrix(252,0,0,252,7560,11340)" id="Commonwealth_Star" style="fill:#fff" xlink:href="#Star7"/> <use transform="matrix(120,0,0,120,22680,12600)" id="Star_Alpha_Crucis" style="fill:#fff" xlink:href="#Star7"/> <!-- snipped --> 217,2520 L 10080,2520 L 15120,0 z" id="Red_Diagonals" style="fill:red"/> <use transform="matrix(-1,0,0,-1,15120,7560)" id="Red_Diagonals_Rotated" style="fill:red" xlink:href="#Red_Diagonals"/> </svg> Machine-friendly
  • 12. MathML Mathematics Markup Language. Energy of c.c.p. lattice of argon. Automatic! Human-friendly. 4 pages clipped. Many editors and tools exist. We used MathWeaver. Machine-friendly
  • 13. CML (Chemical Markup Language) Automatic! Human-friendly Machine-friendly
  • 14. Current scientific information flow … is broken for data-rich science Non-semantic data PDF Lineprinter output Human input Text files Data extraction difficult and incomplete Human readers
  • 15. Semantic network closes the loop Measurement Computation Semantic Authoring Analysis Community Data available for e-science and reuse Data mined from document
  • 16. The network grows autonomously Human-machine Human-human Machine-human Machine-machine
  • 17. Example: Materials 2012, 5, 27-46; doi:10.3390/ma5010027
  • 19. REACTIONS ABBREVIATIONS “… electron donor (ED), such as an electron rich, metal-based light absorber (LA), and electron acceptor (EA) sites.”
  • 22. PROPERTIES (NAME-VALUE-UNITS) [Figure: annotated text with each property marked up as Name (N), Value (V) and Units (U).] Note: CML supports value ranges and errors
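The name-value-units pattern on slide 22, with optional errors and ranges, can be sketched as a small data structure. This is an illustration only: the `Property` class, the `<property>` element and its attribute names are hypothetical and are not the actual CML schema, and the melting-point figures are placeholder values.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class Property:
    """A measured quantity as name + value + units (hypothetical model)."""
    name: str
    value: float
    units: str
    error: Optional[float] = None    # uncertainty on the value, if known
    minimum: Optional[float] = None  # lower bound when a range is reported
    maximum: Optional[float] = None  # upper bound when a range is reported

    def to_xml(self) -> str:
        """Serialise to a made-up XML element, not real CML."""
        elem = ET.Element("property", {"name": self.name, "units": self.units})
        for key in ("error", "minimum", "maximum"):
            extra = getattr(self, key)
            if extra is not None:
                elem.set(key, repr(extra))
        elem.text = repr(self.value)
        return ET.tostring(elem, encoding="unicode")


# Placeholder numbers, not taken from any paper.
melting_point = Property("melting point", 150.0, "degC", error=2.0)
print(melting_point.to_xml())
```

Keeping the units and the error next to the number is what lets a machine compare or convert values that were mined from different documents.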
  • 23. Mathematics CML is being integrated with computable (content) MathML
  • 24. Materials Search Challenge • What would you like a “Google for materials” to find for you in the scientific literature?
  • 25. Tim Berners-Lee’s Open data http://5stardata.info ★ make your stuff available on the Web (whatever format) under an OPEN license ★★ make it available as structured data (i.e. NOT PDF) ★★★ use non-proprietary formats (e.g., CSV) ★★★★ use URIs to denote things, so that people can point at your stuff ★★★★★ link your data to other data to provide context. Slide annotations on this scale: CIFDIC, ACS, IUCr, CRYSTALEYE.
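The five-star scheme on slide 25 is cumulative: each star assumes the ones below it. A minimal sketch of that rule follows. The `Dataset` class and its field names are hypothetical, chosen only to mirror the five criteria on the slide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """One boolean per criterion on the slide (hypothetical field names)."""
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def stars(d: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    levels = [
        d.on_web_open_licence,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.linked_to_other_data,
    ]
    count = 0
    for satisfied in levels:
        if not satisfied:
            break
        count += 1
    return count


# A structured, openly licensed file in a proprietary format stops at two stars.
print(stars(Dataset(True, True, False, True, True)))
```

Under this reading, data without an open licence scores zero however well structured it is, which matches the emphasis the slides put on openness.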
  • 28. Creating semantic content 1. Authoring tools for humans 2. Program output 3. Chemical databases 4. Content mining and Natural Language Processing (NLP) of text 5. Community
  • 29. Semantic authoring IUCr • http://blogs.ch.cam.ac.uk/pmr/2012/01/23/brian-mcmahon-publishing-semantic-crystallography-every-science-data-publisher-should-watch-this-all-the-way-through/ • 1:08 CIF • 3:36 CIF Syntax and dataTypes • 4:30 Publishing with CIF • 6:41 Demonstration: CheckCIF • 12:02 Interactive Chemical validation • 14:42 Linking data to journal article and search for novelty of data • 15:08 Jmol display applet • 21:03 Supplementary data • 21:47 PublCIF, a tool to merge data and text and annotate them
  • 31. QUIXOTE: Semantic KnowledgeBase for Computational Chemistry
  • 32. Content Mining of Chemistry Typical chemical synthesis http://wwmm.ch.cam.ac.uk/chemicaltagger
  • 33. Automatic semantic markup of chemistry Could be used for analytical, crystallization, etc.
  • 34. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 35. Open Content Mining of FACTS Machines can interpret phylogenetic trees Unusable FACT Re-usable FACT >100,000 diagrams in literature; cost 1,000,000,000 hours
  • 37. Crowdcrafting for AEgIS/CERN. Does antimatter fall down or up? Help the AEgIS experiment at CERN to work out how antimatter is affected by gravity. Just join the dots! Antimatter: The observable universe is composed almost entirely of matter but we can produce stuff called antimatter in the lab. Antimatter is material composed of antiparticles. Antiparticles have the same mass as normal matter particles but the opposite charge. When an antiparticle collides with an ordinary matter particle they both annihilate, producing a burst of other particles and radiation. Antiparticles should interact gravitationally just like particles of ordinary matter because Einstein's weak equivalence principle states that gravity doesn't depend on composition. But if they don't then gravity is much more complicated than our current understanding indicates. http://crowdcrafting.org http://crowdcrafting.org/antimatter
  • 39. RCUK, Wellcome, ERC, NSF … require fully OPEN. [EU Vice-President Neelie Kroes, at the Research Data Alliance:] we are entering a new “era of open science”, which will be “good for citizens, good for scientists and good for society”. She explicitly highlighted the transformative potential of open access, open data, open software and open educational resources, mentioning the EU’s policy requiring open access to all publications and data resulting from EU funded research. http://blog.okfn.org/2013/03/21/we-are-entering-an-era-of-open-science-says-eu-vp-neelie-kroes/
  • 40. Open Definition • “A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” OPEN: PDB; COD, Crystaleye; RSC/ACS/IUCr CIFs; Acta Cryst E; CIF dictionaries. NOT OPEN: CCDC, ICSD; Elsevier/Wiley/Springer CIFs; Acta Cryst ABCD (default).
  • 42. Crystaleye • A database of 200,000 crystal structures scraped from the CIF supplemental information of publications • CML molecules and name-value pairs • Re-usable as fragment base. Nick Day, Jim Downing, Sam Adams, N. W. England and Peter Murray-Rust*, J. Appl. Cryst. (2012), 45, 316–323, doi:10.1107/S0021889812006462 http://wwmm.ch.cam.ac.uk/crystaleye
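Slide 42 mentions turning CIF supplemental information into name-value pairs. A minimal sketch of that step follows; it is not Crystaleye's actual code. It handles only single-line `_name value` items and ignores loops, multi-line text fields and the rest of the CIF syntax. The sample values are illustrative placeholders.

```python
import re

# A number with an optional standard uncertainty in parentheses, e.g. 5.431(2).
NUMERIC = re.compile(r"^(-?\d+(?:\.\d+)?)(?:\((\d+)\))?$")


def parse_simple_items(text):
    """Collect single-line `_name value` items into a dict.

    Numeric values are converted to float (the uncertainty is dropped);
    everything else is kept as a string with surrounding quotes removed.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, loop_ keywords, blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # a bare data name, e.g. inside a loop_ header
        name, raw = parts
        match = NUMERIC.match(raw)
        items[name] = float(match.group(1)) if match else raw.strip("'\"")
    return items


# Illustrative sample, not taken from any deposited structure.
SAMPLE = """\
data_example
_cell_length_a 5.431(2)
_cell_length_b 5.431(2)
_symmetry_space_group_name_H-M 'F d -3 m'
"""

for name, value in parse_simple_items(SAMPLE).items():
    print(name, value)
```

A production pipeline would use a full CIF parser and keep the uncertainties. The point here is only that CIF is already structured enough for the pairs to be read off without text mining.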
  • 45. Long
  • 46. Short
  • 48. COD Letter to Editors 2012 [We] have become aware of growing concerns regarding the publication, preservation and quality maintenance of crystallographic data. …However, we believe that completely open deposition of data and multiple checks can ensure the quality and wide availability of scientific data [Please] recommend to your authors that, they also deposit their supplementary crystallographic data into the COD when they submit scientific papers to your journals. Being open by its design, the COD enables the creation of multiple mirrors and backup copies. It provides, thus, archival storage of scientific data with adequate reliability. … services for reviewers and editors to facilitate the peer-review. …since our database follows the Open Access model, all material deposited into the COD is available to other databases. The COD team actually encourages the use of our data collection for any possible scientific or industrial application by putting the database into the public domain
  • 49. Recommendations for Open Crystallography • Require Open Crystal Data for all publications • Deposition of Open Data in COD • Integrate CIF dictionaries as RDF into Linked Open Data • Integrate COD into Linked Open Data Cloud • CCDC/ICSD to publish RAW author CIFs Openly
  • 50. Most “Open Access” is not re-usable. [Figure: licence categories (CC-BY / reusable; restricted by licence or lack of clarity: CC-NC, CC-ND, nothing/unclear) plotted against price per article, USD 0 to 6000.] Ross Mounce, Panton Fellow 2012
  • 51. Panton Principles for Open Data in Science. Why? Wanted to avoid the mess in OA • Peter Murray-Rust, Cameron Neylon, Rufus Pollock, John Wilbanks, 2008 -> 2010 (launch) at Panton Arms. Launch 2010: Peter Murray-Rust, John Wilbanks, Jordan Hatcher, Jenny Molloy, Rufus Pollock, Cameron Neylon. Panton Fellowships (2012). “Licence STM Data as CC0”
  • 52. Panton Fellows Ross Mounce & Sophie Kershaw (Support from Open Society Foundations)
  • 53. • Data should be open • Make your wishes clear • Use an appropriate licence
  • 54. Open Mining Manifesto 1. Define ‘open content mining’ in a broad and useful manner. ‘Open Content Mining’ means the unrestricted right of subscribers to extract, process and republish content manually or by machine in whatever form, without prior specific permissions and subject only to community norms of responsible behaviour in the electronic age. Forms covered: • Text • Numbers • Tables • Diagrams • Graphical representations of relationships between variables • Images, video and audio when they are the means of expressing a fact • Semantics (XML, RDF)
  • 55. 2. Urge publishers and institutional repositories to adhere to the following principles: Principle 1: Right of Legitimate Accessors to Mine. We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes. The right to read is the right to mine. Principle 2: Lightweight Processing Terms and Conditions. Mining by legitimate subscribers should not be prohibited by contractual or other legal barriers. Publishers should add clarifying language in subscription agreements that content is available for information mining by download or by remote access. Where access is through researcher-provided tools, no further cost should be required. Users and providers should encourage machine processing. Principle 3: Use. Researchers can and will publish facts and excerpts which they discover by reading and processing documents. They expect to disseminate and aggregate statistical results as facts and context text as fair use excerpts, openly and with no restrictions other than attribution. Publisher efforts to claim rights in the results of mining further retard the advancement of science by making those results less available to the research community; such claims should be prohibited. Facts don’t belong to anyone.
  • 56. 3. Strategies. Assert the above rights by: • Educating researchers and librarians about the potential of content mining and the current impediments to doing so, including alerting librarians to the need not to cede any of the above rights when signing contracts with publishers • Compiling a list of publishers and indicating what rights they currently permit, in order to highlight the gap between the rights here being asserted and what is currently possible • Urging governments and funders to promote and aid the enjoyment of the above rights.
  • 57. Take-away messages • Lost/unused STM* data costs 30-100 billion/yr [1] • Licence: DATA as CCZero and TEXT as CC-BY • Content Mining for DATA is a RIGHT • Apathy is our worst enemy • Trust and empower young people. “A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.” French version on the slide, translated: “A piece of data is open if everyone is free to use it, reuse it and redistribute it.” *Scientific Technical Medical [1] PMR: submission to UK Hargreaves process
  • 58. To make the complete scientific literature accessible to machines and humans