oreChem: Planning and Enacting Chemistry on the Semantic Web

  1. oreChem: Planning and Enacting Chemistry on the Semantic Web
     Microsoft Research eScience Workshop 2010, Berkeley, CA, USA
     Mark Borkum, Simon Coles and Jeremy Frey
     12 October 2010
  2. Overview
     • Introduction
     • Ontology
     • Case Study: X-ray Crystallography
     • Future Work
     • Summary
  3. The Scientific Method
     • A systematic process for knowledge acquisition
     • Becoming increasingly data-intensive
     [Figure: the stages Planning, Enactment, Analysis and Publication]
  4. The Data Deluge
     • In haiku:
       – Lots of producers;
         Generating more data
         than ever before.
     • 40 years ago, a PhD student would determine 3 structures over the entire course of their study!
     [Image: The Great Wave off Kanagawa by Katsushika Hokusai]
  5. The Scientific Method (on the Web)
  6. Provenance (The Elephant in the Room)
     • The 7 W’s [Goble 2002]
       – Who, What, Where, Why, When, Which, & (W)How
     • The Why aspect is often ignored
     [Table: provenance aspects and the corresponding concerns]
       Aspect          Concern
       Why             Planning
       Who             Authorship
       What & (W)How   Enactment
       Where & When    Annotations
  7. The oreChem Project
     • Funded by Microsoft Research
     • Investigating the design and deployment of a semantic-based eScience infrastructure for Chemistry
     • Project website:
       – http://research.microsoft.com/en-us/projects/orechem/
     [Figure: the table from slide 6 (Why: Planning; Who: Authorship; What & (W)How: Enactment; Where & When: Annotations), annotated with oreChem and with Dublin Core, FOAF, SIOC, OWL Time, GeoNames, etc.]
  8. oreChem Core Ontology
  9. Planning
     • Prospective provenance
     • Describes a scientific experiment that will be enacted (in the future)
     • Three entity types:
       – Plan
       – Plan Stage
       – Plan Object
  10. Enactment
      • Retrospective provenance
      • Describes a scientific experiment that was enacted
      • Three entity types:
        – Run
        – Stage
        – Object
  11. “In theory, there is no difference between theory and practice. But, in practice, there is.”
      Unknown (possibly Yogi Berra)
  12. Realisation (is not Instantiation)
      • Each ‘run thing’ is linked to zero or one ‘plan thing’ (e.g., a Run to its Plan, or an Object to its Plan Object)
        – Deviation from the plan is allowed
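The plan/run split and the “zero or one” realisation link can be sketched as plain Python data classes. This is a minimal illustration under stated assumptions, not the oreChem ontology itself, which is expressed in RDF/OWL. The class names, the `realises` attribute and the `deviations` helper are hypothetical, and Plan Objects and Objects are left out for brevity. Because the link is optional, deviations fall out naturally: a run stage with no plan stage was unplanned, and a plan stage that no run stage realises was skipped.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical Python classes mirroring the oreChem entity types; the real
# ontology is expressed in RDF/OWL, not Python. Objects are omitted for brevity.


@dataclass(frozen=True)
class PlanStage:
    name: str


@dataclass
class Plan:
    name: str
    stages: List[PlanStage] = field(default_factory=list)


@dataclass(frozen=True)
class Stage:
    name: str
    # Zero or one plan stage: None marks a stage the plan did not anticipate.
    realises: Optional[PlanStage] = None


@dataclass
class Run:
    name: str
    # Zero or one plan: a run may also be enacted without any plan.
    realises: Optional[Plan] = None
    stages: List[Stage] = field(default_factory=list)


def deviations(run: Run) -> Tuple[List[Stage], List[PlanStage]]:
    """Return (run stages with no plan stage, plan stages no run stage realised)."""
    unplanned = [s for s in run.stages if s.realises is None]
    realised = {s.realises for s in run.stages if s.realises is not None}
    skipped = [] if run.realises is None else [
        p for p in run.realises.stages if p not in realised
    ]
    return unplanned, skipped


if __name__ == "__main__":
    collect, solve = PlanStage("collect"), PlanStage("solve")
    plan = Plan("example-plan", [collect, solve])
    run = Run("example-run", plan, [Stage("collect-1", collect), Stage("recollect")])
    unplanned, skipped = deviations(run)
    print([s.name for s in unplanned])  # ['recollect']
    print([p.name for p in skipped])    # ['solve']
```

Keeping the link optional, rather than making every run thing an instance of a plan thing, is what lets a record describe what actually happened even when it differs from the plan.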
  13. Case Study: X-ray Crystallography
  14. Current Practice in Crystallography
      • Crystallographic data are highly structured
        – The de facto standard adopted by the community is CIF (Crystallographic Information File)
      • Relatively few crystal structures are openly available online
      http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
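To make “highly structured” concrete: much of a CIF data block is a sequence of `_tag value` pairs. The sketch below is a deliberately minimal reader for the single-line case only, assuming well-behaved input. It is not a conforming CIF parser, and it uses Python's shell-style tokenizer `shlex` as a stand-in for CIF's own quoting rules. The tag names are standard CIF core tags, but the values in the example are made up for illustration.

```python
from __future__ import annotations

import shlex
from typing import Dict


def parse_simple_cif(text: str) -> Dict[str, str]:
    """Collect single-line '_tag value' data items from CIF text.

    Minimal sketch: loop_ tables, ';'-delimited multi-line text fields, values
    placed on the line after their tag, and CIF's own quoting rules (e.g. an
    apostrophe inside an unquoted value) are NOT handled.
    """
    items: Dict[str, str] = {}
    for line in text.splitlines():
        # shlex removes shell-style quotes and drops everything after '#'.
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2 and tokens[0].startswith("_"):
            items[tokens[0]] = tokens[1]
    return items


if __name__ == "__main__":
    # Illustrative values only, not taken from any eCrystals record.
    sample = """data_example
_cell_length_a    10.512(3)
_cell_length_b    7.334(2)   # standard uncertainty in parentheses
_symmetry_space_group_name_H-M  'P 21/c'
"""
    for tag, value in parse_simple_cif(sample).items():
        print(tag, "=", value)
```

A production tool would use a dedicated CIF library that also handles `loop_` tables, which is where most per-atom data lives.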
  15. Crystallography and Fraud
  16. The eCrystals Federation
      • JISC project
      • Network of crystallography resources
      • All published records are available as Open Data
      • Based on the EPrints repository software
      http://ecrystals.chem.soton.ac.uk/
  17. eCrystal #20
      • Each eCrystals record contains:
        – Bibliographic metadata
        – Fundamental and derived data (excluding raw images)
        – The final structure solution
  18. Single Crystal Structure Determination
      1. Take a powder specimen of the chemical substance
      2. Measure the diffraction of X-rays
      3. Compute electron densities
      4. Solve for the crystal structure
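Steps 2 to 4 rest on standard diffraction relations that the slide does not spell out. They are summarised here for orientation only. λ is the X-ray wavelength, d is the spacing of a family of lattice planes, θ is the diffraction angle, F(hkl) is the structure factor of reflection hkl with phase φ(hkl), and V is the unit-cell volume. The measured intensities yield only |F(hkl)|, so the phases must be recovered during structure solution (the crystallographic phase problem).

```latex
\begin{align}
  n\lambda &= 2d\sin\theta \\
  I(hkl) &\propto \lvert F(hkl) \rvert^{2} \\
  \rho(x,y,z) &= \frac{1}{V} \sum_{h,k,l} \lvert F(hkl) \rvert \, e^{i\phi(hkl)} \, e^{-2\pi i (hx + ky + lz)}
\end{align}
```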
  19. oreChem Plan for eCrystals
      • Machine-readable representation of the methodology
      • Describes requirements for software and data products
      • Available online at:
        – http://ecrystals.chem.soton.ac.uk/plan.rdf
  20. oreChem Run for eCrystal #20
      • Exported by the “oreChem” plug-in for EPrints 3.1
        – RDF/XML serialisation
        – Uses SWRL rules to infer causal relationships
      • Describes:
        – Software
        – Data products
      http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1
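  21. Retrospective Provenance Graphs for eCrystal #20
      [Figure: two graphs, “Stages and Objects” and “Objects”; legend: used (dashed), emitted (solid), derivedFrom (solid)]
      • Inference rule: used(?s, ?o1) & emitted(?s, ?o2) → derivedFrom(?o2, ?o1)
        – If a stage ?s used object ?o1 and emitted object ?o2, then ?o2 was derived from ?o1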
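The inference rule on this slide is simple enough to apply by hand. The following is a minimal forward-chaining sketch in Python. It assumes triples are plain `(subject, predicate, object)` string tuples with bare predicate names (`used`, `emitted`, `derivedFrom`) rather than the full oreChem IRIs, and the stage and file names are hypothetical. The actual plug-in uses SWRL rules, not this code. The rule is not transitive: an object two stages downstream is not marked as derived from the original input.

```python
from __future__ import annotations

from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def infer_derived_from(triples: Set[Triple]) -> Set[Triple]:
    """Apply used(?s, ?o1) & emitted(?s, ?o2) -> derivedFrom(?o2, ?o1).

    The rule body mentions no derivedFrom atoms, so a single pass already
    reaches the fixpoint; no chaining through intermediate objects happens.
    """
    used: Dict[str, Set[str]] = {}
    emitted: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "used":
            used.setdefault(s, set()).add(o)
        elif p == "emitted":
            emitted.setdefault(s, set()).add(o)
    inferred = {
        (o2, "derivedFrom", o1)
        for stage, outputs in emitted.items()
        for o2 in outputs
        for o1 in used.get(stage, set())
    }
    return triples | inferred


if __name__ == "__main__":
    # Hypothetical stages and files, for illustration only.
    graph = {
        ("stage-1", "used", "data.hkl"), ("stage-1", "emitted", "data.res"),
        ("stage-2", "used", "data.res"), ("stage-2", "emitted", "data.cif"),
    }
    for triple in sorted(infer_derived_from(graph) - graph):
        print(triple)
```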
  22. Crystallography and Fraud – SPARQL

      PREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>
      PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

      SELECT ?run ?raw ?derived ?reported
      WHERE {
        ?run a orechem:Run ;
             orechem:hasPlan ecrystals:Ecrystals ;
             orechem:containsObject ?raw ;
             orechem:containsObject ?derived ;
             orechem:containsObject ?reported .
        ?raw a orechem:File ;
             orechem:hasPlanObject ecrystals:HKL .
        ?derived a orechem:File ;
                 orechem:derivedFrom ?raw .
        ?reported a orechem:File ;
                  orechem:hasPlanObject ecrystals:CIF ;
                  orechem:derivedFrom ?derived .
      }
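If no triple store is at hand, the basic graph pattern above can be emulated with nested loops over a set of triples. This is a sketch, not how a SPARQL engine evaluates queries. Predicates are written as prefixed-name strings, and `a` is spelled `rdf:type`. The toy graph is an assumption, built only so that the file names match the results slide; the real data are in the RDF/XML export linked on slide 20.

```python
from __future__ import annotations

from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Row = Tuple[str, str, str, str]


def has(g: Set[Triple], s: str, p: str, o: str) -> bool:
    return (s, p, o) in g


def fraud_query(g: Set[Triple]) -> List[Row]:
    """Enumerate (?run, ?raw, ?derived, ?reported) bindings for the slide's pattern."""
    rows: List[Row] = []
    runs = {s for (s, p, o) in g if p == "rdf:type" and o == "orechem:Run"}
    for run in runs:
        if not has(g, run, "orechem:hasPlan", "ecrystals:Ecrystals"):
            continue
        # ?raw, ?derived and ?reported must each be a File contained in the run.
        files = {o for (s, p, o) in g
                 if s == run and p == "orechem:containsObject"
                 and has(g, o, "rdf:type", "orechem:File")}
        for raw in files:
            if not has(g, raw, "orechem:hasPlanObject", "ecrystals:HKL"):
                continue
            for derived in files:
                if not has(g, derived, "orechem:derivedFrom", raw):
                    continue
                for reported in files:
                    if (has(g, reported, "orechem:hasPlanObject", "ecrystals:CIF")
                            and has(g, reported, "orechem:derivedFrom", derived)):
                        rows.append((run, raw, derived, reported))
    return sorted(rows)


# Toy graph: file names come from the results slide; the triples are assumed.
RUN = "_:eCrystal_20_Run"
EXAMPLE_GRAPH: Set[Triple] = {
    (RUN, "rdf:type", "orechem:Run"),
    (RUN, "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("02sot126.hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("02sot126.cif", "orechem:hasPlanObject", "ecrystals:CIF"),
}
for name in ["02sot126.hkl", "02sot126.prp", "02sot126.lst",
             "02sot126.res", "02sot126.cif"]:
    EXAMPLE_GRAPH |= {(RUN, "orechem:containsObject", name),
                      (name, "rdf:type", "orechem:File")}
for mid in ["02sot126.prp", "02sot126.lst", "02sot126.res"]:
    EXAMPLE_GRAPH |= {(mid, "orechem:derivedFrom", "02sot126.hkl"),
                      ("02sot126.cif", "orechem:derivedFrom", mid)}

if __name__ == "__main__":
    for row in fraud_query(EXAMPLE_GRAPH):
        print(*row)
```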
  23. Crystallography and Fraud – SPARQL (2)
  24. Crystallography and Fraud – SPARQL (3)
      [Figure: the query variables ?run, ?raw, ?derived and ?reported labelled on the provenance graph from http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1]
  25. Crystallography and Fraud – SPARQL (4)
      ?run                ?raw            ?derived        ?reported
      _:eCrystal_20_Run   02sot126.hkl    02sot126.prp    02sot126.cif
      _:eCrystal_20_Run   02sot126.hkl    02sot126.lst    02sot126.cif
      _:eCrystal_20_Run   02sot126.hkl    02sot126.res    02sot126.cif
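The query only matches derivation chains of exactly two hops (raw → derived → reported). Accepting chains of any length, for example to list reported files that cannot be traced back to raw data at all, requires a transitive traversal. In SPARQL 1.1 this can be written with a property path such as `orechem:derivedFrom+`. Below is a minimal Python sketch of the same idea. It assumes the derivedFrom edges have already been extracted into a dictionary, and the `ancestors` and `unsupported` helpers and the file `orphan.cif` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Dict, List, Set


def ancestors(derived_from: Dict[str, Set[str]], start: str) -> Set[str]:
    """Every object reachable from `start` by following derivedFrom edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in derived_from.get(node, set()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def unsupported(reported: List[str], raw: Set[str],
                derived_from: Dict[str, Set[str]]) -> List[str]:
    """Reported files whose derivation chain never reaches any raw-data file."""
    return [r for r in reported if not (ancestors(derived_from, r) & raw)]


if __name__ == "__main__":
    # Hypothetical edges; "orphan.cif" is an invented file with no recorded provenance.
    edges = {
        "02sot126.res": {"02sot126.hkl"},
        "02sot126.cif": {"02sot126.res"},
    }
    print(unsupported(["02sot126.cif", "orphan.cif"], {"02sot126.hkl"}, edges))
```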
  26. Future Work
      • oreChem Core Ontology
        – Support for conditionals and continuations
      • oreChem Lower Ontology
        – Specialised for Physical and Computational Chemistry
      • Applications and Services
        – oreChem Plan Designer and Enactor
        – oreChem Run Inspector
  27. Summary
      • <summary/>
  28. Acknowledgements
      • Microsoft Research
        – Tony Hey
        – Lee Dirks
        – Savas Parastatidis
        – Alex Wade
      • oreChem Project
        – Carl Lagoze, Theresa Velden
        – Jeremy Frey, Simon Coles
        – Peter Murray-Rust, Nick Day, Jim Downing
        – C. Lee Giles, Prasenjit Mitra, William Brouwer, Na Li
        – Marlon Pierce, Sashi Kiran Challa
  29. Thank You
      • Questions?
