Integration of oreChem with the eCrystals repository for crystal structures

  1. Integration of oreChem with the eCrystals repository for crystal structures
     Mark Borkum, Simon Coles and Jeremy Frey
     15 September 2010
  2. Overview
     Motivation
     Implementation
     Discussion and Summary
  3. Current Practice in Crystallography
     Crystallography data is highly structured
     The de facto standard adopted by the community is the CIF (Crystallographic Information File)
     Relatively few crystal structures are openly published
     http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
  4. Open Access Journals
     Advantages:
       Rapid publication
       Highly cited
       Data is available to download
     Disadvantages:
       Electronic only
       Not all data is of primary importance to the underlying chemistry (by-products, unexpected results, tracking reactions, etc.)
  5. Crystallography and Fraud
  6. The eCrystals Federation
     JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services
     Led by the UK National Crystallography Service (NCS)
     Core partners: UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics
  7. eCrystals – University of Southampton
     Located at http://ecrystals.chem.soton.ac.uk
     Archive for crystal structures generated by:
       Southampton Chemical Crystallography Group
       UK National Crystallography Service (NCS)
     Modified version of EPrints 3.1
     OAI-PMH compliant
     Extensible platform (plug-in architecture)
  8. What is an eCrystal?
     “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
     “the information supplied should enable any reader to check the reliability and validity”
     http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
  9. The Scientific Web
  10. The Data Deluge
      In haiku:
        Lots of producers;
        Generating more data
        than ever before.
      40 years ago, a PhD student would determine 3 structures over the entire course of their study!
      Image: The Great Wave off Kanagawa by Katsushika Hokusai
  11. Provenance
      The 7 W’s [Goble 2002]: Who, What, Where, Why, When, Which, and (W)How
      The Why aspect is usually ignored: rationale, intent, hypothesis, protocol, methodology, workflow, etc.
      “Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
      Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
  12. “In theory, there is no difference between theory and practice. But, in practice, there is.” Unknown (possibly Yogi Berra)
  13. Why “Why” Matters
      It is the reason for the data’s existence
      It gives us the ability to interpret the data in the correct context
      It allows us to align the data with the big picture
      http://www.myexperiment.org/workflows/16.html
  14. The oreChem Core Ontology
      Describes three concepts:
        The methodology (planned method) of a scientific experiment
        The enactment of methodologies
        The provenance of realised artefacts
  15. Methodology (Planned Method)
      The “plan” is modelled as a directed graph with two node types:
        Plan Stage: description of an activity that will be enacted
        Plan Object: description of an artefact that will be realised
  16. Enactment (of a Methodology)
      Each “run” (of a plan) is modelled as a directed graph with two node types:
        Stage: description of an activity that has been enacted
        Object: description of an artefact that has been realised
  17. Provenance
      Prospective: the plan describes a scientific experiment that will be enacted
      Retrospective: the run describes a scientific experiment that has been enacted
      Every ‘run thing’ is linked to exactly one ‘plan thing’
  18. oreChem Plug-in for eCrystals
      Three components:
        orechem:Plan (the eCrystals methodology)
        “eCrystal → orechem:Run” mapping
        “orechem:Run → provenance graph” pipeline
  19. The eCrystals Methodology
      [Figure panels: Before, After]
  20. Example: eCrystal #643
      [Figure panels: Before, After]
  21. SPARQL Request
      PREFIX orechem:   <http://www.openarchives.org/2010/05/24-orechem-ns#>
      PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

      SELECT ?run ?raw ?derived ?reported
      WHERE {
        ?run a orechem:Run ;
             orechem:hasPlan ecrystals:Ecrystals ;
             orechem:containsObject ?raw ;
             orechem:containsObject ?derived ;
             orechem:containsObject ?reported .
        ?raw a orechem:File ;
             orechem:hasPlanObject ecrystals:HKL .
        ?derived a orechem:File ;
             orechem:derivedFrom ?raw .
        ?reported a orechem:File ;
             orechem:hasPlanObject ecrystals:CIF ;
             orechem:derivedFrom ?derived .
      }
  22. SPARQL Response (for eCrystal #643)
      Result columns: ?run, ?reported, ?derived, ?raw
  23. Summary
  24. Acknowledgments
      oreChem is funded by Microsoft External Research
      eCrystals is funded by both EPSRC and JISC
      The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
  25. #ahm2010 #ahm #ahm10 #pch2010
      http://pegasus.chem.soton.ac.uk
      #ahm2010 until 11am Wed 15 Sept 2010
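Slide 15 models a plan as a directed graph of Plan Stages and Plan Objects. The Python sketch below shows one way such a graph could be held in memory and walked in dependency order. It is not the oreChem implementation: the classes `PlanStage`, `PlanObject` and `Plan`, the rule that every edge joins a stage to an object (or an object to a stage), and the two-stage example plan are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class PlanStage:
    """Description of an activity that will be enacted."""
    name: str


@dataclass(frozen=True)
class PlanObject:
    """Description of an artefact that will be realised."""
    name: str


PlanNode = Union[PlanStage, PlanObject]


@dataclass
class Plan:
    """A plan: a directed graph over PlanStage and PlanObject nodes."""
    edges: Dict[PlanNode, Set[PlanNode]] = field(default_factory=dict)

    def add_edge(self, src: PlanNode, dst: PlanNode) -> None:
        # Assumption: every edge joins a stage to an object or an object to a stage.
        if type(src) is type(dst):
            raise ValueError(f"edge must join a stage and an object: {src} -> {dst}")
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def stage_order(self) -> List[PlanStage]:
        # TopologicalSorter wants node -> predecessors, so invert the edge map.
        preds: Dict[PlanNode, Set[PlanNode]] = {node: set() for node in self.edges}
        for src, dsts in self.edges.items():
            for dst in dsts:
                preds[dst].add(src)
        ordered = TopologicalSorter(preds).static_order()
        return [node for node in ordered if isinstance(node, PlanStage)]


# Hypothetical two-stage plan; the stage names are illustrative only.
collect = PlanStage("collect diffraction data")
refine = PlanStage("solve and refine structure")
hkl, cif = PlanObject("HKL"), PlanObject("CIF")

plan = Plan()
plan.add_edge(collect, hkl)
plan.add_edge(hkl, refine)
plan.add_edge(refine, cif)
print([stage.name for stage in plan.stage_order()])
```

`TopologicalSorter.static_order()` raises `graphlib.CycleError` if the plan contains a cycle, which doubles as a cheap sanity check on a hand-written plan.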
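Slides 16 and 17 describe a run as a directed graph whose objects record what they were derived from (the `orechem:derivedFrom` property used in the slide 21 query). Retrospective provenance for an artefact can then be read off as its ancestors along those edges. The sketch below computes them with a breadth-first search over a plain dictionary; the function name `lineage` and the file names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def lineage(derived_from: Dict[str, List[str]], artefact: str) -> List[str]:
    """Return every artefact that `artefact` was transitively derived from,
    breadth-first, so nearer ancestors come first."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque(derived_from.get(artefact, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors (or an accidental cycle) are visited once
        seen.add(node)
        order.append(node)
        queue.extend(derived_from.get(node, []))
    return order


# Hypothetical run: each key was derived from the artefacts in its list.
derived_from = {
    "reported.cif": ["derived.dat"],
    "derived.dat": ["raw.hkl"],
}
print(lineage(derived_from, "reported.cif"))  # ['derived.dat', 'raw.hkl']
```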
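Slide 17 states that every ‘run thing’ is linked to exactly one ‘plan thing’. Over a flat list of triples, that invariant can be checked by counting plan links per subject. This is a minimal sketch: the tuple encoding and `plan_link_violations` are illustrative, and `orechem:hasPlanStage` is an assumed predicate name (only `orechem:hasPlan` and `orechem:hasPlanObject` appear in the deck).

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# orechem:hasPlan and orechem:hasPlanObject appear in the slide 21 query;
# orechem:hasPlanStage is an ASSUMED name for the stage-level link.
PLAN_LINKS = {"orechem:hasPlan", "orechem:hasPlanStage", "orechem:hasPlanObject"}


def plan_link_violations(run_things: Set[str], triples: List[Triple]) -> Dict[str, int]:
    """Return each run thing that is not linked to exactly one plan thing,
    mapped to the number of plan links it actually has."""
    counts = Counter(subject for subject, predicate, _ in triples if predicate in PLAN_LINKS)
    return {thing: counts[thing] for thing in sorted(run_things) if counts[thing] != 1}


# Hypothetical data with two deliberate violations.
triples = [
    ("ex:run", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("ex:raw", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("ex:reported", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("ex:reported", "orechem:hasPlanObject", "ecrystals:HKL"),  # two plan things
]
run_things = {"ex:run", "ex:raw", "ex:reported", "ex:orphan"}  # ex:orphan has none
print(plan_link_violations(run_things, triples))  # {'ex:orphan': 0, 'ex:reported': 2}
```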
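Slide 18 lists an “eCrystal → orechem:Run” mapping as one of the plug-in’s components. The deck does not show the mapping itself, so the following is only a guess at its general shape: it turns a hypothetical in-memory view of an eCrystals record into triples using the class and property names from the slide 21 query. The `ECrystalFile` class, `to_run_triples`, the `#run` URI scheme and all example identifiers are invented; `rdf:type` is written out where the SPARQL query uses the shorthand `a`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ECrystalFile:
    """Hypothetical view of one file attached to an eCrystals record."""
    uri: str
    plan_object: Optional[str] = None  # e.g. "ecrystals:HKL"; None if no plan object applies
    derived_from: List[str] = field(default_factory=list)


def to_run_triples(record_uri: str, files: List[ECrystalFile]) -> List[Triple]:
    """Map one eCrystal record onto an orechem:Run as (subject, predicate, object) triples."""
    run = f"{record_uri}#run"  # hypothetical URI-minting scheme
    triples: List[Triple] = [
        (run, "rdf:type", "orechem:Run"),
        (run, "orechem:hasPlan", "ecrystals:Ecrystals"),
    ]
    for f in files:
        triples.append((run, "orechem:containsObject", f.uri))
        triples.append((f.uri, "rdf:type", "orechem:File"))
        if f.plan_object is not None:
            triples.append((f.uri, "orechem:hasPlanObject", f.plan_object))
        for source in f.derived_from:
            triples.append((f.uri, "orechem:derivedFrom", source))
    return triples


# Hypothetical record; the identifiers are invented for illustration.
files = [
    ECrystalFile("ex:643/raw.hkl", "ecrystals:HKL"),
    ECrystalFile("ex:643/derived.dat", None, ["ex:643/raw.hkl"]),
    ECrystalFile("ex:643/reported.cif", "ecrystals:CIF", ["ex:643/derived.dat"]),
]
for triple in to_run_triples("ex:643", files):
    print(*triple)
```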
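To make the slide 21 query concrete, the sketch below evaluates the same basic graph pattern by brute force over an in-memory set of triples. It is purely illustrative: a real deployment would send the query to an RDF store, the function name `match_slide21` is hypothetical, and the example triples are invented rather than taken from eCrystal #643. Prefixed names are compared as plain strings, so no IRI expansion is performed.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def match_slide21(triples: Set[Triple]) -> List[Dict[str, str]]:
    """Naive nested-loop evaluation of the slide 21 graph pattern."""
    rows: List[Dict[str, str]] = []
    runs = sorted(s for s, p, o in triples if p == "rdf:type" and o == "orechem:Run")
    for run in runs:
        # ?run orechem:hasPlan ecrystals:Ecrystals
        if (run, "orechem:hasPlan", "ecrystals:Ecrystals") not in triples:
            continue
        # ?raw, ?derived and ?reported must all be contained in ?run and typed orechem:File
        files = sorted(
            o for s, p, o in triples
            if s == run and p == "orechem:containsObject"
            and (o, "rdf:type", "orechem:File") in triples
        )
        for raw in files:
            if (raw, "orechem:hasPlanObject", "ecrystals:HKL") not in triples:
                continue
            for derived in files:
                if (derived, "orechem:derivedFrom", raw) not in triples:
                    continue
                for reported in files:
                    if ((reported, "orechem:hasPlanObject", "ecrystals:CIF") in triples
                            and (reported, "orechem:derivedFrom", derived) in triples):
                        rows.append({"run": run, "raw": raw,
                                     "derived": derived, "reported": reported})
    return rows


# Hypothetical triples for a single run; all identifiers are invented.
triples = {
    ("ex:run", "rdf:type", "orechem:Run"),
    ("ex:run", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("ex:run", "orechem:containsObject", "ex:raw"),
    ("ex:run", "orechem:containsObject", "ex:derived"),
    ("ex:run", "orechem:containsObject", "ex:reported"),
    ("ex:raw", "rdf:type", "orechem:File"),
    ("ex:raw", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("ex:derived", "rdf:type", "orechem:File"),
    ("ex:derived", "orechem:derivedFrom", "ex:raw"),
    ("ex:reported", "rdf:type", "orechem:File"),
    ("ex:reported", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("ex:reported", "orechem:derivedFrom", "ex:derived"),
}
for row in match_slide21(triples):
    print(row)
```

As in SPARQL, nothing forces `?raw`, `?derived` and `?reported` to bind to distinct files; only the stated triple patterns constrain them.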
