Integration of oreChem with the eCrystals repository for crystal structures. Mark Borkum, Simon Coles and Jeremy Frey, 15 September 2010
Overview: Motivation; Implementation; Discussion and Summary.
Current Practice in Crystallography: crystallography data is highly structured; the de facto standard adopted by the community is the CIF (Crystallographic Information File); relatively few crystal structures are openly published. http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
Open Access Journals. Advantages: rapid publication; highly cited; data is available to download. Disadvantages: electronic only; not all data is of primary importance to the underlying chemistry (by-products, unexpected results, tracking reactions, etc.).
Crystallography and Fraud
The eCrystals Federation: a JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services. Led by the UK National Crystallography Service (NCS), with core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics.
eCrystals – University of Southampton: located at http://ecrystals.chem.soton.ac.uk; an archive for crystal structures generated by the Southampton Chemical Crystallography Group and the UK National Crystallography Service (NCS); a modified version of EPrints 3.1; OAI-PMH compliant; an extensible platform (with a plug-in architecture).
What is an eCrystal? “all the fundamental and derived data resulting from a single crystal X-ray structure determination”; “the information supplied should enable any reader to check the reliability and validity”. http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
The Scientific Web
The Data Deluge. In haiku: Lots of producers; / Generating more data / than ever before. 40 years ago, a PhD student would determine 3 structures over the entire course of their study! Image: The Great Wave off Kanagawa by Katsushika Hokusai.
Provenance. The 7 W’s [Goble 2002]: Who, What, Where, Why, When, Which, and (W)How. The Why aspect is usually ignored: rationale, intent, hypothesis, protocol, methodology, workflow, etc. “Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.” Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
“In theory, there is no difference between theory and practice. But, in practice, there is.” Unknown (possibly Yogi Berra)
Why “Why” Matters: it is the reason for the data’s existence; it gives us the ability to interpret the data in the correct context; it allows us to align the data with the big picture. http://www.myexperiment.org/workflows/16.html
The oreChem Core Ontology describes three concepts: the methodology (planned method) of a scientific experiment; the enactment of methodologies; and the provenance of realised artefacts.
Methodology (Planned Method): the “plan” is modelled as a directed graph with two node types. Plan Stage: a description of an activity that will be enacted. Plan Object: a description of an artefact that will be realised.
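A plan of this shape can be sketched in a few lines of Python. This is an illustrative model only, not the oreChem implementation: the class names `PlanNode` and `Plan` and the stage names `CollectData` and `Refine` are hypothetical, while `HKL` and `CIF` are borrowed from the plan objects named in the SPARQL request later in the talk.

```python
from dataclasses import dataclass, field

# The two node types named on the slide.
PLAN_KINDS = {"PlanStage", "PlanObject"}


@dataclass(frozen=True)
class PlanNode:
    """A node in the plan graph (hypothetical helper class)."""
    name: str
    kind: str

    def __post_init__(self):
        # Reject anything that is not one of the two node types.
        if self.kind not in PLAN_KINDS:
            raise ValueError(f"unknown plan node kind: {self.kind}")


@dataclass
class Plan:
    """A directed graph stored as a set of (source, target) edges."""
    edges: set = field(default_factory=set)

    def add_edge(self, source: PlanNode, target: PlanNode) -> None:
        self.edges.add((source, target))

    def successors(self, node: PlanNode) -> list:
        # Names of the nodes directly reachable from `node`, sorted.
        return sorted(t.name for s, t in self.edges if s == node)


# Hypothetical stages; HKL and CIF appear as plan objects in the talk.
collect = PlanNode("CollectData", "PlanStage")
hkl = PlanNode("HKL", "PlanObject")
refine = PlanNode("Refine", "PlanStage")
cif = PlanNode("CIF", "PlanObject")

plan = Plan()
plan.add_edge(collect, hkl)
plan.add_edge(hkl, refine)
plan.add_edge(refine, cif)
```

Storing edges as a set of pairs keeps the sketch small; a real plan would be expressed in RDF using the oreChem vocabulary.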
Enactment (of a Methodology): each “run” (of a plan) is modelled as a directed graph with two node types. Stage: a description of an activity that has been enacted. Object: a description of an artefact that has been realised.
Provenance. Prospective: the plan describes a scientific experiment that will be enacted. Retrospective: the run describes a scientific experiment that has been enacted. Every ‘run thing’ is linked to exactly one ‘plan thing’.
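The "exactly one" rule lends itself to a mechanical check. The following Python sketch assumes run-to-plan links are available as simple pairs; the function name `check_run_links` and all identifiers in the sample data are hypothetical and not taken from eCrystals.

```python
def check_run_links(run_nodes, links, plan_nodes):
    """Return the run nodes that are NOT linked to exactly one plan node.

    run_nodes:  iterable of run node identifiers
    links:      iterable of (run_node, plan_node) pairs
    plan_nodes: set of known plan node identifiers
    """
    violations = []
    for node in run_nodes:
        # All plan nodes this run node points at.
        targets = {p for r, p in links if r == node}
        # Valid only if there is exactly one target and it is a known plan node.
        if len(targets) != 1 or not targets <= plan_nodes:
            violations.append(node)
    return violations


# Made-up sample data: two correctly linked files and one orphan.
plan_nodes = {"HKL", "CIF"}
run_nodes = ["file_raw", "file_cif", "file_orphan"]
links = [("file_raw", "HKL"), ("file_cif", "CIF")]

violations = check_run_links(run_nodes, links, plan_nodes)
```

Here `violations` contains only `file_orphan`, the run node with no link to the plan.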
oreChem Plug-in for eCrystals. Three components: orechem:Plan (the eCrystals methodology); “eCrystal to orechem:Run” mapping; “orechem:Run to provenance graph” pipeline.
The eCrystals Methodology (figure: before and after)
Example: eCrystal #643 (figure: before and after)
SPARQL Request:

    PREFIX orechem:   <http://www.openarchives.org/2010/05/24-orechem-ns#>
    PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

    SELECT ?run ?raw ?derived ?reported
    WHERE {
      ?run a orechem:Run ;
           orechem:hasPlan ecrystals:Ecrystals ;
           orechem:containsObject ?raw ;
           orechem:containsObject ?derived ;
           orechem:containsObject ?reported .
      ?raw a orechem:File ;
           orechem:hasPlanObject ecrystals:HKL .
      ?derived a orechem:File ;
           orechem:derivedFrom ?raw .
      ?reported a orechem:File ;
           orechem:hasPlanObject ecrystals:CIF ;
           orechem:derivedFrom ?derived .
    }
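To make the shape of this query concrete, the same pattern can be matched by hand over a small in-memory set of triples. The Python sketch below is not a SPARQL engine and does not use the real eCrystals data: the identifiers `run1`, `f_hkl`, `f_res` and `f_cif` are invented for illustration, and only the predicate and class names are taken from the request.

```python
# Invented triples that follow the vocabulary used in the request.
triples = {
    ("run1", "rdf:type", "orechem:Run"),
    ("run1", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("run1", "orechem:containsObject", "f_hkl"),
    ("run1", "orechem:containsObject", "f_res"),
    ("run1", "orechem:containsObject", "f_cif"),
    ("f_hkl", "rdf:type", "orechem:File"),
    ("f_hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("f_res", "rdf:type", "orechem:File"),
    ("f_res", "orechem:derivedFrom", "f_hkl"),
    ("f_cif", "rdf:type", "orechem:File"),
    ("f_cif", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("f_cif", "orechem:derivedFrom", "f_res"),
}


def has(s, p, o):
    """True if the triple (s, p, o) is present."""
    return (s, p, o) in triples


def objects(s, p):
    """All objects o such that (s, p, o) is present."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def find_chains():
    """Return (run, raw, derived, reported) rows matching the query pattern."""
    rows = []
    runs = sorted(s for (s, p, o) in triples
                  if p == "rdf:type" and o == "orechem:Run")
    for run in runs:
        if not has(run, "orechem:hasPlan", "ecrystals:Ecrystals"):
            continue
        # Files contained in this run.
        files = sorted(f for f in objects(run, "orechem:containsObject")
                       if has(f, "rdf:type", "orechem:File"))
        for raw in files:
            if not has(raw, "orechem:hasPlanObject", "ecrystals:HKL"):
                continue
            for derived in files:
                if not has(derived, "orechem:derivedFrom", raw):
                    continue
                for reported in files:
                    if (has(reported, "orechem:hasPlanObject", "ecrystals:CIF")
                            and has(reported, "orechem:derivedFrom", derived)):
                        rows.append((run, raw, derived, reported))
    return rows


rows = find_chains()
```

With the invented data, `rows` holds one chain: the raw HKL file, the intermediate file derived from it, and the reported CIF derived from that intermediate, all within the same run.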
SPARQL Response (for eCrystal #643) (figure labels: ?run, ?reported, ?derived, ?raw)
Summary <summary/>
Acknowledgments: oreChem is funded by Microsoft External Research; eCrystals is funded by both EPSRC and JISC. The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
#ahm2010 #ahm #ahm10 #pch2010 http://pegasus.chem.soton.ac.uk
