Integration of oreChem with the eCrystals repository for crystal structures. Mark Borkum, Simon Coles and Jeremy Frey, 15 September 2010.
Overview: Motivation; Implementation; Discussion and Summary.
Current Practice in Crystallography: Crystallography data is highly structured. The de facto standard adopted by the community is the CIF (Crystallographic Information File). Relatively few crystal structures are openly published (http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs).
Open Access Journals. Advantages: rapid publication; highly cited; data is available to download. Disadvantages: electronic only; not all data is of primary importance to the underlying chemistry (by-products, unexpected results, tracking reactions, etc.).
Crystallography and Fraud
The eCrystals Federation: a JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services. Led by the UK National Crystallography Service (NCS), with core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics.
eCrystals – University of Southampton: located at http://ecrystals.chem.soton.ac.uk. An archive for crystal structures generated by the Southampton Chemical Crystallography Group and the UK National Crystallography Service (NCS). Modified version of EPrints 3.1; OAI-PMH compliant; extensible platform (with a plug-in architecture).
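Because the repository is described as OAI-PMH compliant, an aggregation service would harvest its metadata with ordinary HTTP requests that carry a `verb` argument. The following is a minimal sketch of how such a request URL could be built. The endpoint path `/cgi/oai2` is an assumption (it is a common EPrints default and is not stated in the slides), and `oai_request_url` is a hypothetical helper name. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the OAI-PMH endpoint lives under /cgi/oai2, as is common for
# EPrints installations. The slides do not give the actual path.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return BASE_URL + "?" + urlencode(query)


# ListRecords with the oai_dc metadata prefix, which every OAI-PMH
# repository is required to support.
list_records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(list_records_url)
```

A harvester would fetch this URL and then follow any resumption tokens in the XML response; that part is omitted here.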
What is an eCrystal? “all the fundamental and derived data resulting from a single crystal X-ray structure determination”; “the information supplied should enable any reader to check the reliability and validity”. Image: http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
The Scientific Web
The Data Deluge. In haiku: Lots of producers; / Generating more data / than ever before. 40 years ago, a PhD student would determine 3 structures over the entire course of their study! Image: The Great Wave off Kanagawa by Katsushika Hokusai.
Provenance: The 7 W’s [Goble 2002]: Who, What, Where, Why, When, Which, and (W)How. The Why aspect is usually ignored: rationale, intent, hypothesis, protocol, methodology, workflow, etc. “Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.” Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
“In theory, there is no difference between theory and practice. But, in practice, there is.” Unknown (possibly Yogi Berra)
Why “Why” Matters: It is the reason for the data’s existence. It gives us the ability to interpret the data in the correct context. It allows us to align the data with the big picture. http://www.myexperiment.org/workflows/16.html
The oreChem Core Ontology describes three concepts: the methodology (planned method) of a scientific experiment; the enactment of methodologies; the provenance of realised artefacts.
Methodology (Planned Method): The “plan” is modelled as a directed graph with two node types. Plan Stage: description of an activity that will be enacted. Plan Object: description of an artefact that will be realised.
Enactment (of a Methodology): Each “run” (of a plan) is modelled as a directed graph with two node types. Stage: description of an activity that has been enacted. Object: description of an artefact that has been realised.
Provenance. Prospective: the plan describes a scientific experiment that will be enacted. Retrospective: the run describes a scientific experiment that has been enacted. Every ‘run thing’ is linked to exactly one ‘plan thing’.
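The plan/run model can be illustrated with a small data-structure sketch. This is not the oreChem ontology itself. The class names, the `conforms` check, and the stage names (`collect`, `refine`) are hypothetical. Only the HKL and CIF objects are taken from the slides. Each run node holds a single reference to its plan node, which is one way of encoding the rule that every run thing is linked to exactly one plan thing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanNode:
    name: str
    kind: str  # "PlanStage" or "PlanObject"


@dataclass(frozen=True)
class RunNode:
    name: str
    kind: str       # "Stage" or "Object"
    plan: PlanNode  # the single plan node this run node realises


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target) pairs

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges.add((source, target))


# Which run node kind is allowed to realise which plan node kind.
EXPECTED_PLAN_KIND = {"Stage": "PlanStage", "Object": "PlanObject"}


def conforms(run, plan):
    """True if every run node links to a plan node of the matching kind
    that belongs to the given plan (an assumed notion of conformance)."""
    return all(
        node.plan in plan.nodes
        and EXPECTED_PLAN_KIND.get(node.kind) == node.plan.kind
        for node in run.nodes
    )


# Prospective: the plan (stage names are illustrative only).
collect = PlanNode("collect", "PlanStage")
hkl = PlanNode("HKL", "PlanObject")
refine = PlanNode("refine", "PlanStage")
cif = PlanNode("CIF", "PlanObject")
plan = Graph()
for a, b in [(collect, hkl), (hkl, refine), (refine, cif)]:
    plan.add_edge(a, b)

# Retrospective: one run of that plan.
run = Graph()
r_collect = RunNode("collect#1", "Stage", collect)
r_hkl = RunNode("643.hkl", "Object", hkl)
r_refine = RunNode("refine#1", "Stage", refine)
r_cif = RunNode("643.cif", "Object", cif)
for a, b in [(r_collect, r_hkl), (r_hkl, r_refine), (r_refine, r_cif)]:
    run.add_edge(a, b)

print(conforms(run, plan))
```

Storing the plan link on the run node, rather than in a separate mapping, makes the "exactly one" constraint structural: a run node cannot be created without a plan node and cannot carry two.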
oreChem Plug-in for eCrystals. Three components: orechem:Plan (the eCrystals methodology); “eCrystal → orechem:Run” mapping; “orechem:Run → provenance graph” pipeline.
The eCrystals Methodology [figure: before and after]
Example: eCrystal #643 [figure: before and after]
SPARQL Request:
    PREFIX orechem:   <http://www.openarchives.org/2010/05/24-orechem-ns#>
    PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>
    SELECT ?run ?raw ?derived ?reported
    WHERE {
      ?run a orechem:Run ;
           orechem:hasPlan ecrystals:Ecrystals ;
           orechem:containsObject ?raw ;
           orechem:containsObject ?derived ;
           orechem:containsObject ?reported .
      ?raw a orechem:File ;
           orechem:hasPlanObject ecrystals:HKL .
      ?derived a orechem:File ;
           orechem:derivedFrom ?raw .
      ?reported a orechem:File ;
           orechem:hasPlanObject ecrystals:CIF ;
           orechem:derivedFrom ?derived .
    }
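To see what the graph pattern in the SPARQL request selects, it can be traced by hand over a handful of triples. The sketch below uses plain Python tuples rather than an RDF library. The node identifiers (`run643`, `file:hkl`, `file:res`, `file:cif`) are invented for illustration and are not taken from the repository. The predicates and plan objects are those named in the query.

```python
RDF_TYPE = "rdf:type"  # what the SPARQL keyword "a" abbreviates

# Hypothetical triples for one run; identifiers are illustrative only.
triples = {
    ("run643", RDF_TYPE, "orechem:Run"),
    ("run643", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("run643", "orechem:containsObject", "file:hkl"),
    ("run643", "orechem:containsObject", "file:res"),
    ("run643", "orechem:containsObject", "file:cif"),
    ("file:hkl", RDF_TYPE, "orechem:File"),
    ("file:hkl", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("file:res", RDF_TYPE, "orechem:File"),
    ("file:res", "orechem:derivedFrom", "file:hkl"),
    ("file:cif", RDF_TYPE, "orechem:File"),
    ("file:cif", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("file:cif", "orechem:derivedFrom", "file:res"),
}


def has(s, p, o):
    """True if the triple (s, p, o) is present."""
    return (s, p, o) in triples


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is present."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def find_chains():
    """Return (run, raw, derived, reported) rows matching the query pattern."""
    rows = []
    runs = sorted(s for s, p, o in triples
                  if p == RDF_TYPE and o == "orechem:Run")
    for run in runs:
        if not has(run, "orechem:hasPlan", "ecrystals:Ecrystals"):
            continue
        files = sorted(f for f in objects(run, "orechem:containsObject")
                       if has(f, RDF_TYPE, "orechem:File"))
        for raw in files:
            if not has(raw, "orechem:hasPlanObject", "ecrystals:HKL"):
                continue
            for derived in files:
                if not has(derived, "orechem:derivedFrom", raw):
                    continue
                for reported in files:
                    if (has(reported, "orechem:hasPlanObject", "ecrystals:CIF")
                            and has(reported, "orechem:derivedFrom", derived)):
                        rows.append((run, raw, derived, reported))
    return rows


for row in find_chains():
    print(row)
```

The nested loops mirror the query: the raw file is tied to the plan's HKL object, the reported file to its CIF object, and the intermediate file is identified only by its position in the `derivedFrom` chain.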
SPARQL Response (for eCrystal #643) [figure; labels: ?run, ?reported, ?derived, ?raw]
Summary <summary/>
Acknowledgments: oreChem is funded by Microsoft External Research. eCrystals is funded by both EPSRC and JISC. The oreChem project team: Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Meuller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
#ahm2010 #ahm #ahm10 #pch2010 http://pegasus.chem.soton.ac.uk #ahm2010 until 11am Wed 15 Sept 2010

