Integration of oreChem with the eCrystals repository for crystal structures

  • Integration of oreChem with the eCrystals repository for crystal structures
    Mark Borkum, Simon Coles and Jeremy Frey
    15 September 2010
  • Overview
    Motivation
    Implementation
    Discussion and Summary
  • Current Practice in Crystallography
    Crystallography data is highly structured
    The de facto standard adopted by the community is the CIF (Crystallographic Information File)
    Relatively few crystal structures are openly published
    http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
  • Open Access Journals
    Advantages:
    Rapid publication
    Highly cited
    Data is available to download
    Disadvantages:
    Electronic only
    Not all data is of primary importance to the underlying chemistry
    By-products, unexpected results, tracking reactions, etc.
  • Crystallography and Fraud
  • The eCrystals Federation
    JISC project to establish a network of crystallography resources on the Internet, with metadata that is harvested by a number of aggregation services
    Led by the UK National Crystallography Service (NCS)
    With core partners at UKOLN, the Digital Curation Centre, and the Unilever Centre for Molecular Science Informatics
  • eCrystals – University of Southampton
    Located at http://ecrystals.chem.soton.ac.uk
    Archive for crystal structures that are generated by:
    Southampton Chemical Crystallography Group
    UK National Crystallography Service (NCS)
    Modified version of EPrints 3.1
    OAI-PMH compliant
    Extensible platform (with a plug-in architecture)
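Because the repository is OAI-PMH compliant, an aggregation service can harvest its metadata with plain HTTP requests. The following is a minimal sketch of how such a request URL could be built. The endpoint path `/cgi/oai2` is an assumption based on the usual EPrints layout and is not stated in the slides; the helper name `oai_request_url` is hypothetical. `ListRecords` and `oai_dc` are the standard OAI-PMH verb and Dublin Core metadata prefix.

```python
from urllib.parse import urlencode

# Assumption: the OAI-PMH endpoint lives at the usual EPrints path.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def oai_request_url(verb, base_url=BASE_URL, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments left as None so the URL carries only what was asked for.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Ask for all records in unqualified Dublin Core.
list_records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
print(list_records_url)
```

The sketch only constructs the URL; fetching and parsing the XML response (and following any resumption token) is left out.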
  • What is an eCrystal?
    “all the fundamental and derived data resulting from a single crystal X-ray structure determination”
    “the information supplied should enable any reader to check the reliability and validity”
    http://www.ukoln.ac.uk/projects/ebank-uk/images/collage-web.gif
  • The Scientific Web
  • The Data Deluge
    In Haiku:
    Lots of producers;
    Generating more data
    than ever before.
    40 years ago, a PhD student would determine 3 structures over the entire course of their study!
    The Great Wave off Kanagawa by Katsushika Hokusai
  • Provenance
    The 7 W’s [Goble 2002]
    Who, What, Where, Why, When, Which, & (W)How
    The Why aspect is usually ignored 
    Rationale, intent, hypothesis, protocol, methodology, workflow, etc.
    “Diana and Actaeon by Titian has a full provenance covering its passage through several owners and four countries since it was painted for Philip II of Spain in the 1550s.”
    Source: http://en.wikipedia.org/wiki/Diana_and_Actaeon_%28Titian%29
  • “In theory, there is no difference between theory and practice. But, in practice, there is.” Unknown (possibly Yogi Berra)
  • Why “Why” Matters
    It is the reason for the data’s existence
    It gives us the ability to interpret the data in the correct context
    It allows us to align the data with the big picture
    http://www.myexperiment.org/workflows/16.html
  • The oreChem Core Ontology
    Describes three concepts:
    The methodology (planned method) of a scientific experiment
    The enactment of methodologies
    The provenance of realised artefacts
  • Methodology (Planned Method)
    The “plan” is modelled as a directed graph
    Two node types:
    Plan Stage: description of an activity that will be enacted
    Plan Object: description of an artefact that will be realised
  • Enactment (of a Methodology)
    Each “run” (of a plan) is modelled as a directed graph
    Two node types:
    Stage: description of an activity that has been enacted
    Object: description of an artefact that has been realised
  • Provenance
    Prospective
    The plan describes a scientific experiment that will be enacted
    Retrospective
    The run describes a scientific experiment that has been enacted
    Every ‘run thing’ is linked to exactly one ‘plan thing’
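The plan/run model described on the last three slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the oreChem implementation: the class and function names are hypothetical, the stage names and file names are invented for the example, and the edge check (every run edge must mirror a plan edge) is an assumption that goes beyond what the slides state. Only the two node types per graph and the rule that each run node links to exactly one plan node come from the slides.

```python
from dataclasses import dataclass, field

STAGE, OBJECT = "stage", "object"


@dataclass(frozen=True)
class PlanNode:
    """A Plan Stage or Plan Object: something that will happen or be made."""
    name: str
    kind: str


@dataclass(frozen=True)
class RunNode:
    """A Stage or Object of a run, linked to exactly one plan node."""
    name: str
    kind: str
    plan: PlanNode


@dataclass
class DirectedGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target) pairs

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges.add((source, target))


def run_conforms_to_plan(run, plan):
    """Check node kinds against the plan, and (assumption) edges as well."""
    nodes_ok = all(n.plan in plan.nodes and n.kind == n.plan.kind
                   for n in run.nodes)
    edges_ok = all((s.plan, t.plan) in plan.edges for s, t in run.edges)
    return nodes_ok and edges_ok


# Prospective provenance: the plan (stage names are hypothetical).
collect = PlanNode("CollectData", STAGE)
hkl = PlanNode("HKL", OBJECT)
refine = PlanNode("SolveAndRefine", STAGE)
cif = PlanNode("CIF", OBJECT)
plan = DirectedGraph()
plan.add_edge(collect, hkl)
plan.add_edge(hkl, refine)
plan.add_edge(refine, cif)

# Retrospective provenance: one run (file names are hypothetical).
r_collect = RunNode("data collection", STAGE, collect)
r_hkl = RunNode("example.hkl", OBJECT, hkl)
r_refine = RunNode("solution and refinement", STAGE, refine)
r_cif = RunNode("example.cif", OBJECT, cif)
run = DirectedGraph()
run.add_edge(r_collect, r_hkl)
run.add_edge(r_hkl, r_refine)
run.add_edge(r_refine, r_cif)

print(run_conforms_to_plan(run, plan))
```

Storing the plan link as a single required field on `RunNode` is one simple way to guarantee the "exactly one" rule by construction.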
  • oreChem Plug-in for eCrystals
    Three components:
    orechem:Plan (the eCrystals methodology)
    “eCrystalorechem:Run” mapping
    “orechem:Run provenance graph” pipeline
  • The eCrystals Methodology
    [Figure: the eCrystals methodology; panels: Before, After]
  • Example: eCrystal #643
    [Figure: eCrystal #643; panels: Before, After]
  • SPARQL Request
    PREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>
    PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>
    SELECT ?run ?raw ?derived ?reported
    WHERE {
      ?run a orechem:Run ;
        orechem:hasPlan ecrystals:Ecrystals ;
        orechem:containsObject ?raw ;
        orechem:containsObject ?derived ;
        orechem:containsObject ?reported .
      ?raw a orechem:File ;
        orechem:hasPlanObject ecrystals:HKL .
      ?derived a orechem:File ;
        orechem:derivedFrom ?raw .
      ?reported a orechem:File ;
        orechem:hasPlanObject ecrystals:CIF ;
        orechem:derivedFrom ?derived .
    }
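To make the graph pattern in the query concrete, the sketch below evaluates the same pattern by hand over a small in-memory set of triples, using only the Python standard library. It is not a SPARQL engine, and the triples are hypothetical: the `run:` and `file:` identifiers are invented for illustration and do not come from the eCrystals repository. The predicates and classes are the ones named in the query.

```python
RDF_TYPE = "rdf:type"  # what "a" abbreviates in the query

# Hypothetical data for a single run; identifiers are illustrative only.
TRIPLES = {
    ("run:example", RDF_TYPE, "orechem:Run"),
    ("run:example", "orechem:hasPlan", "ecrystals:Ecrystals"),
    ("run:example", "orechem:containsObject", "file:raw"),
    ("run:example", "orechem:containsObject", "file:derived"),
    ("run:example", "orechem:containsObject", "file:reported"),
    ("file:raw", RDF_TYPE, "orechem:File"),
    ("file:raw", "orechem:hasPlanObject", "ecrystals:HKL"),
    ("file:derived", RDF_TYPE, "orechem:File"),
    ("file:derived", "orechem:derivedFrom", "file:raw"),
    ("file:reported", RDF_TYPE, "orechem:File"),
    ("file:reported", "orechem:hasPlanObject", "ecrystals:CIF"),
    ("file:reported", "orechem:derivedFrom", "file:derived"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def find_chains(triples):
    """Return (run, raw, derived, reported) tuples matching the pattern."""
    results = []
    runs = sorted(s for s, p, o in triples
                  if p == RDF_TYPE and o == "orechem:Run")
    for run in runs:
        if (run, "orechem:hasPlan", "ecrystals:Ecrystals") not in triples:
            continue
        # Only files contained in this run can bind the three variables.
        files = sorted(f for f in objects(triples, run, "orechem:containsObject")
                       if (f, RDF_TYPE, "orechem:File") in triples)
        for raw in files:
            if (raw, "orechem:hasPlanObject", "ecrystals:HKL") not in triples:
                continue
            for derived in files:
                if (derived, "orechem:derivedFrom", raw) not in triples:
                    continue
                for reported in files:
                    if ((reported, "orechem:hasPlanObject", "ecrystals:CIF") in triples
                            and (reported, "orechem:derivedFrom", derived) in triples):
                        results.append((run, raw, derived, reported))
    return results


for row in find_chains(TRIPLES):
    print(row)
```

Each result row corresponds to one solution of the query: a run of the eCrystals plan together with its raw HKL file, an intermediate file derived from it, and the reported CIF derived from that intermediate file.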
  • SPARQL Response (for eCrystal #643)
    [Figure: result bindings for ?run, ?reported, ?derived and ?raw]
  • Summary
    <summary/>
  • Acknowledgments
    oreChem is funded by Microsoft External Research
    eCrystals is funded by both EPSRC and JISC
    The oreChem project team:
    Nico Adams, Mark Borkum, William Brouwer, Rameswara Sashi Kiran Challa, Simon Coles, Nick Day, Jim Downing, Jeremy Frey, C. Lee Giles, Carl Lagoze (PI), Na Li, Prasenjit Mitra, Karl Mueller, Peter Murray-Rust, Marlon Pierce, Joe Townsend, and Theresa Velden.
  • 25
    #ahm2010
    #ahm
    #ahm10
    #pch2010
    http://pegasus.chem.soton.ac.uk
    #ahm2010 until 11am Wed 15 Sept 2010