An architecture for an Open Science molecular compound database

The past few years have seen a tremendous leap forward in public compound databases. Both PubChem and ChemSpider have sent a clear message: chemical sciences can only move forward if we can search existing chemistry. However, the exact Open nature of these “public” databases is not always crystal clear. PubChem is mostly public domain but also contains proprietary content, while ChemSpider is mostly proprietary but has Open Data content. Neither is clear about how the Open Data parts of these databases can be used, modified, and redistributed, the three cornerstones of Open Science.

We will demo, based on previous work on http://rdf.openmolecules.net/, an architecture where semantic web technologies, the InChI, and Open Source cheminformatics tools are used to create a Panton Principles-compliant compound database to aid the next generation of public databases. Standards proposed in the Open PHACTS community will be used to specify links between this new resource and other databases, and to provide compound properties. All this input will be available with provenance on the origin of the data, as separate downloadable files, and annotated with ontologies that make its meaning explicit. Using ontologies like ChEBI and CHEMINF, applications in the areas of metabolomics and toxicology will be presented.
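To make the idea of "separate downloadable files" with provenance concrete, here is a minimal Python sketch. It assumes the data are held as (subject, predicate, object, graph) quads with one named graph per original source. It writes each graph as its own N-Triples document and appends dcterms:source and dcterms:license statements about that graph, so the rights stay attached to the file. The function names, the graph_meta layout and the heuristic that treats any http(s) string as a URI are assumptions made for illustration.

```python
from collections import defaultdict

DCTERMS = "http://purl.org/dc/terms/"


def term(value):
    # Simplification (assumed): anything starting with http(s):// is a URI,
    # everything else becomes a plain string literal.
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(s, p, o):
    return f"{term(s)} {term(p)} {term(o)} ."


def export_per_graph(quads, graph_meta):
    """Group (s, p, o, graph) quads into one N-Triples document per graph.

    Each document also states where its data came from (dcterms:source) and
    under which license it may be reused (dcterms:license).
    """
    grouped = defaultdict(list)
    for s, p, o, g in quads:
        grouped[g].append(triple(s, p, o))
    files = {}
    for g, lines in grouped.items():
        meta = graph_meta.get(g, {})
        for key in ("source", "license"):
            if key in meta:
                lines.append(triple(g, DCTERMS + key, meta[key]))
        files[g] = "\n".join(lines) + "\n"
    return files
```

Each returned string can be written to its own file, one per source, so a user can download exactly the part whose license they accept.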

Usage rights: CC Attribution-ShareAlike License

Presentation Transcript

  • Department of Bioinformatics - BiGCaT
    An architecture for an Open Science molecular compound database
    Egon Willighagen, @egonwillighagen
    Dept. of Bioinformatics - BiGCaT - Maastricht University
    orcid.org/0000-0001-7542-0286
    ACS New Orleans, 9 April 2013, #ACSNola
  • This session: Public Databases ...
    • Public: what's that?
      – free access?
      – redistribute?
      – modify?
    • BTW, what is “Open Access”???
  • This session: Serving the community...
    • Service
      – What do people want?
      – Do they know what is possible?
    • Community
      – Who are they? → Personas!
      – Usability must include learnability
  • Personas
    • Not every scientist is alike
    • You cannot and must not target one user
    • Instead, target at least 2 different users, particularly:
      – The hacker doing all the actual bioinformatics in the lab
      – The professor who has too little time to understand things outside his narrow field
  • Reason #1: Bioclipse decision support
    Spjuth, O. et al. JCIM 2011 51(8):1840-1847.
  • Data #1: Linked Open Drug Data
    Samwald, M. et al. Linked open drug data for pharmaceutical research and development. JChemInf 2011.
  • Data^2: Linked Open Data
    Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/, Sept 2011, CC-BY-SA.
  • Linked Open Data in the Life Sciences
  • WikiPathways
    Pico, A.R. et al. PLoS Biology 2008 6(7):e184.
  • PathVisio: Pathway Analysis
    Van Iersel, M. et al. BMC Bioinformatics 2008 9(1):399.
  • Reason #2: Publishing
    • Journals will increasingly require data deposition
      – e.g. BioMed Central
  • Needs
    • We must propagate rights
      – whether open or not!
    • We must make things explicit
      – e.g. by using semantics
      – e.g. by using the InChI
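One way to read "propagate rights" together with "use the InChI" is sketched below as a hypothetical Python example, not the system described in the talk: compound records from several sources are merged on their InChIKey, and every merged value keeps the name and license of the source it came from instead of being flattened into one anonymous record. The (name, license, records) tuple layout and the function name are assumptions.

```python
from collections import defaultdict


def merge_by_inchikey(*sources):
    """Merge compound records from several sources, keyed on InChIKey.

    Each source is a (name, license_uri, records) tuple, where records maps an
    InChIKey to a dict of properties. Every merged property keeps a list of
    (value, source_name, license_uri) entries, so rights travel with values.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for name, license_uri, records in sources:
        for inchikey, props in records.items():
            for prop, value in props.items():
                merged[inchikey][prop].append((value, name, license_uri))
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {key: dict(props) for key, props in merged.items()}
```

Because the InChIKey is the join key, two sources only need to agree on the structure identifier, not on each other's internal database IDs.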
  • Tool #1: licensing
  • Open Data #1: crystallography
  • Open Data #2: Open Notebook Science
  • Open Data #3: CrystalEye
  • Licensing → Open not required
    • But not providing info is a killer
      – no, not really, because no scientist seems to care
      – yes, because what will a machine do? Think scalability and massive data integration efforts
  • Why does explicit licensing matter?
    Because when there is a fire, you want immediate access to the fire hose. You do not want to wait for permission from the mayor.
    Because when you want to validate your scientific results, you want immediate access to related data. You do not want to wait for permission from that professor who is on a conference tour for the next 4 weeks. You must have an immediate answer, whatever it is.
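A machine can only give that immediate answer if licenses are explicit and machine-readable. The Python sketch below maps license URIs to the three cornerstones named in the abstract (use, modify, redistribute) plus any conditions. The table is deliberately simplified, and the "all rights reserved" URI is a hypothetical placeholder; an unknown or missing license yields None, meaning the only safe answer is to ask.

```python
# Hypothetical, deliberately simplified rights table: for each license URI,
# which of use / modify / redistribute it permits and under what conditions.
ALL_RIGHTS_RESERVED = "http://example.org/licenses/all-rights-reserved"  # placeholder
ALL = {"use", "modify", "redistribute"}
RIGHTS = {
    "http://creativecommons.org/publicdomain/zero/1.0/": (ALL, set()),
    "http://opendatacommons.org/licenses/pddl/1.0/": (ALL, set()),
    "http://creativecommons.org/licenses/by/3.0/": (ALL, {"attribution"}),
    "http://creativecommons.org/licenses/by-sa/3.0/": (ALL, {"attribution", "share-alike"}),
    ALL_RIGHTS_RESERVED: (set(), set()),
}


def may(action, license_uri):
    """Return (allowed, conditions); allowed is None if the license is unknown."""
    if license_uri not in RIGHTS:
        return None, set()  # no explicit, known license: the answer is "ask"
    allowed, conditions = RIGHTS[license_uri]
    return action in allowed, set(conditions)
```

For example, `may("redistribute", None)` returns `(None, set())`: the data may be fine to reuse, but without an explicit license a machine cannot know.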
  • Tool #2: Semantic Web to the rescue
    • Allows provenance
      – records where data came from
      – tells us our rights
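As a small illustration of how named graphs can "tell us our rights", this hypothetical Python helper finds every graph that asserts a given triple and returns the source and license metadata recorded for those graphs. It assumes the data are an in-memory list of (s, p, o, graph) quads plus a graph_meta dictionary; in a real triple store this lookup would be a SPARQL query instead.

```python
def who_says(triple, quads, graph_meta):
    """List the named graphs that assert `triple`, with their recorded metadata.

    quads are (s, p, o, graph) tuples; graph_meta maps a graph URI to a dict
    such as {"source": ..., "license": ...}. Graphs without metadata are still
    listed, which makes missing rights information visible rather than hidden.
    """
    return [{"graph": g, **graph_meta.get(g, {})}
            for s, p, o, g in quads if (s, p, o) == tuple(triple)]
```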
  • App #1: Spidering the semantic web
    Spjuth, O. et al. JChemInf 2013 5:14.
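The spidering idea can be sketched generically as a breadth-first crawl with a depth limit. This Python version is not the method from the cited paper; `fetch_links` is an assumed callable that stands in for dereferencing a URI and extracting the URIs it links to (for example via owl:sameAs), so the crawl can be tried offline on a plain dictionary.

```python
from collections import deque


def spider(start, fetch_links, max_depth=2):
    """Breadth-first crawl of linked resources, visiting each URI at most once.

    fetch_links(uri) must return an iterable of URIs linked from `uri`.
    Returns a dict mapping every reached URI to its link distance from start.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        depth = seen[uri]
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in fetch_links(uri):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen
```

Keeping a `seen` map means cycles between mirrored resources (A links to B, B links back to A) are visited only once, and the depth limit keeps the crawl from wandering across the whole Linked Open Data cloud.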
  • App #2: Making a web
    http://rdf.openmolecules.net/
  • App #3: Open PHACTS Explorer
    http://www.openphacts.org/ → room 349, 2:20pm
  • How #1: RDF Graphs
  • How #1: RDF Graphs
    PREFIX cheminf: <http://semanticscience.org/resource/>
    SELECT ?graph ?p ?o WHERE {
      GRAPH ?graph {
        ?mol cheminf:CHEMINF_000200 [
          a cheminf:CHEMINF_000059 ;
          cheminf:SIO_000300 "$inchikey"
        ] ;
        ?p ?o .
      }
    }
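The query above is a template: "$inchikey" is filled in before the query is sent to the store. The Python sketch below assumes simple string substitution (Python's `string.Template` happens to use the same `$name` syntax) and first checks that the input has the 27-character InChIKey shape, so user input cannot escape the string literal. `run_locally` is a hypothetical helper that evaluates the same pattern over an in-memory list of quads: for each graph, it returns every (graph, predicate, object) of molecules whose identifier node, in that same graph, carries the given InChIKey. It ignores literal datatypes and language tags.

```python
import re
from string import Template

# The slide's query, with $inchikey as the placeholder to fill in.
QUERY = Template("""PREFIX cheminf: <http://semanticscience.org/resource/>
SELECT ?graph ?p ?o WHERE {
  GRAPH ?graph {
    ?mol cheminf:CHEMINF_000200 [
      a cheminf:CHEMINF_000059 ;
      cheminf:SIO_000300 "$inchikey"
    ] ;
    ?p ?o .
  }
}""")

# 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters in total).
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

RES = "http://semanticscience.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_query(inchikey):
    # Reject anything not shaped like an InChIKey so that user input cannot
    # close the string literal and change the query.
    if not INCHIKEY.fullmatch(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return QUERY.substitute(inchikey=inchikey)


def run_locally(inchikey, quads):
    """Evaluate the same pattern over (s, p, o, graph) quads, graph by graph."""
    results = []
    for g in sorted({q[3] for q in quads}):
        triples = {(s, p, o) for s, p, o, qg in quads if qg == g}
        # Identifier nodes typed CHEMINF_000059 whose value is the InChIKey.
        ids = {s for s, p, o in triples
               if p == RDF_TYPE and o == RES + "CHEMINF_000059"
               and (s, RES + "SIO_000300", inchikey) in triples}
        # Molecules linked to such an identifier within the same graph.
        mols = {s for s, p, o in triples
                if p == RES + "CHEMINF_000200" and o in ids}
        results += [(g, p, o) for s, p, o in sorted(triples) if s in mols]
    return results
```

For example, `build_query("VNWKTOKETHGBQD-UHFFFAOYSA-N")` (the InChIKey of methane) returns the query with that key inside the string literal. Because everything sits inside `GRAPH ?graph`, each answer row names the graph it came from, which is what lets provenance and license information be attached per row.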
  • NanoPub.org
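A nanopublication packages a single assertion together with its provenance and publication information, each in its own named graph, plus a head graph that ties them together. The Python sketch below builds those quads. The namespace and the np:Nanopublication, np:hasAssertion, np:hasProvenance and np:hasPublicationInfo terms are assumed to follow the nanopub.org schema; the base URI and the #head / #assertion / #provenance / #pubinfo graph names are hypothetical choices for the example.

```python
NP = "http://www.nanopub.org/nschema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nanopub(base, assertion, provenance, pubinfo):
    """Return the (s, p, o, graph) quads of a nanopublication rooted at `base`.

    assertion, provenance and pubinfo are lists of (s, p, o) triples; each list
    goes into its own named graph, and a head graph links the three together.
    """
    graph = {name: f"{base}#{name}"
             for name in ("head", "assertion", "provenance", "pubinfo")}
    head = [
        (base, RDF_TYPE, NP + "Nanopublication"),
        (base, NP + "hasAssertion", graph["assertion"]),
        (base, NP + "hasProvenance", graph["provenance"]),
        (base, NP + "hasPublicationInfo", graph["pubinfo"]),
    ]
    quads = [(s, p, o, graph["head"]) for s, p, o in head]
    for name, triples in (("assertion", assertion),
                          ("provenance", provenance),
                          ("pubinfo", pubinfo)):
        quads += [(s, p, o, graph[name]) for s, p, o in triples]
    return quads
```

The provenance triples would typically refer to the assertion graph URI (here `base + "#assertion"`), for instance to state where the asserted value came from, and the publication info is a natural place for the license.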
  • Graph output
  • Is that it?!? Just an architecture??
    Yes, but a simple and flexible one. Keep an eye on my blog; this will happen in the next few months:
    1. Aggregate all CCZero/PDDL data around chemical properties
       1. Open Notebook Science (solubility, melting point)
       2. ChemPedia
       3. Crystallography (COD, CrystalEye)
       4. ...
    2. Calculate molecular properties with the CDK (and release them as CCZero)
    3. Host on http://linkedchemistry.info/chembox
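Step 1 of that plan, keeping only CCZero/PDDL data, amounts to a filter on explicit license URIs. Here is a minimal Python sketch, assuming each aggregated record is a dict whose "license" key holds the license URI. Records that fail the filter are returned rather than silently discarded, so they can be reported.

```python
# License URIs accepted for aggregation (CC0 and the ODC PDDL, version 1.0).
OPEN_PD = {
    "http://creativecommons.org/publicdomain/zero/1.0/",  # CC0 ("CCZero")
    "http://opendatacommons.org/licenses/pddl/1.0/",      # PDDL
}


def public_domain_only(records):
    """Split records into (kept, rejected) by whether their license is CC0 or PDDL."""
    kept, rejected = [], []
    for rec in records:
        # A record without a "license" key has no explicit license and is rejected.
        (kept if rec.get("license") in OPEN_PD else rejected).append(rec)
    return kept, rejected
```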
  • CHEMINF ontology
    Hastings, J. et al. PLoS ONE 2011 6(10):e25513.
  • Architecture
    Triple store (e.g. Virtuoso)
    Web server (HTML / RDF)
    • Graphs
    • Explicit license info
    • InChI/FixedH
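The web server in this architecture serves the same compound data as HTML or RDF. A common way to do that is HTTP content negotiation on the Accept header. The Python sketch below picks between text/html and text/turtle (Turtle is chosen here as the RDF serialization, which is an assumption). It honours q-values and wildcards but ignores specificity rules and other media-type parameters, so it is a simplification of full HTTP negotiation.

```python
# HTML for people, Turtle as the RDF serialization (an assumed choice).
OFFERS = ("text/html", "text/turtle")


def choose_representation(accept_header):
    """Return the offered media type with the highest q-value, or None if none is acceptable."""
    if not accept_header.strip():
        return OFFERS[0]  # no preference expressed: default to HTML
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        media, *params = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        for offer in OFFERS:
            acceptable = media.lower() in (offer, "*/*", offer.split("/")[0] + "/*")
            if acceptable and q > best_q:
                best, best_q = offer, q
    return best
```

A browser sending `text/html,application/xhtml+xml,*/*;q=0.8` gets the HTML page, while a Linked Data client asking for `text/turtle` gets RDF from the same compound URI.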
  • /FixedH ?!?!
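Some background on why FixedH matters: standard InChI (prefix `InChI=1S/`) treats mobile hydrogens so that some tautomers share one identifier, whereas a non-standard InChI (prefix `InChI=1/`) generated with the FixedH option adds a fixed-H layer, starting with `/f`, that can keep them apart. The Python helper below only inspects an InChI string for those two markers; it does not validate or generate InChIs, and its function names are hypothetical.

```python
def inchi_layers(inchi):
    """Split an InChI into its version string and its '/'-separated layers."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI: {inchi!r}")
    version, *layers = inchi[len("InChI="):].split("/")
    return version, layers


def describe(inchi):
    version, layers = inchi_layers(inchi)
    return {
        # "1S" marks a standard InChI; non-standard InChIs use plain "1".
        "standard": version.endswith("S"),
        # A layer beginning with lowercase "f" is the fixed-H layer.
        "fixed_h": any(layer.startswith("f") for layer in layers),
    }
```

For methane, `describe("InChI=1S/CH4/h1H4")` returns `{'standard': True, 'fixed_h': False}`: a standard InChI never carries the fixed-H layer, which is why a database that wants to distinguish tautomers has to store the non-standard FixedH form as well.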
  • Conclusions & Outlook
    • We must propagate rights
      – whether open or not!
    • We must make things explicit
      – e.g. by using semantics
      – e.g. by using the InChI with FixedH
  • More information
    • @egonwillighagen
    • http://chem-bla-ics.blogspot.com/
    • http://egonw.github.com/
    • http://orcid.org/0000-0001-7542-0286