These are the slides for the talk I will be giving at the Science Commons Symposium Pacific Northwest, on the Microsoft campus in Redmond, in about 5 minutes.
2. A Pragmatic Vision
“Build a Structure Centric Community”
December 2006 – A hobby project initiated to connect chemistry on the web
Integrate chemical structure data on the web
Create a “structure-based hub” to information and data
Provide access to structure-based “algorithms”
Let chemists contribute their own data
Allow the community to curate/correct data
3. Where is chemistry online?
Encyclopedic articles (Wikipedia)
Chemical vendor databases
Metabolic pathway databases
Property databases
Patents with chemical structures
Drug Discovery data
Scientific publications
Compound aggregators
Blogs/Wikis and Open Notebook Science
4. Chemistry on the Internet TODAY
Chemistry searches are generally limited to text-based searches across the internet
Data are dirty: sorting the wheat from the chaff.
Who can you trust?
Too many searches are required to find the data
6. Chemistry on the Internet FUTURE
The semantic web for chemistry is in place
Crowdsourced contributions are commonplace
Chemists will search by structure/substructure
Chemistry articles indexed and searchable
Reduced number of searches to find data
Data are integrated – compounds, vendors, syntheses, data, publications and patents
A world of Open Access and Open Data
Classical business models will have to morph
7. Getting it done
March 2007 – A beta system opened online
One purchased computer, two home-built
Seeded with 10.5 million structures
Structure/substructure searching
June 2007
A curating layer to flag data
A deposition interface to add to the data
And so it continued….
19. Link off a structure in ChemSpider
Chemical suppliers
Other publications
Analytical Data
Related Reactions
Wikipedia
Patents
“Everything”
20. Answering Questions for Chemists
Questions a chemist might ask…
What is the melting point of n-butanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Ketoconazole?
What is the NMR spectrum of Aspirin?
What are the safety handling issues for Thymol Blue?
22. ChemSpider is a structure-centric hub
ChemSpider aggregates and links out across the internet
Data aggregate based on “structures and links”
What defines a chemical compound?
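The "structures and links" idea can be pictured as a lookup keyed by structure, with every source's links collected under that key. The sketch below is only an illustration under stated assumptions: the structure keys, source names and the `build_hub` helper are hypothetical placeholders, not ChemSpider's actual data model or data.

```python
from collections import defaultdict

# Hypothetical records: (structure key, source name, link category).
# Keys and sources are placeholders, not real ChemSpider data.
records = [
    ("STRUCT-A", "vendor-db", "Chemical suppliers"),
    ("STRUCT-A", "encyclopedia", "Wikipedia"),
    ("STRUCT-B", "patent-db", "Patents"),
    ("STRUCT-A", "patent-db", "Patents"),
]


def build_hub(records):
    """Group sources by link category under each structure key."""
    hub = defaultdict(lambda: defaultdict(set))
    for key, source, category in records:
        hub[key][category].add(source)
    return hub


hub = build_hub(records)
for key in sorted(hub):
    # Print each structure with its link categories and contributing sources.
    print(key, {cat: sorted(srcs) for cat, srcs in sorted(hub[key].items())})
```

Everything hangs off the structure key, which is why the question of what defines a compound matters: two sources only aggregate if they resolve to the same key.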
46. InChIKeys for Taxol
DrugBank: RCINICONZNJXQF-CLDWUXIMDD
ChEBI: RCINICONZNJXQF-GXKQXQCDDN
Wikipedia: RCINICONZNJXQF-MZXODVADBJ
ChEBI and Wikipedia are the SAME structure
DrugBank is a DIFFERENT structure – it differs at ONE stereocenter
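As I understand the InChIKey layout, the first 14-character block is a hash of the connectivity (the molecular skeleton), while the block after the hyphen covers the remaining layers, including stereochemistry. A minimal sketch, using the three keys exactly as listed on the slide; the `split_key` helper is an illustrative name, not part of any InChI library.

```python
# InChIKeys exactly as listed on the slide.
keys = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(inchikey):
    """Split an InChIKey at the first hyphen into (first block, remainder)."""
    first, _, rest = inchikey.partition("-")
    return first, rest


# Distinct first blocks: a single entry means all sources share one skeleton.
skeletons = {split_key(k)[0] for k in keys.values()}
# Remainder of each key, per source: differences here lie outside connectivity.
suffixes = {source: split_key(k)[1] for source, k in keys.items()}

print("shared skeleton:", len(skeletons) == 1)
for source, suffix in suffixes.items():
    print(source, suffix)
```

The keys are hashes, so they only show that records differ beyond the skeleton. Working out which stereocenter is responsible needs the full InChI strings or the structures themselves.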
48. Does one stereocenter matter?
Thalidomide trade names: Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
50. Assertion and Chemical Entities
Who says what Taxol is?
What is the “timeline” for a molecule?
How do we clean up the Public data?
The Quality source is Chemical Abstracts Service…
66. Semantic Linking of Structures
What would you want
to link off a structure?
Chemical suppliers
Other publications
Analytical Data
Related Reactions
Wikipedia
Patents
“Everything”
71. ChemSpider Everywhere
Linked from Wikipedia and many Public Databases
Linked from Open Notebook Science sites
Linked from Blogs using Structure/Spectra EMBED
Integrated into structure drawing packages
Integrated into software offerings from Thermo, Waters, Agilent, Bruker
73. There will always be gaps...
What ChemSpider does not deal with, yet...
Materials
Minerals
Polymers
Biological macromolecules
74. Open Source, Access and Data
ChemSpider is NOT Open Source but we do use Open Source components (OpenBabel, JSpecView, Jmol). Thanks Microsoft!
ChemSpider is not an “Open Access Database” – it’s a “free access” resource
We do not assume copyright. Rights to the data and the creative works remain with the depositor
Is ChemSpider “Open Data”?
76. Who declares data as Open?
Data licensing is very interesting and can spark “interesting” conversations. Opinions differ:
Are images data? Are assertions data?
What on a ChemSpider record is data?
Is PubChem or PubMed Open Data?
We allow people to declare their data as Open and add an Open Data button at upload
A lot of data on ChemSpider are free but not Open
Pragmatism: Our focus is a community resource
77. Conclusions: ChemSpider Today
ChemSpider is an established community resource
>23 million compounds from >300 data sources
About 7000 unique users per day and up to ½ million transactions per day
A crowdsourced deposition and curation platform
Grows daily – more depositions, more links, more data
Web services provider
Linked to commercial and open source software
Supporting analytical companies: Agilent, Thermo, Waters, Bruker
Serving ONS, providing games to students, ChemSpidey robot
A publishing platform for the community
78. ChemSpider Tomorrow
Continue the curation effort and keep cleaning
Finish depositions – millions left to deposit
Integrate RSC content – a massive archive!
Integrate RSC publishing workflows and databases
Enable the semantic web for chemistry
79. Acknowledgments
Royal Society of Chemistry
Valery Tkachenko and Sergey Shevelev
Commercial Software: Microsoft, Advanced Chemistry Development, OpenEye and Symyx
Open Source Software: Jmol, OpenBabel, JSpecView
JC Bradley, Andrew Lang – The Spectral Game and Open Notebook Science integration
The “Crowd” of curators
306 Data Source providers
SyntheticPages.org