The Great Promise of Online Data for Chemistry and the Life Sciences

Antony J Williams
Silverchair Colloquium 2012

READ FAST – IT’S HAPPENING NOW

20 minutes, >40 slides

Disruption Can Be Cheap, Fast and Unexpectedly Successful

Online Chemistry Databases in 2007

A search gave LOTS of “info”…
What is Yohimbine?

Why not Index the web of chemistry?
 Build a search engine for chemistry

 Index all public domain chemicals and link

 Build a structure searchable web

 Crowdsource new chemistry from the community

 Crowdsource curation and annotation

Create a structure-centric hub
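
To make the idea of a structure-centric hub concrete, here is a minimal Python sketch, assuming every record is filed under one structure key and every name resolves to that key. It is an illustration only, not ChemSpider's implementation: the class names, field names, the "STRUCT-0001" key and the example.org URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """One chemical structure and everything linked to it (hypothetical schema)."""
    structure_key: str                            # placeholder for a structure identifier
    synonyms: set = field(default_factory=set)    # trade names, systematic names, ...
    links: dict = field(default_factory=dict)     # category -> list of source URLs


class StructureHub:
    """Index in which the structure, not the name, is the primary key."""

    def __init__(self):
        self._by_key = {}    # structure key -> CompoundRecord
        self._by_name = {}   # lower-cased name -> structure key

    def add(self, structure_key, name, category=None, url=None):
        # Reuse the existing record for this structure, or create a new one.
        record = self._by_key.setdefault(structure_key, CompoundRecord(structure_key))
        record.synonyms.add(name)
        self._by_name[name.lower()] = structure_key
        if category and url:
            record.links.setdefault(category, []).append(url)
        return record

    def lookup(self, name):
        # Any known synonym resolves to the same structure record.
        key = self._by_name.get(name.lower())
        return self._by_key.get(key) if key is not None else None


# Usage: two names for the same compound land on one record.
hub = StructureHub()
hub.add("STRUCT-0001", "Xanax", "encyclopedia", "https://example.org/article")
hub.add("STRUCT-0001", "alprazolam", "safety", "https://example.org/safety-sheet")
record = hub.lookup("ALPRAZOLAM")
```

Because the structure key is the join point, a question asked with a trade name and one asked with a generic name reach the same set of links.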

Answering Real Questions
 Questions a chemist might ask…
 What is the melting point of n-heptanol?
 What is the chemical structure of Xanax?
 Chemically, what is phenolphthalein?
 What are the stereocenters of cholesterol?
 Where can I find publications about xylene?
 What are the different trade names for Ketoconazole?
 What is the NMR spectrum of Aspirin?
 What are the safety handling issues for Thymol Blue?

The World of Online Chemistry
 Safety data
 Toxicity data
 Blogs and Wikis
 Property databases
 Experimental results
 Scientific publications
 Compound aggregators
 Open Notebook Science
 Metabolic pathway databases
 Encyclopedic articles (Wikipedia)

Linked Data for Life Sciences growing…

Solve Real World Problems
 Provide programmable interface against content
 Provide a chemistry database tuned to integrators

RSC and ChemSpider – May 2009

Why RSC acquired ChemSpider
 Commitment to serve the community

 Bring cheminformatics expertise in-house

 Add additional data to publications

 Potential freemium model – web services, data

 Because data is critical to science

Making sense of data is overwhelming

Publications are Hosts to Data

Data has value, is Free, is Open
 Data cannot be copyrighted. A particular expression of data, such as a chart or table in a publication, can be.

 Data licensing is being addressed and openness is encouraged

 Research data mandates are starting…

 Who will manage the integration and curation, and keep the access FREE?

SOME Chemistry Databases in 2012

Tell me more…but…
 Where can I find the electronic structure?
 Papers/Patents about Yohimbine?
 What are the side effects of Yohimbine?
 Where can I order Yohimbine?
 What are the physicochemical properties?
 What are the associated metabolic pathways?
 Different synonyms of Yohimbine?
 Are there side effects with Yohimbine?

 ChemSpider links all of this information and more

And so are…
 Chemical vendors
 Safety and Toxicity information
 Experimental and Predicted properties
 Analytical data
 Images and Movies

 And all for free…

Not only compounds but syntheses

The world can take and contribute
 Scientists can deposit their data

 They can annotate and curate

 They can download data

 They can embed data in the social network

 They can integrate and connect

Integrate to electronic lab notebooks

Integrate to instruments and software
 Primary analytical instrumentation vendors integrate

 Agilent, Bruker, Thermo, Waters

 Cheminformatics vendors link to ChemSpider

 Accelrys, ACD/Labs, ChemAxon, iChemLabs

Publications are a summary of work
 Scientific publications are a summary of work
 Is all work reported?
 How much science is lost to pruning?
 What of value sits in notebooks and is lost?

 How much data is lost?
 How many compounds never reported?
 How many syntheses fail or succeed?
 How many characterization measurements?

What if we could capture it all?

Start with data in publications

But in the time of Big Data…it’s linked!

ONE example – data for life sciences
 IP?
 What’s the structure?
 Are they in our file?
 What’s similar?
 What’s the target?
 Pharmacology data?
 Known pathways?
 Competitors?
 Working on now?
 Connections to disease?
 Expressed in right cell type?

 Crowdsourcing across drug discovery
 Open PHACTS: partnership between European Community and European Pharma Companies
 22 partners, 8 pharmaceutical companies, 3 biotechs working together for 3 years

 Freely accessible for knowledge discovery and verification.
 Data on chemistry and biology
 Pharmacological profiles
 Proprietary and public data sources.

All that glisters is not gold…

Crowdsourced Assertions
 The future of publishing will include generation and consumption of “nanopublications”

 http://www.nanopub.org/
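
As a rough illustration of the idea, a nanopublication can be thought of as a single assertion packaged with its provenance and publication details. The Python sketch below assumes that simplified three-part shape and shows how crowdsourced assertions might be grouped so that independent support for the same claim can be counted. All class names, field names and values are hypothetical placeholders, not a real nanopublication format or library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assertion:
    """A single subject-predicate-object claim (placeholder vocabulary)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Nanopublication:
    assertion: Assertion
    provenance: str      # where the assertion came from
    created_by: str      # publication info: who packaged it
    created_on: date     # publication info: when it was packaged


def group_by_assertion(nanopubs):
    """Group nanopublications that make the same assertion."""
    grouped = {}
    for nanopub in nanopubs:
        grouped.setdefault(nanopub.assertion, []).append(nanopub)
    return grouped


# Usage: two contributors assert the same thing from different sources.
claim = Assertion("example:compound-1", "example:hasSynonym", "example:name-A")
other = Assertion("example:compound-1", "example:hasSynonym", "example:name-B")
nanopubs = [
    Nanopublication(claim, "example:source-1", "curator-a", date(2012, 1, 1)),
    Nanopublication(claim, "example:source-2", "curator-b", date(2012, 2, 1)),
    Nanopublication(other, "example:source-1", "curator-a", date(2012, 1, 1)),
]
grouped = group_by_assertion(nanopubs)
support = {assertion: len(items) for assertion, items in grouped.items()}
```

Keeping provenance attached to each assertion is what lets a consumer weigh a claim by who made it and where it came from, rather than taking it at face value.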

So what’s the business model?
 Decisions are based on data

 Publications encapsulate, reference and link data

 More data is free and open. More services and APIs allow access – free or for fee. Ask Google

 The large-scale licensed content business model is at risk without interfaces to integrate and mine

Acknowledgments
 The RSC ChemSpider team

 Our users, our depositors, our curators

 GGA Software Services, OpenEye, ACD/Labs and a lot of Open Source code!

 And Al Gore for supporting the internet
http://en.wikipedia.org/wiki/Al_Gore_and_information_techn

Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
