• 57 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web-searching
Reaction 1: NextMove reaction text-mined from RSC archive – CML output
<?xml version="1.0" encoding="UTF-8"?>
<reactionList xmlns="http://www.xml-cml.org/schema" xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/" xmlns:nameDict="http://www.xml-cml.org/dictionary/cml/name/"
xmlns:unit="http://www.xml-cml.org/unit/" xmlns:cml="http://www.xml-cml.org/schema" xmlns:dl="http://bitbucket.org/dan2097">
<reaction>
<dl:source>
<dl:documentId>c3ra45871g</dl:documentId>
<dl:paragraphText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL)
at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL), concentrated. The residue
was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product
was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid. [α]D20 −24.2 (c 1.1, CHCl3); 1H NMR
(CDCl3, 300 MHz) δ 0.04 (s, 3H), 0.07 (s, 3H), 0.85 (s, 9H), 1.34 (s, 3H), 1.44 (s, 3H), 2.16 (br, 1H), 3.68–3.81 (m, 3H), 4.16 (t, J = 13.8 Hz, J = 13.8 Hz, 1H), 4.59 (t, J = 6.6 Hz, J = 6.6
Hz, 1H), 5.22 (d, J = 10.7 Hz, 1H), 5.34 (d, J = 17.1 Hz, 1H), 5.90 (ddd, J = 7.2 Hz, J = 10.2 Hz, J = 17.2 Hz, 1H); 13C NMR (CDCl3, 75 MHz) δ 134.1, 118.4, 108.5, 79.5, 78.8, 70.8,
65.0, 27.8, 25.9, 25.4, 18.1, −3.7, −4.4. HRMS (ESI) calcd for [M + Na]+ (C15H30O4SiNa) 325.1811, found 325.1807.</dl:paragraphText>
</dl:source>
<dl:reactionSmiles>[H-].C([Al+]CC(C)C)C(C)C.C([O:17][CH2:18][C@@H:19]([O:29][Si:30]([C:33]([CH3:36])([CH3:35])[CH3:34])([CH3:32])[CH3:31])[C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)(=O)C(C)(C)C>ClCCl>[C:33]([Si:30]([CH3:32])([CH3:31])[O:29][C@@H:19]([C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)[CH2:18][OH:17])([CH3:36])([CH3:35])[CH3:34] |f:0.1|</dl:reactionSmiles>
<productList>
<product role="product">
<molecule id="m0">
<name dictRef="nameDict:unknown">10</name>
<dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethanol</dl:nameResolved>
</molecule>
<amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00102">1.02 mmol</amount>
<amount dl:propertyType="MASS" dl:normalizedValue="0.308">308 mg</amount>
<amount dl:propertyType="PERCENTYIELD" dl:normalizedValue="79">79%</amount>
<amount dl:propertyType="CALCULATEDPERCENTYIELD" dl:normalizedValue="79.1" units="unit:percentYield">79.1</amount>
<identifier dictRef="cml:smiles" value="C(C)(C)(C)[Si](O[C@H](CO)[C@H]1OC(O[C@H]1C=C)(C)C)(C)C"/>
<identifier dictRef="cml:inchi" value="InChI=1S/C15H30O4Si/c1-9-11-13(18-15(5,6)17-11)12(10-16)19-20(7,8)14(2,3)4/h9,11-13,16H,1,10H2,2-8H3/t11-,12+,13-/m0/s1"/>
<dl:entityType>definiteReference</dl:entityType>
<dl:appearance>colourless</dl:appearance>
<dl:state>liquid</dl:state>
</product>
</productList>
<reactantList>
<reactant role="reactant">
<molecule id="m1">
<name dictRef="nameDict:unknown">Diisobutylaluminium hydride</name>
</molecule>
<amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00323">3.23 mmol</amount>
Reaction 1: procedure steps
Diisobutylaluminium hydride (1.1 M in
cyclohexane, 2.93 mL, 3.23 mmol) was added
dropwise to the solution of 9 (500 mg, 1.29
mmol) and dichloromethane (20 mL) at −78 °C.
The reaction mixture was stirred at −78 °C for
another 2 h, warmed up to rt, quenched with
methanol (3 mL) and citric acid (aq) (w/w, 10%,
5 mL), concentrated. The residue was added
with water (10 mL) and extracted with
dichloromethane (12 mL × 3). The organic
layers were combined, dried over Na2SO4,
filtered and concentrated. The crude product
was further purified by column chromatography
(SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give
10 (308 mg, 1.02 mmol, 79%) as a colourless
liquid.
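The reported 79% yield and the text-mined CALCULATEDPERCENTYIELD of 79.1 can be cross-checked from the quantities in the procedure. The following is a minimal sketch, not the text-mining tool's own calculation: it assumes 1:1 stoichiometry with compound 9 (1.29 mmol) as the limiting reagent, and the function name `percent_yield` is purely illustrative.

```python
def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Percent yield, assuming one mole of product per mole of limiting reagent."""
    if limiting_mmol <= 0:
        raise ValueError("limiting reagent amount must be positive")
    return 100.0 * product_mmol / limiting_mmol


# Quantities taken from the procedure: 10 (1.02 mmol) obtained from 9 (1.29 mmol).
calculated = percent_yield(1.02, 1.29)
print(round(calculated, 1))  # 79.1
```

Rounded to one decimal place this agrees with the calculated value in the CML output; the 79% stated by the authors is the same figure rounded to a whole number.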
Text mining breaks the procedure summary down into steps:
<dl:reactionActionList/dl:reactionActions> dl:phraseTexts
• action="Add": Diisobutylaluminium hydride (1.1 M in
cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to
the solution of 9 (500 mg, 1.29 mmol) and
dichloromethane (20 mL) at −78 °C
• action="Stir": The reaction mixture was stirred at −78 °C
for another 2 h
• action="Heat": warmed up to rt
• action="Quench": quenched with methanol (3 mL) and
citric acid(aq) (w/w, 10%, 5 mL)
• action="Concentrate": concentrated
• action="Add": The residue was added with water (10 mL)
• action="Extract": extracted with dichloromethane (12 mL × 3)
• action="Dry": dried over Na2SO4
• action="Filter": filtered
• action="Concentrate": concentrated
• action="Purify": The crude product was further purified by
column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33)
• action="Yield": to give 10 (308 mg, 1.02 mmol, 79%) as a
colourless liquid
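The action list above could be read back out of the CML with a standard XML parser. This is a rough sketch only: the nesting used here (`dl:reactionActionList` containing `dl:reactionAction` elements with an `action` attribute and a `dl:phraseText` child) is an assumption inferred from the names on this slide, not a confirmed schema, and `SAMPLE` is a hand-written fragment with three of the steps rather than real tool output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the CML output shown earlier.
NS = {
    "cml": "http://www.xml-cml.org/schema",
    "dl": "http://bitbucket.org/dan2097",
}

# Hypothetical fragment: element nesting and attribute names are assumed.
SAMPLE = """<reaction xmlns="http://www.xml-cml.org/schema"
          xmlns:dl="http://bitbucket.org/dan2097">
  <dl:reactionActionList>
    <dl:reactionAction action="Stir">
      <dl:phraseText>The reaction mixture was stirred at −78 °C for another 2 h</dl:phraseText>
    </dl:reactionAction>
    <dl:reactionAction action="Heat">
      <dl:phraseText>warmed up to rt</dl:phraseText>
    </dl:reactionAction>
    <dl:reactionAction action="Quench">
      <dl:phraseText>quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL)</dl:phraseText>
    </dl:reactionAction>
  </dl:reactionActionList>
</reaction>"""


def list_actions(xml_text: str) -> list[tuple[str, str]]:
    """Return (action, phrase) pairs in document order."""
    root = ET.fromstring(xml_text)
    steps = []
    for node in root.iterfind(".//dl:reactionAction", NS):
        phrase = node.findtext("dl:phraseText", default="", namespaces=NS)
        steps.append((node.get("action"), phrase.strip()))
    return steps


for action, phrase in list_actions(SAMPLE):
    print(f"{action}: {phrase}")
```

Keeping the original phrase next to each action label lets a curator check the classification against the source sentence, for example whether "warmed up to rt" really belongs under "Heat".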
info@openphactsfoundation.org @Open_PHACTS
Open PHACTS Practical Semantics
OpenPHACTS
GlaxoSmithKline – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
Esteve
Almirall
OpenLink
SciBite
The Open PHACTS Foundation
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
Pfizer
Why is it so hard to…
• Competitors?
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• IP?
Publishers - the guardians of knowledge
This is a poster for Guardians of the Galaxy. The poster art copyright is believed to belong to the distributor of the Film, Walt Disney Studios Motion
Pictures, the publisher, Marvel Studios, or the graphic artist.
Data Publishing
Original artist: Joseph Ferdinand Keppler (1838-1894) Restoration: Adam Cuerden - http://www.loc.gov/pictures/item/2011661385/ by way
of http://adamcuerden.deviantart.com/gallery/#/d5onmxh
We are on the verge of a new technical revolution,
and it feels great to anticipate it and be ready to ride!
Image from surfline.com by Mike Cianciulli
Data Science @ RSC
The team. From left to right: Valery Tkachenko and Alexey Pshenichnov, based in the United States;
Aileen Day, based in Southampton; John Boyle, Peter Corbett, Colin Batchelor, Jeff White, Nicholas
Bailey and Val the plant, based at TGH
Remember this: some of these questions are easier to answer than others
Open PHACTS was developed to support the key questions of drug discovery
Business questions have been at the heart of Open PHACTS and have driven the development of the platform
Mx/psa: how was it calculated, and who did it?
Mash up. With your data too,
- top layer join together but need them all
commercial
Data provided by many publishers
Originally in many formats: relational, SD files and RDF
Worked closely with publishers
Data licensing was a major issue
Over 5 billion triples – 14 datasets & growing
Hosted on beefy hardware; data in memory (aim)
Extensive memcaching
Pose complex queries to extract data