Advanced Chemical Reaction Engineering-Part-1-10-Apr-2016
Virtual Reaction Service Using ChemAxon Reactor (July 06)
1.
2. Why use Reactor? Access a relevant/target and novel set of chemical space for fast/accurate docking. Importantly: have a clear synthetic route to obtain real molecules.
3.
4. Source of virtual reactants is C-Space. [Diagram labels: SDF, JSP, UNQ, QUE, STG, INT, MOL, SUP, JOIN, FRA, SMI; record counts shown: ~3M recs, ~1M recs.] Callouts: "I handle all of the Java processing rather than Aurora and so need memory too!" "My SGA needs to be as large as possible, but I should consider Tomcat." "Mandatory: a Server Licence is required for continuous 24/7 running."
5.
6.
7.
8. Virtual synthesis approach: De-Novo vs Exhaustive. [Diagram labels: LOG, JSP, FRA, PHA, RXN, PRJ, EXH, FOC, rxn, pha, rcta, rctb.] Callouts: "I could also do with some disk space!" Reactor is memory intensive, PARTLY due to the Oracle/Tomcat round trip… Multiple instances of Tomcat are required, as a single instance has a physical limit.
9. Developing the Amide reaction. Callouts on the successive reaction definitions: still allows amide as reactant, aniline-type amine allowed; does not allow for secondary amine or acid bromide; does allow amide. Ideal definition? SMARTS reactivity rule 1 (ensure amide not included): ..r:!match(ratom(4),'NC=O'). SMARTS reactivity rule 2 (basicity criteria for amine N): ..r:!match(ratom(4),'NC=O') && (pka(ratom(4))>20)
10. Aromatic Heterocyclic reactions (2 steps: AmidoOximes / 1,2,4-Oxadiazoles). (OXADI): ~1000 acid halides; two points of diversity for our pharmacophore fragments filter; ~100M to choose from. (AMIDO): the other tautomer is produced in reality; ~100,000 C#N fragments available; hydroxylamine reagent. [Table columns: Reaction definition, Comments]
11. Developing Imidazoles (one step/regioisomers). Di-carbonyls ~1000; aldehydes ~14000; (IMIDA) 14 million + regioisomers. This is a one-pot reaction. [Table columns: Reaction definition, Comments]
16. Case 2: pKa plug-in applied to the Enolate C reaction. Possible de-protonations depending upon conditions… Logical resolve: focus on the trapped enolate C reacting with, say, an alkyl halide/aldehyde, not Michael additions or O-alkylation…which are separate atom-mapping instances…
17. Kinetic vs Thermodynamic partition using pKa (water) range? pKa in water can help to define the likely kinetic/thermodynamic de-protonation regioselectivity, although the pKa value may not be directly relevant to the actual solvent defined for the reaction. Abstraction of H with pKa > 20: alkene stability increases with substitution. Abstraction of H with pKa < 20: more likely de-protonation.
25. Working closely with ChemAxon developers…
Request/issue: We suggest a synthesis ID be part of Reactor output to cover any isomers and possible errors. ChemAxon response: They swiftly provided the facility to easily create a unique synthesis ID using reactant ID as a parameter.
Request/issue: Speed of automated virtual reaction is relatively slow (Oracle/Tomcat round trip). ChemAxon response: Helping to define the fastest syntax for automated reaction, in particular Oracle cartridge table functions.
Request/issue: Direct pre/post standardization built into reaction definitions (just realised how useful this is!). ChemAxon response: I believe they are working on building this directly into Reactor? (jc_standardize is available)
26.
27.
28. Suggested way of working…use the Reactor GUI! [Diagram labels: Client: Marvin Sketch, Marvin Viewer, .rxn, .sma, A.sma, B.sma, A.sdf, B.sdf, .sdf, standardizer, .bat; Server: jcman table, jcman import, pl/sql, sqlplus, JSP.]
Editor's Notes
Point out how you see the rest of the workshop going…
Access relevant chemical space that at present may not be available in reality… Screen fast then accurate, validate results with QSAR in order to find cancer inhibitor drug molecules
Inhibox core services…data platform based on ChemAxon toolkits and Oracle…
A tangible starting point for virtual synthetic reactions is the C-Space build. This is a data-warehouse-style view of the commercially available molecules worldwide, approaching real time. A continual queue of SDF data to process (running 24/7) is downloaded from supplier websites. A job is added to the QUE table and the Oracle QUE package is called to process the job. The QUE table controls all the transactions: STG, INT and UNQ. This is the minimal user interaction possible! [Many jobs can run at once, but each then runs relatively slower.]
First, the STG transaction creates a JChem staging table and imports the SDF data (JChem Base will standardize on import; this is implemented as a Java Stored Procedure as there is no cartridge equivalent). The INT transaction integrates/migrates data from each staging area into the JOIN table. The JOIN table is the route for mapping supplier information (supplier_no) to the corporate molecule ID (one molecule_id to many supplier_no). The INT.JCSearch function is used to define the MOLECULE_ID, which is the system-wide unique identifier. [Query precedence is exact search, exact fragment search (salts), exact (fragment) search with double bond stereo relaxed, and then supplier_no; this is also the approximate order of query time!] A new molecule_id is generated if the search returns 0 hits. A JChem Server licence is definitely required to build anything in real time!
The SUP table simply lists the suppliers currently available to the system. The UNQ transaction migrates any new molecules from the JOIN table to the MOL table. The MOL table is a CA table and is visible to the JSP application for subsequent SSS. The MOL table is tending towards 3 million distinct structures. It forms the basis of the subsequent building blocks/fragments table FRA, which is the actual source of virtual reactants; MOL can also be used for direct screening purposes.
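The query precedence used to assign MOLECULE_ID can be sketched as a cascade of lookups. This is a minimal Python illustration under stated assumptions, not the INT.JCSearch implementation: plain dictionaries stand in for the JChem searches, and every name here (`assign_molecule_id`, `SEARCH_ORDER`, the sample records) is hypothetical.

```python
from itertools import count

# Hypothetical stand-ins for the four searches, in the precedence order from the notes.
SEARCH_ORDER = ("exact", "exact_fragment", "exact_fragment_stereo_relaxed", "supplier_no")


def assign_molecule_id(record, indexes, id_source):
    """Return (molecule_id, matched_by) for one incoming supplier record.

    record  maps a search name to the key that search would use (assumed precomputed).
    indexes maps a search name to {key: molecule_id}; dicts replace the real searches.
    """
    for name in SEARCH_ORDER:
        key = record.get(name)
        if key is not None and key in indexes[name]:
            return indexes[name][key], name  # first hit wins, cheapest search first
    # No search returned a hit: generate a new system-wide identifier.
    new_id = next(id_source)
    for name in SEARCH_ORDER:
        key = record.get(name)
        if key is not None:
            indexes[name][key] = new_id  # register so later records resolve to it
    return new_id, "new"


indexes = {name: {} for name in SEARCH_ORDER}
ids = count(1)
# Illustrative records: a hydrochloride salt, then its parent amine from another supplier.
salt = {"exact": "CCN.Cl", "exact_fragment": "CCN", "supplier_no": "S1-001"}
parent = {"exact": "CCN", "exact_fragment": "CCN", "supplier_no": "S2-777"}
first = assign_molecule_id(salt, indexes, ids)
second = assign_molecule_id(parent, indexes, ids)
print(first, second)
```

In this sketch the second record misses the exact search but is caught by the fragment search, so both supplier entries share one molecule_id, which is the one-to-many mapping the JOIN table records.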
The FRA (fragments) table is the subset of UNQ with certain criteria applied which make the molecules suitable for virtual reactions (Mwt <= 350, heavy atom count <= 13). The FRA table is tending towards X. FRA uses JCIDX. SMI is a table of “cleansed” SMILES in a non-JChem, partitioned table for external file and docking use. Cartridge functions used: JC_EXACT, JC_COMPARE and JC_INSERT. Oracle standards used: Oracle packages, table partitions (JOIN), optimiser hints (JOIN), as large an SGA as possible, locally managed tablespaces.
Note: Ultimately all fast search is completed in the Tomcat cache, and available memory is required for this. On an Oracle server not all of the physical memory is available to Oracle, so using Tomcat in conjunction with Oracle uses up any excess memory: Java processing is completed in Tomcat, while PL/SQL runs in Oracle-allocated memory, so the combination makes the best use of the available resources. Other Oracle performance features such as BULK COLLECT may help speed up the “PL/SQL bit”.
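The FRA selection criteria quoted above amount to a simple filter over the unique molecules. The following Python sketch assumes molecular weight and heavy atom count have already been calculated for each record; the `Molecule` class, the function names and the sample rows are hypothetical and only illustrate the two thresholds.

```python
from dataclasses import dataclass

MAX_MWT = 350.0        # from the notes: Mwt <= 350
MAX_HEAVY_ATOMS = 13   # from the notes: heavy atom count <= 13


@dataclass(frozen=True)
class Molecule:
    molecule_id: int
    smiles: str
    mwt: float          # assumed precomputed
    heavy_atoms: int    # assumed precomputed


def is_fragment(mol):
    """True when the molecule satisfies both FRA criteria."""
    return mol.mwt <= MAX_MWT and mol.heavy_atoms <= MAX_HEAVY_ATOMS


def build_fra(mol_table):
    """Return the subset of molecules suitable as virtual reactants."""
    return [m for m in mol_table if is_fragment(m)]


mol_table = [
    Molecule(1, "CC(=O)Cl", 78.50, 4),                  # acetyl chloride
    Molecule(2, "Nc1ccccc1", 93.13, 7),                 # aniline
    Molecule(3, "CCCCCCCCCCCCCCCC(=O)O", 256.42, 18),   # palmitic acid: too many heavy atoms
]
fra = build_fra(mol_table)
print([m.molecule_id for m in fra])
```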
Established synthetic transformations, reliable literature sources, reasonable yields…
Translation of real reaction to virtual really is completed on a case by case basis…
Automation adds some overhead and so should only process desired reactants (pharmacophore filters/undesirables removed)…
3D chemical space is exponentially larger than what the “real time” commercial set, C-Space, covers, and so in order to identify novel leads, virtual chemical space (V-space) needs to be generated and explored too. In order to be able to work subsequently with real-world chemists, we have chosen to adopt a reaction-based approach to generating our virtual chemical space / libraries. We see that this task can be completed in the two ways below: either exhaustively, or by using functional group filters in a de-novo led approach in order to define more tenable sets.
De-Novo: An initial fast screen of C-space helps to define functional groups of interest for a particular protein cavity; these are placed in a project pha list. Generic functional groups of interest can be maintained in PHA, of which pha can be a snapshot (A). Previous experience has shown that we should source virtual reactant input from tangible real-world C-space; the FRA table is a viable subset (<400 Mwt) of the MOL table (B). FRA are maintained automatically from the C-space build, whereas PHA of interest are maintained in a more judicious way. A project copy of all available reactions can be defined from the master “RXN” table (C) and placed in the “rxn” copy (D); this way only the linker/core/template of interest will be processed for the project. The relevant reactant and reaction SMARTS queries are extracted from the RXN table and used to generate a project-level copy of the reactants of interest, which are optionally placed in “rcta” and “rctb” (E), and of course to react the reactants via the CA Reactor engine. The result is a focused project products table “FOC” (F), which will contain the results of many chosen reactions, producing only products with the previously defined pharmacophores of interest. Generating these sets requires relatively less enumeration time, disk and memory consumption, and subsequent screening time.
We can then visualise and fast search our output with JSP (G), as we don’t expect the record numbers, and thus the memory requirements, to be too limiting. All project-level objects are treated as editable copies of their source tables.
Exhaustive: Enumeration time, disk and memory consumption, and subsequent screening time are significant factors here, hence this approach is treated in a secondary manner, as a background job that should not contend with primary processes for resources. For example, the Amide reaction can currently yield 35 million + products from C-space FRA, most of which will not be of interest for the individual query. Data directly from FRA (B) can be processed by the RXN package from RXN (C) into a set of non-JChem EXH tables (H), which can then be queried through the cartridge JChem index. The memory requirements will certainly stop JSP visualisation using Tomcat on even relatively high-specification hardware (memory!), but the set can be queried in a screening sense (I).
In both cases updates to reactants in the FRA table are carefully managed through to FOC and EXH so that, ideally, “big queries” are only completed once. The JCartridge functions used in the Oracle RXN package, which handles all processing, are JC_REACT4, JCTF_REACT, JC_COMPARE, JC_INSERT and JC_MATCHCOUNT.
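The difference in scale between the two approaches can be illustrated with a toy enumeration. This Python sketch makes several assumptions that are not in the notes: each fragment carries a precomputed reactant role ("A" or "B", standing in for the reactant SMARTS match) and a set of functional group tags, the de-novo filter requires both reactants to carry a group from the project pha list, and the synthesis ID format is invented for illustration. No chemistry is performed; only reactant pairs are enumerated.

```python
from itertools import product


def select_reactants(fragments, role, pha=None):
    """Pick reactants for one side of the reaction.

    pha is a set of functional group tags of interest (de-novo),
    or None to take every matching fragment (exhaustive).
    """
    return [
        f for f in fragments
        if f["role"] == role and (pha is None or pha & f["groups"])
    ]


def enumerate_products(reaction_id, rcta, rctb):
    """Return one hypothetical synthesis ID per A x B reactant pair."""
    return [f"{reaction_id}:{a['id']},{b['id']}" for a, b in product(rcta, rctb)]


# Illustrative fragment rows; ids and group tags are made up.
fragments = [
    {"id": 101, "role": "A", "groups": {"acid_halide", "pyridine"}},
    {"id": 102, "role": "A", "groups": {"acid_halide"}},
    {"id": 103, "role": "A", "groups": {"acid_halide", "furan"}},
    {"id": 201, "role": "B", "groups": {"amine", "pyridine"}},
    {"id": 202, "role": "B", "groups": {"amine"}},
    {"id": 203, "role": "B", "groups": {"amine", "furan"}},
    {"id": 204, "role": "B", "groups": {"amine", "pyridine"}},
]
pha = {"pyridine"}  # project-level snapshot of groups of interest

exh = enumerate_products(
    "amide", select_reactants(fragments, "A"), select_reactants(fragments, "B"))
foc = enumerate_products(
    "amide", select_reactants(fragments, "A", pha), select_reactants(fragments, "B", pha))
print(len(exh), len(foc), foc)
```

Even on seven fragments the exhaustive set is six times the size of the focused one; because the product count grows with the product of the two reactant list sizes, filtering reactants before reaction is what keeps FOC small enough to visualise.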
Works well in the real world! (~1000 acid halides, lots of primary or secondary amines.) Works well in the virtual world! An easy way to join two fragments in a “linear” fashion. Good to use for all benchmarking! Where to stop in terms of accuracy of library vs speed and data redundancy? Have a go at developing the Amide reaction to your satisfaction!
Reactions which we have found: work in both the real and virtual worlds; have enough reactants in the fragments set to warrant use in de-novo (or exhaustive) mode; introduce several points of diversity in the product molecule; produce drug-like molecules that exhibit fewer conformers (pharmacophore hypothesis easier); can be reacted further in well-known reactions. Aromatic Heterocyclic Chemistry, David T. Davies!
Diagram of Aromatic Nitration: Reality Toluene goes all the way to TNT
Can use charge and reactivity/tolerance rules in order to emulate accurate output for this reaction. (May need to access the runtime charge data in order to make decisions regarding the boundary.) Step 1 in reality gives us a mix of o- AND p- product; however, to achieve the end result we could use specified numbers. If step 1 goes to meta first then the second product can be two isomers (the end result is achieved irrespective).
Looking at reactions at alpha C only for now…
pKa in water can help to define likely kinetic/thermodynamic de-protonation regioselectivity Although pKa may not be directly relevant to actual solvent defined for reaction
1. Prochiral centre + primary alkyl halide can give two enantiomers at the alpha C (SN2). 2. Prochiral centre + secondary (chiral) alkyl halide can give 4 diastereoisomers (inversion if SN2; inversion + retention if SN1 at the electrophilic C). 3. Prochiral centre + tertiary (chiral) alkyl halide can give 4 diastereoisomers (inversion + retention, as likely SN1 at the electrophilic C). Enolate alkylation likely occurs by an SN2 process, so consider only cases 1 and 2 (+ enforce inversion). Possibly this is due to the carbocation intermediate not approaching the lithium counter ion; conversely, polar aprotic solvents solvate the lithium cation and generate a bare enolate anion, which may even react SN1 (an additional effect is that the carbocation is stabilised)? So maybe the query should only get primary/secondary alkyl halides? ~12000 primary or secondary alkyl halides; many carbonyls with a possible alpha carbon.
Query results, intermediate, stereochemistry. May find some “enforced” stereochemistry if well-defined inputs are used, so need to run through the set to define all outputs… The alternative is to standardize all the hits to remove stereochemistry and thus effectively generate intermediate species.
So…might use a most-acidic selectivity rule + a tolerance rule of x pKa units either side of this, up to the kinetic or thermodynamic cut-off point… In reality mixtures are always formed, as equilibration occurs to a greater or lesser extent…
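One possible reading of the selectivity-plus-tolerance rule is sketched below in Python. It is an interpretation, not the Reactor rule syntax: candidate sites are first limited to those below a cut-off (the pKa 20 boundary from the kinetic vs thermodynamic slide is used as the default), then every site within x pKa units of the most acidic one is accepted. The function name, the site labels and the pKa values are hypothetical.

```python
def deprotonation_sites(site_pkas, tolerance, cutoff=20.0):
    """Return site labels accepted by the rule, most acidic first.

    site_pkas: {site label: pKa in water} for each candidate alpha C-H.
    tolerance: x pKa units allowed above the most acidic eligible site.
    cutoff:    sites with pKa at or above this value are never accepted.
    """
    eligible = {site: pka for site, pka in site_pkas.items() if pka < cutoff}
    if not eligible:
        return []
    most_acidic = min(eligible.values())
    accepted = [s for s, pka in eligible.items() if pka - most_acidic <= tolerance]
    return sorted(accepted, key=eligible.get)


# Illustrative values only: two alpha positions of an unsymmetrical ketone
# and one unactivated position.
sites = {"C2": 19.0, "C6": 19.8, "C4": 35.0}
wide = deprotonation_sites(sites, tolerance=1.0)
narrow = deprotonation_sites(sites, tolerance=0.5)
print(wide, narrow)
```

A wider tolerance returns both alpha sites, which mirrors the note that mixtures form in reality; a narrow tolerance keeps only the most acidic site.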
Stereochemistry needs to be explicitly defined… Could use standardizer in order to remove any retained chirality from the alpha C reaction site… Mapping does not accurately reflect the two-step mechanism, so this is compression…
Frequently, substituted carbonyls are used for ring formation; stereochemistry is naturally removed and propagation of the correct configuration is of no further concern…regiochemistry is now a consideration though!
Alternatively extend virtual chemical space and don’t worry about chirality issues until after the molecule has been identified by screening to be of interest
Can be used as a catch-all stereochemistry generator to further cover chemical space... A logical thing to do in a screening sense, although subsequent chiral synthesis may be reasonably untenable. Easier than carefully defining all possible stereochemistry outcomes though…
/* Fast? Row-by-row nested cursor loops (declarations omitted, as on the slide). */
BEGIN
  -- Cursor FOR loops terminate automatically, so no EXIT WHEN ...%NOTFOUND is needed.
  FOR recsa IN cursA (ReactASmart) LOOP
    FOR recsb IN cursB (ReactBSmart) LOOP
      Options := 'method:n mappingStyle:c outFormat:sdf reactionID:amide reactantIDs:'
                 || recsa.molecule_id || ',' || recsb.molecule_id
                 || ' productIdTag:synthesis_id';
      BEGIN
        outSDF := jcf_react4 (reaction, recsa.cd_smiles, recsb.cd_smiles, null, null, Options);
        productSDF := jc_standardize (outSDF, 'aromatize');
      EXCEPTION
        WHEN OTHERS THEN COMMIT;  -- swallow the failure for this pair and carry on
      END;
      IF productSDF IS NOT NULL THEN
        BEGIN
          cdidarr := jc_insert (productSDF, products, 'jchemproperties',
                                'true', 'false', 'userDefColMap:synthesis_id=synthesis_id');
        EXCEPTION
          WHEN OTHERS THEN COMMIT;
        END;
      END IF;
      COMMIT;
      productSDF := null;  -- reset so a failed pair never re-inserts the previous product
    END LOOP;
  END LOOP;
END;

/* Faster? One set-based INSERT using the cartridge table function. */
BEGIN
  INSERT INTO tab (smiles, synthesis_id)
  SELECT product, synthesis_code
  FROM reactant t1, reactant t2,
       TABLE (jctf_react4 (reaction, t1.cd_smiles, t2.cd_smiles, null, null,
              'reactionId:amide reactantIds:' || t1.molecule_id || ',' || t2.molecule_id))
  WHERE jc_compare (t1.cd_smiles, ReactASmart, 't:s') = 1
  AND   jc_compare (t2.cd_smiles, ReactBSmart, 't:s') = 1;
  COMMIT;
END;

/* Fastest? BULK COLLECT the reactants, then FORALL over the collections. */
BEGIN
  SELECT molecule_id, cd_smiles BULK COLLECT INTO vMoleculeIDsA, vSmilesA
  FROM reactant WHERE jc_compare (cd_smiles, ReactASmart, 't:s') = 1;
  SELECT molecule_id, cd_smiles BULK COLLECT INTO vMoleculeIDsB, vSmilesB
  FROM reactant WHERE jc_compare (cd_smiles, ReactBSmart, 't:s') = 1;
  -- NOTE: as sketched, this pairs A(indx) with B(indx) only (not every A with every B)
  -- and assumes the B collections are at least as long as the A collections.
  FORALL indx IN vMoleculeIDsA.FIRST .. vMoleculeIDsA.LAST
    INSERT INTO product (smiles, synthesis_id)
    SELECT product, synthesis_code
    FROM TABLE (jctf_react4 (reaction, vSmilesA(indx), vSmilesB(indx), null, null,
                'reactionId:amide reactantIds:' || vMoleculeIDsA(indx) || ',' || vMoleculeIDsB(indx)));
  COMMIT;
END;
Thanks for listening…
Print off equivalent to circulate…
1. Define your reaction in the Marvin Sketch application using the react buttons (ensure there is no red box around this). Save it as a .rxn file.
2. Determine your query for reactants A and B and generate some SDF data (realistically a minimum of 100 * 100).
3. Edit the react.bat call for your reaction.
4. Run the react.bat call and generate your product SDF.
5. Run the standardizer call and standardize your product SDF.
6. Create a JChem table in the schema (using custom standardization) and import your standardized SDF file.
7. View the table using the JSP application <URL>.
8. Edit the PL/SQL, import the reactant SDF into the fragments table and run the anonymous PL/SQL block in order to “automate”.
9. Use the JSP application to view the results.
A .rxn file and a PL/SQL template script are available.