A ChemAxon/KNIME based tool
for designing chemical libraries
Tim Parrott
Dart NeuroScience
September 25, 2013
Brock Luty
Dart NeuroScience
ChemAxon UGM
Dart NeuroScience
Small molecules to maintain
cognitive vitality (LTM)
Currently about 200 FTEs with
build-out expected at 260
Privately held LLC owned by a single
individual
Scientific Computing
Scientific Computing collaborates with other DNS
Departments to deliver solutions that simplify and
accelerate the drug discovery process.
We rely on our (non-traditional) knowledge and
experience in both Science and Technology to develop
novel and efficient systems to meet this goal.
Scientific Computing Groups
Bioinformatics
Philip Cheung
Doug Fenger
+ 1 FTE
Information
Management
+ 1 Group Lead
John Jaeger
Tim Parrott
James Harr
Eileen Tompkins
Heather Jones
Methods
Development
Ron Blanford
Daniel Garden
Kevin Neal
Hari Muddana
+ 1 FTE
Computational
Chemistry
*Tami Marrone
Meg McCarrick
James Na
Amy Shih
Bill Sinko
Project Support
- Modeling
- SBDD/Library Design
- Apply Methods
- Pre-LO/LO/PCC
Data / Biz Analysis
- Data Capture
- Analytics
- Data Access
- QA/Scientific Support
- Project Management
Software Development
- Informatics Software Development
- Developing new methods
- Enterprise Scale Architecture
- RIA (MVC) with SOA
- Extensions for ELN, Spotfire, IJC, etc.
Project Support
- Target ID
- Expression Analysis / Pathways
- Novel Software algorithms
- Enterprise Software (with Methods)
Background
Dart NeuroScience (DNS)
200+ Scientists
50+ Chemists
Parallel Synthesis Group
About 20 chemists involved in
the design and creation of
chemical libraries
We need a
chemical library
design tool !
A Basic Chemical Library Design Tool
Enumerate Products
Calculate Properties
Analyze & Filter
Select Reactants
Design
Test
Analyze
Synthesize
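The enumerate, calculate, and filter steps above can be sketched in a few lines. This is only an illustration and not the tool's implementation: the amide-coupling weight arithmetic, the function name `enumerate_library`, and the (id, molecular weight) input tuples are all assumptions made for the example. The slides do not describe how the real enumeration nodes are built.

```python
from itertools import product

# Mass of the water lost in an amide coupling (illustrative assumption).
WATER_MW = 18.015

def enumerate_library(amines, acids, max_weight):
    """Pair every amine with every acid and keep products under a weight cap.

    Each reactant is a hypothetical (reagent_id, molecular_weight) tuple.
    """
    kept = []
    for (amine_id, amine_mw), (acid_id, acid_mw) in product(amines, acids):
        product_mw = amine_mw + acid_mw - WATER_MW  # "calculate properties"
        if product_mw <= max_weight:                # "analyze & filter"
            kept.append((amine_id, acid_id, product_mw))
    return kept
```

A full-matrix library of m amines and n acids yields m × n candidate products before filtering, which is why property filters are applied before reactants are committed to synthesis.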
Goals
Support
Ease of Use
Productivity
Standardize calculations &
reactions (services)
Simplify: wrap processes and
minimize import/export operations
Enhance capabilities and speed by
doing calculations remotely
Constraints
Limited IT/IM support
Chemists already on
software overload
Approach
Combine platforms for:
- Chemical property calculations, reaction enumeration
- Data pipelining
- Visualization / analytics
- 3D scoring
Architecture
Heavily invested in service-oriented architecture (REST-style API) with
standardized DNS patterns
Domain CRUD (Create, Read, Update, Delete) GUIs written for specific
entities using the MVC pattern (relying on Backbone.js and standardized
DNS patterns)
Traditional stateless computational services (property calculation,
enumeration, etc.)
Services can be based on scripts that call command-line applications (primary
use case). Services can also be written in KNIME and run in this
architecture.
Move all the heavy lifting to the servers (automated parallelization). KNIME
serves as a service orchestration layer.
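A stateless computational service that wraps a command-line application could look roughly like the sketch below. It uses only the Python standard library. The names `run_tool`, `CalcHandler`, `make_server`, and `STAND_IN_TOOL` are hypothetical, and the stand-in tool merely upper-cases its input in place of a real property calculator. The DNS services and their patterns are not described in enough detail in the slides to reproduce them.

```python
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a real command-line calculator: upper-cases whatever it reads on stdin.
STAND_IN_TOOL = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read().upper())"]

def run_tool(command, payload):
    """Run one command-line tool on one payload; nothing is kept between calls."""
    result = subprocess.run(command, input=payload, capture_output=True,
                            text=True, check=True, timeout=60)
    return result.stdout

class CalcHandler(BaseHTTPRequestHandler):
    """POST a text payload, get the tool's output back as plain text."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length).decode("utf-8")
        try:
            body, status = run_tool(STAND_IN_TOOL, payload), 200
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            body, status = str(exc), 500
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server(port=0):
    """Bind the handler to a local port (0 lets the OS choose one)."""
    return HTTPServer(("127.0.0.1", port), CalcHandler)
```

Calling `make_server(8080).serve_forever()` would start the service. Because each request carries everything the tool needs, any number of identical servers can sit behind a load balancer, which is one way the "heavy lifting" can be parallelized on the server side.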
Application
Service
Database
Brock’s Geeky
Slide
Tool Overview
Selection &
Configuration
Panel
Custom
Nodes
Spotfire
Export
Reactant Selection
Import curated
classes of
reactants
(CRUD Service)
Reactant Selection
Import list of
Reagent Numbers
(CRUD Service)
Reactant Deduplication
Input Output
Need to identify and
remove functionally
equivalent reactants
(Comp Service)
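The slides do not say how functional equivalence is decided. As a minimal sketch, assume two reactants are equivalent when they share the same parent structure once counter-ions are dropped, for example a free amine and its hydrochloride salt. The helper names are hypothetical, the input SMILES are assumed to be canonical already, and string length is used as a crude proxy for fragment size.

```python
def parent_fragment(smiles):
    """Return the longest dot-separated component of a SMILES string.

    Assumption: the longest component is the parent; counter-ions are shorter.
    """
    return max(smiles.split("."), key=len)

def deduplicate(reactants):
    """Keep the first reactant seen for each parent fragment.

    reactants is a list of hypothetical (reagent_id, smiles) tuples.
    """
    seen = {}
    for reagent_id, smiles in reactants:
        seen.setdefault(parent_fragment(smiles), (reagent_id, smiles))
    return list(seen.values())
```

A production service would more likely use a cheminformatics toolkit for standardization and canonicalization rather than string handling.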
Reaction Selection
Reactions: A Look under the Hood
“Reactor” nodes
can contain multi-step
workflows.
(Comp Services)
Server-Side Calculations --- Clustering
Server-Side Calculations --- OpenEye ROCS
ROCS output includes the
Shape/Pose that scored
best and the Tanimoto
Score against that query.
(Computational Service)
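For reference, a shape Tanimoto score is usually defined from overlap volumes as O_AB / (O_AA + O_BB - O_AB), so it equals 1.0 for identical shapes. The sketch below computes that ratio and picks the best-scoring query. It does not call ROCS or any OpenEye API; the function names and the dictionary of per-query scores are assumptions made for illustration.

```python
def shape_tanimoto(overlap_ab, self_overlap_a, self_overlap_b):
    """Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB)."""
    denominator = self_overlap_a + self_overlap_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap volumes are inconsistent")
    return overlap_ab / denominator

def best_hit(scores_by_query):
    """Return the (query_name, score) pair with the highest score."""
    return max(scores_by_query.items(), key=lambda item: item[1])
```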
Pausing Local Execution
Export to Spotfire
Selections made in Spotfire
Spotfire Selections returned to KNIME
New nodes with
selected products
& reactants
appear in KNIME
Final Steps
The library design plan contains
separate sdf files for the products
and each reactant, along with a .csv
file listing how many times each
reactant is used. The zipped file is
parsed on import into a chemist’s
electronic laboratory notebook.
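Packaging a plan of this shape can be sketched with the standard library alone. The file names, the CSV column headers, and the function names below are hypothetical, and the SDF contents are passed in as ready-made text. The actual layout expected by the ELN importer is not given in the slides.

```python
import csv
import io
import zipfile
from collections import Counter

def reactant_usage(products):
    """Count how many products each reactant appears in.

    products is a list of tuples of reactant ids, one tuple per product.
    """
    counts = Counter()
    for reactant_ids in products:
        counts.update(reactant_ids)
    return counts

def build_plan(sdf_files, products):
    """Return zip bytes holding the SDF files plus a reactant-usage CSV."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in sdf_files.items():
            archive.writestr(name, text)
        table = io.StringIO()
        writer = csv.writer(table)
        writer.writerow(["reactant_id", "times_used"])
        for reactant_id, times_used in sorted(reactant_usage(products).items()):
            writer.writerow([reactant_id, times_used])
        archive.writestr("reactant_usage.csv", table.getvalue())
    return buffer.getvalue()
```

The usage counts tell the chemist how much of each reactant to order or weigh out, since a reactant that appears in many products is consumed many times.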
Stereochemical codes
needed for registration
are assigned based on
structure.
(Computational Service)
Load Library Design Plan into the
Agilent ELN
Custom Forms for
planning and
products tables
Summary
• June 2011: Parallel Synthesis Group formed
• June 2012: First release of Library Design Tool (LDT)
• Sept 2012: Additional KNIME training
• November 2012: Second release (Clustering, ROCS)
• April 2013: Pausable Nodes, Deduplication
• August 2013: RN Lookup, Stereo Code Assigner
40 Total Reactions
Acknowledgments
Node Development
Services & Deployment
Testing and troubleshooting
Management & PM
loki der quaeler
Ron Blanford
Karen Do
Kenny Leung
Zach Young
Daniel Garden
Eileen Tompkins
Andrew Burritt
The SGC Team
Melanie Nelson
Heather Jones
Brock Luty

EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for designing chemical libraries
