SlideShare a Scribd company logo
Farewell, PipelinePilot
Migrating the Exquiron cheminformatics platform
to KNIME and the ChemAxon technology
Serge P. Parel, PhD
ChemAxon User Group Meeting, Budapest
21st May, 2014
© copyright 2014 Exquiron Biotech AG2
Outline
 Exquiron – Who are we ?
 Research informatics infrastructure
• Why migrate ?
• What to migrate ?
• Who can help ?
 Example 1: Reporting of dose-response results
 Example 2: Hit expansion toolbox
 Summary
 Acknowledgments
© copyright 2014 Exquiron Biotech AG3
Exquiron – who are we?
 Hit identification & profiling services
 Independent and privately held startup company
 10 scientific staff and expanding
 >60 cumulated years experience in hit identification services
Covering human health, veterinary, consumer care, nutraceutical,
and agrochemical areas
 850m2 lab & office space
 Located in Reinach, near Basel, Switzerland
© copyright 2014 Exquiron Biotech AG4
Research informatics infrastructure
As of end of 2013
Chemistry cartridge:
• ChemAxon JChem on Oracle
Workflow platform:
• Accelrys Pipeline Pilot
Biological data warehouse:
• Exquiron NaviGator database
on MS-SQL server
Cheminformatics platform:
• Accelrys PP Chemistry Library
© copyright 2014 Exquiron Biotech AG5
Research informatics infrastructure
 Renewal financial terms not suitable
for small enterprises with only very
few users (compared with pricing of
Start-Up package)
 Duplicate functionalities (Accelrys PP
Chemistry Library and ChemAxon PP
components)
Problem
Chemistry cartridge:
• ChemAxon JChem on Oracle
Workflow platform:
• Accelrys Pipeline Pilot
Biological data warehouse:
• Exquiron NaviGator database
on MS-SQL server
Cheminformatics platform:
• Accelrys PP Chemistry Library
© copyright 2014 Exquiron Biotech AG6
Research informatics infrastructure
Solution ?
Chemistry cartridge:
• ChemAxon JChem on Oracle
Workflow platform:
• Accelrys Pipeline Pilot
Biological data warehouse:
• Exquiron NaviGator database
on MS-SQL server
Cheminformatics platform:
• Accelrys PP Chemistry Library
Workflow platform:
• KNIME
Cheminformatics platform:
• ChemAxon/Infocom nodes
© copyright 2014 Exquiron Biotech AG7
Research informatics infrastructure
Solution ?
Chemistry cartridge:
• ChemAxon JChem on Oracle
Workflow platform:
• Accelrys Pipeline Pilot
Biological data warehouse:
• Exquiron NaviGator database
on MS-SQL server
Cheminformatics platform:
• Accelrys PP Chemistry Library
Workflow platform:
• KNIME
Cheminformatics platform:
• ChemAxon/Infocom nodes
 Simple protocols can be ported in
house, but what about more complex
workflows?
© copyright 2014 Exquiron Biotech AG8
Research informatics infrastructure
 Ask KNIME if they can help:
 Yes, but they rely rather on partners for consulting services and support them
where needed
 Ask ChemAxon if they can help:
 Yes, through their Consultancy Services
Who can help ?
Workflow platform:
• Accelrys Pipeline Pilot
Cheminformatics platform:
• Accelrys PP Chemistry Library
Workflow platform:
• KNIME
Cheminformatics platform:
• ChemAxon/Infocom nodes
© copyright 2014 Exquiron Biotech AG9
Scope of the migration
 Lots of short PP protocols (SD files and
lists manipulation, virtual screening,…)
 Dose-response data reporting
• Input file generation
• Report generation
 Exhaustive hit expansion toolbox
• Hit expansion
• Data fusion and analysis
Done over Consultancy
with support from KNIME
Done over Consultancy
Done in house
Done in house
© copyright 2014 Exquiron Biotech AG10
Example 1
 Typical workflow for a hit finding campaign:
Reporting of dose-response experiments
Many 100K compounds A few 1,000s A few 100s HitsActives
Confirmed
actives
HTS
Confirmation
usually in duplicates
Dose-response
determination
assay + counterassay(s)
Reporting
• Chemists want to see structures and
biological results
• Biologists want to see biological results and
dose-response curves
© copyright 2014 Exquiron Biotech AG11
Reporting of dose-response experiments
Workflow (I)
Exp. data
points
IC50 fitting
parameters
Purity
Select project
and set of
experiments
Tag each data point with
experiment, concentration,
replicate and experiment. Tag
IC50 parameters by experiment
SD File
Additonal
data (Text
file)
JChem
cartridge
or SD File
Exquiron data
warehouse
© copyright 2014 Exquiron Biotech AG12
Reporting of dose-response experiments
Workflow (II)
Compound
structure
SD File
Generate
report
elements
Results
table
Dose-
response
curves
Export
as pdf
© copyright 2014 Exquiron Biotech AG13
Reporting of dose-response experiments
 Concentration may be different for each experiment
 Number of replicates per concentration may vary
 Datapoint at a given concentration could be invalid
 Number of experiments to report may vary
 Fields for SD file cannot be defined a priori
SD file generation (I)
exq_id exp_id Conc Valid Activity(%)
EXQ0012345 2344 90 TRUE 6.411991119
EXQ0012345 2344 30 TRUE 41.63282013
EXQ0012345 2344 10 TRUE 74.8677597
EXQ0012345 2344 3.333333254 TRUE 83.46128082
EXQ0012345 2344 1.111111164 TRUE 87.6084137
EXQ0012345 2344 0.370370358 TRUE 93.19058228
.
.
M END
> <dtag_nr>
EXQ0012345
> <Activity_0.37037_1_S1>
93.19058227539062
> <Activity_1.11111_1_S1>
87.60841369628906
> <Activity_3.33333_1_S1>
83.4612808227539
> <Activity_10.0_1_S1>
74.86775970458984
> <Activity_30.0_1_S1>
41.63282012939453
> <Activity_90.0_1_S1>
6.411991119384766
.
.
© copyright 2014 Exquiron Biotech AG14
Reporting of dose-response experiments
SD file generation (II)
PipelinePilot KNIME
Get data from database:
• SELECT statement
• Each datapoint is a separate row
Get data from database:
• SELECT statement
• Each datapoint is a separate row
Crunch data:
• Through Group By‟s and a lot of hash
tables in PilotScript code
 New columns/column names
generated on the fly
• For each compound, one single row with
each datapoint is a separate column
Crunch data:
• Through Group By‟s, pivoting and some
string manipulation
 All new columns/column names
generated at once through pivoting
• For each compound, one single row with
each datapoint is a separate column
Finalize:
• Merge with structures, regression
parameters and purity information
• Write SD file
Finalize:
• Merge with structures, regression
parameters and purity information
• Write SD file
© copyright 2014 Exquiron Biotech AG15
Reporting of dose-response experiments
Report generation in PipelinePilot
PipelinePilot
 Read and parse SD file
 Generate report elements (PP Reporting Collection)
• Compound ID
• Structure
• Results table
 Generate DR curves based on logistic regression
parameters (done in Perl !!)
 Feed DR curves and experimental data points to XY-
Chart component
 Combine into report with header, footer, logo…
 Generate pdf file
© copyright 2014 Exquiron Biotech AG16
Reporting of dose-response experiments
PipelinePilot
 Read and parse SD file
 Generate report elements (PP Reporting Collection)
• Compound ID
• Structure
• Results table
 Generate DR curves based on logistic regression
parameters (done in Perl !!)
 Feed DR curves and experimental data points to XY-
Chart component
 Combine into report with header, footer, logo…
 Generate pdf file
…and in KNIME
KNIME
 Read and parse SD file
 Generate report information as tables
• Compound ID
• Structure
• Results table
 Generate DR curves based on logistic regression
parameters (using a Math Formula node)
 Feed DR curves and experimental data points to R and
use ggplot to generate the DR plot, return it as picture
 Send all elements to the KNIME Report Designer
(BIRT)
 Combine to a report with header, footer, logo…
 Generate pdf file
© copyright 2014 Exquiron Biotech AG17
Reporting of dose-response experiments
Result
PipelinePilot KNIME
© copyright 2014 Exquiron Biotech AG18
Example 2
 Purpose of hit expansion (aka SAR expansion, SAR by catalogue):
• To consolidate and expand knowledge on active regions in chemical space
• To find derivatives of active scaffolds to establish or broaden SAR
information
• To expand hit series by scaffold hopping
• Strengthen knowledge base for direct use in H2L campaign
Hit expansion workflow
© copyright 2014 Exquiron Biotech AG19
Hit Expansion workflow
Workflow
A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p)
© copyright 2014 Exquiron Biotech AG20
Hit Expansion workflow
Workflow
A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p)
Applied to bridge gaps
in chemical space
© copyright 2014 Exquiron Biotech AG21
Hit Expansion workflow
Example (ChemBL VEGFR2 dataset)
A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p)
© copyright 2014 Exquiron Biotech AG22
Hit expansion workflow
 Port all functionalities of the workflow to KNIME using the
ChemAxon/Infocom nodes (with the exception of the pharmacophore
graphs)
 Improve and optimize workflow for increased execution speed (current
Turbosim implementation very slow due to the large number of similarity
searches performed)
Assignment more than fulfilled
To do
© copyright 2014 Exquiron Biotech AG23
Hit expansion workflow
 Workflow combines similarity searches using FCFP, ECFC and
MACCS keys
 Workflow also performs various substructure searches
 In PP, all fingerprints (FP) were pre-calculated and stored with their
corresponding structure into a PP cache
• Still a lot of «overhead» since structures are not needed to run
similarity searches (FP already calculated) and FP are not needed
when running substructures searches, but everything is carried
through the whole workflow
Target database preparation
© copyright 2014 Exquiron Biotech AG24
Hit expansion workflow
 Workflow combines similarity searches using FCFP, ECFC and
MACCS keys
 Workflow also performs various substructure searches
 In PP, all fingerprints (FP) were pre-calculated and stored with their
corresponding structure into a PP cache
• Still a lot of «overhead» since structures are not needed to run
similarity searches (FP already calculated) and FP are not needed
when running substructures searches, but everything is carried
through the whole workflow
Target database preparation
After calculation, FPs are stored into a KNIME table file, and the structures are
stored separately into a local JChem database, then load only what is needed
where you need it
© copyright 2014 Exquiron Biotech AG25
Hit expansion workflow
 Idea: create substructure queries with generalized ring structures
(somewhat the opposite of Bemis-Murcko scaffolds)
 Implementation:
• Loop through each atom, set to atom type „any‟ if it is in a ring
• Loop through each bond, set to bond type „any‟ if it is in a ring
Ring generalization
© copyright 2014 Exquiron Biotech AG26
Hit expansion workflow
 Idea: create substructure queries with generalized ring structures
(somewhat the opposite of Bemis-Murcko scaffolds)
 Implementation:
• Loop through each atom, set to atom type „any‟ if it is in a ring
• Loop through each bond, set to bond type „any‟ if it is in a ring
Ring generalization
Doesn’t work in PP
(with PilotScript and
Chemistry Collection)
© copyright 2014 Exquiron Biotech AG27
Hit expansion workflow
 Idea: create substructure queries with generalized ring structures
(somewhat the opposite of Bemis-Murcko scaffolds)
 Implementation:
• Loop through each atom, set to atom type „any‟ if it is in a ring
• Loop through each bond, set to bond type „any‟ if it is in a ring
Ring generalization
Works perfectly fine in
KNIME by calling the
ChemAxon API in a
Java Snippet
© copyright 2014 Exquiron Biotech AG28
Hit expansion workflow
 Bioisosteric transformations are applied to generate new virtual
template structures
 This step makes use of the Wagener and Lommerse dataset (J. Chem.
Inf. Model. 2006, 46, 677) as originally implemented in PP (MDL .rxn
files)
 But there is no bioisosteric transformation node available under KNIME
Bioisostere enumeration
© copyright 2014 Exquiron Biotech AG29
 Bioisosteric transformations were implemented as a KNIME metanode
using a loop iterating through each reaction file and applying it to the
incoming template with a ChemAxon/Infocom UniReactor node
 Works well, but requires a strong atom-to-atom mapping in the .rxn
file (which was not the case under PP)
 Still a couple of issues to solve:
• May be due to the way UniReactor handles 2-attachment points reactions
(?):
 But we are on the right track !
Hit expansion workflow
Bioisostere enumeration - implementation
© copyright 2014 Exquiron Biotech AG30
Summary
 KNIME combined with the ChemAxon/Infocom cheminformatics nodes is an efficient
and cost-effective alternative to PipelinePilot
 State-of-the-art reporting platform (WYSIWYG and Drag&Drop)
 Fingerprint-based similarity search orders of magnitude faster
 RAM memory requirements high (at least for the Desktop version of KNIME when dealing
with large SD files, prob. less relevant for the Server version). 2 GB assigned to KNIME is
probably the minimum
 Loops can sometimes be quite slow (“equivalent” of the RunToCompletion option in PP)
 Currently no KNIME implementation for the Discngine Pharmacophore Graphs, but
possibility to implement Screen3D instead under evaluation
© copyright 2014 Exquiron Biotech AG31
Summary
 ChemAxon and KNIME have done a terrific job in helping us successfully migrate
business critical protocols/workflows
• Required timelines could be maintained (due to expiring PP license)
• Workflows have been optimized and streamlined
If you are planning a similar migration and you need help,
ask the ChemAxon team. They definitively can help !
© copyright 2014 Exquiron Biotech AG32
Acknowledgements
 Attila Szabó
 Mihály „Medzi‟ Medzihradszky
 Andreas Bergner
 Aaron Hart
Thank you!
For more information contact
Serge P. Parel
Exquiron Biotech AG
Kägenstrasse 17
4153 Reinach
Switzerland
E-mail: sparel@exquiron.com
www.exquiron.com

More Related Content

Similar to EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

big-book-of-data-science-2ndedition.pdf
big-book-of-data-science-2ndedition.pdfbig-book-of-data-science-2ndedition.pdf
big-book-of-data-science-2ndedition.pdf
ssuserd397dd
 
Agile Development in a Regulated Environment
Agile Development in a Regulated EnvironmentAgile Development in a Regulated Environment
Agile Development in a Regulated Environment
TechWell
 
Drone
DroneDrone
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
INRIA-OAK
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Paolo Missier
 
process control instrumentation lab and labview report
process control  instrumentation lab and labview  reportprocess control  instrumentation lab and labview  report
process control instrumentation lab and labview report
Hari Krishna
 
Réveil en Form' - Dual use - Verhaert
Réveil en Form' - Dual use - VerhaertRéveil en Form' - Dual use - Verhaert
Réveil en Form' - Dual use - Verhaert
Alain Krafft
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
Sambit Banerjee
 
Fl lab collection_05_2011_en
Fl lab collection_05_2011_enFl lab collection_05_2011_en
Fl lab collection_05_2011_enVahid RG-zadeh
 
Live Coding 12 Factor App
Live Coding 12 Factor AppLive Coding 12 Factor App
Live Coding 12 Factor App
Emily Jiang
 
Value stream mapping for DevOps
Value stream mapping for DevOpsValue stream mapping for DevOps
Value stream mapping for DevOps
Marc Hornbeek
 
Virtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the CloudVirtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the Cloud
The UberCloud
 
Quality and capacity expansion of thematic services in EOSC-SYNERGY
Quality and capacity expansion of thematic services in EOSC-SYNERGYQuality and capacity expansion of thematic services in EOSC-SYNERGY
Quality and capacity expansion of thematic services in EOSC-SYNERGY
Jisc
 
V mware business continuity and disaster recovery accelerator service
V mware business continuity and disaster recovery accelerator serviceV mware business continuity and disaster recovery accelerator service
V mware business continuity and disaster recovery accelerator service
solarisyougood
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
OCRE | Open Clouds for Research Environments
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
Denis C. Bauer
 
Certified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationCertified Reasoning for Automated Verification
Certified Reasoning for Automated Verification
Asankhaya Sharma
 
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
WMG centre High Value Manufacturing Catapult
 

Similar to EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology (20)

big-book-of-data-science-2ndedition.pdf
big-book-of-data-science-2ndedition.pdfbig-book-of-data-science-2ndedition.pdf
big-book-of-data-science-2ndedition.pdf
 
Agile Development in a Regulated Environment
Agile Development in a Regulated EnvironmentAgile Development in a Regulated Environment
Agile Development in a Regulated Environment
 
Drone
DroneDrone
Drone
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
 
kshitij_resume_R&D
kshitij_resume_R&Dkshitij_resume_R&D
kshitij_resume_R&D
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
process control instrumentation lab and labview report
process control  instrumentation lab and labview  reportprocess control  instrumentation lab and labview  report
process control instrumentation lab and labview report
 
Réveil en Form' - Dual use - Verhaert
Réveil en Form' - Dual use - VerhaertRéveil en Form' - Dual use - Verhaert
Réveil en Form' - Dual use - Verhaert
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
 
Fl lab collection_05_2011_en
Fl lab collection_05_2011_enFl lab collection_05_2011_en
Fl lab collection_05_2011_en
 
Live Coding 12 Factor App
Live Coding 12 Factor AppLive Coding 12 Factor App
Live Coding 12 Factor App
 
resumelrs_jan_2017
resumelrs_jan_2017resumelrs_jan_2017
resumelrs_jan_2017
 
Value stream mapping for DevOps
Value stream mapping for DevOpsValue stream mapping for DevOps
Value stream mapping for DevOps
 
Virtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the CloudVirtual Human Brain Simulations with Abaqus in the Cloud
Virtual Human Brain Simulations with Abaqus in the Cloud
 
Quality and capacity expansion of thematic services in EOSC-SYNERGY
Quality and capacity expansion of thematic services in EOSC-SYNERGYQuality and capacity expansion of thematic services in EOSC-SYNERGY
Quality and capacity expansion of thematic services in EOSC-SYNERGY
 
V mware business continuity and disaster recovery accelerator service
V mware business continuity and disaster recovery accelerator serviceV mware business continuity and disaster recovery accelerator service
V mware business continuity and disaster recovery accelerator service
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
Certified Reasoning for Automated Verification
Certified Reasoning for Automated VerificationCertified Reasoning for Automated Verification
Certified Reasoning for Automated Verification
 
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
Deep learning enhanced digital twin for Closed-loop In-Process Quality Improv...
 

More from ChemAxon

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
ChemAxon
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive models
ChemAxon
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive models
ChemAxon
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...
ChemAxon
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data management
ChemAxon
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
ChemAxon
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation
ChemAxon
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...
ChemAxon
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
ChemAxon
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug Discovery
ChemAxon
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
ChemAxon
 
Research data management on the cloud
Research data management on the cloudResearch data management on the cloud
Research data management on the cloud
ChemAxon
 
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationCheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
ChemAxon
 
Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction
ChemAxon
 
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
ChemAxon
 
Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology
ChemAxon
 
JChem Microservices
JChem MicroservicesJChem Microservices
JChem Microservices
ChemAxon
 
Migration from joc to jpc or choral
Migration from joc to jpc or choralMigration from joc to jpc or choral
Migration from joc to jpc or choral
ChemAxon
 
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon
 
Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5
ChemAxon
 

More from ChemAxon (20)

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive models
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive models
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data management
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug Discovery
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
 
Research data management on the cloud
Research data management on the cloudResearch data management on the cloud
Research data management on the cloud
 
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationCheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
 
Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction
 
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
 
Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology
 
JChem Microservices
JChem MicroservicesJChem Microservices
JChem Microservices
 
Migration from joc to jpc or choral
Migration from joc to jpc or choralMigration from joc to jpc or choral
Migration from joc to jpc or choral
 
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
 
Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5
 

Recently uploaded

APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Yara Milbes
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 

Recently uploaded (20)

APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 

EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

  • 1. Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology Serge P. Parel, PhD ChemAxon User Group Meeting, Budapest 21st May, 2014
  • 2. © copyright 2014 Exquiron Biotech AG2 Outline  Exquiron – Who are we ?  Research informatics infrastructure • Why migrate ? • What to migrate ? • Who can help ?  Example 1: Reporting of dose-response results  Example 2: Hit expansion toolbox  Summary  Acknowledgments
  • 3. © copyright 2014 Exquiron Biotech AG3 Exquiron – who are we?  Hit identification & profiling services  Independent and privately held startup company  10 scientific staff and expanding  >60 cumulated years experience in hit identification services Covering human health, veterinary, consumer care, nutraceutical, and agrochemical areas  850m2 lab & office space  Located in Reinach, near Basel, Switzerland
  • 4. © copyright 2014 Exquiron Biotech AG4 Research informatics infrastructure As of end of 2013 Chemistry cartridge: • ChemAxon JChem on Oracle Workflow platform: • Accelrys Pipeline Pilot Biological data warehouse: • Exquiron NaviGator database on MS-SQL server Cheminformatics platform: • Accelrys PP Chemistry Library
  • 5. © copyright 2014 Exquiron Biotech AG5 Research informatics infrastructure  Renewal financial terms not suitable for small enterprises with only very few users (compared with pricing of Start-Up package)  Duplicate functionalities (Accelrys PP Chemistry Library and ChemAxon PP components) Problem Chemistry cartridge: • ChemAxon JChem on Oracle Workflow platform: • Accelrys Pipeline Pilot Biological data warehouse: • Exquiron NaviGator database on MS-SQL server Cheminformatics platform: • Accelrys PP Chemistry Library
  • 6. © copyright 2014 Exquiron Biotech AG6 Research informatics infrastructure Solution ? Chemistry cartridge: • ChemAxon JChem on Oracle Workflow platform: • Accelrys Pipeline Pilot Biological data warehouse: • Exquiron NaviGator database on MS-SQL server Cheminformatics platform: • Accelrys PP Chemistry Library Workflow platform: • KNIME Cheminformatics platform: • ChemAxon/Infocom nodes
  • 7. © copyright 2014 Exquiron Biotech AG7 Research informatics infrastructure Solution ? Chemistry cartridge: • ChemAxon JChem on Oracle Workflow platform: • Accelrys Pipeline Pilot Biological data warehouse: • Exquiron NaviGator database on MS-SQL server Cheminformatics platform: • Accelrys PP Chemistry Library Workflow platform: • KNIME Cheminformatics platform: • ChemAxon/Infocom nodes  Simple protocols can be ported in house, but what about more complex workflows?
  • 8. © copyright 2014 Exquiron Biotech AG8 Research informatics infrastructure  Ask KNIME if they can help:  Yes, but they rely rather on partners for consulting services and support them where needed  Ask ChemAxon if they can help:  Yes, through their Consultancy Services Who can help ? Workflow platform: • Accelrys Pipeline Pilot Cheminformatics platform: • Accelrys PP Chemistry Library Workflow platform: • KNIME Cheminformatics platform: • ChemAxon/Infocom nodes
  • 9. © copyright 2014 Exquiron Biotech AG9 Scope of the migration  Lots of short PP protocols (SD files and lists manipulation, virtual screening,…)  Dose-response data reporting • Input file generation • Report generation  Exhaustive hit expansion toolbox • Hit expansion • Data fusion and analysis Done over Consultancy with support from KNIME Done over Consultancy Done in house Done in house
  • 10. © copyright 2014 Exquiron Biotech AG10 Example 1  Typical workflow for a hit finding campaign: Reporting of dose-response experiments Many 100K compounds A few 1,000s A few 100s HitsActives Confirmed actives HTS Confirmation usually in duplicates Dose-response determination assay + counterassay(s) Reporting • Chemists want to see structures and biological results • Biologists want to see biological results and dose-response curves
  • 11. © copyright 2014 Exquiron Biotech AG11 Reporting of dose-response experiments Workflow (I) Exp. data points IC50 fitting parameters Purity Select project and set of experiments Tag each data point with experiment, concentration, replicate and experiment. Tag IC50 parameters by experiment SD File Additonal data (Text file) JChem cartridge or SD File Exquiron data warehouse
  • 12. © copyright 2014 Exquiron Biotech AG12 Reporting of dose-response experiments Workflow (II) Compound structure SD File Generate report elements Results table Dose- response curves Export as pdf
  • 13. © copyright 2014 Exquiron Biotech AG13 Reporting of dose-response experiments  Concentration may be different for each experiment  Number of replicates per concentration may vary  Datapoint at a given concentration could be invalid  Number of experiments to report may vary  Fields for SD file cannot be defined a priori SD file generation (I) exq_id exp_id Conc Valid Activity(%) EXQ0012345 2344 90 TRUE 6.411991119 EXQ0012345 2344 30 TRUE 41.63282013 EXQ0012345 2344 10 TRUE 74.8677597 EXQ0012345 2344 3.333333254 TRUE 83.46128082 EXQ0012345 2344 1.111111164 TRUE 87.6084137 EXQ0012345 2344 0.370370358 TRUE 93.19058228 . . M END > <dtag_nr> EXQ0012345 > <Activity_0.37037_1_S1> 93.19058227539062 > <Activity_1.11111_1_S1> 87.60841369628906 > <Activity_3.33333_1_S1> 83.4612808227539 > <Activity_10.0_1_S1> 74.86775970458984 > <Activity_30.0_1_S1> 41.63282012939453 > <Activity_90.0_1_S1> 6.411991119384766 . .
  • 14. © copyright 2014 Exquiron Biotech AG14 Reporting of dose-response experiments SD file generation (II) PipelinePilot KNIME Get data from database: • SELECT statement • Each datapoint is a separate row Get data from database: • SELECT statement • Each datapoint is a separate row Crunch data: • Through Group By‟s and a lot of hash tables in PilotScript code  New columns/column names generated on the fly • For each compound, one single row with each datapoint is a separate column Crunch data: • Through Group By‟s, pivoting and some string manipulation  All new columns/column names generated at once through pivoting • For each compound, one single row with each datapoint is a separate column Finalize: • Merge with structures, regression parameters and purity information • Write SD file Finalize: • Merge with structures, regression parameters and purity information • Write SD file
  • 15. © copyright 2014 Exquiron Biotech AG15 Reporting of dose-response experiments Report generation in PipelinePilot PipelinePilot  Read and parse SD file  Generate report elements (PP Reporting Collection) • Compound ID • Structure • Results table  Generate DR curves based on logistic regression parameters (done in Perl !!)  Feed DR curves and experimental data points to XY- Chart component  Combine into report with header, footer, logo…  Generate pdf file
  • 16. © copyright 2014 Exquiron Biotech AG16 Reporting of dose-response experiments PipelinePilot  Read and parse SD file  Generate report elements (PP Reporting Collection) • Compound ID • Structure • Results table  Generate DR curves based on logistic regression parameters (done in Perl !!)  Feed DR curves and experimental data points to XY- Chart component  Combine into report with header, footer, logo…  Generate pdf file …and in KNIME KNIME  Read and parse SD file  Generate report information as tables • Compound ID • Structure • Results table  Generate DR curves based on logistic regression parameters (using a Math Formula node)  Feed DR curves and experimental data points to R and use ggplot to generate the DR plot, return it as picture  Send all elements to the KNIME Report Designer (BIRT)  Combine to a report with header, footer, logo…  Generate pdf file
  • 17. © copyright 2014 Exquiron Biotech AG17 Reporting of dose-response experiments Result PipelinePilot KNIME
  • 18. © copyright 2014 Exquiron Biotech AG18 Example 2  Purpose of hit expansion (aka SAR expansion, SAR by catalogue): • To consolidate and expand knowledge on active regions in chemical space • To find derivatives of active scaffolds to establish or broaden SAR information • To expand hit series by scaffold hopping • Strengthen knowledge base for direct use in H2L campaign Hit expansion workflow
  • 19. © copyright 2014 Exquiron Biotech AG19 Hit Expansion workflow Workflow A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p)
  • 20. © copyright 2014 Exquiron Biotech AG20 Hit Expansion workflow Workflow A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p) Applied to bridge gaps in chemical space
  • 21. © copyright 2014 Exquiron Biotech AG21 Hit Expansion workflow Example (ChemBL VEGFR2 dataset) A. Bergner, S. P. Parel, J. Chem. Inf. Model. 2013, 53, 1057 (http://dx.doi.org/10.1021/ci400059p)
  • 22. © copyright 2014 Exquiron Biotech AG22 Hit expansion workflow  Port all functionalities of the workflow to KNIME using the ChemAxon/Infocom nodes (with the exception of the pharmacophore graphs)  Improve and optimize workflow for increased execution speed (current Turbosim implementation very slow due to the large number of similarity searches performed) Assignment more than fulfilled To do
  • 23. © copyright 2014 Exquiron Biotech AG23 Hit expansion workflow  Workflow combines similarity searches using FCFP, ECFC and MACCS keys  Workflow also performs various substructure searches  In PP, all fingerprints (FP) were pre-calculated and stored with their corresponding structure into a PP cache • Still a lot of «overhead» since structures are not needed to run similarity searches (FP already calculated) and FP are not needed when running substructures searches, but everything is carried through the whole workflow Target database preparation
  • 24. © copyright 2014 Exquiron Biotech AG24 Hit expansion workflow  Workflow combines similarity searches using FCFP, ECFC and MACCS keys  Workflow also performs various substructure searches  In PP, all fingerprints (FP) were pre-calculated and stored with their corresponding structure into a PP cache • Still a lot of «overhead» since structures are not needed to run similarity searches (FP already calculated) and FP are not needed when running substructures searches, but everything is carried through the whole workflow Target database preparation After calculation, FPs are stored into a KNIME table file, and the structures are stored separately into a local JChem database, then load only what is needed where you need it
  • 25. © copyright 2014 Exquiron Biotech AG25 Hit expansion workflow  Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds)  Implementation: • Loop through each atom, set to atom type „any‟ if it is in a ring • Loop through each bond, set to bond type „any‟ if it is in a ring Ring generalization
  • 26. © copyright 2014 Exquiron Biotech AG26 Hit expansion workflow  Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds)  Implementation: • Loop through each atom, set to atom type „any‟ if it is in a ring • Loop through each bond, set to bond type „any‟ if it is in a ring Ring generalization Doesn’t work in PP (with PilotScript and Chemistry Collection)
  • 27. © copyright 2014 Exquiron Biotech AG27 Hit expansion workflow  Idea: create substructure queries with generalized ring structures (somewhat the opposite of Bemis-Murcko scaffolds)  Implementation: • Loop through each atom, set to atom type „any‟ if it is in a ring • Loop through each bond, set to bond type „any‟ if it is in a ring Ring generalization Works perfectly fine in KNIME by calling the ChemAxon API in a Java Snippet
  • 28. © copyright 2014 Exquiron Biotech AG28 Hit expansion workflow  Bioisosteric transformations are applied to generate new virtual template structures  This step makes use of the Wagener and Lommerse dataset (J. Chem. Inf. Model. 2006, 46, 677) as originally implemented in PP (MDL .rxn files)  But there is no bioisosteric transformation node available under KNIME Bioisostere enumeration
  • 29. © copyright 2014 Exquiron Biotech AG29  Bioisosteric transformations were implemented as a KNIME metanode using a loop iterating through each reaction file and applying it to the incoming template with a ChemAxon/Infocom UniReactor node  Works well, but requires a strong atom-to-atom mapping in the .rxn file (which was not the case under PP)  Still a couple of issues to solve: • May be due to the way UniReactor handles 2-attachment points reactions (?):  But we are on the right track ! Hit expansion workflow Bioisostere enumeration - implementation
  • 30. © copyright 2014 Exquiron Biotech AG30 Summary  KNIME combined with the ChemAxon/Infocom cheminformatics nodes is an efficient and cost-effective alternative to PipelinePilot  State-of-the-art reporting platform (WYSIWYG and Drag&Drop)  Fingerprint-based similarity search orders of magnitude faster  RAM memory requirements high (at least for the Desktop version of KNIME when dealing with large SD files, prob. less relevant for the Server version). 2 GB assigned to KNIME is probably the minimum  Loops can sometimes be quite slow (“equivalent” of the RunToCompletion option in PP)  Currently no KNIME implementation for the Discngine Pharmacophore Graphs, but possibility to implement Screen3D instead under evaluation
  • 31. © copyright 2014 Exquiron Biotech AG31 Summary  ChemAxon and KNIME have done a terrific job in helping us successfully migrate business critical protocols/workflows • Required timelines could be maintained (due to expiring PP license) • Workflows have been optimized and streamlined If you are planning a similar migration and you need help, ask the ChemAxon team. They definitively can help !
  • 32. © copyright 2014 Exquiron Biotech AG32 Acknowledgements  Attila Szabó  Mihály „Medzi‟ Medzihradszky  Andreas Bergner  Aaron Hart
  • 33. Thank you! For more information contact Serge P. Parel Exquiron Biotech AG Kägenstrasse 17 4153 Reinach Switzerland E-mail: sparel@exquiron.com www.exquiron.com