Processing malaria HTS results using KNIME: a tutorial

© 2018 KNIME AG. All Rights Reserved.
21 February, 2018
Greg Landrum, Ph.D.
greg.landrum@knime.com

Agenda
• Very brief intro to KNIME
• The HTS processing workflow
• Q&A
• Chemistry in KNIME with the RDKit
The workflows and data used in this presentation can all be
downloaded from the EXAMPLES folder in KNIME in the folder:
knime://EXAMPLES/50_Applications/32_Hitlist_Processing

KNIME, the company
• KNIME AG founded in 2008
• Offices in Zurich (HQ), Konstanz, Berlin, and Austin
• 40+ employees
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open: to all sorts of data, other tools (R and Python, etc.), various user
personas
– 20+ open source releases since 2006
– Free and open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings

The KNIME® Analytics Platform

Over 2000 native and embedded nodes included:
• Data Access
– MySQL, Oracle, ...
– SAS, SPSS, ...
– Excel, Flat, ...
– Hive, Impala, ...
– XML, JSON, PMML
– Text, Doc, Image, ...
– Web Crawlers, Industry Specific, Community / 3rd party, ...
• Transformation
– Row, Column, Matrix
– Text, Image, Networks, Time Series, Java, Python
• Analysis & Mining
– Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
• Visualization
– R, Python, JFreeChart, JavaScript
• Deployment
– via BIRT
– PMML, XML, JSON
– Databases, Excel, Flat, etc.
– Text, Doc, Image
– Industry Specific
• Big Data
– Hive, Impala, HDFS, Vertica, Teradata/Aster, Spark, MLlib

Free E-Learning Course: Web Page
• Hands-on e-learning course
• Data Access, ETL, Analytics, Control
Structures, Visualization
• Around 50 small units
• … with exercises
• … and with solutions on the EXAMPLES server
• Final exercises to test your knowledge!
https://www.knime.org/knime-introductory-course

KNIME Products Overview
KNIME® Analytics Platform
• Open Source Extensions and Community & Partner Extensions, covering:
– Chem- & Bioinf, Data Providers, Signal Processing, ...
– R & Python, Big Data, Deep Learning
– Text Processing, Image Analysis, High Speed ML, ...
KNIME® Server (on premise or in the cloud)
• Deployment: to applications, to humans
• Collaboration: compliance, best practices, sharing expertise
• Automation: scheduling, (model) management

KNIME Server
• Shared Repositories
• Access Management
• Web Enablement
• Flexible Execution

Processing HTS Data with KNIME

Background
• The problem: Processing a hit list from a high-throughput phenotypic screen for malaria.
– Clean up the hit list
– Suggest compounds to be sent to a validation assay
• Data source: 2014 Teach-Discover-Treat challenge
http://www.tdtproject.org/challenge-1---malaria-hts.html
• Additional info:
– https://github.com/sriniker/TDT-tutorial-2014
– Riniker et al. https://f1000research.com/articles/6-1136/v2

Approach we’ll take: cleanup
• Remove “ugly” molecules:
– PAINS filters [1,2]: molecules containing substructures that are likely to interfere with (or to have interfered with) the assay.
– “Rapid elimination of swill” (REOS) [3]: molecules that are too big, too complicated, or too greasy.
• We don’t want to apply these filters mindlessly, so we should always look at the results and allow manual rescue.
1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
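
The cleanup logic on this slide (flag, review, rescue) can be sketched in plain Python. This is only an illustration, not the KNIME workflow itself: the property windows in REOS_LIMITS are assumed values chosen for the example, the Compound record and its precomputed descriptors are hypothetical, and the PAINS alert name is made up. In practice the descriptors and PAINS matches would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

# Illustrative REOS-style property windows (assumed values, not taken from
# the slides): each entry is (minimum, maximum), both inclusive.
REOS_LIMITS = {
    "mol_wt": (200.0, 500.0),
    "logp": (-5.0, 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
    "rot_bonds": (0, 8),
    "heavy_atoms": (15, 50),
}


@dataclass
class Compound:
    """Hypothetical hit-list record with precomputed descriptors."""
    cid: str
    props: dict
    pains_alerts: tuple = ()  # names of matched PAINS substructures, if any


def violations(cpd, limits=REOS_LIMITS):
    """Return human-readable reasons why a compound is flagged."""
    reasons = [f"PAINS:{name}" for name in cpd.pains_alerts]
    for key, (low, high) in limits.items():
        value = cpd.props[key]
        if not low <= value <= high:
            reasons.append(f"REOS:{key}={value}")
    return reasons


def clean_hit_list(compounds, rescued=frozenset()):
    """Split compounds into kept IDs and rejected IDs with their reasons.

    Flagged compounds whose ID is in `rescued` are kept anyway, which
    models the manual rescue step.
    """
    kept, rejected = [], {}
    for cpd in compounds:
        reasons = violations(cpd)
        if reasons and cpd.cid not in rescued:
            rejected[cpd.cid] = reasons  # keep the reasons for later review
        else:
            kept.append(cpd.cid)
    return kept, rejected


ok_props = {"mol_wt": 320.4, "logp": 2.1, "h_donors": 2,
            "h_acceptors": 5, "rot_bonds": 4, "heavy_atoms": 23}
big_greasy = {"mol_wt": 712.9, "logp": 6.3, "h_donors": 4,
              "h_acceptors": 9, "rot_bonds": 7, "heavy_atoms": 51}

hits = [
    Compound("cpd-1", ok_props),
    Compound("cpd-2", big_greasy),
    Compound("cpd-3", ok_props, pains_alerts=("quinone_A",)),  # made-up alert
]

# A reviewer decided that cpd-3 is worth keeping despite its PAINS alert.
kept, rejected = clean_hit_list(hits, rescued={"cpd-3"})
print(kept)
print(rejected)
```

Keeping the reasons alongside each rejected compound is what makes the review step practical: a reviewer can see why something was flagged before deciding whether to rescue it.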

Approach we’ll take: selection for validation
• We want good coverage of the chemical space of
the HTS actives, but would ideally also like to learn
something from the validation results
• Approach:
– Start with a diverse subset of the cleaned actives
– Pick neighbors of each of these so that we have some SAR
information in the results
https://github.com/sriniker/TDT-tutorial-2014
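
The two-step selection described above (a diverse subset first, then neighbors of each pick) could look roughly like the following sketch. It assumes fingerprints are available as Python sets of on-bit indices and uses a simple greedy MaxMin picker with Tanimoto distance. The function names, the toy fingerprints, and the 0.5 similarity cutoff are all assumptions made for illustration, and the actual workflow may use a different picking algorithm.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps, n_picks, first=0):
    """Greedy MaxMin picking: repeatedly add the item farthest from the picks."""
    picks = [first]
    # min_dist[i] is the distance from item i to its closest picked item.
    min_dist = [1.0 - tanimoto(fps[first], fp) for fp in fps]
    while len(picks) < min(n_picks, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picks]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picks.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fps[nxt], fp))
    return picks


def pick_neighbors(fps, centers, n_neighbors, min_sim=0.5):
    """For each center, pick its most similar not-yet-selected compounds."""
    taken = set(centers)
    result = {}
    for center in centers:
        ranked = sorted(
            (i for i in range(len(fps)) if i not in taken),
            key=lambda i: tanimoto(fps[center], fps[i]),
            reverse=True,
        )
        chosen = [i for i in ranked[:n_neighbors]
                  if tanimoto(fps[center], fps[i]) >= min_sim]
        taken.update(chosen)
        result[center] = chosen
    return result


# Toy fingerprints: two small families of analogs plus one singleton.
fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 6},
    {10, 11, 12, 13},
    {10, 11, 12, 14},
    {20, 21, 22},
]

centers = maxmin_pick(fps, n_picks=3)
neighbors = pick_neighbors(fps, centers, n_neighbors=1)
print(centers)    # [0, 3, 5]
print(neighbors)  # {0: [2], 3: [4], 5: []}
```

The diverse picks give coverage of the chemical space, while the neighbors supply close analogs of each pick, which is where the SAR information in the validation results would come from.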

Selection example: some cluster centroids

Selection example: the picks
Cluster 1 Cluster 2

Cleanup workflow (part 1)

The output

Selection workflow

The output

The workflows
• Download (with data) from the EXAMPLES folder in KNIME:
knime://EXAMPLES/50_Applications/32_Hitlist_Processing
…

Brief intro to the RDKit

• Business-friendly BSD license
• Runs on Linux/Mac/Windows
• Commercial support available
• Releases every six months
• Active and engaged community
• Core data structures and algorithms in C++
• Usable from Python (2 or 3), C#, or Java
• Strong integration with other tools like KNIME,
Jupyter, Pandas, and PostgreSQL
• Pretty good documentation
• Basic functionality highlights:
– Chemical reactions
– 2D depiction
– Substructure searching
– Canonical SMILES
– Gasteiger-Marsili charges
– Molecular standardization
• 2D Functionality highlights:
– RECAP and BRICS support
– Multi-molecule MCS
– Similarity maps
– Functional group filters
– Diversity picking
• Supported fingerprint highlights:
– Morgan/Feature Morgan (ECFP/FCFP-like)
– RDKit (Daylight-like)
– Atom-pairs and topological torsions
– MACCS keys
– Avalon
• Descriptor highlights:
– Hall-Kier 𝜒 and 𝜅 descriptors
– SLogP, SMR, TPSA
– MQN
– “MOE-like” VSA
– Compositional (number of donors, number of
rings, number of heterocycles, etc.)
• 3D Functionality highlights:
– 2D->3D conversion/conformational analysis via
distance geometry
– UFF and MMFF94/MMFF94S implementations for
cleaning up structures
– Feature maps and feature-map vectors
– Shape-based similarity
– RMSD-based molecule-molecule alignment
– Open3DAlign implementation
– Integration with PyMOL
– Torsion Fingerprint Differences
The RDKit: An open-source toolkit for cheminformatics
www.rdkit.org

The RDKit code ecosystem
C++ :
Core data structures and algorithms
PostgreSQL
Boost.Python SWIG
Python Java C#
Jupyter Pandas KNIME
The exact same implementation is available in all endpoints

The RDKit and KNIME
• Open-source wrappers for KNIME maintained by NIBR
and the open-source community
• Useful for:
• Descriptor calculation
• Cleaning structures
• Canonical SMILES and InChI conversion
• Fingerprints
• Scaffolds/substructures
• Reaction simulation
• Conformation generation
• and more…
www.rdkit.org

“Demo” 1: finding the scaffold for a set of compounds
knime://EXAMPLES/99_Community/03_RDKit/06_Find_Scaffolds_And_Sidechains

“Demo” 2: library enumeration
knime://EXAMPLES/99_Community/03_RDKit/02_Reaction_Enumeration

“Demo” 2: library enumeration results

“Demo” 3: key compound from a patent
knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL

Read structures from the Tarceva patent (exported from SureChEMBL)

Build network by connecting similar molecules

That’s Tarceva
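
The network-building step can be sketched in a few lines of Python. The sketch assumes each molecule's fingerprint is available as a set of on-bit indices; the molecule names, toy fingerprints, and 0.6 similarity threshold are hypothetical. Treating the most connected node as the candidate key compound is one plausible reading of the demo, not a description of what the KNIME workflow actually computes.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_similarity_network(fps, threshold=0.6):
    """Connect every pair of molecules whose similarity reaches the threshold.

    `fps` maps a molecule name to its fingerprint; the result maps each
    name to the set of names it is connected to.
    """
    network = {name: set() for name in fps}
    for a, b in combinations(fps, 2):
        if tanimoto(fps[a], fps[b]) >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network


def most_connected(network):
    """Return the node with the largest number of neighbors."""
    return max(network, key=lambda name: len(network[name]))


# Toy fingerprints for five hypothetical patent compounds.
patent_fps = {
    "mol_A": {1, 2, 3, 4, 5},
    "mol_B": {1, 2, 3, 4, 6},
    "mol_C": {1, 2, 3, 4, 5, 7},
    "mol_D": {1, 2, 3, 5, 8},
    "mol_E": {30, 31},
}

network = build_similarity_network(patent_fps)
print(sorted(network["mol_A"]))  # ['mol_B', 'mol_C', 'mol_D']
print(most_connected(network))   # mol_A
```

The threshold controls how dense the network is: a lower value links more distant analogs, a higher one leaves only very close pairs connected.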

Wrapping up
The workflows and data used in this presentation can all be
downloaded from the EXAMPLES folder in KNIME in the folder:
knime://EXAMPLES/50_Applications/32_Hitlist_Processing

KNIME Spring Summit 2018
March 5 – 9 at Hotel Berlin in Berlin, Germany
• Monday & Tuesday: One and two-day courses
– From Basics to Big Data and Text Processing as well as Advanced Analytics
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Registration at www.KNIME.com
