© 2018 KNIME AG. All Rights Reserved.
Processing malaria HTS results using
KNIME: a tutorial
21 February, 2018
Greg Landrum, Ph.D.
greg.landrum@knime.com
Agenda
• Very brief intro to KNIME
• The HTS processing workflow
• Q&A
• Chemistry in KNIME with the RDKit
The workflows and data used in this presentation can all be
downloaded from the EXAMPLES folder in KNIME in the folder:
knime://EXAMPLES/50_Applications/32_Hitlist_Processing
KNIME, the company
• KNIME AG founded in 2008
• Offices in Zurich (HQ), Konstanz, Berlin, and Austin
• 40+ employees
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open: to all sorts of data, other tools (R and Python, etc.), various user
personas
– 20+ open source releases since 2006
– Free and open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
The KNIME® Analytics Platform
Analysis & Mining
Statistics, Machine Learning, Data
Mining, Web Analytics, Text
Mining, Network Analysis, Social
Media Analysis, R, Weka, Python,
Community / 3rd party, ...
Data Access
MySQL, Oracle, ...
SAS, SPSS, ...
Excel, Flat, ...
Hive, Impala, ...
XML, JSON, PMML
Text, Doc, Image, ...
Web Crawlers,
Industry Specific,
Community / 3rd
party ...
Transformation
Row, Column, Matrix
Text, Image, Networks, Time
Series, Java, Python,
Community / 3rd party, ...
Visualization
R, Python,
JFreeChart,
JavaScript,
Community / 3rd party, ...
Deployment
via BIRT
PMML, XML, JSON
Databases, Excel, Flat, etc.
Text, Doc, Image
Industry Specific
Community / 3rd party, ...
Over 2000 native and embedded nodes included:
Big Data
Hive, Impala, HDFS, Vertica,
Teradata/Aster, Spark, MLlib,
Community / 3rd party, ...
Free E-Learning Course: Web Page
• Hands-on e-learning course
• Data Access, ETL, Analytics, Control
Structures, Visualization
• Around 50 small units
• … with exercises
• … and with solutions on the
EXAMPLES server
• Final exercises to test your
knowledge!
https://www.knime.org/knime-introductory-course
KNIME Products Overview
KNIME®
Analytics
Platform
Open Source
Extensions
Community
&
Partner
Extensions
Chem- & Bioinf,
Data Providers,
Signal Processing,
...
R & Python,
Big Data,
Deep Learning
Text Processing,
Image Analysis,
High Speed ML,
...
Deployment:
- to Applications
- to Humans
Collaboration:
- Compliance
- Best Practices
- Sharing Expertise
Automation:
- Scheduling
- (Model) Management
KNIME® Server
- on Premise
- in the Cloud
KNIME Server
Shared Repositories Access Management Web Enablement
Flexible Execution
Processing HTS Data with KNIME
Background
• The problem: Processing a hit list from a high-throughput
phenotypic screen for malaria.
– Clean up the hit list
– Suggest compounds to be sent to a validation assay
• Data source: 2014 Teach-Discover-Treat challenge
http://www.tdtproject.org/challenge-1---malaria-hts.html
• Additional info:
– https://github.com/sriniker/TDT-tutorial-2014
– Riniker et al. https://f1000research.com/articles/6-1136/v2
Approach we’ll take: cleanup
• Remove “ugly” molecules:
– PAINS filters [1,2]: flag substructures that are likely to
interfere with (or have interfered with) the assay.
– “Rapid elimination of swill” (REOS) [3]: removes molecules that are
too big, too complicated, or too greasy.
• We don’t want to apply these filters mindlessly, so we
should always look at the results and allow manual
rescue.
1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
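A minimal sketch of the cleanup idea, assuming the descriptors have already been calculated (in the actual workflow this is done by KNIME nodes, not by this code). The property windows, compound IDs, and function names below are illustrative assumptions and are not taken from the slides; the point is only to show a REOS-style range filter that records why each compound was rejected and lets a reviewer rescue compounds by hand.

```python
# Illustrative REOS-style property windows (assumed values, not from the slides).
REOS_RANGES = {
    "mol_weight": (200.0, 500.0),
    "logp": (-5.0, 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
    "rot_bonds": (0, 8),
}


def reos_violations(props, ranges=REOS_RANGES):
    """Return the names of the properties that fall outside their window."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= props[name] <= hi]


def clean_hit_list(hits, rescued=()):
    """Split hits into kept IDs and rejected IDs; IDs in `rescued` are always kept."""
    kept, rejected = [], {}
    for cid, props in hits.items():
        bad = reos_violations(props)
        if bad and cid not in rescued:
            rejected[cid] = bad  # keep the reasons so a chemist can review them
        else:
            kept.append(cid)
    return kept, rejected


# Hypothetical compounds with precomputed descriptors.
hits = {
    "cmpd-1": {"mol_weight": 342.4, "logp": 3.1, "h_donors": 1,
               "h_acceptors": 5, "rot_bonds": 4},
    "cmpd-2": {"mol_weight": 712.9, "logp": 6.8, "h_donors": 2,
               "h_acceptors": 9, "rot_bonds": 14},
    "cmpd-3": {"mol_weight": 188.2, "logp": 1.2, "h_donors": 2,
               "h_acceptors": 3, "rot_bonds": 1},
}

# cmpd-3 is slightly too small for the window but is rescued manually.
kept, rejected = clean_hit_list(hits, rescued={"cmpd-3"})
print(kept)      # ['cmpd-1', 'cmpd-3']
print(rejected)  # {'cmpd-2': ['mol_weight', 'logp', 'rot_bonds']}
```

Returning the list of violated properties, rather than a plain yes/no, is what makes the manual review step practical.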
Approach we’ll take: selection for validation
• We want good coverage of the chemical space of
the HTS actives, but would ideally also like to learn
something from the validation results
• Approach:
– Start with a diverse subset of the cleaned actives
– Pick neighbors of each of these so that we have some SAR
information in the results
https://github.com/sriniker/TDT-tutorial-2014
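The two-step selection can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow's implementation: fingerprints are modelled as sets of on-bit indices, similarity is Tanimoto, the diverse subset is chosen with a simple greedy MaxMin procedure, and the toy data and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_pick(fps, n_picks, first=0):
    """Greedy MaxMin: repeatedly add the item farthest from everything picked so far."""
    picked = [first]
    while len(picked) < min(n_picks, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        # A candidate's distance to the picked set is its distance to the nearest pick.
        best = max(candidates,
                   key=lambda i: min(1.0 - tanimoto(fps[i], fps[p]) for p in picked))
        picked.append(best)
    return picked


def neighbors(fps, seed, k, exclude=()):
    """Return the k items most similar to `seed`, skipping the seed and `exclude`."""
    others = [i for i in range(len(fps)) if i != seed and i not in exclude]
    others.sort(key=lambda i: tanimoto(fps[seed], fps[i]), reverse=True)
    return others[:k]


# Toy fingerprints (hypothetical): two chemical families and one outlier.
fps = [
    {1, 2, 3, 4},      # 0: family A
    {1, 2, 3, 5},      # 1: family A, close to 0
    {1, 2, 6, 7},      # 2: family A, more distant from 0
    {10, 11, 12, 13},  # 3: family B
    {1, 10, 11, 12},   # 4: family B, close to 3
    {4, 20, 21},       # 5: outlier
]

seeds = maxmin_pick(fps, n_picks=2)  # diverse subset
selection = {s: neighbors(fps, s, k=1, exclude=seeds) for s in seeds}
print(seeds)      # [0, 3]
print(selection)  # {0: [1], 3: [4]}
```

Each seed plus its neighbors gives a small group of related structures, which is what provides SAR information once the validation results come back.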
Selection example: some cluster centroids
Selection example: the picks
Cluster 1 Cluster 2
Cleanup workflow (part 1)
Cleanup workflow (part 2)
The output
Selection workflow
The output
The workflows
• Download (with data) from the
EXAMPLES folder in KNIME:
knime://EXAMPLES/50_Applications/
32_Hitlist_Processing
…
Brief intro to the RDKit
• Business-friendly BSD license
• Runs on Linux/Mac/Windows
• Commercial support available
• Releases every six months
• Active and engaged community
• Core data structures and algorithms in C++
• Usable from Python (2 or 3), C#, or Java
• Strong integration with other tools like KNIME,
Jupyter, Pandas, and PostgreSQL
• Pretty good documentation
• Basic functionality highlights:
– Chemical reactions
– 2D depiction
– Substructure searching
– Canonical SMILES
– Gasteiger-Marsili charges
– Molecular standardization
• 2D Functionality highlights:
– RECAP and BRICS support
– Multi-molecule MCS
– Similarity maps
– Functional group filters
– Diversity picking
• Supported fingerprint highlights:
– Morgan/Feature Morgan (ECFP/FCFP-like)
– RDKit (Daylight-like)
– Atom-pairs and topological torsions
– MACCS keys
– Avalon
• Descriptor highlights:
– Hall-Kier 𝜒 and 𝜅 descriptors
– SLogP, SMR, TPSA
– MQN
– “MOE-like” VSA
– Compositional (number of donors, number of
rings, number of heterocycles, etc.)
• 3D Functionality highlights:
– 2D->3D conversion/conformational analysis via
distance geometry
– UFF and MMFF94/MMFF94S implementations for
cleaning up structures
– Feature maps and feature-map vectors
– Shape-based similarity
– RMSD-based molecule-molecule alignment
– Open3DAlign implementation
– Integration with PyMOL
– Torsion Fingerprint Differences
The RDKit: An open-source toolkit for cheminformatics
www.rdkit.org
The RDKit code ecosystem
C++ :
Core data structures and algorithms
PostgreSQL
Boost.Python SWIG
Python Java C#
Jupyter Pandas KNIME
The exact same implementation is available in all endpoints
The RDKit and KNIME
• Open-source wrappers for KNIME maintained by NIBR
and the open-source community
• Useful for:
• Descriptor calculation
• Cleaning structures
• Canonical SMILES and InChI conversion
• Fingerprints
• Scaffolds/substructures
• Reaction simulation
• Conformation generation
• and more…
www.rdkit.org
“Demo” 1: finding the scaffold for a set of compounds
knime://EXAMPLES/99_Community/03_RDKit/06_Find_Scaffolds_And_Sidechains
“Demo” 2: library enumeration
knime://EXAMPLES/99_Community/03_RDKit/02_Reaction_Enumeration
“Demo” 2: library enumeration results
“Demo” 3: key compound from a patent
knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
Read structures from the
Tarceva patent
(exported from SureChEMBL)
Build network by connecting
similar molecules
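The network-building step can be sketched as follows. This is a hedged illustration only: fingerprints are modelled as sets of on-bit indices, the 0.5 threshold and the molecule names are hypothetical, and ranking nodes by number of connections is an assumption about how a key compound might be spotted rather than a description of what the example workflow does.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_network(fps, threshold):
    """Connect every pair of molecules whose similarity is at least `threshold`."""
    edges = []
    for (name_a, fp_a), (name_b, fp_b) in combinations(fps.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            edges.append((name_a, name_b, sim))
    return edges


def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for a, b, _ in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Toy fingerprints (hypothetical): mol-A resembles three others, mol-E resembles none.
fps = {
    "mol-A": {1, 2, 3, 4},
    "mol-B": {1, 2, 3, 5},
    "mol-C": {1, 2, 4, 6},
    "mol-D": {2, 3, 4, 7},
    "mol-E": {30, 31, 32},
}

edges = build_network(fps, threshold=0.5)
hub = max(degrees(edges).items(), key=lambda kv: kv[1])[0]
print([(a, b) for a, b, _ in edges])
# [('mol-A', 'mol-B'), ('mol-A', 'mol-C'), ('mol-A', 'mol-D')]
print(hub)  # mol-A
```

Molecules with no neighbor above the threshold, such as mol-E here, simply do not appear in the edge list.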
That’s Tarceva
Wrapping up
The workflows and data used in this presentation can all be
downloaded from the EXAMPLES folder in KNIME in the folder:
knime://EXAMPLES/50_Applications/32_Hitlist_Processing
KNIME Spring Summit 2018
March 5–9 at Hotel Berlin, Berlin, Germany
• Monday & Tuesday: One and two-day courses
– From Basics to Big Data and Text Processing as well as Advanced Analytics
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Registration at
www.KNIME.com

Open-source tools for querying and organizing large reaction databases
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 

Recently uploaded

extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
binhminhvu04
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
Michel Dumontier
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
yusufzako14
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
anitaento25
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 

Recently uploaded (20)

extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 

Processing malaria HTS results using KNIME: a tutorial

  • 1. © 2018 KNIME AG. All Rights Reserved. Processing malaria HTS results using KNIME: a tutorial 21 February, 2018 Greg Landrum, Ph.D. greg.landrum@knime.com
  • 2. Agenda
      • Very brief intro to KNIME
      • The HTS processing workflow
      • Q&A
      • Chemistry in KNIME with the RDKit
    The workflows and data used in this presentation can all be downloaded from the EXAMPLES folder in KNIME in the folder: knime://EXAMPLES/50_Applications/32_Hitlist_Processing
  • 3. KNIME, the company
      • KNIME AG founded in 2008
      • Offices in Zurich (HQ), Konstanz, Berlin, and Austin
      • 40+ employees
      • Maintainer of the Open Source KNIME Analytics Platform
          – comprehensive data loading, processing, analysis, modeling platform
          – visual frontend
          – open: to all sorts of data, other tools (R and Python, etc.), various user personas
          – 20+ open source releases since 2006
          – Free and open source.
      • KNIME Server
          – 14 commercial product releases since 2008
      • KNIME cloud offerings
  • 4. The KNIME® Analytics Platform
  • 5. Over 2000 native and embedded nodes included:
      • Data Access: MySQL, Oracle, SAS, SPSS, Excel, Flat, Hive, Impala, XML, JSON, PMML, Text, Doc, Image, Web Crawlers, Industry Specific, Community / 3rd party, ...
      • Transformation: Row, Column, Matrix, Text, Image, Networks, Time Series, Java, Python, Community / 3rd party, ...
      • Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
      • Visualization: R, Python, JFreeChart, JavaScript, Community / 3rd party, ...
      • Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
      • Big Data: Hive, Impala, HDFS, Vertica, Teradata/Aster, Spark, MLlib, Community / 3rd party, ...
  • 6. Free E-Learning Course: Web Page
      • Hands-on e-learning course
      • Data Access, ETL, Analytics, Control Structures, Visualization
      • Around 50 small units
      • … with exercises
      • … and with solutions on the EXAMPLES server
      • Final exercises to test your knowledge!
    https://www.knime.org/knime-introductory-course
  • 7. KNIME Products Overview
      • KNIME® Analytics Platform
          – Open Source Extensions
          – Community & Partner Extensions
          – Chem- & Bioinf, Data Providers, Signal Processing, ...
          – R & Python, Big Data, Deep Learning, Text Processing, Image Analysis, High Speed ML, ...
      • Deployment: to Applications, to Humans
      • Collaboration: Compliance, Best Practices, Sharing Expertise
      • Automation: Scheduling, (Model) Management
      • KNIME® Server: on Premise, in the Cloud
  • 8. KNIME Server
      • Shared Repositories
      • Access Management
      • Web Enablement
      • Flexible Execution
  • 9. Processing HTS Data with KNIME
  • 10. Background
      • The problem: Processing a hit list from a high-throughput phenotypic screen for malaria.
          – Clean up the hit list
          – Suggest compounds to be sent to a validation assay
      • Data source: 2014 Teach-Discover-Treat challenge http://www.tdtproject.org/challenge-1---malaria-hts.html
      • Additional info:
          – https://github.com/sriniker/TDT-tutorial-2014
          – Riniker et al. https://f1000research.com/articles/6-1136/v2
  • 11. Approach we’ll take: cleanup
      • Remove “ugly” molecules:
          – PAINS filters [1,2]: containing substructures that are likely to interfere with/have interfered with the assay.
          – “Rapid elimination of swill” (REOS) [3]: Too big, complicated or greasy.
      • Don’t want to apply these filters mindlessly, so we should always look at the results and allow manual rescue
    1. Baell, J. B. & Holloway, G. A. J. Med. Chem. 53, 2719–40 (2010).
    2. http://rdkit.blogspot.ch/2015/08/curating-pains-filters.html
    3. Walters, W. P. & Namchuk, M. Nat. Rev. Drug Discov. 2, 259–66 (2003).
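
The property-range part of the cleanup on slide 11, together with the manual rescue step, can be sketched in a few lines of Python. This is an illustration only. The property names, the numeric limits (which are not the published REOS cut-offs), the compound IDs, and the helper names `failed_rules` and `clean_hit_list` are all assumptions. The descriptors are treated as precomputed, and PAINS substructure matching is not shown.

```python
# Illustrative limits only; these are NOT the published REOS cut-offs.
LIMITS = {
    "mol_wt": (200.0, 500.0),   # "too big"
    "rot_bonds": (0, 8),        # "too complicated"
    "logp": (-5.0, 5.0),        # "too greasy"
}


def failed_rules(props, limits=LIMITS):
    """Return the names of the properties that fall outside their range."""
    return [name for name, (low, high) in limits.items()
            if not low <= props[name] <= high]


def clean_hit_list(hits, rescued=frozenset()):
    """Split hits into kept IDs and flagged (ID, reasons) pairs.

    A compound that fails a rule is still kept if its ID is in `rescued`,
    which models the manual rescue after looking at the results.
    """
    kept, flagged = [], []
    for cid, props in hits.items():
        reasons = failed_rules(props)
        if reasons and cid not in rescued:
            flagged.append((cid, reasons))
        else:
            kept.append(cid)
    return kept, flagged


# Hypothetical hit list with precomputed descriptors.
hits = {
    "cmpd-1": {"mol_wt": 320.4, "rot_bonds": 4, "logp": 2.1},
    "cmpd-2": {"mol_wt": 712.9, "rot_bonds": 14, "logp": 6.3},
    "cmpd-3": {"mol_wt": 510.2, "rot_bonds": 5, "logp": 3.0},
}

# cmpd-3 is slightly too heavy but was rescued by hand.
kept, flagged = clean_hit_list(hits, rescued={"cmpd-3"})
print(kept)
print(flagged)
```

Returning the reasons alongside each flagged compound is a deliberate choice: it keeps the filter reviewable, which is what makes a manual rescue step practical.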
  • 12. Approach we’ll take: selection for validation
      • We want good coverage of the chemical space of the HTS actives, but would ideally also like to learn something from the validation results
      • Approach:
          – Start with a diverse subset of the cleaned actives
          – Pick neighbors of each of these so that we have some SAR information in the results
    https://github.com/sriniker/TDT-tutorial-2014
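
The two-step selection on slide 12 (diverse subset first, then neighbors of each pick) can be sketched as follows. This is a minimal sketch under stated assumptions, not the workflow's actual method: it uses a simple MaxMin diversity picker, whereas the slides that follow refer to cluster centroids. Fingerprints are modelled as sets of on-bit indices, and the toy data and the function names `maxmin_pick` and `neighbors` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps, n_picks, first=0):
    """Greedy MaxMin: repeatedly pick the compound farthest from all picks."""
    picks = [first]
    # Distance from every compound to its nearest already-picked compound.
    min_dist = [1.0 - tanimoto(fps[first], fp) for fp in fps]
    while len(picks) < min(n_picks, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picks]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picks.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fps[nxt], fp))
    return picks


def neighbors(fps, center, k, exclude=()):
    """Return the k compounds most similar to `center`, skipping `exclude`."""
    others = [i for i in range(len(fps)) if i != center and i not in exclude]
    others.sort(key=lambda i: tanimoto(fps[center], fps[i]), reverse=True)
    return others[:k]


# Hypothetical fingerprints for five cleaned actives.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12, 13}),
    frozenset({10, 11, 12, 14}),
    frozenset({1, 2, 6, 7}),
]

picks = maxmin_pick(fps, n_picks=2)
selection = {c: neighbors(fps, c, k=1, exclude=picks) for c in picks}
print(picks)
print(selection)
```

Each diverse pick then carries at least one close analogue into the validation assay, which is what gives the results some SAR content.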
  • 13. Selection example: some cluster centroids
  • 14. Selection example: the picks (Cluster 1, Cluster 2)
  • 15. Cleanup workflow (part 1)
  • 16. Cleanup workflow (part 1)
  • 17. Cleanup workflow (part 1)
  • 18. Cleanup workflow (part 1)
  • 19. Cleanup workflow (part 1)
  • 20. Cleanup workflow (part 1)
  • 21. Cleanup workflow (part 2)
  • 22. Cleanup workflow (part 2)
  • 23. The output
  • 24. Selection workflow
  • 25. Selection workflow
  • 26. Selection workflow
  • 27. Selection workflow
  • 28. Selection workflow
  • 29. The output
  • 30. The workflows
      • Download (with data) from the EXAMPLES folder in KNIME: knime://EXAMPLES/50_Applications/32_Hitlist_Processing …
  • 31. Brief intro to the RDKit
  • 32. The RDKit: An open-source toolkit for cheminformatics (www.rdkit.org)
      • Business-friendly BSD license
      • Runs on Linux/Mac/Windows
      • Commercial support available
      • Releases every six months
      • Active and engaged community
      • Core data structures and algorithms in C++
      • Usable from Python (2 or 3), C#, or Java
      • Strong integration with other tools like KNIME, Jupyter, Pandas, and PostgreSQL
      • Pretty good documentation
      • Basic functionality highlights:
          – Chemical reactions
          – 2D depiction
          – Substructure searching
          – Canonical SMILES
          – Gasteiger-Marsili charges
          – Molecular standardization
      • 2D Functionality highlights:
          – RECAP and BRICS support
          – Multi-molecule MCS
          – Similarity maps
          – Functional group filters
          – Diversity picking
      • Supported fingerprint highlights:
          – Morgan/Feature Morgan (ECFP/FCFP-like)
          – RDKit (Daylight-like)
          – Atom-pairs and topological torsions
          – MACCS keys
          – Avalon
      • Descriptor highlights:
          – Hall-Kier χ and κ descriptors
          – SLogP, SMR, TPSA
          – MQN
          – “MOE-like” VSA
          – Compositional (number of donors, number of rings, number of heterocycles, etc.)
      • 3D Functionality highlights:
          – 2D->3D conversion/conformational analysis via distance geometry
          – UFF and MMFF94/MMFF94S implementations for cleaning up structures
          – Feature maps and feature-map vectors
          – Shape-based similarity
          – RMSD-based molecule-molecule alignment
          – Open3DAlign implementation
          – Integration with PyMOL
          – Torsion Fingerprint Differences
  • 33. The RDKit code ecosystem
      • C++: Core data structures and algorithms
      • PostgreSQL, Boost.Python, SWIG, Python, Java, C#, Jupyter, Pandas, KNIME
      • The exact same implementation is available in all endpoints
  • 34. The RDKit and KNIME (www.rdkit.org)
      • Open-source wrappers for KNIME maintained by NIBR and the open-source community
      • Useful for:
          – Descriptor calculation
          – Cleaning structures
          – Canonical SMILES and InChI conversion
          – Fingerprints
          – Scaffolds/substructures
          – Reaction simulation
          – Conformation generation
          – and more…
  • 35. “Demo” 1: finding the scaffold for a set of compounds knime://EXAMPLES/99_Community/03_RDKit/06_Find_Scaffolds_And_Sidechains
  • 36. “Demo” 1: finding the scaffold for a set of compounds
  • 37. “Demo” 2: library enumeration knime://EXAMPLES/99_Community/03_RDKit/02_Reaction_Enumeration
  • 38. “Demo” 2: library enumeration knime://EXAMPLES/99_Community/03_RDKit/02_Reaction_Enumeration
  • 39. “Demo” 2: library enumeration results knime://EXAMPLES/99_Community/03_RDKit/02_Reaction_Enumeration
  • 40. “Demo” 3: key compound from a patent knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
  • 41. “Demo” 3: key compound from a patent: Read structures from the Tarceva patent (exported from SureChEMBL) knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
  • 42. “Demo” 3: key compound from a patent knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
  • 43. “Demo” 3: key compound from a patent: Build network by connecting similar molecules knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
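
The idea on slide 43 of building a network by connecting similar molecules can be sketched as a thresholded similarity graph. This is a rough sketch, not the demo workflow itself: the 0.5 threshold, the set-of-on-bits fingerprints, the compound names, and the function name `similarity_network` are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_network(fps, threshold):
    """Connect every pair of molecules whose similarity reaches the threshold.

    Returns an adjacency mapping: molecule name -> set of neighbor names.
    """
    edges = {name: set() for name in fps}
    for a, b in combinations(fps, 2):
        if tanimoto(fps[a], fps[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


# Hypothetical fingerprints for four structures from a patent.
fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 3, 6}),
    "D": frozenset({20, 21}),
}

edges = similarity_network(fps, threshold=0.5)
for name, nbrs in edges.items():
    print(name, sorted(nbrs))
```

The pairwise loop is quadratic in the number of molecules, which is usually acceptable for the structures of a single patent.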
  • 44. “Demo” 3: key compound from a patent knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
  • 45. “Demo” 3: key compound from a patent: That’s Tarceva knime://EXAMPLES/50_Applications/29_Patent_Network_Analysis/Tarceva_neighbor_network_-_From_SureChEMBL
  • 46. Wrapping up
    The workflows and data used in this presentation can all be downloaded from the EXAMPLES folder in KNIME in the folder: knime://EXAMPLES/50_Applications/32_Hitlist_Processing
  • 47. KNIME Spring Summit 2018
    March 5 – 9 at Hotel Berlin, Berlin in Germany
      • Monday & Tuesday: One and two-day courses
          – From Basics to Big Data and Text Processing as well as Advanced Analytics
      • Wednesday & Thursday: Summit sessions
      • Friday: Workshops
    Registration at www.KNIME.com