This document provides an overview and installation instructions for machine learning basics using various tools and libraries. It discusses installing and setting up Orange, KNIME, Anaconda, and related Python libraries. Key steps include downloading installers, setting paths, defining workspaces, installing extensions, and creating workflows in Orange and KNIME. Popular cheminformatics and deep learning libraries supported include RDKit, DeepChem, numpy, and scikit-learn.
12. Install Extensions
Extensions
Add - link or archives
Copy jar files to “dropins”
Install from link
Experimental Partners
Trusted Partners
Version specific
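The "copy jar files to dropins" step above can also be scripted. The following is a minimal sketch, assuming the KNIME installation directory contains a "dropins" folder; the helper name copy_jars_to_dropins and both path arguments are hypothetical and not part of KNIME. KNIME typically needs a restart before it picks up newly copied files.

```python
import shutil
from pathlib import Path


def copy_jars_to_dropins(jar_dir, knime_home):
    """Copy every .jar file in jar_dir into <knime_home>/dropins.

    Hypothetical helper for illustration: both paths are supplied by
    the caller. Returns the names of the copied files.
    """
    dropins = Path(knime_home) / "dropins"
    dropins.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    copied = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        shutil.copy2(jar, dropins / jar.name)  # copy2 keeps file timestamps
        copied.append(jar.name)
    return copied
```

Only files ending in .jar are copied; anything else in the source folder is left alone.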
13. Create a Workflow
KNIME workflows are stored within the workspace
Workflow groups can be created under the workspace
Workflows can be imported into the workspace
The public KNIME server and KNIME Hub have example workflows for several extensions
14. KNIME Interface
Owning a KNIME Hub account (free) is recommended
The workflow editor has a notes option
External binaries can be integrated
Python scripts can be integrated via conda
21. Selected Libraries
DeepChem
Open-source toolchain for deep learning in drug discovery, quantum
chemistry, and the life sciences
RDKit
General-purpose machine learning and cheminformatics software written in C++
and Python. Its functionality includes reading and writing molecules,
substructure searching, chemical transformations, and molecular similarity.
CACTVS
Cactvs is a universal, scriptable cheminformatics toolkit, with a large collection of
modules for property computation, chemistry data file I/O and other tasks
Cinfony
API to several cheminformatics toolkits (Open Babel, RDKit, the CDK, Indigo,
JChem, OPSIN, and cheminformatics web services)
MDAnalysis
An object-oriented library to analyze trajectories from molecular dynamics (MD)
simulations in many popular formats
MDTraj
Package for manipulating molecular dynamics trajectories with support for
multiple formats
nglView
A Jupyter widget to interactively view molecular structures and trajectories.
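As an illustration of the RDKit functionality listed above (reading and writing molecules, substructure searching, molecular similarity), here is a minimal sketch. It assumes RDKit is installed in the active conda environment; the two molecules (aspirin and salicylic acid), the SMARTS pattern, and the fingerprint settings are choices made for this example, not part of the original slides.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Read two molecules from SMILES strings (example inputs).
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
salicylic_acid = Chem.MolFromSmiles("O=C(O)c1ccccc1O")

# Write a molecule back out as a canonical SMILES string.
canonical_smiles = Chem.MolToSmiles(aspirin)

# Substructure search: does aspirin contain a carboxylic acid group?
carboxylic_acid = Chem.MolFromSmarts("C(=O)[OH]")
has_acid = aspirin.HasSubstructMatch(carboxylic_acid)

# Molecular similarity: Tanimoto coefficient on Morgan fingerprints.
# Radius and fingerprint size are illustrative settings.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp_aspirin = generator.GetFingerprint(aspirin)
fp_salicylic = generator.GetFingerprint(salicylic_acid)
similarity = DataStructs.TanimotoSimilarity(fp_aspirin, fp_salicylic)

print(canonical_smiles, has_acid, round(similarity, 3))
```

The similarity value depends on the fingerprint type and settings, so it should be read as a relative measure rather than an absolute one.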
22. Selected Libraries
numpy
A library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays
matplotlib
A plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding
plots into applications using general-purpose GUI toolkits
scikit-learn
A free software machine learning library for the Python programming language. It
features various classification, regression and clustering algorithms including
support vector machines
GromacsWrapper
A Python package (Python 2.7.x and Python > 3.4) that wraps system calls to
Gromacs tools into thin classes. This allows for fairly seamless integration of the
Gromacs tools into Python scripts.
Seaborn
A Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics
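To show how two of these libraries fit together, the sketch below uses a NumPy array as input to a scikit-learn support vector machine. It assumes numpy and scikit-learn are installed; the toy data set, the linear kernel, and the variable names are made up for this example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (invented for illustration): two well-separated 2-D clusters.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.1, 1.8], [1.9, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# NumPy: column-wise mean, then centering via broadcasting.
col_means = X.mean(axis=0)
X_centered = X - col_means

# scikit-learn: fit a support vector classifier on the centered data.
clf = SVC(kernel="linear")
clf.fit(X_centered, y)

# Classify two new points, centered with the same column means.
new_points = np.array([[0.1, 0.1], [2.0, 2.1]])
pred = clf.predict(new_points - col_means)
print(pred)
```

New points are centered with the means computed from the training data, so that training and prediction use the same coordinate frame.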