Bio-inspired Computing and its Application in Cheminformatics

By
Abdelazim Galal Hussien
Demonstrator, Faculty of Science, Fayoum University
Supervised by
Professor Mohamed Amin
Faculty of Science
Menofiya University
Professor Aboul Ella Hassanien
Faculty of Computers and Information
Cairo University

Agenda
Cheminformatics
• Introduction.
• Representation.
• Molecular descriptors.
Bio-inspired Computing
• Problems
• Algorithms
• Ant Colony Optimization
Bio-inspired Computing and Cheminformatics
• Classification
• Clustering
• Feature Selection
Application
• Drug Discovery
• Drug Design

Cheminformatics
Chemoinformatics is concerned with the
application of computational methods to
tackle chemical problems, with particular
emphasis on the manipulation of chemical
structural information.
The term was introduced in the late 1990s, and there is not even universal agreement on its correct spelling:
• Cheminformatics
• Chemical informatics
• Chemiinformatics
• Chemoinformatics

Cheminformatics
• Cheminformatics is the application of computer and informatics techniques to a range of problems in the field of chemistry.
• Cheminformatics strategies are useful in drug discovery and other efforts where large numbers of compounds are being evaluated for specific properties.
• Cheminformatics is also described as a multidisciplinary science, as it combines chemistry, biology, mathematics, biochemistry, statistics and informatics.

Problems in Cheminformatics
• Storing data generated through experiments or from molecular simulation.
• Retrieval of chemical structures from chemical databases (software libraries).
• Prediction of physical, chemical and biological properties of chemical compounds.
• Elucidation of the structure of a compound based on spectroscopic data.
• Structure, substructure, similarity and diversity searching in chemical databases.
• Docking: predicting the interaction between two molecules, for example two macromolecules or a small molecule and a macromolecule.
• Drug discovery.
• Molecular science, materials science, food science (nutraceuticals), atmospheric chemistry, polymer chemistry, the textile industry and combinatorial organic synthesis (COS).

Representation of Chemical Structures
• Chemical structures are usually stored in a computer as molecular graphs. Graph theory is a well-established area of mathematics that has found application not just in chemistry but in many other areas, such as computer science.
• nodes = atoms
• edges = bonds
• The nodes and edges may have properties associated with them, such as the element type of an atom or the order of a bond.
• Two common computer representations of the molecular graph are the connection table and the SMILES string.

Connection Table
The simplest type of connection table consists of two sections:
A) List of the atomic numbers of the atoms in the molecule
B) List of the bonds, specified as pairs of bonded atoms.
Hydrogen atoms may be left implicit, in which case the connection table is said to be hydrogen-suppressed.
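As a rough illustration, the Python sketch below stores a hydrogen-suppressed connection table for ethanol and recovers the implied hydrogens from standard valences. It assumes a third column holding the bond order (the slide lists only bonded pairs), neutral atoms, and a small hypothetical `STANDARD_VALENCE` table; charged atoms or unusual valences would need more care.

```python
# Hypothetical table of standard valences, keyed by atomic number (neutral atoms only)
STANDARD_VALENCE = {6: 4, 7: 3, 8: 2, 9: 1, 17: 1}

def implicit_hydrogens(atoms, bonds):
    """Return the number of implied H atoms on each heavy atom.

    atoms: list of atomic numbers (section A of the table)
    bonds: list of (atom_i, atom_j, bond_order) with 1-based atom indices (section B,
           plus an assumed bond-order column)
    """
    used = [0] * len(atoms)  # valence already consumed by explicit bonds
    for i, j, order in bonds:
        used[i - 1] += order
        used[j - 1] += order
    return [STANDARD_VALENCE[z] - u for z, u in zip(atoms, used)]

# Hydrogen-suppressed connection table for ethanol, CH3-CH2-OH
atoms = [6, 6, 8]
bonds = [(1, 2, 1), (2, 3, 1)]
print(implicit_hydrogens(atoms, bonds))  # [3, 2, 1]
```

The three implied counts add up to the six hydrogens of C2H6O, which is why hydrogen suppression loses no information for ordinary neutral organic molecules.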

SMILES
• SMILES stands for Simplified Molecular
Input Line Entry Specification.
• In SMILES, atoms are represented by
their atomic symbol.
• Upper case symbols are used for
aliphatic atoms and lower case for
aromatic atoms.
• Double bonds are written using “=” and triple bonds using “#”; single bonds between adjacent atoms are implied (ethanol, for example, is written CCO).
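A minimal sketch of how a simple SMILES string maps onto a connection table. It assumes only organic-subset atoms, the bond symbols "-", "=" and "#", branches in parentheses and single-digit ring closures; bracket atoms, charges, stereochemistry and aromaticity perception are deliberately left out. `parse_smiles` is a hypothetical helper, not a library API; real work would normally use a toolkit such as RDKit.

```python
import re

# Organic-subset atoms, bond symbols, branch parentheses and single-digit ring closures.
# Bracket atoms ([NH4+]), stereo marks and '%nn' ring labels are NOT handled.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Turn a simple SMILES string into (atoms, bonds), bonds as (i, j, order)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # findall silently skips characters it cannot match
        raise ValueError(f"unsupported characters in {smiles!r}")
    atoms, bonds = [], []
    prev = None     # atom the next bond attaches to
    branches = []   # saved attachment points for each open '('
    rings = {}      # ring-closure digit -> (atom index, bond order written at the opening)
    order = 1       # pending bond order (single unless '=' or '#' was seen)
    for tok in tokens:
        if tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branches.append(prev)
        elif tok == ")":
            prev = branches.pop()
        elif tok.isdigit():
            if tok in rings:  # closing digit: bond back to the atom that opened the ring
                start, open_order = rings.pop(tok)
                bonds.append((start, prev, order if order != 1 else open_order))
            else:
                rings[tok] = (prev, order)
            order = 1
        else:  # an atom symbol; lower case marks an aromatic atom
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
    return atoms, bonds

print(parse_smiles("CC(=O)O"))  # acetic acid
# (['C', 'C', 'O', 'O'], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
```

The output is exactly the two sections of a hydrogen-suppressed connection table: an atom list and a bond list.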

Morgan algorithm
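• There may be many different ways to construct the connection table or the SMILES string for a given molecule, so a canonical (unique) numbering of the atoms is needed to obtain a single representation; the Morgan algorithm provides one.
• In the first iteration, each atom is assigned a connectivity value equal to the number of connected atoms. In the second and subsequent iterations, a new connectivity value is calculated as the sum of the connectivity values of the neighboring atoms, until the number of distinct values stops increasing.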
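The iteration described above can be sketched as follows, assuming the molecule is given as a hydrogen-suppressed adjacency list. Following the usual description of the Morgan algorithm, the loop stops when the number of distinct connectivity values no longer increases, and the values from the previous iteration are kept. Only this extended-connectivity step is shown; the subsequent canonical numbering of the atoms is omitted.

```python
def morgan_connectivity(adjacency):
    """Extended connectivity values of the Morgan algorithm.

    adjacency: dict mapping each heavy atom to a list of its bonded neighbors.
    """
    # First iteration: connectivity = number of connected atoms
    ec = {a: len(nbrs) for a, nbrs in adjacency.items()}
    n_classes = len(set(ec.values()))
    while True:
        # Next iteration: sum of the neighbors' current values
        new = {a: sum(ec[n] for n in adjacency[a]) for a in adjacency}
        new_classes = len(set(new.values()))
        if new_classes <= n_classes:  # no further discrimination: keep previous values
            return ec
        ec, n_classes = new, new_classes

# n-pentane carbon skeleton C0-C1-C2-C3-C4
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(morgan_connectivity(pentane))  # {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
```

The loop always terminates, because it only continues while the number of distinct values strictly increases, and that number cannot exceed the number of atoms.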

Screening Methods
• Molecule screens are often implemented using binary string representations, called bitstrings, of the molecules and of the query substructure. Bitstrings consist of a sequence of “0”s and “1”s. They are the “natural currency” of computers and so can be compared and manipulated very rapidly, especially if held in the computer’s memory. A “1” in a bitstring usually indicates the presence of a particular structural feature and a “0” its absence.
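To make the screening idea concrete, here is a sketch that uses Python integers as bitstrings. The fragment dictionary `FEATURES` is hypothetical. A molecule can only contain the query if every bit set in the query is also set in the molecule, which a single AND operation checks. Passing the screen is necessary but not sufficient, so surviving molecules still go on to the full substructure (subgraph isomorphism) test.

```python
# Hypothetical fragment dictionary: bit k is set when fragment FEATURES[k] is present
FEATURES = ["hydroxyl", "carbonyl", "benzene_ring", "halogen", "amine"]

def to_bitstring(features):
    """Encode a set of fragment names as an integer bitstring."""
    bits = 0
    for name in features:
        bits |= 1 << FEATURES.index(name)
    return bits

def passes_screen(query_bits, molecule_bits):
    """True if every bit set in the query is also set in the molecule."""
    return (query_bits & molecule_bits) == query_bits

database = {
    "ethanol": to_bitstring(["hydroxyl"]),
    "phenol": to_bitstring(["hydroxyl", "benzene_ring"]),
    "chlorobenzene": to_bitstring(["benzene_ring", "halogen"]),
    "acetone": to_bitstring(["carbonyl"]),
}
query = to_bitstring(["hydroxyl", "benzene_ring"])
print(format(query, "05b"))  # 00101
candidates = [name for name, bits in database.items() if passes_screen(query, bits)]
print(candidates)  # ['phenol']
```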

Structure Searching
• Graph theoretic methods can be used to perform substructure searching,
which is equivalent to determining whether one graph is entirely
contained within another, a problem known as subgraph isomorphism.
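The sketch below is a plain backtracking search, written for clarity rather than speed (practical systems use refined algorithms such as Ullmann's or VF2). It treats substructure search as mapping every query atom to a distinct target atom with the same element label, such that every query bond lands on a target bond; extra target bonds are allowed. Ignoring bond orders is a simplifying assumption of this sketch.

```python
def is_substructure(q_labels, q_edges, t_labels, t_edges):
    """Backtracking test: can the query graph be embedded in the target graph?

    labels: list of element symbols per atom; edges: list of (i, j) bonds.
    """
    def adjacency(n, edges):
        adj = [set() for _ in range(n)]
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    q_adj = adjacency(len(q_labels), q_edges)
    t_adj = adjacency(len(t_labels), t_edges)
    mapping = {}  # query atom -> target atom

    def extend(q):
        if q == len(q_labels):  # every query atom has been mapped
            return True
        for t in range(len(t_labels)):
            if t in mapping.values() or t_labels[t] != q_labels[q]:
                continue
            # every already-mapped neighbor of q must map onto a neighbor of t
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # undo and try the next target atom
        return False

    return extend(0)

# Is the fragment C-O present in ethanol (C-C-O)? In propane (C-C-C)?
print(is_substructure(["C", "O"], [(0, 1)], ["C", "C", "O"], [(0, 1), (1, 2)]))  # True
print(is_substructure(["C", "O"], [(0, 1)], ["C", "C", "C"], [(0, 1), (1, 2)]))  # False
```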

Molecular Descriptors
• The manipulation and analysis of chemical structural information is made possible through
the use of molecular descriptors.
• These are numerical values that characterise properties of molecules.
• The molecular descriptor is the final result of a logic and mathematical procedure which
transforms chemical information encoded within a symbolic representation of a molecule
into a useful number or the result of some standardized experiment.
• The descriptors fall into four classes:
– Topological
– Geometrical
– Electronic
– Hybrid or 3D descriptors
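As one concrete topological descriptor, the Wiener index (the sum of shortest-path distances, counted in bonds, between all pairs of heavy atoms) can be computed with a breadth-first search over the molecular graph. It is chosen here only as an illustration, and the sketch assumes a connected, hydrogen-suppressed graph given as an adjacency list.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of topological (bond-count) distances over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest paths in an unweighted graph
            a = queue.popleft()
            for b in adjacency[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        total += sum(dist.values())
    return total // 2  # each pair was counted once from each end

n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The two butane isomers share the formula C4H10 but receive different values, which is the kind of structural information a topological descriptor is meant to capture.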

Computational Models
• Most molecular discoveries today are the result of an iterative, three-phase cycle of design, synthesis and test. Analysis of the results from one iteration provides information and knowledge that enables the next cycle to be initiated and further improvements to be achieved.
• A common feature of this analysis stage is the construction of some form of model which enables the observed activity or properties to be related to the molecular structure.
• Examples:
– Quantitative Structure-Activity Relationships (QSARs)
– Quantitative Structure-Property Relationships (QSPRs)

Quantitative Structure-Activity Relationships
(QSARs)
QSAR is a mathematical relationship between the biological activity of a molecular system and its geometric and chemical characteristics.
A general formula for a quantitative structure-activity relationship (QSAR) can be given by the following:
activity = f(molecular or fragmental properties)
QSAR attempts to find a consistent relationship between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.
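In its simplest form, f is a straight line fitted by least squares to a single descriptor. The sketch below fits activity = a · descriptor + b. The training values are made up purely to show the mechanics (they lie exactly on a line); real QSAR models normally use several descriptors and proper validation.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = a * descriptor + b."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

# Toy, made-up training set (not measured data): one descriptor value per compound
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.2, 2.7, 3.2, 3.7]
a, b = fit_linear_qsar(x_train, y_train)
print(round(a, 3), round(b, 3))  # 0.5 1.7
print(round(a * 5.0 + b, 3))     # predicted activity of a new compound: 4.2
```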

Quantitative Structure-Activity Relationships (QSARs) (Cont.)
Figure: compounds + biological activity → QSAR → new compounds with improved biological activity

Agenda
Cheminformatics
• Introduction.
• Representation.
Bio-inspired Computing
• Problems
• Algorithms
• Ant Colony Optimization
Bio-inspired Computing and Cheminformatics
• Classification
• Clustering
Application
• Drug Discovery
• Drug Design
Thesis statement
• What I aim to achieve

Bio-Inspired Computing
Finding the best solution becomes increasingly difficult, if not impossible, because of the very large and dynamic space of solutions and the complexity of the computations. Often, the optimal solution of such an NP-hard problem is a point in an n-dimensional hyperspace, and identifying it is computationally very expensive or even infeasible in limited time.

Bio-Inspired Computing
• Bio-inspired computing is a field of study based on the social behavior of animals, insects and other living organisms; it also includes connectionism and emergence.
• Bio-inspired computing uses computers to model nature and, conversely, uses the study of nature to improve the usage of computers.
Figure: Biological computation, Artificial Intelligence and Bio-inspired computing (diagram)

Motivation
• Dealing with very complex problems:
– that cannot be solved with human-proposed solutions;
– for which no complete mathematical model exists.
• Similar problems exist in nature and are handled through:
– Adaptation
– Self-organization
– Communication
– Optimization

Bio-inspired Computing Methods
Some areas of bio-inspired computing are:
• neural networks
• genetic algorithm
• particle swarm
• ant colony optimization
• artificial bee colony
• bacterial foraging
• cuckoo search
• Firefly
• leaping frog
• bat algorithm
• flower pollination
• artificial plant optimization

Swarm Intelligence
• SI-based algorithms belong to a wider class of algorithms called bio-inspired algorithms.
• In other words, SI-based ⊂ bio-inspired ⊂ nature-inspired.

Swarm Intelligence
• Population of simple agents
• Decentralized
• Self-Organized
• No or local communication
• Examples:
– Ant and bee colonies
– Bird flocking
– Fish schooling

Ant Colony Optimization
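• ACO mimics the foraging behavior of social ants.
• Ants primarily use pheromone as a chemical messenger.
• Pheromone concentration can be considered an indicator of the quality of solutions to the problem of interest.
• The movement of an ant is controlled by pheromone, which evaporates over time.
• The probability that an ant at node $i$ chooses the route from node $i$ to node $j$ is given by

$$p_{ij} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N_i} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}}$$

where $\tau_{ij}$ is the pheromone concentration on the route from $i$ to $j$, $\eta_{ij}$ is its heuristic desirability (typically $1/d_{ij}$ for a route of length $d_{ij}$), $\alpha, \beta > 0$ control the relative influence of pheromone and desirability, and $N_i$ is the set of nodes the ant may still visit from $i$.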
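A compact Ant System sketch for a small travelling salesman instance shows pheromone-driven route choice and evaporation working together. The parameter names (`alpha`, `beta`, `rho`, `q`), the choice η_ij = 1/d_ij, and the update τ_ij ← (1 − ρ)·τ_ij + Σ_k q/L_k (L_k is the length of ant k's tour) are textbook Ant System conventions assumed here, not details given on the slides.

```python
import math
import random

def transition_probs(i, unvisited, tau, eta, alpha, beta):
    """p_ij = tau_ij^alpha * eta_ij^beta / sum over allowed l of tau_il^alpha * eta_il^beta."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def aco_tsp(dist, n_ants=10, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # uniform initial pheromone
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]  # each ant starts at a random city
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                probs = transition_probs(tour[-1], unvisited, tau, eta, alpha, beta)
                nodes = list(probs)
                nxt = rng.choices(nodes, weights=[probs[j] for j in nodes])[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        for i in range(n):  # evaporation: tau <- (1 - rho) * tau
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:  # each ant deposits q / L_k on every edge it used
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Four cities on the corners of a unit square; the shortest tour is the perimeter
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
tour, length = aco_tsp(dist)
print(tour, length)
```

Shorter tours deposit more pheromone per edge (q/L_k), while evaporation steadily forgets old choices; this balance is what gradually concentrates the colony on good routes.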

Agenda
Cheminformatics
• Introduction.
• Representation.
Bio-inspired Computing
• Cheminformatics
• Molecular Descriptors
• Similarity
Bio-inspired Computing and Cheminformatics
• Classification
• Clustering
Application
• Drug Discovery
• Drug Design
Thesis statement
• What I aim to achieve

Bio-inspired Computing in Cheminformatics
Bio-inspired computing has many applications in the field of cheminformatics:
• Classification: a general process related to categorization, in which molecules are differentiated and understood by assigning them to categories.
• Clustering: the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).
• Feature selection: a process that chooses an optimal subset of features according to a certain criterion.

Classification
• In machine learning and statistics, classification is the problem of
identifying to which of a set of categories (sub-populations) a
new observation belongs, on the basis of a training set of data containing
observations (or instances) whose category membership is known.
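A minimal illustration of classification on molecular descriptor vectors is k-nearest-neighbor voting, sketched below. The two-dimensional descriptor values and the "active"/"inactive" labels are hypothetical, and k-NN is used only because it is short; it is not a bio-inspired method, nor one the slides prescribe.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict a class for `query` by majority vote among its k nearest training molecules.

    train: list of (descriptor_vector, label) pairs with known category membership
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (descriptor vector, known class)
train = [
    ((0.1, 1.0), "inactive"), ((0.2, 1.1), "inactive"), ((0.0, 0.9), "inactive"),
    ((2.0, 3.0), "active"), ((2.1, 2.9), "active"), ((1.9, 3.2), "active"),
]
print(knn_classify(train, (2.0, 3.1)))  # active
```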

Clustering
• Clustering is the process of partitioning
a usually large dataset into groups (or
clusters), according to a similarity (or
dissimilarity) measure.
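• If we assume that we have a dataset $X = \{x_1, x_2, \ldots, x_N\}$, which consists of all the data that we want to place into clusters, then we define a clustering of $X$ into $m$ clusters $C_1, \ldots, C_m$ in such a way that the following conditions apply:

$$C_i \neq \emptyset, \quad i = 1, \ldots, m$$
$$\bigcup_{i=1}^{m} C_i = X$$
$$C_i \cap C_j = \emptyset, \quad i \neq j, \; i, j = 1, \ldots, m$$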
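The usual hard-clustering conditions (every cluster is non-empty, the clusters together cover X, and no point belongs to two clusters) can be checked directly on the output of a clustering routine. The sketch below uses Lloyd's k-means, a standard algorithm substituted here for brevity (the thesis aim names spectral clustering instead), on hypothetical 2-D descriptor vectors, and then checks those conditions.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns k clusters, each a list of point indices."""
    centroids = [points[i] for i in range(k)]  # simple deterministic start: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [points[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return [[i for i, a in enumerate(assignment) if a == c] for c in range(k)]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = kmeans(points, 2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]

# Conditions: non-empty clusters whose union is X and which are pairwise disjoint
flat = sorted(i for c in clusters for i in c)
assert all(clusters) and flat == list(range(len(points)))
```

Because k-means can leave a cluster empty on unlucky starts, the first condition is not guaranteed by the algorithm itself, which is why checking it explicitly is worthwhile.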

Feature Selection
• Why do we need FS?
– To improve performance (in terms of speed, predictive power and simplicity of the model).
– To visualize the data for model selection.
– To reduce dimensionality and remove noise.
• Perspectives:
– Searching for the best subset of features.
– Criteria for evaluating different subsets.
– The principle for selecting, adding, removing or changing features during the search.
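Sequential forward selection is one simple way to combine the three perspectives: the search grows the subset one feature at a time, a criterion scores each candidate subset, and the principle is to add the feature with the largest gain and stop when nothing improves. The criterion and descriptor names below are hypothetical stand-ins; in practice the criterion might be the cross-validated accuracy of a QSAR model, and a bio-inspired search such as a genetic algorithm or ACO could replace the greedy loop.

```python
def forward_selection(features, criterion):
    """Greedy sequential forward selection.

    Search: grow the subset one feature at a time.
    Criterion: criterion(subset) returns a score to maximize.
    Principle: add the feature with the largest gain; stop when no feature improves the score.
    """
    selected = []
    best_score = criterion(selected)
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            return selected
        scores = {f: criterion(selected + [f]) for f in candidates}
        best_f = max(scores, key=scores.get)
        if scores[best_f] <= best_score:  # no improvement: stop searching
            return selected
        selected.append(best_f)
        best_score = scores[best_f]

# Hypothetical criterion: made-up per-descriptor usefulness minus a size penalty
usefulness = {"logP": 0.5, "MW": 0.3, "TPSA": -0.2, "HBD": 0.1}

def toy_criterion(subset):
    return sum(usefulness[f] for f in subset) - 0.15 * len(subset)

print(forward_selection(list(usefulness), toy_criterion))  # ['logP', 'MW']
```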

Application (I) Drug Design
• Drug design, often referred
to as rational drug design or
simply rational design, is the
inventive process of finding
new medications based on
the knowledge of a biological
target.
• The drug is most commonly
an organic small molecule
that activates or inhibits the
function of a biomolecule
such as a protein.

Application (II) Drug Discovery

Cheminformatics and Bioinformatics
in Drug Design

Literature Review
• Joerg Kurt Wegner, Aaron Sterling, Rajarshi Guha and Andreas Bender, in their survey “Cheminformatics”, give a comprehensive introduction to the field of cheminformatics. Roberto Todeschini and Viviana Consonni, in their book Molecular Descriptors for Chemoinformatics, compile a huge number of descriptors; all new descriptors, QSAR approaches and chemometric strategies proposed since 2000 are included in this handbook.
• Aboul Ella Hassanien and Eid Emary introduce “Swarm Intelligence Methods and Concepts”.

Literature Review
• Gerald M. Maggiora and Veerabahu Shanmugasundaram, in “Molecular Similarity Measures”, survey methods for computing the similarity between two graphs and address the maximum subgraph matching problem.
• Arpan Kumar Kar presents a review of bio-inspired computing algorithms and their scope of applications.

Thesis Statement
Title:
Bio-inspired Computing and its Application in Cheminformatics
Aim:
• Try to cluster molecules using spectral clustering.
• Try to measure the similarity between molecules.
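For the similarity aim, a widely used measure for binary fingerprints is the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. The sketch below represents each fingerprint as the set of its "on" bit positions; the example bit positions are arbitrary, and treating two empty fingerprints as identical is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)  # |A & B| / |A | B|

fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 / (4 + 3 - 2) = 0.4
```

The value ranges from 0 (no bits in common) to 1 (identical fingerprints), so a similarity search can simply rank database molecules by their coefficient against the query.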

References
1. Andrew R. Leach and Valerie J. Gillet, “An Introduction to Chemoinformatics”, Springer, 2007.
2. Roberto Todeschini and Viviana Consonni, “Molecular Descriptors for Chemoinformatics”, Wiley-VCH, May 2009.
3. Christina Chrysouli and Anastasios Tefas, “Spectral clustering and semi-supervised learning using evolving similarity graphs”, Applied Soft Computing.
4. U. von Luxburg, “A tutorial on spectral clustering”, Statistics and Computing 17(4) (2007) 395–416.
5. R. Dutt and A. K. Madan, “Predicting biological activity: Computational approach using novel distance based molecular descriptors”, Computers in Biology and Medicine, 2012.
6. X. S. Yang, Z. Cui, R. Xiao, A. H. Gandomi and M. Karamanoglu (eds.), “Swarm Intelligence and Bio-inspired Computation: Theory and Applications”, Newnes, 2013.
7. Arpan Kumar Kar, “Bio inspired computing – A review of algorithms and scope of applications”, Expert Systems with Applications 59 (2016) 20–32.
8. Frank Emmert-Streib, Matthias Dehmer and Yongtang Shi, “Fifty years of graph matching, network alignment and network comparison”, Information Sciences 346–347 (2016) 180–197.
9. Abiola Oduguwa, Ashutosh Tiwari, Rajkumar Roy and Conrad Bessant, “An overview of soft computing techniques used in the drug discovery process”, in Applied Soft Computing Technologies: The Challenge of Complexity, pp. 465–480, Springer, Berlin, Heidelberg, 2006.
10. G. M. Maggiora and V. Shanmugasundaram, “Molecular similarity measures”, in Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery, pp. 1–50, 2004.
