Disintegration of the small world property with increasing diversity of chemical libraries

Network Measures
• Molecules are represented as nodes, and a similarity or dissimilarity measure defines the connections between them.
• Degree k: the number of connections (links) between a node and other nodes.
• Degree distribution P(k): the probability that a selected node has exactly k links.
― Count the number of nodes with k = 1, 2, ... links and divide by the total number of nodes.
― P(k) allows us to distinguish between different classes of networks.
• In random (Erdős-Rényi) networks the node degrees follow a Poisson distribution; most nodes have approximately the same number of links (close to the average degree).
• The tail of this degree distribution decreases exponentially, P(k) ~ exp(-k); nodes that deviate significantly from the average are extremely rare.
• For scale-free networks, the probability that a node has k links follows a power law, P(k) ~ k^(-γ). Such networks are often characterized by a small number of highly connected nodes (hubs). The degree distribution of a scale-free network is a straight line on a log-log plot.
• Clustering coefficient: C_I = 2n_I / (k(k-1)), where n_I is the number of links connecting the k neighbors of node I to each other.
― C_I is the number of triangles that pass through node I divided by the total number of possible triangles that could pass through node I.
― C(k) is the average clustering coefficient of all nodes with k links. This property measures the cliquishness of the network.
• The average path length is the length of the shortest path connecting a pair of nodes, averaged over all pairs of nodes in the network. (Strictly, the diameter is the longest of these shortest paths.)
• For Erdős-Rényi random networks, the average path length is proportional to the logarithm of the network size: this is the small-world property.
• Scale-free networks are ultra-small; the average path length follows ℓ ~ log(log N).
• The existence of the small-world property in a network G can be characterized by the following criteria:
― C(G) ≫ C_E(G), where C(G) is the average clustering coefficient of G and C_E(G) is the average clustering coefficient of the Erdős-Rényi random network constructed from the same nodes at equivalent edge density;
― L(G) < L_E(G), where L(G) is the average path length of G and L_E(G) is the average path length of the corresponding Erdős-Rényi random network constructed from the same nodes at equivalent edge density;
― L(G) ∝ log N (not relevant for this study).
• Degree assortativity r is the correlation coefficient between the degrees of connected nodes. It is expressed in terms of A_ij, an element of the adjacency matrix of the network; k_i and k_j, the degrees of nodes i and j, respectively; and n and m, the order (total number of nodes) and the size (total number of edges) of the network.
• Positive degree assortativity indicates that high-degree nodes tend to connect to other high-degree nodes and low-degree nodes to other low-degree nodes; negative values indicate that high-degree nodes tend to connect to low-degree nodes, as in a disassortative network.
• Modularity Q reflects the community structure of a network.
• Modularity ranges between -1 and 1. A network with more pronounced community structure has higher modularity.
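The small-world criteria above can be sketched in plain Python. This is only an illustration on a toy graph, not the poster's R/igraph workflow: the function names are hypothetical, nodes with fewer than two neighbours are assigned a clustering coefficient of 0, and path lengths are averaged over connected pairs only. Both conventions are assumptions that the poster does not specify.

```python
import random
from collections import deque
from itertools import combinations


def build_adjacency(n, edges):
    """Neighbour sets of an undirected simple graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, i):
    """C_I = 2 n_I / (k (k - 1)); set to 0 when k < 2 (an assumption)."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    n_i = sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])
    return 2.0 * n_i / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, i) for i in range(len(adj))) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over connected node pairs, found by BFS."""
    total, pairs = 0, 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")


def erdos_renyi_edges(n, p, seed=0):
    """G(n, p): keep each of the n(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


# Toy graph: two triangles joined by one bridge (not a chemical library).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
G = build_adjacency(n, toy_edges)
density = 2 * len(toy_edges) / (n * (n - 1))
E = build_adjacency(n, erdos_renyi_edges(n, density, seed=1))

C_G, L_G = average_clustering(G), average_path_length(G)
C_E, L_E = average_clustering(E), average_path_length(E)
print(f"C(G) = {C_G:.3f}, L(G) = {L_G:.3f}")      # C(G) = 0.778, L(G) = 1.800
print(f"C_E(G) = {C_E:.3f}, L_E(G) = {L_E:.3f}")  # depends on the seed
```

A single random graph is a noisy reference, so in practice C_E(G) and L_E(G) would more reasonably be averaged over many seeds before comparing them with C(G) and L(G).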
Disintegration of the small world property with
increasing diversity of chemical libraries
Ganesh Prabhu1, Sudeepto Bhattacharya2,3, Michael Krein4, and N. Sukumar1,2
1Department of Chemistry, Shiv Nadar University, Dadri, Gautam Buddha Nagar, UP, India
2Center for Informatics, Shiv Nadar University, Dadri, Gautam Buddha Nagar, UP, India
3Department of Mathematics, Shiv Nadar University, Dadri, Gautam Buddha Nagar, UP, India
4Rensselaer Exploratory Center for Cheminformatics Research, Rensselaer Polytechnic Institute, Troy, NY 12180
Abstract:
In recent years, factors such as solubility, permeability, polymorphism, cytotoxicity, mutation and drug resistance have forced chemists to broaden the spectrum of new chemical entities for drug design. Diversity-oriented synthesis (DOS) has emerged as a popular strategy to synthesize molecular libraries possessing structural complexity as well as skeletal and stereochemical diversity, targeting various dimensions of chemical and biological space. Topological properties of chemical library graphs, such as the average clustering coefficient, average path length and existence of hubs, are useful in identifying the presence or absence of the small-world property. The focus of the present study was to analyze the existence and disintegration of small-world behaviour in graphs created on the basis of similarity and dissimilarity scales. We generated similarity and dissimilarity graphs from diverse chemical libraries at various Tanimoto similarity coefficient (tc) thresholds using FP2 and MACCS fingerprints. In comparison to Erdős-Rényi random graphs, the dissimilarity graphs exhibited a low average clustering coefficient and a low average path length. Fitting the cumulative distribution function against the degree of the dissimilarity graphs indicated exponential, log-normal and possibly power-law distributions, as assessed by the Kolmogorov-Smirnov (KS) goodness-of-fit test. The similarity graphs at high tc thresholds show a high clustering coefficient and an average path length similar to that of Erdős-Rényi graphs, without any sign of hubs. The dissimilarity graphs at very low tc thresholds displayed loss of small-world character, as evidenced by a low average clustering coefficient and a low average path length in comparison to Erdős-Rényi random graphs. Graph theory and statistical metrics thus represent a simple and efficient approach to analyzing the diversity of a chemical library.
Motivation
• The objective of the present work is to determine whether different chemical libraries exhibit different signatures in terms of their network topologies.
• Another goal is to better understand the impact of library diversity, that is, whether small-world and scale-free properties are common to certain classes of chemical libraries.
• Similarity networks constructed on large compound collections using different sets of descriptors have revealed some common features, such as the small-world property and scale-free degree distributions:
Krein, M. P. and Sukumar, N., Exploration of the topology of chemical spaces with network measures. J. Phys. Chem. A, 2011. 115(45): p. 12905-12918.
Benz, R. W., Swamidass, J. and Baldi, P., Discovery of Power-Laws in Chemical Space. J. Chem. Inf. Model., 2008. 48: p. 1138-1151.
Network properties of DOS libraries (N=118, 32, 41) and the focussed library (FL, N=41) at various dissimilarity and similarity thresholds using MACCS fingerprints
Degree assortativity and modularity of various threshold networks from DOS libraries (N=118, 41, 32) and the focussed library (FL, N=41) generated using FP2 and MACCS fingerprints, and of their corresponding ERN (Erdős-Rényi random networks) constructed from the same nodes at equivalent edge density.
Datasets & Computational Methodology
• FP2 and MACCS fingerprints were computed for DOS (Diversity-Oriented Synthesis), NPDB (ZINC Natural Products Database) and focussed (antimalarial) chemical libraries:
Mamidala, R., et al., Pyrrolidine and piperidine based chiral spiro and fused scaffolds via build/couple/pair approach. RSC Advances, 2014. 4(21): p. 10619.
Irwin, J.J., et al., ZINC: a free tool to discover chemistry for biology. J Chem Inf Model, 2012. 52(7): p. 1757-68.
Cruz-Monteagudo, M., et al., Computational modeling tools for the design of potent antimalarial bisbenzamidines: overcoming the antimalarial potential of pentamidine. Bioorg Med Chem, 2007. 15(15): p. 5322-39.
• Edge lists based on the Tanimoto similarity coefficients (tc) of the fingerprints were generated and transformed into networks in RStudio using the 'igraph' package.
• Similarity and dissimilarity networks were generated at various thresholds.
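The edge-list step can be sketched as follows, assuming each fingerprint is available as a set of on-bit positions. The four toy fingerprints and the function names are hypothetical; real FP2 or MACCS bit vectors would come from a cheminformatics toolkit, and the poster built its networks in R with igraph rather than in Python.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, threshold):
    """Pairs of molecules with tc strictly above the threshold."""
    return [(u, v) for u, v in combinations(sorted(fingerprints), 2)
            if tanimoto(fingerprints[u], fingerprints[v]) > threshold]


def dissimilarity_edges(fingerprints, threshold):
    """Pairs of molecules with tc strictly below the threshold."""
    return [(u, v) for u, v in combinations(sorted(fingerprints), 2)
            if tanimoto(fingerprints[u], fingerprints[v]) < threshold]


# Hypothetical toy fingerprints (sets of on-bit positions), not real data.
toy_fps = {
    "m1": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "m2": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "m3": {20, 21, 22},
    "m4": {1, 2, 3, 20, 21},
}

print(similarity_edges(toy_fps, 0.8))     # [('m1', 'm2')]
print(dissimilarity_edges(toy_fps, 0.2))  # [('m1', 'm3'), ('m2', 'm3')]
```

Each list can then be loaded as an undirected graph; molecule m4 sits between the two thresholds and therefore stays isolated in both threshold networks.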
Conclusions:
• Dissimilarity networks from DOS and focussed libraries using FP2 and MACCS fingerprints demonstrate disintegration of small-world behaviour.
• Hubs were detected in both similarity and dissimilarity networks.
• Hubs in dissimilarity networks show disassortative behaviour, whereas hubs in similarity networks show assortative behaviour.
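To make the contrast between the two kinds of hub concrete, degree assortativity can be sketched as the Pearson correlation between the degrees found at the two ends of every edge, with each edge counted in both directions. This is a generic textbook form with a hypothetical function name, not necessarily the exact expression used on the poster.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of end-point degrees over all edges (both directions)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")  # undefined for regular graphs


# A star: one hub linked only to degree-1 nodes (fully disassortative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# A triangle plus a separate edge: every node links to nodes of equal degree.
cliques = [(0, 1), (1, 2), (0, 2), (3, 4)]

print(degree_assortativity(star))     # -1.0
print(degree_assortativity(cliques))  # 1.0
```

The star is the limiting case of a hub surrounded by low-degree nodes, while disjoint cliques are the limiting case of nodes connecting only to nodes of equal degree.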
Assortative and disassortative behaviour in DOS libraries
Combined graph DOS118_M_tc ≤ 0.2 ≥ 0.9 generated from MACCS fingerprints, showing disassortative hubs (red edges) and assortative hubs (blue edges)
• The diversity of a chemical library can be assessed by combining similarity and dissimilarity threshold graphs.
• A high average clustering coefficient, assortativity and high modularity of the network are hallmarks of a high-similarity chemical library.
• A low average clustering coefficient, disassortativity and low modularity of the network are the signature of a highly diverse chemical library.
• The combined graph here exhibits C(G) ≫ C_E(G) and L(G) = L_E(G).
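As a rough illustration of how modularity separates these two signatures, the sketch below evaluates the standard Newman-Girvan modularity, Q = sum over communities c of [e_c/m - (d_c/2m)^2], for a partition supplied by hand. Here e_c is the number of edges inside community c, d_c is the summed degree of its nodes and m is the total number of edges. The poster does not state which community-detection method or modularity variant was used, so both the formula choice and the function name are assumptions.

```python
from collections import Counter


def modularity(edges, community):
    """Newman-Girvan Q for an undirected graph and a node -> community map."""
    m = len(edges)
    internal = Counter()    # e_c: edges with both ends in community c
    degree_sum = Counter()  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a bridge, split into their two natural communities.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(two_triangles, partition), 3))  # 0.357

# A star treated as a single community carries no community structure.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(modularity(star, {node: "a" for node in range(5)}))  # 0.0
```

Clique-like similarity clusters push Q up, whereas a hub-and-spoke dissimilarity network placed in a single community yields Q = 0.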
Network | C(G) | L(G) | C(G) > C_E(G) | C(G) ≫ C_E(G) | L(G) < L_E(G) | Avg. degree | No. of edges | D(G)
Dissimilarity Networks-MACCS
DOS118_M_tc < 0.2 | 0 | 1.81 | N | N | Y | 2 | 28 | 0.004
ERN(118, 0.0045) | 0 | 2.11 | - | - | - | 0.52 | 31 | 0.0045
DOS41_M_tc < 0.2 | 0 | 1.66 | N | N | N | 1.66 | 5 | 0.0061
ERN(41, 0.0061) | 0 | 1.4 | - | - | - | 0.3 | 6 | 0.0097
DOS32_M_tc < 0.2 | 0 | 1.86 | N | N | N | 1.66 | 5 | 0.01
ERN(32, 0.01) | 0 | 1.25 | - | - | - | 0.19 | 3 | 0.006
FL41_M_tc < 0.3 | 0 | 1.92 | N | N | Y | 3.64 | 31 | 0.038
ERN(41, 0.038) | 0 | 2.63 | - | - | - | 1.27 | 26 | 0.032
Similarity Networks-MACCS
DOS118_M_tc > 0.9 | 0.67 | 1.1 | Y | Y | Y | 1.6 | 28 | 0.004
ERN(118, 4E-03) | 0 | 2.11 | - | - | - | 0.52 | 31 | 0.0045
DOS41_M_tc > 0.8 | 0.5 | 1.37 | Y | Y | Y | 1.21 | 14 | 0.02
ERN(41, 0.02) | 0 | 3.3 | - | - | - | 0.97 | 20 | 0.024
DOS32_M_tc > 0.8 | 0.75 | 1.16 | Y | Y | Y | 1.54 | 10 | 0.02
ERN(32, 0.02) | 0 | 1.38 | - | - | - | 0.56 | 9 | 0.018
FL41_M_tc > 0.98 | 1 | 1 | Y | Y | Y | 4.9 | 22 | 0.026
ERN(41, 0.026) | 0 | 1.77 | - | - | - | 0.78 | 16 | 0.019
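The D(G) column and the ERN(N, p) labels can be related by simple arithmetic, sketched below under the assumption that D(G) is the usual edge density 2m/(n(n-1)) and that each ERN is a G(n, p) random graph with p set to a density value. The function names are hypothetical.

```python
def edge_density(n_nodes, n_edges):
    """D(G) = 2m / (n(n-1)): fraction of possible undirected edges present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def expected_gnp_edges(n_nodes, p):
    """Expected number of edges in a G(n, p) random graph."""
    return p * n_nodes * (n_nodes - 1) / 2


# DOS118 MACCS dissimilarity network: 118 nodes, 28 edges.
print(round(edge_density(118, 28), 4))            # 0.0041
# Reference random graph ERN(118, 0.0045).
print(round(expected_gnp_edges(118, 0.0045), 2))  # 31.06
```

Under these assumptions the DOS118 row reproduces the tabulated density of about 0.004, and the expected edge count of roughly 31 agrees with the 31 edges listed for ERN(118, 0.0045). Any single G(n, p) realization will scatter around its expected count, which may be why the ERN edge counts differ slightly from those of the library networks.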
FP2 network | Degree assortativity (r) | Modularity (Q, or Q_E for ERN) | D(G) | MACCS network (same library) | Degree assortativity (r) | Modularity (Q, or Q_E for ERN) | D(G)
Dissimilarity Networks-FP2 and Dissimilarity Networks-MACCS
DOS118_tc < 0.22 | -0.158 | 0.64 | 0.0016 | tc < 0.2 | -0.53 | 0.47 | 4.05E-03
ERN(118, 0.0016) | -0.166 | 0.81 | 0.001 | ERN(118, 4.5E-03) | 0.2 | 0.93 | 4.5E-03
DOS41_tc < 0.37 | -0.53 | 0.15 | 0.017 | tc < 0.2 | -1 | 0 | 6.1E-03
ERN(41, 0.017) | -0.41 | 0.7 | 0.018 | ERN(41, 0.0061) | -0.61 | 0.61 | 9.7E-03
DOS32_tc < 0.4 | -0.58 | 0.18 | 0.11 | tc < 0.2 | -0.74 | 0.22 | 0.01
ERN(32, 0.11) | -0.23 | 0.4 | 0.1 | ERN(32, 0.01) | -0.5 | 0.44 | 0.006
FL41_tc < 0.5 | -0.668 | 0.062 | 0.013 | tc < 0.3 | -0.72 | 0.087 | 0.038
ERN(41, 0.013) | -0.095 | 0.58 | 0.017 | ERN(41, 0.038) | -0.03 | 0.66 | 0.032
Similarity Networks-FP2 and Similarity Networks-MACCS
DOS118_tc > 0.95 | 0.70 | 0.95 | 0.004 | tc > 0.9 | 0.8 | 0.85 | 4.0E-03
ERN(118, 0.004) | 0.2 | 0.88 | 0.003 | ERN(118, 4E-03) | 0.2 | 0.88 | 4.5E-03
DOS41_tc > 0.95 | 1 | 0.73 | 0.012 | tc > 0.8 | 0.69 | 0.82 | 0.02
ERN(41, 0.012) | 0.3 | 0.73 | 0.015 | ERN(41, 0.02) | 0.14 | 0.74 | 0.024
DOS32_tc > 0.8 | 0.59 | 0.77 | 0.06 | tc > 0.8 | 0.35 | 0.72 | 0.02
ERN(32, 0.06) | -0.09 | 0.46 | 0.07 | ERN(32, 0.02) | -0.53 | 0.64 | 0.018
FL41_tc > 0.995 | 1 | 0.28 | 0.022 | tc > 0.98 | 1 | 0.2 | 0.026
ERN(41, 0.022) | 0.144 | 0.74 | 0.04 | ERN(41, 0.026) | -0.19 | 0.73 | 0.019
Fitting cumulative distribution functions of the dissimilarity networks versus degree
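A minimal sketch of this kind of fit is given below for a single candidate model, a discrete exponential (geometric) distribution starting at the smallest observed degree. The degree sequence is hypothetical, the function names are illustrative, and the poster does not say which software or estimators were used. Log-normal and power-law candidates, and any p-value for the KS statistic, are left out.

```python
from collections import Counter


def degree_distribution(degrees):
    """P(k): fraction of nodes that have exactly k links."""
    counts = Counter(degrees)
    return {k: counts[k] / len(degrees) for k in sorted(counts)}


def fit_geometric(degrees):
    """Maximum-likelihood fit of P(k) = (1 - q) q^(k - k_min) for k >= k_min."""
    k_min = min(degrees)
    mu = sum(degrees) / len(degrees) - k_min  # mean excess over k_min
    return k_min, mu / (1 + mu)


def ks_distance(degrees, k_min, q):
    """Largest gap between the empirical CDF and the fitted geometric CDF."""
    counts = Counter(degrees)
    cumulative, gap = 0, 0.0
    for k in range(k_min, max(degrees) + 1):
        cumulative += counts[k]
        model_cdf = 1 - q ** (k - k_min + 1)
        gap = max(gap, abs(cumulative / len(degrees) - model_cdf))
    return gap


# Hypothetical degree sequence of a small threshold network.
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5]
k_min, q = fit_geometric(toy_degrees)

print(degree_distribution(toy_degrees))    # {1: 0.5, 2: 0.25, 3: 0.125, 5: 0.125}
print(q)                                   # 0.5
print(ks_distance(toy_degrees, k_min, q))  # 0.0625
```

A smaller KS distance means the fitted curve follows the empirical cumulative distribution more closely; comparing distances across candidate distributions is one simple way to rank them.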
Similarity and dissimilarity networks
The Fruchterman-Reingold layout (a force-directed layout) was used to visualize the undirected threshold networks. Red and blue edges represent dissimilarity and similarity edges, respectively. Networks with dissimilarity edges (a-b) show the absence of cliquishness, loss of small-world character, and the presence of hubs (star subnetworks) with limited communities, with links from high-degree nodes to low-degree nodes reflecting disassortativity. Networks with similarity edges (c-d) show the presence of cliques, reflecting small-worldness with high modularity.
