This document proposes using spectral clustering based on the normalized graph Laplacian spectrum to solve problems in community detection and handwritten digit recognition. It summarizes the key concepts in graph signal processing and introduces spectral clustering. The paper provides a mathematical proof that the signs of the second eigenvector components of the normalized graph Laplacian can accurately partition a graph into two communities. It then applies this spectral clustering method to community detection and digit recognition, comparing results to other popular algorithms to demonstrate the advantages of the spectral clustering approach.
Graph Signal Processing: Handwritten Digits
Recognition Via Community Detection
Abstract—Graph signal processing is an emerging field of
research. When signals can be represented on a graph, their
inherent structure can be fully exploited. It has
been shown that the normalized graph Laplacian matrix plays
a major role in the characterization of a signal on a graph. In
this paper we are interested in using the spectrum of this matrix
to solve classical problems. More precisely, we aim to detect
communities in order to recognize image digits. Indeed, we use
the spectrum of the normalized graph Laplacian as a suitable
method to detect two communities in a graph. We show that this
method gives better results than many state-of-the-art
algorithms. Then, we use the same spectrum to recognize handwritten
digit images. We compare the spectral clustering method with
some other classical algorithms, emphasizing the advantages of
spectral clustering in community detection and semi-supervised
classification applications.
Index Terms—Graph signal processing, Community detection,
Digit recognition, Normalized Graph Laplacian.
I. INTRODUCTION
In recent years, the analysis and processing of
large-scale datasets using graphs has become very useful
[8]. In fact, many kinds of data domains such as social
and economic networks, electric grids, neuronal networks
and image databases require a graph representation of their
structure. Each of these structures usually carries information
that flows between different elements of the network.
For example, in a neural network, a neuron is activated after
receiving an electric excitation, and the activation of a neuron
usually influences the nearby neurons. In the case of economic
networks, we can consider an economic crisis as a flow that
spreads from one bank to another. The need to represent these
phenomena has led to the development of a new field:
graph signal processing. Indeed, a continuous signal can be
sampled at a specific frequency, and the resulting
discrete signal can usually be defined on a
graph [10]. In this way, we obtain at the same time a
representation of the structure of the network and of the
information flowing through it. For instance, a sound signal
can be represented on a linear or a ring graph, whereas a
picture is usually represented on a grid graph where each
pixel is linked to its four or eight nearest neighbors [8].
Weighted graphs are particularly used to represent the links
and similarities between the different elements of a network.
The advantage of signals on graphs is that they can
be processed in a way analogous to classical signal processing
[1].
The main applications of graph signal processing
today appear in the field of artificial intelligence, especially
in machine learning. Community detection and digit recognition
are among the best-known applications in this domain
[4]. For community detection, many methods have been used,
but graph signal processing using the spectral clustering
method seems to be more efficient. Furthermore, for digit
recognition, many algorithms are used nowadays
to classify handwritten digits, such as the k-means algorithm
[5], but graph signal processing can also be used for the
same purpose.
In this paper, we present a method based on graph signal
processing, known as spectral clustering, to solve the
problem of community detection, and we provide a mathematical
proof of this method. The same method is then applied to
recognize handwritten digits from the MNIST database. Here the
method takes a variant form based on the smoothness properties of
signals defined on graphs.
The remainder of the paper is organized as follows. In the next
section, we provide some background from the graph signal
processing domain. In Section III, we present the method of
spectral clustering applied to both the community detection and
handwritten digit recognition problems. Then, we discuss our
results compared to the state of the art in order to identify both
the advantages and the drawbacks of the spectral clustering
method. Section V concludes the paper.
II. GRAPH SIGNAL PROCESSING
Let us first introduce some notation. We consider a weighted,
simple, undirected graph G = (E, V) where E represents
the set of edges and V the set of vertices. Without loss of
generality, we consider V to be the set of integers between 1
and N = |V|. We equip G with an N × N adjacency matrix
W defined as follows [9]:
$$W_{i,j} = \begin{cases} \text{the weight of the edge connecting } i \text{ and } j & \text{if such an edge exists} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
When the edge weights are not naturally defined by an
application, one common way to define the weight of an edge
connecting vertices i and j is via a similarity function such as a
distance:
$$W_{i,j} = \mathrm{dist}(i, j) \tag{2}$$
where dist(i, j) may represent a physical distance between
two feature vectors describing the nodes i and j.
We also define the N × N diagonal degree matrix D as:
$$D_{i,i} = d_i = \sum_{k=1}^{N} W_{i,k} \tag{3}$$
For instance, a social network can be represented by a
weighted, simple, undirected graph, where the vertices are
the individuals and the edges represent the friendship bond
between two individuals. In this case, the degree matrix gives
us an idea of how important the friendship links of
each individual are.
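The definitions in Equations (1) and (3) can be illustrated with a short NumPy sketch. The four-person network and its weights below are hypothetical values chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical 4-person friendship network (weights are illustrative only).
# W[i, j] is the strength of the friendship between persons i and j,
# and 0 when they are not friends, as in Equation (1).
W = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.5, 2.0, 0.0, 1.5],
    [0.0, 0.0, 1.5, 0.0],
])

# A simple undirected graph has a symmetric W with a zero diagonal.
is_valid = bool(np.allclose(W, W.T) and np.all(np.diag(W) == 0))

# Equation (3): d_i is the sum of the weights in row i.
degrees = W.sum(axis=1)
D = np.diag(degrees)

print(is_valid)
print(degrees)
```

In this toy network the third person has the largest degree, which matches the interpretation of the degree as a measure of how strong an individual's friendship links are.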
We then introduce the non-normalized graph Laplacian L =
D − W [9]. This matrix turns out to be of major importance, as it
acts as a difference operator for a signal over a graph.
Recall that a signal over a graph G is a vector $x \in \mathbb{R}^N$
where the $i$-th component of the vector x represents the
function value at the $i$-th vertex of V. The $i$-th
component of the Laplacian applied to such a signal is:
$$(Lx)(i) = \sum_{j=1}^{N} W_{i,j}\,[x(i) - x(j)] \tag{4}$$
For example in the case of the social network, a signal can
represent a rumor: the individuals who received the rumor are
given the value 1 and those who did not are given the value
0. We therefore obtain a binary signal on the graph.
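As a minimal sketch of Equation (4), the following code applies the non-normalized Laplacian L = D − W to a binary rumor signal in two ways: as a matrix product and as the vertex-by-vertex weighted sum of differences. The path graph, the unit weights and the function names are assumptions made for this example.

```python
import numpy as np

def laplacian(W):
    """Non-normalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_apply(W, x):
    """Equation (4), evaluated vertex by vertex."""
    n = len(x)
    return np.array([
        sum(W[i, j] * (x[i] - x[j]) for j in range(n))
        for i in range(n)
    ])

# Hypothetical path graph 0 - 1 - 2 - 3 with unit weights.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Rumor signal: vertices 0 and 1 heard the rumor, vertices 2 and 3 did not.
x = np.array([1.0, 1.0, 0.0, 0.0])

Lx_matrix = laplacian(W) @ x
Lx_sum = laplacian_apply(W, x)

print(Lx_matrix)
print(Lx_sum)
```

The result is non-zero only at the two vertices on the boundary between informed and uninformed individuals, which is the sense in which L behaves like a difference operator.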
When working with the L2-norm, it makes sense to use instead
the normalized graph Laplacian, defined as [9]:
$$\mathcal{L} = D^{-1/2} \, L \, D^{-1/2} \tag{5}$$
Since the normalized (or standard) graph Laplacian is a
real-valued symmetric matrix, it can be diagonalized in
an orthonormal basis. We denote a corresponding set of
orthonormal eigenvectors by $\{\mu_l\}_{l=1,2,\dots,N}$ and the set of
associated real, non-negative eigenvalues by $\{\lambda_l\}_{l=1,2,\dots,N}$,
ordered from the lowest eigenvalue to the
largest one.
In particular, we have:
$$\mathcal{L}\mu_l = \lambda_l \mu_l \tag{6}$$
It is well known that [6]:
$$0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N = \lambda_{\max} \le 2 \tag{7}$$
The literature gives many results relating eigenvalues to
properties of the graph. As an example, the number of connected
components of the graph is given by the multiplicity of
the eigenvalue zero. For instance, if the graph is connected, the
multiplicity of the eigenvalue zero is one. Also, for a connected
graph, the highest eigenvalue is equal to 2 if and only if the graph
is bipartite.
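These spectral properties can be checked numerically on small graphs. The sketch below is only an illustration under stated assumptions: the three test graphs (two disjoint triangles, a 4-cycle and a single triangle), the tolerance and the helper names are choices made here, and the code assumes that no vertex is isolated.

```python
import numpy as np

def normalized_laplacian(W):
    """D^{-1/2} (D - W) D^{-1/2}; assumes every vertex has a positive degree."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

def spectrum(W):
    """Eigenvalues of the normalized Laplacian, in ascending order."""
    return np.linalg.eigvalsh(normalized_laplacian(W))

# A triangle (connected, not bipartite).
triangle = np.ones((3, 3)) - np.eye(3)

# Two disjoint triangles: a graph with two connected components.
zeros = np.zeros((3, 3))
two_triangles = np.block([[triangle, zeros], [zeros, triangle]])

# A 4-cycle (connected and bipartite).
square = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

tol = 1e-9  # numerical tolerance for "eigenvalue equals zero"
n_components = int(np.sum(np.abs(spectrum(two_triangles)) < tol))
lambda_max_square = float(spectrum(square).max())
lambda_max_triangle = float(spectrum(triangle).max())

print(n_components, lambda_max_square, lambda_max_triangle)
```

The eigenvalue zero appears twice for the two-component graph, the bipartite 4-cycle reaches the upper bound 2 of Equation (7), and the triangle stays strictly below it.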
The first eigenvector $\mu_1$ has a closed form given by the
following formula [6]:
$$\mu_1(i) = \frac{\sqrt{d(i)}}{\sqrt{\sum_{u \in V} d(u)}} \tag{8}$$
In the case of regular graphs, all the vertices have the same
degree so µ1 is a constant vector.
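The spectral properties above can be checked numerically. The following is a minimal sketch, assuming a hypothetical connected, non-bipartite graph (a triangle with one pendant vertex) and taking the closed form of equation (8) as the square root of d(i) divided by the sum of the degrees; the helper name `normalized_laplacian` is ours:

```python
import numpy as np

def normalized_laplacian(W):
    """Return D^{-1/2} (D - W) D^{-1/2} for a symmetric weight matrix W
    whose vertices all have a positive degree."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

# Hypothetical graph: triangle {0, 1, 2} plus a pendant vertex 3 attached to 2.
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

L_norm = normalized_laplacian(W)
# eigh returns the eigenvalues in ascending order and the eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(L_norm)

d = W.sum(axis=1)
mu1_closed_form = np.sqrt(d / d.sum())   # closed form of the first eigenvector
mu1 = eigenvectors[:, 0]
mu1 = mu1 * np.sign(mu1[0])              # remove the arbitrary global sign
```

On this example the smallest eigenvalue is zero with multiplicity one (the graph is connected), the largest one stays strictly below 2 (the graph is not bipartite), and the computed first eigenvector matches the closed form.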
Eigenvectors of the normalized graph Laplacian extend
the principles of the Fourier transform of classical signal
processing. To understand this connection, let us recall that
the classical Fourier transform of a signal f is given by:
$$\tilde{f}(\omega) = \int_{\mathbb{R}} f(x)\, e^{-i\omega x}\, dx \tag{9}$$
with:
$$\frac{d^2}{dx^2}\, e^{-i\omega x} = -\omega^2\, e^{-i\omega x} \tag{10}$$
Figure 1. A positive graph signal defined on the Petersen graph. The height of
each blue bar represents the signal value at the vertex where the bar originates
[8].
Figure 2. Representation of the Laplacian eigenvectors of the 16-vertex cycle
graph. The eigenvectors exhibit the sinusoidal characteristics of the Fourier
transform basis. Signals defined on this graph are equivalent to classical
discrete, periodic signals.
We notice that $e^{-i\omega x}$ is an eigenfunction of the
Laplace operator $\frac{d^2}{dx^2}$ associated with the
eigenvalue $-\omega^2$. On the other hand, we have:
$$\mathcal{L}\mu_l = \lambda_l \mu_l \tag{11}$$
so the frequencies in classical signal processing are analogous
to the eigenvalues of the normalized Laplacian in graph
signal processing. Consequently, the Fourier transform
$\tilde{x}$ of a signal x on the graph G is defined as [6]:
$$\tilde{x}(\lambda_l) = \sum_{i=1}^{N} x(i)\, \mu_l^{*}(i) \tag{12}$$
where $\mu_l^{*}$ denotes the complex conjugate of the
eigenvector $\mu_l$.
The inverse graph Fourier transform is defined as [8]:
$$x(i) = \sum_{l=1}^{N} \tilde{x}(\lambda_l)\, \mu_l(i) \tag{13}$$
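The transform pair of equations (12) and (13) can be sketched as follows on the 16-vertex cycle graph, using the eigenvectors of the normalized Laplacian returned by NumPy. The function names `gft` and `igft` and the test signal are our own illustrative choices:

```python
import numpy as np

N = 16
# Adjacency matrix of the cycle graph: each vertex is linked to its two neighbours.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = 1.0
    W[(i + 1) % N, i] = 1.0

d = W.sum(axis=1)
# D^{-1/2} (D - W) D^{-1/2} written as I - D^{-1/2} W D^{-1/2}.
L_norm = np.eye(N) - W / np.sqrt(np.outer(d, d))
eigenvalues, U = np.linalg.eigh(L_norm)   # columns of U are the eigenvectors

def gft(x):
    """Graph Fourier transform; U is real here, so the conjugation is a no-op."""
    return U.conj().T @ x

def igft(x_hat):
    """Inverse graph Fourier transform."""
    return U @ x_hat

# One slow oscillation around the cycle: a low-frequency signal.
x = np.cos(2 * np.pi * np.arange(N) / N)
x_hat = gft(x)
x_rec = igft(x_hat)
```

The reconstruction `x_rec` equals `x`, and all the energy of this sinusoid falls on the two coefficients associated with the smallest non-zero eigenvalue, in line with the classical Fourier analogy.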
Finally, to characterize the smoothness of a signal on the
graph G, one can use the Dirichlet form [6]:
$$S(x) = \frac{x^{\top} \mathcal{L}\, x}{\|x\|_2^2} = \frac{1}{\|x\|_2^2} \sum_{l=1}^{N} \lambda_l\, \langle x, \mu_l \rangle^2 \tag{14}$$
The smaller S(x) is, the smoother the signal x.
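To make the two expressions of the Dirichlet form concrete, here is a minimal sketch on a hypothetical 6-vertex path graph. The function names are ours, and the two test signals are chosen so that their smoothness is known in advance: one is proportional to the first eigenvector, the other alternates in sign along the path.

```python
import numpy as np

def dirichlet_smoothness(L_norm, x):
    """S(x) = x^T L x / ||x||^2 (quadratic-form expression)."""
    return float(x @ L_norm @ x) / float(x @ x)

def dirichlet_smoothness_spectral(L_norm, x):
    """Same quantity from the spectrum: sum_l lambda_l <x, mu_l>^2 / ||x||^2."""
    eigenvalues, U = np.linalg.eigh(L_norm)
    coeffs = U.T @ x
    return float(np.sum(eigenvalues * coeffs ** 2)) / float(x @ x)

# Hypothetical example: path graph on 6 vertices.
N = 6
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
d = W.sum(axis=1)
L_norm = np.eye(N) - W / np.sqrt(np.outer(d, d))

# Smoothest possible signal: proportional to the first eigenvector.
smooth = np.sqrt(d)
# Alternating signal: on a bipartite graph this reaches the eigenvalue 2.
rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]) * np.sqrt(d)

s_smooth = dirichlet_smoothness(L_norm, smooth)
s_rough = dirichlet_smoothness(L_norm, rough)
```

Here `s_smooth` is zero and `s_rough` equals 2, the two extremes allowed by equation (7) for a bipartite graph.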
III. SPECTRAL CLUSTERING
A. Community Detection
Detecting communities is a problem with many variants in
mathematics and computer science [4]. Over the years, several
methods have been developed for data partitioning. One
of these methods, known as “spectral clustering”, uses
properties of the normalized graph Laplacian. In this
section, we detect communities using only the properties of
the spectrum of the normalized graph Laplacian.
Considering a population of individuals that can be
partitioned into two communities according to their properties,
our aim is to detect these two communities.
To achieve this, we consider a random graph characterized
by two probabilities p ∈ [0, 1] and q ∈ [0, 1], where p is the
probability of a link between two individuals belonging
to the same community and q is the probability of a
link between two individuals belonging to different
communities. The vertices of this graph are the individuals,
the edges are the connections between them, and the weight
of an edge is p for an intra-community link and q for an
inter-community link.
We suppose that p ≥ q. If we consider the case where
(p, q) = (1, 0), the error rate of detecting the two communities
should tend to zero. The other limit case is when p = q where
the error rate should tend to 0.5.
If G is the graph representing the population and
$\mathcal{L}$ its normalized graph Laplacian, the signs of the
components of the second eigenvector of $\mathcal{L}$ allow us
to partition the set of vertices into two communities:
vertices whose components have the same sign belong to
the same community. We provide a proof of this principle:
The graph of the population is represented by its expected
adjacency matrix. Instead of using a random graph with two
different probabilities p and q, we can consider a complete
simple graph where the weight of the edges linking two
items belonging to the same community is equal to p and the
weight of the edges linking two items belonging to different
communities is equal to q. This matrix is the statistical mean of
the adjacency matrices of the Erdős-Rényi random graphs
generated with the probabilities p and q, respectively [2].
The adjacency matrix A corresponding to this situation is
given in block form as:
$$A = \begin{pmatrix} p \cdot (J_{N^*} - I_{N^*}) & q \cdot J_{N^*} \\ q \cdot J_{N^*} & p \cdot (J_{N^*} - I_{N^*}) \end{pmatrix} \tag{15}$$
where we denote by $I_{N^*}$ the identity matrix of size $N^*$,
by $J_{N^*}$ the $N^* \times N^*$ matrix with every entry equal
to one, and by $N^*$ the cardinality of each community. The
first $N^*$ vertices thus belong to one community and the
others belong to the other one.
The graph considered is regular, thus the degree matrix is
a scalar matrix of size $N = 2N^*$ given by:
$$D = \big((N^* - 1) \cdot p + N^* q\big) \cdot I_N \tag{16}$$
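The non-normalized graph Laplacian $L = D - A$ is then given by:
$$L = \begin{pmatrix} N^*(p + q) \cdot I_{N^*} - p \cdot J_{N^*} & -q \cdot J_{N^*} \\ -q \cdot J_{N^*} & N^*(p + q) \cdot I_{N^*} - p \cdot J_{N^*} \end{pmatrix} \tag{17}$$
We are interested in the second eigenvector of the normalized
Laplacian $\mathcal{L}$. Since D is a scalar matrix,
$\mathcal{L}$ is a positive multiple of L, so both matrices have
the same eigenspaces and we only need the second eigenvector of L.
To achieve this, we compute the characteristic
polynomial of L, denoted by $\chi_L$, in order to obtain the
eigenvalues and then their corresponding eigenspaces (in
particular, the eigenspace associated with the second eigenvalue).
We have:
$$\chi_L(x) = \det(L - x \cdot I_N) = \det \begin{pmatrix} M_1 & M_2 \\ M_2 & M_1 \end{pmatrix} \tag{18}$$
where $M_1 = (N^*(p + q) - x) \cdot I_{N^*} - p\,J_{N^*}$ and
$M_2 = -q\,J_{N^*}$.
Since $J_{N^*}$ and $I_{N^*}$ commute, $M_1$ and $M_2$ commute, and
consequently we have:
$$\chi_L(x) = \det(M_1^2 - M_2^2) = \det(M_1 - M_2) \cdot \det(M_1 + M_2) \tag{19}$$
We can easily verify that for $p \neq q$ we have:
$$\chi_L(x) = (p^2 - q^2)^{N^*} \cdot \chi_{J_{N^*}}\!\left(\frac{N^*(p + q) - x}{p - q}\right) \cdot \chi_{J_{N^*}}\!\left(\frac{N^*(p + q) - x}{p + q}\right) \tag{20}$$
And knowing that:
$$\chi_{J_{N^*}}(x) \propto x^{N^* - 1} \cdot (x - N^*) \tag{21}$$
we find that the eigenvalues of L are 0, $2N^*q$ and
$N^*(p + q)$, the last one with multiplicity $2(N^* - 1)$.
For p > q, the second smallest eigenvalue is therefore:
$$\lambda_2 = 2 \cdot N^* \cdot q \tag{22}$$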
Expressing $\ker(L - \lambda_2 \cdot I_N)$ explicitly, we find
that the second eigenvector of L takes the form:
$$\mu_2 = [\underbrace{x_1, x_1, \dots, x_1}_{N^*}, \underbrace{x_2, x_2, \dots, x_2}_{N^*}] \tag{23}$$
The graph is regular, so $\mu_1$ is a constant vector, and thanks
to the orthogonality of the eigenvector basis
$\{\mu_l\}_{l=1,2,\dots,N}$ we have:
$$\langle \mu_1, \mu_2 \rangle = 0 \tag{24}$$
where $\langle \cdot, \cdot \rangle$ denotes the classical dot
product in a Euclidean space. Since this dot product is
proportional to $N^*(x_1 + x_2)$, we conclude that:
$$x_1 + x_2 = 0 \tag{25}$$
and then:
$$x_1 = -x_2 \tag{26}$$
Thereby, the components of the second eigenvector having
the same sign represent vertices belonging to the same
community.
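The result of this proof can be checked numerically. The sketch below builds the expected adjacency matrix of equation (15) for illustrative values of $N^*$, p and q (chosen by us, with p > q > 0), and verifies that the second eigenvalue of $L = D - A$ equals $2 N^* q$ and that the signs of the second eigenvector separate the two blocks:

```python
import numpy as np

def expected_two_block_adjacency(n_star, p, q):
    """Expected adjacency matrix: weight p inside a community,
    weight q across communities, no self-loops."""
    J = np.ones((n_star, n_star))
    I = np.eye(n_star)
    return np.block([[p * (J - I), q * J],
                     [q * J, p * (J - I)]])

n_star, p, q = 5, 0.8, 0.2   # illustrative values with p > q > 0
A = expected_two_block_adjacency(n_star, p, q)
L = np.diag(A.sum(axis=1)) - A            # non-normalized Laplacian D - A

# eigh returns the eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)
lambda_2 = eigenvalues[1]
mu_2 = eigenvectors[:, 1]
labels = mu_2 > 0                          # sign-based partition of the vertices
```

With these values the spectrum is 0, 2 and 5 (the last one eight times), so `lambda_2` is 2, and the first five components of `mu_2` are the opposite of the last five.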
We then compare the performance of this method with
two other state-of-the-art algorithms: the “Reichardt” [7]
and “LFK” [3] algorithms.
Recall that the “Reichardt” method for community detection
uses a greedy optimization of a modularity function Q, and the
aim is to compare the original network to a randomized one.
The weights of the links in the randomized network depend on
the probability of linking two nodes belonging to the original
network as follows [7]:
$$a_{i,j} = w_{i,j} - p_{i,j}, \qquad b_{i,j} = \gamma_R\, p_{i,j} \tag{27}$$
where $w_{i,j}$ represents the weight of the edge linking the
nodes i and j in the original network, $p_{i,j}$ represents the
probability of linking two nodes in the original network,
$\gamma_R$ is a parameter for optimization and finally $a_{i,j}$
and $b_{i,j}$ are the weights of the links in the randomized
network.
The second method for detecting communities is the “LFK”
method, which consists in detecting a community for each node
of a network. To achieve this, we consider that a community is
a subgraph which maximizes the fitness function of its nodes.
The fitness function $F_{SG}$ of a subgraph is defined by [3]:
$$F_{SG} = \frac{K_{in}^{SG}}{\left(K_{in}^{SG} + K_{out}^{SG}\right)^{\alpha}} \tag{28}$$
where SG is the community, $K_{in}^{SG}$ is twice the number of
internal links in the community SG, $K_{out}^{SG}$ is the number
of edges connecting SG with the rest of the graph, and α
is a positive real-valued parameter controlling the size of the
community.
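As an illustration of the fitness function of equation (28), the following minimal sketch evaluates it on a hypothetical six-vertex graph made of two triangles joined by one edge. The function name `lfk_fitness` and the example graph are ours, and the sketch only evaluates the fitness of a given subgraph; it does not reproduce the full LFK search procedure.

```python
import numpy as np

def lfk_fitness(A, members, alpha=1.0):
    """Fitness of the subgraph induced by `members`.

    A is a symmetric 0/1 adjacency matrix without self-loops; the subgraph
    is assumed to have at least one incident edge.
    """
    inside = np.zeros(A.shape[0], dtype=bool)
    inside[list(members)] = True
    k_in = A[np.ix_(inside, inside)].sum()     # each internal edge counted twice
    k_out = A[np.ix_(inside, ~inside)].sum()   # edges leaving the subgraph
    return k_in / (k_in + k_out) ** alpha

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

f_triangle = lfk_fitness(A, {0, 1, 2})   # a natural community
f_mixed = lfk_fitness(A, {1, 2, 3})      # a subgraph straddling both triangles
```

With α = 1 the triangle scores 6/7 while the straddling subgraph scores only 1/2, which is why a fitness-maximizing search favors the triangle.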
We compare the performance of these two algorithms with
that of the spectral clustering method in figure 3. To do so,
we consider random graphs with a fixed probability of linking
two nodes in the same community and we vary the probability
of linking two nodes belonging to different communities. Each
graph has 200 vertices and is divided into
two communities of equal size.
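The experimental protocol for the spectral method can be sketched as follows. This is a minimal illustration under our own assumptions (function names, random seed, a single draw with p = 0.9 and q = 0.1), not the code used to produce figure 3; the Reichardt and LFK algorithms are not included.

```python
import numpy as np

def sample_two_community_graph(n_star, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix with two communities of size n_star."""
    n = 2 * n_star
    labels = np.repeat([0, 1], n_star)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # draw each pair once
    A = (upper | upper.T).astype(float)
    return A, labels

def spectral_bipartition(A):
    """Split the vertices by the sign of the second eigenvector
    of the normalized Laplacian."""
    d = A.sum(axis=1)
    d = np.where(d > 0, d, 1.0)            # guard against isolated vertices
    L_norm = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    _, U = np.linalg.eigh(L_norm)          # eigenvalues in ascending order
    return (U[:, 1] > 0).astype(int)

def error_rate(predicted, truth):
    """Misclassification rate, up to the arbitrary naming of the two communities."""
    err = np.mean(predicted != truth)
    return min(err, 1.0 - err)

rng = np.random.default_rng(0)
A, truth = sample_two_community_graph(100, p=0.9, q=0.1, rng=rng)
rate = error_rate(spectral_bipartition(A), truth)
```

Repeating the last two lines over a range of q values and averaging `rate` over several draws would give one curve of the kind plotted in figure 3.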
B. Handwritten Digit Recognition
Handwritten digit recognition is another application of graph
signal processing and one of the most classic examples in
the classification literature. It is an important task in
semi-supervised machine learning that allows us to identify a digit
based on a training database of digits [11]. In our case, we used
the MNIST database of handwritten digits, which has a training
set of 60,000 examples and a test set of 10,000 examples.
All images in the MNIST database are of size 28 × 28.
The task is to classify an unlabeled digit into one of a fixed
number of digit classes. Our first recognition test aims at
identifying a digit i among two possible digits $i_1$ and
$i_2$, with the conditions:
$$\begin{cases} i, i_1, i_2 \in [0..9] \\ i_1 \neq i_2 \\ i \in \{i_1, i_2\} \end{cases} \tag{29}$$
[Figure 3: two panels plotting the error rate against q, for q from 0 to 0.3
(first panel) and from 0 to 0.9 (second panel); curves: Reichardt, LFK,
Spectral Clustering.]
Figure 3. Performance of three community detection algorithms on
Erdős-Rényi random graphs of size 200 with a fixed probability p
of linking two individuals belonging to the same community (first case p = 0.3,
second case p = 0.9) and a variable probability q of linking two individuals
belonging to different communities. Spectral clustering detects the two
communities with a lower error rate than the two other methods, and its
error rate approaches the maximum value more slowly than those of the
Reichardt and LFK methods.
[Figure 4: two scatter plots of the second eigenvector against the first
eigenvector; legend: first community, second community.]
Figure 4. Representation of the individuals belonging to two communities in
the basis (first eigenvector, second eigenvector) [10]. In the first case, the
probability of linking two individuals belonging to the same community is
p = 0.9 and the probability of linking two individuals from different
communities is q = 0.1. In the second case, p = q = 0.4.
We denote by $N_1$ the number of $i_1$ images and by $N_2$ the
number of $i_2$ images, both taken from the MNIST training
database. The graph is therefore composed of $N_1 + N_2 + 1$
vertices corresponding to the training examples and to the digit i.
There are several ways to construct the graph; one option
is to build a complete undirected graph.
To represent the link between two digits (two vertices of the
graph), we need a metric. In our tests we used the standard
Euclidean metric. The weight of an edge linking two images
$I$ and $I'$ (matrices of size L × C) is
$\frac{1}{1 + d^2(I, I')}$, with $d(I, I')$
defined as the Euclidean distance between $I$ and $I'$:
$$d(I, I') = \sqrt{\frac{1}{L \times C} \sum_{k,l} (I_{kl} - I'_{kl})^2} \tag{30}$$
where $I_{kl}$ denotes the value of the pixel located at the
k-th row and the l-th column of the picture I, which is
considered as a matrix of size L × C. In our case, we have
L = C = 28.
[Figure 5: scatter plot of the second eigenvector against the first
eigenvector; legend: 2, 5.]
Figure 5. Representation of the digits 2 and 5 in the normalized graph
Laplacian eigenvector basis (first eigenvector, second eigenvector) [10].
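The construction of the complete weighted graph can be sketched as below. This is a minimal illustration that assumes the distance of equation (30) is the square root of the mean squared pixel difference, sets the diagonal to zero because the graph is simple, and uses random arrays as a stand-in for MNIST images; the function names are ours.

```python
import numpy as np

def image_distance(img_a, img_b):
    """Normalized Euclidean distance between two images of identical shape
    (assumed reading of the distance formula)."""
    diff = np.asarray(img_a, dtype=float) - np.asarray(img_b, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

def complete_graph_weights(images):
    """Weight matrix W[a, b] = 1 / (1 + d(a, b)^2) with a zero diagonal."""
    flat = np.asarray(images, dtype=float).reshape(len(images), -1)
    sq_norms = np.sum(flat ** 2, axis=1)
    # Pairwise squared distances from the Gram matrix, clipped against round-off.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    sq_dists = np.maximum(sq_dists, 0.0) / flat.shape[1]
    W = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(W, 0.0)   # no self-loops in a simple graph
    return W

# Hypothetical stand-in for MNIST digits: three random 28 x 28 images.
rng = np.random.default_rng(0)
images = rng.random((3, 28, 28))
W = complete_graph_weights(images)
```

Every weight lies in (0, 1], and two identical images would receive the maximal weight 1.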
The graph built from the digits $i_1$ and $i_2$ can be displayed
in the basis formed by the first and second eigenvectors of the
normalized graph Laplacian [10]. The first two eigenvectors
correspond to the low frequencies of the graph and therefore give
a clear idea of the distance between the different vertices. For
instance, in the case of two classes of digits (2 and 5), the
associated graph is shown in figure 5.
As we can see in figure 5, it is possible to identify two
communities, although some vertices that belong to
different classes lie very close to each other.
The second option to construct the graph for the
digits $i_1$ and $i_2$ is to link each vertex only to its k nearest
neighbors, with a weight equal to one. Here k is a user-defined
constant, and the graph obtained is not complete
for k < N1 + N2 (k must be large enough for it to remain
connected). This construction is known as the k-nearest-neighbors
(KNN) graph, and it is useful when handling very large graphs
because it yields a sparse matrix and therefore a lower
computational cost.
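A minimal sketch of the KNN graph construction is given below. The symmetrization rule (an edge is kept if either endpoint lists the other among its k nearest neighbors) is our assumption, as are the function name and the toy data on a line.

```python
import numpy as np

def knn_adjacency(points, k):
    """Binary, symmetric k-nearest-neighbour adjacency matrix.

    An edge (a, b) is kept if b is among the k nearest neighbours of a or
    vice versa; this symmetrization rule is an assumption.
    """
    X = np.asarray(points, dtype=float).reshape(len(points), -1)
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist2, np.inf)                # a vertex is not its own neighbour
    nearest = np.argsort(dist2, axis=1)[:, :k]     # indices of the k closest vertices
    A = np.zeros((len(X), len(X)))
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                      # make the graph undirected

# Hypothetical data: 5 points on a line, forming two well-separated groups.
points = np.array([[0.0], [1.0], [2.5], [10.0], [11.2]])
A = knn_adjacency(points, k=1)
```

With k = 1 this toy graph splits into two connected components, {0, 1, 2} and {3, 4}, which shows why k has to be chosen large enough when a connected graph is required.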
Since there are two communities of digits, the problem is
similar to community detection. Therefore, the first
method that we use to identify the digit i computes the
normalized graph Laplacian matrix, which
is a good representation of the links between the digits in
the graph. As in the case of two communities discussed in the
previous section, the components of the second eigenvector take
two signs, and by comparing the sign of the component
corresponding to the digit i with the signs of the training
digits, it is possible to identify whether
i belongs to the $i_1$ or to the $i_2$ community.
The second method to classify the digit i is to compute the
smoothest signal on the graph of digits
using the Dirichlet form [6]. The method consists of forming
a signal x of size $N_1 + N_2 + 1$ such that:
$$x[j] = \begin{cases} 1 & \text{for } j \in [1, N_1] \\ \alpha & \text{for } j = N_1 + 1 \\ -1 & \text{for } j \in [N_1 + 2, N_1 + N_2 + 1] \end{cases} \tag{31}$$
where $\alpha \in [-1, 1]$ is an unknown parameter corresponding
to the component of the signal for the digit i. We aim at finding
the value of α that makes the signal x as smooth as possible, i.e.
that minimizes the Dirichlet form S(x). The sign of α allows us
to identify i:
$$\begin{cases} \text{if } \alpha > 0 \text{ then } i = i_1 \\ \text{if } \alpha < 0 \text{ then } i = i_2 \end{cases} \tag{32}$$
The results that we obtain with this algorithm show that the
smoothest signal is reached at the extreme values {−1, 1} of
α; S(x) is minimal for:
$$\alpha = \begin{cases} 1 & \text{if } i = i_1 \\ -1 & \text{otherwise} \end{cases} \tag{33}$$
We tested the two methods (the Laplacian method and the
“smoothest signal” method) for the digits $i_1 = 2$ and $i_2 = 5$
with $N_1 = N_2 = 100$. Over 100 tests, we obtain an error rate
of 9.65% for the Laplacian method and of 6.36% for the
smoothest signal method. Table I shows the error rates of
the Laplacian method, of the smoothest signal method and of
the k-means algorithm (k = 2), all tested on the same MNIST
database.

Table I
ERROR RATE FOR DIFFERENT CLASSIFICATION ALGORITHMS
Algorithm        k-means   Laplacian method   Smoothest signal
Error rate (%)   11.96     9.65               6.36
The three algorithms are tested for the digits 2 and 5 where the cardinality
of each digit group is 100. 100 tests are run for each algorithm.
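The “smoothest signal” method can be sketched as follows. This is a minimal illustration under our own assumptions: two well-separated Gaussian point clouds in the plane stand in for the digit images, the weights use the plain squared Euclidean distance, and α is found by a simple grid scan over [−1, 1]. The function names and parameters are ours.

```python
import numpy as np

def normalized_laplacian(W):
    """I - D^{-1/2} W D^{-1/2} for a symmetric W with positive degrees."""
    d = W.sum(axis=1)
    return np.eye(len(d)) - W / np.sqrt(np.outer(d, d))

def smoothest_alpha(W, n1, n2, grid_size=201):
    """Scan alpha in [-1, 1] and return the value minimizing S(x).

    Vertex ordering: n1 vertices of class i1, then the unknown digit,
    then n2 vertices of class i2.
    """
    L_norm = normalized_laplacian(W)
    x = np.concatenate([np.ones(n1), [0.0], -np.ones(n2)])
    best_alpha, best_s = None, np.inf
    for alpha in np.linspace(-1.0, 1.0, grid_size):
        x[n1] = alpha
        s = (x @ L_norm @ x) / (x @ x)   # Dirichlet form of the candidate signal
        if s < best_s:
            best_alpha, best_s = alpha, s
    return best_alpha

# Hypothetical stand-in for digit images: two Gaussian clouds in the plane.
rng = np.random.default_rng(0)
n1 = n2 = 20
class_1 = rng.normal(loc=0.0, scale=0.5, size=(n1, 2))
class_2 = rng.normal(loc=5.0, scale=0.5, size=(n2, 2))
query = np.array([[0.2, -0.1]])          # unknown point, placed near the first cloud
points = np.vstack([class_1, query, class_2])

# Complete graph with weights 1 / (1 + squared Euclidean distance), no self-loops.
sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
W = 1.0 / (1.0 + sq_dists)
np.fill_diagonal(W, 0.0)

alpha = smoothest_alpha(W, n1, n2)
predicted_class = 1 if alpha > 0 else 2
```

The grid scan is the simplest choice; since S(x) is a ratio of two quadratic polynomials in α, a closed-form minimizer could be used instead.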
Our second recognition test performs the recognition of one
digit i among l different possibilities $\{i_k\}_{k \in [1, l]}$
with 2 ≤ l ≤ 10. For l = 3, we first run the algorithm that
recognizes the closest digit to i between two digits $i_1$ and
$i_2$, which is the two-class case above. The digit
chosen from $\{i_1, i_2\}$ as the closest to i is then compared
with $i_3$ using the same two-class algorithm, and the result
identifies the closest digit to i in $\{i_1, i_2, i_3\}$.
Since $i \in \{i_1, i_2, i_3\}$, this method allows us to recognize
the digit i. The same principle can be applied to the case where
l > 3.
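The sequential elimination described above can be sketched as follows. The callback `closest_of_two` is a hypothetical placeholder for any two-class decision rule (for instance the Laplacian or the smoothest signal method), and the toy rule used in the example is only there to make the sketch runnable.

```python
from typing import Callable, Sequence

def classify_among(candidates: Sequence[int],
                   closest_of_two: Callable[[int, int], int]) -> int:
    """Sequentially eliminate candidates with a two-class decision rule.

    `closest_of_two(a, b)` is a hypothetical callback returning whichever of
    the two digit classes a, b the unknown digit is judged closer to.
    """
    winner = candidates[0]
    for challenger in candidates[1:]:
        winner = closest_of_two(winner, challenger)
    return winner

# Toy decision rule standing in for the two-class classifier: the unknown
# digit is a 7, and the rule is right whenever 7 is one of the two options.
def toy_rule(a: int, b: int) -> int:
    return 7 if 7 in (a, b) else a

result = classify_among([2, 5, 7, 9], toy_rule)
```

For l candidate classes this requires l − 1 runs of the two-class algorithm.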
The third and last test is to identify several digits
simultaneously among l different possibilities
$\{i_k\}_{k \in [1, l]}$ with 2 ≤ l ≤ 10. For this case, we start
by building a complete graph whose vertices include both the
training and the test sets of digits. Then, for each digit to be
classified, we use the algorithm of the second recognition test,
which recognizes one digit among l different possibilities.
C. Results Analysis
In both applications that we have seen, a random graph
is generated. In the first case, we use an Erdős-Rényi graph
[9] with a probability p of linking two vertices in the same
community and a probability q of linking two vertices from
different communities. The condition p > q is needed in order
to distinguish the two communities. The cardinality is the same
in both communities and the adjacency matrix is generated
in block form, where the first diagonal block represents the
first community and the second diagonal block the second
community.

Table II
EXECUTION TIME (IN SECONDS) FOR DIFFERENT ALGORITHMS AS A
FUNCTION OF THE GRAPH SIZE
Graph size   Reichardt algorithm   LFK algorithm   Spectral clustering method
20           0.002                 0.0869          9.98e-04
50           0.0026                0.1451          0.0021
100          0.0101                0.5483          0.0103
200          0.0582                2.5145          0.0540
For the second case, the digit images are chosen randomly
from the MNIST handwritten digit database. To construct the
graph we use two methods. The first generates
a complete graph whose weights are computed
using the Euclidean distance. The second uses the
KNN algorithm; the generated graph is binary and not
complete.
In both applications, we generate the normalized
Laplacian matrix and use the smoothness of low-frequency
signals. In the Laplacian method, we use the second
eigenvector since it is associated with the second lowest
eigenvalue. In the first application, the detection of the two
communities using the Laplacian (spectral clustering) is more
accurate than some classical methods such as the Reichardt or
LFK algorithms: the error rate of spectral clustering is
lower and increases more slowly with q than that of the other
two algorithms.
On the other hand, this method essentially works for
detecting two communities. In this case, it remains a very
efficient algorithm. The other two algorithms tested
(Reichardt and LFK) detect more than two communities in some
cases. Moreover, the spectral clustering algorithm
is much faster than the LFK algorithm and at least comparable
in speed to the Reichardt algorithm.
Table II compares the execution times of the different
algorithms tested.
IV. FUTURE WORKS
Our future work consists in partitioning a data set into
more than two communities in order to generalize the principle
of spectral clustering. To that end, we plan to
apply the method presented in this paper hierarchically to
a data set: by applying our algorithm m times, we will
be able to recover up to $2^m$ communities. The number of
iterations of our algorithm will be chosen by optimizing a
well-defined stability criterion.
V. CONCLUSION
This paper shows the importance of processing signals
on graphs and the advantages of using the normalized graph
Laplacian in this processing. The low frequencies of the
Laplacian indeed carry interesting information about the
structure of the graph it represents. The use of a metric to
characterize the distance between the vertices gives us
a better idea of the links between the different vertices of
the graph. These properties of the Laplacian matrix are used
in two classical applications from the machine learning literature:
community detection and handwritten digit recognition.
Spectral clustering allows us, in the first case, to detect two
unlabeled communities based only on the structure of the
graph and, in the second case, to classify one or many digits
into one or many labeled classes of training digits based on the
similarities between the training set of digits and the digits to
be classified. Spectral clustering achieves better
efficiency than some classical algorithms but remains limited
by the fact that it can detect only two communities at once.
Therefore, the study of the normalized graph Laplacian
spectrum provides us with solutions to some frequent applica-
tions. There are many other use cases that can be treated using
the graph Laplacian method and that need to be considered in
further studies.
VI. ACKNOWLEDGEMENTS
The authors would like to thank Vincent Gripon, associate
professor at Telecom Bretagne, for giving us the opportunity
to work in the field of graph signal processing and
for helping us to improve our work through his constructive
comments.
REFERENCES
[1] Ameya Agaskar and Yue M Lu. A spectral graph uncertainty principle.
Information Theory, IEEE Transactions on, 59(7):4338–4356, 2013.
[2] P Erdős and Alfréd Rényi. On the existence of a factor of degree one of
a connected random graph. Acta Mathematica Hungarica, 17(3-4):359–368,
1966.
[3] Andrea Lancichinetti, Santo Fortunato, and János Kertész. Detecting the
overlapping and hierarchical community structure in complex networks.
New Journal of Physics, 11(3):033015, 2009.
[4] Erwan Le Martelot and Chris Hankin. Fast multi-scale community
detection based on local criteria within a multi-threaded algorithm. arXiv
preprint arXiv:1301.0955, 2013.
[5] Mohammad Norouzi and David J Fleet. Cartesian k-means. In Computer
Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on,
pages 3017–3024. IEEE, 2013.
[6] Michael G Rabbat and Vincent Gripon. Towards a spectral characteriza-
tion of signals supported on small-world networks. In Acoustics, Speech
and Signal Processing (ICASSP), 2014 IEEE International Conference
on, pages 4793–4797. IEEE, 2014.
[7] Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of commu-
nity detection. Physical Review E, 74(1):016110, 2006.
[8] David Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, Pierre
Vandergheynst, et al. The emerging field of signal processing on graphs:
Extending high-dimensional data analysis to networks and other irregular
domains. Signal Processing Magazine, IEEE, 30(3):83–98, 2013.
[9] David I Shuman, Pierre Vandergheynst, and Pascal Frossard. Distributed
signal processing via chebyshev polynomial approximation. arXiv
preprint arXiv:1111.5239, 2011.
[10] Daniel A Spielman. Spectral graph theory and its applications. In
Foundations of Computer Science (FOCS), 2007 48th Annual IEEE Symposium
on, pages 29–38. IEEE, 2007.
[11] M Van Breukelen, Robert PW Duin, David MJ Tax, and JE Den Hartog.
Handwritten digit recognition by combined classifiers. Kybernetika,
34(4):381–386, 1998.