Core periphery structures exist naturally in many complex networks in the real-world like social,
economic, biological and metabolic networks. Most of the existing research efforts focus on the
identification of a meso scale structure called community structure. Core periphery structures are another
equally important meso scale property in a graph that can help to gain deeper insights about the
relationships between different nodes. In this paper, we provide a definition of core periphery structures
suitable for weighted graphs. We further score and categorize these relationships into different types based
upon the density difference between the core and periphery nodes. Next, we propose an algorithm called
CP-MKNN (Core Periphery-Mutual K Nearest Neighbors) to extract core periphery structures from
weighted graphs using a heuristic node affinity measure called Mutual K-nearest neighbors (MKNN).
Using synthetic and real-world social and biological networks, we illustrate the effectiveness of developed
core periphery structures.
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach IJECEIAES
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...csandit
The high-level contributions of this paper are as f
ollows: We modify an existing branch-and-
bound based exact algorithm (for maximum clique siz
e of an entire graph) to determine the
maximal clique size that the individual vertices in
the graph are part of. We then run this
algorithm on six real-world network graphs (ranging
from random networks to scale-free
networks) and analyze the distribution of the maxim
al clique size of the vertices in these graphs.
We observe five of the six real-world network graph
s to exhibit a Poisson-style distribution for
the maximal clique size of the vertices. We analyze
the correlation between the maximal clique
size and the clustering coefficient of the vertices
, and find these two metrics to be poorly
correlated for the real-world network graphs. Final
ly, we analyze the Assortativity index of the
vertices of the real-world network graphs and obser
ve the graphs to exhibit positive
assortativity with respect to maximal clique size a
nd negative assortativity with respect to node
degree; nevertheless, we observe the Assortativity
index of the real-world network graphs with
respect to both the maximal clique size and node de
gree to increase with decrease in the
spectral radius ratio for node degree, indicating a
positive correlation between the maximal
clique size and node degree.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Clustering in Aggregated User Profiles across Multiple Social Networks IJECEIAES
A social network is indeed an abstraction of related groups interacting amongst themselves to develop relationships. However, toanalyze any relationships and psychology behind it, clustering plays a vital role. Clustering enhances the predictability and discoveryof like mindedness amongst users. This article’s goal exploits the technique of Ensemble Kmeans clusters to extract the entities and their corresponding interestsas per the skills and location by aggregating user profiles across the multiple online social networks. The proposed ensemble clustering utilizes known K-means algorithm to improve results for the aggregated user profiles across multiple social networks. The approach produces an ensemble similarity measure and provides 70% better results than taking a fixed value of K or guessing a value of K while not altering the clustering method. This paper states that good ensembles clusters can be spawned to envisage the discoverability of a user for a particular interest.
Clustering of high dimensionality data which can be seen in almost all fields these days is becoming
very tedious process. The key disadvantage of high dimensional data which we can pen down is curse
of dimensionality. As the magnitude of datasets grows the data points become sparse and density of
area becomes less making it difficult to cluster that data which further reduces the performance of
traditional algorithms used for clustering. Semi-supervised clustering algorithms aim to improve
clustering results using limited supervision. The supervision is generally given as pair wise
constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are
designed for data represented as vectors [2]. In this paper, we unify vector-based and graph-based
approaches. We first show that a recently-proposed objective function for semi-supervised clustering
based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of
constraint penalty functions, can be expressed as a special case of the global kernel k-means objective
[3]. A recent theoretical connection between global kernel k-means and several graph clustering
objectives enables us to perform semi-supervised clustering of data. In particular, some methods have
been proposed for semi supervised clustering based on pair wise similarity or dissimilarity
information. In this paper, we propose a kernel approach for semi supervised clustering and present in
detail two special cases of this kernel approach.
This document provides an overview of several clustering algorithms. It begins by defining clustering and its importance in data mining. It then categorizes clustering algorithms into four main types: partitional, hierarchical, grid-based, and density-based. For each type, some representative algorithms are described briefly. The document also reviews several popular clustering algorithms like k-means, CLARA, PAM, CLARANS, and BIRCH in more detail. It discusses aspects like the algorithms' time complexity, types of data handled, ability to detect clusters of different shapes, required input parameters, and advantages/disadvantages. Overall, the document aims to guide selection of suitable clustering algorithms for specific applications by surveying their key characteristics.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach IJECEIAES
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
O N T HE D ISTRIBUTION OF T HE M AXIMAL C LIQUE S IZE F OR T HE V ERTICES IN ...csandit
The high-level contributions of this paper are as f
ollows: We modify an existing branch-and-
bound based exact algorithm (for maximum clique siz
e of an entire graph) to determine the
maximal clique size that the individual vertices in
the graph are part of. We then run this
algorithm on six real-world network graphs (ranging
from random networks to scale-free
networks) and analyze the distribution of the maxim
al clique size of the vertices in these graphs.
We observe five of the six real-world network graph
s to exhibit a Poisson-style distribution for
the maximal clique size of the vertices. We analyze
the correlation between the maximal clique
size and the clustering coefficient of the vertices
, and find these two metrics to be poorly
correlated for the real-world network graphs. Final
ly, we analyze the Assortativity index of the
vertices of the real-world network graphs and obser
ve the graphs to exhibit positive
assortativity with respect to maximal clique size a
nd negative assortativity with respect to node
degree; nevertheless, we observe the Assortativity
index of the real-world network graphs with
respect to both the maximal clique size and node de
gree to increase with decrease in the
spectral radius ratio for node degree, indicating a
positive correlation between the maximal
clique size and node degree.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Clustering in Aggregated User Profiles across Multiple Social Networks IJECEIAES
A social network is indeed an abstraction of related groups interacting amongst themselves to develop relationships. However, toanalyze any relationships and psychology behind it, clustering plays a vital role. Clustering enhances the predictability and discoveryof like mindedness amongst users. This article’s goal exploits the technique of Ensemble Kmeans clusters to extract the entities and their corresponding interestsas per the skills and location by aggregating user profiles across the multiple online social networks. The proposed ensemble clustering utilizes known K-means algorithm to improve results for the aggregated user profiles across multiple social networks. The approach produces an ensemble similarity measure and provides 70% better results than taking a fixed value of K or guessing a value of K while not altering the clustering method. This paper states that good ensembles clusters can be spawned to envisage the discoverability of a user for a particular interest.
Clustering of high dimensionality data which can be seen in almost all fields these days is becoming
very tedious process. The key disadvantage of high dimensional data which we can pen down is curse
of dimensionality. As the magnitude of datasets grows the data points become sparse and density of
area becomes less making it difficult to cluster that data which further reduces the performance of
traditional algorithms used for clustering. Semi-supervised clustering algorithms aim to improve
clustering results using limited supervision. The supervision is generally given as pair wise
constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are
designed for data represented as vectors [2]. In this paper, we unify vector-based and graph-based
approaches. We first show that a recently-proposed objective function for semi-supervised clustering
based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of
constraint penalty functions, can be expressed as a special case of the global kernel k-means objective
[3]. A recent theoretical connection between global kernel k-means and several graph clustering
objectives enables us to perform semi-supervised clustering of data. In particular, some methods have
been proposed for semi supervised clustering based on pair wise similarity or dissimilarity
information. In this paper, we propose a kernel approach for semi supervised clustering and present in
detail two special cases of this kernel approach.
This document provides an overview of several clustering algorithms. It begins by defining clustering and its importance in data mining. It then categorizes clustering algorithms into four main types: partitional, hierarchical, grid-based, and density-based. For each type, some representative algorithms are described briefly. The document also reviews several popular clustering algorithms like k-means, CLARA, PAM, CLARANS, and BIRCH in more detail. It discusses aspects like the algorithms' time complexity, types of data handled, ability to detect clusters of different shapes, required input parameters, and advantages/disadvantages. Overall, the document aims to guide selection of suitable clustering algorithms for specific applications by surveying their key characteristics.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS csandit
Mobility is regarded as a network transport mechanism for distributing data in many networks.
However, many mobility models ignore the fact that peer nodes often carried by people and
thus move in group pattern according to some kind of social relation. In this paper, we propose
one mobility model based on group behavior character which derives from real movement
scenario in daily life. This paper also gives the character analysis of this mobility model and
compares with the classic Random Waypoint Mobility model.
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
This document compares the k-means and grid density clustering algorithms. K-means partitions data into k clusters based on minimizing distances between points and cluster centroids. It works well with numerical data but can be affected by outliers. Grid density determines dense grids based on neighbor densities and can handle different shaped and multi-density clusters without knowing the number of clusters beforehand. It has advantages over k-means in that it can handle categorical data, noise and arbitrary shaped clusters.
Proposing a new method of image classification based on the AdaBoost deep bel...TELKOMNIKA JOURNAL
Image classification has different applications. Up to now, various algorithms have been presented
for image classification. Each of these methods has its own weaknesses and strengths. Reducing error rate
is an issue which many researches have been carried out about it. This research intends to optimize
the problem with hybrid methods and deep learning. The hybrid methods were developed to improve
the results of the single-component methods. On the other hand, a deep belief network (DBN) is a generative
probabilistic modelwith multiple layers of latent variables and is used to solve the unlabeled problems. In
fact, this method is anunsupervised method, in which all layers are one-way directed layers except for
the last layer. So far, various methods have been proposed for image classification, and the goal of this
research project was to use a combination of the AdaBoost method and the deep belief network method to
classify images. The other objective was to obtain better results than the previous results. In this project, a
combination of the deep belief network and AdaBoost method was used to boost learning and the network
potential was enhanced by making the entire network recursive. This method was tested on the MINIST
dataset and the results were indicative of a decrease in the error rate with the proposed method as compared
to the AdaBoost and deep belief network methods.
Methods of Combining Neural Networks and Genetic AlgorithmsESCOM
1. The document discusses methods for combining neural networks and genetic algorithms. It describes three main approaches: evolving connection weights, evolving network architectures, and evolving learning rules.
2. In the first approach, a genetic algorithm is used as the learning rule to optimize neural network weights. The second approach uses a genetic algorithm to select structural parameters of the network and trains the network separately. The third approach evolves parameters of the network's learning rule.
3. Combining neural networks and genetic algorithms can help compensate for each techniques' weaknesses in search, and may lead to highly successful adaptive systems by integrating learning and evolutionary search.
Review Paper on Shared and Distributed Memory Parallel Algorithms to Solve Bi...JIEMS Akkalkuwa
This document presents a review of parallel algorithms to solve big data problems in biological, social network, and spatial domains using shared and distributed memory. It discusses sequential and parallel algorithms for community detection in protein-protein interaction networks and social networks. It also discusses techniques for processing and analyzing large LiDAR point cloud data for applications like forest monitoring and 3D modeling. The document reviews relevant literature on algorithms for community detection, network partitioning, and LiDAR data reduction and interpolation. It then describes the BLLP algorithm for community detection in biological networks and discusses how it could be extended to distributed memory systems.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In network aggregation techniques for wireless sensor networks - a surveyGungi Achi
This document provides a comprehensive survey of in-network aggregation techniques for wireless sensor networks. It begins by defining in-network aggregation and classifying approaches into those with and without data size reduction. It identifies the key components of in-network aggregation as routing protocols, aggregation functions, and data representation. It reviews theoretical limits of aggregation and discusses open issues. The document aims to provide an updated view of in-network aggregation and motivate future research in this area.
This document summarizes a research paper on developing an improved LEACH (Low-Energy Adaptive Clustering Hierarchy) communication protocol for energy efficient data mining in multi-feature sensor networks. It begins with background on wireless sensor networks and issues like energy efficiency. It then discusses the existing LEACH protocol and its drawbacks. The proposed improved LEACH protocol includes cluster heads, sub-cluster heads, and cluster nodes to address LEACH's limitations. This new version aims to minimize energy consumption during cluster formation and data aggregation in multi-feature sensor networks.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
This document summarizes a research paper that proposes a new density-based clustering technique called Triangle-Density Based Clustering Technique (TDCT) to efficiently cluster large spatial datasets. TDCT uses a polygon approach where the number of data points inside each triangle of a polygon is calculated to determine triangle densities. Triangle densities are used to identify clusters based on a density confidence threshold. The technique aims to identify clusters of arbitrary shapes and densities while minimizing computational costs. Experimental results demonstrate the technique's superiority in terms of cluster quality and complexity compared to other density-based clustering algorithms.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
Influence of priors over multityped object in evolutionary clusteringcsandit
In recent years, Evolutionary clustering is an evolving research area in data mining. The
evolution diagnosis of any homogeneous as well as heterogeneous network will provide an
overall view about the network. Applications of evolutionary clustering includes, analyzing, the
social networks, information networks, about their structure, properties and behaviors. In this
paper, the authors study the problem of influence of priors over multi-typed object in
evolutionary clustering. Priors are defined for each type of object in a heterogeneous
information network and experimental results were produced to show how consistency and
quality of cluster changes according to the priors.
This document describes a new method for detecting community structure in complex networks based on node similarity. The method works as follows:
1. It calculates the similarity between all node pairs using a local node similarity metric.
2. It treats each node as its own community initially. Then it iteratively incorporates the community of the current node with the communities containing its most similar nodes.
3. It selects the most similar uncovered node as the next current node, and repeats the process until all nodes have been incorporated into communities.
The method requires only local network information and has a computational complexity of O(nk) for a network with n nodes and average degree k. It is evaluated on real and computer-generated networks, demonstrating
The document proposes an improved clustering algorithm for social network analysis. It combines BSP (Business System Planning) clustering with Principal Component Analysis (PCA) to group social network objects into classes based on their links and attributes. Specifically, it applies PCA before BSP clustering to reduce the dimensionality of the social network data and retain only the most important variables for clustering. This improves the BSP clustering results by focusing on the key information in the social network.
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...IOSR Journals
This document discusses link prediction in social networks. It analyzes shortcomings of existing leading link prediction methods like common neighbor. It then proposes a modified common neighbor approach that takes into account both topological network structure and node similarities based on features. The approach generates a weight for each link based on the number of common features between nodes, divided by the total number of features. It then calculates a contribution score for each common neighbor by multiplying the weights of that neighbor's links to the two nodes. Experimental results on co-authorship networks show the modified common neighbor approach outperforms existing methods.
This document discusses predicting new friendships in social networks using temporal information. It describes research on predicting new links in social networks over time using supervised learning models trained on temporal features from past network interactions. The researchers used anonymized Facebook data over 28 months to train decision tree and neural network classifiers to predict new relationships, finding models using temporal information performed better than those without it.
A Survey On Link Prediction In Social NetworksApril Smith
This document summarizes a survey on link prediction in social networks. It discusses how link prediction can be framed as a binary classification problem to predict whether a link will exist between two nodes in the future. It presents a framework for link prediction that uses topological features of the network, applies proximity measures between nodes, and uses supervised learning algorithms. The document discusses several papers on link prediction approaches, including initial clustering of related nodes before prediction and using a generalized clustering coefficient feature without explicit clustering.
An information-theoretic, all-scales approach to comparing networksJim Bagrow
My presentation at NetSci 2018 on Portrait Divergence, a new approach to comparing networks that is simple, general-purpose, and easy to interpret.
The preprint: https://arxiv.org/abs/1804.03665
The code: https://github.com/bagrow/portrait-divergence
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS csandit
Mobility is regarded as a network transport mechanism for distributing data in many networks.
However, many mobility models ignore the fact that peer nodes often carried by people and
thus move in group pattern according to some kind of social relation. In this paper, we propose
one mobility model based on group behavior character which derives from real movement
scenario in daily life. This paper also gives the character analysis of this mobility model and
compares with the classic Random Waypoint Mobility model.
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
This document compares the k-means and grid density clustering algorithms. K-means partitions data into k clusters based on minimizing distances between points and cluster centroids. It works well with numerical data but can be affected by outliers. Grid density determines dense grids based on neighbor densities and can handle different shaped and multi-density clusters without knowing the number of clusters beforehand. It has advantages over k-means in that it can handle categorical data, noise and arbitrary shaped clusters.
Proposing a new method of image classification based on the AdaBoost deep bel...TELKOMNIKA JOURNAL
Image classification has different applications. Up to now, various algorithms have been presented
for image classification. Each of these methods has its own weaknesses and strengths. Reducing error rate
is an issue which many researches have been carried out about it. This research intends to optimize
the problem with hybrid methods and deep learning. The hybrid methods were developed to improve
the results of the single-component methods. On the other hand, a deep belief network (DBN) is a generative
probabilistic modelwith multiple layers of latent variables and is used to solve the unlabeled problems. In
fact, this method is anunsupervised method, in which all layers are one-way directed layers except for
the last layer. So far, various methods have been proposed for image classification, and the goal of this
research project was to use a combination of the AdaBoost method and the deep belief network method to
classify images. The other objective was to obtain better results than the previous results. In this project, a
combination of the deep belief network and AdaBoost method was used to boost learning and the network
potential was enhanced by making the entire network recursive. This method was tested on the MINIST
dataset and the results were indicative of a decrease in the error rate with the proposed method as compared
to the AdaBoost and deep belief network methods.
Methods of Combining Neural Networks and Genetic AlgorithmsESCOM
1. The document discusses methods for combining neural networks and genetic algorithms. It describes three main approaches: evolving connection weights, evolving network architectures, and evolving learning rules.
2. In the first approach, a genetic algorithm is used as the learning rule to optimize neural network weights. The second approach uses a genetic algorithm to select structural parameters of the network and trains the network separately. The third approach evolves parameters of the network's learning rule.
3. Combining neural networks and genetic algorithms can help compensate for each techniques' weaknesses in search, and may lead to highly successful adaptive systems by integrating learning and evolutionary search.
Review Paper on Shared and Distributed Memory Parallel Algorithms to Solve Bi...JIEMS Akkalkuwa
This document presents a review of parallel algorithms to solve big data problems in biological, social network, and spatial domains using shared and distributed memory. It discusses sequential and parallel algorithms for community detection in protein-protein interaction networks and social networks. It also discusses techniques for processing and analyzing large LiDAR point cloud data for applications like forest monitoring and 3D modeling. The document reviews relevant literature on algorithms for community detection, network partitioning, and LiDAR data reduction and interpolation. It then describes the BLLP algorithm for community detection in biological networks and discusses how it could be extended to distributed memory systems.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In network aggregation techniques for wireless sensor networks - a surveyGungi Achi
This document provides a comprehensive survey of in-network aggregation techniques for wireless sensor networks. It begins by defining in-network aggregation and classifying approaches into those with and without data size reduction. It identifies the key components of in-network aggregation as routing protocols, aggregation functions, and data representation. It reviews theoretical limits of aggregation and discusses open issues. The document aims to provide an updated view of in-network aggregation and motivate future research in this area.
This document summarizes a research paper on developing an improved LEACH (Low-Energy Adaptive Clustering Hierarchy) communication protocol for energy efficient data mining in multi-feature sensor networks. It begins with background on wireless sensor networks and issues like energy efficiency. It then discusses the existing LEACH protocol and its drawbacks. The proposed improved LEACH protocol includes cluster heads, sub-cluster heads, and cluster nodes to address LEACH's limitations. This new version aims to minimize energy consumption during cluster formation and data aggregation in multi-feature sensor networks.
The project re-implements the architecture of the paper Reasoning with Neural Tensor Networks for Knowledge Base Completion in Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.
Below are some links for further details:
https://github.com/agarwal-shubham/Reasoning-Over-Knowledge-Base
http://darsh510.github.io/IREPROJ/
This document summarizes a research paper that proposes a new density-based clustering technique called Triangle-Density Based Clustering Technique (TDCT) to efficiently cluster large spatial datasets. TDCT uses a polygon approach where the number of data points inside each triangle of a polygon is calculated to determine triangle densities. Triangle densities are used to identify clusters based on a density confidence threshold. The technique aims to identify clusters of arbitrary shapes and densities while minimizing computational costs. Experimental results demonstrate the technique's superiority in terms of cluster quality and complexity compared to other density-based clustering algorithms.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
Influence of priors over multityped object in evolutionary clusteringcsandit
In recent years, Evolutionary clustering is an evolving research area in data mining. The
evolution diagnosis of any homogeneous as well as heterogeneous network will provide an
overall view about the network. Applications of evolutionary clustering includes, analyzing, the
social networks, information networks, about their structure, properties and behaviors. In this
paper, the authors study the problem of influence of priors over multi-typed object in
evolutionary clustering. Priors are defined for each type of object in a heterogeneous
information network and experimental results were produced to show how consistency and
quality of cluster changes according to the priors.
This document describes a new method for detecting community structure in complex networks based on node similarity. The method works as follows:
1. It calculates the similarity between all node pairs using a local node similarity metric.
2. It treats each node as its own community initially. Then it iteratively incorporates the community of the current node with the communities containing its most similar nodes.
3. It selects the most similar uncovered node as the next current node, and repeats the process until all nodes have been incorporated into communities.
The method requires only local network information and has a computational complexity of O(nk) for a network with n nodes and average degree k. It is evaluated on real and computer-generated networks, demonstrating
The document proposes an improved clustering algorithm for social network analysis. It combines BSP (Business System Planning) clustering with Principal Component Analysis (PCA) to group social network objects into classes based on their links and attributes. Specifically, it applies PCA before BSP clustering to reduce the dimensionality of the social network data and retain only the most important variables for clustering. This improves the BSP clustering results by focusing on the key information in the social network.
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...IOSR Journals
This document discusses link prediction in social networks. It analyzes shortcomings of existing leading link prediction methods like common neighbor. It then proposes a modified common neighbor approach that takes into account both topological network structure and node similarities based on features. The approach generates a weight for each link based on the number of common features between nodes, divided by the total number of features. It then calculates a contribution score for each common neighbor by multiplying the weights of that neighbor's links to the two nodes. Experimental results on co-authorship networks show the modified common neighbor approach outperforms existing methods.
This document discusses predicting new friendships in social networks using temporal information. It describes research on predicting new links in social networks over time using supervised learning models trained on temporal features from past network interactions. The researchers used anonymized Facebook data over 28 months to train decision tree and neural network classifiers to predict new relationships, finding models using temporal information performed better than those without it.
A Survey On Link Prediction In Social NetworksApril Smith
This document summarizes a survey on link prediction in social networks. It discusses how link prediction can be framed as a binary classification problem to predict whether a link will exist between two nodes in the future. It presents a framework for link prediction that uses topological features of the network, applies proximity measures between nodes, and uses supervised learning algorithms. The document discusses several papers on link prediction approaches, including initial clustering of related nodes before prediction and using a generalized clustering coefficient feature without explicit clustering.
An information-theoretic, all-scales approach to comparing networksJim Bagrow
My presentation at NetSci 2018 on Portrait Divergence, a new approach to comparing networks that is simple, general-purpose, and easy to interpret.
The preprint: https://arxiv.org/abs/1804.03665
The code: https://github.com/bagrow/portrait-divergence
Community detection of political blogs network based on structure-attribute g...IJECEIAES
Complex networks provide means to represent different kinds of networks with multiple features. Most biological, sensor and social networks can be represented as a graph depending on the pattern of connections among their elements. The goal of the graph clustering is to divide a large graph into many clusters based on various similarity criteria’s. Political blogs as standard social dataset network, in which it can be considered as blog-blog connection, where each node has political learning beside other attributes. The main objective of work is to introduce a graph clustering method in social network analysis. The proposed Structure-Attribute Similarity (SAS-Cluster) able to detect structures of community, based on nodes similarities. The method combines topological structure with multiple characteristics of nodes, to earn the ultimate similarity. The proposed method is evaluated using well-known evaluation measures, Density, and Entropy. Finally, the presented method was compared with the state-of-art comparative method, and the results show that the proposed method is superior to the comparative method according to the evaluations measures.
Distribution of maximal clique size of theIJCNCJournal
Our primary objective in this paper is to study the distribution of the maximal clique size of the vertices in complex networks. We define the maximal clique size for a vertex as the maximum size of the clique that the vertex is part of and such a clique need not be the maximum size clique for the entire network. We determine the maximal clique size of the vertices using a modified version of a branch-and-bound based exact algorithm that has been originally proposed to determine the maximum size clique for an entire network graph. We then run this algorithm on two categories of complex networks: One category of networks capture the evolution of small-world networks from regular network (according to the well-known Watts-Strogatz model) and their subsequent evolution to random networks; we show that the distribution of
the maximal clique size of the vertices follows a Poisson-style distribution at different stages of the evolution of the small-world network to a random network; on the other hand, the maximal clique size of the vertices is observed to be in-variant and to be very close to that of the maximum clique size for the entire network graph as the regular network is transformed to a small-world network. The second category
of complex networks studied are real-world networks (ranging from random networks to scale-free networks) and we observe the maximal clique size of the vertices in five of the six real-world networks to follow a Poisson-style distribution. In addition to the above case studies, we also analyze the correlation between the maximal clique size and clustering coefficient as well as analyze the assortativity index of the
vertices with respect to maximal clique size and node degree.
Community Detection in Networks Using Page Rank Vectors ijbbjournal
Nodes in the real world networks organize in the form of network communities. A community (also referred
to as module or cluster)is defined as where the links are denser inside the nodes and sparser outside the
nodes in the network. Communities in the networks also overlap because the nodes may belong to different
clusters at once. The task of detecting communities in networks becomes an open problem because of lack
of reliable algorithms. In practice all the existing community detection methods work good for nonoverlapping
communities and fail to detect communities with dense overlaps. We developed a novel method
for detecting communities by considering a single seed node. This method successfully captures the
overlapping networks ranging from social to information and from biological to citation networks. We
believe that the proposed system works well for the overlapping communities.
Community Detection in Networks Using Page Rank Vectors ijbbjournal
Nodes in the real world networks organize in the form of network communities. A community (also referred
to as module or cluster)is defined as where the links are denser inside the nodes and sparser outside the
nodes in the network. Communities in the networks also overlap because the nodes may belong to different
clusters at once. The task of detecting communities in networks becomes an open problem because of lack
of reliable algorithms. In practice all the existing community detection methods work good for nonoverlapping
communities and fail to detect communities with dense overlaps. We developed a novel method
for detecting communities by considering a single seed node. This method successfully captures the
overlapping networks ranging from social to information and from biological to citation networks. We
believe that the proposed system works well for the overlapping communities.
The study about the analysis of responsiveness pair clustering tosocial netwo...acijjournal
In this study, regional (cities, towns and villages
) data and tweet data are obtained from Twitter, an
d
extract information of "purchase information (Where
and what bought)" from the tweet data by
morphological analysis and rule-based dependency an
alysis. Then, the "The regional information" and th
e
"Theinformation of purchase history (Where and wha
t bought information)" are captured as bipartite
graph, and Responsiveness Pair Clustering analysis
(a clustering using correspondence analysis as
similarity measure) is conducted. In this study, si
nce it was found to be difficult to analyze a netwo
rk such
as bipartite graph having limitations in links by u
sing modularity Q, responsiveness is used instead o
f
modularity Q as similarity measure. As a result of
this analysis, "regional information cluster" which
refers
to similar "Theinformation of purchase history" nod
es group is generated. Finally, similar regions are
visualized by mapping the regional information clus
ter on the map. This visualization system is expect
ed to
contribute as an analytical tool for customers’ pur
chasing behaviour and so on.
Sub-Graph Finding Information over Nebula Networksijceronline
Social and information networks have been extensively studied over years. This paper studies a new query on sub graph search on heterogeneous networks. Given an uncertain network of N objects, where each object is associated with a network to an underlying critical problem of discovering, top-k sub graphs of entities with rare and surprising associations returns k objects such that the expected matching sub graph queries efficiently involves, Compute all matching sub graphs which satisfy "Nebula computing requests" and this query is useful in ranking such results based on the rarity and the interestingness of the associations among nebula requests in the sub graphs. "In evaluating Top k-selection queries, "we compute information nebula using a global structural context similarity, and our similarity measure is independent of connection sub graphs". We need to compute the previous work on the matching problem can be harnessed for expected best for a naive ranking after matching for large graphs. Top k-selection sets and search for the optimal selection set with the large graphs; sub graphs may have enormous number of matches. In this paper, we identify several important properties of top-k selection queries, We propose novel top–K mechanisms to exploit these indexes for answering interesting sub graph queries efficiently.
SEARCH OF INFORMATION BASED CONTENT IN SEMI-STRUCTURED DOCUMENTS USING INTERF...ijcsitcejournal
This paper proposes a semi-structured information retrieval model based on a new method for calculation
of similarity. We have developed CASISS (Calculation of Similarity of Semi-Structured documents)
method to quantify how two given texts are similar. This new method identifies elements of semi-structured
documents using elements descriptors. Each semi-structured document is pre-processed before the
extraction of a set of descriptors for each element, which characterize the contents of elements.It can be
used to increase the accuracy of the information retrieval process by taking into account not only the
presence of query terms in the given document but also the topology (position continuity) of these terms.
National Resource for Networks Biology's TR&D Theme 3: Although networks have been very useful for representing molecular interactions and mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale hierarchy of components – proteins are subunits of protein complexes which, in turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this technology research project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale views of cell structure and function.
1. The document discusses a proposed technique called Fuzzy Based Improved Mutual Friend Crawling (Fmfc) for crawling online social networks. It aims to reduce bias introduced by the time taken for crawling the whole network.
2. The technique crawls all users within the same community first before moving to the next community, allowing researchers to selectively obtain users belonging to the same community. This is compared to existing mutual friend crawling.
3. The paper also provides a literature review of existing crawling techniques and studies of complex network properties relevant to community detection in networks. Future work in overlapping communities and performance evaluation on very large networks is discussed.
The document summarizes a thesis presentation on graph clustering and community detection algorithms. It discusses:
1. The main contributions of the thesis, which include improving existing clustering algorithms and applying the new algorithms to synthetic and real-world networks.
2. An outline of the presentation covering introduction, literature review, proposed algorithms, experimentation on graphs, and applications.
3. Critical observations on existing clustering algorithms related to scalability, accuracy, and memory requirements.
4. The objectives of investigating improved clustering techniques to address issues in existing methods and apply the new algorithms to complex networks.
In this paper the design of an experiment is presented. An experiment was designed to select relevant and not redundant features or characterization functions, which allow quantitatively discriminating among different types of complex networks. As well there exist researchers given to the task of classifying some networks of the real world through characterization functions inside a type of complex network, they do not give enough evidences of detailed analysis of the functions that allow to determine if all of them are necessary to carry out an efficient discrimination or which are better functions for discriminating. Our results show that with a reduced number of characterization functions such as the degree dispersion coefficient can discriminate efficiently among the types of complex networks treated here.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGcscpconf
In recent years, Evolutionary clustering is an evolving research area in data mining. The evolution diagnosis of any homogeneous as well as heterogeneous network will provide an overall view about the network. Applications of evolutionary clustering includes, analyzing, the social networks, information networks, about their structure, properties and behaviors. In this paper, the authors study the problem of influence of priors over multi-typed object in
evolutionary clustering. Priors are defined for each type of object in a heterogeneous information network and experimental results were produced to show how consistency and quality of cluster changes according to the priors
New Similarity Index for Finding Followers in Leaders Based Community DetectionIRJET Journal
This document presents a new similarity index method for improving the accuracy of leader-based community detection in large networks. Leader-based community detection identifies influential leader nodes and assigns other nodes as followers based on their similarity to the leaders. The existing method's accuracy decreases as the network size increases. The proposed method modifies the similarity calculation to increase accuracy for networks with more than 2000 nodes. It was tested on synthetic networks generated by the LFR benchmark model and evaluated using normalized mutual information and adjusted rand index, showing accuracy remains high even for large networks.
Similar to GRAPH ALGORITHM TO FIND CORE PERIPHERY STRUCTURES USING MUTUAL K-NEAREST NEIGHBORS (20)
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
Technical Drawings introduction to drawing of prisms
GRAPH ALGORITHM TO FIND CORE PERIPHERY STRUCTURES USING MUTUAL K-NEAREST NEIGHBORS
1. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
DOI: 10.5121/ijaia.2021.12101 1
GRAPH ALGORITHM TO FIND CORE
PERIPHERY STRUCTURES USING MUTUAL
K-NEAREST NEIGHBORS
Divya Sardana1
and Raj Bhatnagar2
1Teradata Corp., Santa Clara, CA, USA
2
Department of Electrical Engineering and Computer Science,
University of Cincinnati, OH, USA
ABSTRACT
Core periphery structures exist naturally in many complex networks in the real-world like social,
economic, biological and metabolic networks. Most of the existing research efforts focus on the
identification of a meso scale structure called community structure. Core periphery structures are another
equally important meso scale property in a graph that can help to gain deeper insights about the
relationships between different nodes. In this paper, we provide a definition of core periphery structures
suitable for weighted graphs. We further score and categorize these relationships into different types based
upon the density difference between the core and periphery nodes. Next, we propose an algorithm called
CP-MKNN (Core Periphery-Mutual K Nearest Neighbors) to extract core periphery structures from
weighted graphs using a heuristic node affinity measure called Mutual K-nearest neighbors (MKNN).
Using synthetic and real-world social and biological networks, we illustrate the effectiveness of developed
core periphery structures.
KEYWORDS
Graph Mining, Core Periphery Structures, MKNN, Complex Networks.
1. INTRODUCTION
Weighted complex networks have non-trivial topological properties as well as non-homogeneous
edge weights [1]. Meso-scale structures help in studying the features of complex networks which
are not easily evident at the local or global scale, e.g., community structure. Most algorithms for
finding community structure in graphs result in disjoint subsets or clusters. However, in many
complex networks, nodes play more than one role in the network, for example multi-functional
proteins in protein-protein interaction (PPI) networks and people belonging to multiple social
groups in a social network. Overlapping or fuzzy clustering algorithms result in multiple cluster
assignments to nodes thereby helping to study relationships among clusters in the network. For
e.g., in a co-authorship network, overlapping communities would represent authors who
collaborate in multiple groups. Many overlapping clustering algorithms exist in literature that
work for complex networks [2], [3], [4], [5], [6], [7], [8].
Another meso-scale structure called core periphery is also important in understanding complex
networks. However, its research literature is not as wide spread as that exists for community
structure. In [9], the authors illustrated that overlapping clusters naturally lead to the existence of
dense cores surrounded by sparse peripheries in many complex networks. In [10], the authors use
real world Twitter data to provide empirical evidence that community structure is always
accompanied by a core periphery structure in a social network.
2. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
2
The analysis of core periphery structures can be very useful in developing insights in many real-
world networks. For example, in a social network, a dense core group of close friends is usually
surrounded by a sparsely connected group of periphery acquaintances. A study of core periphery
structures in such a network can help in identifying how trends spread across the network of
friends. Similarly, in a protein-protein interaction (PPI) network, a dense core could be seen as a
cohesively connected set of proteins surrounded by a loosely connected set of periphery proteins
which dynamically connect to more than one core set depending upon their function.
Borgatti and Everett were the first to describe a core periphery structure as a dense and
cohesively connected core which is surrounded by a less dense periphery with loose connections
to it. [11]. Traditionally, the notion of core periphery structures has been used in diverse fields
such as world systems [12], economics [13], organization studies [14] and social networks [15],
[16]. Core periphery structures have been employed for the analysis of biological networks such
as protein-protein interaction networks [17] and work groups [18]. With the accumulation of Big
data in various real-world domains, the usefulness of core periphery structures has increased even
more. In a recent paper [19], the authors perform an analysis of cores in a series of big crime data
so as to characterize the modus operandi of criminals across the crime series.
In weighted graphs, both, the topological structure of nodes as well as non-trivial edge weight
properties in the graph contribute towards density. Thus, a core periphery structure in a weighted
graph can have a dense core as determined by high edge weight density or high structural density
or both. We call this measure of density as SE-density. In figure 1, we demonstrate regions of
varying density in a weighted graph. The subset of nodes (R,S,T,U,V) form a clique and is
structurally very dense with low edge weight density (or low mean of edge similarities). On the
other hand, the subset of nodes (A, B, C, D, E, F, G) form a subset with high edge weight density
amongst each other with low structural density.
Figure 1. Synthetic Dataset 1
In this paper, we formally define core periphery structures that exist in weighted graphs. Next, we
build a graph algorithm called Core Periphery-MKNN (CP-MKNN) which identifies core
periphery structures in a weighted graph using a density based heuristic measure called MKNN
(Mutual K-nearest neighbors). We further score and categorize the peripheries based upon the
strength of their connection to the cores. The algorithm also identifies sibling relations between
core-periphery structures which are connected together by loose periphery-periphery connections.
We provide a comparison of the properties of core peripheries identified by CP-MKNN with
another recently developed core periphery algorithm called ClusterONE-CP [20]. Through the
use of synthetic and real world weighted social and PPI networks, we illustrate that CP-MKNN is
3. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
3
able to find denser cores than ClusterONE-CP as per structure or edge weight density. Lastly, we
use a motivating real-world example in protein-protein interaction networks to express the
usefulness of found core-periphery structures.
Next, we provide an outline of the paper. In section 2, we discuss the related work in the field of
core periphery structures. We present our definition of core periphery structures in weighted
graphs along with other relevant terms in section 3. We provide our methodology to find core
periphery structures for a weighted graph in section 4. In section 5, we present an extensive
experimental evaluation of our algorithm on two synthetic and real world social and PPI
networks. In section 6, we conclude our findings.
2. RELATED WORK
While the notion of core-periphery structures has been in use for quite a while, it was Borgatti
and Everett, who formalized the first model for finding core-periphery structures in a network in
the late nineties [11]. They proposed a quality function comparing the network to an ideal model
of core-periphery structures where nodes are connected to each other only if they have core
membership. This algorithm is limited to the division of the whole graph into one core and one
periphery. This model was extended by the authors in their following paper to identify more than
one core and periphery structure. [21]. Another algorithm [22] based on the same idea was built
using a more flexible model to assign a core score to each node.
A Kernighan-Lin algorithm-based methodology was proposed by Boyd et al. [23] to find core
periphery structures in a social network. An enhanced version of this algorithm was built by Luo
et al. [17] to identify k-plex cores in PPI networks. The idea of random walks has been used in
[24] to assign a coreness score to nodes in a network as a centrality measure. An overlapping
tiles-based model was developed by Yang and Leskovec [9] to identify overlapping communities
and core periphery structures in a network. In [25], Bruckner et al. developed a core periphery
identification technique for PPI networks using graph modification. Sardana et al. [20] proposed
a core periphery algorithm called ClusterONE-CP based on a greedy growth based overlapping
graph clustering algorithm called ClusterONE. de Jeude et al. [26] developed a p-value based
method using multinomial hypergeometric distribution to assign a surprise like score to network
partitions and categorize them into cores and peripheries. Three methodologies for find core
periphery structures in directed graphs were described in [27]. Two of them are based upon the
well know link analysis algorithm called Hyperlink-Induced Topic Search (HITS) [28] and one of
them is based upon the concept of likelihood maximization.
The concept of MKNN heuristic was first coined in for use in data clustering in [29]. Sardana et
al. utilized this heuristic for finding density-based clusters in a weighted graph [30]. In this paper,
we use this heuristic measure to find core periphery structures in weighted graphs. Further, the
work in this publication is a part of this paper’s first author’s PhD dissertation [31].
3. BACKGROUND
In a weighted graph, we formalize a core periphery structure as a dense, cohesive core set C,
surrounded by a loosely connected periphery set P. The nodes in a core periphery structure are
identified by the union (C ∪ P). The core set C is denser than the periphery set P as per structural
density or edge weight density. We further categorize the peripheries into two types: 1 and 2
depending upon the nature of density difference between the core and periphery. These two types
of peripheries are defined in detail next.
4. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
4
1) Periphery Type 1: A periphery of type 1 is less dense than its core as per structure density
measured using a term called Cohesion as defined below.
Definition 3.1. [Cohesion, η] For a set S, cohesion empirically captures the degree of intensity by
which the nodes are structurally connected to each other. Mathematically, we define cohesion as
follows.
Here Insim(S) corresponds to the sum of internal similarity weights for the set S, and Cut(S)
represents the sum of edge weights connecting S with rest of the clusters. Formally, we define
periphery P to be of type 1 for its core C if Cohesion(C) > Cohesion (C ∪ E ∪ P) where E is the
edge set connecting C and P. Figure 2 demonstrates a core periphery structure of type 1 with a
core of blue nodes having a higher structural density than its type 1 periphery of red nodes.
2) Periphery Type 2: A periphery type 2 is less dense than its core set as per edge weight density
measured using the mean of edge weights (μ) of the edges belonging to a cluster. In other words,
a periphery P connected to its core C using edge set E is labeled as type 2 if μ(Edge weights
belonging to C) − μ(Edge weights belonging to (P U E)) > m (MEAN OFFSET). Here m or
MEAN OFFSET is a user defined parameter with a default value of 0.2. Figure 2 demonstrates a
core periphery structure of type 2 with a center core of blue nodes having a higher edge weight
density than its type 2 periphery represented by green nodes. Note that the width of the edges has
been drawn proportional to their edge weights.
Figure 2. Example of two types of Core Periphery Structures in a weighted graph. The center blue core has
two peripheries: Red (Type 1) and Green (Type 2).
We further assign a score or a distance to each core periphery structure based upon the density
difference between its core and periphery constituents. On similar lines, a score based upon
domain knowledge can also be constructed to make the core periphery relationships more
meaningful. We construct an example core periphery score, called CPScore and an example core
periphery distance called CPDistance as defined below. Let E be the edge set connecting the core
and periphery sets.
Definition 3.2. [CPScore] CPScore for a core periphery structure is defined as the number of
actual edges in the set C ∪ E ∪ P divided by the number of all possible edges in the set C ∪ E ∪
P. Intuitively, this score captures the structural density of the hypothetical cluster formed by
merging the core C and its periphery P. A higher value of CP score is an indicator of high
structural density between core and periphery. Formally, we calculate CPScore is as below.
5. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
5
Definition 3.3. [CPDistance] CPDistance is defined as the number of standard deviations () that
the periphery edges are away from the mean () of core edges. It gives a measure of edge weight
density difference between the core and periphery. A lower value of CPDistance indicates a
strong edge weight density similarity between the core and periphery. Formally, we calculate
CPDistance as below.
In the above formula, win corresponds to the edge weights of edges inside the cluster. The core
periphery types and scores as defined above can be very useful in real-world networks to rank the
strength of connection from cores to peripheries. For e.g., in a co-authorship network the
CPScore and CPDistance can be used to prioritize potential co-authors for a core group based
upon the quantity and quality of shared publications with its periphery group of authors.
4. METHODOLOGY
Consider an undirected, weighted graph, G = (V, E) with V as the vertex set and E as the edge
set. The edge weights are represented by a similarity matrix (SM). Each element of SM contains
a non-negative real number representing similarity between two vertices between 0 and 1. The
CP-MKNN algorithm is performed in three phases as described next.
4.1. Initialization Phase
The initialization phase involves two steps. First, it expands G to form an augmented graph Ga
where secondary similarities are defined between all nodes up to four hops away. This is done by
applying Dijkstra’s algorithm in the four-hop neighborhood of each node. Second, a Mutual K-
nearest neighbor (MKNN) matrix is generated using the similarities in Ga. MKNN is a node
affinity measure as defined in [28] for use in graph clustering. As opposed to the classic KNN
algorithm, MKNN is based on a two-sided relationship between vertices. We explain the concept
of MKNN for K=4 using figure 3. Node G has 4 KNNs, namely, nodes A, B, D and E. However,
nodes A, B, D and E already have found K=4 KNNs amongst the red nodes. MKNN defines a
mutual relationship between two nodes, therefore, G forms an MKNN relationship with the next
available node F with less than K=4 neighbors. Intuitively, for relatively low values of K, MKNN
is able to capture set of nodes at approximately the same level of edge-weight based density. We
use this heuristic to guide our core periphery formation algorithm.
Figure 3. Example of Mutual K-nearest neighbors in a weighted graph
6. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
6
4.2. Phase 1 (Preliminary Phase)
The preliminary phase involves growing small sized dense subgraphs by merging some seed
nodes with their MKNN neighbors. In this phase we call each node as an i-node. The i-nodes are
ranked in a non-increasing order of density around them using a term called radius defined next.
Given an i-node ini with similarities to its p MKNNs being (sm1, sm2...smp). Let the degrees of
MKNN nodes be (d1, d2, ...dp). We define radius, σ for ini as follows.
The average degree of a graph’s i-nodes corresponds to davg. Based upon this formulation, i-nodes
having a higher radius have denser surroundings and get a chance to merge with their MKNNs
earlier than other i-nodes. We call this merging order defined by radius as the merge initiator
order.
In phase 1, i-nodes merge with their MKNN neighbors in the merge initiator order. If in this
process, some neighbors are found to already belong to some other subgroup, then this neighbor
is merged with the initiator where it leads to the least increase in standard deviation of edges
inside the subgroup. This process continues until all initiators are exhausted. Figure 4 illustrates
the small subsets formed after phase 1 for synthetic dataset 1. The pseudocode for this phase is
given in algorithm 1.
4.3. Phase 2 (Merge and Boundary Formation Phase)
In the preliminary phase, i-nodes merge with their MKNNs to generate small subgroups, which
we denote as a c-node in phase 2. In the merge phase, (1) pairs of c-nodes with high level of
similarity between them are merged with each other, and (2) boundary c-node sets are generated
for each c-node which will later help in the formation of core-periphery structures. A connectivity
matrix (CM) is constructed to define a notion of similarity among pair of c-nodes. We formulate
it as below.
Here linkage (cni, cnj) is calculated as the sum of primary similarities of edges that link cni’s i-
nodes and cnj’s i-nodes. Further, cut(cni) is used to capture the primary similarity summation
Figure 4. Dense subsets formed after Phase 1 for Synthetic Dataset 1
7. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
7
between cni’s i-nodes and i-nodes belonging to other graph c-nodes. Intuitively, CM captures the
relative strength of connection between two c-nodes as compared to their connections with all
other graph c-nodes. Next, using CM values as a measure of similarity, MKNN relationships are
defined among c-nodes. This helps to avoid considering every pair of c-nodes in CM for possible
merging. For example, in figure 4, the core (C, D, E, F, G) forms MKNN neighbors as (A, B),
(AN, AP) and (J, H). However, all c-nodes which are MKNN are not best to merge with each
other. Before two c-nodes are merged, we ensure that they both are synchronized in terms of
structural density and edge-weight density. This is done by using the two checks below for
validating every merge.
1) Check for structural density: Two c-nodes pass the structural similarity check if the
Cohesion of the prospective merged c-node is greater than the Cohesion of the initiator c-
node.
2) Check for edge-weight density: Two c-nodes pass the edge weight density check if the
mean difference of their edge weights is within a user defined parameter called MEAN
OFFSET, m. This ensures that the two c-nodes are homogeneous in terms of edge
weights and the variance of the merged cluster remains low.
The following rules are established to decide about merging and boundary set formation.
1) If c-nodei and its MKNN c-nodej satisfy both constraint A and B, then the two c-nodes
merge with each other to form a bigger sized core if they are cores, or are peripheries
belonging to the same set of cores. Otherwise, the two c-nodes
8. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
8
form a weak periphery-periphery relationship. In this case, c-nodej is put in boundary
node set pp (periphery-periphery) of c-nodei
2) If c-nodei and its MKNN c-nodej satisfy only the structural density constraint A, c-nodej is put
in boundary nodeset low of c-nodei.
3) If c-nodei and its MKNN c-nodej satisfy only the edge weight density constraint B, c-nodej is
put in boundary nodeset inrange of c-nodei.
4) If c-nodei and its MKNN c-nodej do not satisfy any of the constraint A or B, c-nodej is put in
boundary nodeset low of c-nodei if its mean is lower than the mean of c-nodei, or else in
boundary nodeset high of c-nodei.
As an example, in figure 4, the red core (C, D, E, F, G) merges with its MKNN neighbor (A, B),
whereas, it puts its MKNN subsets (AP, AN) and (J, H) in its boundary sets. The subsets (AP,
AN) and (J, H) further merge with their own MKNN neighbors to grow in size. The pseudocode
for phase 2 is given in algorithm 2.
9. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
9
4.4. Phase 3 (Core Periphery Structure Formation Phase)
After the merge phase completes, it leads to the formation of dense c-nodes and their
corresponding boundary c-node sets, low, inrange, high and pp. The boundary sets vary in terms
of density difference from the core. In this phase, all c-nodes are traversed in the order of their
density measure Cohesion to explore their boundary c-node sets for the possibility of formation
of core periphery and periphery periphery relationships. Further, each relationship is assigned a
type as described below.
1) Core with Type 1 periphery: This type of periphery has a lower structural density than its
core. c-nodes lying in boundary c-nodesets in-range of the core c-node are made as type 1
peripheries of the core.
2) Core with Type 2 periphery: This type of periphery has a lower edge weight density than its
core. c-nodes lying in boundary c-nodesets low of the core c-node are put in type 2 periphery of
the core. If the core c-node lies in boundary c-nodeset high of the periphery c-node, then this
periphery is also made a type 2 periphery of the core.
3) Periphery-Periphery: This type of relationship is formed between two peripheries which do
not share the same set of cores, but still have a weak connection between the two c-nodes lying in
boundary c-nodeset pp are candidates for this relationship. These relationships seem to weakly
connect two core periphery structures in a sibling relationship.
All the extracted relationships are next assigned with CPDistance and CPScore as defined before
in the background section. A resultant core periphery structure of type 2 formed for Synthetic
dataset 1 is shown in figure 5. Figure 6 shows an example of a core periphery structure with
periphery type 1 for synthetic dataset 1. Further, the red core in figure 5 and the green core in
figure 6 share a periphery (H, J, K, L, M, N, P, Q, Z), thereby forming a sibling relationship
between the two cores. All the steps for identifying core periphery relationships are described in
algorithm 3.
5. EXPERIMENTAL EVALUATION
We demonstrate the effectiveness of core periphery structures identified by CP-MKNN using two
synthetic and two real protein-protein interaction network-based graph datasets. We further
compare and contrast them with core-periphery structures generated by another algorithm called
ClusterONE-CP. Both CP-MKNN and ClusterONE-CP allow a cluster to be both a core or a
periphery or both and find core-periphery as well as well as periphery-periphery relationships.
For running ClusterONE-CP, we set node penalty equal to 2 and overlap threshold equal to 0.5.
5.1. Results for Synthetic Datasets
We generated two weighted graphs to use as synthetic datasets: The first synthetic dataset has 39
vertices and 74 weighted edges (figure 1). The synthetic dataset 2 has 49 vertices and 97
weighted edges (figure 7) We use these synthetic datasets to illustrate the comparison of core
periphery results obtained by CP-MKNN with those by ClusterONE-CP. In synthetic dataset 1,
we embedded two edge weight dense cores, namely, core 1 with nodes A, B, C, D, E, F, G, H and
core 2 with nodes AA, AB, AC, AD, AE, AF, AG. Further, there is one structurally core, core 3
with nodes R, S, T, U, V. The cores embedded in synthetic dataset 2 are Core 1 with nodes A, B,
C, D, E, F, G, H; Core 2 with nodes J, K, L, M, N, P, Q and Core 3 with nodes R, S, T, U, V, W.
All the cores are connected through peripheries. We use a measure called structural density
10. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
10
(graph density) [32] to measure the structural cohesiveness of clusters (both cores and
peripheries). It is formally defined as
Further, we use average variance of the weighted edges inside a core or periphery to determine
how homogeneous are the edge weights inside the subset.
5.1.1. Comparison of CP-MKNN vs. ClusterONE-CP for Synthetic datasets
In table 1, a comparison of properties of cores and peripheries is provided between CP-MKNN
and ClusterONE- CP. It can be noted that both the datasets and both the algorithms, cores are
denser than the peripheries as per structural density or edge weight density denoted by mean of
edges, For CP-MKNN, in both the datasets, we see that all cores, core-peripheries and peripheries
obtain a low variance of edge weights. This signifies homogeneous edge weights. For
ClusterONE-CP, the variance results of synthetic dataset 2 indicate that the clusters are less
homogeneous as per edge weights.
Figure 5. Core Periphery Structure with peripheries of Type 2 (Core: Red nodes, Peripheries: Brown and
Pink nodes) for Synthetic Dataset 1
Sn
Cp (C, P)
Pp (P, P)
CP Pp
On Sn Cohesion
Oi ∈ On
P ∈ Oi inr ange, low
Pj ∈ P
(Pj ∈ boundar yset low µ(Oi ) > µ(Pj )) (Pj ∈ boundar yset
inr ange Cohesion(Oi ) > Cohesion(Pj ))
(Oi , Pj )
(Oi , Pj ) ∈ Pj
inr ange low
(Oi , Pj ) CP
(Pj ∈ boundar yset pp
(Oi , Pj )
11. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
11
Figure 6. Core Periphery Structure with periphery Type 1(Core: Green nodes, Periphery: Brown nodes) for
Synthetic Dataset 1
Table 1. Core Periphery Structures by CP-MKNN and ClusterONE-CP for Synthetic datasets
No. Avg. Mean Avg. Standard
Deviation
Avg. Structural
Density
CP-MKNN for Synthetic Dataset 1
Cores 3 0.71 0.043 0.56
Peripheries 3 0.39 0.061 0.45
Core Peripheries 0 0 0 0
ClusterONE-CP for Synthetic Dataset 1
Cores 2 0.87 0.046 0.4
Peripheries 2 0.37 0.069 0.58
Core Peripheries 3 0.52 0.078 0.65
CP-MKNN for Synthetic Dataset 2
Cores 3 0.88 0.025 0.48
Peripheries 2 0.43 0.068 0.64
Core Peripheries 2 0.65 0.089 0.38
ClusterONE-CP for Synthetic Dataset 2
Cores 2 0.85 0.088 0.46
Peripheries 2 0.45 0.083 0.5
Core Peripheries 4 0.59 0.11 0.57
12. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
12
Figure 7. Synthetic Dataset 2
5.2. Results for Movie Co-Appearance Dataset
Using synthetic datasets, we demonstrated that CP-MKNN obtains cores which are denser than
the core-peripheries and peripheries. In this section we demonstrate the difference between CP-
MKNN and ClusterONE-CP using a movie co-appearance dataset called Les Miserables [33].
This dataset is network of interaction amongst 77 characters in a movie based on Victor Hugo’s
novel called Les Miserables. The clustering coefficient of this dataset is 0.57. Figure 8 illustrates
a core found by CP-MKNN (blue nodes), K=2 and a core found by ClusterONE-CP (red and blue
nodes combined). The blue nodes correspond to a very dense core comprising of the novel’s
actor, Maurius (MA); actress, Cosette (CO) and the actress’s father, Jean Valjean (JV). The red
nodes comprise other important characters in the novel. CP-MKNN separates the most important
characters of the novel because it uses both structure and edge weight density constraints in
constructing core periphery structures.
Figure 8. A dense core by CP-MKNN (K=2) (blue nodes) and ClusterONE-CP (blue and red nodes) on
Les Miserables, a movie co-appearance dataset
5.3. Results for Real Social Network Dataset
We use a well-known social network dataset called Zachary’s Karate Club Network, frequently
used in the research community for evaluating community structures in graphs. Zachary’s dataset
is based on associations in a karate club among 34 people as its members, collected at a US
university of a total of three years [34]. We represent each member of the club by its number
from 1 to 34. The club’s instructor represented by node 1 had a disagreement with his president,
node 34 and the club got disintegrated into two factions. The first group mainly consisted of
supporters of the president (node 34), and the second group represented people who supported the
instructor (node 1). We use a weighted version of the network here. The edge weights are based
upon the number of common activities that the club members took part in.
Figure 9 represents core periphery structures obtained by CP-MKNN on the Karate Club dataset
and figure 10 displays core periphery structures obtained by ClusterONE-CP on the same dataset.
13. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
13
The size of nodes has been drawn proportional to their degree and the width of edges has been
drawn proportional to the edge weights. CP-MKNN is able to identify the two factions correctly
around the two nodes 1 and 34. ClusterONE-CP further splits one of the factions into two parts.
Further, CP-MKNN identifies nodes 10, 20 and 29 as peripheries for both the factions. It
indicates that perhaps, these three nodes have relationships with both the cores and suggest a
relationship between the cores. Further, CP-MKNN provides a core periphery association of type
1 between node 1’s core and node 34’s core with a low value of CPDistance (0.11) and CPScore
(0.17). This indicates that the two factions are similar to each other in terms of edge weight
density even if they are separated in terms of structural density. Thus, CP-MKNN not only
divides the network into cores and peripheries, but we are also able to derive insights about how
the members of different cores might be associated with each other. This information can be very
useful for understanding the flow of information in social network analysis.
Figure 9. Core Periphery structures obtained by CP-MKNN on Karate Club dataset (Faction1 (Core1): Red
nodes, Faction (Core 2): Green nodes, Shared periphery: Yellow nodes)
Figure 10. Core Periphery structures obtained by ClusterONE-CP on Karate Club dataset (Faction 1 (Split
into two cores): Red and Blue nodes, Faction 2 (Core3): Green nodes)
5.4. Results for Real PPI Network Dataset
We use two widely used PPI datasets in the research community for the evaluation of core
periphery structures generated by CP-MKNN. These datasets are named Gavin (1855 proteins
with 7669 edges) [35] and Krogan (2708 proteins with 7123 edges) [36]. Further, we use the
MIPS catalog of gold standard complexes [37] as a validation dataset. In this dataset, there are a
total of 203 protein complexes and 1189 proteins. We use accuracy as defined by Brohee and
Helden [38] for assessing the quality of core periphery structures. This measure is formulated as
the geometric mean of sensitivity (Sn) and positive predicted value (PPV). Sn and PPV are
mathematically formulated as below.
14. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
14
Here n is the count of clusters predicted by the algorithm and n is the number of protein
complexes in the gold standard database. mi corresponds to the protein count in complex i. T is
the contingency table of protein complexes with two dimensions, actual and predicted.
5.4.1. Comparison of CP-MKNN vs. ClusterONE-CP for Real PPI datasets
We present an accuracy comparison between CP-MKNN and ClusterONE-CP in table 2. It can
be noted that accuracy value obtained for both the algorithms is very similar in the case of both
the PPI datasets. This suggests that clusters obtained by CP-MKNN are meaningful and match
well to the gold standard complexes similar to ClusterONE-CP. Further, the clusters obtained for
CP-MKNN tend to have a higher average mean of edge weights indicating that CP- MKNN is
able to separate very dense cores from their sparser peripheries.
Next, we present a comparison of the properties of cores and peripheries as obtained by CP-
MKNN and ClusterONE-CP on the real PPI datasets in table 3. Both the algorithms identify the
clusters as being core, periphery or core-periphery (both core and periphery). We notice that for
both CP-MKNN and ClusterONE-CP, cores obtain a higher average mean and average structural
density over the peripheries. This difference is statistically significant as per Student’s t-test with
α = 0.05. Further, both the algorithms find core-peripheries to have a statistically significantly
higher average mean than peripheries for both the datasets. The difference in structural density of
core-peripheries and peripheries is not found to be statistically significant. The table also displays
the difference in the essentiality values of cores vs peripheries. Essential proteins are those that
are thought be critical for the survival of the organism. It has been studied that the essentiality of
proteins in a PPI network could be related to their structural characteristics in the network [39]
[17]. We use an essential proteins database [40] to determine the essentiality of proteins. For both
the datasets, we find that cores obtained by CP-MKNN have a higher average essentiality over
the peripheries as per t-test with α = 0.05. This is in alignment with the existing studies that the
essentiality of proteins could be related to their structural properties in the PPI network For
ClusterONE-CP, this trend is supported only for the Gavin dataset.
Table 2. Accuracy comparison for CP-KNN vs ClusterONE-CP for PPI datasets
Algorithm No. of Clusters (size>2) Sn PPV Accuracy Avg. Mean
Gavin Dataset
CP-MKNN 233 0.44 0.41 0.43 0.41
ClusterONE-CP 245 0.56 0.39 0.47 0.38
Krogan Dataset
CP-MKNN 293 0.42 0.46 0.44 0.75
ClusterONE-CP 332 0.49 0.44 0.46 0.71
15. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
15
5.4.2. Comparison of relationship types identified by CP-MKNN for Real PPI datasets
We provide a comparative study of different types of relationships (core-periphery (type 1 and
type 2) and periphery- periphery) for CP-MKNN in table 4 for Gavin dataset. The table displays
the number of relations for each type, along with the average CPDistance and the average
CPScore of the peripheries in each type. As described before, CPDistance is a measure of edge
weight distance between core and periphery, whereas CPScore is a measure of structural
similarity between core and periphery. A smaller value of CPDistance signifies that core and
periphery are closer as per edge weight density. A higher value of CPScore suggests that the core
and periphery are closer to each other as per structural density. As evident from table 4, type 1
peripheries are closer their cores than type 2 peripheries as per edge weight density (lower value
of average CPDistance). On the other hand, type 2 peripheries are closer to their cores than type 1
peripheries as per structural density (higher value of average CPScore). These results are in line
with our construction of peripheries of different types.
Table 3. Core Periphery Structures by CP-MKNN and ClusterONE-CP for PPI datasets
No. Avg. Mean Avg. Stand.
Dev.
Avg. Struct.
Density
Avg.
Essentiality
CP-MKNN for Gavin Dataset
Cores 67 0.56 0.14 0.84 0.38
Peripheries 45 0.28 0.024 0.52 0.28
Core Peripheries 101 0.35 0.083 0.49 0.33
ClusterONE-CP for Gavin Dataset
Cores 22 0.52 0.15 0.91 0.37
Peripheries 52 0.32 0.06 0.58 0.16
Core Peripheries 149 0.38 0.12 0.57 0.34
CP-MKNN for Krogan Dataset
Cores 88 0.89 0.093 0.58 0.37
Peripheries 60 0.51 0.096 0.52 0.16
Core Peripheries 112 0.77 0.095 0.41 0.32
ClusterONE-CP for Krogan Dataset
Cores 14 0.93 0.059 0.70 0.23
Peripheries 55 0.48 0.079 0.54 0.22
Core Peripheries 247 0.77 0.13 0.45 0.29
16. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
16
Table 4. Comparison of relationships types obtained for CP-KNN (K=4) for Gavin dataset
Relationship Type # Relationships Avg. Mean Difference Avg. Cohesion Difference
CP-Type 1 214 0.85 0.20
CP-Type 2 194 3.04 0.32
Periphery-Periphery 109 4.9 0.22
Next, we provide a motivating example of a real core- periphery and periphery-periphery
relationship obtained by CP-MKNN on the Krogan PPI dataset in figure 11. For each core or
periphery subunit, we also note down the MIPS gold standard complex it mapped on to. CP-
MKNN finds a protein subset which maps to Exosome complex as a core for a periphery subset
which maps to RNA-polymerase-III com- plex. Further, CP-MKNN finds a periphery-periphery
relationship between this RNA-polymerase-III complex periphery and another periphery that
maps to RNA-polymerase-II complex. It has been researched that exosome complex is a central
factor in processing stable RNA species produced by RNA polymerases I, II, and III [41]. This
suggests that both the core-periphery and periphery-periphery relations identified by CP-MKNN
are meaningful and can be used to gain insights into the nature of complex networks.
Figure 11. An example Core-Periphery and Periphery-Periphery relationship obtained by CP- MKNN on
Krogan PPI dataset
6. CONCLUSIONS
To summarize, we developed a core periphery structure identification algorithm in weighted and
undirected graphs using a heuristic measure called MKNN. We provide with a definition of core
periphery structures suited for weighted graphs. We illustrated the usefulness of core periphery
structures generated by CP-MKNN using synthetically generated datasets as well as real world
social networks and biological PPI networks. We also provided a comparative analysis of core
periphery results by CP-MKNN with those obtained by another core periphery algorithm called
ClusterONE-CP. We demonstrate that CP-MKNN finds denser cores than ClusterONE-CP as per
structure or edge weight density. We further scored and categorized the obtained core periphery
structures and provided with comparative examples. We demonstrated that CP-MKNN is able to
separate very dense cores as per structural density and edge weight density constraints from their
peripheries in weighted graphs. These relationships can be very useful to gain insights about the
nature of complex networks. This paper is a proof of concept of the algorithm for identifying core
periphery structures in weighted graphs. As a future direction of work, there is scope for making
the algorithm scalable for large graphs. Further, the core periphery structures defined in this
paper can be applied to graphs which vary over time to see how the relationships and the core
17. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
17
periphery scores evolve over time. This can be helpful in studying how trends change over time
in complex networks represented as weighted graphs.
REFERENCES
[1] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, “The architecture of complex
weighted networks,” in Proc. Nat. Acad. of Sci. of the USA, 2004, vol. 101, no. 11, pp. 3747- 3752.
[2] Y.-Y. Ahn, J. P. Bagrow, and A. Lehmann, “Link communities reveal multi-scale complexity in
networks,” Nature, vol. 466, pp. 761–764, 2010.
[3] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing, “Mixed membership stochastic block
models,” J. Mach. Learn. Res., vol. 9, pp. 1981–2014, 2007.
[4] M. Sales-Pardo, R. Guimera, A. Moreira, and L.A.N Amaral, “Extracting the hierarchical
organization of complex systems,” in Proc. Nat. Acad. of Sci. of the USA, 2007, vol. 104, pp. 18
874–18 874.
[5] I. Psorakis, S. Roberts, M. Ebden, and B. Sheldon, “Overlapping community detection using
Bayesian non-negative matrix factorization,” Phys. Rev. E, vol. 83, 2011.
[6] G. Palla, I. Derenyi, I. Farkas and T. Vicsek, “Uncovering the overlapping community structure of
complex networks in nature and society,” Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[7] T.S. Evans, and R. Lambiotte, “Line graphs, link partitions, and overlapping communities,” Phys.
Rev. E, vol. 80, Art. ID. 016105, 2009.
[8] T. Nepusz, H. Yu, and A. Paccanaro, “Detecting overlapping protein complexes in protein-protein
interaction networks,” Nature Methods, vol. 9, pp. 471-472, 2012.
[9] J. Yang, and J. Leskovec, “Overlapping Communities Explain Core–Periphery Organization of
Networks,” in Proc. of the IEEE, 2014, vol. 12, pp. 1892-1902.
[10] J. Yang, M. Zhang, K.N. Shen, X. Ju, and G. Xitong, “Structural correlation between communities
and core-periphery structures in social networks, Evidence from twitter data,” Expert Systems with
Applications, vol. 111, pp. 91-99, 2018.
[11] S.P. Borgatti, and M.G. Everett, “Models of Core/Periphery structures,” Social Networks, vol. 21, pp.
375–395, 1999.
[12] D. Smith, and D. White, “Structure and dynamics of the global economy: network analysis of
international trade 1965–1980,” Social Forces, vol. 70, pp. 857–893, 1992.
[13] P. Krugman, The Self-Organizing Economy. Blackwell, United King- dom: Oxford, 1996.
[14] R.R. Faulkner, Music on Demand: Composers and Careers in the Hollywood Film Industry. New
Brunskwick, NJ, USA: Transaction Publishers, 1987.
[15] E.O. Laumann, and F.U. Pappi, Networks of Collective Action: A Perspective on Community
Influence Systems. New York, USA: Academic Press, 1976.
[16] P. Doreian, “Structural equivalence in a psychology journal net- work,” American Society for
Information Science, vol. 36, no. 6, pp. 411–417, 1985.
[17] F. Luo, B. Li, X. Wan, and R. Scheuermann, “Core and periphery structures in protein interaction
networks,” BMC Bioinformatics, vol. 10, pp. S8, 2009.
[18] J.N. Cummings, and R. Cross, “Structural properties of work groups and their consequences for
performance,” Social Networks, vol. 25, no. 3, pp. 197–210, 2003.
[19] T. Wang, C. Rudin, D. Wagner, and R. Sevieri, “Finding Patterns with a Rotten Core: Data Mining
for Crime Series with Cores.” Big Data, vol. 3, no. 1, pp. 3-21, 2015.
[20] D. Sardana and R. Bhatnagar, “Core Periphery structures in weighted graphs using greedy growth,”
in IEEE/WIC/ACM International Conference on Web Intelligence, 2016, pp. 1-8.
[21] M.G. Everett, and S.P. Borgatti, “Peripheries of cohesive subsets,” Social Networks, vol. 21, pp. 397,
1999.
[22] M.P. Rombach, M.A. Porter, J.H. Fowler, and P.J. Mucha, “Core-periphery structure in networks,”
SIAM Journal on Applied Mathematics, vol. 74, no. 1, pp. 167-190, 2014.
[23] J.P. Boyd, W.J. Fitzgerald, and R.J. Beck, “Computing core/periphery structures and permutation
tests for social relations data,” Social Networks, vol. 28, pp. 166-178, 2006.
[24] F.D. Rossa, D. Fabio, and C. Piccardi, “Profiling core-periphery network structure by random
walkers,” Scientific Reports vol. 3, no. 1, pp. 1-8, 2013.
[25] S. Bruckner, F. Huffner, and C. Komusiewicz, “A graph modification approach for finding core–
periphery structures in protein interaction networks,” Algorithms for Molecular Biology, vol. 10, no.
1, pp. 16, 2015.
18. International Journal of Artificial Intelligence and Applications (IJAIA), Vol.12, No.1, January 2021
18
[26] J.V.L. de Jeude, G. Caldarelli, and T. Squartini, “Detecting core- periphery structures by surprise,”
EPL (Europhysics Letters), vol. 125, no. 6, 2019.
[27] A. Elliott, A. Chiu, M. Bazzi, G. Reinert, and M. Cucuringu, "Core–periphery structure in directed
networks," in Proc. Royal Society A, 2020, vol. 476, no. 2241.
[28] J.M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of ACM, vol. 46, pp.
604-632, 1999.
[29] Z. Hu, and R. Bhatnagar, “Clustering algorithm based on mutual K-nearest neighbor relationships,”
Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 5, no. 2, pp. 100-113,
2012.
[30] D. Sardana, and R. Bhatnagar, “Graph Clustering Using Mutual K- Nearest Neighbors,” in Active
Media Technology, 2014, pp. 35-48.
[31] D. Sardana, “Analysis of Meso-scale Structures in Weighted Graphs,” PhD diss., University of
Cincinnati, 2017.
[32] J. Nešetřil, and P.O. De Mendez, “Sparsity: graphs, structures, and algorithms”, Springer Science &
Business Media, vol. 28, 2012.
[33] D.E. Knuth, “The Stanford Graphbase: A platform for combinatorial computing,” New York: ACM
Press, pp. 74-87, 1993.
[34] W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of
Anthropological Research, vol. 33, pp. 452-473, 1977.
[35] A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M.
Michon, C.M. Cruciat, and M. Remor, “Functional organization of the yeast proteome by systematic
analysis of protein complexes,” Nature, vol. 415, no. 6868, pp. 141-147, 2002.
[36] N.J. Krogan, G. Cagney, H. Yu, G. Zhong, X. Guo, A. Ignatchenko, J. Li, S. Pu, N. Datta, A.P.
Tikuisis, and T. Punna, “Global landscape of protein complexes in the yeast Saccharomyces
cerevisiae,” Nature, vol. 440, no. 7084, pp. 637–643, 2006.
[37] H.W. Mewes, D. Frishman, U. Güldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M.
Münsterkötter, S. Rudd, and B. Weil, “MIPS: A database for genomes and protein sequences,”
Nucleic Acids Res, vol. 30, pp. 31-34, 2002.
[38] S. Brohee, and J.V. Helden, “Evaluation of clustering algorithms for protein-protein interaction
networks,” BMC bioinformatics, vol. 7, no. 1, pp. 1, 2006.
[39] H. Jeong, S.P. Mason, A.L. Barabasi, “Lethality and centrality in protein networks,” Nature, vol. 411,
no. 6833, pp. 41-42, 2001.
[40] W.H. Chen, P. Minguez, M.J. Lercher, and P. Bork, “OGEE: an online gene essentiality database,”
Nucleic acids research, vol. 40, no. D1, pp. D901-D906, 2012.
[41] C. Kilchert, S. Wittmann, and L. Vasiljeva, “The regulation and functions of the nuclear RNA
exosome complex,” Nature Reviews Molecular Cell Biology, 2016.
AUTHORS
Divya Sardana: Dr. Divya Sardana is a Senior Data Scientist at Teradata Corp., Santa
Clara, CA, USA. She has a PhD in Computer Science from the University of Cincinnati,
OH US. Her research targets development of scalable machine learning and graph
algorithms for the analysis of complex datasets in interdisciplinary domains of data
science.
Raj Bhatnagar: Dr. Raj Bhatnagar is a Professor at the department of EECS at
University of Cincinnati, Cincinnati, OH, USA. His research interests encompass
developing algorithms for data mining and pattern recognition problems in various
domains including Bioinformatics, Geographic Information Systems, Manufacturing,
and Business.