The document summarizes a thesis project on a local cluster-based community detection algorithm. It was submitted by Regalla Sairam Reddy to the University College of Engineering Kakinada in partial fulfillment of a Master of Computer Applications degree. The thesis was supervised by Dr. M.H.M Krishna Prasad and examines using a minimal cluster approach to detect local communities more effectively in complex networks compared to algorithms that start from a single initial node. The document includes declarations by the student and supervisor, as well as acknowledgments and outlines of the problem identification, methodology, technologies used, implementation, and conclusion.
Prediction of Student's Performance with Deep Neural NetworksCSCJournals
The performance of education has a big part in people's life. The prediction of student's performance in advance is very important issue for education. School administrators and students' parents impact on students' performance. Hence, academic researchers have developed different types of models to improve student performance. The main goal to reveal of this study is to search the best model of neural network models for the prediction of the performance of the high school students. For this purpose, five different types of neural network models have been developed and compared to their results. The data set obtained from Taldykorgan Kazakh Turkish High School (in Kazakhstan) students was used. Test results show that proposed two types of neural network model are predicted students' real performance efficiently and provided better accuracy when the test of today’s and future’s samples have similar characteristics.
An Extensive Review on Generative Adversarial Networks GAN’sijtsrd
This paper is to provide a high level understanding of Generative Adversarial Networks. This paper will be covering the working of GAN’s by explaining the background idea of the framework, types of GAN’s in the industry, it’s advantages and disadvantages, history of how GAN’s are developed and enhanced along the timeline and some applications where GAN’s outperforms themselves. Atharva Chitnavis | Yogeshchandra Puranik "An Extensive Review on Generative Adversarial Networks (GAN’s)" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd42357.pdf Paper URL: https://www.ijtsrd.comcomputer-science/artificial-intelligence/42357/an-extensive-review-on-generative-adversarial-networks-gan’s/atharva-chitnavis
Eat it, Review it: A New Approach for Review Predictionvivatechijri
Deep Learning has achieved significant improvement in various machine learning tasks. Nowadays,
Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM) have been increasing its popularity on
Text Sequence i.e. word prediction. The ability to abstract information from image or text is being widely
adopted by organizations around the world. A basic task in deep learning is classification be it image or text.
Current trending techniques such as RNN, CNN has proven that such techniques open the door for data analysis.
Emerging technologies such has Region CNN, Recurrent CNN have been under consideration for the analysis.
Recurrent CNN is being under development with the current world. The proposed system uses Recurrent Neural
Network for review prediction. Also LSTM is used along with RNN so as to predict long sentences. This system
focuses on context based review prediction and will provide full length sentence. This will help to write a proper
reviews by understanding the context of user.
Prediction of Student's Performance with Deep Neural NetworksCSCJournals
The performance of education has a big part in people's life. The prediction of student's performance in advance is very important issue for education. School administrators and students' parents impact on students' performance. Hence, academic researchers have developed different types of models to improve student performance. The main goal to reveal of this study is to search the best model of neural network models for the prediction of the performance of the high school students. For this purpose, five different types of neural network models have been developed and compared to their results. The data set obtained from Taldykorgan Kazakh Turkish High School (in Kazakhstan) students was used. Test results show that proposed two types of neural network model are predicted students' real performance efficiently and provided better accuracy when the test of today’s and future’s samples have similar characteristics.
An Extensive Review on Generative Adversarial Networks GAN’sijtsrd
This paper is to provide a high level understanding of Generative Adversarial Networks. This paper will be covering the working of GAN’s by explaining the background idea of the framework, types of GAN’s in the industry, it’s advantages and disadvantages, history of how GAN’s are developed and enhanced along the timeline and some applications where GAN’s outperforms themselves. Atharva Chitnavis | Yogeshchandra Puranik "An Extensive Review on Generative Adversarial Networks (GAN’s)" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4 , June 2021, URL: https://www.ijtsrd.compapers/ijtsrd42357.pdf Paper URL: https://www.ijtsrd.comcomputer-science/artificial-intelligence/42357/an-extensive-review-on-generative-adversarial-networks-gan’s/atharva-chitnavis
Eat it, Review it: A New Approach for Review Predictionvivatechijri
Deep Learning has achieved significant improvement in various machine learning tasks. Nowadays,
Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM) have been increasing its popularity on
Text Sequence i.e. word prediction. The ability to abstract information from image or text is being widely
adopted by organizations around the world. A basic task in deep learning is classification be it image or text.
Current trending techniques such as RNN, CNN has proven that such techniques open the door for data analysis.
Emerging technologies such has Region CNN, Recurrent CNN have been under consideration for the analysis.
Recurrent CNN is being under development with the current world. The proposed system uses Recurrent Neural
Network for review prediction. Also LSTM is used along with RNN so as to predict long sentences. This system
focuses on context based review prediction and will provide full length sentence. This will help to write a proper
reviews by understanding the context of user.
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIJERA Editor
The aim of this paper is to implement a Multilayer Perceptron (MLP) Neural Network to recognize and predict handwritten digits from 0 to 9. A dataset of 5000 samples were obtained from MNIST. The dataset was trained using gradient descent back-propagation algorithm and further tested using the feed-forward algorithm. The system performance is observed by varying the number of hidden units and the number of iterations. The performance was thereafter compared to obtain the network with the optimal parameters. The proposed system predicts the handwritten digits with an overall accuracy of 99.32%.
Data mining is the knowledge discovery in databases and the gaol is to extract patterns and knowledge from
large amounts of data. The important term in data mining is text mining. Text mining extracts the quality
information highly from text. Statistical pattern learning is used to high quality information. High –quality in
text mining defines the combinations of relevance, novelty and interestingness. Tasks in text mining are text
categorization, text clustering, entity extraction and sentiment analysis. Applications of natural language
processing and analytical methods are highly preferred to turn
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t- SNE to develop a a context-aware clustering approach (using pre-trained
word embeddings). We encode words or categorical data into numerical, context-aware, vectors that we use to cluster the data points using common clustering algorithms like K-means.
Using Cisco Network Components to Improve NIDPS Performance csandit
Network Intrusion Detection and Prevention Systems (NIDPSs) are used to detect, prevent and
report evidence of attacks and malicious traffic. Our paper presents a study where we used open
source NIDPS software. We show that NIDPS detection performance can be weak in the face of
high-speed and high-load traffic in terms of missed alerts and missed logs. To counteract this
problem, we have proposed and evaluated a solution that utilizes QoS, queues and parallel
technologies in a multi-layer Cisco Catalyst Switch to increase NIDPSs detection performance.
Our approach designs a novel QoS architecture to organise and improve throughput-forwardplan
traffic in a layer 3 switch in order to improve NIDPS performance.
Proposing a new method of image classification based on the AdaBoost deep bel...TELKOMNIKA JOURNAL
Image classification has different applications. Up to now, various algorithms have been presented
for image classification. Each of these methods has its own weaknesses and strengths. Reducing error rate
is an issue which many researches have been carried out about it. This research intends to optimize
the problem with hybrid methods and deep learning. The hybrid methods were developed to improve
the results of the single-component methods. On the other hand, a deep belief network (DBN) is a generative
probabilistic modelwith multiple layers of latent variables and is used to solve the unlabeled problems. In
fact, this method is anunsupervised method, in which all layers are one-way directed layers except for
the last layer. So far, various methods have been proposed for image classification, and the goal of this
research project was to use a combination of the AdaBoost method and the deep belief network method to
classify images. The other objective was to obtain better results than the previous results. In this project, a
combination of the deep belief network and AdaBoost method was used to boost learning and the network
potential was enhanced by making the entire network recursive. This method was tested on the MINIST
dataset and the results were indicative of a decrease in the error rate with the proposed method as compared
to the AdaBoost and deep belief network methods.
A new clutering approach for anomaly intrusion detectionIJDKP
Recent advances in technology have made our work easier compare to earlier times. Computer network is
growing day by day but while discussing about the security of computers and networks it has always been a
major concerns for organizations varying from smaller to larger enterprises. It is true that organizations
are aware of the possible threats and attacks so they always prepare for the safer side but due to some
loopholes attackers are able to make attacks.
Intrusion detection is one of the major fields of research and researchers are trying to find new algorithms
for detecting intrusions. Clustering techniques of data mining is an interested area of research for detecting
possible intrusions and attacks. This paper presents a new clustering approach for anomaly intrusion
detection by using the approach of K-medoids method of clustering and its certain modifications. The
proposed algorithm is able to achieve high detection rate and overcomes the disadvantages of K-means
algorithm.
Abstract—Classical machine learning techniques have been employed severally in intrusion detection. But due to the rising cases and sophistication of attacks, more advanced machine learning techniques including ensemble-based methods, neural networks and deep learning techniques have been applied. However, there is still need for improved machine learning approach to detect attacks more effectively and efficiently. Stacked generalization approach has been shown to be capable of learning from features and meta-features but has been limited by the deficiencies of base classifiers and lack of optimization in the choice of meta-feature combination. This paper therefore proposes a stacked generalization ensemble approach based on two-tier meta-learner, in which the outputs of classical stacked ensemble are passed to multi-feature-based stacked ensemble, which is optimized. A Grid-search approach is used for the optimization. Nine data features and four meta-features derived from Logistic Regression, Support Vector Machine, Naïve Bayes, and Multilayer Perceptron neural network are used for the machine learning classification task. By applying neural networks as the meta-learner for the classification of NSL-KDD data, improved performances in terms of accuracy, precision, recall and F-measure of 0.97, 0.98, 0.98 and 0.98, respectively are achieved.
International Journal of Computer Science and Information Security,IJCSIS ISSN 1947-5500, Pittsburgh, PA, USA
Email: ijcsiseditor@gmail.com
http://sites.google.com/site/ijcsis/
https://google.academia.edu/JournalofComputerScience
https://www.linkedin.com/in/ijcsis-research-publications-8b916516/
http://www.researcherid.com/rid/E-1319-2016
Handwriting identification using deep convolutional neural network methodTELKOMNIKA JOURNAL
Handwriting is a unique thing that produced differently for each person. Handwriting has a characteristic that remain the same with single writer, so a handwriting can be used as a variable in biometric systems. Each person have a different form of handwriting style but with a small possibility that same characters have something commons. We propose a handwriting identification method using sentence segmented handwriting forms. Sentence form is used to get more complete handwriting characteristics than using a single characters or words. Dataset used is divided into three categories of images, binary, grayscale, and inverted binary. All datasets have same image with different in color and consist of 100 class. Transfer learning used in this paper are pre-trained model VGG19. Training was conducted in 100 epochs. Highest result is grayscale images with genuince acceptance rate of 92.3% and equal error rate of 7.7%.
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
This session talks about how to define a problem as a machine learning one. What are the steps toward reaching a satisfying solution from data preparation, feature engineering, evaluating suitable algorithms until releasing the model and putting it in practice. It presents a case study and go through some algorithms mostly implemented in Python.
By Hussein Natsheh - Data Mining entrepreneur, scholar, and founder of CiApple
YouTube video: https://youtu.be/NGbyeX4kpU4
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
The work is carried on the application of differential equation (DE) and its computational technique of genetic algorithm and neural (GANN) in C#, which is frequently used in globalised world by human wings. Diagrammatical and flow chart presentation is the major concerned for easy undertaking of these two concepts with indication of its present and future application is the new initiative taken in this paper along with computational approaches in C#. Little observation has been also pointed during working, functioning and development process of above algorithm in C# under given boundary value condition of DE for genetic and neural. Operations of fitness function and Genetic operations were completed for behavioural transmission of chromosome.
GUI based handwritten digit recognition using CNNAbhishek Tiwari
This project is to create a model which can recognize the digits as well as also to create GUI which is user friendly i.e. user can draw the digit on it and will get appropriate output.
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...Munisekhar Gunapati
A three-layer framework is proposed for mobile data collection in wireless sensor networks, which includes the sensor layer, cluster head layer, and mobile collector (called SenCar) layer. The framework employs distributed load balanced clustering and dual data uploading, which is referred to as LBC-MIMO. The objective is to achieve good scalability, long network lifetime and low data collection latency. At the sensor layer, a distributed load balanced clustering (LBC) algorithm is proposed for sensors to self-organize themselves into clusters. In contrast to existing clustering methods, our scheme generates multiple cluster heads in each cluster to balance the work load and facilitate dual data uploading. At the cluster head layer, the inter-cluster transmission range is carefully chosen to guarantee the connectivity among the clusters. Multiple cluster heads within a cluster cooperate with each other to perform energy-saving inter-cluster communications. Through inter-cluster transmissions, cluster head information is forwarded to SenCar for its moving trajectory planning. At the mobile collector layer, SenCar is equipped with two antennas, which enables two cluster heads to simultaneously upload data to SenCar in each time by utilizing multi-user multiple-input and multiple-output (MU-MIMO) technique. The trajectory planning for SenCar is optimized to fully utilize dual data uploading capability by properly selecting polling points in each cluster. By visiting each selected polling point, SenCar can efficiently gather data from cluster heads and transport the data to the static data sink. Extensive simulations are conducted to evaluate the effectiveness of the proposed LBC-MIMO scheme. The results show that when each cluster has at most two cluster heads, LBC-MIMO achieves over 50 percent energy saving per node and 60 percent energy saving on cluster heads comparing with data collection through multi-hop relay to the static data sink, and 20 percent shorter data collection time compared to traditional mobile data gathering.
Intelligent Handwritten Digit Recognition using Artificial Neural NetworkIJERA Editor
The aim of this paper is to implement a Multilayer Perceptron (MLP) Neural Network to recognize and predict handwritten digits from 0 to 9. A dataset of 5000 samples were obtained from MNIST. The dataset was trained using gradient descent back-propagation algorithm and further tested using the feed-forward algorithm. The system performance is observed by varying the number of hidden units and the number of iterations. The performance was thereafter compared to obtain the network with the optimal parameters. The proposed system predicts the handwritten digits with an overall accuracy of 99.32%.
Data mining is the knowledge discovery in databases and the gaol is to extract patterns and knowledge from
large amounts of data. The important term in data mining is text mining. Text mining extracts the quality
information highly from text. Statistical pattern learning is used to high quality information. High –quality in
text mining defines the combinations of relevance, novelty and interestingness. Tasks in text mining are text
categorization, text clustering, entity extraction and sentiment analysis. Applications of natural language
processing and analytical methods are highly preferred to turn
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t- SNE to develop a a context-aware clustering approach (using pre-trained
word embeddings). We encode words or categorical data into numerical, context-aware, vectors that we use to cluster the data points using common clustering algorithms like K-means.
Using Cisco Network Components to Improve NIDPS Performance csandit
Network Intrusion Detection and Prevention Systems (NIDPSs) are used to detect, prevent and
report evidence of attacks and malicious traffic. Our paper presents a study where we used open
source NIDPS software. We show that NIDPS detection performance can be weak in the face of
high-speed and high-load traffic in terms of missed alerts and missed logs. To counteract this
problem, we have proposed and evaluated a solution that utilizes QoS, queues and parallel
technologies in a multi-layer Cisco Catalyst Switch to increase NIDPSs detection performance.
Our approach designs a novel QoS architecture to organise and improve throughput-forwardplan
traffic in a layer 3 switch in order to improve NIDPS performance.
Proposing a new method of image classification based on the AdaBoost deep bel...TELKOMNIKA JOURNAL
Image classification has different applications. Up to now, various algorithms have been presented
for image classification. Each of these methods has its own weaknesses and strengths. Reducing error rate
is an issue which many researches have been carried out about it. This research intends to optimize
the problem with hybrid methods and deep learning. The hybrid methods were developed to improve
the results of the single-component methods. On the other hand, a deep belief network (DBN) is a generative
probabilistic modelwith multiple layers of latent variables and is used to solve the unlabeled problems. In
fact, this method is anunsupervised method, in which all layers are one-way directed layers except for
the last layer. So far, various methods have been proposed for image classification, and the goal of this
research project was to use a combination of the AdaBoost method and the deep belief network method to
classify images. The other objective was to obtain better results than the previous results. In this project, a
combination of the deep belief network and AdaBoost method was used to boost learning and the network
potential was enhanced by making the entire network recursive. This method was tested on the MINIST
dataset and the results were indicative of a decrease in the error rate with the proposed method as compared
to the AdaBoost and deep belief network methods.
A new clutering approach for anomaly intrusion detectionIJDKP
Recent advances in technology have made our work easier compare to earlier times. Computer network is
growing day by day but while discussing about the security of computers and networks it has always been a
major concerns for organizations varying from smaller to larger enterprises. It is true that organizations
are aware of the possible threats and attacks so they always prepare for the safer side but due to some
loopholes attackers are able to make attacks.
Intrusion detection is one of the major fields of research and researchers are trying to find new algorithms
for detecting intrusions. Clustering techniques of data mining is an interested area of research for detecting
possible intrusions and attacks. This paper presents a new clustering approach for anomaly intrusion
detection by using the approach of K-medoids method of clustering and its certain modifications. The
proposed algorithm is able to achieve high detection rate and overcomes the disadvantages of K-means
algorithm.
Abstract—Classical machine learning techniques have been employed severally in intrusion detection. But due to the rising cases and sophistication of attacks, more advanced machine learning techniques including ensemble-based methods, neural networks and deep learning techniques have been applied. However, there is still need for improved machine learning approach to detect attacks more effectively and efficiently. Stacked generalization approach has been shown to be capable of learning from features and meta-features but has been limited by the deficiencies of base classifiers and lack of optimization in the choice of meta-feature combination. This paper therefore proposes a stacked generalization ensemble approach based on two-tier meta-learner, in which the outputs of classical stacked ensemble are passed to multi-feature-based stacked ensemble, which is optimized. A Grid-search approach is used for the optimization. Nine data features and four meta-features derived from Logistic Regression, Support Vector Machine, Naïve Bayes, and Multilayer Perceptron neural network are used for the machine learning classification task. By applying neural networks as the meta-learner for the classification of NSL-KDD data, improved performances in terms of accuracy, precision, recall and F-measure of 0.97, 0.98, 0.98 and 0.98, respectively are achieved.
International Journal of Computer Science and Information Security,IJCSIS ISSN 1947-5500, Pittsburgh, PA, USA
Email: ijcsiseditor@gmail.com
http://sites.google.com/site/ijcsis/
https://google.academia.edu/JournalofComputerScience
https://www.linkedin.com/in/ijcsis-research-publications-8b916516/
http://www.researcherid.com/rid/E-1319-2016
Handwriting identification using deep convolutional neural network methodTELKOMNIKA JOURNAL
Handwriting is a unique thing that produced differently for each person. Handwriting has a characteristic that remain the same with single writer, so a handwriting can be used as a variable in biometric systems. Each person have a different form of handwriting style but with a small possibility that same characters have something commons. We propose a handwriting identification method using sentence segmented handwriting forms. Sentence form is used to get more complete handwriting characteristics than using a single characters or words. Dataset used is divided into three categories of images, binary, grayscale, and inverted binary. All datasets have same image with different in color and consist of 100 class. Transfer learning used in this paper are pre-trained model VGG19. Training was conducted in 100 epochs. Highest result is grayscale images with genuince acceptance rate of 92.3% and equal error rate of 7.7%.
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
This session talks about how to define a problem as a machine learning one. What are the steps toward reaching a satisfying solution from data preparation, feature engineering, evaluating suitable algorithms until releasing the model and putting it in practice. It presents a case study and go through some algorithms mostly implemented in Python.
By Hussein Natsheh - Data Mining entrepreneur, scholar, and founder of CiApple
YouTube video: https://youtu.be/NGbyeX4kpU4
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
The work is carried on the application of differential equation (DE) and its computational technique of genetic algorithm and neural (GANN) in C#, which is frequently used in globalised world by human wings. Diagrammatical and flow chart presentation is the major concerned for easy undertaking of these two concepts with indication of its present and future application is the new initiative taken in this paper along with computational approaches in C#. Little observation has been also pointed during working, functioning and development process of above algorithm in C# under given boundary value condition of DE for genetic and neural. Operations of fitness function and Genetic operations were completed for behavioural transmission of chromosome.
GUI based handwritten digit recognition using CNNAbhishek Tiwari
This project is to create a model which can recognize the digits as well as also to create GUI which is user friendly i.e. user can draw the digit on it and will get appropriate output.
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...Munisekhar Gunapati
A three-layer framework is proposed for mobile data collection in wireless sensor networks, which includes the sensor layer, cluster head layer, and mobile collector (called SenCar) layer. The framework employs distributed load balanced clustering and dual data uploading, which is referred to as LBC-MIMO. The objective is to achieve good scalability, long network lifetime and low data collection latency. At the sensor layer, a distributed load balanced clustering (LBC) algorithm is proposed for sensors to self-organize themselves into clusters. In contrast to existing clustering methods, our scheme generates multiple cluster heads in each cluster to balance the work load and facilitate dual data uploading. At the cluster head layer, the inter-cluster transmission range is carefully chosen to guarantee the connectivity among the clusters. Multiple cluster heads within a cluster cooperate with each other to perform energy-saving inter-cluster communications. Through inter-cluster transmissions, cluster head information is forwarded to SenCar for its moving trajectory planning. At the mobile collector layer, SenCar is equipped with two antennas, which enables two cluster heads to simultaneously upload data to SenCar in each time by utilizing multi-user multiple-input and multiple-output (MU-MIMO) technique. The trajectory planning for SenCar is optimized to fully utilize dual data uploading capability by properly selecting polling points in each cluster. By visiting each selected polling point, SenCar can efficiently gather data from cluster heads and transport the data to the static data sink. Extensive simulations are conducted to evaluate the effectiveness of the proposed LBC-MIMO scheme. The results show that when each cluster has at most two cluster heads, LBC-MIMO achieves over 50 percent energy saving per node and 60 percent energy saving on cluster heads comparing with data collection through multi-hop relay to the static data sink, and 20 percent shorter data collection time compared to traditional mobile data gathering.
Hire me for Your Project.
The project is about Social networking site hence it is a website called as connectingyouth.com. As the name suggests the website is designed specifically for social network and persons can connect always with our friends.
This website allows a user with so many features that he can go through detailed information on any person details, can scrap any person and can save some image in memory and also many features too. This website provide user friendly environment and it provide all detail required for a naive user. The service is designed to help users meet new friends and maintain existing relationships. connectingyouth.com is a website just as a Facebook.com and Myspace.com. A user first creates a "Profile", in which the user provides "Social", "Professional" and "Personal" details. Users can upload photos into their connecting profile with a caption. Members can make groups to join friends according to their wishes.
Speaking technically, the website is designed using language java and the database used during the development of site is Microsoft Sql server.
Hence the website is developed in a way that it is both technically and non-technically sound for the administrator and user respectively.
People have used the idea of "social network" loosely for over a century to connot ecomplex sets of relationships between members of social systems at all scales, from interpersonal to international Our project aims at using JAVA Technologies to make a social Networking Website. In our project we will use JAVA for the Designing interactive interface or Presentation Logic at Front End and HTML to design the website, MySQL 2005 a Database Management System for the manipulation of database of user at Back End.
2021 python projects list
A BI-OBJECTIVE HYPER-HEURISTIC SUPPORT VECTOR MACHINES FOR BIG DATA CYBER-SECURITY
AN ARTIFICIAL INTELLIGENCE AND CLOUD BASED COLLABORATIVE PLATFORM FOR PLANT DISEASE IDENTIFICATION, TRACKING AND FORECASTING FOR FARMERS
10.sentiment analysis of customer product reviews using machine learniVenkat Projects
10.sentiment analysis of customer product reviews using machine learning In this project author is detecting sentiments from amazon reviews by using various machine learning algorithms such as SVM, Decision Tree and Naïve Bayes. In all 3 algorithms SVM is giving better accuracy and to train this algorithms author has used AMAZON reviews dataset and this dataset is saved inside ‘Amazon_Reviews_dataset’ folder. Below screen shot show example reviews from dataset
9.data analysis for understanding the impact of covid–19 vaccinations on the ...Venkat Projects
9.data analysis for understanding the impact of covid–19 vaccinations on the society
In this paper author analysing vaccines dataset to forecast required vaccines compare to manufacturing or available vaccines and by using this forecasting manufacturers may increase and decrease their manufacturing quantity. This forecasting can impact society by taking decision on manufacturing vaccines and if in society more cases occurred then forecasting will be high and by seeing forecasting manufacturers may increase production.
Vaccines are manufacturing by multiple manufacturers such as JOHNSON AND JOHNSON, PFIZER and many more. In this forecasting will take all manufacturers and their production quantity as well as usage of vaccines and based on this Machine Learning algorithm called Decision Tree will forecast require vaccines for next 30 days
To implement this project we are using vaccines dataset to train decision tree algorithm and then this algorithm will predict require vaccines quantity for next 30 days. This dataset is saved inside ‘Dataset’ folder and below screen showing some records from dataset
6.iris recognition using machine learning techniqueVenkat Projects
In this project to recognize person from IRIS we are using CASIA IRIS dataset which contains images from 108 peoples and by using this dataset we are training CNN model and then we can use this CNN model to predict/recognize persons. To train CNN model we are extracting IRIS features by using HoughCircles algorithm which extract IRIS circle from eye images. Below screen shots showing dataset with person id and this dataset saved inside ‘CASIA1’ folder
4.detection of fake news through implementation of data science applicationVenkat Projects
In this project we are using LSTM (Long Short Term Memory) Recurrent Neural Network to predict fake news as huge amount of fake news is gathering in all types of media such as social media or news media and to detect fake news author is training LSTM neural network with past news data label as ‘Genuine’ and ‘Fake’. We downloaded available twitter FAKE NEWS tweets from internet and below is the dataset screen shots
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Student information management system project report ii.pdf
5.local community detection algorithm based on minimal cluster
1. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
1
A Project Thesis on
Local Cluster-Based Community Detection Algorithm
Project work submitted in partial fulfillment of the requirements for the award of
the degree of
MASTER OF COMPUTER APPLICATIONS
in
COMPUTER SCIENCE AND ENGINEERING
Submittedby
Regalla Sairam Reddy
[Roll No.17021F0005]
Under the supervision of
DR.M.H.M KRISHNA PRASAD
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA
2017-2020
2. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
2
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA 2017-2020
DECLARATION BY THE STUDENT
I hereby declare that the work described in this thesis, entitled “A
LiteratureStudy and Local Cluster-Based Community Detection Algorithm”,
which is being submitted by me in partial fulfillment of the requirements for the
award of degree of Master Of Computer Applications in the department of
Computer Science & Engineering to the University College of Engineering
(Autonomous),Jawaharlal Nehru Technological University, Kakinada – 533003,
A.P., is the result of investigations carried out by me under the guidance of
Dr.M.H.M Krishna Prasad, Assistant professor of Computer Science and
Engineering, University College of Engineering, Jawaharlal Nehru Technological
University, Kakinada-533003.
The work is original and has not been submitted to any other University or
Institute for the award of any degree or diploma.
Place: Kakinada Signature:
3. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
3
Date: Name: Regalla Sairam Reddy
Roll No:17021F0005
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA 2017-2020
CERTIFICATE FROM THE SUPERVISOR
This is to certify that the thesis entitled “A Literature Study and Local Cluster-
Based Community Detection Algorithm”, that is being submitted by Regalla
SairamReddy, Roll No.17021F0005, in partial fulfillment of the requirements for
the award of degree of Master of Computer Applications in the Department of
Computer Science &Engineering to the University College of Engineering
(Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of
bona fide project work carried out by him under my guidance and supervision.
Signature of the Supervisor
Dr.M.H.M Krishna Prasad
4. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
4
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING KAKINADA (A)
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA-533003, A.P, INDIA2017-2020
CERTIFICATE FROM THE HEAD OF THE DEPARTMENT
This is to certify that the thesis entitled “A Literature Study and Local Cluster-
Based Community Detection Algorithm”, that is being submitted by Regalla
Sairam Reddy, Roll No.17021F0005, in partial fulfillment of the requirements for
the award of degree of Master of Computer Applications in the Department of
Computer Science &Engineering to the University College of Engineering
(Autonomous), Jawaharlal Nehru Technological University Kakinada is a record of
bona fide project work carried out by him at our department.
Signature of Head of the Department
Dr. D. Haritha
6. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
6
ACKNOWLEDGEMENTS
The successful completion of any task is not possible without proper
suggestion, guidance and environment. Combination of these three factors acts
like backbone to my “A Literature Study and Local Cluster-Based Community
Detection Algorithm” project.
I wish to express my sincere gratitude to Dr. M.H.M. Krishna Prasad,
Professor, Department of CSE, University College of Engineering, Jawaharlal
Nehru Technological University Kakinada for giving me his guidance, immense
support and valuable suggestions during my project.
I would like to express my grateful thanks to Dr. D. Haritha, Professor and
Head of the Department of CSE, University College of Engineering, Jawaharlal
Nehru Technological University Kakinada whose immense support encouraged
completing the project successfully.
My Sincere thanks to all the teaching and non-teaching staff of Department
of Computer Science and Engineering for their support throughout my project
work.
Finally, I would like to thank my family and friends for their support and
motivation, which helped me to complete the project successfully.
Regalla Sairam Reddy
Roll No: 17021F0005,
Master of Computer Applications,
University College of Engineering Kakinada,
Jawaharlal Nehru Technological University,
Kakinada – 533003,
East Godavari District,A.P.
7. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
7
CONTENT
Acknowledgement 5
Abstract 6
Contents 7
List of Tables 8
List of Figures 9
CHAPTER -1 INTRODUCTION 1-23
1.1 What Is Software 1
1.2What Is Software Development Life Cycle
Bug Prediction 2
CHAPTER-2
LITERATURE REVIEW
2.1 Related Work5
CHAPTER 3
PROBLEM IDENTIFICATION AND OBJECTIVE
3.1 Problem Statement 8
3.2 Project Objective 8
CHAPTER 4
METHODOLOGY
4.1 Methodology 9
4.1.1using Python Tool On Standalone Machine Learning
Environment 9
4.2 Data Description 10
8. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
8
4.3 Evaluation Criteria Used For Classification 11
4.3.1 Confusion Matrix
12
4.3.2 Accuracy And Precision
12
4.3.3 Recall And F-Square
13
4.3.4 Sensitivity, Specificity And Roc 13
4.3.5 Significance And Analysis Of Ensemble Method In
MachineLearning 14
4.4 Uml Diagrams 15
4.4.1 Use Case Diagram 16
4.4.2 State Diagram 17
CHAPTER 5
OVERVIEW OF TECHNOLOGIES
5.1AlgorithmsUsed 17
5.1.1 Decision Tree Induction 17
5.1.2 Naïve Bayes 21
5.1.3 Artificial Neuralnetwork 22
5.1.4 Support Vector Machine Model 25
5.1.5 Kernal Functions 28
5.2 Tensorflow 28
CHAPTER 6
IMPLEMENTATION AND RESULTS
6.1 Framework Design 31
6.2 Coding And Testing 34
9. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
9
CHAPTER 7
CONCLUSION40
Reference 42
LIST OF TABLES:
Table No Table Name Page No
1 Cluster based community
detection
27
2 Generic algorithms based
community detection
30
3 Label propagation based
community detection
31
4 Clique based methods for
overlapping community detection
35
10. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
10
LIST OF FIGURES:
Figure No Figure Name Page No
5.1.1 Decision Tree induction 17
5.1.2 Naïve Bayes Algorithm 21
5.1.3 Artificial Neural Networks
Algorithm
23
5.1.4(i) Support Vector Machine
Hyperplane
25
5.1.4(ii) Support Vector Machine
Hyperplane
26
5.2 TensorFlow in UML 29
4.3.1 Use Case Diagram
4.3.2 State Diagram
4.1.1 Block Diagram Of Proposed
Work
4.2.5 Classification Model
6.1 Framework Design
6.2 Bar graph of
knn,svm,naviebayes
15. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
15
ABSTRACT
In order to discover the structure of local community more effectively,
this paper puts forward a new local community detection algorithm based on
minimal cluster. Most of the local community detection algorithms begin
from one node. The agglomeration ability of a single node must be less than
multiple nodes, so the beginning of the community extension of the
algorithm in this paper is no longer from the initial node only but from a
node cluster containing this initial node and nodes in the cluster are
relatively densely connected with each other. The algorithm mainly includes
two phases. First it detects the minimal cluster and then finds the local
community extended from the minimal cluster. Experimental results show
that the quality of the local community detected by our algorithm is much
better than other algorithms no matter in real networks or in simulated
networks.
17. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
17
Introduction to Community Detection:
Recently, many researchers notice that the complex network is a
proper tools to describe variety of complex system in the real world , and
thus the complex network has attracted the great attention in many fields
such as physics, biology and social network et al. In complex network field,
one of the important topology property is community structure which
comprise of densely connected nodes, and some researchers have found that
detecting community structure can reveal some valuable insights of the
functional feature in the complex system . For example, communities in
multimedia social network may imply people with the same hobby and trust
relationship. Zhiyong Zhang et al. proposed an approach to analyse and
detect credible potential path based on community in multimedia social
networks , the approach can effectively and accurately mine potential paths
of copyrighted digital content sharing. Zhiyong Zhang et al. also proposed a
trust model based on small world theory which shows the widely application
18. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
18
of community struction The community structure in biology field may
cluster proteins with the same function. So that many methods have been
proposed to reveal this topological property in complex networks.
Community detection on complex networks has been a hot research field.
Recently, a large number of algorithms for studying the global structure of
the network are proposed, such as the modularity optimization algorithms ,
the spectral clustering algorithms , the hierarchical clustering algorithms ,
and the label propagation algorithms . However, with the continuous
expansion of complex networks, it is easy to collect large network dataset
with millions of nodes. How to store such a large-scale dataset in computer
memory to analyze is a huge challenge for scholars. The calculation for
studying the overall structure of this kind of large-scale networks is
unimaginable. So local community detection becomes an appealing problem
and has drawn more and more attention The main task of local community
detection is to find a community using the local information of the network.
Local community detection has good extensibility. If the local community
detection algorithm is iteratively executed, more local communities can be
found and the whole community structure of the network can be obtained.
The time complexity of this kind of global community detection algorithm is
dependent on the efficiency and accuracy of local community detection
algorithms, so the research of local community detection algorithm still has
a long way to go. There are several problems that need to be solved in the
research of local community detection. First, we should determine the initial
state and find the initial node for local community detection, so as to
determine the needed local information; then, we need to select an objective
function, and through continuous iterative optimization of the objective
function we find the community structure with high quality; after that we
need to find a suitable node expansion method, so that the algorithm can
extract the local community from the initial state step by step; finally, in
order to terminate the algorithm, a suitable termination condition is needed
to determine the boundary of the community.
19. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
19
Most of local community detection algorithms are based on the above-
mentioned process. The definition of local community detection is to find the
local community structure from one or more nodes, but most of the existing
local community detection algorithms, including Clauset , LWP , and LS , are
starting from only one initial node. They greedily select the optimal nodes
from the candidate nodes and add them into the local community. LMD
algorithmextends not from the initial node but from its closest and next
closest local degree central nodes. It discovers a local community from each
of these nodes, respectively. It still starts from single node and discovers
many local communities for the initial node. In general, the aggregation
ability of a single node is lower than that of multiple nodes. So we do not
just rely on the initial node as the beginning of local community expansion.
Our primary goal is to find a minimal cluster closely connected to the initial
node and then detect local community based on the minimal cluster. This
can avoid instability because of the excessive dependence on the initial node.
In this paper, we introduce a local community detection algorithm based on
the minimal cluster—NewLCD. In this new algorithm, the beginning of
community expansion is no longer from the initial node only, but a cluster of
nodes relatively closely connected to the initial node. The algorithm mainly
consists of two parts: one is the detection of the minimal cluster, and the
other is the detection of the local community based on the minimal cluster.
At the same time, the algorithm can be applied to the global community
detection. After finding one local community using this algorithm, we can
repeat the process to obtain the global community structure of the whole
network.
Community Detection
The concept of community detection has emerged in network science
as a method for finding groups within complex systems through represented
on a graph. In contrast to more traditional decomposition methods which
seek a strict block diagonal or block triangular structure, community
20. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
20
detection methods find subnetworks with statistically significantly more
links between nodes in the same group than nodes in different groups
(Girvan and Newman, 2002). Central to community detection is the notion
of modularity, a metric that captures this difference:
(1)Q=12m∑i,jAij−kikj2mδgigj
Here, Q is the modularity, Aij is the edge weight between nodes i and j, ki is
the total weight of all edges connecting node i with all other nodes, and m is
the total weight of all edges in the graph. The Kronecker
delta function δ(gi,gj) will evaluate to one if nodes i and j belong to the same
group, and zero otherwise. Modularity is a property of how one decides to
partition a network: networks that are not partitioned and those that place
every node in its own community will both have modularity equal to zero.
The goal of community detection, then, is to find communities that maximize
modularity. Although modularity maximization is an NP-hard integer
program, many efficient algorithms exist to solve it approximately,
including spectral clustering (Newman, 2006) and fast unfolding (Blondel et
al., 2008). Figure 1 shows a few networks with increasing maximum
modularity; note how the community structure becomes increasingly
apparent as this value increases. Previous efforts in our group have applied
community detection to chemical plant networks by creating an equation
graph of the corresponding dynamic model (Moharir et al., 2017). By doing
so, communities of state variables, inputs, and outputs can be obtained
which are tightly interacting amongst themselves but weakly interacting with
other communities. As such, these communities can form the basis of
distributed control architectures (Jogwar and Daoutidis, 2017) which
typically perform better than other distributed control architectures that one
may obtain from “intuition” (Pourkargar et al., 2017). For a more
comprehensive review of the use of community detection in distributed
control, we refer the reader to Daoutidis et al., 2017. An alternative method
for finding communities for distributed model predictive control is to apply a
decomposition on the optimization problem as a whole (Tang et al., 2017).
21. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
21
Figure 1. Networks of different maximum modularity.
However, community structure can exist in all optimization problems, thus it
makes sense to extend this method to any generic optimization problem, and
this work proposes to do so. The advantage of using community detection to
find decompositions is that subproblems generated will have statistically
minimal interactions, through complicating variables or constraints, and
thus require minimal coordination through the decomposition solution
method. The proposed method is generic, applicable to any optimization
problem or decomposition solution approach, and scalable, using
computationally efficient graph theory algorithms.
Automatically identifying groups based on network
clustering algorithms:
NodeXL can automatically identify groups within a network based
solely on network structure. In contrast to the approach of using existing
data about the attributes as used in Section 7.2.3, this approach is based
solely on who is connected to whom. A number of different network
“clustering” (also known as “community detection”) algorithms exist, which
help find subgroups of highly inter-connected vertices within a network.
NodeXL includes three such algorithms: Clauset-Newman-Moore, Wakita-
Tsurumi [3], and Girvan-Newman (which can take a long time to run on
large graphs). In all of these algorithms, the number of clusters is not
predetermined; instead the algorithm dynamically determines the number it
22. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
22
thinks is best. Each vertex is assigned to exactly one cluster, meaning that
clusters do not overlap. The number of vertices in each cluster can vary
significantly. In some cases, a single cluster can encompass all vertices,
whereas in other cases, a cluster can consist of a single vertex. See
Newman [4] for background on some of these and other community
identifying algorithms.
There is no “right” or “wrong” algorithm to use; instead, it is often useful to
try out different ones and see which ones you believe provide the best results
given your network. For example, in this network, the Clauset-Newman-
Moore algorithm results in fewer, larger groups than the other algorithms,
which provide more groups of a smaller size. Try applying the Wakita-
Tsurumi clustering algorithm by clicking on the Groups dropdown menu in
the NodeXL ribbon and choosing Group by Cluster and the checking the
appropriate selector as shown in Figure 7.14. Notice that the data on the
Groups worksheet is now updated to reflect the new groups.
24. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
24
A social network for an individual is created with his/her interactions and
personal relationships with other members in the society. Social networks
represent and model the social ties among individuals. With the rapid
expansion of the web, there is a tremendous growth in online interaction of
the users. Many social networking sites, e.g., Facebook, Twitter etc. have
also come up to facilitate user interaction. As the number of interactions
have increased manifold, it is becoming difficult to keep track of these
communications. Human beings tend to get associated with people of similar
likings and tastes. The easy-to-use social media allows people to extend their
social life in unprecedented ways since it is difficult to meet friends in the
physical world, but much easier to find friends online with similar interests.
These real-world social networks have interesting patterns and properties
which may be analysed for numerous useful purposes. Social networks have
a characteristic property to exhibit a community structure. If the vertices of
the network can be partitioned into either disjoint or overlapping sets of
vertices such that the number of edges within a set exceeds the number of
edges between any two sets by some reasonable amount, we say that the
network displays a community structure. Networks disp+laying a community
structure may often exhibit a hierarchical community structure as well1 .
The process of discovering the cohesive groups or clusters in the network is
known as community detection. It forms one of the key tasks of Social
network analysis2 . The detection of communities in social networks can be
useful in many applications where group decisions are taken, e.g.,
multicasting a message of interest to a community instead of sending it to
each one in the group or recommending a set of products to a community.
The applications of community detection have been highlighted towards the
end of the article. State of the art in community detection research for social
networks is presented in this work. The paper begins with the basic concepts
of social networks and communities. Various methods for community
detection are categorised and discussed in the next section followed by list of
standard datasets used for analysis in community detection research along
with the links for download if available online. Some potential applications of
25. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
25
community detection in social networks are briefly described in the next
section. Discussion section argues the advantages of using a method with
respect to another, the kind of community structure they obtain, etc. and
the conclusion section concludes the paper. BASIC CONCEPTS SOCIAL
NETWORK A social network is depicted by social network graph 𝐺 consisting
of 𝑛 number of nodes denoting 𝑛 individuals or the participants in the
network. The connection between node 𝑖 and node 𝑗 is represented by the
edge 𝑒𝑖𝑗 of the graph. A directed or an undirected graph may illustrate these
connections between the participants of the network. The graph can be
represented by an adjacency matrix 𝐴 in which 𝐴𝑖𝑗 = 1 in case there is an
edge between 𝑖 and 𝑗 else 𝐴𝑖𝑗 = 0. Social networks follow the properties of
complex networks3,4 . Some real life examples1 of social networks include
friends based, telephone, email and collaboration networks. These networks
can be represented as graphs and it is feasible to study and analyse them to
find interesting patterns amongst the entities. These appealing prototypes
can be utilized in various useful applications. Community A community can
be defined as a group of entities closer to each other in comparison to other
entities of the dataset. Community is formed by individuals such that those
within a group interact with each other more frequently than with those
outside the group. The closeness between entities of a group can be
measured via similarity or distance measures between entities. McPherson et
al5 stated that “similarity breeds connection”. They discussed various social
factors which lead to similar behaviour or homophily in networks. The
communities in social networks are analogous to clusters in networks. An
individual represented by a node in graphs may not be part of just a
community or a group, it may be an element of many closely associated or
different groups existing in the network. For example a person may
concurrently belong to college, school, friends and family groups. All such
communities which have common nodes are called overlapping
communities. Identification and analysis of the community structure has
been done by many researchers applying methodologies from numerous
form of sciences. The quality of clustering in networks is normally judged by
26. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
26
clustering coefficient which is a measure of how much the vertices of a
network tend to cluster together. The global clustering coefficient6 and the
local clustering coefficient7 are two types of clustering coefficients discussed
in literature. Methods for grouping similar items Communities are those
parts of the graph which have denser connections inside and few
connections with the rest of the graph8 . The aim of unsupervised learning is
to group together similar objects without any prior knowledge about them. In
case of networks, the clustering problem refers to grouping of nodes
according to their similarity computed based on topological features and/or
other characteristics of the graph. Network partitioning and clustering are
two commonly used methods in literature to find the groups in the social
network graph. These methods are briefly described in the next subsections.
Graph partitioning Graph partitioning is the process of partitioning a graph
into a predefined number of smaller components with specific properties. A
common property to be minimized is called cut size. A cut is a partition of
the vertex set of a graph into two disjoint subsets and the size of the cut is
the number of edges between the components. A multicut is a set of edges
whose removal divides the graph into two or more components. It is
necessary to specify the number of components one wishes to get in case of
graph partitioning. The size of the components must also be specified, as
otherwise a likely but not meaningful solution would be to put the minimum
degree vertex into one component and the rest of the vertices into another.
Since the number of communities is usually not known in advance, graph
partitioning methods are not suitable to detect communities in such cases.
Clustering Clustering is the process of grouping a set of similar items
together in structures known as clusters. Clustering the social network
graph may give a lot of information about the underlying hidden attributes,
relationships and properties of the participants as well as the interactions
among them. Hierarchical clustering and partitioning method of clustering
are the commonly used clustering techniques used in literature. In
hierarchical clustering, a hierarchy of clusters is formed. The process of
hierarchy creation or levelling can be agglomerative or divisive. In
27. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
27
agglomerative clustering methods, a bottom-up approach to clustering is
followed. A particular node is clubbed or agglomerated with similar nodes to
form a cluster or a community. This aggregation is based on similarity. In
divisive clustering approaches, a large cluster is repeatedly divided into
smaller clusters. Partitioning methods begin with an initial partition amidst
the number of clusters pre-set and relocation of instances by moving them
across clusters, e.g., K-means clustering. An exhaustive evaluation of all
possible partitions is required to achieve global optimality in partitioned-
based clustering. This is time consuming and sometimes infeasible, hence
researchers use greedy heuristics for iterative optimization in partitioning
methods of clustering. The next section categorizes and discusses major
algorithms for community detection. ALGORITHMS FOR COMMUNITY
DETECTION A number of community detection algorithms and methods
have been proposed and deployed for the identification of communities in
literature. There have also been modifications and revisions to many
methods and algorithms already proposed. A comprehensive survey of
community detection in graphs has been done by Fortunato8 in the year
2010. Other reviews available in literature are by Coscia et al9 in 2011,
Fortunato et al10 in 2012, Porter et al11 in 2009, Danon et al12 in 2005,
and Plantié et al13 in 2013. The presented work reviews the algorithms
available till 2015 to the best of our knowledge including the algorithms
given in the earlier surveys. Papers based on new approaches and
techniques like big data, not discussed by previous authors have been
incorporated in our article.The algorithms for community detection are
categorized into approaches based on graph partitioning, clustering, genetic
algorithms, label propagation along with methods for overlapping community
detection (clique based and non-clique based methods), and community
detection for dynamic networks. Algorithms under each of these categories
are described below. Graph partitioning based community detection Graph
partitioning based methods have been used in literature to divide the graph
into components such that there are few connections between components.
The Kernighan-Line14 algorithm for graph partitioning was amongst the
28. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
28
earliest techniques to divide a graph. It partitions the nodes of the graph
with cost on edges into subsets of given sizes so as to minimize the sum of
costs on all edges cut. A major disadvantage of this algorithm however is
that the number of groups have to be predefined. The algorithm however is
quite fast with a worst case running time of 𝑂(𝑛 2 ). Newman15 reduces the
widely-studied maximum likelihood method for community detection to a
search through a group of candidate solutions, each of which is itself a
solution to a minimum cut graph partitioning problem. The paper shows
that the two most essential community inference methods based on the
stochastic block model or its degree-corrected variant16 can be mapped onto
versions of the familiar minimum-cut graph partitioning problem. This has
been illustrated by adapting Laplacian spectral partitioning method17, 18 to
perform community inference. Clustering based community detection The
main concern of community detection is to detect clusters, groups or
cohesive subgroups. The basis of a large number of community detection
algorithms is clustering. Amongst the innovators of community detection
methods, Girvan and Newman19 had a main role. They proposed a divisive
algorithm based on edge-betweenness for a graph with undirected and
unweighted edges. The algorithm focused on edges that are most “between”
the communities and communities are constructed progressively by
removing these edges from the original graph. Three different measures for
calculation of edge-betweenness in vertices of a graph were proposed in
Newman and Girvan20 . The worst-case time complexity of the edge
betweenness algorithm is 𝑂(𝑚2𝑛) and is 𝑂(𝑛 3 ) for sparse graphs, where m
denotes the number of edges and n is the number of vertices. The Girvan
Newman (GN) algorithm has been enhanced by many authors and applied to
various networks21-28. Chen et al22 extended GN algorithm to partition
weighted graphs and used it to identify functional modules in the yeast
proteome network. Rattigan et al21 proposed the indexing methods to
reduce the computational complexity of the GN algorithm significantly.
Pinney et al24 also build an algorithm which uses GN algorithm for the
decomposition of networks based on graph theoretical concept of
29. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
29
betweenness centrality. Their paper inspected utility of betweenness
centrality to decompose such networks in diverse ways. Radicchi29 et al also
proposed an algorithm based on GN algorithm introducing a new definition
of community. They defined ‘strong’ and ‘weak’ communities. The algorithm
uses an edge clustering coefficient to perform the divisive edge removal step
of GN and has a running time of 𝑂( 𝑚4 𝑛2 ) and 𝑂(𝑛 2 ) for sparse graphs.
Moon et al30 have proposed and implemented the parallel version of the GN
algorithm to handle large scale data. They have used MapReduce model
(Apache Hadoop) and GraphChi. Newman and Girvan first defined a
measure known as ‘modularity’ to judge the quality of partitions or
communities formed20 . The modularity measure proposed by them has
been widely accepted and used by researchers to gauge the goodness of the
modules obtained from the community detection algorithms with high
modularity corresponding to a better community structure. Modularity was
defined as ∑𝒊𝒆𝒊𝒊 − 𝒂𝒊𝟐 , where 𝒆𝒊𝒊 denotes fraction of the edges that connect
vertices in community 𝒊, 𝒆𝒊𝒋 denotes fraction of the edges connecting vertices
in two different communities 𝒊 and 𝒋 while 𝒂𝒊 = ∑𝒋𝒆𝒊𝒋 is the fraction of edges
that connect to vertices in community 𝒊. The value 𝑄 = 1 indicates a network
with strong community structure. The optimization of modularity function
has received great attention in literature. The table 1 lists clustering based
community detection methods, including algorithms which use modularity
and modularity optimization. Newman31 has worked to maximize
modularity so that the process of aggregating nodes to form communities
leads to maximum modularity gain. This change in modularity upon joining
two communities defined as ∆𝑄 = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 − 2𝑎𝑖𝑎𝑗 = 2(𝑒𝑖𝑗−𝑎𝑖𝑎𝑗) can be
calculated in constant time and hence is faster to execute in comparison to
the GN algorithm. The run time of the algorithm is 𝑂(𝑛 2 ) for sparse graphs
and 𝑂((𝑚 + 𝑛)𝑛) for others. In a recent work, a scalable version of this
algorithm has been implemented using MapReduce by Chen et al32.
Newman33 generalized the betweenness algorithm for weighted networks.
The modularity was now represented as 𝑄 = 1 2𝑚 ∑𝑖𝑗[𝐴𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚 ]δ(𝑐𝑖 , 𝑐𝑗)
where 𝑚 = 1 2 ∑𝑖𝑗𝐴𝑖𝑗 represents the number of edges between communities 𝑐𝑖
30. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
30
and 𝑐𝑗 in the graph, while 𝑘𝑖 , 𝑘𝑗 are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑢, 𝑣) is
1 if 𝑢 = 𝑣 and 0 otherwise. Newman34 in yet another approach characterised
the modularity matrix in terms of eigenvectors. The equation for modularity
was changed to 𝑄 = 1 4𝑚𝑠𝑇𝐵𝑠 , where the modularity matrix was given as 𝐵𝑖𝑗
= 𝐴𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚 and modularity was defined using eigenvectors of the
modularity matrix. The algorithm runs in 𝑂(𝑛 2 𝑙𝑜𝑔𝑛) time, where 𝑙𝑜𝑔𝑛
represents the average depth of the dendrogram .
31. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
31
Clauset et al35 used greedy optimization of modularity to detect
communities for large networks. For a network structure with m edges and n
vertices, the algorithm has a running time of (𝑚𝑑𝑙𝑜𝑔𝑛) , where ‘d’ denotes the
depth of the dendrogram. For sparse real world networks the running time
is(𝑛𝑙𝑜𝑔2n) . Blondel et al36 designed an iterative two phase algorithm known
as Louvain method. In first phase, all nodes are placed into different
communities and then the modularity gain of moving a node 𝑖 from one
community to another is found. In case this modularity gain is positive, the
32. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
32
node is shifted to a new community. In second phase all the communities
found in earlier phase are treated as nodes and the weight of links is found.
The algorithm improves the time complexity of the GN algorithm. It has a
linear run time of 𝑂(𝑚). Guimera et al37 used simulated annealing for
modularity optimization and showed that computing the modularity of a
network is similar to determining the ground-state energy of a spin system.
Additionally, the authors showed that the stochastic network models give
rise to modular networks due to fluctuations. Zhou et al38 attempted to
improve modularity using simulated annealing introducing the idea of ‘inter
edges’ and ‘intra edges’. The authors modified the modularity equation to
include inter and intra edges as 𝑄 = 1 2𝑚 ∑ [(𝐴𝑖𝑗𝑛𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚 )𝛿(𝐶𝑖 , 𝐶𝑗) − 𝛽
(𝐴𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚 ) 𝛼 (1 − 𝛿(𝐶𝑖 , 𝐶𝑗))] Intra factor Inter factor Here α and β are
undetermined parameters and affect the value of the inter-factor. The value
of β is increased and α is reduced when large communities are expected.
Duch et al39 proposed a heuristic search based approach for the
optimization of modularity function using extremal optimization technique,
which has a complexity of 𝑂(𝑛 2 𝑙𝑜𝑔2𝑛). AdClust method40 can extract
modules from complex networks with significant precision and strength.
Each node in the network is assumed to act as a self-directed agent
representing flocking behaviour. The vertices of the network travel towards
the desirable adjoining groups. Wahl and Sheppard41 proposed hierarchical
fuzzy spectral clustering based approach. They argued that determining the
sub-communities and their hierarchies are as important as determining
communities within a network. DENGRAPH42 algorithm uses the idea of
density-based incremental clustering of spatial data and is intended to work
for large dynamic datasets with noise. The Markov Clustering
Algorithm(MCL)43 is a graph flow simulation algorithm which can be used to
detect clusters in a graph and is analogous to detection of communities in
the networks. This algorithm consists of two alternate processes of
‘expansion’ and ‘inflation’. Markov chains are employed to perform random
walk through a graph. The method has a worst case run time of 𝑂(𝑛𝑘 2 )
where n represents the number of nodes and k is the number of resources.
33. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
33
Nikolaev et al44 used ‘entropy centrality measure’ based on Markovian
process to iteratively detect communities. A random walk through the nodes
is performed to find the communities existing in the network structure. For a
graph, the transition probability matrix for a Markov chain is created. A
locality t is selected and those edges for which the average entropy centrality
for the nodes over the graph is reduced are selected and removed. The
algorithm proposed by Steinhaeuser et al45 performs many short random
walks and interprets visited nodes during the same walk as similar nodes
which gives an indication that they belong to the same community. The
similar nodes are aggregated and community structure is created using
consensus clustering. It has a runtime of 𝑂(𝑛 2 𝑙𝑜𝑔𝑛).
Genetic algorithms (GA) based community detection :
Genetic algorithms (GA) are adaptive heuristic search algorithms whose aim
is to find the best solution under the given circumstances. A genetic
algorithm starts with a set of solutions known as chromosomes and fitness
function is calculated for these chromosomes. If a solution with a maximum
fitness is obtained, one stops else with some probability crossover and
mutation operators are applied to the current set of solutions to obtain the
new set of solutions. Community detection can be viewed as an optimization
problem in which an objective function that captures the intuition of a
community with better internal connectivity than external connectivity is
chosen to be optimized. GA have been applied to the process of community
discovery and analysis in a few recent research works. These are described
briefly in this section. Table 2 enlists the algorithms available in literature
for community detection based on GA.
34. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
34
Pizzuti46 proposed the GA-Net algorithm which uses a locus based graph
representation of the network. The nodes of the social network are depicted
by genes and alleles. The algorithm introduces and optimizes the community
score to measure the quality of partitioning. All the dense communities
present in the network structure are obtained at the end of the algorithm by
selectively exploring the search space, without the need to know in advance
the exact number of groups. Another GA based approach MOGA-Net47
proposed by the same author optimizes two objective functions i.e. the
community score and community fitness . The higher the community score,
the denser the clustering obtained. The community fitness is sum of fitness
of nodes belonging to a module. When this sum reaches its maximum, the
number of external links is minimized. MOGA-Net generates a set of
communities at different hierarchical levels in which solutions at deeper
levels, consisting of a higher number of modules, are contained in solutions
having a lower number of communities. Hafez et al48 have performed both
Single-Objective and Multi-Objective optimization for community detection
problem. The former optimization was done using roulette selection based
GA while NSGA-II algorithm was used for the latter process. Mazur et al49
35. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
35
have used modularity as the fitness function in addition to the community
score. The authors worked on undirected graphs and their algorithm can
also discover single node communities. Liu et al50 used GA in addition to
clustering to find the community structures in a network. The authors have
used a strategy of repeated divisions. The graph is initially divided into two
parts, then the subgraphs are further divided and a nested GA is applied to
them. Tasgin et al51 have also optimized the network modularity using GA.
A multi-cultural algorithm52 for community detection employs the fitness
function defined by Pizzuti46 in GA-Net. The belief space which is a state
space for the network and contains a set of individuals that have a better
fitness value has been used in this work to guide the search direction by
determining a range of possible states for individuals. A genetic algorithm for
the optimization of modularity, proposed by Nicosia et al53 and has been
explained in the overlapping communities section later.
Label propagation based community detection:
Label propagation in a network is the propagation of a label to various nodes
existing in the network. Each node attains the label possessed by a
maximum number of the neighbouring nodes. This section discusses some
label propagation based algorithms for discovering communities. Table 3
contains a listing of these algorithms, discussed in detail later in the section.
36. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
36
Label Propagation Algorithm(LPA) was proposed by Raghavan et al54 in
which initially each node tries to achieve a label from the maximum number
of labels possessed by its neighbours. The stopping criteria for the process
was also the same, i.e., when each node achieves a label, which a maximum
number of its neighbouring nodes have. Each iteration of the algorithm
takes 𝑂(𝑚) time where m is the number of edges. SLPA (speaker listener label
propagation algorithm)55 is an extension to LPA which could analyse
different kinds of communities such as disjoint communities, overlapping
communities and hierarchical communities in both unipartite and bipartite
networks. The algorithm has a linear run time of 𝑂(𝑇𝑚), where T is the user
defined maximum number of iterations and m is the number of edges. Based
on the SLPA algorithm, Hu56 proposed a Weighted Label Propagation
Algorithm (WLPA). It uses the similarity between any two of the vertices in a
network based on the labels of the vertices achieved in label propagation.
The similarity of these vertices is then used as a weight of the edge in label
propagation. LPA was further improved by Gregory57 in his algorithm
COPRA (Community Overlap Propagation Algorithm). It was the first label
propagation based procedure which could also detect overlapping
communities. The run time per iteration is 𝑂(𝑣𝑚𝑙𝑜𝑔(𝑣𝑚⁄𝑛), here n is the
number of nodes, m is the edges and v is the maximum number of
37. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
37
communities per vertex. LabelRank Algorithm58 uses the LPA and MCL
(Markov Clustering Algorithm). The node identifiers are used as labels. Each
node receives a number of labels from the neighbouring nodes. A community
is formed for nodes having the same highest probability label. Four operators
are applied namely propagation which propagates the label to neighbours,
inflation i.e. the inflation operator of the MCL algorithm, cut-off operator that
removes the labels below a threshold and an explicit conditional update
operator responsible for a conditional update. The algorithm runs in 𝑂(𝑚)
time where m is the number of edges. The LabelRank algorithm was modified
to LabelRankT algorithm by Xie et al60. This algorithm included both the
edge weights and the edge directions in the detection of communities. This
algorithm works for dynamic networks as well and is able to detect evolving
communities also. Wu et al59 proposed a Balanced Multi Label Propagation
Algorithm (BMPLA) for detection of overlapping communities. Using this
algorithm, vertices can belong to any number of communities without having
a global maximum limit on largest number of communities membership
required by COPRA57 . Each iteration of the algorithm takes 𝑂(𝑛𝑙𝑜𝑔𝑛) time to
execute, where n is the number of nodes. Semantics based community
detection Semantic content and edge relationships in a semantic network
may be additionally used to partition the nodes into communities. The
context, as well as the relationship of the nodes, both are taken into
consideration in the process of semantic community detection. LDA(Latent
Dirichlet Allocation)61 is used in several semantic community based
community detection approaches. A clustering algorithm based on the link-
field-topic (LFT) model is put forward by Xin et al62 to overcome the
limitation of defining the number of communities beforehand. The study
forms the semantic link weight (SLW) based on the investigation of LFT, to
evaluate the semantic weight of links for each sampling field. The proposed
clustering algorithm is based on the SLW which could separate the semantic
social network into clustering units. In another work63 the authors have
used ARTs model and divided the process into two phases namely LDA
sampling and community detection. In the former process multiple sampling
38. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
38
ARTs have been designed. A community clustering algorithm has also been
proposed. The procedure could detect the overlapping communities. Xia et
al64 constructed a semantic network using information from the comment
content extracted from the initial HTML source files. An average score is
obtained for two users for each link assuming comments to be implicit links
between people. An analytic method for taking out comment content is
proposed to build the semantic network for example, the terms and phrases
in data are counted in comments as supportive or opposing. Each phrase is
given an associated numerical trust value. On this semantic network, the
classical community detection algorithm is applied henceforth. Ding65 has
considered the impact of topological as well as topical elements in
community detection. Topology based approaches are based on the idea that
the real world networks can be modelled as graphs where the nodes depict
the entities whereas the interactions between them are shown by the edges
of the graph. On the other hand topic based community detection have a
basis that the more words two objects share, the more similar they are. The
author performs systematic analysis with topology-based and topic-based
community detection methodologies on the co-authorship networks. The
paper puts forward the argument that, to detect communities, one should
take into account together the topical and topological features of networks. A
community detection algorithm, SemTagP (Semantic Tag propagation) has
been proposed by Ereteo et al66 that takes yield of the semantic data
captured while organizing the RDF graphs of social networks. It basically is
an extension of the LPA54 algorithm to perform the semantic propagation of
tags. The algorithm detects and moreover labels communities using the tags
used by group during the social labelling process and the semantic
associations derived between tags. In a study by Zhao et al67, a topic
oriented approach consisting of an amalgam of social objects clustering and
link analysis has been used. Firstly a modified form of k means clustering
named as ‘Entropy Weighting K-Means (EWKM) algorithm’ has been used to
cluster the social objects. A subspace clustering algorithm is applied to
cluster all the social objects into topics. On the clusters obtained in this
39. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
39
process, topical community detection or link analysis is performed using a
modularity optimization algorithm. The members of the objects are
separated into topical clusters having unique topic. A link analysis is
performed on each topical cluster to discover the topical communities. The
end result of the entire method is topical communities. A community
extraction approach is given by Abdelbary et al68, which integrates the
content published within the social network with its semantic features.
Community discovery is performed using two layer generative Restricted
Boltzmann Machines model. The model presumes that members of a
community communicate over matters of common concern. The model
permits associate members to belong to multiple communities. Latent
semantic analysis (LSA)69 and Latent Dirichlet Allocation (LDA)61 are the
two techniques extensively employed in the process to detect topical
communities. Nyugen et al70 have used LDA to find hyper groups in the blog
content and then sentiment analysis is done to further find the meta-groups
in these units. A Link-Content model is proposed by Natarajan et al71 for
discovering topic based communities in social networks. Community has
been modelled as a distribution employing Gibbs sampling. This paper uses
links and content to extract communities in a content sharing network
Twitter. Methods to detect overlapping communities A recent survey by
Amelio et al gives a comprehensive review of major overlapping community
detection algorithms and includes the methods on dynamic networks. There
exists another review of methods for discover overlapping communities done
by Xie et al72. The following section discusses some of the methods to detect
overlapping communities. Tables 4 and 5 enlist the methods discussed in
this section. Clique based methods for overlapping community detection A
community can be interpreted as a union of smaller complete (fully
connected) subgraphs that share nodes. A k-clique is a fully connected
subgraph consisting of k nodes. A k-clique community can be defined as
union of all k-cliques that can be reached from each other through a series
of adjacent k-cliques. Many researchers have used cliques to detect
40. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
40
overlapping communities. Important contributions using cliques for
overlapping community detection are summarized in table 4.
The Clique Percolation Method (CPM) was proposed by Palla et al73 to detect
overlapping communities. The method first finds all cliques of the network
and uses the algorithm of Everett et al83 to identify communities by
component analysis of clique-clique overlap matrix. CPM has a runtime of
𝑂(exp(𝑛)). The Clique Percolation Method (CPM) method proposed by Palla et
al73 could not discover the hierarchical structure along with the overlapping
attribute. This limitation was overcome through method proposed by
Lancichinetti et al74 . It performs a local exploration in order to find the
community for each of the node. In this process the nodes may be revisited
any number of times. The main objective was to find local maxima based on
a fitness function. CFinder84 software was developed using CPM for
overlapping community detection. Du et al75 proposed ComTector
(Community DeTector) for detection of overlapping communities using
maximal cliques. Initially all maximal cliques in the network are found which
form the kernels of potential community. Then agglomerative technique is
iteratively used to add the vertices left to their closest kernels. The obtained
clusters are adjusted by merging pair of fractional communities in order to
41. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
41
optimize the modularity of the network. The running time of the algorithm is
(𝐶∗𝑇2 ), where the communities detected are denoted by C and T is the
number of triangles in the network. EAGLE, an agglomerative hierarchical
clustering based algorithm has been proposed by Shen et al 76. In the first
step maximal cliques are discovered and those smaller than a threshold are
discarded. Subordinate maximal cliques are neglected, remaining give the
initial communities (also the subordinate vertices). The similarity is found
between these communities, and communities are repeatedly merged
together on the basis of this similarity. This is repeated till one community
remains at the end. Evans et al77 proposed that by partitioning the links of
a network, the overlapping communities may be discovered. In an extension
to this work, Evans et al78 used weighted line graphs. In another work
Evans79 used clique graphs to detect the overlapping communities in real
world social networks. GCE (Greedy Clique Expansion)80 first identifies
cliques in a network. These cliques act as seeds for expansion along with the
greedy optimization of a fitness function. A community is created by
expanding the selected seed and performing its greedy optimization via the
fitness function proposed by Lancichinetti et al74 . CONGA (Cluster-Overlap
Newman Girvan Algorithm) was proposed by Gregory25. This method was
based on split- betweenness algorithm of Girvan-Newman. The runtime of
the method is 𝑂(𝑚3 ). In another work CONGO81 (CONGA Optimized)
algorithm was proposed which used local betweenness measure, leading to
an improved complexity 𝑂(𝑛𝑙𝑜𝑔𝑛). A two phase Peacock algorithm for
detection of overlapping communities is proposed in Gregory82 using
disjoint community detection approaches. In the first phase, the network
transformation was performed using the split betweenness concept proposed
earlier by the author. In the second phase, the transformed network is
processed by a disjoint community detection algorithm and the detected
communities were converted back to overlapping communities of the original
network. Non clique methods for overlapping community detection Some
other non-clique methods to discover overlapping communities are given in
the table 5. These methods have been briefly explained in this section. An
42. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
42
extension of Newman’s modularity for directed graphs and overlapping
communities was done by Nicosia et al53 and modularity was given by 𝑄𝑜𝑣 =
1 𝑚 ∑𝑐∊𝐶 ∑𝑖,𝑗∊𝑉[𝛽𝑙(𝑖,𝑗),𝑐𝐴𝑖𝑗 − 𝛽𝑙(𝑖,𝑗),𝑐𝑜𝑢𝑡𝑘𝑖𝑜𝑢𝑡𝛽𝑙(𝑖,𝑗),𝑐𝑖𝑛𝑘𝑗𝑖𝑛𝑚 . The authors
defined a belongingness coefficient 𝛽𝑙,𝑐 of an edge 𝑙 connecting nodes 𝑖 and 𝑗
for a particular community 𝑐 and is given by 𝛽𝑙,𝑐 = ℱ(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) where
definition for ℱ(𝛼𝑖,𝑐 , 𝛼𝑗,𝑐) is taken as arbitrary, e.g., it can be taken as a
product of the belonging coefficients of the nodes involved, or as max(𝛼𝑖,𝑐 ,
𝛼𝑗,𝑐). 𝛽𝑙(𝑖,𝑗),𝑐𝑜𝑢𝑡 = ∑𝑗∊𝑉ℱ(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| , 𝛽𝑙(𝑖,𝑗),𝑐𝑖𝑛 = ∑𝑖∊𝑉ℱ(𝛼𝑖,𝑐,𝛼𝑗,𝑐) |𝑉| . A
genetic approach has been used in this work for the optimization of
modularity function. Another work which uses genetic approach to
overlapping community detection is GA-Net+, by Pizzuti85 GA-NET+ could
detect overlapping communities using edge clustering. Order Statistics Local
Optimization Method(OSLOM)86 detects clusters in networks, and can
handle various kind of graph properties like edge direction, edge weights,
overlapping communities, hierarchy and network dynamics. It is based on
local optimization of a fitness function expressing the statistical significance
of clusters with respect to random fluctuations, which is estimated with
tools of Extreme and Order Statistics. Baumes et al87 considered a
community as a subset of nodes which induces a locally optimal subgraph
with respect to a density function. Two different subsets with significant
overlap can be locally optimal which forms the basis to find overlapping
communities. Chen et al 88 used game-theoretic approach to address the
issue of overlapping communities. Each node is assumed to be an agent
trying to improve the utility by joining or leaving the community. The
community of the nodes in Nash equilibrium are assumed to form the
output of the algorithm. Utility of an agent is formulated as combination of a
gain and a loss function. To capture the idea of overlapping communities,
each agent is permitted to select multiple communities. In another game-
theoretic approach, Alvari et al89 proposed an algorithm consisting of two
methods PSGAME based on Pearson correlation, and NGGAME centred on
neighbourhood similarity measure. Alvari et al90 proposed the Dynamic
Game Theory method (D-GT) which treated nodes as rational agents. These
43. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
43
agents perform actions in iterative and game theoretic manner so as to
maximize the total utility.
Community detection for Dynamic networks:
Dynamic networks are the networks in which the membership of the
nodes of communities evolve or change over time. The task of community
identification for dynamic networks has received relatively less attention
than the static networks The methods have been categorized into two classes
by Bansal et al94, one designed for data which is evolving in real time known
as incremental or online community detection; and the other for data where
all the changes of the network evolution are known a priori, known as offline
community detection. Wolf et al95 proposed mathematical and
computational formulations for the analysis of dynamic communities on the
basis of social interactions occurring in the network. Tantipathananandh et
al96 made assumptions about the individual behaviour and group
membership. Henceforth they framed the objective as an optimization
problem by formulating three cost functions, namely i-cost, g-cost and c-
cost. Graph colouring and heuristics based approach were deployed.
FacetNet, proposed by Lin et al97 is a unified framework to study the
dynamic evolutions of communities. The community structure at any time
includes the network data as well as the previous history of the evolution.
They have used a cost function and proposed an iterative algorithm which
converges to an optimal solution. Palla et al98 conducted experiments on
two diverse datasets of phone call network and collaboration network to find
time dependence. After building joint graphs for two time steps, the CPM
algorithm73 was applied. They have used an auto-correlation function to
find overlap among two states of a community, and a stationarity parameter
which denotes the average correlation of various states. Greene et al99
proposed a heuristic technique for identification of dynamic communities in
the network data. They represented the dynamic network graph as an
aggregation of time step graphs. Step communities represent the dynamic
44. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
44
communities at a particular time. The algorithm begins with the application
of a static community detection algorithm on the graph. In the subsequent
steps, dynamic communities are created for each step and Jaccard similarity
is calculated. They have also generated benchmark dataset for experimental
work. The algorithm by Bansal et al94 involves the addition or deletion of
edges in the network. The algorithm is built on the greedy agglomerative
technique of the modularity based method earlier proposed in the work of
Clauset et al35 . He et al100 improvised Louvain method36 to include
concept of dynamicity in the formation of communities. A key point in their
algorithm is to make use of previously detected communities at time 𝑡 − 1 to
identify the communities at time 𝑡. Dinh et al101 proposed A3CS, an
adaptive framework which uses the power-law distribution and achieves
approximation guarantees for the NP-hard modularity maximization
problem, particularly on dynamic networks. Nguyen et al102 have attempted
to identify disjoint community structure in dynamic social networks. An
adaptive modularity-based framework Quick Community Adaptation (QCA)
is proposed. The method finds and traces the progress of network
communities in dynamic online social networks. Takaffoli et al103 have
proposed a two-step approach to community detection. In the first step the
communities extracted at different time instances are compared using
weighted bipartite matching. Next, a ‘meta’ community is constructed which
is defined as a series of similar communities at various time instances. Five
events to capture the changes to community are split, survive, dissolve,
merge, and form. A similarity function is used to calculate the similarity
between two communities and a community matching algorithm has been
employed thereafter. The authors, Kim et al104 proposed a particle-and-
density based evolutionary clustering method for discovery of communities
in dynamic networks. Their approach is grounded on the assumption that a
network is built of a number of particles termed as nano-communities,
where each community further is made up of particles termed as quasi-
clique-by-clique (l-KK). The density based clustering method uses cost
embedding technique and optimal modularity method to ensure temporal
45. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
45
smoothness even when the number of cluster varies. They have used an
information theory based mapping technique to recognize the stages of the
community i.e. evolving, forming or dissolving. Their method improves
accuracy and is time efficient as compared to the FacetNet method proposed
earlier. In another approach proposed by Chi et al105, two frameworks for
evolutionary spectral clustering have been proposed namely PCQ (Preserving
cluster quality) and PCM (Preserving cluster membership). In this work the
temporal smoothness is ensured by some terms in the clustering cost
functions. These two frameworks combine the processes of community
extraction and the community evolution process. They use a cost function
which consists of the snapshot and temporal cost. The clustering quality of
any partition determines the snapshot cost while the temporal cost definition
varies for each of the frameworks. For PCQ framework, the temporal cost is
decided by the cluster quality when the current partition is applied to the
historic data. In PCM, the difference between the current and the historic
partition gives the temporal cost. Both the frameworks proposed, can tackle
the change in number of clusters. In their work DYNMOGA (Dynamic
MultiObjective Genetic Algorithm), the authors Folino et al106 have used a
genetic algorithm based approach to dynamic community detection. They
attempt to achieve temporal smoothness by multiobjectiveoptimisation,
i.e.maximisation of snapshot quality (community score is used) and
minimization of temporal cost (here NMI is used). Kim et al107 in their
method CHRONICLE have performed two stage clustering and the method
can detect clusters of path group type also in addition to the single path type
clusters. In first stage of the algorithm, called as CHRONICLE1st the cosine
similarity measure is used. In second stage of the algorithm the measure
proposed and used is general similarity (GS). It is a combination of the two
measures structural affinity and weight affinity.
46. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
46
SOME POTENTIAL APPLICATIONS OF COMMUNITY
DETECTION:
With the enormous growth of the social networking site users, the graphs
representing these sites are becoming very complex, hence difficult to
visualize and understand. Communities can be considered as a summary of
the whole network thus making the network easy to comprehend. The
discovery of these communities in social networks can be useful in various
applications. Some of the applications where community detection is useful
are briefly described below.
Improving recommender systems with community detection
Recommender Systems use data of similar users or similar items to
generate recommendations. This is analogous to the identification of groups,
or similar nodes in a graph. Hence community detection holds an immense
potential for recommendation algorithms. Cao et al114 have used a
community detection based approach to improve the traditional collaborative
filtering process of Recommender Systems. The process starts with the
mapping of user-item matrix to user similarity structure. On this matrix, a
discrete PSO (particle swarm optimization) algorithm is applied to detect
communities. The items are then recommended to the user based on the
discovered communities.
Evolution of communities in social media
With the increase in the number of social networking sites, the focus and
scope of sites are getting expanded. The sites are getting diversified in terms
of focus. In addition to common sites like Facebook, Twitter, MySpace and
Bebo, other sites like Flickr for photo-sharing have also come up. The
analysis of the tweetretweet and the follower-followee network in twitter
provides an insight into the community structure existing in the Twitter
network. Sentiment analysis of the tweets may be performed as an
intermediary step to find the general nature of the tweets and then
community detection algorithms may be applied to help deduce the
structure of communities. Zalmout et al115, applied the community
47. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
47
detection algorithm to UK political tweets dataset. CQA(Community question
answering) has been used by Zhang et al116 to discover overlapping
communities in dynamic networks based on user interactions.
Related Work:
Social Network:
A social network is depicted by social network graph consisting of number
of nodes denoting individuals or the participants in the network. The
connection between node and node is represented by the edge of the graph.
A directed or an undirected graph may illustrate these connections between
the participants of the network. The graph can be represented by an
adjacency matrix in which in case there is an edge between and else.
Social networks follow the properties of complex networks3,4. Some real life
examples1 of social networks include friends based, telephone, email and
collaboration networks. These networks can be represented as graphs and it
is feasible to study and analyse them to find interesting patterns amongst
the entities. These appealing prototypes can be utilized in various useful
applications.
Community:
A community can be defined as a group of entities closer to each other in
comparison to other entities of the dataset. Community is formed by
individuals such that those within a group interact with each other more
frequently than with those outside the group. The closeness between entities
of a group can be measured via similarity or distance measures between
entities. McPherson et al5 stated that “similarity breeds connection”. They
discussed various social factors which lead to similar behaviour or
homophily in networks. The communities in social networks are analogous
to clusters in networks. An individual represented by a node in graphs may
not be part of just a community or a group, it may be an element of many
48. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
48
closely associated or different groups existing in the network. For example a
person may concurrently belong to college, school, friends and family
groups. All such communities which have common nodes are called
overlapping communities. Identification and analysis of the community
structure has been done by many researchers applying methodologies from
numerous form of sciences. The quality of clustering in networks is
normally judged by clustering coefficient which is a measure of how much
the vertices of a network tend to cluster together. The global clustering
coefficient6 and the local clustering coefficient7 are two types of clustering
coefficients discussed in literature.
Methods for grouping similar items:
Communities are those parts of the graph which have denser connections
inside and few connections with the rest of the graph8. The aim of
unsupervised learning is to group together similar objects without any prior
knowledge about them. In case of networks, the clustering problem refers to
grouping of nodes according to their similarity computed based on
topological features and/or other characteristics of the graph. Network
partitioning and clustering are two commonly used methods in literature to
find the groups in the social network graph. These methods are briefly
described in the next subsections.
Graph partitioning Graph partitioning is the process of partitioning a graph
into a predefined number of smaller components with specific properties. A
common property to be minimized is called cut size. A cut is a partition of
the vertex set of a graph into two disjoint subsets and the size of the cut is
the number of edges between the
components. A multicut is a set of edges whose removal divides the graph
into two or more components. It is necessary to specify the number of
components one wishes to get in case of graph partitioning. The size of the
components must also be specified, as otherwise a likely but not meaningful
solution would be to put the minimum degree vertex into one component
49. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
49
and the rest of the vertices into another. Since the number of communities is
usually not known in advance, graph partitioning methods are not suitable
to detect communities in such cases.
Clustering is the process of grouping a set of similar items together in
structures known as clusters. Clustering the social network graph may give
a lot of information about the underlying hidden attributes, relationships
and properties of the participants as well as the interactions among them.
Hierarchical clustering and partitioning method of clustering are the
commonly used clustering techniques used in literature.
In hierarchical clustering, a hierarchy of clusters is formed. The process of
hierarchy creation or levelling can be agglomerative or divisive. In
agglomerative clustering methods, a bottom-up approach to clustering is
followed. A particular node is clubbed or agglomerated with similar nodes to
form a cluster or a community. This aggregation is based on similarity. In
divisive clustering approaches, a large cluster is repeatedly divided into
smaller clusters.
Partitioning methods begin with an initial partition amidst the number of
clusters pre-set and relocation of instances by moving them across clusters,
e.g., K-means clustering. An exhaustive evaluation of all possible partitions
is required to achieve global optimality in partitioned-based clustering. This
is time consuming and sometimes infeasible, hence researchers use greedy
heuristics for iterative optimization in partitioning methods of clustering.
The next section categorizes and discusses major algorithms for community
detection.
ALGORITHMS FOR COMMUNITY DETECTION:
A number of community detection algorithms and methods have been
proposed and deployed for the identification of communities in literature.
There have also been modifications and revisions to many methods and
algorithms already proposed. A comprehensive survey of community
50. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
50
detection in graphs has been done by Fortunato8 in the year 2010. Other
reviews available in literature are by Coscia et al9 in 2011, Fortunato et al10
in 2012, Porter et al11 in 2009, Danon et al12 in 2005, and Plantié et al13
in 2013. The presented work reviews the algorithms available till 2015 to the
best of our knowledge including the algorithms given in the earlier surveys.
Papers based on new approaches and techniques like big data, not
discussed by previous authors have been incorporated in our article.The
algorithms for community detection are categorized into approaches based
on graph partitioning, clustering, genetic algorithms, label propagation along
with methods for overlapping community detection (clique based and non-
clique based methods), and community detection for dynamic networks.
Algorithms under each of these categories are described below.
Graph partitioning based community detection:
Graph partitioning based methods have been used in literature to divide the
graph into components such that there are few connections between
components. The Kernighan-Line14 algorithm for graph partitioning was
amongst the earliest techniques to divide a graph. It partitions the nodes of
the graph with cost on edges into subsets of given sizes so as to minimize
the sum of costs on all edges cut. A major disadvantage of this algorithm
however is that the number of groups have to be predefined. The algorithm
however is quite fast with a worst case running time of 𝑂(𝑛2). Newman15
reduces the widely-studied maximum likelihood method for community
detection to a search through a group of candidate solutions, each of which
is itself a solution to a minimum cut graph partitioning problem. The paper
shows that the two most essential community inference methods based on
the stochastic block model or its degree-corrected variant16 can be mapped
onto versions of the familiar minimum-cut graph partitioning problem. This
has been illustrated by adapting Laplacian spectral partitioning method17,
18 to perform community inference.
51. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
51
Definition of Local Community:
The problem of local community detection is proposed by Clauset [15].
Usually we define the local community problem in the following way: there is
a nondirected graph , represents the set of nodes, and represents the edges
in the graph. The connecting information of partial nodes in the graph is
known or can be obtained. The local community is defined as . The set of
nodes connected with is defined as and the set of nodes in connected with
nodes in is defined as the boundary node set . That is to say, any node in is
connected to one node in , and the rest of is the core node set .
Clustering based community detection:
The main concern of community detection is to detect clusters, groups or
cohesive subgroups. The basis of a large number of community detection
algorithms is clustering. Amongst the innovators of community detection
methods, Girvan and Newman19 had a main role. They proposed a divisive
algorithm based on edge-betweenness for a graph with undirected and
unweighted edges. The algorithm focused on edges that are most “between”
the communities and communities are constructed progressively by
removing these edges from the original graph. Three different measures for
calculation of edge-betweenness in vertices of a graph were proposed in
Newman and Girvan20. The worst-case time complexity of the edge
betweenness algorithm is 𝑂(𝑚2𝑛) and is 𝑂(𝑛3) for sparse graphs, where m
denotes the number of edges and n is the number of vertices.
The Girvan Newman (GN) algorithm has been enhanced by many authors
and applied to various networks21-28. Chen et al22 extended GN algorithm
to partition weighted graphs and used it to identify functional modules in
the yeast proteome network. Rattigan et al21 proposed the indexing methods
to reduce the computational complexity of the GN algorithm significantly.
Pinney et al24 also build an algorithm which uses GN algorithm for the
decomposition of networks based on graph theoretical concept of
52. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
52
betweenness centrality. Their paper inspected utility of betweenness
centrality to decompose such networks in diverse ways. Radicchi29 et al also
proposed an algorithm based on GN algorithm introducing a new definition
of community. They defined ‘strong’ and ‘weak’ communities. The algorithm
uses an edge clustering coefficient to perform the divisive edge removal step
of GN and has a running time of 𝑂(𝑚4 𝑛2 ) and 𝑂(𝑛2) for sparse graphs.
Moon et al30 have proposed and implemented the parallel version of the GN
algorithm to handle large scale data. They have used MapReduce model
(Apache Hadoop) and GraphChi. Newman and Girvan first defined a
measure known as ‘modularity’ to judge the quality of partitions or
communities formed20. The modularity measure proposed by them has been
widely accepted and used by researchers to gauge the goodness of the
modules obtained from the community detection algorithms with high
modularity corresponding to a better community structure. Modularity was
defined as ∑𝒆𝒊𝒊𝒊 − 𝒂𝒊𝟐 , where 𝒆𝒊𝒊 denotes fraction of the edges that connect
vertices in community 𝒊, 𝒆𝒊𝒋 denotes fraction of the edges connecting vertices
in two different communities 𝒊 and 𝒋 while 𝒂𝒊 = ∑ 𝒆𝒊𝒋𝒋 is the fraction of edges
that connect to vertices in community 𝒊. The value 𝑄 = 1 indicates a network
with strong community structure. The optimization of modularity function
has received great attention in literature. The table 1 lists clustering based
community detection methods, including algorithms which use modularity
and modularity optimization. Newman31 has worked to maximize
modularity so that the process of aggregating nodes to form communities
leads to maximum modularity gain. This change in modularity upon joining
two communities defined as ∆𝑄 = 𝑒𝑖𝑗 + 𝑒𝑗𝑖 − 2𝑎𝑖𝑎𝑗 = 2(𝑒𝑖𝑗−𝑎𝑖𝑎𝑗) can be
calculated in constant time and hence is faster to execute in comparison to
the GN algorithm. The run time of the algorithm is 𝑂(𝑛2) for sparse graphs
and 𝑂((𝑚 + 𝑛)𝑛) for others. In a recent work, a scalable version of this
algorithm has been implemented using MapReduce by Chen et al32.
Newman33 generalized the betweenness algorithm for weighted networks.
The modularity was now represented as 𝑄 = 1 2𝑚 ∑ [𝐴𝑖𝑗𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚 ]δ(𝑐𝑖,𝑐𝑗)
53. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
53
where 𝑚 = 1 2 ∑ 𝐴𝑖𝑗𝑖𝑗 represents the number of edges between communities
𝑐𝑖 and 𝑐𝑗 in the graph, while 𝑘𝑖,𝑘𝑗 are degrees of vertices 𝑖 and 𝑗 while 𝛿(𝑢,𝑣)
is 1 if 𝑢 = 𝑣 and 0 otherwise. Newman34 in yet another approach
characterised the modularity matrix in terms of eigenvectors. The equation
for modularity was changed to
𝑄 =1 4𝑚𝑠𝑇𝐵𝑠 , where the modularity matrix was given as 𝐵𝑖𝑗 = 𝐴𝑖𝑗 − 𝑘𝑖𝑘𝑗 2𝑚
and modularity was defined using eigenvectors of the modularity
matrix.
Figure 1
Definition of local community:
Local community detection problem is to start from a preselected source
node. It adds the node meeting the conditions in into and removes the node
which does not meet the conditions from D gradually.
2.2. Related Algorithms:
At present, many local community detection algorithms have been proposed.
We introduce two representative local community detection algorithms.
(1)Clauset Algorithm: In order to solve the problem of local community
detection, Clauset [15] put forward the local community modularity R and
gave a fast convergence greedy algorithm to find the local community with
the greatest modularity.
Thedefinition of local community modularity is asfollows
where and represent two nodes in the graph. If nodes and are connected,
the value of is 1; otherwise, it is 0; if nodes and are both in , the value
of is 1; otherwise, it is 0.
The local community detection process of Clauset algorithm is similar to that
of web crawler algorithm. First, Clauset algorithm starts from an initial
54. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
54
node .Node is added to the subgraph , and all its neighbor nodes are added
to . Then the algorithm adds the node in which can bring the maximum
increment of into the local community iteratively, until the scale of the local
community reaches the preset size. That is to say, the algorithm needs to set
up a parameter to decide the size of the community, and the result is greatly
influenced by the initial node.
(2) LWP Algorithm: LWP [16] algorithm is an improved algorithm and it has
a clear end condition compared with Clauset algorithm. The algorithm
defines another local community modularity , which is expressed
aswhere and represent two nodes in the graph. If nodes and are
connected to each other, the value of is 1; otherwise, it is 0; if
nodes and are both in , the value of is 1; otherwise, it is 0; if only one of
the nodes and is in , the value of is 1; otherwise, it is 0.
Given an undirected and unweighted graph , LWP algorithm starts from an
initial node to find a subgraph with maximum value of . If the subgraph is a
community (i.e., ), then it returns the subgraph as a community. Otherwise,
it is considered that there is no community that can be found starting from
this initial node. For an initial node, LWP algorithm finds a subgraph with
the maximum value of local modularity by two steps. First, the algorithm is
initialized by constructing a subgraph with only an initial node and all the
neighbor nodes of node are added to the set . Then the algorithm performs
incremental step and pruning step.
In the incremental step, the node selected from which can make the local
modularity of increase with the highest value is added to iteratively. The
greedy algorithm will iteratively add nodes in to , until no node in can be
added. In the pruning step, if the local modularity of becomes larger when
removing a node from , then really remove it from . In the process of
pruning, the algorithm must ensure that the connectivity of is not destroyed
until no node can be removed. Then update the set and repeat the two steps
until there is no change in the process. The algorithm has a high Recall, but
its accuracy is low.
55. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
55
The complexity of these two algorithms is , where is the number of nodes to
be explored in the local community and is the average degree of the nodes to
be explored in the local community.
CHAPTER-2
LITERATURE
56. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
56
Literature Review:
A wide research study in the recent years focused on community detection in
complex systems [4], most of them focus on undirected networks to enhance
the efficiency of identifying communities in understanding complex
networks. For instances, Fortunato et al [3] based his proposed approach on
statistical inference perspectives, Schaeffer et al [5], proposed their approach
for clustering problem as an unsupervised learning task based on similarity
measure over the data of the network, Girvan and Newman based their
community detection proposal on betweenness calculation to find out
community boundaries where modularity measure is the overall quality of
the graph partitioning [6, 7]. The weight used by Newman and Girvan [7]
aims to be the betweenness measure of the edge, representing the number of
shortest paths connecting any pair of nodes passing through. However,
community detection problem has been studied mainly in case of undirected
57. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
57
networks, various solutions was proposed in this context, motivating many
disciplines to deal with the issue. Interestingly, Fortunato et al [3] mentioned
the few possibilities for extending techniques from undirected to directed
case, where the edge directedness is not the only complication that could
face the clustering problem. Nevertheless, diverse graph data in many real-
world applications are by nature directed, thus interesting to save available
information behinds the edge directionality. Malliaros et al [8], revealed in
their survey that the most common way for researcher community to deal
with the problem of clustering is to ignore the directionality of the graph,
then proceed to clustering with a wide range of proposed tools. Therefore,
most of community detection proposals can not be used directly on weighted
directed graphs, where the number of communities not always known in
advance and the communities present different granularity scale. Since the
problem of community detection in complex network analysis acquires more
attention, many researchers have been interested into structural information
and topological networks metrics [1, 3, 4, 6, 7, 8, 9]. In [10], S. Ahajjam
based the proposed community detection algorithms on a new scalable
approach using leader nodes characteristics through two steps: (i)
Identification of potential leaders in the network, and (ii) exploration of nodes
similarities around leaders to build communities. Therefore, recent works
start focusing on both topological and topical aspects [9, 11, 12] to overpass
limited performances of topology-based community detection approaches.
Topic-based community detection started gaining attention through different
works for community detection in complex network [9, 13, 14]. The essence
behind the approach is to similarly detect nodes with same properties, which
are not necessarily real connections between nodes of the network, in which
actors communicate on topics of mutual interest [14] to determine the
communities which are topically similar.
The main contributions of this thesis are to exploring and utilizing local
community detection related approaches to improve the services in Online
58. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
58
SocialNetworking Sites in terms of their recommendation efficiency and
accuracy.
Contributions are listed as follows:
M.E.J.NewmananDM.Girvan[1],”Finding and evaluating community structure
in network”. One of the most relevant features of graphs representing real
systems is community structure, or clustering. Detecting communities is of great
importance in sociology, biology and computer science, disciplines where systems
are often represented as graphs. Community detection is important for
other reasons, too. Identifying modules and their boundaries allows for a
classification of vertices, according to their structural position in the modules. So,
vertices lying at the boundaries between modules play an important role of
mediation and lead the relationships and exchanges between different
communities . Such a classification seems to be meaningful in social and
metabolic networks.
The procedure can be better illustrated by means of dendrograms. Sometimes,
stopping conditions are imposed to select a partition or a group of partitions that
satisfy a special criterion, like a given number of clusters or the optimization of a
quality function.
A simple way to identify communities in a graph is to detect the edges that connect
vertices of different communities and remove them, so that the clusters get
disconnected from each other. This is the philosophy of Divisivealgorithms.
H.-W.Shen and X.-Q.Cheng[3],”Spectral methods for the detection of neteork
community structure”.In this paper Spectral analysis has been successfully
applied at the detection of community structure of networks, respectively being
based on the adjacency matrix, the standard Laplacian matrix, the normalized
Laplacian matrix, the modularity matrix, the correlation matrix and several other
variants of these matrices. However, the comparison between these spectral
methods is less reported. More importantly, it is still unclear which matrix is more
appropriate for the detection of community structure. This paper answers the
question through evaluating the effectiveness of these five matrices against the
59. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
59
benchmark networks with heterogeneous distributions of node degree and
community size. In this paper, we conduct a comparative analysis of the
aforementioned five matrices on the benchmark networks which have
heterogeneous distributions of node degree and community size. The comparison is
carried out from two perspectives. The former one focuses on whether the number
of intrinsic communities can be exactly identified according to the spectrum of
these five matrices. The latter evaluates the effectiveness of these matrices at
identifying the intrinsic community structure using their eigenvectors
This paper carried out a comparative analysis on the spectral methods for the
detection of network community structure through evaluating the performance of
five widely used matrices on the benchmark networks with heterogeneous
distribution of node degree and community size. These five matrices are
respectively the adjacency matrix, the standard Laplacian matrix, the normalized
Laplacian matrix, the modularity matrix and the correlation matrix. Test results
demonstrate that the normalized Laplacian matrix and the correlation matrix
significantly outperform the other three matrices at identifying the community
structure of networks. This indicates that the heterogeneity of node degree is a
crucial ingredient for the detection of community structure using spectral
methods.
V.D.Blondel,J.Guillaume,R.Lambiotte et al[7],”Fast unfolding of communities in
large networks”. This Paper propose a simple method to extract the community
structure of large networks. Our method is a heuristic method that is based on
modularity optimization. It is shown to outperform all other known community
detection method in terms of computation time. Moreover, the quality of the
communities detected is very good, as measured by the so-called modularity. This
is shown first by identifying language communities in a Belgian mobile phone
network of 2.6 million customers and by analyzing a web graph of 118 million
nodes and more than one billion links. The accuracy of our algorithm is also
verified on ad-hoc modular networks. The problem of community detection
requires the partition of a network into communities of densely connected nodes,
with the nodes belonging to different communities being only sparsely connected.
60. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
60
This paper introduced an algorithm for optimizing modularity that allows to
study networks of unprecedented size. The limitation of the method for the
experiments that we performed was the storage of the network in main memory
rather than the computation time. The accuracy of our method has also been
tested on ad-hoc modular networks and is shown to be excellent in comparison
with other (much slower) community detection methods. By construction, our
algorithm unfolds a complete hierarchical community structure for the network,
each level of the hierarchy being given by the intermediate partitions found at each
pass.
K.M.Tan,D.Written and A.Shojaie[8],”The cluster graphical lasso for improved
estimation of Gaussian graphical models”. In this paper the task of estimating
a Gaussian graphical model in the high-dimensional setting is considered. The
graphical lasso, which involves maximizing the Gaussian log likelihood subject to a
lasso penalty, is a well-studied approach for this task. A surprising connection
between the graphical lasso and hierarchical clustering is introduced: the
graphical lasso in effect performs a two-step procedure, in which single linkage
hierarchical clustering is performed on the variables in order to identify connected
components, and then a penalized log likelihood is maximized on the subset of
variables within each connected component. Thus, the graphical lasso determines
the connected components of the estimated network via single linkage clustering.
The single linkage clustering is known to perform poorly in certain finite-sample
settings. Therefore, the cluster graphical lasso, which involves clustering the
features using an alternative to single linkage clustering, and then performing the
graphical lasso on the subset of variables within each cluster, is proposed.
We have shown that identifying the connected components of the graphical lasso
solution is equivalent to performing SLC based on S ̃ , the absolute value of the
empirical covariance matrix. Based on this connection, we have proposed the
cluster graphical lasso, an improved version of the graphical lasso for sparse
inverse covariance estimation. In this paper, we have considered the use of
hierarchical clustering in the CGL procedure. We have shown that performing
hierarchical clustering on S ̃ leads to consistent cluster recovery. As a byproduct,
61. Venkat Java Projects
Mobile:+91 9966499110 Visit:www.venkatjavaprojects.com
Email:venkatjavaprojects@gmail.com
61
we suggest a choice of λ1, …, λK in CGL that yields consistent identification of the
connected components. In addition, we establish the model selection consistency of
CGL.
Y.J.Wu,H.Huang,Z.F.Hao and F.Chen[17],”Local Community Detection using
link similarity”.In this paper Exploring local community structure is an
appealing problem that has drawn much recent attention in the area of social
network analysis. The existing approaches do well in measuring the community
quality, but they are largely dependent on source vertex and putting too strict
policy in agglomerating new vertices. Moreover, they have parameters which are
difficult to obtain. This paper proposes a method to¯find local community structure
by analyzing link similarity between the community and the vertex. Inspired by the
fact that elements in the same community are more likely to share common links,
we explore community structure heuristically by giving priority to vertices which
have a high link similarity with the community. A three-phase process is also used
for the sake of improving quality of community structure. Experimental results
prove that our method performs efectively not only in computer-generated graphs
but alsoin real-world graphs.
In this paper we present a method that mainly depends on link similarity
between the vertex and community to explore local community. This method
searches the potential vertices in a specific sequence so as to help improve the
accuracy. In this paper, we pro-posed an improved method to detect local
community structure. Our algorithm mainly takes advantage of link similarity
between the vertex and the community.A greedy agglomeration phase, an
optimization phase and a trimming phase are included in our algorithm.Compared
with other multi-phase algorithms, our al-gorithm implements comparatively
easier and stricter stopping criteria for the purpose of simplifying and op-timizing
our algorithm. Experimental results show that our algorithm can discover local
community better than other existing methods both in computer-generated
benchmark graphs and in real-world networks.