Review Paper on Shared and Distributed Memory Parallel Algorithms to Solve Big Data Problems in Biological, Social Network and Spatial Domain Applications
This document presents a review of parallel algorithms to solve big data problems in biological, social network, and spatial domains using shared and distributed memory. It discusses sequential and parallel algorithms for community detection in protein-protein interaction networks and social networks. It also discusses techniques for processing and analyzing large LiDAR point cloud data for applications like forest monitoring and 3D modeling. The document reviews relevant literature on algorithms for community detection, network partitioning, and LiDAR data reduction and interpolation. It then describes the BLLP algorithm for community detection in biological networks and discusses how it could be extended to distributed memory systems.
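The BLLP algorithm mentioned above builds on the label propagation family of community detection methods. As a rough illustration of the underlying idea (a minimal generic sketch, not BLLP itself, whose biological refinements and distributed-memory extension are beyond this summary), each node repeatedly adopts the most common label among its neighbours until labels stabilise:

```python
from collections import Counter

def label_propagation(adj, max_iters=100):
    """Each node repeatedly adopts the most frequent label among its
    neighbours (ties broken by smallest label) until no label changes."""
    labels = {v: v for v in adj}   # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in adj:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            # most frequent neighbour label; smallest label wins ties
            best = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels

# Two disjoint triangles: each component settles on a single label.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1],
       3: [4, 5], 4: [3, 5], 5: [3, 4]}
parts = label_propagation(adj)
```

Because each node only inspects its immediate neighbours, this style of update is also what makes such algorithms natural candidates for shared- and distributed-memory parallelisation: node updates can be partitioned across workers with only boundary labels exchanged.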
TREND-BASED NETWORKING DRIVEN BY BIG DATA TELEMETRY FOR SDN AND TRADITIONAL N... (ijngnjournal)
Organizations face the challenge of accurately analyzing network data and taking automated action based on observed trends. Trend-based analytics helps minimize downtime and improve the performance of network services, but organizations typically use disparate network management tools to understand and visualize network traffic, with limited ability to dynamically optimize the network. This research focuses on the development of an intelligent system that leverages big data telemetry analysis in the Platform for Network Data Analytics (PNDA) to enable comprehensive trend-based networking decisions. The results include a graphical user interface (GUI), delivered as a web application, for straightforward management of all subsystems. The system and application developed in this research demonstrate the potential of a scalable system capable of benchmarking the network to establish expected behavior for comparison and trend analysis. Moreover, this research provides a proof of concept of how trend-analysis results are actioned in both a traditional network and a software-defined network (SDN) to achieve dynamic, automated load balancing.
An Overview and Classification of Approaches to Information Extraction in Wir... (M H)
Recent advances in wireless communication have made it possible to develop low-cost, low-power Wireless Sensor Networks (WSNs). WSNs can be used in several application areas (e.g., habitat monitoring, forest fire detection, and health care). WSN Information Extraction (IE) techniques can be classified into four categories depending on the factors that drive data acquisition: event-driven, time-driven, query-based, and hybrid. This paper presents a survey of state-of-the-art IE techniques in WSNs. The benefits and shortcomings of the different IE approaches are presented as motivation for future work on automatic hybridisation and adaptation of IE mechanisms.
Map as a Service: A Framework for Visualising and Maximising Information Retu... (M H)
This paper presents a distributed information extraction and visualisation service, called the mapping service, for maximising information return from large-scale wireless sensor networks. Such a service would greatly simplify the production of higher-level, information-rich representations suitable for informing other network services, as well as the delivery of field information visualisations. The mapping service utilises a blend of inductive and deductive models to map sense data accurately using externally available knowledge, and it exploits the special characteristics of the application domain to render visualisations, in a map format, that precisely reflect the concrete reality. The service can visualise an arbitrary number of sense modalities and can combine multiple independent types of sense data to overcome the limitations of visualisations generated from a single modality. Furthermore, the mapping service responds dynamically to changes in environmental conditions that may affect visualisation performance by continuously updating the application domain model in a distributed manner. Finally, a distributed self-adaptation function is proposed with the goal of saving power and generating more accurate visualisations. We conduct comprehensive experiments to evaluate the performance of the mapping service and show that it achieves low communication overhead, produces maps of high fidelity, and further minimises the mapping predictive error dynamically by integrating the application domain model into the mapping service.
ANALYTIC QUERIES OVER GEOSPATIAL TIME-SERIES DATA USING DISTRIBUTED HASH TABLES (Nexgen Technology)
Ensemble based Distributed K-Modes Clustering (IJERD Editor)
Clustering is the unsupervised classification of data items into groups. With the explosion in the number of autonomous data sources, there is a pressing need for effective approaches to distributed clustering, in which distributed datasets are clustered without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed on large datasets, but it cannot directly handle datasets with categorical attributes, which occur frequently in real-life data. Huang proposed the K-Modes clustering algorithm, introducing a new dissimilarity measure for categorical data; it replaces cluster means with a frequency-based method that updates modes during the clustering process to minimize the cost function. Most distributed clustering algorithms in the literature target numerical data. This paper proposes a novel Ensemble-based Distributed K-Modes clustering algorithm that is well suited to categorical datasets and performs the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and with a K-Modes-based centralized clustering algorithm, using various datasets from the UCI machine learning repository.
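The two ingredients of K-Modes named in the abstract, the matching dissimilarity measure and the frequency-based mode update, can be sketched as follows (a minimal centralized illustration; the paper's ensemble and distributed machinery, and any particular seeding strategy, are not reproduced here):

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Huang's simple matching distance: number of mismatched attributes."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, max_iters=10):
    """Minimal K-Modes: assign each categorical record to its nearest mode,
    then recompute each mode attribute-wise as the most frequent category."""
    modes = [list(records[i]) for i in range(k)]   # naive seeding: first k records
    assign = [0] * len(records)
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: matching_dissimilarity(r, modes[c]))
                      for r in records]
        if new_assign == assign:
            break
        assign = new_assign
        for c in range(k):
            members = [r for r, a in zip(records, assign) if a == c]
            if members:
                for j in range(len(members[0])):
                    # mode update: most frequent category per attribute
                    modes[c][j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return assign, modes

records = [("red", "small"), ("red", "medium"), ("blue", "large"), ("blue", "large")]
assign, modes = k_modes(records, 2)
```

The update step is what distinguishes K-Modes from K-Means: taking the per-attribute majority category plays the role that averaging plays for numeric data.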
A tutorial on secure outsourcing of large scale computation for big data (redpel dot com)
Data mining is an integrated field that combines technologies from databases, machine learning, statistics, pattern recognition, information retrieval, artificial neural networks, knowledge-based systems, and artificial intelligence. In practical terms, data mining is the investigation of stored data sets to find hidden relationships and to summarize the data in forms that are both useful and understandable to the data owner. Clustering is an unsupervised procedure that partitions data items into groups such that items in the same group are more similar to one another than to items in other groups, according to some measure of similarity. Cluster analysis is used to identify groups of similar objects and is one of the most widely used methods in practical data mining applications; the objects being grouped can be physical, such as students, or abstract, such as customer behaviour or handwriting. Many clustering algorithms have been proposed, falling into different families of clustering methods. The intention of this paper is to provide a classification of some prominent clustering algorithms.
Enhanced Privacy Preserving Access Control in Incremental Data using Microaggre... (rahulmonikasharma)
In microdata releases, the main task is to protect the privacy of data subjects. Microaggregation is a disclosure-limitation technique for protecting the privacy of microdata. It is an alternative to generalization and suppression for generating k-anonymous data sets, in which the identity of each subject is hidden within a group of k subjects. Microaggregation perturbs the data, and additional masking allows data utility to be refined in several ways, such as increasing data granularity, avoiding discretization of numerical data, and reducing the impact of outliers. If the variability of the private data values within a group of k subjects is too small, k-anonymity does not protect against attribute disclosure. In this work, role-based access control is assumed: the access control policies assign selection predicates to roles, and an imprecision bound is defined for each permission as a threshold on the amount of imprecision that can be tolerated, so the proposed approach reduces the imprecision for each selection predicate. Whereas existing work anonymizes only static relational tables, here the privacy-preserving access control mechanism is applied to incremental data.
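The core of microaggregation, grouping at least k similar records and replacing each with its group centroid, can be sketched for the univariate numeric case (a simplified illustration; the paper's access-control and incremental-data machinery, and optimal multivariate partitioning, are not shown):

```python
def microaggregate(values, k):
    """Univariate microaggregation sketch: sort the values, split them into
    groups of at least k consecutive records, and replace every value with
    its group mean, so no record is distinguishable within its group."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        # the last group absorbs the remainder so every group has >= k records
        j = len(order) if len(order) - i < 2 * k else i + k
        group = order[i:j]
        mean = sum(values[g] for g in group) / len(group)
        for g in group:
            out[g] = mean
        i = j
    return out

masked = microaggregate([1, 2, 3, 10, 11, 12], 3)
```

Grouping sorted neighbours keeps the within-group variability, and hence the information loss, low, which is exactly the utility/privacy trade-off the abstract describes.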
A Study of Mobile User Movements Prediction Methods (IJECEIAES)
For more than a decade, the number of smartphone users has been increasing day by day. With the rapid improvement of communication technologies, predicting the future movements of mobile users has taken on an important role. Various sectors can benefit from such prediction: communication management, city development planning, and location-based services are some of the fields that movement prediction can make more valuable. In this paper, we present a study of several location prediction techniques in these areas.
An Overview of Information Extraction from Mobile Wireless Sensor Networks (M H)
Information Extraction (IE) is a key research area within the field of Wireless Sensor Networks (WSNs). It has been characterised in a variety of ways, ranging from descriptions of its purposes to reasonably abstract models of its processes and components. There have been only a handful of papers addressing IE over mobile WSNs directly, and these dealt with individual mobility-related problems as the need arose. This paper is presented as a tutorial that takes the reader from identifying data about a dynamic (mobile) real-world problem, through relating the data back to the world from which it was collected, to finally discovering what is in the data. It covers the entire process, with special emphasis on how to exploit mobility to maximise information return from a mobile WSN. We present some challenges that mobility introduces into the IE process, as well as its effects on the quality of the extracted information. Finally, we identify future research directions for the development of efficient IE approaches for WSNs in the presence of mobility.
Interpolation Techniques for Building a Continuous Map from Discrete Wireless... (M H)
Wireless sensor networks (WSNs) typically gather data at a discrete number of locations. However, it is desirable to be able to design applications and reason about the data in forms more abstract than points of data. By giving the network the ability to predict inter-node values, it becomes possible to build applications that are unaware of the concrete reality of sparse data. This interpolation capability is realised as a network service. In this paper, the 'map' style of presentation is identified as a suitable format for visualising sense data. Although map generation is essentially a problem of interpolation between points, a new WSN service, the map generation service (MGS), based on a Shepard interpolation method, is presented. A modified Shepard method is proposed to deal with the special characteristics of WSNs: it requires little storage, can be localised, and integrates information about the application domain to further reduce the map generation cost and improve mapping accuracy. A flood management application is considered to demonstrate how MGS-generated maps can be used in practice. Empirical analysis shows that the map generation service is an accurate, flexible, and efficient method.
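The classic Shepard method that the map generation service builds on is inverse-distance-weighted interpolation: the value at a query point is a weighted average of the sensor readings, with weights falling off as a power of distance. A minimal sketch (the paper's WSN-specific modifications, such as localisation and domain-model integration, are not shown):

```python
import math

def shepard(samples, q, p=2):
    """Classic Shepard (inverse-distance-weighted) interpolation.
    `samples` is a list of ((x, y), value) pairs; `q` is the query point.
    Returns the exact reading if q coincides with a sample location."""
    num = den = 0.0
    for (x, y), value in samples:
        d = math.hypot(q[0] - x, q[1] - y)
        if d == 0.0:
            return value          # query point is a sensor location
        w = d ** -p               # weight decays with distance^p
        num += w * value
        den += w
    return num / den

# Midway between two sensors reading 10.0 and 20.0, the estimate is the mean.
v = shepard([((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)], (1.0, 0.0))
```

In a WSN setting, the sum would typically be restricted to nearby nodes so each map cell can be computed locally, which is the kind of modification the abstract alludes to.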
A Survey On Ontology Agent Based Distributed Data Mining (Editor IJMTER)
With the increasing complexity and number of applications, and the large volume of data available from heterogeneous sources, there is a need to develop suitable ontologies that can handle large data sets and present the mined outcomes intelligently for evaluation. In the era of data-intensive applications, distributed data mining can meet these challenges with the support of agents. This paper discusses the underlying principles behind the effectiveness of modern agent-based systems for distributed data mining.
Many data mining and knowledge discovery methodologies and process models have been developed with varying degrees of success. Three main methods are used to discover patterns in data: KDD, SEMMA, and CRISP-DM. They appear in many publications in the area and are used in practice. To our knowledge, there is no clear methodology developed specifically to support link mining. However, there is a well-known methodology in knowledge discovery in databases, the Cross Industry Standard Process for Data Mining (CRISP-DM), developed by a consortium of several industrial companies, that is relevant to the study of link mining. In this study, CRISP-DM has been adapted to the field of link mining to detect anomalies. An important goal in link mining is the task of inferring links that are not yet known in a given network. This approach is implemented through a case study of real-world co-citation data. The case study uses mutual information to interpret the semantics of anomalies identified in the co-citation dataset, which can provide valuable insight into the nature of a given link and potentially identify important future link relationships.
Graph Algorithm to Find Core Periphery Structures using Mutual K-nearest Neig... (gerogepatton)
Core periphery structures exist naturally in many real-world complex networks, such as social, economic, biological, and metabolic networks. Most existing research focuses on identifying a meso-scale structure called community structure. Core periphery structures are another, equally important meso-scale property of a graph that can offer deeper insight into the relationships between different nodes. In this paper, we provide a definition of core periphery structures suitable for weighted graphs. We further score and categorize these relationships into different types based on the density difference between the core and periphery nodes. Next, we propose an algorithm called CP-MKNN (Core Periphery-Mutual K Nearest Neighbors) to extract core periphery structures from weighted graphs using a heuristic node affinity measure called Mutual K-nearest neighbors (MKNN). Using synthetic and real-world social and biological networks, we illustrate the effectiveness of the extracted core periphery structures.
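The MKNN affinity measure at the heart of CP-MKNN keeps an edge only when the relationship is reciprocal: each endpoint must rank the other among its k strongest neighbours. A minimal sketch of that filter (an illustration of MKNN only, not of the full CP-MKNN scoring and core/periphery categorization):

```python
def mutual_knn_edges(weights, k):
    """Mutual K-nearest-neighbour (MKNN) filter for a weighted graph.
    `weights` maps each node to a {neighbour: edge_weight} dict.
    An edge (u, v) survives only if v is among u's k strongest
    neighbours AND u is among v's k strongest neighbours."""
    top = {u: set(sorted(nbrs, key=nbrs.get, reverse=True)[:k])
           for u, nbrs in weights.items()}
    return {frozenset((u, v))
            for u in weights for v in top[u]
            if u in top.get(v, set())}

# 'a' and 'b' pick each other first; 'c' prefers 'b', but 'b' prefers 'a',
# so only the mutual a-b edge survives with k = 1.
weights = {"a": {"b": 5, "c": 1},
           "b": {"a": 5, "c": 4},
           "c": {"a": 1, "b": 4}}
edges = mutual_knn_edges(weights, 1)
```

This reciprocity requirement is what makes MKNN a useful affinity heuristic: strongly tied core nodes retain their mutual edges, while one-sided attachments from periphery nodes are pruned.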
In distributed systems, retrieving information is a costly task because data must be transferred from the node containing the information to the node where the query is generated, which incurs latency and network traffic. To reduce these costs, mobile agents are used to fetch information from the nodes where it resides. Alongside the mobile agents, a directory containing information about the databases kept on different nodes is used to focus the retrieval process only on those nodes that contain answers to the query. Three kinds of agents are used to fetch the data: coordinator, search, and local agents.
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS (IJDKP)
Community detection in complex information networks draws much attention from both academia and industry because it has many real-world applications. However, the scalability of community detection algorithms over very large networks has been a major challenge: real-world graph structures are often complicated and extremely large. In this paper, we propose a MapReduce algorithm called 3MA that parallelizes a local community identification method based on the M metric, and we adopt an iterative expansion approach to find all the communities in the graph. Empirical results show that for large networks on the order of millions of nodes, the parallel version of the algorithm outperforms the traditional sequential approach to detecting communities with the M measure. When the data is too big for the original M-metric-based sequential iterative expansion approach to handle, our MapReduce version 3MA can finish in a reasonable time.
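Local community detection by iterative expansion, as parallelized by 3MA, starts from a seed node and greedily grows the community while a quality metric improves. The sketch below is a sequential illustration only (not the MapReduce version), and it assumes one common definition of the M metric, the ratio of internal to external edges, which may differ in detail from the paper's:

```python
def m_metric(adj, community):
    """M metric (assumed form): internal edges / external edges."""
    internal = external = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal += 1     # each internal edge counted once per endpoint
            else:
                external += 1
    return (internal / 2) / external if external else float("inf")

def expand_community(adj, seed):
    """Greedy local expansion: repeatedly add the boundary node that most
    increases M; stop when no candidate improves it."""
    community = {seed}
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best, best_m = None, m_metric(adj, community)
        for v in frontier:
            m = m_metric(adj, community | {v})
            if m > best_m:
                best, best_m = v, m
        if best is None:
            return community
        community.add(best)

# Two triangles joined by a bridge edge 2-3: expansion from node 0
# stops at the first triangle, since crossing the bridge lowers M.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
core = expand_community(graph, 0)
```

In the MapReduce setting, the expensive step is evaluating M for every frontier candidate; that evaluation is what a parallel version can distribute across mappers, with a reducer selecting the best candidate per round.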
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In distributed systems,to search out information is costly task because they have to be compel led to transfer information from node containing information to the node wherever query is generated,this will l consume latency,network traffic etc.For reducing these parameters mobile agents are accustomed fetch information from nodes wherever i nformation resides. Alongside mobile agents directory containing information concerning database kept on completely different nodes is employed to focus retrieval method solely to those nodes that are containing answers to the query. 3 kinds of agents area unit accustomed fetch data specifically ly coordi nator,search and local agent.
SEARCHING DISTRIBUTED DATA WITH MULTI AGENT SYSTEM
Review Paper on Shared and Distributed Memory Parallel Algorithms to Solve Big Data Problems in Biological, Social Network and Spatial Domain Applications
Volume 3, Issue 6, July-August-2018 | http://ijsrcseit.com
Anita Ratnasari et al. Int J S Res CSE & IT. 2018 July-August; 3(6) : 658-670
659
algorithm in predicting functional modules of uncharacterized proteins.
SOCIAL NETWORKS DOMAIN (SHARED AND DISTRIBUTED MEMORY):
One of the most relevant and widely studied structural properties of networks is their community structure or clustering. Community detection in a network also extracts the structural properties of the network ([22]) and the various interactions within it ([8]). The community detection problem therefore centers on optimizing a quality function ([20]), and one of the quality functions most often used is modularity ([3]). The growing size of social networks such as Facebook, Twitter, and LinkedIn has made community detection more difficult, with datasets that can reach billions of vertices and edges. Most of the research in community detection has focused on sequential algorithms on SMP machines, and a thorough review of these is presented in ([24]). Some fast, scalable community detection algorithms ([10], [24], [43]) have been developed, but they can only handle network sizes that fit in the RAM of a single machine; these algorithms adopt sequential, parallel shared-memory, and non-distributed architectures. Processing networks with a huge number of vertices and billions of edges requires several hundred gigabytes of RAM.
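To make the modularity quality function concrete, the following is a minimal Python sketch that scores a partition of an undirected graph; the toy graph, partition, and function name are our own illustrative choices, not code from the surveyed papers.

```python
# Modularity of a partition of an undirected graph:
# Q = sum over communities c of [ m_c/m - (d_c/(2m))^2 ],
# where m is the total edge count, m_c the edges inside community c,
# and d_c the summed degree of community c.
from collections import defaultdict

def modularity(edges, community):
    m = len(edges)
    internal = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)     # summed degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

# Two triangles joined by a single bridge edge: the natural split scores well.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
print(modularity(edges, part))   # 6/7 - 0.5 ~= 0.357
```

Algorithms such as Louvain search over partitions to maximize exactly this quantity; a partition that cuts a triangle in half scores visibly lower.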
SPATIAL DOMAIN (DATA REDUCTION AND INTERPOLATION):
Airborne Light Detection and Ranging (LiDAR) is the best means for high-density, high-accuracy terrain data acquisition. LiDAR technology is a powerful remote sensing technique that can be used to create detailed maps of objects, surfaces, and terrains across widely varying scales ([14]). It is now easier to generate massive high-density LiDAR point clouds and, consequently, more accurate and compact terrain models and other three-dimensional representations ([15]). However, generating such improved models from high-density, massive volumes of data poses great challenges in data storage, processing, and manipulation. Over the last 15 years, LiDAR data for producing reliable and accurate Digital Elevation Models (DEMs) has been widely used in the geospatial science communities ([31]). To the best of our knowledge, this is the first ever scene-driven data reduction technique. We also use parallel programming to exploit the multicore architecture of CPUs, thereby making our algorithm highly scalable and time-efficient.
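As a generic illustration of multicore point-cloud reduction (not the scene-driven method the paper describes), the sketch below buckets LiDAR returns into grid cells and keeps one representative per cell, processing cells in parallel threads; all names and the "keep the highest return" rule are illustrative assumptions.

```python
# Grid-based LiDAR thinning sketch: bucket (x, y, z) points into square
# cells and keep the highest return per cell (e.g. for canopy surfaces).
# Cells are reduced in parallel threads to exploit multicore CPUs.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def thin(points, cell=1.0, workers=4):
    cells = defaultdict(list)
    for x, y, z in points:
        cells[(int(x // cell), int(y // cell))].append((x, y, z))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # one representative point per occupied cell
        return list(pool.map(lambda pts: max(pts, key=lambda p: p[2]),
                             cells.values()))

points = [(0.2, 0.3, 5.0), (0.8, 0.1, 9.0), (1.5, 0.4, 2.0), (1.9, 1.9, 7.0)]
print(sorted(thin(points)))   # 4 points reduced to 3: one per occupied cell
```

A real pipeline would partition the cell map across processes or MPI ranks rather than threads, but the decomposition into independent per-cell work is the same.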
Spatial data interpolation is a crucial procedure in Geographical Information Systems (GIS); it computes unknown terrain height values of points based on the known elevation values of points in the neighborhood ([18]). A natural terrain surface is a continuous surface comprising infinitely many points ([22]). The most commonly used DEMs are the grid DEM, the contour line DEM, and the triangulated irregular network (TIN) DEM. Each grid cell has a value which denotes the elevation for the entire cell ([20]), and each cell obtains this elevation value by interpolating over adjacent sampling points (Burrough et al. ([14])). Interpolation methods in grid DEMs are used to determine the terrain height value of a point based on the known elevation values of points in the neighborhood ([18]). The quality of the generated DEMs is evaluated based on the difference between the "true" and the interpolated value at points in entire or selected regions ([12]). Practically applying spatial interpolation is a computationally expensive task and requires good computing resources. We also compare our spatial interpolation against other interpolation algorithms and DEM resolutions, to verify where our algorithm lies in terms of performance and quality within the comprehensive rubrics of this area.
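For readers unfamiliar with these estimation methods, a minimal inverse distance weighting (IDW) sketch is given below; IDW is one common choice for gridding scattered elevation samples, shown here purely as an illustration rather than as the specific interpolator the paper evaluates.

```python
# Inverse distance weighting (IDW): estimate the elevation at (x, y)
# from known samples (sx, sy, sz), weighting each sample by 1/distance^p.
import math

def idw(x, y, samples, p=2):
    num, den = 0.0, 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return sz            # query coincides with a known sample
        w = 1.0 / d ** p
        num += w * sz
        den += w
    return num / den

samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0), (10, 10, 40.0)]
print(idw(5, 5, samples))   # centre of a symmetric square -> mean, 25.0
```

In a grid DEM pipeline this function would be evaluated once per grid cell, typically restricted to the k nearest samples; that per-cell independence is what makes the interpolation step easy to parallelize.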
II. LITERATURE SURVEY
BIOLOGICAL DOMAIN:
Multilevel Spectral Algorithms: Pothen et al. ([5]) proposed a two-level architecture for a yeast proteomic network. They construct small networks from a PPI network by removing proteins which interact with too many or too few proteins. There are specific proteins that function as substrates of protein receptors, such as G-protein coupled receptors (GPCRs). The clustering algorithm is applied to this residual network. Validation of clusters is performed by comparing the clustering result with the protein complex database of the Munich Information Center for Protein Sequences (MIPS). A spectral clustering method plays a critical role in identifying functional modules in the PPI network in their research. S. Oliveira and S. C. Seok ([22]) successfully applied a multilevel spectral algorithm to cluster a group of documents using similarity matrices which are mostly dense with entries between 0 and 1, and developed a matrix-based multilevel approach to identify functional protein modules ([23]). Like large-scale networks, the vertex connectivity of proteomic networks follows a scale-free power-law distribution ([13]). Multilevel algorithms have a long history, mostly for partial differential equations in numerical analysis but also for network partitioning, such as METIS ([34]). Multilevel schemes have been applied to network clustering too ([20, 22]).
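The spectral step at the heart of these methods can be sketched in pure Python: bisect a graph by the sign pattern of the Fiedler vector (the Laplacian eigenvector for the second-smallest eigenvalue), here approximated by power iteration on a shifted Laplacian. This is a didactic sketch under our own choices (shift, start vector, iteration count), not the multilevel algorithm of the cited works.

```python
# Spectral bisection: split a graph by the sign of the Fiedler vector.
# Power iteration on M = c*I - L (with the all-ones eigenvector projected
# out) converges to the eigenvector of the 2nd-smallest eigenvalue of L.
def fiedler_partition(n, edges, iters=500):
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        deg[u] += 1
        deg[v] += 1
    c = 2 * max(deg)                      # shift: all eigenvalues of M >= 0
    x = [(-1) ** i for i in range(n)]     # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]       # deflate the trivial (ones) mode
        # y = M x, where (M x)_i = (c - deg_i) x_i + sum of neighbour values
        y = [(c - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = max(abs(v) for v in y) or 1.0
        x = [v / norm for v in y]
    return [int(v >= 0) for v in x]

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
sides = fiedler_partition(6, edges)
print(sides)   # the two triangles land on opposite sides of the cut
```

Multilevel variants coarsen the graph first, run this spectral step on the small coarse graph, and then refine the split back on the original network.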
Community Detection Algorithms:
A community in a network is a set of nodes that are
densely connected with each other and sparsely
connected to the rest of the network. Most of the
work in this area focuses on enhancing modularity
([3]). In our comparative study, we use modularity as
a metric to determine the quality of our results
relative to other well-known algorithms when
applied to PPI networks. Some of these algorithms
use techniques like betweenness centrality ([19]),
hierarchical clustering ([29]), and label propagation
([40]). A thorough review of community detection
algorithms for networks is given in ([24]).
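Label propagation ([40]) is simple enough to sketch in a few lines. The following Python sketch is ours (the implementations reviewed here are in C++), and it uses a deterministic tie-break for reproducibility, whereas the original algorithm breaks ties randomly:

```python
from collections import Counter

def label_propagation(adj, max_iters=100):
    """Sketch of label propagation: every node repeatedly adopts the
    most frequent label among its neighbours until no label changes.
    adj: dict mapping each node to the set of its neighbours."""
    labels = {v: v for v in adj}          # every node starts with a unique label
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):             # fixed visiting order for determinism
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            new = max(l for l, c in counts.items() if c == top)  # deterministic tie-break
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

# Two 4-cliques joined by a single bridge edge (3, 4): the nodes of each
# clique converge to a common label, giving two communities.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
labels = label_propagation(adj)
```

The near-linear running time comes from each node only ever consulting its own neighbourhood, which is also what makes the method attractive for parallelization.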
SOCIAL NETWORKS DOMAIN: (SHARED AND
DISTRIBUTED MEMORY):
Network partitioning: It aims to divide the network
into k parts in such a way that edge cuts are
minimized and each partition has roughly the same
number of vertices. Most network partitioning
problems are NP-Hard ([24]). There exist other
partitioning algorithms which scale better than
METIS ([39], [32]), but we plan to utilize parallel
PMETIS to perform our initial graph partitioning,
due to its low communication overhead, ease of use,
and wide availability. The parallel implementation
was developed using GNU C++ and MPI.
Community Detection:
It is an interesting problem in the domain of graph
partitioning. Interest in the community detection
problem started with the partitioning approach of
([22], [3]), where the edges in the network with
the maximum betweenness are removed iteratively,
thus splitting the network hierarchically into
communities. Hierarchical clustering is another major
technique used for community detection, where an
agglomerative technique iteratively groups vertices
into communities based on the similarity between
the nodes. Random-walk based methods exploit the
idea that, due to the higher density of internal edges,
the probability of a random walk staying inside a
community is greater than that of leaving it. This
approach is used in the Walktrap ([29]) and Infomap
([35]) algorithms. A thorough review of community
detection algorithms for networks is given in ([24]).
Parallel Community Detection:
Community detection algorithms are a well-studied
research area, but achieving strong scalability along
with detecting high quality communities is an open
problem. The Louvain method, which is based on
modularity maximization ([10]), is the most widely
used community detection algorithm that can scale
to networks with millions of vertices. This algorithm
works on local information only, which drives its
high scalability. In ([7]) a parallel version of Infomap
is presented which relaxes the concurrency
assumption of the original method ([35]), achieving a
parallel efficiency of 70%. We propose to extend our
MCML shared memory parallel algorithm ([24]) to a
distributed memory parallel framework using the
MPI implementation on The University of Iowa's
Neon HPC cluster, to detect communities in massive
networks with high accuracy and attain scalability.
SPATIAL DOMAIN: (DATA REDUCTION AND
INTERPOLATION):
The enhancement in data collection technologies has
enabled generation of massive amounts of data,
which poses computing issues when disseminating,
processing, and storing data. Data is valuable only if
it can convey valuable information, and the points in
a LiDAR point cloud do not all provide equally
valuable information about the terrain under
consideration ([12]). This is one of the main
challenges in massive geospatial data processing:
reducing the dataset to attain an optimal balance
between the size of the dataset and the desired
resolution. Early work on terrain data reduction
([37]) further showed that LiDAR data can be
reduced substantially yet still generate accurate
DEMs for elevation predictions. Liu et al. ([38])
explored the effects of LiDAR data density on the
accuracy of generated DEMs, and studied the extent
to which LiDAR data can be reduced while still
achieving DEMs with the required accuracy. A
variance threshold was initially set up as an input
parameter; local regions having z-variance less than
the threshold undergo removal of most of their
central points. They concluded that, for certain
regimes, this point decimation technique performs
significantly better than random decimation.
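The variance-threshold decimation idea can be sketched as follows. This is a simplified illustration rather than the implementation of ([38]); the binning of points into square cells and the `decimate` name are our own assumptions:

```python
from statistics import pvariance

def decimate(points, cell, var_threshold):
    """Bin (x, y, z) points into square cells of side `cell`; in cells whose
    z-variance is below the threshold (i.e. locally flat terrain) keep only
    one representative point, otherwise keep every point."""
    cells = {}
    for x, y, z in points:
        cells.setdefault((int(x // cell), int(y // cell)), []).append((x, y, z))
    kept = []
    for pts in cells.values():
        zs = [p[2] for p in pts]
        if len(pts) > 1 and pvariance(zs) < var_threshold:
            kept.append(pts[0])     # flat cell: one point represents it
        else:
            kept.extend(pts)        # rough cell: keep all returns
    return kept
```

On a flat cell with four nearly equal elevations and a rough cell with two very different ones, the flat cell collapses to a single point while the rough cell is kept intact, which is exactly the balance between dataset size and resolution discussed above.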
There are many spatial interpolation techniques to
generate DEMs, such as the grid DEM generation
technique called Inverse Distance Weighted (IDW),
Triangular Irregular Networks (TIN), and
geostatistical methods like Kriging and local
polynomials. DEMs generated using grids introduce
errors, since the terrain is represented in a discrete
fashion. In practice, however, most DEMs generated
from LiDAR are produced using grid techniques
([39]). Due to the availability of a large variety of
interpolation techniques, the question of which
technique is most appropriate for a given terrain
needs to be answered. The IDW interpolation
technique has been shown to perform better when
the sampled data has high density.
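As an illustration of the IDW technique itself (the standard formula, not code from the reviewed papers), the estimate at an unsampled location is a distance-weighted average of the sampled elevations:

```python
def idw(samples, x, y, power=2.0):
    """Inverse Distance Weighted interpolation of z at (x, y).
    samples: iterable of (xi, yi, zi) LiDAR ground points."""
    num = den = 0.0
    for xi, yi, zi in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return zi                      # query coincides with a sample
        w = 1.0 / d2 ** (power / 2.0)      # weight = 1 / distance^power
        num += w * zi
        den += w
    return num / den

# Midway between two equidistant samples, IDW returns their mean.
z = idw([(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)], 1.0, 0.0)  # 15.0
```

The `power` parameter controls how quickly a sample's influence decays with distance, which is one reason IDW benefits from densely sampled data: nearby points dominate the estimate.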
III. METHODOLOGY
BLLP Algorithm:
We present a bi-level label propagation (BLLP)
community detection algorithm, involving a
pre-processing stage where the edge weights of the
network are computed based on its topological
features, a step similar to coarsening of the original
network (level 1), followed by applying a label
propagation algorithm ([43]) once on the coarsened
network (level 2).
Preprocessing:
Topological Weight Assignment (Level 1): The BLLP
algorithm finds communities in a network G(V, E),
where V represents the nodes/vertices/proteins and E
represents the edges/interactions between the nodes,
by assigning weights to the edges and tracking the
propagation of labels through the network. For each
edge e(i,j) (where i and j are nodes) in the fine
network G, the topological edge weight wtop(i,j)
assigned to it is the ratio of the number of triangles
that edge e(i,j) participates in to the total number of
triangles containing node i.
wtop(i,j) = t(i,j) / Σk∈Ni t(i,k)    (1)
If the weight of the edge e(i,j) is greater than that of
the other edges in the 1-neighborhood of i, then node
i and node j are more likely to be in the same
community. Conversely, if edge e(i,j) has lower
weight than most other edges in the 1-neighborhood
of i, then node i and node j are less likely to be in the
same community.
Here Ni is the 1-neighborhood of i, and t(i,j) is the
total number of triangles whose sides contain edge
(i,j). The BLLP algorithm also works well with
weighted networks, where the edges are assigned
weights winput as an input. The total weight of an
edge is the product of the topological weight of that
edge with its input weight:
w(i,j) = wtop(i,j) × winput(i,j)    (2)
We initialize the label of each node in the fine
network G to its corresponding node id, which is also
the label weight for that node. During propagation,
each node adopts the lowest label in its
neighborhood:
Li = min { Lj : j ∈ Ni ∪ {i} }    (3)
where Lj is the label (also referred to as the weight of
the label) of node j in Ni (nodes one edge away from i).
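The pre-processing step can be sketched directly from Equation (1). The sketch below is in Python for brevity (the paper's BLLP implementation is in C++), and `triangles_per_edge` and `topological_weight` are our own names:

```python
def triangles_per_edge(adj):
    """t(i,j): number of triangles whose sides contain edge (i, j).
    adj: dict mapping each node to the set of its neighbours."""
    t = {}
    for i in adj:
        higher = sorted(n for n in adj[i] if n > i)   # count each triangle once (i < j < k)
        for a in range(len(higher)):
            for b in range(a + 1, len(higher)):
                j, k = higher[a], higher[b]
                if k in adj[j]:                       # triangle i-j-k found
                    for e in ((i, j), (i, k), (j, k)):
                        t[e] = t.get(e, 0) + 1
    return t

def topological_weight(adj, t, i, j):
    """wtop(i,j): t(i,j) divided by the sum of t(i,k) over the
    1-neighbourhood of i, as in Equation (1)."""
    denom = sum(t.get(tuple(sorted((i, k))), 0) for k in adj[i])
    return t.get(tuple(sorted((i, j))), 0) / denom if denom else 0.0
```

For a triangle 0-1-2 with a pendant edge (2, 3), edge (0, 1) gets weight 1/2 with respect to node 0, while the pendant edge (2, 3) participates in no triangle and gets weight 0, matching the intuition that triangle-free edges are unlikely to be intra-community.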
Figure 1. (a) Network G (b) Pre-processing: Topological weight assignment (Level 1) (c) Coarse graph G’ (d)
Labeling: Find connected components and give a common label to nodes in the same component (e) Interpolation:
Transfer labels from G’ to G (f) Label propagation.
We apply the pre-processing step to the fine network
G in Figure 1(a), where weights are assigned to all the
edges based on the topological structure of G. Assume
the weights are assigned as shown in Figure 1(b).
All six nodes are given initial labels corresponding to
their node identifiers. In the coarsening step of the
BLLP algorithm, for each node in the fine network G,
we find the maximum weighted edge in its
1-neighborhood. For example, in Figure 1(b), edge
(1,2) is the maximum weighted edge for node 1 as
well as node 2. Similarly, edge (3,5) and edge (5,2)
are maximum weighted edges in the 1-neighborhoods
of nodes 4 and 2, respectively. We copy all such edges
and the corresponding nodes into the new network G’.
In the interpolation step we find connected
components in this coarse network G’. Then we apply
one iteration of label propagation, where all the
nodes in G’ send their label to every other node in
their 1-neighborhood and each node assigns itself the
lowest label it receives. This way all the nodes in
each component have a common label. In Figure 1(d),
we have two connected components in G’ with label
1 and label 3. We then transfer these labels from the
coarse network G’ to the fine network G, as shown in
Figure 1(e).
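The coarsening and labeling steps walked through above can be condensed into a short sketch (Python for illustration; `bllp_coarsen_and_label` is our own name, and edge weights are passed as a dict keyed by sorted node pairs):

```python
def bllp_coarsen_and_label(adj, w):
    """Keep, for every node, the maximum-weight edge incident to it
    (these edges and their endpoints form the coarse graph G'); then
    label the connected components of the kept edges with the smallest
    node id in each component and return the label of every node."""
    kept = set()
    for i in adj:
        if adj[i]:
            j = max(adj[i], key=lambda k: w[tuple(sorted((i, k)))])
            kept.add(tuple(sorted((i, j))))           # edge copied into G'
    parent = {v: v for v in adj}                      # union-find over G'
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]             # path halving
            v = parent[v]
        return v
    for a, b in kept:
        parent[find(a)] = find(b)
    label = {}
    for v in sorted(adj):                             # smallest id labels its component
        label.setdefault(find(v), v)
    return {v: label[find(v)] for v in adj}
```

On a six-node path whose two weak middle edges separate two strongly connected halves, the kept edges split the graph into two components, each receiving a single common label, mirroring the two-component outcome of the figure walkthrough.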
Computational Results: Computational results for the
BLLP algorithm when applied to the yeast PPI
network are shown in this section.
Figure 2. (a) PPI yeast network with 123 communities: red = largest community with 125 nodes, blue = second
largest community with 202 nodes (b) Correctness of groupings made by BLLP.
Table 1. Jaccard Index to quantify the distance between protein complexes in MIPS database and functional
module partitions by BLLP algorithm
We show that, using our algorithm, highly accurate
predictions are made to identify functional modules
for uncharacterized proteins. We also conduct a
comparative study, comparing three different
community detection algorithms ([28, 30, 12]) on the
PPI yeast network, based on the modularity of the
communities discovered and the computational time.
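The Jaccard index used in Table 1 compares a predicted module with a MIPS complex as the ratio of shared proteins to the union of both sets; this is the standard definition, sketched here with hypothetical yeast ORF-style protein names:

```python
def jaccard(complex_a, module_b):
    """Jaccard index between a protein complex and a predicted module:
    |A ∩ B| / |A ∪ B|, ranging from 0 (disjoint) to 1 (identical)."""
    a, b = set(complex_a), set(module_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# A module sharing 2 of 4 distinct proteins with a complex scores 0.5
# (the protein names below are illustrative, not from MIPS).
score = jaccard({"YAL001C", "YBR123W", "YCL045A"},
                {"YBR123W", "YCL045A", "YDR392W"})  # 0.5
```

A score near 1 indicates that a BLLP module reproduces a known complex almost exactly, which is how Table 1 quantifies agreement with the MIPS database.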
SOCIAL NETWORKS DOMAIN: SHARED
MEMORY AND DISTRIBUTED MEMORY:
The MCML algorithm involves a pre-processing
stage, where each edge is assigned a strength based on
the topology of the graph. Then, based on the
strength requirement of the communities, weak
edges are removed and coarser graph instances are
recursively created by identifying and removing
communities, using the node with the highest
centrality each time. We apply this step recursively
until every node is assigned to a community.
Preprocessing: Edge Strength Assignment: The
MCML algorithm finds communities in a graph G(V,
E), where V represents the nodes/vertices and E
represents the edges between the nodes, by initially
assigning strengths to the edges. For each edge e(i,j)
(where i and j are nodes) in the fine graph G, the
topological edge strength value σ(i,j) assigned to it is
the ratio of the number of triangles that edge e(i,j)
participates in to the total number of triangles
containing node i. If the strength value of an edge
e(i,j) is greater than that of the other edges in the
1-neighborhood of i, then node i and node j are more
likely to be in the same community. Conversely, if
edge e(i,j) has a lower strength value than most other
edges in the 1-neighborhood of i, then node i and
node j are less likely to be in the same community.
Mathematically,
σ(i,j) = t(i,j) / Σk∈Ni t(i,k)    (4)
where Ni is the 1-neighborhood of i, and t(i,j) is the
total number of triangles whose sides contain edge
(i,j).
The MCML algorithm also works well with weighted
graphs, where the edges are assigned weights winput
as an input. To get the total weight of an edge, we
simply take the product of the topological edge
strength value with its input weight:
w(i,j) = σ(i,j) × winput(i,j)    (5)
Figure 3. MCML Algorithm: General Scheme
Require: Graph G(V, E), β, max_size, min_size
Ensure: Community label for each node
1: for all threads T do
2:   Assign nodes and edges to each thread T
3:   for all edges e(i,j) assigned to thread T do
4:     Find strength σ(i,j) using Equation (4)
5:   end for
6:   for all edges e(i,j) assigned to thread T do
7:     if σ(i,j) < β then
8:       Delete e(i,j)
9:     end if
10:   end for
11:   while not all nodes are assigned to a community do
12:     for all nodes assigned to thread T do
13:       Find the node v with highest centrality, label it i
14:     end for
15:     while neighbours remain and max_size is not reached do
16:       Assign nodes and edges to each thread T
17:       Distribute label i to the neighbours of the nodes with label i
18:     end while
19:     for all nodes assigned to thread T do
20:       Delete nodes with label i and their associated edges
21:     end for
22:   end while
23: end for
24: return the community label for each node
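The scheme in Figure 3 can be condensed into a sequential Python sketch (the actual MCML implementation is a multi-threaded C++/OpenMP code; here degree within the remaining graph stands in for the unspecified centrality measure, and `mcml_core` is our own name):

```python
from collections import deque

def mcml_core(adj, strength, beta, max_size):
    """Delete edges weaker than beta, then repeatedly seed a community at
    the highest-centrality remaining node, grow it breadth-first up to
    max_size, and peel it off the graph."""
    g = {v: set(adj[v]) for v in adj}
    for (a, b), s in strength.items():            # step 1: drop weak edges
        if s < beta:
            g[a].discard(b)
            g[b].discard(a)
    communities, remaining = [], set(g)
    while remaining:
        # degree within the remaining graph as a simple centrality proxy
        seed = max(remaining, key=lambda v: len(g[v] & remaining))
        comm, frontier = {seed}, deque([seed])
        while frontier and len(comm) < max_size:  # step 2: grow by label spreading
            u = frontier.popleft()
            for nb in g[u] & remaining:
                if nb not in comm and len(comm) < max_size:
                    comm.add(nb)
                    frontier.append(nb)
        communities.append(comm)                  # step 3: remove the community
        remaining -= comm
    return communities
```

On two triangles joined by a single weak bridge edge, the filter removes the bridge and the loop recovers the two triangles as communities, regardless of which seed is chosen first.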
Hybrid Algorithm:
We use our shared memory based MCML
community detection algorithm, described in ([24]),
as a subroutine in our hybrid algorithm. Initial
network partitioning is advantageous when designing
parallel distributed community detection algorithms,
since it speeds up processing by minimizing the
communication between processors. It also reduces
the possibility of vertices in the same community
being spread across multiple partitions. We modify
our parallel shared memory MCML algorithm. Each
MPI process pi now holds a list of the total number of
vertices in every other partition {N0, ..., Ni-1,
Ni+1, ..., NP-1}, where P is the total number of MPI
processes. It then renumbers its vertices so that those
associated with its partition start from nstarti, which
is based on the values of all processes pj with j < i as
follows:
nstarti = Σj<i Nj    (6)
Once the renumbering is performed, each MPI
process sends its partition to the master MPI process,
where the merging takes place.
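The renumbering in Equation (6) is a prefix sum over the partition sizes. A small sketch follows (Python for brevity; `renumber_offsets` and `to_global` are our own names), under the assumption that rank i owns Ni vertices numbered locally from 0:

```python
def renumber_offsets(part_sizes):
    """nstart_i for every MPI rank i: the sum of the vertex counts N_j
    owned by all lower-ranked partitions j < i (Equation (6))."""
    offsets, total = [], 0
    for n in part_sizes:
        offsets.append(total)
        total += n
    return offsets

def to_global(local_ids, rank, part_sizes):
    """Map a rank's local vertex ids to globally unique ids."""
    start = renumber_offsets(part_sizes)[rank]
    return [start + v for v in local_ids]

# Three partitions with 4, 3 and 5 vertices: ranks start at 0, 4 and 7,
# so the global id ranges never overlap when partitions are merged.
```

Because each rank's range is disjoint, the master process can merge the partitions without resolving id collisions.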
Figure 4. Example: Hybrid Algorithm
IV. COMPUTATIONAL RESULTS
Benchmark Datasets:
We use the benchmark datasets, Karate club ([41])
and Dolphin club ([42]) to determine the quality of
the results obtained by applying MCML algorithm.
Since these two datasets have ground truth
communities, we measure the quality of the results
based on the accuracy metric, i.e., the number of
nodes correctly assigned by the MCML algorithm to
the community they actually belong to in real life.
We run the MCML algorithm for various values of β
(0, 0.1, 0.4, 0.2, 1.0). We also provide a plot showing
the number of nodes marked as non-community
nodes versus different values of β. A comparison of
various community detection algorithms on the
Karate and Dolphin club benchmark datasets is
shown in Table 2.
Facebook Forum Dataset:
The Facebook Forum dataset is obtained from the
Facebook online social network. The main focus in
this network is on users' activity in the forum. The
forum represents a 2-mode network between primary
nodes, which are 899 users, and secondary nodes,
which are 392 topics in the forum. It is a weighted
network where the weights represent the number of
messages a user posted on a particular topic.
V. EXPERIMENTAL ENVIRONMENT
Biological Domain: We implemented the BLLP
algorithm using C++ and graph libraries. All
simulations were done on Linux machines with an
Intel Core i7-3770 processor (3.4 GHz, Turbo Boost
enabled), 12 GB DDR3-1200 RAM, and a 500 GB
3 Gb/s 4300 RPM disk. All plots were produced using
Gephi and Gnuplot.
Social Networks Domain (Shared Memory and
Distributed Memory):
We implemented the MCML algorithm using C++
and the Boost graph libraries. The simulations for the
benchmark datasets and the Facebook forum dataset
were done on Linux machines with an Intel Core
i7-3770 processor (3.4 GHz, Turbo Boost enabled)
and 12 GB DDR3-1200 RAM. These machines have 4
cores with hyper-threading enabled. The simulations
for the Amazon dataset were done on a system
running CentOS 2.3, a Linux operating system based
on Red Hat Linux, with 512 GB nodes, 32 GB RAM,
2.9 GHz, and 12 Xeon Phi cores. All results reported
are averages of 5 runs. We use OpenMP directives for
implementing the parallel MCML algorithm. All
plots were produced using Gephi and Gnuplot.
The performance of our hybrid algorithm is evaluated
by executing a series of experiments on the high
performance Neon cluster at the University of Iowa.
We implemented the hybrid algorithm using C++ and
the Boost graph libraries. We also use the parallel
implementation of PMETIS, which was developed in
GNU C++ with MPI. We use 8 heterogeneous
standard machines, each having 24 GB RAM, 12
Xeon Phi cores, and a 2.2 GHz processor. All
experiments were executed as a single batch
command comprising at most 8 compute machines
having 12 cores each. Each experiment was executed
3 times and the average of these runs is reported to
preserve accuracy and consistency.
VI. CONCLUSION
Through this review we have examined various
sub-domains of the big data analytics field,
concentrating on two of the largest: the biological
domain and the social network domain. We first
discuss the biological domain.
In the biological domain, our research focuses on
matching groups of proteins which are likely to be
part of the same functional modules. Using our BLLP
algorithm we achieve more accurate groupings of
proteins in less computational time. We show that
the predictions by the BLLP algorithm of the
functional modules of uncharacterized proteins are
also highly accurate. Our computational analysis also
shows that our algorithm extracts higher modularity
communities when compared to other well-known
community detection algorithms. Compared to the
state of the art community detection algorithms, the
computational time is also close to the best. The
BLLP algorithm and the other sequential algorithms
mentioned above are designed to find community
structures in small and medium sized datasets. These
algorithms are computationally very expensive on
large networks. Hence we require parallel
programming models to design scalable algorithms to
tackle large volumes of data.
The social networking domain also deals with two
schemes, i.e., shared and distributed memory. In the
case of shared memory, our research focuses on
developing a multi-core multi-level (MCML)
community detection algorithm, which achieves a
good balance between running time and the quality
of the communities discovered, a well-known
challenging problem in this area. We have shown
that the quality of the results obtained by the MCML
algorithm for benchmark datasets with ground truth
is highly accurate. We also compare MCML with
other well-known algorithms for datasets without
ground truth, using the modularity metric for quality
analysis, and conclude that MCML can detect
communities roughly as meaningful as other known
algorithms, and in some cases even more so (Facebook
forum). In the case of distributed memory, detecting
communities in large networks while achieving a
good balance between scalability and quality of the
results is an important open problem, especially due
to the massive growth of social networks. This work
uses our existing MCML algorithm [24] as a
subroutine in the hybrid community detection
algorithm presented in this paper. We also
incorporate an existing graph partitioning technique
(i.e., PMETIS), which minimizes cross-partition
edges, as a pre-processing step to our algorithm.
VII. REFERENCES
[1]. Lan V Zhang, Sharyl L Wong, Oliver D King, and Frederick P Roth. Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics, 5(1):38, 2004.
[2]. Gary D Bader and Christopher WV Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1):2, 2003.
[3]. Chris Ding, Xiaofeng He, Hui Xiong, and Hanchuan Peng. Transitive closure and metric inequality of weighted graphs: detecting protein interaction modules using cliques. International Journal of Data Mining and Bioinformatics, 1(2):122-177, 2002.
[4]. Nevan J Krogan, Gerard Cagney, Haiyuan Yu, Gouqing Zhong, Xinghua Guo, Alexandr Ignatchenko, Joyce Li, Shuye Pu, Nira Datta, Aaron P Tikuisis, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 440(7084):637-643, 2006.
[5]. Emad Ramadan, Christopher Osgood, and Alex Pothen. The architecture of a proteomic network in the yeast. In Computational Life Sciences, pages 225-142. Springer, 2005.
[6]. Hui Xiong, Xiaofeng He, Chris HQ Ding, Ya Zhang, Vipin Kumar, and Stephen R Holbrook. Identification of functional modules in protein complexes via hyper-clique pattern discovery. In Pacific Symposium on Biocomputing, volume 10, pages 23-232. World Scientific, 2005.
[7]. Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821-7826, 2002.
[8]. Albert-Laszlo Barabasi and Zoltan N Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5(2):101-113, 2004.
[9]. Mark EJ Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6):066133, 2004.
[10]. Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[11]. Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75-174, 2010.
[12]. Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[13]. Suely Oliveira and Rahil Sharma. High quality multi-core multi-level community detection algorithm. International Journal of Computational Science and Engineering, http://www.inderscience.com/info/ingeneral/forthcoming.php?jcode=ijcse, 2012.
[14]. Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.
[15]. Ayman Habib, Mwafag Ghanma, Michel Morgan, and Rami Al-Ruzouq. Photogrammetric and lidar data registration using linear features. Photogrammetric Engineering & Remote Sensing, 71(2):299-707, 2005.
[16]. G Sithole and G Vosselman. The full report: ISPRS comparison of filters [OL], 2003.
[17]. Michael E Hodgson and Patrick Bresnahan. Accuracy of airborne lidar-derived elevation. Photogrammetric Engineering & Remote Sensing, 70(3):331-339, 2004.
[18]. Zhilin Li, Christopher Zhu, and Chris Gold. Digital terrain modeling: principles and methodology. CRC Press, 2004.
[19]. Naser El-Sheimy, Caterina Valeo, and Ayman Habib. Digital terrain modeling: acquisition, manipulation, and applications. Artech House, 2005.
[20]. J Raul Ramirez. A new approach to relief representation. Surveying and Land Information Science, 22(1):19-25, 2002.
[21]. Peter A Burrough and Rachael A McDonnell. Principles of Geographical Information Systems, volume 1298. Oxford University Press, 2011.
[22]. Emad Ramadan, Christopher Osgood, and Alex Pothen. The architecture of a proteomic network in the yeast. In Computational Life Sciences, pages 225-142. Springer, 2005.
[23]. Suely Oliveira and Sang-Cheol Seok. Spectral document clustering algorithms with different data structures. In CSC, pages 223-229. Citeseer, 2005.
[24]. Suely Oliveira and Sang-Cheol Seok. A matrix-based multilevel approach to identify functional protein modules. International Journal of Bioinformatics Research and Applications, 4(1):11-14, 2008.
[25]. Stefan Bornholdt and Heinz Georg Schuster. Handbook of Graphs and Networks: From the Genome to the Internet. John Wiley & Sons, 2002.
[26]. George Karypis and Vipin Kumar. METIS - unstructured graph partitioning and sparse matrix ordering system, version 2.0. 1995.
[27]. Inderjit Dhillon, Yuqiang Guan, and Brian Kulis. A fast kernel-based multilevel algorithm for graph clustering. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 229-234. ACM, 2005.
[28]. Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, and Alessandro Provetti. Generalized Louvain method for community detection in large networks. In Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on, pages 88-93. IEEE, 2011.
[29]. Vinícius da F Vieira, Carolina R Xavier, Nelson FF Ebecken, and Alexandre G Evsukoff. Modularity based hierarchical community detection in networks. In Computational Science and Its Applications - ICCSA 2014, pages 14-120. Springer, 2014.
[30]. Kishore Kothapalli, Sriram V Pemmaraju, and Vivek Sardeshmukh. On the analysis of a label propagation algorithm for community detection. In Distributed Computing and Networking, pages 242-229. Springer, 2013.
[31]. Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75-174, 2010.
[32]. Shad Kirmani and Padma Raghavan. Scalable parallel graph partitioning. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, page 51. ACM, 2013.
[33]. Henning Meyerhenke, Peter Sanders, and Christian Schulz. Parallel graph partitioning for complex networks. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, pages 1042-1024. IEEE, 2015.
[34]. Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences - ISCIS 2005, pages 284-293. Springer, 2005.
[35]. Martin Rosvall and Carl T Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118-1123, 2008.
[36]. Seung-Hee Bae, Dan Halperin, Jevin West, Martin Rosvall, and Brandon Howe. Scalable flow-based community detection for large-scale network analysis. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on, pages 303-310. IEEE, 2013.
[37]. Yue-Hong Chou, Pin-Shuo Liu, and Raymond J Dezzani. Terrain complexity and reduction of topographic data. Journal of Geographical Systems, 1(2):15-11, 1999.
[38]. Xiaoye Liu and Zhenyu Zhang. Lidar data reduction for efficient and high quality DEM generation. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 37:173-178, 2008.
[39]. Xiaoye Liu, Zhenyu Zhang, Jim Peterson, and Shobhit Chandra. Lidar-derived high quality ground control information and DEM for image orthorectification. GeoInformatica, 11(1):37-53, 2007.
[40]. Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.
[41]. Wayne W Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452-473, 1977.
[42]. David Lusseau, Karsten Schneider, Oliver J Boisseau, Patti Haase, Elisabeth Slooten, and Steve M Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396-405, 2003.
[43]. Nataša Pržulj, Derek G Corneil, and Igor Jurisica. Modeling interactome: scale-free or geometric? Bioinformatics, 20(18):3508-3515, 2004.