In this paper, the authors describe two approaches for dynamic community detection using the CNM algorithm. CNM is a hierarchical, agglomerative algorithm that greedily maximizes modularity. They define two approaches: BasicDyn and FastDyn. BasicDyn backtracks merges of communities until each marked (changed) vertex is its own singleton community. FastDyn undoes a merge only if the quality of the merge, as measured by the induced change in modularity, has significantly decreased compared to when the merge initially took place. FastDyn also allows more than two vertices to contract together if, in the previous time step, these vertices eventually ended up contracted into the same community. In the static case, merging several vertices together in one contraction phase could lead to deteriorating results. FastDyn is able to do this, however, because it uses information from the merges of the previous time step. Intuitively, merges that previously occurred are more likely to be acceptable later.
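The merge-undo criterion can be sketched as follows; the function name, the `tolerance` parameter, and its value are illustrative assumptions, not the paper's actual parameterization:

```python
# Hypothetical sketch of FastDyn's merge-backtracking test: a past merge is
# undone only if its modularity gain has dropped significantly relative to
# the gain recorded when the merge originally took place.

def should_undo(original_gain, current_gain, tolerance=0.5):
    """Undo the merge if its quality fell by more than `tolerance`
    (a fraction of the original gain); the threshold is an assumption."""
    if original_gain <= 0:
        return True  # a merge that no longer pays for itself is undone
    return current_gain < (1.0 - tolerance) * original_gain
```

BasicDyn, by contrast, unconditionally backtracks every merge touching a marked vertex, which is simpler but redoes more work per time step.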
https://gist.github.com/wolfram77/1856b108334cc822cdddfdfa7334792a
Scalable Static and Dynamic Community Detection Using Grappolo : NOTES (Subhajit Sahu)
A **community** (in a network) is a subset of nodes which are _strongly connected among themselves_, but _weakly connected to others_. Neither the number of output communities nor their size distribution is known a priori. Community detection methods can be divisive or agglomerative. **Divisive methods** use _betweenness centrality_ to **identify and remove bridges** between communities. **Agglomerative methods** greedily **merge two communities** that provide the maximum gain in _modularity_. Newman and Girvan introduced the **modularity metric**. The problem of community detection is then reduced to the problem of modularity maximization, which is **NP-complete**. The **Louvain method** is a variant of the _agglomerative strategy_, in that it is a _multi-level heuristic_.
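As a concrete reference for the metric: Newman-Girvan modularity of an unweighted graph is Q = Σ_c [ m_c/m − (d_c/2m)² ], where m is the total edge count, m_c the intra-community edge count, and d_c the summed degree of community c. A minimal sketch (the function name and data representation are my own, not from the paper):

```python
# Newman-Girvan modularity of an unweighted, undirected graph:
# Q = sum over communities c of [ m_c/m - (d_c / 2m)^2 ].
from collections import defaultdict

def modularity(edges, community):
    """edges: iterable of (u, v) pairs; community: dict vertex -> community id."""
    m = 0
    intra = defaultdict(int)   # m_c: intra-community edge counts
    degree = defaultdict(int)  # d_c: summed degrees per community
    for u, v in edges:
        m += 1
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single bridge edge, with each triangle as a community, this gives Q = 5/14.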
https://gist.github.com/wolfram77/917a1a4a429e89a0f2a1911cea56314d
In this paper, the authors discuss **four heuristics** for community detection with the _Louvain algorithm_, implemented upon the recently developed **Grappolo**, a parallel variant of the Louvain algorithm. They are:
- Vertex following and Minimum label
- Data caching
- Graph coloring
- Threshold scaling
With the **Vertex following** heuristic, the _input is preprocessed_ and all single-degree vertices are merged with their corresponding neighbours. This reduces the number of vertices considered in each iteration, and also helps initial seeds of communities to form. With the **Minimum label** heuristic, when a vertex is deciding which community to move to and multiple communities provide the same modularity gain, the community with the smallest id is chosen. This helps _minimize or prevent community swaps_. With the **Data caching** heuristic, community information is stored in a vector instead of a map, and is reused in each iteration, at some additional cost. With the **Vertex ordering via Graph coloring** heuristic, _distance-k coloring_ of the graph is performed in order to group vertices into color classes. Each set of vertices (by color) is then processed _concurrently_, with synchronization afterwards. This enables us to mimic the behaviour of the serial algorithm. Finally, with the **Threshold scaling** heuristic, _successively smaller values of the modularity threshold_ are used as the algorithm progresses. This allows the algorithm to converge faster, and has been observed to yield a good modularity score as well.
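The Minimum label tie-break is simple enough to sketch directly (the helper name and input shape are assumptions):

```python
# Sketch of the Minimum-label heuristic: among candidate communities that
# tie for the best modularity gain, pick the smallest community id, which
# helps avoid oscillating community swaps between symmetric choices.

def best_community(gains):
    """gains: dict mapping community id -> modularity gain of moving there.
    Returns the smallest id among communities with the maximum gain."""
    best_gain = max(gains.values())
    return min(c for c, g in gains.items() if g == best_gain)
```

Without the smallest-id rule, two vertices could endlessly swap between two equally good communities in a parallel setting.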
From the results, it appears that _graph coloring_ and _threshold scaling_ heuristics do not always provide a speedup and this depends upon the nature of the graph. It would be interesting to compare the heuristics against baseline approaches. Future work can include _distributed memory implementations_, and _community detection on streaming graphs_.
A Dynamic Algorithm for Local Community Detection in Graphs : NOTES (Subhajit Sahu)
**Community detection methods** can be *global* or *local*. **Global community detection methods** divide the entire graph into groups. Existing global algorithms include:
- Random walk methods
- Spectral partitioning
- Label propagation
- Greedy agglomerative and divisive algorithms
- Clique percolation
https://gist.github.com/wolfram77/b4316609265b5b9f88027bbc491f80b6
There is a growing body of work in *detecting overlapping communities*. **Seed set expansion** is a **local community detection method** in which relevant *seed vertices* of interest are picked and *expanded to form communities* surrounding them. The quality of each community is measured using a *fitness function*.
**Modularity** is a *fitness function* which compares the number of intra-community edges to the expected number in a random null model. **Conductance** is another popular fitness score that measures the community cut, or inter-community edges. Many *overlapping community detection* methods **use a modified ratio** of intra-community edges to all edges with at least one endpoint in the community.
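A common form of conductance (the exact variant differs across papers; this sketch uses cut(S) / min(vol(S), vol(V∖S)) on an unweighted graph, with names of my own choosing):

```python
def conductance(edges, S):
    """Conductance of vertex set S: cut(S) / min(vol(S), vol(V \\ S)),
    where vol sums endpoint degrees and cut counts edges leaving S."""
    S = set(S)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        if (u in S) != (v in S):
            cut += 1                      # edge crosses the community boundary
        vol_in += (u in S) + (v in S)     # endpoints inside S
        vol_out += (u not in S) + (v not in S)
    return cut / min(vol_in, vol_out)
```

Lower conductance means a better-separated community: for two triangles joined by one bridge edge, either triangle has conductance 1/7.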
Andersen et al. use a **Spectral PageRank-Nibble method** which minimizes conductance and is formed by adding vertices in order of decreasing PageRank values. Andersen and Lang develop a **random walk approach** in which some vertices in the seed set may not be placed in the final community. Clauset gives a **greedy method** that *starts from a single vertex* and then iteratively adds neighboring vertices *maximizing the local modularity score*. Riedy et al. **expand multiple vertices** via maximizing modularity.
Several algorithms for **detecting global, overlapping communities** use a *greedy*, *agglomerative approach* and run *multiple separate seed set expansions*. Lancichinetti et al. run **greedy seed set expansions**, each with a *single seed vertex*. Overlapping communities are produced by sequentially running expansions from a node not yet in a community. Lee et al. use **maximal cliques as seed sets**. Havemann et al. **greedily expand cliques**.
The authors of this paper discuss a dynamic approach for **community detection using seed set expansion**. Simply marking the neighbours of changed vertices is a **naive approach**, and has *severe shortcomings*. This is because *communities can split apart*. The simple updating method *may fail even when it outputs a valid community* in the graph.
A Subgraph Pattern Search over Graph Databases (IJMER)
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dynamic Graphs (Subhajit Sahu)
Highlighted notes during research with Prof. Dip Sankar Banerjee, Prof. Kishore Kothapalli:
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dynamic Graphs.
https://ieeexplore.ieee.org/document/9384277
There are 3 types of community detection methods:
Divisive, Agglomerative, and Multi-level (usually better).
In this paper, heuristics are given for skipping vertices that are most likely unaffected, for modularity-based community detection methods such as Louvain and SLM (Smart Local Moving). All edge batches are undirected, and sorted by source vertex id. For edge additions, the source vertex i, the vertex j* whose edge yields the highest modularity change, i's neighbors, and j*'s community are marked as affected. For edge deletions, where i and j must be in the same community, i, j, i's neighbors, and i's community are marked as affected. Performance is compared with the static approach, a dynamic baseline (incremental), and this method (for both Louvain and SLM). Comparison is also done with the "DynaMo" and "Batch" community detection methods.
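The marking rules above can be sketched roughly as follows; the data-structure shapes and function names are my assumptions, not the paper's:

```python
# Rough sketch of Delta-Screening's affected-vertex marking.
# Edge addition (i, j*): mark i, j*, i's neighbors, and j*'s whole community.
# Edge deletion of an intra-community edge (i, j): mark i, j, i's neighbors,
# and i's whole community.

def mark_affected_addition(i, j_star, neighbors, community, members):
    affected = {i, j_star}
    affected |= set(neighbors[i])                 # i's neighborhood
    affected |= set(members[community[j_star]])   # all of j*'s community
    return affected

def mark_affected_deletion(i, j, neighbors, community, members):
    affected = {i, j}
    affected |= set(neighbors[i])                 # i's neighborhood
    affected |= set(members[community[i]])        # all of i's community
    return affected
```

Vertices outside the affected set keep their previous community assignment, which is where the speedup over re-running static Louvain comes from.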
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis (Jason Riedy)
Applications in many areas analyze an ever-changing environment. On billion-vertex graphs, providing snapshots imposes a large performance cost. We propose the first formal model for graph analysis running concurrently with streaming data updates. We consider an algorithm valid if its output is correct for the initial graph plus some implicit subset of concurrent changes. We show theoretical properties of the model, demonstrate the model on various algorithms, and extend it to updating results incrementally.
This paper discusses a GPU implementation of the Louvain community detection algorithm. The Louvain algorithm obtains hierarchical communities as a dendrogram through modularity optimization. Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbours that gives the greatest increase in modularity. If moving to no neighbour's community leads to an increase in modularity, the vertex stays in its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between the respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops in the respective super-vertices (again, edge weights are combined). Together, the local moving and the aggregation phases constitute a stage. This super-vertex graph is then used as input for the next stage. This process continues until the increase in modularity is below a certain threshold. From each stage, we obtain a hierarchy of community memberships for each vertex as a dendrogram.
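A toy sketch of the local-moving phase on an unweighted graph; for clarity it recomputes modularity from scratch per candidate move, whereas real implementations (including this one) use an incremental gain formula, and the helper names are mine:

```python
from collections import defaultdict

def _modularity(edges, com):
    """Newman-Girvan modularity of an unweighted graph under assignment com."""
    m = len(edges)
    intra = defaultdict(int)  # intra-community edge counts
    deg = defaultdict(int)    # summed degrees per community
    for u, v in edges:
        deg[com[u]] += 1
        deg[com[v]] += 1
        if com[u] == com[v]:
            intra[com[u]] += 1
    return sum(intra[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)

def local_moving(vertices, edges):
    """One local-moving phase: sweep vertices, greedily adopting the
    neighboring community with the largest modularity increase, until a
    full sweep produces no change."""
    com = {v: v for v in vertices}  # every vertex starts in its own community
    nbr = {v: set() for v in vertices}
    for u, v in edges:
        nbr[u].add(v)
        nbr[v].add(u)
    improved = True
    while improved:
        improved = False
        for v in vertices:
            base = _modularity(edges, com)
            best_c, best_q = com[v], base
            for c in {com[n] for n in nbr[v]}:  # candidate communities
                com[v] = c
                q = _modularity(edges, com)
                if q > best_q:
                    best_c, best_q = c, q
            com[v] = best_c                     # keep the best move (or stay)
            if best_q > base:
                improved = True
    return com
```

On two triangles joined by a bridge edge, this converges to the two triangle communities.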
Approaches to performing the Louvain algorithm can be divided into coarse-grained and fine-grained. Coarse-grained approaches process a set of vertices in parallel, while fine-grained approaches process all vertices in parallel. A coarse-grained hybrid algorithm using multiple GPUs has been implemented by Cheong et al., which grabbed my attention. In addition, their algorithm does not use hashing for the local moving phase, but instead sorts each neighbour list by the community id of each vertex.
https://gist.github.com/wolfram77/7e72c9b8c18c18ab908ae76262099329
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb... (Artem Lutov)
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction), performed in near-linear time, is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, are computed (i.e., learned) orders of magnitude faster, and perform competitively with the best manually tuned state-of-the-art embedding techniques evaluated on diverse graph analysis tasks.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST... (acijjournal)
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search; the RNSC algorithm tries to achieve optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A. D. King only for undirected and unweighted random graphs. Another popular graph clustering algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies like the World Wide Web, the web of human sexual contacts, or the chemical network of a cell basically follow a power-law distribution to represent different real-life systems. This paper uses real large-scale power-law distribution graphs to conduct a performance analysis of RNSC behaviour compared with the Markov clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our approach to comparative performance measurement of these algorithms on the basis of cost of clustering, cluster size, modularity index of clustering results, and normalized mutual information (NMI).
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds (Subhajit Sahu)
Highlighted notes on Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds.
While doing research work under Prof. Kishore Kothapalli.
Laxman Dhulipala, David Durfee, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, Xiaorui Sun:
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds. SODA 2020: 1300-1319
In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each having space strongly sublinear with respect to the number of vertices, that is, O(n^ε) for some constant 0 < ε < 1. Our goal is to handle batches of updates and queries where the data for each batch fits onto one machine in constant rounds of parallel computation, as well as to reduce the total communication between the machines. This objective corresponds to the gradual buildup of databases over time, while the goal of obtaining constant rounds of communication for problems in the static setting has been elusive for problems as simple as undirected graph connectivity. We give an algorithm for dynamic graph connectivity in this setting with constant communication rounds and communication cost almost linear in terms of the batch size. Our techniques combine a new graph contraction technique, an independent random sample extractor from correlated samples, as well as distributed data structures supporting parallel updates and queries in batches. We also illustrate the power of dynamic algorithms in the MPC model by showing that the batched version of the adaptive connectivity problem is P-complete in the centralized setting, but sub-linear sized batches can be handled in a constant number of rounds. Due to the wide applicability of our approaches, we believe it represents a practically-motivated workaround to the current difficulties in designing more efficient massively parallel static graph algorithms.
Poster given at the Bioinformatics Open Source Conference (BOSC) and Intelligent Systems for Molecular Biology (ISMB) Conference held July 8-12, 2016 in Orlando, Florida, entitled GRNmap and GRNsight: open source software for dynamical systems modeling and visualization of medium-scale gene regulatory networks. Abstract: A gene regulatory network (GRN) consists of genes, transcription factors, and the regulatory connections between them that govern the level of expression of mRNA and proteins from those genes. Our open source MATLAB software package, GRNmap (http://kdahlquist.github.io/GRNmap/), uses ordinary differential equations to model the dynamics of medium-scale GRNs. The program uses a penalized least squares approach (Dahlquist et al. 2015, DOI: 10.1007/s11538-015-0092-6) to estimate production rates, expression thresholds, and regulatory weights for each transcription factor in the network based on gene expression data, and then performs a forward simulation of the dynamics of the network. GRNmap has options for using a sigmoidal or Michaelis-Menten production function. Parameters for a series of related networks, ranging in size from 15 to 35 genes, were optimized against DNA microarray data measuring the transcriptional response to cold shock in budding yeast, Saccharomyces cerevisiae, for the wild type strain and strains deleted for the transcription factors Cin5, Gln3, Hap4, Hmo1, and Zap1, giving biological insights into this process. GRNsight is an open source web application for visualizing such models of gene regulatory networks (http://dondi.github.io/GRNsight/index.html). GRNsight accepts GRNmap- or user-generated Excel spreadsheets containing an adjacency matrix representation of the GRN and automatically lays out the graph. The application colors the edges and adjusts their thicknesses based on the sign (activation or repression) and the strength (magnitude) of the regulatory relationship, respectively. 
Users can then modify the graph to define the best visual layout for the network. This work was partially supported by NSF award 0921038.
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for... (idescitation)
In this paper, an architecture designed for Non-Maximal Suppression used in the Canny edge detection algorithm is presented in order to reduce memory requirements significantly. The architecture also achieves decreased latency and increased throughput with no loss in edge detection. The new algorithm uses a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, the hardware architecture of the proposed algorithm is presented in this paper and the architecture is synthesized on the Xilinx Virtex 5 FPGA. The design development is done in VHDL and simulated results are obtained using ModelSim 6.3 with Xilinx 12.2.
Development of stereo matching algorithm based on sum of absolute RGB color d... (IJECEIAES)
This article presents a local-based stereo matching algorithm which comprises the development of an algorithm using block matching and two edge-preserving filters in the framework. Fundamentally, the matching process consists of several stages which produce the disparity or depth map. The most challenging part of the matching process is finding accurate corresponding points between two images. Hence, this article proposes an algorithm for stereo matching using improved Sum of Absolute RGB Differences (SAD), gradient matching, and an edge-preserving filter, the Bilateral Filter (BF), to boost accuracy. SAD and gradient matching are applied at the first stage to obtain the preliminary corresponding result; a BF then works as an edge-preserving filter to remove noise from the first stage. A second BF is used at the last stage to improve the final disparity map and sharpen object boundaries. The experimental analysis and validation use the Middlebury standard benchmarking evaluation system. Based on the results, the proposed work is capable of increasing accuracy and preserving object edges. To make the proposed work more comparable with currently available methods, quantitative measurements were made against other existing methods, showing that the proposed work performs much better.
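The SAD matching cost itself is simple; a minimal sketch on RGB blocks (the block representation and helper names are my assumptions, and this omits the paper's gradient matching and bilateral filtering stages):

```python
# Sum of Absolute RGB Differences (SAD) between two image blocks, and the
# basic block-matching rule: the disparity chosen for a pixel is the
# horizontal shift whose candidate block yields the smallest SAD.

def sad(block_left, block_right):
    """Each block is a list of (r, g, b) tuples; lower SAD = better match."""
    return sum(abs(a - b)
               for pl, pr in zip(block_left, block_right)
               for a, b in zip(pl, pr))

def best_disparity(ref_block, candidate_blocks):
    """candidate_blocks: dict disparity -> block. Returns the argmin-SAD disparity."""
    return min(candidate_blocks,
               key=lambda d: sad(ref_block, candidate_blocks[d]))
```

In the full pipeline this raw cost is what the edge-preserving filters subsequently clean up.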
UrbaWind, a Computational Fluid Dynamics tool to predict wind resource in urb... (Stephane Meteodyn)
Computational Fluid Dynamics (CFD) is already a necessary tool for modeling the wind over complex countryside terrains. Indeed, to maximize energetic yield and optimize costs before installing wind systems, a good knowledge of wind characteristics at the site is required. Meteodyn has developed UrbaWind, an automatic CFD software for computing the wind between buildings for small wind turbines. Compared to rural open spaces, the geometry in urban areas is more complex and unforeseeable. The effects created by the buildings, such as vortexes at the feet of the towers, the Venturi effect, or the Wise effect, make the modeling of urban flows more difficult. The model used in UrbaWind takes these effects into account by solving the equations of fluid mechanics with a specific model which can represent the turbulence and the wakes around buildings as well as the porosity of the trees. In order to validate UrbaWind's results, different study cases proposed by the Architectural Institute of Japan have been set up. The three selected cases have ascending complexity, from a simple block to the complete rebuilding of a quarter of the Japanese city of Niigata. The results validate UrbaWind for theoretical cases as well as real cases, offering a minor error margin on the wind speed prediction.
http://meteodyn.com/en
Integration of the natural cross ventilation in the CFD software UrbaWindStephane Meteodyn
Nowadays, a lot of energy is spent on air-conditioning in cities for offices and private housing. A good knowledge of the urban microclimate around buildings could allow using the wind for natural air ventilation. UrbaWind is an automatic computational fluid dynamics (CFD) code developed in 2008 by Meteodyn to model the wind in urban environments. A module was recently added to assess building air ventilation. First, UrbaWind integrates climatology according to the geographic location of the site. Given the influence of shape and urban planning on wind behavior, UrbaWind solves the equations of fluid mechanics with a specific model which takes into account urban environment effects such as vortices, Venturi or Wise effects. Finally, the software is able to compute the wind flow inside each internal volume according to the openings of the buildings. This paper presents this software, which has been designed for energy engineers to optimize the energy consumption inside a building. It is also an important tool for architects and project managers when designing a building: the shape of the building as well as the orientation and location of the openings can be designed with awareness of the wind-induced natural air ventilation.
Using spectral radius ratio for node degreeIJCNCJournal
In this paper, we show that the spectral radius ratio for node degree can be used to analyze the variation of node degree during the evolution of complex networks. We focus on three commonly studied models of complex networks: random networks, scale-free networks and small-world networks. The spectral radius ratio for node degree is defined as the ratio of the principal (largest) eigenvalue of the adjacency matrix of a network graph to the average node degree. During the evolution of each of the above three categories of networks (using the appropriate evolution model for each category), we observe the spectral radius ratio for node degree to exhibit high to very high positive correlation (0.75 or above) with the coefficient of variation of node degree (the ratio of the standard deviation of node degree to the average node degree). We show that the spectral radius ratio for node degree can be used as a basis to tune the operating parameters of the evolution models for each of the three categories of complex networks, as well as to analyze the impact of specific operating parameters for each model.
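The metric defined above can be sketched in a few lines. This version estimates the principal eigenvalue with plain power iteration, which is adequate for the small non-bipartite examples below but not robust in general; a proper eigensolver would be used in practice.

```python
def spectral_radius_ratio(adj, iters=500):
    """Spectral radius ratio for node degree: the principal eigenvalue of the
    adjacency matrix divided by the average node degree.  The eigenvalue is
    estimated with plain power iteration (a sketch, not a robust eigensolver).
    """
    n = len(adj)
    x = [1.0] * n
    radius = 0.0
    for _ in range(iters):
        y = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        radius = max(abs(v) for v in y)
        if radius == 0.0:
            break
        x = [v / radius for v in y]        # renormalise
    avg_degree = sum(map(sum, adj)) / n
    return radius / avg_degree

# A triangle (2-regular): principal eigenvalue 2, average degree 2, ratio 1.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
print(spectral_radius_ratio(triangle))  # → 1.0

# A triangle with a pendant vertex is irregular, so the ratio exceeds 1.
paw = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
print(spectral_radius_ratio(paw) > 1.0)  # → True
```

Regular graphs give a ratio of exactly 1; the more the degrees vary, the larger the ratio, which is the correlation with the coefficient of variation that the paper reports.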
Half Gaussian-based wavelet transform for pooling layer for convolution neura...TELKOMNIKA JOURNAL
Pooling methods select the most significant features to be aggregated into a small region. In this paper, a new pooling method is proposed based on a probability function. Based on the observation that most information is concentrated between the mean of the signal and its maximum values, the upper half of a Gaussian function is used to determine weights for the basic signal statistics, which are used to transform the original signal into a more concise form that can represent the signal's features; this method is named the half Gaussian transform (HGT). Based on the strategy of transform computation, three methods are proposed: the first (HGT1) uses basic statistics, after normalization, as weights to be multiplied by the original signal; the second (HGT2) uses the determined statistics as features of the original signal and multiplies them by constant weights based on the half Gaussian; the third (HGT3) works similarly to HGT1 except that it depends on the entire signal. The proposed methods were applied on three databases (MNIST, CIFAR10 and the MIT-BIH ECG database). The experimental results show that our methods achieve a good improvement, outperforming standard pooling methods such as max pooling and average pooling.
Multiple Ant Colony Optimizations for Stereo MatchingCSCJournals
The stereo matching problem, which obtains the correspondence between right and left images, can be cast as a search problem. The matching of all candidates in the same line forms a 2D optimization task, and two-dimensional (2D) optimization is an NP-hard problem. There are two characteristics of stereo matching. Firstly, the local optimization process along each scan-line can be done concurrently; secondly, there are relationships among adjacent scan-lines that can be exploited to promote matching correctness. Although many methods, such as GCPs and GGCPs, have been proposed, these so-called ground control points may not actually be ground truth. The relationship among adjacent scan-lines is a posteriori; that is to say, the relationship can only be discovered after every optimization is finished. Multiple Ant Colony Optimization (MACO) is efficient for solving large-scale problems. It is a suitable way to tackle the stereo matching task with a constructed MACO, in which the master layer evaluates the sub-solutions and propagates the reliability after every local optimization is finished. Besides, whether the ordering and uniqueness constraints should be considered during the optimization is discussed, and the proposed algorithm is proved to converge to the optimal matched pairs.
On the Distribution of the Maximal Clique Size for the Vertices in ...csandit
The high-level contributions of this paper are as follows: We modify an existing branch-and-bound based exact algorithm (for maximum clique size of an entire graph) to determine the maximal clique size that the individual vertices in the graph are part of. We then run this algorithm on six real-world network graphs (ranging from random networks to scale-free networks) and analyze the distribution of the maximal clique size of the vertices in these graphs. We observe five of the six real-world network graphs to exhibit a Poisson-style distribution for the maximal clique size of the vertices. We analyze the correlation between the maximal clique size and the clustering coefficient of the vertices, and find these two metrics to be poorly correlated for the real-world network graphs. Finally, we analyze the Assortativity index of the vertices of the real-world network graphs and observe the graphs to exhibit positive assortativity with respect to maximal clique size and negative assortativity with respect to node degree; nevertheless, we observe the Assortativity index of the real-world network graphs with respect to both the maximal clique size and node degree to increase with decrease in the spectral radius ratio for node degree, indicating a positive correlation between the maximal clique size and node degree.
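The per-vertex quantity studied in this abstract can be sketched with a tiny branch-and-bound search. This is an illustrative exponential-time version, not the modified exact algorithm from the paper.

```python
def max_clique_size_containing(adj, v):
    """Size of the largest clique that vertex v is part of, found by a small
    branch-and-bound search.  adj maps each vertex to its set of neighbours.
    Illustrative and exponential-time; not the paper's implementation."""
    best = 1
    def expand(clique, candidates):
        nonlocal best
        best = max(best, len(clique))
        for i, u in enumerate(candidates):
            if len(clique) + len(candidates) - i <= best:
                break   # bound: too few candidates left to beat best
            # keep only candidates adjacent to u, so the clique stays valid
            expand(clique | {u},
                   [w for w in candidates[i + 1:] if w in adj[u]])
    expand({v}, sorted(adj[v]))
    return best

# A triangle {0, 1, 2} with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(max_clique_size_containing(adj, 0))  # → 3 (part of the triangle)
print(max_clique_size_containing(adj, 3))  # → 2 (only the edge {0, 3})
```

Running this per vertex yields the distribution the paper analyzes; real graphs need the stronger pruning of a true exact maximum-clique solver.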
Community detection of political blogs network based on structure-attribute g...IJECEIAES
Complex networks provide means to represent different kinds of networks with multiple features. Most biological, sensor and social networks can be represented as a graph depending on the pattern of connections among their elements. The goal of graph clustering is to divide a large graph into many clusters based on various similarity criteria. The political blogs network is a standard social network dataset which can be considered as blog-blog connections, where each node has a political leaning besides other attributes. The main objective of this work is to introduce a graph clustering method for social network analysis. The proposed Structure-Attribute Similarity method (SAS-Cluster) is able to detect community structures based on node similarities. The method combines the topological structure with multiple characteristics of the nodes to obtain the final similarity. The proposed method is evaluated using well-known evaluation measures, density and entropy. Finally, the presented method was compared with a state-of-the-art comparative method, and the results show that the proposed method is superior according to the evaluation measures.
About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...Subhajit Sahu
TrueTime is a service that enables the use of globally synchronized clocks, with bounded error. It returns a time interval that is guaranteed to contain the clock’s actual time for some time during the call’s execution. If two intervals do not overlap, then we know calls were definitely ordered in real time. In general, synchronized clocks can be used to avoid communication in a distributed system.
The underlying source of time is a combination of GPS receivers and atomic clocks. As there are “time masters” in every datacenter (redundantly), it is likely that both sides of a partition would continue to enjoy accurate time. Individual nodes however need network connectivity to the masters, and without it their clocks will drift. Thus, during a partition their intervals slowly grow wider over time, based on bounds on the rate of local clock drift. Operations depending on TrueTime, such as Paxos leader election or transaction commits, thus have to wait a little longer, but the operation still completes (assuming the 2PC and quorum communication are working).
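The interval-ordering rule in these notes is easy to make concrete. Below is a minimal sketch of the idea, not Spanner's actual API: an event is definitely before another only when its uncertainty interval closes before the other's opens.

```python
from collections import namedtuple

# A TrueTime-style timestamp: the true time is guaranteed to lie somewhere
# in [earliest, latest].  (Illustrative sketch, not Spanner's real interface.)
TTInterval = namedtuple("TTInterval", ["earliest", "latest"])

def definitely_before(a, b):
    """True only when interval a closes strictly before interval b opens,
    i.e. the event stamped with a certainly happened before the one with b."""
    return a.latest < b.earliest

a = TTInterval(100.0, 103.0)
b = TTInterval(105.0, 109.0)
c = TTInterval(102.0, 106.0)
print(definitely_before(a, b))  # → True: the intervals do not overlap
print(definitely_before(a, c))  # → False: overlap leaves the order unknown
```

This is also why wider intervals during a partition translate into longer waits: a commit must hold until its interval provably closes before any later one opens.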
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
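A minimal sketch of the levelwise idea, under the stated precondition (no dead ends) and assuming the strongly connected components are already given in topological order; the SCC decomposition itself is omitted:

```python
def levelwise_pagerank(graph, components, d=0.85, tol=1e-10):
    """PageRank computed one strongly connected component at a time, in
    topological order of the condensation.  Ranks of earlier components are
    final before a later component starts iterating."""
    n = sum(len(comp) for comp in components)
    rank = {v: 1.0 / n for comp in components for v in comp}
    inv = {v: [] for v in rank}              # in-neighbour lists
    for u in graph:
        for v in graph[u]:
            inv[v].append(u)
    for comp in components:                  # process levels in order
        while True:
            new = {v: (1 - d) / n
                      + d * sum(rank[u] / len(graph[u]) for u in inv[v])
                   for v in comp}
            delta = max(abs(new[v] - rank[v]) for v in comp)
            rank.update(new)
            if delta < tol:
                break
    return rank

# 0 -> 1 <-> 2 : components {0} and {1, 2} in topological order.
graph = {0: [1], 1: [2], 2: [1]}
ranks = levelwise_pagerank(graph, [[0], [1, 2]])
print(round(ranks[0], 4))  # → 0.05
```

Since no edge points from a later component to an earlier one, each component's iteration only reads already-final ranks, which is what removes the per-iteration communication in the distributed setting.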
Similar to Fast Incremental Community Detection on Dynamic Graphs : NOTES
DAOR - Bridging the Gap between Community and Node Representations: Graph Emb...Artem Lutov
Slides of the presentation given at BigData'19, special session on Information Granulation in Data Science and Scalable Computing.
The fully automatic (i.e., without any manual tuning) graph embedding (i.e., network representation learning, unsupervised feature extraction) performed in near-linear time is presented. The resulting embeddings are interpretable, preserve both low- and high-order structural proximity of the graph nodes, computed (i.e., learned) by orders of magnitude faster and perform competitively to the manually tuned best state-of-the-art embedding techniques evaluated on diverse tasks of graph analysis.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search. The RNSC algorithm tries to achieve an optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A. D. King only for undirected and unweighted random graphs. Another popular graph clustering algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful applications of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies like the World Wide Web, the web of human sexual contacts, or the chemical network of a cell basically follow a power-law distribution to represent different real-life systems. This paper uses real large-scale power-law distribution graphs to conduct a performance analysis of RNSC behaviour compared with the Markov clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our approach to comparative performance measurement of these algorithms on the basis of cost of clustering, cluster size, modularity index of clustering results and normalized mutual information (NMI).
Parallel Batch-Dynamic Graphs: Algorithms and Lower BoundsSubhajit Sahu
Highlighted notes on Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds.
While doing research work under Prof. Kishore Kothapalli.
Laxman Dhulipala, David Durfee, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, Xiaorui Sun:
Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds. SODA 2020: 1300-1319
In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each having space strongly sublinear with respect to the number of vertices, that is, n^ε for some constant 0 < ε < 1. Our goal is to handle batches of updates and queries where the data for each batch fits onto one machine, in constant rounds of parallel computation, as well as to reduce the total communication between the machines. This objective corresponds to the gradual buildup of databases over time, while the goal of obtaining constant rounds of communication for problems in the static setting has been elusive for problems as simple as undirected graph connectivity. We give an algorithm for dynamic graph connectivity in this setting with constant communication rounds and communication cost almost linear in terms of the batch size. Our techniques combine a new graph contraction technique, an independent random sample extractor from correlated samples, as well as distributed data structures supporting parallel updates and queries in batches. We also illustrate the power of dynamic algorithms in the MPC model by showing that the batched version of the adaptive connectivity problem is P-complete in the centralized setting, but sub-linear sized batches can be handled in a constant number of rounds. Due to the wide applicability of our approaches, we believe it represents a practically-motivated workaround to the current difficulties in designing more efficient massively parallel static graph algorithms.
Poster given at the Bioinformatics Open Source Conference (BOSC) and Intelligent Systems for Molecular Biology (ISMB) Conference held July 8-12, 2016 in Orlando, Florida, entitled GRNmap and GRNsight: open source software for dynamical systems modeling and visualization of medium-scale gene regulatory networks. Abstract: A gene regulatory network (GRN) consists of genes, transcription factors, and the regulatory connections between them that govern the level of expression of mRNA and proteins from those genes. Our open source MATLAB software package, GRNmap (http://kdahlquist.github.io/GRNmap/), uses ordinary differential equations to model the dynamics of medium-scale GRNs. The program uses a penalized least squares approach (Dahlquist et al. 2015, DOI: 10.1007/s11538-015-0092-6) to estimate production rates, expression thresholds, and regulatory weights for each transcription factor in the network based on gene expression data, and then performs a forward simulation of the dynamics of the network. GRNmap has options for using a sigmoidal or Michaelis-Menten production function. Parameters for a series of related networks, ranging in size from 15 to 35 genes, were optimized against DNA microarray data measuring the transcriptional response to cold shock in budding yeast, Saccharomyces cerevisiae, for the wild type strain and strains deleted for the transcription factors Cin5, Gln3, Hap4, Hmo1, and Zap1, giving biological insights into this process. GRNsight is an open source web application for visualizing such models of gene regulatory networks (http://dondi.github.io/GRNsight/index.html). GRNsight accepts GRNmap- or user-generated Excel spreadsheets containing an adjacency matrix representation of the GRN and automatically lays out the graph. The application colors the edges and adjusts their thicknesses based on the sign (activation or repression) and the strength (magnitude) of the regulatory relationship, respectively. Users can then modify the graph to define the best visual layout for the network. This work was partially supported by NSF award 0921038.
NMS and Thresholding Architecture used for FPGA based Canny Edge Detector for...idescitation
In this paper, an architecture designed for Non-Maximal Suppression used in the Canny edge detection algorithm is presented in order to reduce memory requirements significantly. The architecture also achieves decreased latency and increased throughput with no loss in edge detection. The new algorithm used has a low-complexity 8-bin non-uniform gradient magnitude histogram to compute block-based hysteresis thresholds that are used by the Canny edge detector. Furthermore, the hardware architecture of the proposed algorithm is presented in this paper and the architecture is synthesized on the Xilinx Virtex 5 FPGA. The design development is done in VHDL and simulated results are obtained using ModelSim 6.3 with Xilinx 12.2.
Adjusting Bitset for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations. Unfortunately, using CSR for dynamic graphs is impractical since addition/deletion of a single edge can require on average (N+M)/2 memory accesses, in order to update source-offsets and destination-indices. A common approach is therefore to store edge-lists/destination-indices as an array of arrays, where each edge-list is an array belonging to a vertex. While this is good enough for small graphs, it quickly becomes a bottleneck for large graphs. What causes this bottleneck depends on whether the edge-lists are sorted or unsorted. If they are sorted, checking for an edge requires about log(E) memory accesses, but adding an edge on average requires E/2 accesses, where E is the number of edges of a given vertex. Note that both addition and deletion of edges in a dynamic graph require checking for an existing edge, before adding or deleting it. If edge lists are unsorted, checking for an edge requires around E/2 memory accesses, but adding an edge requires only 1 memory access.
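The sorted-versus-unsorted trade-off above can be made concrete with a small sketch (function names are mine; `elist` is one vertex's edge list of neighbour ids):

```python
import bisect

def has_edge_sorted(elist, v):
    """Sorted edge list: membership check via binary search,
    about log2(E) memory accesses."""
    i = bisect.bisect_left(elist, v)
    return i < len(elist) and elist[i] == v

def add_edge_sorted(elist, v):
    """Insertion keeps the list sorted, but shifts ~E/2 entries
    on average to make room."""
    if not has_edge_sorted(elist, v):
        bisect.insort(elist, v)

def has_edge_unsorted(elist, v):
    """Unsorted edge list: linear scan, ~E/2 accesses on average."""
    return v in elist

def add_edge_unsorted(elist, v):
    """Append is a single write, but the duplicate check before it
    still costs a linear scan."""
    if not has_edge_unsorted(elist, v):
        elist.append(v)
```

Either way, the mandatory existence check dominates, which is exactly the bottleneck the note describes.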
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices, which share the same in-links, avoids duplicate computation and can also reduce iteration time. Road networks often contain chains that can be short-circuited before PageRank computation, since the final ranks of chain nodes are easy to calculate; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
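As an illustration of one of these techniques, skipping vertices whose in-neighbours have all converged can be sketched as follows. This is a simplified pull-based PageRank in Python; the graph layout and function name are my own, not taken from the STICD paper:

```python
def pagerank_skip(inedges, outdeg, d=0.85, eps=1e-9, max_iters=100):
    """inedges: dict vertex -> list of in-neighbours.
    outdeg: dict vertex -> out-degree. Assumes no dangling vertices.
    Vertices are recomputed only if some in-neighbour's rank changed
    in the previous iteration (illustrative sketch only)."""
    N = len(inedges)
    rank = {v: 1.0 / N for v in inedges}
    changed = set(inedges)  # every vertex starts "changed"
    for _ in range(max_iters):
        if not changed:
            break  # all ranks converged
        affected = {v for v, srcs in inedges.items()
                    if any(u in changed for u in srcs)}
        next_changed = set()
        for v in affected:
            rv = (1 - d) / N + d * sum(rank[u] / outdeg[u]
                                       for u in inedges[v])
            if abs(rv - rank[v]) > eps:
                next_changed.add(v)
            rank[v] = rv
        changed = next_changed
    return rank
```

The `changed` set shrinks as vertices converge, so later iterations touch progressively fewer vertices, which is the work-per-iteration saving described above.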
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, are built from primitive operations. Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations.
Experiments with Primitive operations : SHORT REPORT / NOTES (Subhajit Sahu)
This includes:
- Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
- Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
- Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
- Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. As a step in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives (i.e., sumAt, multiply) in sequential mode.
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o... (Subhajit Sahu)
Below are the important points I note from the 2020 paper by Martin Grohe:
- 1-WL distinguishes almost all graphs, in a probabilistic sense
- Classical WL is the two-dimensional Weisfeiler-Leman algorithm
- DeepWL is an unlimited version of WL that runs in polynomial time
- Knowledge graphs are essentially graphs with vertex/edge attributes
ABSTRACT:
Vector representations of graphs and relational structures, whether handcrafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.
Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES (Subhajit Sahu)
https://gist.github.com/wolfram77/54c4a14d9ea547183c6c7b3518bf9cd1
There exist a number of dynamic graph generators. The Barabási-Albert model iteratively attaches new vertices to pre-existing vertices in the graph using preferential attachment (edges to high-degree vertices are more likely: the rich get richer, the Pareto principle). However, the graph size increases monotonically, and the density of the graph keeps increasing (sparsity decreasing).
Görke's model uses a defined clustering to uniformly add vertices and edges. Purohit's model uses motifs (e.g., triangles) to mimic properties of existing dynamic graphs, such as growth rate, structure, and degree distribution. Kronecker graph generators are used to increase the size of a given graph while preserving a power-law distribution.
To generate dynamic graphs, we must choose a metric to compare two graphs. Common metrics include diameter, clustering coefficient (modularity?), triangle counting (triangle density?), and degree distribution.
In this paper, the authors propose DyGraph, a dynamic graph generator that uses degree distribution as its only metric. They observe that many real-world graphs deviate from the power-law distribution at the tail end. To address this, they propose binning: vertices beyond a certain degree (minDeg, the smallest degree whose vertex count |V(deg)| falls below a threshold H ≈ 10) are grouped into bins characterized by a degree-width binWidth, a maximum degree localMax, and binSize, the number of degrees in the bin with at least one vertex (to keep track of sparsity). This lets them generate graphs with a more realistic degree distribution.
The process of generating a dynamic graph is as follows. First, the difference between the desired and the current degree distribution is calculated. The authors then create an edge-addition set in which each vertex appears as many times as the number of additional incident edges it must receive. Edges are created by connecting two vertices drawn randomly from this set, removing both from the set once connected. The authors currently reject self-loops and duplicate edges. Removal of edges is done in a similar fashion.
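The edge-addition step can be sketched roughly as below. This is a hedged simplification (names are mine, not DyGraph's): rejected stub pairs are simply dropped rather than re-drawn, so degree deficits are only approximately satisfied.

```python
import random

def add_edges_by_degree_deficit(deficit, existing, seed=0):
    """deficit: dict vertex -> number of additional incident edges needed.
    existing: set of frozenset({u, v}) edges, updated in place.
    Returns the list of newly created edges. Pairs random 'stubs',
    rejecting self-loops and duplicate edges, as described above."""
    rng = random.Random(seed)
    # each vertex appears once per missing incident edge
    stubs = [v for v, k in deficit.items() for _ in range(k)]
    rng.shuffle(stubs)
    new_edges = []
    while len(stubs) >= 2:
        u = stubs.pop()
        v = stubs.pop()
        e = frozenset((u, v))
        if u == v or e in existing:
            continue  # reject; this sketch drops the stubs instead of retrying
        existing.add(e)
        new_edges.append((u, v))
    return new_edges
```

Edge removal would work the same way, drawing vertices whose degree must decrease and deleting an incident edge between each drawn pair.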
Authors observe that adding edges with power-law properties dominates the execution time, and consider parallelizing DyGraph as part of future work.
My notes on shared memory parallelism.
Shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Using memory for communication inside a single program, e.g. among its multiple threads, is also referred to as shared memory [REF].
Application Areas of Community Detection: A Review : NOTES (Subhajit Sahu)
This is a short review of community detection methods (on graphs) and their applications. A **community** is a subset of a network whose members are *highly connected* among themselves, but *loosely connected* to others outside their community. Different community detection methods *can return differing communities*, since these algorithms are **heuristic-based**. **Dynamic community detection** involves tracking the *evolution of community structure* over time.
https://gist.github.com/wolfram77/09e64d6ba3ef080db5558feb2d32fdc0
Communities can be of the following **types**:
- Disjoint
- Overlapping
- Hierarchical
- Local.
The following **static** community detection **methods** exist:
- Spectral-based
- Statistical inference
- Optimization
- Dynamics-based
The following **dynamic** community detection **methods** exist:
- Independent community detection and matching
- Dependent community detection (evolutionary)
- Simultaneous community detection on all snapshots
- Dynamic community detection on temporal networks
**Applications** of community detection include:
- Criminal identification
- Fraud detection
- Criminal activities detection
- Bot detection
- Dynamics of epidemic spreading (dynamic)
- Cancer/tumor detection
- Tissue/organ detection
- Evolution of influence (dynamic)
- Astroturfing
- Customer segmentation
- Recommendation systems
- Social network analysis (both)
- Network summarization
- Privacy, group segmentation
- Link prediction (both)
- Community evolution prediction (dynamic, hot field)
<br>
<br>
## References
- [Application Areas of Community Detection: A Review : PAPER](https://ieeexplore.ieee.org/document/8625349)
Survey for extra-child-process package : NOTES (Subhajit Sahu)
Useful additions to inbuilt child_process module.
📦 Node.js, 📜 Files, 📰 Docs.
Please see attached PDF for literature survey.
https://gist.github.com/wolfram77/d936da570d7bf73f95d1513d4368573e
Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER (Subhajit Sahu)
For the PhD forum an abstract submission is required by 10th May, and poster by 15th May. The event is on 30th May.
https://gist.github.com/wolfram77/692d263f463fd49be6eb5aa65dd4d0f9
Abstract for IPDPS 2022 PhD Forum on Dynamic Batch Parallel Algorithms for Up... (Subhajit Sahu)
For the PhD forum an abstract submission is required by 10th May, and poster by 15th May. The event is on 30th May.
https://gist.github.com/wolfram77/1c1f730d20b51e0d2c6d477fd3713024
Can you fix farming by going back 8000 years : NOTES (Subhajit Sahu)
1. The human population didn't explode, but plateaued.
2. Fertilizer prices are skyrocketing.
3. Farmers are looking for alternatives such as animal waste (manure) or even human waste.
4. Manure prices are also going up.
5. Switching to organic farming is not an option.
https://gist.github.com/wolfram77/49067fc3ddc1ba2e1db4f873056fd88a
1. Webpages tend to behave as authorities or hubs.
2. An authority is like a research thesis, and a hub is like an encyclopedia.
3. Each page has an authority score and a hub score.
4. The graph is query-based, including pages pointed to and pointed from.
5. A page's authority score is the sum of the hub scores of all pages pointing to it.
6. A page's hub score is the sum of the authority scores of all pages it points to.
7. Scores are normalized with the L2-norm in each iteration (square root of the sum of squares).
8. This needs to be performed at query time.
9. Two scores are returned, instead of just one.
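These HITS-style updates can be sketched as follows (a minimal Python sketch; the dictionary-of-lists graph layout and function name are assumptions of mine):

```python
import math

def hits(inlinks, outlinks, iters=50):
    """inlinks/outlinks: dict page -> list of pages. Returns (auth, hub).
    Each iteration applies the mutual-reinforcement updates and then
    L2-normalizes both score vectors."""
    pages = list(outlinks)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iters):
        # authority score: sum of hub scores of pages pointing in
        auth = {p: sum(hub[q] for q in inlinks[p]) for p in pages}
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {p: x / norm for p, x in auth.items()}
        # hub score: sum of authority scores of pages pointed to
        hub = {p: sum(auth[q] for q in outlinks[p]) for p in pages}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {p: x / norm for p, x in hub.items()}
    return auth, hub
```

On a tiny graph where pages 0 and 1 both link to page 2, page 2 ends up as the sole authority and pages 0 and 1 as equal hubs.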
https://gist.github.com/wolfram77/3d9ef6c5a5b63f53caabce4812c7ea81
Basic Computer Architecture and the Case for GPUs : NOTES (Subhajit Sahu)
Computer architectures are facing issues:
Memory latencies are far higher.
Benefits from instruction-level parallelism (ILP) are diminishing.
With increasing clock rates, power consumption is increasing.
Increasing complexity with multi-stage pipelines, intermediate buffers, multi-level caches, out-of-order execution, branch prediction, ...
GPUs are parallel computer architectures that are good at some tasks, not so good at others. Running routines with high arithmetic intensity and overlapped memory access is the preferred approach. They may be unsuitable for irregular algorithms, where it is difficult to get high efficiency due to the high latency of accesses. They are less versatile than CPUs, relying on SIMD parallelism, but are compute-dense (per unit cost). NVIDIA's CUDA programming model enables GPUs to be used for general-purpose computing, hence the term GPGPU.
GPU Architectural, Programming, and Performance Models presentation at PPoPP, 2010, Bangalore, India.
By Prof. Kishore Kothapalli with Prof. P. J. Narayanan and Suryakant Patidar.
https://gist.github.com/wolfram77/43a6660121eef45b78c10d4e652dad6c
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES (Subhajit Sahu)
For the IPDPS ParSocial event a presentation submission is required by 15th May. The event is on 3rd June.
https://gist.github.com/wolfram77/51b15ca09eb28f6909673a2deb1a314d
DYNAMIC BATCH PARALLEL ALGORITHMS FOR UPDATING PAGERANK
Subhajit Sahu†, Kishore Kothapalli† and Dip Sankar Banerjee‡
†International Institute of Information Technology Hyderabad, India.
‡Indian Institute of Technology Jodhpur, India.
subhajit.sahu@research. , kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
This work is partially supported by a grant from the Department of Science and Technology (DST), India, under the
National Supercomputing Mission (NSM) R&D in Exascale initiative vide Ref. No: DST/NSM/R&D Exascale/2021/16.
FACEBOOK IS TAKING A PAGE OUT OF GOOGLE'S PLAYBOOK TO STOP FAKE NEWS FROM GOING VIRAL
PUBLISHED APR 2015 BY SALVADOR RODRIGUEZ
Click-Gap: detects when Facebook is driving disproportionate amounts of traffic to websites.
An effort to rid fake news from Facebook's services.
Is a website relying on Facebook to drive significant traffic, but not well ranked by the rest of the web?
Also a News Citation Graph.
PAGERANK APPLICATIONS
Ranking of websites.
Measuring scientific impact of researchers.
Finding the best teams and athletes.
Ranking companies by talent concentration.
Predicting road/foot traffic in urban spaces.
Analysing protein networks.
Finding the most authoritative news sources
Identifying parts of brain that change jointly.
Toxic waste management.
Debugging complex software systems (MonitorRank).
Finding the most original writers (BookRank)
Finding topical authorities (TwitterRank)
WHAT IS PAGERANK
P(v) = (1 - d)/N + d × Σ_{u→v} P(u) / outdeg(u)
where d is the damping factor, N is the number of pages, and the sum runs over pages u that link to v.
PageRank is a link-analysis algorithm.
By Larry Page and Sergey Brin in 1996.
For ordering information on the web.
Represented with a random-surfer model.
Rank of a page is defined recursively.
Calculate iteratively with power-iteration.
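The power-iteration computation mentioned above can be sketched as follows (an illustrative Python sketch assuming no dangling pages; the graph layout is mine):

```python
def pagerank(outlinks, d=0.85, iters=100, tol=1e-12):
    """Power iteration on the random-surfer model.
    outlinks: dict page -> list of pages it links to."""
    N = len(outlinks)
    rank = {p: 1.0 / N for p in outlinks}
    # build reverse adjacency so each rank can be computed by a "pull"
    inlinks = {p: [] for p in outlinks}
    for u, vs in outlinks.items():
        for v in vs:
            inlinks[v].append(u)
    for _ in range(iters):
        new = {v: (1 - d) / N
                  + d * sum(rank[u] / len(outlinks[u]) for u in inlinks[v])
               for v in outlinks}
        if sum(abs(new[v] - rank[v]) for v in outlinks) < tol:
            rank = new
            break  # ranks have converged
        rank = new
    return rank
```

Each page's rank is defined recursively in terms of the ranks of pages linking to it, and the iteration repeats the update until the ranks stop changing.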
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN. (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space while they are studied in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, from worms and insects to mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that were causing the silencing, via RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length: 23-25 nt
Trans-acting
Binds its target mRNA with mismatches
Causes translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds its target mRNA as a perfectly complementary sequence
piRNA
Length: 25 to 36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H-like activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ~ 50-200 pc, stellar masses of M⋆ ~ 10^7-10^8 M⊙, and star-formation rates of SFR ~ 0.1-1 M⊙/yr. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ~2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
Observation of Io's Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs
Anita Zakrzewska and David A. Bader
Computational Science and Engineering, Georgia Institute of Technology,
Atlanta, GA, USA
azakrzewska3@gatech.edu
Abstract. Community detection, or graph clustering, is the problem of
finding dense groups in a graph. This is important for a variety of applica-
tions, from social network analysis to biological interactions. While most
work in community detection has focused on static graphs, real data
is usually dynamic, changing over time. We present a new algorithm
for dynamic community detection that incrementally updates clusters
when the graph changes. The method is based on a greedy, modularity
maximizing static approach and stores the history of merges in order to
backtrack. On synthetic graph tests with known ground truth clusters,
it can detect a variety of structural community changes for both small
and large batches of edge updates.
Keywords: Graphs · Community detection · Graph clustering ·
Dynamic graphs
1 Introduction
Graphs are used to represent a variety of relational data, such as internet traffic,
social networks, financial transactions, and biological data. A graph G = (V, E)
is composed of a set of vertices V , which represent entities, and edges E, which
represent relationships between entities. The problem of finding dense, highly
connected groups of vertices in a graph is called community detection or graph
clustering. In cases where communities are non-overlapping and cover the entire
graph, the term community partitioning may also be used. Many real world
graphs contain communities, such as groups of friends on social networks, lab
colleagues in a co-publishing graph, or related proteins in biological networks.
In this work, we present a new algorithm for incremental community detection
on dynamic graphs.
1.1 Related Work
Most work in community detection has been done for static, unchanging graphs.
Popular methods include hierarchical vertex agglomeration, edge agglomeration,
(© Springer International Publishing Switzerland 2016. R. Wyrzykowski et al. (Eds.): PPAM 2015, Part I, LNCS 9573, pp. 207–217, 2016. DOI: 10.1007/978-3-319-32149-3_20)
clique percolation, spectral algorithms, and label propagation. Details of a vari-
ety of approaches can be found in the survey by Fortunato [11]. The quality of
a community C is often measured using a fitness function. Modularity, shown in
Eq. 1, is a popular measure that compares the number of intra-community edges
to the expected number under a random null model [16]. Let k_in^C be the sum of edge weights for edges with both endpoint vertices in C and k_out^C be the sum of edge weights for edges with only one endpoint in C.

Q(C) = (1/|E|) · (k_in^C − (k_in^C + k_out^C)² / (4 |E|))   (1)
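Eq. 1 can be written out directly. A minimal sketch (the function name and argument layout are illustrative, not from the paper):

```python
def modularity(k_in: float, k_out: float, m: float) -> float:
    """Modularity Q(C) of a single community C, per Eq. 1.

    k_in  -- total weight of edges with both endpoints in C
    k_out -- total weight of edges with exactly one endpoint in C
    m     -- |E|, the total edge weight of the graph
    """
    return (k_in - (k_in + k_out) ** 2 / (4 * m)) / m

# A community with 10 units of internal and 4 units of boundary edge
# weight, in a graph of total edge weight 20:
q = modularity(10, 4, 20)  # ≈ 0.3775
```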
CNM [6] is a hierarchical, agglomerative algorithm that greedily maximizes
modularity. Each vertex begins as its own singleton community. Communities
are then sequentially merged by contracting the edge resulting in the greatest
increase in modularity. Riedy et al. present a parallel version of this algorithm
in which a weighted maximal matching on edges is used [19].
Real graphs evolve over time. Many works study community detection on
dynamic graphs by creating a sequence of data snapshots over time and cluster-
ing each snapshot. Chakrabarti et al. [5] introduce evolutionary clustering and
GraphScope uses block modeling and information compression principles [21].
There are many other works which study the evolution of communities in chang-
ing graphs [9,14,22]. However, these are not incremental methods. A second cat-
egory of algorithms incrementally updates previously computed clusters when
the underlying graph changes. By reusing previous computations, incremental
algorithms can run faster, which is critical for scaling to large datasets. This is
especially useful for applications in which community information must be kept
up to date for a quickly changing graph and low latency is desired. Moreover,
incremental approaches can result in smoother community transitions.
Duan et al. [10] present an incremental algorithm for finding k-clique com-
munities [18] in a changing graph and Ning et al. [17] provide a dynamic algo-
rithm for spectral partitioning. Other algorithms search for emerging events in
microblog streams [3,4]. Dinh et al. [7,8] present an incremental updating algo-
rithm to modularity based CNM. First, standard CNM is run on the initial
graph. For each graph update, every vertex that is an endpoint of an inserted
edge is removed from its community and placed into a new singleton community.
The standard, static CNM algorithm is then restarted and communities may be
further merged to increase modularity.
The work most closely related to ours is by Görke et al. [13], in which the
authors also present two dynamic algorithms that update a CNM based clus-
tering. In their first algorithm, endpoint vertices of newly updated edges, along
with a local neighborhood, are freed from their current cluster and CNM is
restarted. The second algorithm stores the dendrogram of cluster merges formed
by the initial CNM clustering. When the graph changes, the algorithm back-
tracks cluster merges using this dendrogram until specified conditions are met.
If an intra-community edge is inserted, it backtracks till the two endpoint ver-
tices are in separate communities. If an intra-community edge is deleted or an
inter-community edge is inserted, it backtracks till both endpoints are their own
singleton communities. No backtracking occurs on an inter-community edge deletion. After communities have been modified, CNM is restarted.
1.2 Contribution
In this work we present two incremental algorithms that update communities
when the underlying graph changes. The first version, BasicDyn is similar to
previous work by Görke et al. [13]. Our second algorithm, FastDyn, is a modi-
fication of BasicDyn that runs faster. We use various synthetic dynamic graphs
to test the quality of our methods. Both BasicDyn and FastDyn are able to
detect community changes.
Fig. 1. The left image shows ForestG created after agglomeration (marked roots represent the actual communities output for G). The center image shows ForestG after backtracking undoes merges using BasicDyn or FastDyn. The right image shows the forest after re-starting agglomeration using Agglomerate or FastAgglomerate.
2 Algorithm
The basis of our algorithm is the parallel version of CNM presented by Riedy
et al. [19]. We will denote this by Agglomerate. Each vertex in the graph G is
initialized as its own singleton community, forming the community graph Gcomm.
Note that while Gcomm is initially the same as G, each vertex of Gcomm represents
a group or cluster of vertices in G. The weighted degree of a vertex c ∈ Gcomm
corresponds to k_out^c and the weight assigned to c (initially 0) corresponds to the intra-community edges k_in^c.
Next, the following three steps are repeated until the termination criterion
is reached. (1) For each edge in Gcomm, the change in modularity resulting from
an edge contraction is computed. Given an edge (c1, c2) in Gcomm, the change
in modularity is given by the value ΔQ(c1, c2) = Q(c1 ∪ c2) − (Q(c1) + Q(c2)),
which can be easily computed using Eq. 1. This score is assigned to the edge.
(2) A weighted maximal matching is computed on the edges of Gcomm using
these scores. (3) Edges in the maximal matching are contracted in parallel. An
edge contraction merges the two endpoint vertices into one vertex. Termination
occurs when no edge contraction results in a great enough modularity increase.
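The three steps above can be sketched serially (a hedged toy version: the paper uses a parallel weighted maximal matching, for which a greedy sort stands in here; the data-structure layout is illustrative):

```python
def one_phase(edges, k_in, k_out, m):
    """One Agglomerate contraction phase (serial sketch).

    edges -- dict: frozenset({c1, c2}) -> edge weight in Gcomm
    k_in  -- dict: community -> intra-community edge weight (k_in^c)
    k_out -- dict: community -> boundary edge weight (k_out^c)
    m     -- |E|, total edge weight of the graph
    Returns the list of community pairs chosen for contraction.
    """
    def q(ki, ko):  # Eq. 1 for one community
        return (ki - (ki + ko) ** 2 / (4 * m)) / m

    def dq(e):      # ΔQ(c1, c2) = Q(c1 ∪ c2) − (Q(c1) + Q(c2))
        c1, c2 = tuple(e)
        w = edges[e]
        merged = q(k_in[c1] + k_in[c2] + w, k_out[c1] + k_out[c2] - 2 * w)
        return merged - q(k_in[c1], k_out[c1]) - q(k_in[c2], k_out[c2])

    # Step 1: score every edge by its modularity change.
    scored = sorted(edges, key=dq, reverse=True)
    # Step 2: greedy matching (stand-in for the parallel maximal matching).
    matched, taken = [], set()
    for e in scored:
        if dq(e) <= 0:
            break  # termination: no modularity-increasing merge remains
        c1, c2 = tuple(e)
        if c1 not in taken and c2 not in taken:
            matched.append((c1, c2))  # step 3 would contract these pairs
            taken.update(e)
    return matched

# Two unit-weight triangles {0,1,2} and {3,4,5} joined by edge (2, 3):
edges = {frozenset(e): 1.0
         for e in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]}
k_in = {v: 0.0 for v in range(6)}
k_out = {0: 2.0, 1: 2.0, 2: 3.0, 3: 3.0, 4: 2.0, 5: 2.0}
pairs = one_phase(edges, k_in, k_out, 7.0)
```

On this toy graph the phase matches (0, 1) and (4, 5) first (the highest-scoring edges), then (2, 3), after which further phases would grow the two triangle communities.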
All vertices that have been merged together by edge contractions correspond
to a single community and so by repeatedly merging vertices in Gcomm, we create
ever larger communities in G. The history of these contractions can be stored as a
forest of merges. Pairs of vertices in Gcomm that were merged together by an edge
contraction are children of the same parent. The root of each tree corresponds
to the final communities found for G. We refer to this forest by ForestG. Edge contraction scores can be computed in O(|E|) time. If the maximal matching is also computed in O(|E|), then the complexity of Agglomerate is O(K · |E|), where K is the number of contraction phases. In the best case, the number of communities halves in each phase and the complexity is O(log(|V|) · |E|). In the worst case, the graph has a star formation and only one edge contracts in each phase, resulting in a complexity of O(|V| · |E|).
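ForestG can be kept as a simple parent-pointer forest. A sketch (the class name and layout are illustrative, not the paper's exact data structure): each contraction adds an internal node, and undoing a merge restores its two children as roots.

```python
class MergeForest:
    """ForestG sketch: leaves are original vertices; each contraction adds
    an internal node whose children are the two merged communities.
    Roots correspond to the current communities."""

    def __init__(self, n):
        self.parent = list(range(n))  # leaf i is its own root initially
        self.children = {}            # internal node -> the pair it merged

    def merge(self, a, b):
        """Record the contraction of roots a and b; returns the new node."""
        node = len(self.parent)
        self.parent.append(node)      # the new node is its own root
        self.parent[a] = self.parent[b] = node
        self.children[node] = (a, b)
        return node

    def root(self, v):
        """Community of v: walk parent pointers up to the root."""
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def backtrack(self, node):
        """Undo the merge that created `node`, freeing its children."""
        a, b = self.children.pop(node)
        self.parent[a], self.parent[b] = a, b
```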
Our dynamic algorithm begins with the initial graph G and runs the Agglom-
erate algorithm to create ForestG. We then apply a stream of edge insertions and
deletions to G. These updates can be applied in batches of any size. Small batch
sizes are used for low latency applications, while aggregating larger batches can
result in a relatively lower overall running time. For each batch of updates, our
algorithm uses the history of merges in ForestG and undoes certain merges that
previously had taken place. This breaks apart certain previous communities and
un-contracts corresponding vertices in Gcomm. Next, the Agglomerate algorithm
is restarted and new modularity increasing merges may be performed. Figure 1
shows this process. Next we describe two versions of the dynamic algorithm,
each of which follows the basic steps described.
BasicDyn: When a new batch of edge insertions and deletions is processed, each
vertex that is an endpoint of any edge in the batch is marked. Merges are then
backtracked in ForestG until each marked vertex is in its own singleton com-
munity. After backtracking, Agglomerate is restarted to re-merge communities.
FastDyn: The FastDyn method is based on BasicDyn, but has two modi-
fications that improve computational speed. The first is that backtracking of
previous merges in ForestG occurs under more stringent conditions. Merges are
only undone if the quality of the merge, as measured by the induced change
in modularity, has significantly decreased compared to when the merge initially
took place. Because merges may be performed in parallel using a weighted maximal matching on edges, not every vertex c ∈ Gcomm merges with its highest scoring neighbor best(c). When a vertex c ∈ Gcomm merges at time t_i, we store both the score of that merge ΔQ_ti(c, match(c)) and the score of the highest possible merge c could have participated in, ΔQ_ti(c, best_ti(c)). After a batch of edge updates is applied at time t_curr and the graph G changes, the best and actual merge scores are rechecked. The score of a merge has significantly decreased if:

ΔQ_ti(c, best_ti(c)) − ΔQ_ti(c, match(c)) + threshold < ΔQ_tcurr(c, best_tcurr(c)) − ΔQ_tcurr(c, match(c)).

This occurs if either the score of the merge taken has decreased or the score of the best possible merge has increased.
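The backtracking test can be written directly from the stored and recomputed scores (variable names here are hypothetical):

```python
def merge_significantly_worse(dq_match_then, dq_best_then,
                              dq_match_now, dq_best_now, threshold):
    """FastDyn's condition for undoing one recorded merge of a vertex c.

    *_then -- scores stored at merge time t_i:
              ΔQ_ti(c, match(c)) and ΔQ_ti(c, best_ti(c))
    *_now  -- the same two scores recomputed at t_curr after the batch
    True when the gap between the best possible merge and the merge
    actually taken has grown by more than `threshold`.
    """
    return (dq_best_then - dq_match_then + threshold
            < dq_best_now - dq_match_now)

# The taken merge's score dropped from 0.50 to 0.10 while the best
# alternative stayed at 0.50, so the merge should be undone:
print(merge_significantly_worse(0.50, 0.50, 0.10, 0.50, threshold=0.05))  # True
```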
For each merge evaluated, FastDyn checks the above condition and if it
is met, ForestG is backtracked to undo the merge. Merges that are evaluated
are those that occur between clusters that contain at least one member that is
directly affected by the batch of edge updates (is an endpoint of a newly inserted or deleted edge). In ForestG, this corresponds to leaf nodes that represent such endpoint vertices as well as any upstream parent of such a leaf node.

Data: Gcomm and ForestG,prev
while contractions occurring do
    for v ∈ Gcomm in parallel do
        paired[v] = 0;
        match[v] = v;
    end
    while matches possible do
        for v ∈ Gcomm s.t. paired[v] == 0 in parallel do
            max = 0, match[v] = v;
            for neighbors w of v s.t. paired[w] == 0 do
                if ΔQ(v, w) > max then
                    max = ΔQ(v, w);
                    match[v] = w;
                end
            end
        end
        for v ∈ Gcomm s.t. paired[v] == 0 in parallel do
            if match[v] ≠ v and (match[match[v]] == v or prevroot[v] == prevroot[match[v]]) then
                paired[v] = 1;
            end
        end
    end
    for v ∈ Gcomm in parallel do
        Pointer jump match array to root;
        dest[v] = root;
    end
    Contract in parallel all vertices in Gcomm with same value of dest;
end

Algorithm 1. The FastAgglomerate algorithm used by FastDyn. prevroot labels each vertex by its root in ForestG,prev, which corresponds to its previous community membership.
The second modification of FastDyn occurs in the re-merging step. Instead
of running Agglomerate after backtracking merges, FastDyn runs a modified
method, which we will call FastAgglomerate. Because Agglomerate only allows
two vertices to merge together in a contraction phase, star-like formations con-
taining n vertices take n phases to merge, which results in a very long running
time. We have found that backtracking merges creates such structures so that
running Agglomerate after backtracking results in a very long tail of contractions, each of which only contracts a small number of edges. We address this
problem with FastAgglomerate, which uses information about previous merges
to speed up contractions. Instead of performing a maximal edge matching,
FastAgglomerate allows more than two vertices of Gcomm to contract together
if in the previous time step these vertices eventually ended up contracted into
the same community (if they correspond to nodes in ForestG that previously
shared the same root). In the static case, merging several vertices together in
one contraction phase could lead to deteriorating results. FastAgglomerate is
able to do this, however, because it uses information from the merges of the pre-
vious time step. Intuitively, merges that previously occurred are more likely to
be acceptable later. Pseudocode is given in Algorithm 1. In the worst case, both
BasicDyn and FastDyn will backtrack enough to undo many or most merges.
In this case, the running time is the same as that of Agglomerate. For small
updates, the dynamic algorithms run faster in practice. FastDyn decreases the
number of contraction steps needed by allowing several vertices to merge in a
single step.
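The multi-vertex contraction rule can be sketched with a union-find in place of the paper's parallel pointer jumping (a hedged serial version; names are illustrative): a vertex joins its matched neighbor either on a mutual match or whenever both shared a root in ForestG at the previous time step, so chains of former community-mates collapse in one phase.

```python
def contraction_groups(vertices, match, prevroot):
    """Group vertices of Gcomm for one FastAgglomerate contraction.

    vertices -- list of community-graph vertices
    match    -- dict: v -> proposed match (v itself if unmatched)
    prevroot -- dict: v -> root of v in ForestG,prev
    Returns a list of vertex groups to contract together.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v in vertices:
        m = match[v]
        # Accept on mutual match OR shared previous community.
        if m != v and (match[m] == v or prevroot[v] == prevroot[m]):
            parent[find(v)] = find(m)

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# 0 and 1 match mutually; 2 points at 1 and shared community 'A' with it;
# 3 and 4 match mutually:
match = {0: 1, 1: 0, 2: 1, 3: 4, 4: 3}
prevroot = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}
groups = contraction_groups(list(range(5)), match, prevroot)
```

Here {0, 1, 2} contract in a single phase even though (1, 2) is not a mutual match, which is exactly what avoids the long tail of tiny contractions.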
Fig. 2. Results for the synthetic test of splitting and merging clusters are shown. The
actual and detected number of communities across time are plotted for batch sizes of 1,
10, 100, and 1000. An alternative approach is shown and labeled “NoHistory” (Color
figure online).
3 Results
Dynamic community detection is only useful if the algorithm is able to detect
structural changes. We evaluate our algorithm using two tests. First, we form a
ring of 32 communities, each with 32 vertices. Each vertex in a community is
on average connected to 16 other vertices in the cluster. Over time, edges are
both inserted and removed so that each of the 32 clusters consecutively splits
in two. Once each community has been split, edges are then updated so that
pairs of clusters merge together (different pairs than at the start), resulting in
32 clusters again. Figure 2 shows the number of communities detected over time
by BasicDyn with the solid blue line, the number detected by FastDyn with
the green dots and dashes, and the actual number with the dashed red line.
Because graphs are randomly generated, each curve is the average of 10 runs.
The x-axis represents the number of updates that have been made to the graph.
Batch sizes used are 1, 10, 100, and 1000. A batch size of n means that n edge
insertions and deletions are accumulated before the communities are incremen-
tally updated. Since the curves rise and then fall, both algorithms are able to
detect first splits and then merges. As the batch size increases, the performance
of FastDyn improves, which is expected because a larger batch size allows more
communities to be broken apart during backtracking and the algorithm has more
options for how to re-contract communities. Therefore, the algorithm can output
results closer to those that would be found with a static recomputation begin-
ning from scratch. We also compare the ability of FastDyn and BasicDyn to
detect community changes to the dynamic algorithm from [20], which also uses
greedy modularity maximization. After a batch of edges is inserted and removed,
this method removes each vertex with an updated edge from its current community into its own singleton community. Agglomeration is then restarted. The
results of this alternative approach are shown in Fig. 2 with the purple dotted
line labeled “NoHistory”. Unlike the backtracking approaches, community splits
and merges are only detected for a large batch size of 1000.
Fig. 3. Results for FastDyn on the second synthetic test using a shifting stochastic
block model are shown. The y-axis shows the quality of output and x-axis shows the
probability of an intra-community edge.
The second test is run on a graph formed using a stochastic block model [15].
We generate two separate graphs, blockA and blockB, each with two communi-
ties in which the probability of an intra-community edge and that of an inter-
community edge are both specified. BlockB uses the same parameters as blockA,
but with the vertex members of each community changed. Half of the vertices
that were in community one in blockA move to the second community of blockB
and half of those that were in community two of blockA move to community one
of blockB. We begin by running Agglomerate on the entire blockA. Then, in a
stream of updates, we remove edges from blockA and add edges from blockB. At
the end of the stream, the graph will equal blockB. This test differs from the first
test in an important way. In test one, the algorithm could always detect some
number of distinct communities. Here, distinct communities only exist near the
beginning and end of the stream, while in the middle the graph is a mixture of
blockA and blockB. The updates to the graph in this test are randomly distrib-
uted across all vertices so that much of ForestG may be affected. In contrast,
the updates in test one were applied to one cluster at a time. Lastly, test one
specifically tests splits and merges, while test two uses a general rearrangement.
Figure 3 shows how well FastDyn is able to distinguish the two communities of
blockB at the end of the stream. Communities are detected for each batch, but we
only evaluate at the end of the stream since that is when the graph returns to a
known block structure. The x-axis varies the probability of an intra-community
edge. Unsurprisingly, the algorithm performs better when the communities in
blockA and blockB are denser. The y-axis shows the correctness of FastDyn,
which is determined as follows. For each vertex v in the graph, we compare its
actual community in blockB to its community output by FastDyn. The Jac-
card index then measures the normalized overlap between the two sets, with 1
measuring perfect overlap, and 0 no overlap. The average Jaccard index across
all vertices measures the algorithm correctness. Because blockA and blockB are
randomly generated, each point plotted is an average of 50 separate runs on
different graphs. The two synthetic tests described above show that FastDyn
is able to distinguish structural community changes. Synthetic tests are useful
because communities have ground truth and known changes can be applied.
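The correctness measure can be computed as follows (a sketch; the per-vertex community sets are assumed precomputed):

```python
def average_jaccard(truth, detected):
    """Average per-vertex Jaccard index between ground-truth and detected
    communities. Both arguments map each vertex to the set of vertices in
    its community; 1.0 means perfect overlap, 0.0 none."""
    total = 0.0
    for v in truth:
        a, b = truth[v], detected[v]
        total += len(a & b) / len(a | b)
    return total / len(truth)

# Ground truth puts 0, 1, 2 together; the algorithm split off vertex 2:
c = frozenset({0, 1, 2})
truth = {0: c, 1: c, 2: c}
detected = {0: frozenset({0, 1}), 1: frozenset({0, 1}), 2: frozenset({2})}
score = average_jaccard(truth, detected)
```

Here vertices 0 and 1 each score 2/3 and vertex 2 scores 1/3, for an average of 5/9.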
Table 1. The time to compute the initial static community detection and average time to update with incremental algorithms is shown.

Graph     Initial  Batch size  FastDyn  BasicDyn
Facebook  3.5 s    10          0.1 s    0.2 s
                   1000        0.7 s    4.9 s
DBLP      122 s    10          7.1 s    21.7 s
                   1000        72 s     101 s
Slashdot  45 s     10          0.6 s    2.1 s
                   1000        7.5 s    289 s
Table 1 shows the running time of the initial static community detection used
by the dynamic algorithms and the average running time to process a batch
of size 10 and 1000 edges for FastDyn and BasicDyn. The algorithms were
tested on a Slashdot graph [12], a DBLP graph [1,24], and a Facebook wall post
graph [2,23]. The FastDyn algorithm is faster than BasicDyn for all graphs and
batch sizes. The speedup of using a dynamic algorithm is greater for small batch
sizes because the incremental algorithms perform less work, while running time
for static computation remains the same. Therefore, an incremental approach
is most useful for monitoring applications where community results must be
updated after a small number of data changes.
4 Conclusion
In this work, we present two incremental community detection algorithms for
dynamic graphs. Both use hierarchical, modularity maximizing clustering as
their base and update results by storing the history of previous merges in a
forest. While BasicDyn is similar in nature to a previous approach, FastDyn
has two improvements that significantly improve running time while maintaining
clustering quality. FastDyn is faster than BasicDyn on all real graphs tested
for both small and large update batch sizes. Incrementally updating allows for
speedup over recalculating from scratch whenever the graph changes, especially
for small batch sizes when low latency updates are needed. While the algorithm
is based off a parallel static algorithm, and can be parallelized, the focus of this
work was to study improvements in performance due to incremental updates.
Parallel scalability is a topic for future work.
Acknowledgments. The work depicted in this paper was sponsored by Defense
Advanced Research Projects Agency (DARPA) under agreement #HR0011-13-2-0001.
The content, views and conclusions presented in this document do not necessarily reflect
the position or the policy of DARPA or the U.S. Government, and no official endorsement
should be inferred.
References
1. DBLP co-authorship network dataset – KONECT, May 2015. http://konect.uni-koblenz.de/networks/com-dblp
2. Facebook wall posts network dataset – KONECT, May 2015. http://konect.uni-koblenz.de/networks/facebook-wosn-wall
3. Agarwal, M.K., Ramamritham, K., Bhide, M.: Real time discovery of dense clus-
ters in highly dynamic graphs: identifying real world events in highly dynamic
environments. Proc. VLDB Endowment 5(10), 980–991 (2012)
4. Angel, A., Sarkas, N., Koudas, N., Srivastava, D.: Dense subgraph maintenance
under streaming edge weight updates for real-time story identification. Proc. VLDB
Endowment 5(6), 574–585 (2012)
5. Chakrabarti, D., Kumar, R., Tomkins, A.: Evolutionary clustering. In: Proceedings
of the 12th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 554–560. ACM (2006)
6. Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)
7. Dinh, T.N., Shin, I., Thai, N.K., Thai, M.T., Znati, T.: A general approach for mod-
ules identification in evolving networks. In: Hirsch, M.J., Pardalos, P.M., Murphey,
R. (eds.) Dynamics of Information Systems. Springer Optimization and Its Appli-
cations, vol. 40, pp. 83–100. Springer, Heidelberg (2010)
8. Dinh, T.N., Xuan, Y., Thai, M.T.: Towards social-aware routing in dynamic com-
munication networks. In: 2009 IEEE 28th International Performance Computing
and Communications Conference (IPCCC), pp. 161–168. IEEE (2009)
9. Duan, D., Li, Y., Jin, Y., Lu, Z.: Community mining on dynamic weighted directed
graphs. In: Proceedings of the 1st ACM International Workshop on Complex Net-
works Meet Information Knowledge Management, pp. 11–18. ACM (2009)
10. Duan, D., Li, Y., Li, R., Lu, Z.: Incremental k-clique clustering in dynamic social
networks. Artif. Intell. Rev. 38(2), 129–147 (2012)
11. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010)
12. Gómez, V., Kaltenbrunner, A., López, V.: Statistical analysis of the social net-
work and discussion threads in slashdot. In: Proceedings of the 17th International
Conference on World Wide Web, pp. 645–654. ACM (2008)
13. Görke, R., Maillard, P., Schumm, A., Staudt, C., Wagner, D.: Dynamic graph
clustering combining modularity and smoothness. J. Exp. Algorithmics (JEA) 18,
1–5 (2013)
14. Greene, D., Doyle, D., Cunningham, P.: Tracking the evolution of communities in
dynamic social networks. In: 2010 International Conference on Advances in Social
Networks Analysis and Mining (ASONAM), pp. 176–183. IEEE (2010)
15. Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps.
Soc. Netw. 5(2), 109–137 (1983)
16. Newman, M.E., Girvan, M.: Finding and evaluating community structure in net-
works. Phys. Rev. E 69(2), 026113 (2004)
17. Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.S.: Incremental spectral clustering
by efficiently updating the eigen-system. Pattern Recogn. 43(1), 113–127 (2010)
18. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community
structure of complex networks in nature and society. Nature 435(7043), 814–818
(2005)
19. Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel community detec-
tion for massive graphs. In: Wyrzykowski, R., Dongarra, J., Karczewski, K.,
Waśniewski, J. (eds.) PPAM 2011, Part I. LNCS, vol. 7203, pp. 286–296. Springer,
Heidelberg (2012)
20. Riedy, J., Bader, D., et al.: Multithreaded community monitoring for massive
streaming graph data. In: 2013 IEEE 27th International Parallel and Distrib-
uted Processing Symposium Workshops PhD Forum (IPDPSW), pp. 1646–1655.
IEEE (2013)
21. Sun, J., Faloutsos, C., Papadimitriou, S., Yu, P.S.: Graphscope: parameter-free
mining of large time-evolving graphs. In: Proceedings of the 13th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 687–696.
ACM (2007)
22. Tantipathananandh, C., Berger-Wolf, T., Kempe, D.: A framework for commu-
nity identification in dynamic social networks. In: Proceedings of the 13th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.
717–726. ACM (2007)
23. Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user
interaction in facebook. In: Proceedings of the 2nd ACM Workshop on Online
Social Networks, pp. 37–42. ACM (2009)
24. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)