

For the PhD forum an abstract submission is required by 10th May, and poster by 15th May. The event is on 30th May. https://gist.github.com/wolfram77/1c1f730d20b51e0d2c6d477fd3713024

Dynamic Batch Parallel Algorithms for Updating PageRank : POSTER

This paper presents two algorithms for efficiently computing PageRank on dynamically updating graphs in a batched manner: DynamicLevelwisePR and DynamicMonolithicPR. DynamicLevelwisePR processes vertices level-by-level based on strongly connected components and avoids recomputing converged vertices on the CPU. DynamicMonolithicPR uses a full power iteration approach on the GPU that partitions vertices by in-degree and skips unaffected vertices. Evaluation on real-world graphs shows the batched algorithms provide speedups of up to 4000x over single-edge updates and outperform other state-of-the-art dynamic PageRank algorithms.

HyPR: Hybrid Page Ranking on Evolving Graphs (NOTES)

Highlighted notes on HyPR: Hybrid Page Ranking on Evolving Graphs.
While doing research work under Prof. Dip Sankar Banerjee and Prof. Kishore Kothapalli.
In Hybrid PageRank the vertices are divided into 3 groups: V_old, V_border, V_new. The scaling factor for old and border vertices is N/N_new, and 1/N_new for V_new (I do this too). Then PR is run only on V_border and V_new.
"V_border which is the set of nodes which have edges in Bi connecting V_old and V_new and is reachable using a breadth first traversal."
Does that mean V_border = V_batch(i) ∩ V_old? BFS from where?
"We can assume that the new batch of updates is topologically sorted since the PR scores of the new nodes in Bi is guaranteed to be lower than those in Co."
Is sum(PR) in V_old > sum(PR) in V_new always?
"For performing the comparisons with GPMA and GPMA+, we configure the experiment to run HyPR on the same platform as used in [1] which is a Intel Xeon CPU connected to a Titan X Pascal GPU, and also the same datasets."
Old GPUs are going to be slower ...
Like we were discussing last time, it is not possible to scale old ranks and skip the unchanged components (here, V_old). Please check this simple counterexample that shows skipping leads to incorrect ranks.
https://github.com/puzzlef/pagerank-levelwise-skip-unchanged-components
Another omission in the paper is that Hybrid PR (just like STICD) won't work for graphs that have dead ends. The absence of dead ends is a precondition for the algorithm.

Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES

For the IPDPS ParSocial event a presentation submission is required by 15th May. The event is on 3rd June.
https://gist.github.com/wolfram77/51b15ca09eb28f6909673a2deb1a314d
DYNAMIC BATCH PARALLEL
ALGORITHMS FOR UPDATING
PAGERANK
Subhajit Sahu†, Kishore Kothapalli† and Dip Sankar Banerjee‡
†International Institute of Information Technology Hyderabad, India.
‡Indian Institute of Technology Jodhpur, India.
subhajit.sahu@research. , kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
This work is partially supported by a grant from the Department of Science and Technology (DST), India, under the
National Supercomputing Mission (NSM) R&D in Exascale initiative vide Ref. No: DST/NSM/R&D Exascale/2021/16.
FACEBOOK IS TAKING A PAGE OUT OF GOOGLE’S PLAYBOOK TO STOP FAKE NEWS FROM GOING VIRAL
PUBLISHED APR 2019 BY SALVADOR RODRIGUEZ
Click-Gap: when Facebook is driving disproportionate amounts of traffic to websites.
Effort to rid fake news from Facebook’s services.
Is a website relying on Facebook to drive significant traffic, but not well ranked by the rest of the web?
Also a News Citation Graph.
PAGERANK APPLICATIONS
Ranking of websites.
Measuring the scientific impact of researchers.
Finding the best teams and athletes.
Ranking companies by talent concentration.
Predicting road/foot traffic in urban spaces.
Analysing protein networks.
Finding the most authoritative news sources.
Identifying parts of the brain that change jointly.
Toxic waste management.
PAGERANK APPLICATIONS
Debugging complex software systems (MonitorRank).
Finding the most original writers (BookRank).
Finding topical authorities (TwitterRank).
WHAT IS PAGERANK
P(v) = (1 - d)/N + d × Σ_{u→v} P(u) / outdeg(u)
PageRank is a link-analysis algorithm.
By Larry Page and Sergey Brin in 1996.
For ordering information on the web.
Represented with a random-surfer model.
Rank of a page is defined recursively.
Calculated iteratively with power iteration.
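The power-iteration formulation above can be sketched in a few lines of Python (a minimal illustrative version, not the paper's parallel implementation; the graph encoding and the function name are assumptions):

```python
# Minimal power-iteration PageRank sketch (illustrative only).
# `graph` maps vertex -> list of out-neighbours; d is the damping factor.

def pagerank(graph, d=0.85, tol=1e-10, max_iter=100):
    N = len(graph)
    rank = {v: 1.0 / N for v in graph}
    for _ in range(max_iter):
        new = {}
        for v in graph:
            # Sum contributions from in-neighbours u: P(u) / outdeg(u)
            s = sum(rank[u] / len(graph[u]) for u in graph if v in graph[u])
            new[v] = (1 - d) / N + d * s
        if sum(abs(new[v] - rank[v]) for v in graph) < tol:
            return new
        rank = new
    return rank

# Tiny 3-vertex cycle: all ranks converge to 1/3.
g = {0: [1], 1: [2], 2: [0]}
r = pagerank(g)
print(round(r[0], 6))  # → 0.333333
```

Ranks always sum to 1 when the graph has no dead ends, which is why dead ends are a recurring precondition in the notes above.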

Big Data, Bigger Analytics

This document discusses using Apache Spark for big data analytics at an insurance pricing and customer analytics company called Earnix. It summarizes Earnix's business problems modeling large customer behavior data, how Spark helps address performance issues with their existing 10GB datasets, and improvements made to Spark's MLlib machine learning library. These include adding statistical functionality like covariance estimation to logistic regression models and optimizing algorithms to run efficiently on Spark. Benchmark results show Spark providing scalability by reducing algorithm run times as more nodes are added.

STIC-D: algorithmic techniques for efficient parallel pagerank computation on...

Authors:
Paritosh Garg
Kishore Kothapalli
Publication:
ICDCN '16: Proceedings of the 17th International Conference on Distributed Computing and Networking. January 2016.
Article No.: 15 Pages 1–10
https://doi.org/10.1145/2833312.2833322

Decomposing image generation into layout prediction and conditional synthesis

In this presentation you can learn how to decompose image generation into layout prediction and conditional synthesis. I present all the material in a convenient way; I hope you find it easy to follow.
Thank you.

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...

Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
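The "skip computation on converged vertices" optimization described above can be sketched as follows (a hypothetical minimal version, not the STICD or levelwise code; the function name and graph encoding are assumptions):

```python
# Sketch of per-vertex convergence skipping (hypothetical, not the STICD code).
# Once a vertex's rank changes by less than `tol`, it is frozen and skipped.

def pagerank_skip(graph, d=0.85, tol=1e-10, max_iter=100):
    N = len(graph)
    rank = {v: 1.0 / N for v in graph}
    active = set(graph)                      # vertices still being updated
    # Precompute in-neighbours once.
    ins = {v: [u for u in graph if v in graph[u]] for v in graph}
    for _ in range(max_iter):
        if not active:
            break
        for v in list(active):
            new = (1 - d) / N + d * sum(rank[u] / len(graph[u]) for u in ins[v])
            if abs(new - rank[v]) < tol:
                active.discard(v)            # converged: skip in later iterations
            rank[v] = new
        # Caveat: a frozen vertex keeps feeding its (possibly stale) rank to
        # out-neighbours, so naive skipping trades a little accuracy for time.
    return rank

g = {0: [1], 1: [2], 2: [0]}
r = pagerank_skip(g)
print(round(r[0], 6))
```

Processing strongly connected components in topological order works the same way at a coarser granularity: an entire component is "frozen" once every level feeding into it has converged.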

Accelerating Real Time Applications on Heterogeneous Platforms

In this paper we describe novel implementations of depth estimation from stereo images using feature-extraction algorithms that run on the graphics processing unit (GPU), which are suitable for real-time applications such as analysing video in real-time vision systems. Modern graphics cards contain a large number of parallel processors and high-bandwidth memory for accelerating data computation operations. In this paper we give a general idea of how to accelerate real-time applications using heterogeneous platforms. We propose using some added resources to exploit more computationally involved optimization methods. This proposed approach will indirectly accelerate a database by producing better plan quality.


Early Benchmarking Results for Neuromorphic Computing

This document summarizes early benchmarking results for neuromorphic computing using Intel's Loihi chip. It finds that Loihi provides orders of magnitude gains over CPUs and GPUs for certain workloads that are directly trained on the chip or use novel bio-inspired algorithms. These include online learning, adaptive control, event-based vision and tactile sensing, constraint satisfaction problems, and nearest neighbor search. Larger networks and problems tend to provide greater performance gains with Loihi.

Neural Architecture Search: Learning How to Learn

Neural Architecture Search aims to automate the design of neural networks. The document discusses several papers that developed methods for neural architecture search using reinforcement learning and evolutionary algorithms. These methods led to the discovery of neural network cells that achieved state-of-the-art performance on image classification tasks when combined into larger networks. Later work explored ways to make neural architecture search more efficient and applicable to different tasks.

Ling Liu part 02: big graph processing

This document discusses challenges and opportunities in parallel graph processing for big data. It describes how graphs are ubiquitous but processing large graphs at scale is difficult due to their huge size, complex correlations between data entities, and skewed distributions. Current computation models have problems with ghost vertices, too much interaction between partitions, and lack of support for iterative graph algorithms. New frameworks are needed to handle these graphs in a scalable way with low memory usage and balanced computation and communication.

SparkNet presentation

SparkNet implements a scalable, distributed algorithm to train deep neural networks that can be applied to existing batch processing frameworks like MapReduce and Spark.
Work by researchers at UC Berkeley.

IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...

This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel

A Survey of Machine Learning Methods Applied to Computer ...

This document discusses various machine learning methods that have been applied to computer architecture problems. It begins by introducing k-means clustering and how it is used in SimPoint to reduce architecture simulation time. It then discusses how machine learning can be used for design space exploration in multi-core processors and for coordinated resource management on multiprocessors. Finally, it provides an example of using artificial neural networks to build performance models to inform resource allocation decisions.

Performance Characterization and Optimization of In-Memory Data Analytics on ...

The document discusses performance optimization of Apache Spark on scale-up servers through near-data processing. It finds that Spark workloads have poor multi-core scalability and high I/O wait times on scale-up servers. It proposes exploiting near-data processing through in-storage processing and 2D-integrated processing-in-memory to reduce data movements and latency. The author evaluates these techniques through modeling and a programmable FPGA accelerator to improve the performance of Spark MLlib workloads by up to 9x. Challenges in hybrid CPU-FPGA design and attaining peak performance are also discussed.

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...

Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables the ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.

An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture

In this paper, we present a comparative performance analysis of different parallel sorting algorithms: Bitonic Sort and Parallel Radix Sort. In order to study the interaction between the algorithms and the architecture, we implemented both algorithms in OpenCL and compared their performance with the Quick Sort algorithm, the fastest algorithm. In our simulation, we used an Intel Core2Duo CPU at 2.67 GHz and an NVidia Quadro FX 3800 as the graphical processing unit.

Performance boosting of discrete cosine transform using parallel programming ...

This document summarizes a paper that proposes using parallel programming techniques to improve the performance of the discrete cosine transform (DCT) algorithm. It describes implementing both thread-level parallelism by distributing image blocks across multiple processor cores, and vector-level parallelism by performing SIMD operations within each core using AVX instructions. The proposed methodology uses Cilk Plus to enable parallelization at both the thread and vector levels. It is estimated that this multi-level parallel approach could theoretically provide a speedup of up to 32 times compared to a serial scalar implementation.

Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...

Micro-architectural performance is generally consistent between batch and stream processing workloads in Spark if they only differ in micro-batching. DataFrames show improved instruction retirement and reduced stalls compared to RDDs. Higher data velocities can improve CPU utilization and reduce stalls, while increasing bandwidth consumption and instruction retirement. The size of micro-batches in stream workloads determines their micro-architectural behavior.

Auto-Pilot for Apache Spark Using Machine Learning

At Qubole, users run Spark at scale on cloud (900+ concurrent nodes). At such scale, for efficiently running SLA critical jobs, tuning Spark configurations is essential. But it continues to be a difficult undertaking, largely driven by trial and error. In this talk, we will address the problem of auto-tuning SQL workloads on Spark. The same technique can also be adapted for non-SQL Spark workloads. In our earlier work[1], we proposed a model based on simple rules and insights. It was simple yet effective at optimizing queries and finding the right instance types to run queries. However, with respect to auto tuning Spark configurations we saw scope of improvement. On exploration, we found previous works addressing auto-tuning using Machine learning techniques. One major drawback of the simple model[1] is that it cannot use multiple runs of query for improving recommendation, whereas the major drawback with Machine Learning techniques is that it lacks domain specific knowledge. Hence, we decided to combine both techniques. Our auto-tuner interacts with both models to arrive at good configurations. Once user selects a query to auto tune, the next configuration is computed from models and the query is run with it. Metrics from event log of the run is fed back to models to obtain next configuration. Auto-tuner will continue exploring good configurations until it meets the fixed budget specified by the user. We found that in practice, this method gives much better configurations compared to configurations chosen even by experts on real workload and converges soon to optimal configuration. In this talk, we will present a novel ML model technique and the way it was combined with our earlier approach. Results on real workload will be presented along with limitations and challenges in productionizing them. [1] Margoor et al,'Automatic Tuning of SQL-on-Hadoop Engines' 2018,IEEE CLOUD

A04660105

This document describes the implementation of the AES (Advanced Encryption Standard) algorithm using a fully pipelined design on an FPGA. It first provides background on the AES algorithm, including its key components and previous hardware implementations. It then details the proposed fully pipelined design, which implements each of AES's 10 rounds as separate pipeline stages to achieve high throughput. Key generation is also pipelined internally. Simulation results show the design achieves a throughput higher than previous reported implementations.

powerpoint feb

The document proposes using an ensemble of K-nearest neighbor classifiers optimized with genetic programming for intrusion detection. It trains multiple K-NN classifiers on subsets of the KDD Cup 1999 intrusion detection dataset and then uses genetic programming to combine the classifiers to improve performance. Results show the ensemble approach reduces error rates compared to individual classifiers and the genetic programming-based ensemble achieves an area under the ROC curve of 0.99976, outperforming the component classifiers.

Architectural Optimizations for High Performance and Energy Efficient Smith-W...

NECST Lab @ Politecnico di Milano

This document discusses optimizations for high performance and energy efficient implementations of the Smith-Waterman algorithm on FPGAs using OpenCL. It describes an architecture with a systolic array for parallel computation along anti-diagonals and compression techniques to address the memory-bound nature. Experimental results on two FPGA boards show up to 42.5 GCUPS performance with the best performance/power ratio compared to CPUs and other FPGA implementations.

ASIC Implementation for SOBEL Accelerator

This document summarizes an ASIC implementation of a Sobel edge detection accelerator. It begins with an introduction to accelerators and edge detection in video images. It then describes the proposed Sobel accelerator design, which uses a pipelined architecture to process pixels from an image in groups of four. The design includes registers to store pixel values, a multiplier array to calculate partial derivatives, and addition circuits to accumulate the results. Finally, it discusses the advantages of implementing the Sobel accelerator using an ASIC design over an FPGA, including higher density, cost savings, and faster fabrication for mass production.

IRJET-ASIC Implementation for SOBEL Accelerator

This document summarizes an ASIC implementation of a Sobel edge detection accelerator. It begins with an introduction to edge detection and accelerators. It then describes the proposed pipeline architecture of the Sobel accelerator, which takes in pixels from an image and processes them through multiplication, addition and other operations to produce output derivative pixels. The document discusses the ASIC design flow, including frontend steps like simulation, synthesis and DFT insertion, as well as backend steps such as floorplanning, placement, routing and timing analysis. It provides diagrams of the accelerator architecture and screenshots of synthesis reports.

A data and task co scheduling algorithm for scientific cloud workflows

A multi-level K-cut graph partitioning algorithm is proposed to minimize data transfer across cloud datacenters while satisfying constraints for load balancing and fixed data locations. The algorithm contracts fixed input datasets and tasks within each datacenter, coarsens the graph level-by-level, performs K-cut partitioning to minimize cut size, and projects the partitioned graph back to the original workflow while maintaining load balancing. Evaluation on three real-world workflows shows the algorithm outperforms other state-of-the-art methods.

Linear regression model

Project where data sets of different drivers with different driving behavior were classified with linear regression and machine learning to train and test data.

A dynamically reconfigurable multi asip architecture for multistandard and mu...

A Dynamically Reconfigurable Multi-ASIP Architecture for Multistandard and Multimode Turbo Decoding
The multiplication of wireless communication standards is introducing the need for flexible and reconfigurable multistandard baseband receivers. In this context, multiprocessor turbo decoders have recently been developed to support the increasing flexibility and throughput requirements of emerging applications. However, these solutions do not sufficiently address reconfiguration performance issues, which can be a limiting factor in the future. This brief presents the design of a reconfigurable multiprocessor architecture for turbo decoding that achieves very fast reconfiguration without compromising decoding performance.

About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...

TrueTime is a service that enables the use of globally synchronized clocks, with bounded error. It returns a time interval that is guaranteed to contain the clock’s actual time for some time during the call’s execution. If two intervals do not overlap, then we know calls were definitely ordered in real time. In general, synchronized clocks can be used to avoid communication in a distributed system.
The underlying source of time is a combination of GPS receivers and atomic clocks. As there are “time masters” in every datacenter (redundantly), it is likely that both sides of a partition would continue to enjoy accurate time. Individual nodes however need network connectivity to the masters, and without it their clocks will drift. Thus, during a partition their intervals slowly grow wider over time, based on bounds on the rate of local clock drift. Operations depending on TrueTime, such as Paxos leader election or transaction commits, thus have to wait a little longer, but the operation still completes (assuming the 2PC and quorum communication are working).
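The interval-ordering rule above can be sketched as follows (a hypothetical TrueTime-like interface for illustration, not Spanner's actual API):

```python
# Sketch of the TrueTime interval idea (hypothetical helper, not Spanner's API).
# A TrueTime-like clock returns an interval (earliest, latest) guaranteed to
# contain the true time. If two intervals do not overlap, the corresponding
# events are definitely ordered in real time.

def definitely_before(a, b):
    """a, b are (earliest, latest) intervals from a TrueTime-like clock."""
    a_earliest, a_latest = a
    b_earliest, b_latest = b
    return a_latest < b_earliest  # disjoint intervals: a happened before b

# Commit-wait follows the same logic: a transaction that waits until its
# interval's `latest` bound has passed knows every later interval lies
# entirely after it, so ordering needs no extra communication.
print(definitely_before((10, 14), (15, 19)))  # → True  (disjoint: ordered)
print(definitely_before((10, 14), (12, 16)))  # → False (overlap: order unknown)
```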

Adjusting Bitset for graph : SHORT REPORT / NOTES

Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations. Unfortunately, using CSR for dynamic graphs is impractical since addition/deletion of a single edge can require on average (N+M)/2 memory accesses, in order to update source-offsets and destination-indices. A common approach is therefore to store edge-lists/destination-indices as an array of arrays, where each edge-list is an array belonging to a vertex. While this is good enough for small graphs, it quickly becomes a bottleneck for large graphs. What causes this bottleneck depends on whether the edge-lists are sorted or unsorted. If they are sorted, checking for an edge requires about log(E) memory accesses, but adding an edge on average requires E/2 accesses, where E is the number of edges of a given vertex. Note that both addition and deletion of edges in a dynamic graph require checking for an existing edge, before adding or deleting it. If edge lists are unsorted, checking for an edge requires around E/2 memory accesses, but adding an edge requires only 1 memory access.
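The sorted edge-list trade-off described above can be sketched as follows (an illustrative structure; the class and method names are assumptions, not the repository's code):

```python
# Sketch of a sorted per-vertex edge list (illustrative only).
# Checking for an edge costs ~log(E) probes via binary search; insertion
# still shifts ~E/2 entries on average, matching the trade-off above.
import bisect

class SortedEdgeList:
    def __init__(self):
        self.dst = []                       # sorted destination indices

    def has_edge(self, v):                  # ~log(E) memory accesses
        i = bisect.bisect_left(self.dst, v)
        return i < len(self.dst) and self.dst[i] == v

    def add_edge(self, v):                  # ~E/2 element moves on average
        if not self.has_edge(v):            # both add and delete check first
            bisect.insort(self.dst, v)

    def remove_edge(self, v):
        i = bisect.bisect_left(self.dst, v)
        if i < len(self.dst) and self.dst[i] == v:
            del self.dst[i]

el = SortedEdgeList()
for v in (5, 1, 3):
    el.add_edge(v)
print(el.dst, el.has_edge(3))  # → [1, 3, 5] True
```

An unsorted list inverts the costs: `add_edge` becomes a single append, but `has_edge` degrades to a ~E/2 linear scan.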

Early Benchmarking Results for Neuromorphic Computing

This document summarizes early benchmarking results for neuromorphic computing using Intel's Loihi chip. It finds that Loihi provides orders of magnitude gains over CPUs and GPUs for certain workloads that are directly trained on the chip or use novel bio-inspired algorithms. These include online learning, adaptive control, event-based vision and tactile sensing, constraint satisfaction problems, and nearest neighbor search. Larger networks and problems tend to provide greater performance gains with Loihi.

Neural Architecture Search: Learning How to Learn

Neural Architecture Search aims to automate the design of neural networks. The document discusses several papers that developed methods for neural architecture search using reinforcement learning and evolutionary algorithms. These methods led to the discovery of neural network cells that achieved state-of-the-art performance on image classification tasks when combined into larger networks. Later work explored ways to make neural architecture search more efficient and applicable to different tasks.

Ling Liu part 02: Big graph processing

This document discusses challenges and opportunities in parallel graph processing for big data. It describes how graphs are ubiquitous but processing large graphs at scale is difficult due to their huge size, complex correlations between data entities, and skewed distributions. Current computation models have problems with ghost vertices, too much interaction between partitions, and lack of support for iterative graph algorithms. New frameworks are needed to handle these graphs in a scalable way with low memory usage and balanced computation and communication.

SparkNet presentation

SparkNet implements a scalable, distributed algorithm to train deep neural networks that can be applied to existing batch processing frameworks like MapReduce and Spark.
Work by researchers at UC Berkeley.

IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...

This document presents a review of FPGA-based architectures for image capturing, processing, and display using a VGA monitor. It discusses using the Xilinx AccelDSP tool to develop the system on a Spartan 3E FPGA. The AccelDSP tool allows converting a MATLAB design into HDL for implementation on the FPGA. It summarizes the FPGA-based system architecture, which includes units for initialization, data transfer, image processing, and memory management. It then outlines the Xilinx AccelDSP design flow, which verifies the functionality at each stage of converting the floating-point MATLAB model to a fixed-point hardware implementation on the FPGA. The goal is to accelerate image processing applications using the parallel processing capabilities of the FPGA.

A Survey of Machine Learning Methods Applied to Computer ...

This document discusses various machine learning methods that have been applied to computer architecture problems. It begins by introducing k-means clustering and how it is used in SimPoint to reduce architecture simulation time. It then discusses how machine learning can be used for design space exploration in multi-core processors and for coordinated resource management on multiprocessors. Finally, it provides an example of using artificial neural networks to build performance models to inform resource allocation decisions.

Performance Characterization and Optimization of In-Memory Data Analytics on ...

The document discusses performance optimization of Apache Spark on scale-up servers through near-data processing. It finds that Spark workloads have poor multi-core scalability and high I/O wait times on scale-up servers. It proposes exploiting near-data processing through in-storage processing and 2D-integrated processing-in-memory to reduce data movements and latency. The author evaluates these techniques through modeling and a programmable FPGA accelerator to improve the performance of Spark MLlib workloads by up to 9x. Challenges in hybrid CPU-FPGA design and attaining peak performance are also discussed.

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...

Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large number of small workload submissions, and is expected to be a non-issue when the computation is performed on massive graphs.
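A minimal sketch of the levelwise scheme, assuming the strongly connected components are already available in topological order and the graph has no dead ends (function and variable names here are illustrative, not the paper's implementation):

```python
def levelwise_pagerank(in_nbrs, components, alpha=0.85, tol=1e-9):
    # in_nbrs: {v: [in-neighbors]}; components: SCCs in topological order.
    n = sum(len(c) for c in components)
    rank = {v: 1.0 / n for c in components for v in c}
    out_deg = {v: 0 for v in rank}
    for v, ins in in_nbrs.items():
        for u in ins:
            out_deg[u] += 1
    for comp in components:                 # one level at a time
        while True:                         # iterate only within this block
            err, new = 0.0, {}
            for v in comp:
                s = sum(rank[u] / out_deg[u] for u in in_nbrs.get(v, []))
                new[v] = (1 - alpha) / n + alpha * s
                err += abs(new[v] - rank[v])
            rank.update(new)
            if err < tol:
                break
    return rank
```

Because each block converges before its successors are processed, ranks flowing into later components are already final, which is what removes the need for per-iteration communication across blocks.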

An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture

In this paper, we present a comparative performance analysis of different parallel sorting algorithms: Bitonic Sort and Parallel Radix Sort. In order to study the interaction between the algorithms and the architecture, we implemented both algorithms in OpenCL and compared their performance with the Quick Sort algorithm, one of the fastest sequential sorting algorithms. In our simulation, we used an Intel Core2Duo CPU at 2.67 GHz and an NVIDIA Quadro FX 3800 as the graphics processing unit.

Performance boosting of discrete cosine transform using parallel programming ...

This document summarizes a paper that proposes using parallel programming techniques to improve the performance of the discrete cosine transform (DCT) algorithm. It describes implementing both thread-level parallelism by distributing image blocks across multiple processor cores, and vector-level parallelism by performing SIMD operations within each core using AVX instructions. The proposed methodology uses Cilk Plus to enable parallelization at both the thread and vector levels. It is estimated that this multi-level parallel approach could theoretically provide a speedup of up to 32 times compared to a serial scalar implementation.

Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...

Micro-architectural performance is generally consistent between batch and stream processing workloads in Spark if they only differ in micro-batching. DataFrames show improved instruction retirement and reduced stalls compared to RDDs. Higher data velocities can improve CPU utilization and reduce stalls, while increasing bandwidth consumption and instruction retirement. The size of micro-batches in stream workloads determines their micro-architectural behavior.

Auto-Pilot for Apache Spark Using Machine Learning

At Qubole, users run Spark at scale on cloud (900+ concurrent nodes). At such scale, for efficiently running SLA critical jobs, tuning Spark configurations is essential. But it continues to be a difficult undertaking, largely driven by trial and error. In this talk, we will address the problem of auto-tuning SQL workloads on Spark. The same technique can also be adapted for non-SQL Spark workloads. In our earlier work[1], we proposed a model based on simple rules and insights. It was simple yet effective at optimizing queries and finding the right instance types to run queries. However, with respect to auto tuning Spark configurations we saw scope of improvement. On exploration, we found previous works addressing auto-tuning using Machine learning techniques. One major drawback of the simple model[1] is that it cannot use multiple runs of query for improving recommendation, whereas the major drawback with Machine Learning techniques is that it lacks domain specific knowledge. Hence, we decided to combine both techniques. Our auto-tuner interacts with both models to arrive at good configurations. Once user selects a query to auto tune, the next configuration is computed from models and the query is run with it. Metrics from event log of the run is fed back to models to obtain next configuration. Auto-tuner will continue exploring good configurations until it meets the fixed budget specified by the user. We found that in practice, this method gives much better configurations compared to configurations chosen even by experts on real workload and converges soon to optimal configuration. In this talk, we will present a novel ML model technique and the way it was combined with our earlier approach. Results on real workload will be presented along with limitations and challenges in productionizing them. [1] Margoor et al,'Automatic Tuning of SQL-on-Hadoop Engines' 2018,IEEE CLOUD

A04660105

This document describes the implementation of the AES (Advanced Encryption Standard) algorithm using a fully pipelined design on an FPGA. It first provides background on the AES algorithm, including its key components and previous hardware implementations. It then details the proposed fully pipelined design, which implements each of AES's 10 rounds as separate pipeline stages to achieve high throughput. Key generation is also pipelined internally. Simulation results show the design achieves a throughput higher than previous reported implementations.

powerpoint feb

The document proposes using an ensemble of K-nearest neighbor classifiers optimized with genetic programming for intrusion detection. It trains multiple K-NN classifiers on subsets of the KDD Cup 1999 intrusion detection dataset and then uses genetic programming to combine the classifiers to improve performance. Results show the ensemble approach reduces error rates compared to individual classifiers and the genetic programming-based ensemble achieves an area under the ROC curve of 0.99976, outperforming the component classifiers.

Architectural Optimizations for High Performance and Energy Efficient Smith-W...

NECST Lab @ Politecnico di Milano

This document discusses optimizations for high performance and energy efficient implementations of the Smith-Waterman algorithm on FPGAs using OpenCL. It describes an architecture with a systolic array for parallel computation along anti-diagonals and compression techniques to address the memory-bound nature. Experimental results on two FPGA boards show up to 42.5 GCUPS performance with the best performance/power ratio compared to CPUs and other FPGA implementations.

ASIC Implementation for SOBEL Accelerator

This document summarizes an ASIC implementation of a Sobel edge detection accelerator. It begins with an introduction to accelerators and edge detection in video images. It then describes the proposed Sobel accelerator design, which uses a pipelined architecture to process pixels from an image in groups of four. The design includes registers to store pixel values, a multiplier array to calculate partial derivatives, and addition circuits to accumulate the results. Finally, it discusses the advantages of implementing the Sobel accelerator using an ASIC design over an FPGA, including higher density, cost savings, and faster fabrication for mass production.

IRJET-ASIC Implementation for SOBEL Accelerator

This document summarizes an ASIC implementation of a Sobel edge detection accelerator. It begins with an introduction to edge detection and accelerators. It then describes the proposed pipeline architecture of the Sobel accelerator, which takes in pixels from an image and processes them through multiplication, addition and other operations to produce output derivative pixels. The document discusses the ASIC design flow, including frontend steps like simulation, synthesis and DFT insertion, as well as backend steps such as floorplanning, placement, routing and timing analysis. It provides diagrams of the accelerator architecture and screenshots of synthesis reports.

A data and task co scheduling algorithm for scientific cloud workflows

A multi-level K-cut graph partitioning algorithm is proposed to minimize data transfer across cloud datacenters while satisfying constraints for load balancing and fixed data locations. The algorithm contracts fixed input datasets and tasks within each datacenter, coarsens the graph level-by-level, performs K-cut partitioning to minimize cut size, and projects the partitioned graph back to the original workflow while maintaining load balancing. Evaluation on three real-world workflows shows the algorithm outperforms other state-of-the-art methods.

Linear regression model

A project in which datasets from drivers with different driving behaviors were classified using linear regression, with the data split into training and test sets.

A dynamically reconfigurable multi asip architecture for multistandard and mu...

A Dynamically Reconfigurable Multi-ASIP Architecture for Multistandard and Multimode Turbo Decoding
The multiplication of wireless communication standards is introducing the need of flexible and reconfigurable multistandard baseband receivers. In this context, multiprocessor turbo decoders have been recently developed in order to support the increasing flexibility and throughput requirements of emerging applications. However, these solutions do not sufficiently address reconfiguration performance issues, which can be a limiting factor in the future. This brief presents the design of a reconfigurable multiprocessor architecture for turbo decoding achieving very fast reconfiguration without compromising the decoding performances.

About TrueTime, Spanner, Clock synchronization, CAP theorem, Two-phase lockin...

TrueTime is a service that enables the use of globally synchronized clocks, with bounded error. It returns a time interval that is guaranteed to contain the clock’s actual time for some time during the call’s execution. If two intervals do not overlap, then we know calls were definitely ordered in real time. In general, synchronized clocks can be used to avoid communication in a distributed system.
The underlying source of time is a combination of GPS receivers and atomic clocks. As there are “time masters” in every datacenter (redundantly), it is likely that both sides of a partition would continue to enjoy accurate time. Individual nodes however need network connectivity to the masters, and without it their clocks will drift. Thus, during a partition their intervals slowly grow wider over time, based on bounds on the rate of local clock drift. Operations depending on TrueTime, such as Paxos leader election or transaction commits, thus have to wait a little longer, but the operation still completes (assuming the 2PC and quorum communication are working).
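A toy sketch of the interval semantics described above (the type and function names are assumptions for illustration, not Spanner's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TTInterval:
    earliest: float  # guaranteed lower bound on the true time
    latest: float    # guaranteed upper bound on the true time

def definitely_before(a: TTInterval, b: TTInterval) -> bool:
    # If the intervals do not overlap, the calls were ordered in real time.
    return a.latest < b.earliest

def commit_wait_done(now: TTInterval, commit_ts: float) -> bool:
    # Commit wait: hold the result until the chosen timestamp is guaranteed
    # past, i.e. even the earliest possible current time exceeds commit_ts.
    return now.earliest > commit_ts
```

During a partition the intervals widen, so `definitely_before` returns True less often and `commit_wait_done` takes longer to become True, which is exactly the "wait a little longer" behavior described above.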

Adjusting Bitset for graph : SHORT REPORT / NOTES

Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is commonly used for efficient graph computations. Unfortunately, using CSR for dynamic graphs is impractical since addition/deletion of a single edge can require on average (N+M)/2 memory accesses, in order to update source-offsets and destination-indices. A common approach is therefore to store edge-lists/destination-indices as an array of arrays, where each edge-list is an array belonging to a vertex. While this is good enough for small graphs, it quickly becomes a bottleneck for large graphs. What causes this bottleneck depends on whether the edge-lists are sorted or unsorted. If they are sorted, checking for an edge requires about log(E) memory accesses, but adding an edge on average requires E/2 accesses, where E is the number of edges of a given vertex. Note that both addition and deletion of edges in a dynamic graph require checking for an existing edge, before adding or deleting it. If edge lists are unsorted, checking for an edge requires around E/2 memory accesses, but adding an edge requires only 1 memory access.
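The sorted vs unsorted edge-list tradeoff can be sketched as follows (a minimal illustration, not the report's actual data structure):

```python
import bisect

class SortedEdgeList:
    """Edge list kept in sorted order: cheap lookups, costly insertions."""
    def __init__(self):
        self.ids = []
    def has(self, v):                       # ~log(E) memory accesses
        i = bisect.bisect_left(self.ids, v)
        return i < len(self.ids) and self.ids[i] == v
    def add(self, v):                       # ~E/2 shifts on average
        if not self.has(v):
            bisect.insort(self.ids, v)

class UnsortedEdgeList:
    """Append-only edge list: costly lookups, cheap insertions."""
    def __init__(self):
        self.ids = []
    def has(self, v):                       # ~E/2 memory accesses on average
        return v in self.ids
    def add(self, v):                       # 1 write, but has() dominates
        if not self.has(v):
            self.ids.append(v)
```

Since every add/delete must first check for an existing edge, the lookup cost dominates either way, which is why neither layout alone scales to large dynamic graphs.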

Adjusting primitives for graph : SHORT REPORT / NOTES

Graph algorithms, like PageRank, are built from a small set of primitives operating on the Compressed Sparse Row (CSR) graph representation. The following experiments adjust these primitives:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).

Experiments with Primitive operations : SHORT REPORT / NOTES

This includes:
- Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
- Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
- Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
- Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).

PageRank Experiments : SHORT REPORT / NOTES

This includes:
- Adjusting data types for rank vector
- Adjusting Pagerank parameters
- Adjusting Sequential approach
- Adjusting OpenMP approach
- Comparing sequential approach
- Adjusting Monolithic (Sequential) optimizations (from STICD)
- Adjusting Levelwise (STICD) approach
- Comparing Levelwise (STICD) approach
- Adjusting ranks for dynamic graphs
- Adjusting Levelwise (STICD) dynamic approach
- Comparing dynamic approach with static
- Adjusting Monolithic CUDA approach
- Adjusting Monolithic CUDA optimizations (from STICD)
- Adjusting Levelwise (STICD) CUDA approach
- Comparing Levelwise (STICD) CUDA approach
- Comparing dynamic CUDA approach with static
- Comparing dynamic optimized CUDA approach with static

Algorithmic optimizations for Dynamic Monolithic PageRank (from STICD) : SHOR...

Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus can reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be calculated directly; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
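The "skip converged vertices" optimization can be sketched as a plain power iteration with per-vertex convergence flags (a minimal illustration under assumed names, not the STICD implementation):

```python
def pagerank_skip_converged(in_nbrs, out_deg, alpha=0.85, frac=1e-6):
    # in_nbrs: {v: [in-neighbors]}; out_deg: {v: out-degree}, no dead ends.
    n = len(in_nbrs)
    rank = {v: 1.0 / n for v in in_nbrs}
    converged = set()
    while len(converged) < n:
        for v in in_nbrs:
            if v in converged:
                continue                    # skip work for settled vertices
            r = (1 - alpha) / n + alpha * sum(
                rank[u] / out_deg[u] for u in in_nbrs[v])
            if abs(r - rank[v]) < frac / n:
                converged.add(v)            # freeze this vertex from now on
            rank[v] = r
    return rank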

Adjusting OpenMP PageRank : SHORT REPORT / NOTES

For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments were conducted implementing PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (namely sumAt and multiply).

word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...

Below are the important points I note from the 2020 paper by Martin Grohe:
- 1-WL distinguishes almost all graphs, in a probabilistic sense
- Classical WL is two dimensional Weisfeiler-Leman
- DeepWL is an unrestricted version of WL that runs in polynomial time.
- Knowledge graphs are essentially graphs with vertex/edge attributes
ABSTRACT:
Vector representations of graphs and relational structures, whether handcrafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view.
Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.

DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES

https://gist.github.com/wolfram77/54c4a14d9ea547183c6c7b3518bf9cd1
There exist a number of dynamic graph generators. The Barabasi-Albert model iteratively attaches new vertices to pre-existing vertices in the graph using preferential attachment (edges to high-degree vertices are more likely: rich get richer, the Pareto principle). However, graph size increases monotonically, and the density of the graph keeps increasing (sparsity decreasing).
Gorke's model uses a defined clustering to uniformly add vertices and edges. Purohit's model uses motifs (e.g. triangles) to mimic properties of existing dynamic graphs, such as growth rate, structure, and degree distribution. Kronecker graph generators are used to increase the size of a given graph, with power-law distribution.
To generate dynamic graphs, we must choose a metric to compare two graphs. Common metrics include diameter, clustering coefficient (modularity?), triangle counting (triangle density?), and degree distribution.
In this paper, the authors propose DyGraph, a dynamic graph generator that uses degree distribution as the only metric. The authors observe that many real-world graphs differ from the power-law distribution at the tail end. To address this issue, they propose binning, where the vertices beyond a certain degree (minDeg = min(deg) s.t. |V(deg)| < H, where H~10 is the number of vertices with a given degree below which they are binned) are grouped into bins of degree-width binWidth, max-degree localMax, and binSize, the number of degrees in a bin with at least one vertex (to keep track of sparsity). This helps the authors generate graphs with a more realistic degree distribution.
The process of generating a dynamic graph is as follows. First, the difference between the desired and the current degree distribution is calculated. The authors then create an edge-addition set where each vertex is present as many times as the number of additional incident edges it must receive. Edges are then created by connecting two vertices chosen randomly from this set, and removing both from the set once connected. Currently, the authors reject self-loops and duplicate edges. Removal of edges is done in a similar fashion.
Authors observe that adding edges with power-law properties dominates the execution time, and consider parallelizing DyGraph as part of future work.
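The edge-addition step can be sketched as random pairing from a multiset of deficit vertices (function and variable names are my own, not DyGraph's):

```python
import random

def add_edges_from_deficit(edges, deficit, seed=42):
    # edges: set of frozenset({u, v}); deficit: {v: extra incident edges needed}
    rng = random.Random(seed)
    # Each vertex appears once per additional incident edge it must receive.
    pool = [v for v, k in deficit.items() for _ in range(k)]
    rng.shuffle(pool)
    added = []
    while len(pool) >= 2:
        u, v = pool.pop(), pool.pop()
        e = frozenset((u, v))
        if u == v or e in edges:        # reject self-loops and duplicates
            continue
        edges.add(e)
        added.append((u, v))
    return added
```

Note this sketch simply discards a rejected pair; a production generator would re-draw, which matters when the deficit pool is small.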

Shared memory Parallelism (NOTES)

My notes on shared memory parallelism.
Shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Using memory for communication inside a single program, e.g. among its multiple threads, is also referred to as shared memory [REF].

A Dynamic Algorithm for Local Community Detection in Graphs : NOTES

**Community detection methods** can be *global* or *local*. **Global community detection methods** divide the entire graph into groups. Existing global algorithms include:
- Random walk methods
- Spectral partitioning
- Label propagation
- Greedy agglomerative and divisive algorithms
- Clique percolation
https://gist.github.com/wolfram77/b4316609265b5b9f88027bbc491f80b6
There is a growing body of work in *detecting overlapping communities*. **Seed set expansion** is a **local community detection method** where relevant *seed vertices* of interest are picked and *expanded to form communities* surrounding them. The quality of each community is measured using a *fitness function*.
**Modularity** is a *fitness function* which compares the number of intra-community edges to the expected number in a random null model. **Conductance** is another popular fitness score that measures the community cut, or inter-community edges. Many *overlapping community detection* methods **use a modified ratio** of intra-community edges to all edges with at least one endpoint in the community.
Andersen et al. use a **Spectral PageRank-Nibble method** which minimizes conductance and is formed by adding vertices in order of decreasing PageRank values. Andersen and Lang develop a **random walk approach** in which some vertices in the seed set may not be placed in the final community. Clauset gives a **greedy method** that *starts from a single vertex* and then iteratively adds neighboring vertices *maximizing the local modularity score*. Riedy et al. **expand multiple vertices** via maximizing modularity.
Several algorithms for **detecting global, overlapping communities** use a *greedy*, *agglomerative approach* and run *multiple separate seed set expansions*. Lancichinetti et al. run **greedy seed set expansions**, each with a *single seed vertex*. Overlapping communities are produced by sequentially running expansions from a node not yet in a community. Lee et al. use **maximal cliques as seed sets**. Havemann et al. **greedily expand cliques**.
The authors of this paper discuss a dynamic approach for **community detection using seed set expansion**. Simply marking the neighbours of changed vertices is a **naive approach**, and has *severe shortcomings*. This is because *communities can split apart*. The simple updating method *may fail even when it outputs a valid community* in the graph.
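Conductance, mentioned above as a fitness function, is short to compute; here is a minimal sketch (the function name and graph representation are my own):

```python
def conductance(adj, community):
    # adj: {v: set of neighbors} for an undirected graph
    S = set(community)
    # Edges crossing the community boundary (counted from inside S).
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)            # volume of S
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 0.0
```

A seed set expansion method like PageRank-Nibble adds vertices while this score decreases; a low conductance means few edges leave the community relative to its volume.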

Scalable Static and Dynamic Community Detection Using Grappolo : NOTES

A **community** (in a network) is a subset of nodes which are _strongly connected among themselves_, but _weakly connected to others_. Neither the number of output communities nor their size distribution is known a priori. Community detection methods can be divisive or agglomerative. **Divisive methods** use _betweenness centrality_ to **identify and remove bridges** between communities. **Agglomerative methods** greedily **merge two communities** that provide the maximum gain in _modularity_. Newman and Girvan introduced the **modularity metric**. The problem of community detection is then reduced to the problem of modularity maximization, which is **NP-complete**. The **Louvain method** is a variant of the _agglomerative strategy_, in that it is a _multi-level heuristic_.
https://gist.github.com/wolfram77/917a1a4a429e89a0f2a1911cea56314d
In this paper, the authors discuss **four heuristics** for Community detection using the _Louvain algorithm_ implemented upon recently developed **Grappolo**, which is a parallel variant of the Louvain algorithm. They are:
- Vertex following and Minimum label
- Data caching
- Graph coloring
- Threshold scaling
With the **Vertex following** heuristic, the _input is preprocessed_ and all single-degree vertices are merged with their corresponding neighbours. This reduces the number of vertices considered in each iteration, and also helps initial seeds of communities to be formed. With the **Minimum label** heuristic, when a vertex is making the decision to move to a community and multiple communities provide the same modularity gain, the community with the smallest id is chosen. This helps _minimize or prevent community swaps_. With the **Data caching** heuristic, community information is stored in a vector instead of a map, and is reused in each iteration, but with some additional cost. With the **Vertex ordering via Graph coloring** heuristic, _distance-k coloring_ of the graph is performed in order to group vertices into colors. Then, each set of vertices (by color) is processed _concurrently_, and synchronization is performed after that. This enables mimicking the behaviour of the serial algorithm. Finally, with the **Threshold scaling** heuristic, _successively smaller values of the modularity threshold_ are used as the algorithm progresses. This allows the algorithm to converge faster, and has been observed to achieve a good modularity score as well.
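The minimum-label tie-break is simple enough to sketch directly (an illustrative helper, not Grappolo's code):

```python
def best_community(candidates):
    """Pick a target community: maximize modularity gain, then minimize id.

    candidates: list of (community_id, modularity_gain) pairs for a vertex.
    """
    best_gain = max(gain for _, gain in candidates)
    # Among equal-gain candidates, the smallest id wins deterministically,
    # which prevents two vertices from endlessly swapping communities.
    return min(cid for cid, gain in candidates if gain == best_gain)
```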
From the results, it appears that _graph coloring_ and _threshold scaling_ heuristics do not always provide a speedup and this depends upon the nature of the graph. It would be interesting to compare the heuristics against baseline approaches. Future work can include _distributed memory implementations_, and _community detection on streaming graphs_.

Application Areas of Community Detection: A Review : NOTES

This is a short review of community detection methods (on graphs), and their applications. A **community** is a subset of a network whose members are *highly connected* among themselves, but *loosely connected* to others outside their community. Different community detection methods *can return differing communities*, since these algorithms are **heuristic-based**. **Dynamic community detection** involves tracking the *evolution of community structure* over time.
https://gist.github.com/wolfram77/09e64d6ba3ef080db5558feb2d32fdc0
Communities can be of the following **types**:
- Disjoint
- Overlapping
- Hierarchical
- Local
The following **static** community detection **methods** exist:
- Spectral-based
- Statistical inference
- Optimization
- Dynamics-based
The following **dynamic** community detection **methods** exist:
- Independent community detection and matching
- Dependent community detection (evolutionary)
- Simultaneous community detection on all snapshots
- Dynamic community detection on temporal networks
**Applications** of community detection include:
- Criminal identification
- Fraud detection
- Criminal activities detection
- Bot detection
- Dynamics of epidemic spreading (dynamic)
- Cancer/tumor detection
- Tissue/organ detection
- Evolution of influence (dynamic)
- Astroturfing
- Customer segmentation
- Recommendation systems
- Social network analysis (both)
- Network summarization
- Privacy, group segmentation
- Link prediction (both)
- Community evolution prediction (dynamic, hot field)
<br>
<br>
## References
- [Application Areas of Community Detection: A Review : PAPER](https://ieeexplore.ieee.org/document/8625349)

Community Detection on the GPU : NOTES

This paper discusses a GPU implementation of the Louvain community detection algorithm. The Louvain algorithm obtains hierarchical communities as a dendrogram through modularity optimization. Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbours which gives the greatest increase in modularity. If moving to no neighbour's community leads to an increase in modularity, the vertex stays in its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between the respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops on the respective super-vertices (again, edge weights are combined). Together, the local moving and the aggregation phases constitute a stage. This super-vertex graph is then used as input for the next stage. This process continues until the increase in modularity falls below a certain threshold. As a result, from each stage we obtain a hierarchy of community memberships for each vertex as a dendrogram.
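The aggregation phase can be sketched in a few lines; this is a hypothetical Python illustration of the idea, not the paper's GPU implementation:

```python
from collections import defaultdict

def aggregate(edges, community):
    """Louvain aggregation phase sketch: collapse each community into one
    super-vertex. Inter-community edges combine their weights into a single
    super-edge; intra-community edges combine into a self-loop.
    edges: list of (u, v, weight); community: dict vertex -> community id."""
    super_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        key = (min(cu, cv), max(cu, cv))  # undirected: normalize endpoint order
        super_edges[key] += w             # (cu == cv) entries are self-loops
    return dict(super_edges)

# Two communities {0,1} and {2,3} with one bridge edge (1,2).
edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 1.0)]
comm = {0: 0, 1: 0, 2: 1, 3: 1}
print(aggregate(edges, comm))  # → {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 1.0}
```

The self-loop weights preserve each community's internal edge mass, so modularity computed on the super-vertex graph matches that of the original partition.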
Approaches to performing the Louvain algorithm can be divided into coarse-grained and fine-grained. Coarse-grained approaches process a set of vertices in parallel, while fine-grained approaches process all vertices in parallel. A coarse-grained hybrid multi-GPU algorithm has been implemented by Cheong et al., which grabbed my attention. In addition, their algorithm does not use hashing for the local moving phase, but instead sorts each neighbour list based on the community id of each vertex.
https://gist.github.com/wolfram77/7e72c9b8c18c18ab908ae76262099329

Survey for extra-child-process package : NOTES

Useful additions to inbuilt child_process module.
📦 Node.js, 📜 Files, 📰 Docs.
Please see attached PDF for literature survey.
https://gist.github.com/wolfram77/d936da570d7bf73f95d1513d4368573e

Fast Incremental Community Detection on Dynamic Graphs : NOTES

In this paper, the authors describe two approaches for dynamic community detection using the CNM algorithm. CNM is a hierarchical, agglomerative algorithm that greedily maximizes modularity. They define two approaches: BasicDyn and FastDyn. BasicDyn backtracks merges of communities until each marked (changed) vertex is its own singleton community. FastDyn undoes a merge only if the quality of the merge, as measured by the induced change in modularity, has significantly decreased compared to when the merge initially took place. FastDyn also allows more than two vertices to contract together if, in the previous time step, these vertices eventually ended up contracted into the same community. In the static case, merging several vertices together in one contraction phase could lead to deteriorating results. FastDyn is able to do this, however, because it uses information from the merges of the previous time step. Intuitively, merges that previously occurred are more likely to be acceptable later.
https://gist.github.com/wolfram77/1856b108334cc822cdddfdfa7334792a
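As a rough sketch of the merge decisions involved: below is the standard CNM modularity gain for merging two communities, plus a FastDyn-style undo check. The 0.5 threshold and both function names are my own assumptions, not values from the paper:

```python
def merge_gain(e_ij, a_i, a_j):
    """Standard CNM modularity gain for merging communities i and j.
    e_ij: fraction of edges running between i and j;
    a_i, a_j: fraction of edge endpoints falling in i and j."""
    return 2 * (e_ij - a_i * a_j)

def should_undo(gain_now, gain_then, factor=0.5):
    """FastDyn-style backtracking check (hypothetical threshold): undo a
    past merge only if its gain has dropped significantly since it was
    made, rather than backtracking every merge as BasicDyn does."""
    return gain_now < factor * gain_then

g = merge_gain(0.1, 0.2, 0.3)  # 2 * (0.1 - 0.06) = 0.08
print(should_undo(0.03, g))    # gain fell below half its old value → True
print(should_undo(0.05, g))    # still above half the old gain → False
```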

Can you fix farming by going back 8000 years : NOTES

1. The human population didn't explode, but plateaued.
2. Fertilizer prices are skyrocketing.
3. Farmers are looking for alternatives such as animal waste (manure) or even human waste.
4. Manure prices are also going up.
5. Switching to organic farming is not an option.
https://gist.github.com/wolfram77/49067fc3ddc1ba2e1db4f873056fd88a

HITS algorithm : NOTES

1. Webpages tend to behave as authorities or hubs.
2. An authority is like a research thesis, and a hub is like an encyclopedia.
3. Each page has an authority score and a hub score.
4. The graph is built per query, including pages pointed to and pointed from.
5. The authority score of a page is the sum of the scores of all hubs pointing to it.
6. The hub score of a page is the sum of the scores of all authorities it points to.
7. Scores are normalized with the L2-norm in each iteration (root of sum of squares).
8. Needs to be performed at query time.
9. Two scores are returned, instead of just one.
https://gist.github.com/wolfram77/3d9ef6c5a5b63f53caabce4812c7ea81
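The iteration in points 5-7 above can be sketched as follows; this is a minimal Python illustration, with the function name and edge-list graph format being my assumptions:

```python
import math

def hits(edges, iters=50):
    """HITS sketch: authority score = sum of hub scores of in-neighbours;
    hub score = sum of authority scores of out-neighbours; both vectors
    are L2-normalized every iteration. edges: list of (src, dst) pairs."""
    nodes = {u for e in edges for u in e}
    auth = {v: 1.0 for v in nodes}
    hub = {v: 1.0 for v in nodes}
    for _ in range(iters):
        auth = {v: sum(hub[u] for u, w in edges if w == v) for v in nodes}
        hub = {v: sum(auth[w] for u, w in edges if u == v) for v in nodes}
        na = math.sqrt(sum(x * x for x in auth.values()))  # L2 norms
        nh = math.sqrt(sum(x * x for x in hub.values()))
        auth = {v: x / na for v, x in auth.items()}
        hub = {v: x / nh for v, x in hub.items()}
    return auth, hub

# 'a' and 'b' both point to 'c': 'c' becomes the authority, 'a'/'b' the hubs.
auth, hub = hits([('a', 'c'), ('b', 'c')])
```

On this tiny graph the scores converge immediately: `auth['c']` is 1.0 and the two hubs share equal hub scores, matching the intuition in points 1-2.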

Basic Computer Architecture and the Case for GPUs : NOTES

Computer architectures are facing issues:
- Memory latencies are far higher.
- Benefits from instruction-level parallelism (ILP) are diminishing.
- With increasing clock rates, power consumption is increasing.
- Complexity is increasing, with multi-stage pipelines, intermediate buffers, multi-level caches, out-of-order execution, branch prediction, ...
GPUs are parallel computer architectures that are good at some tasks, not so good at others. Running routines with high arithmetic intensity, with overlapped memory access, is the preferred approach. They may be unsuitable for irregular algorithms, where it is difficult to get high efficiency due to the high latency of accesses. They are less versatile compared to CPUs, using SIMD parallelism, and are compute-dense (per unit cost). NVIDIA's CUDA programming model enables GPUs to be used for general-purpose computing, hence the term GPGPU.
GPU Architectural, Programming, and Performance Models presentation at PPoPP, 2010, Bangalore, India.
By Prof. Kishore Kothapalli with Prof. P. J. Narayanan and Suryakant Patidar.
https://gist.github.com/wolfram77/43a6660121eef45b78c10d4e652dad6c

Are Satellites Covered in Gold Foil : NOTES

Satellites are usually covered in aluminized polyimide. The yellowish-gold color of polyimide, with the silver aluminium side facing inward, gives the satellite the appearance of being wrapped in gold. The material is called Multi-Layer Insulation (MLI). It provides radiative insulation for the onboard instruments of the satellite. Gold is actually used in electrical contacts to prevent corrosion due to ultraviolet light or X-rays.
https://gist.github.com/wolfram77/8ae2de1a29caf1a2f84babed79943389

Dynamic Batch Parallel Algorithms for Updating PageRank (Poster abstract for IPDPS 2022 PhD Forum)
Subhajit Sahu†, Kishore Kothapalli† and Dip Sankar Banerjee‡
†International Institute of Information Technology Hyderabad, India. ‡Indian Institute of Technology Jodhpur, India.
subhajit.sahu@research.,kkishore@iiit.ac.in, dipsankarb@iitj.ac.in
May 4, 2022

We present two new parallel algorithms for recomputing the PageRank values of only the vertices affected by the insertion/deletion of a batch of edges in a dynamic graph. One algorithm, named DYNAMICLEVELWISEPR, computes updated ranks of vertices in topological order of affected SCCs. PageRank computation is performed on each affected level of SCCs in sequential order, from the topmost unprocessed level, until convergence. This avoids unnecessary recomputation of SCCs that depend upon the ranks of vertices in other SCCs which have not yet converged. The other algorithm, DYNAMICMONOLITHICPR, computes updated ranks of vertices in one go, but groups affected vertices by SCCs and partitions them by in-degree, to obtain a better work balance on the GPU. Both algorithms accept the previous and current snapshot of a graph as input, along with the previous ranks of the vertices. From each changed SCC, a DFS is performed in order to obtain the list of affected SCCs. We group vertices by SCCs to ensure good memory locality. On the GPU, each affected SCC is processed with a thread-per-vertex and a block-per-vertex CUDA kernel after partitioning. However, to reduce the number of kernel calls, we combine small affected SCCs together until they satisfy a minimum work requirement of 10M vertices. Computation is performed on the CSR representation of the graph.

We conduct experimental studies of our algorithms on a set of 11 real-world graphs. Self-loops are added to dead ends in all the graphs. Their order |V| varies from 75k to 41M vertices, and size |E| varies from 524k to 1.1B edges. We experiment with batch sizes of 500 to 10000 edges. Each batch is randomly generated with an equal mix of insertions and deletions, such that edges connecting vertices with high out-degrees have a greater chance of selection. This is done in order to mimic the behaviour of real-world dynamic graphs. A fair comparison is ensured except in cases beyond our control. The measured time in all cases is the rank computation time.

Our results on an Intel Xeon Silver 4116 CPU and an NVIDIA Tesla V100 PCIe 16GB GPU indicate that DYNAMICMONOLITHICPR and DYNAMICLEVELWISEPR outperform static STIC-D PageRank by 6.1× and 8.6× on the CPU, and naive dynamic nvGraph PageRank by 9.8× and 9.3× on the GPU, respectively. In addition, we observe a mean speedup of 4.2× and 5.8× on the CPU over a pure CPU implementation of HyPR, and a mean speedup of 1.9× and 1.8× on the GPU over a pure GPU implementation of HyPR, respectively. We also compare the performance of the algorithms in batched mode to cumulative single-edge updates. A batch update of 5000 edges offers a speedup of 4066× and 2998× for DYNAMICMONOLITHICPR and DYNAMICLEVELWISEPR respectively on the CPU, and a speedup of 1712× and 2324× respectively on the GPU. We therefore conclude that DYNAMICLEVELWISEPR is a suitable approach for CPUs. On a GPU, however, smaller levels/components could be combined and processed together in order to improve GPU usage efficiency, as DYNAMICMONOLITHICPR suggests.
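The core idea shared by both algorithms — re-running power iteration only over affected vertices while reusing previous ranks elsewhere — can be sketched as follows. This is a deliberately simplified, hypothetical Python illustration; the actual algorithms add SCC levels, in-degree partitioning, and CUDA kernels:

```python
def pagerank_affected(graph, ranks, affected, alpha=0.85, tol=1e-10, iters=100):
    """Dynamic PageRank sketch: after a batch of edge updates, iterate the
    PageRank equation only for 'affected' vertices, keeping previous ranks
    for everything else. graph: dict vertex -> list of out-neighbours
    (dead ends assumed already removed, e.g. via self-loops)."""
    n = len(graph)
    # Collect in-edges of affected vertices only.
    incoming = {v: [] for v in affected}
    for u, outs in graph.items():
        for v in outs:
            if v in incoming:
                incoming[v].append(u)
    for _ in range(iters):
        # Jacobi-style update: read old ranks, then commit the new ones.
        new = {v: (1 - alpha) / n
                  + alpha * sum(ranks[u] / len(graph[u]) for u in incoming[v])
               for v in affected}
        delta = max(abs(new[v] - ranks[v]) for v in affected)
        ranks.update(new)
        if delta < tol:          # stop once affected ranks have converged
            break
    return ranks

# Toy 3-cycle with perturbed previous ranks; stationary ranks are all 1/3.
graph = {0: [1], 1: [2], 2: [0]}
ranks = pagerank_affected(graph, {0: 0.5, 1: 0.25, 2: 0.25}, affected={0, 1, 2})
```

Marking only the reachable/affected region (rather than all vertices, as in this toy call) is what yields the large batched speedups reported above.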