Graphs are the natural data structure for representing relations. Graph algorithms exhibit irregular memory access patterns, which causes distributed-memory parallel graph algorithms to do more communication than computation: the more work an algorithm generates, the more communication it needs to do. The amount of work can be reduced with frequent synchronization, but the overhead of frequent synchronization itself reduces the performance of distributed-memory parallel graph algorithms. The Abstract Graph Machine (AGM) is a model that can control both the amount of synchronization and the amount of work generated by an algorithm.
The document discusses distributed linear classification on Apache Spark. It describes using Spark to train logistic regression and linear support vector machine models on large datasets. Spark improves on MapReduce by conducting communications in-memory and supporting fault tolerance. The paper proposes using a trust region Newton method to optimize the objective functions for logistic regression and linear SVM. Conjugate gradient is used to solve the Newton system through Hessian-vector products, without explicitly storing the large Hessian.
2014-06-20 Multinomial Logistic Regression with Apache Spark, by DB Tsai
Logistic regression can be used not only for modeling binary outcomes but also, with some extension, multinomial outcomes. In this talk, DB will present the basic idea of binary logistic regression step by step and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (in the number of training samples). However, there is a mathematical limitation on scaling vertically (in the number of training features), while many recent applications, from document classification to computational linguistics, are of this type. He will talk about how to address this problem with an L-BFGS optimizer instead of a Newton optimizer.
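The binary-to-multinomial step amounts to swapping the sigmoid for a softmax. A minimal NumPy sketch of the loss gradient (not Spark MLlib code; the per-row work is what Spark would distribute over cached RDD partitions):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def mlr_gradient(W, X, y, k):
    # Gradient of the multinomial logistic loss: one pass over the rows
    # of X, which is the part Spark parallelizes across partitions.
    P = softmax(X @ W)           # n x k predicted class probabilities
    Y = np.eye(k)[y]             # one-hot encoded labels
    return X.T @ (P - Y) / len(y)
```

An optimizer such as L-BFGS needs only this gradient and the loss value, which is why it scales to many features better than forming an explicit Newton system.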
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led the Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits as a PhD student at Stanford.
This document summarizes several graph algorithms, including:
1) Prim's algorithm for finding minimum spanning trees, which grows a minimum spanning tree by successively adding the closest vertex;
2) Dijkstra's algorithm for single-source shortest paths, which is similar to Prim's and finds shortest paths from a source vertex to all others;
3) An algorithm for all-pairs shortest paths based on repeated squaring of the weighted adjacency matrix, with multiplication replaced by minimization.
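The repeated-squaring idea in 3) can be sketched with NumPy by replacing multiplication with addition and addition with minimization in the matrix product (an illustrative sketch, not code from the document):

```python
import numpy as np

INF = np.inf

def min_plus(A, B):
    # (A ⊗ B)[i, j] = min over k of A[i, k] + B[k, j]
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def all_pairs_shortest_paths(W):
    # log2(n) squarings extend 1-edge shortest paths to full shortest paths
    D, n, m = W.copy(), W.shape[0], 1
    while m < n - 1:
        D = min_plus(D, D)
        m *= 2
    return D
```

Each squaring doubles the maximum path length considered, so O(log n) min-plus products suffice.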
This document describes parametric surface visualization using DirectX 11. It discusses representing surfaces defined by degree, knot vectors, and control points by generating a grid of surface points. The surface is rendered by forming triangles between the points. Normal vectors can be estimated to add per-pixel lighting effects. The number of points used can be controlled dynamically by varying the parameter step size. Examples of surfaces rendered with and without lighting are shown.
Algorithm Design and Complexity - Course 8, by Traian Rebedea
The document discusses algorithms for finding strongly connected components (SCCs) in directed graphs. It describes Kosaraju's algorithm, which uses two depth-first searches (DFS) to find the SCCs. The first DFS computes a finishing time ordering, while the second DFS uses the transpose graph and the finishing time ordering to find the SCCs, outputting each SCC as a separate DFS tree. The algorithm runs in O(V+E) time and uses the property that the first DFS provides a topological sorting of the SCCs graph.
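The two DFS passes can be sketched compactly. A minimal Python illustration of Kosaraju's algorithm (recursive, so suitable only for small graphs):

```python
def kosaraju(graph):
    # graph: dict node -> list of successors
    order, seen = [], set()

    def dfs1(u):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                dfs1(v)
        order.append(u)            # post-order = increasing finish time

    for u in graph:
        if u not in seen:
            dfs1(u)

    # build the transpose graph
    gt = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            gt.setdefault(v, []).append(u)

    sccs, seen = [], set()

    def dfs2(u, comp):
        seen.add(u)
        comp.append(u)
        for v in gt.get(u, []):
            if v not in seen:
                dfs2(v, comp)

    # second pass: decreasing finish time on the transpose graph;
    # each DFS tree is one strongly connected component
    for u in reversed(order):
        if u not in seen:
            comp = []
            dfs2(u, comp)
            sccs.append(comp)
    return sccs
```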
This document discusses graph algorithms and graph search techniques. It begins with an introduction to graphs and their representations as adjacency matrices and adjacency lists. It then covers graph terminology like vertices, edges, paths, cycles, and weighted graphs. Common graph search algorithms like breadth-first search and depth-first search are explained. Variations of these algorithms like recursive depth-first search and Dijkstra's algorithm for finding shortest paths in weighted graphs are also covered. Examples are provided throughout to illustrate the concepts and algorithms.
The document discusses randomized graph algorithms and techniques for analyzing them. It describes a linear time algorithm for finding minimum spanning trees (MST) that samples edges and uses Boruvka's algorithm and edge filtering. It also discusses Karger's algorithm for approximating the global minimum cut in near-linear time using edge contractions. Finally, it presents an approach for 3-approximate distance oracles that preprocesses a graph to build a data structure for answering approximate shortest path queries in constant time using landmark vertices and storing local and global distance information.
Scaling out logistic regression with Spark, by Barak Gitsis
This document discusses scaling out logistic regression with Apache Spark. It describes the need to classify a large number of websites using machine learning. Several approaches to logistic regression were tried, including a single-machine Java implementation before moving to Spark for better scalability. Spark's L-BFGS algorithm was chosen for its out-of-the-box distributed logistic regression solution. Challenges of implementing logistic regression at large scale are discussed, such as overfitting and regularization. Methods used to address these challenges include L2 regularization, cross-validation to select the regularization parameter, and extensions made to Spark's L-BFGS implementation.
Storage, manipulation and analysis of raster data with PostGIS Raster, by ACSG Section Montréal
One of the most important new features of the open-source spatial database PostgreSQL/PostGIS 2.0 is support for raster data. PostGIS Raster includes an import tool similar to shp2pgsql, based on GDAL, and a series of SQL operators for manipulating and analyzing raster data. The new RASTER type is georeferenced, multi-resolution, and multi-band, and it supports a nodata value and a pixel value type per band. PostGIS Raster draws on the simplicity of the vector experience offered by PostGIS to make all raster operations as simple as possible. As with a vector coverage, a raster coverage is divided into a set of records (one row = one tile) stored in a single table (unlike Oracle Spatial, which uses two types and therefore two or more tables). It is possible to import a complete coverage and retile it in a single command with the import tool, and multiple resolutions of the same coverage can be imported into adjacent tables. The properties of raster objects and of each band can be queried and modified, as can pixel values. Functions exist to obtain the minimum, maximum, sum, mean, standard deviation, and histogram of a tile or of a complete coverage. The ST_Intersection() and ST_Intersects() functions work almost transparently between raster and vector data, and a series of map algebra functions (ST_MapAlgebra()) enables raster-style analysis. Bands can be reclassified and converted to any GDAL-writable format. Functions for generating rasters and bands also exist for PL/pgSQL development. A GDAL driver for converting raster coverages to image files is under development, and plugins for QGIS and gvSIG already exist to visualize them.
Gradient descent optimization with simple examples, covering SGD, mini-batch, momentum, AdaGrad, RMSProp, and Adam. Made for people with little knowledge of neural networks.
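Two of the update rules covered, momentum and Adam, on a toy one-dimensional objective; a hedged sketch, not the slides' examples:

```python
import numpy as np

def sgd_momentum(grad, x0, lr=0.1, beta=0.9, steps=100):
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad(x)             # accumulate a velocity term
        x = x - lr * v
    return x

def adam(grad, x0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    x = np.asarray(x0, dtype=float)
    m, v = np.zeros_like(x), np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        x = x - lr * mhat / (np.sqrt(vhat) + eps)
    return x
```

On f(x) = (x - 3)^2, both drive x toward the minimum at 3; AdaGrad and RMSProp differ from Adam mainly in how the squared-gradient accumulator is maintained.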
Stochastic gradient descent and its tuning, by Arsalan Qadri
This paper discusses optimization algorithms used for big data applications. We start by explaining the gradient descent algorithm and its limitations. We then delve into stochastic gradient descent algorithms and explore methods to improve them by adjusting learning rates.
Big Data Analysis with Signal Processing on Graphs, by Mohamed Seif
This document discusses signal processing on graphs and big data analysis using graph theory concepts. It begins with introducing fundamental graph theory terms like nodes, edges, and adjacency matrices. It then explains how to define graph signals and how signal processing concepts like shifting, filtering, and Fourier transforms can be generalized to graphs. In particular, it describes how the graph shift replaces time shifts, graph filters are polynomials of the graph shift matrix, and the graph Fourier transform uses the eigenvectors of the graph shift matrix as the basis. The document concludes by discussing how eigenvalues represent frequencies on graphs and how filters affect the frequency content of graph signals.
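The key identities (filters as polynomials of the graph shift, the GFT as projection onto its eigenvectors) can be checked numerically. A small NumPy sketch on a 4-node ring, assuming an undirected adjacency matrix as the shift:

```python
import numpy as np

# adjacency matrix of an undirected 4-node ring, used as the graph shift
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

evals, V = np.linalg.eigh(A)              # eigenvectors form the GFT basis

def gft(s):
    return V.T @ s                        # graph Fourier transform

# a graph filter is a polynomial in the shift, here H = h0*I + h1*A
h0, h1 = 0.5, 0.25
H = h0 * np.eye(4) + h1 * A

s = np.array([1.0, 0.0, 0.0, 0.0])        # an impulse signal on vertex 0
```

Filtering in the vertex domain equals pointwise scaling in the spectral domain: gft(H @ s) matches (h0 + h1 * evals) * gft(s), the filter's frequency response applied to the graph frequencies.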
This document introduces Mahout Scala and Spark bindings, which aim to provide an R-like environment for machine learning on Spark. The bindings define algebraic expressions for distributed linear algebra using Spark and provide optimizations. They define data types for scalars, vectors, matrices and distributed row matrices. Features include common linear algebra operations, decompositions, construction/collection functions, HDFS persistence, and optimization strategies. The goal is a high-level semantic environment that can run interactively on Spark.
The Shortest Path Tour Problem is an extension of the standard Shortest Path Problem that first appeared in the scientific literature in Bertsekas's 2005 book on dynamic programming and optimal control. This paper gives a description of the problem and two algorithms to solve it. Results of the numerical experiments are presented as graphs. Finally, conclusions and discussion are given.
Sparse Kernel Learning for Image Annotation, by Sean Moran
The document describes an approach called Sparse Kernel Continuous Relevance Model (SKL-CRM) for image annotation. SKL-CRM learns data-adaptive visual kernels to better combine different image features like GIST, SIFT, color, and texture. It introduces a binary kernel-feature alignment matrix to learn which kernel functions are best suited to which features by directly optimizing annotation performance on a validation set. Evaluation on standard datasets shows SKL-CRM improves over baselines with fixed 'default' kernels, achieving a relative gain of 10-15% in F1 score.
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME, by HONGJOO LEE
A 45-minute talk about collecting home network performance measures, analyzing and forecasting time series data, and building an anomaly detection system.
In this talk, we will go through the whole process of data mining and knowledge discovery. First we write a script to run a speed test periodically and log the metric. Then we parse the log data, convert it into a time series, and visualize the data for a certain period.
Next we conduct some data analysis: finding trends, forecasting, and detecting anomalous data. Several statistical and deep learning techniques are used for the analysis, including ARIMA (Autoregressive Integrated Moving Average) and LSTM (Long Short-Term Memory).
Dijkstra's algorithm finds the shortest paths from a source node to the other nodes in a graph. It works by maintaining a priority queue of tentative distances, updating a distance whenever a shorter path is found, and repeating until all shortest paths are determined. The time complexity is O(E + V log V) with a Fibonacci heap, or O(E log V) with a binary heap. Dijkstra's algorithm only works for graphs without negative edge costs. It has applications in mapping, routing, and other networks.
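The loop described above fits in a few lines with a binary heap; a minimal Python sketch using lazy deletion of stale queue entries rather than decrease-key:

```python
import heapq

def dijkstra(adj, src):
    # adj: dict u -> list of (v, weight); weights must be non-negative
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                       # stale entry, already improved
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd               # shorter path found: relax edge
                heapq.heappush(pq, (nd, v))
    return dist
```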
This document discusses parallel algorithms for tree problems. It introduces the Euler tour technique for representing trees as lists to allow parallel processing. The technique converts trees to Eulerian graphs by adding edges. It also discusses parallel depth-first search using Euler tours. Tree contraction is presented as a method to evaluate arithmetic expressions represented as binary trees in parallel by successively merging nodes.
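The list representation behind the Euler tour technique can be sketched sequentially; a hedged Python illustration (the parallel version would compute all successor pointers independently and then list-rank them concurrently):

```python
def euler_tour(adj, root=0):
    # adj: dict node -> list of neighbours (a fixed rotation per node).
    # Each tree edge is replaced by two directed copies; the successor
    # of directed edge (u, v) is (v, w), where w follows u in v's rotation.
    succ = {}
    for v, nbrs in adj.items():
        for i, u in enumerate(nbrs):
            w = nbrs[(i + 1) % len(nbrs)]
            succ[(u, v)] = (v, w)
    # following the successor pointers lists the tour sequentially
    start = (root, adj[root][0])
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]
    return tour
```

The tour visits every directed edge copy exactly once, giving a list of length 2(n - 1) for a tree with n nodes.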
Injecting image priors into Learnable Compressive Subsampling, by Martino Ferrari
My master thesis work extends the problem formulation of learnable compressive subsampling [1], which focuses on learning the best sampling operator in the Fourier domain adapted to the spectral properties of a training set of images. I formulated the problem as a reconstruction from a finite number of sparse samples with a prior learned from an external dataset or learned on the fly from the images to be reconstructed. In more detail, I developed two very different methods, one using multiband coding in the spectral domain and the second using a neural network.
The new methods can be applied to many different fields of spectroscopy and Fourier optics, for example in medical (computerized tomography, magnetic resonance spectroscopy) and astronomy (the Square Kilometre Array) imaging, where the capability to reconstruct high-quality images, in the pixel domain, from a limited number of samples, in the frequency domain, is a key issue.
The proposed methods have been tested on diverse datasets covering facial images, medical and multi-band astronomical data, using the mean square error and SSIM as a perceptual measure of the quality of the reconstruction.
Finally, I explored possible applications in data acquisition systems such as computed tomography and radio astronomy. The obtained results demonstrate that the properties of the proposed methods have very promising potential for future research and extensions. For this reason, the work was both presented at the poster session of the EUSIPCO 2018 conference in Rome and submitted for an EU patent.
[1] L. Baldassarre, Y.-H. Li, J. Scarlett, B. Gözcü, I. Bogunovic, and V. Cevher, "Learning-based compressive subsampling," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 809–822, 2016.
Jorge Silva, Sr. Research Statistician Developer, SAS, at MLconf ATL - 9/18/15, by MLconf
Estimating the Number of Clusters in Big Data with the Aligned Box Criterion: Finding the number, k, of clusters in a dataset is a fundamental problem in unsupervised learning. It is also an important business problem, e.g. in market segmentation. Existing approaches include the silhouette measure, the gap statistic and Dirichlet process clustering. For thirty years SAS procedures have included the option of using the cubic clustering criterion (CCC) to estimate k. While CCC remains competitive, we propose a significant and original improvement, referred to herein as the aligned box criterion (ABC). Like CCC, ABC is based on a hypothesis-testing framework, but instead of a heuristic measure we use data-adaptive reference distributions to generate more realistic null hypotheses in a scalable and easily parallelizable manner. We have implemented ABC using SAS’ High Performance Analytics platform, and achieve state-of-the-art accuracy in the estimation of k.
This document summarizes a presentation given by Nesreen K. Ahmed on graph sampling techniques. It discusses previous work on sampling large graphs to estimate properties like triangle counts. Existing methods either require multiple passes over the data or make assumptions about the graph stream order. The presentation introduces a new single-pass Graph Priority Sampling framework that can estimate properties in an unbiased way using a fixed-size sample. It assigns edge weights and priorities to sample edges proportional to their contribution to graph structures. Estimates can be updated incrementally during the stream or retrospectively after it ends. The framework is evaluated on real-world graphs with billions of edges to estimate triangle counts, wedge counts, and clustering coefficients with low variance.
Algorithm Design and Complexity - Course 12, by Traian Rebedea
This document provides an overview of algorithm design and complexity, specifically focusing on flow networks. It defines key concepts related to flow networks including positive flow, net flow, flow properties, maximum flow problems, and minimum cut. It also describes the Ford-Fulkerson method and Edmonds-Karp algorithm for solving maximum flow problems using the concept of a residual network given an initial flow.
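The Edmonds-Karp variant described (BFS for the shortest augmenting path in the residual network) can be sketched as follows; a compact Python illustration, not the course's code:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    # cap: dict (u, v) -> capacity; mutated in place as the residual network
    nodes = {u for e in cap for u in e}
    adj = {u: set() for u in nodes}
    for (u, v) in list(cap):
        adj[u].add(v)
        adj[v].add(u)
        cap.setdefault((v, u), 0)          # reverse residual edge
    flow = 0
    while True:
        # BFS finds the shortest augmenting path in the residual network
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                    # no augmenting path: max flow
        # collect the path, find its bottleneck, and augment along it
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[e] for e in path)
        for (u, v) in path:
            cap[(u, v)] -= b
            cap[(v, u)] += b
        flow += b
```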
Co-occurrence Based Recommendations with Mahout, Scala and Spark, by sscdotopen
This document discusses techniques for co-occurrence-based recommendations using Apache Mahout, Scala, and Spark. It describes how Mahout computes the co-occurrence matrix AᵀA using a row-outer-product formulation that executes in a single pass over the row-partitioned matrix A. It also explains how the computation is optimized physically by using specialized operators like Transpose-Times-Self to avoid repartitioning the matrix. Finally, it provides examples of how the distributed computation of AᵀA is implemented across worker nodes.
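The row-outer-product formulation can be illustrated in a few lines; a NumPy sketch of the idea, not Mahout's Scala code. Each worker sums the outer products of its local rows in a single pass, and the per-partition partial sums add up to AᵀA:

```python
import numpy as np

# toy user-item interaction matrix A: rows = users, columns = items
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])

# co-occurrence C = AᵀA as a sum of row outer products: no repartitioning
# of A is needed, because each row contributes independently
C = np.zeros((A.shape[1], A.shape[1]), dtype=int)
for row in A:
    C += np.outer(row, row)

assert (C == A.T @ A).all()
```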
This document provides an overview of separation logic, including:
- Applications include program analysis, verified software, and axiomatic semantics.
- Future work may focus on logics beyond pre/post conditions to specify order of actions or observable program states.
- SpaceInvader is an implementation of compositional shape analysis via bi-abduction that uses separation logic to reason about mutable data structures.
- Smallfoot is an earlier tool that used symbolic execution and a decidable fragment of separation logic to perform automatic reasoning with Hoare logic for a toy language.
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsMatt Moores
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
This document discusses implementing various graph algorithms using GraphBLAS kernels. It describes how degree filtered breadth-first search, k-truss detection, calculating the Jaccard index, and non-negative matrix factorization can be expressed using operations like sparse matrix multiplication, element-wise multiplication, scaling and reduction. The goal is to demonstrate how fundamental graph problems can be solved within the GraphBLAS framework using linear algebraic formulations of graph computations.
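The linear-algebraic view can be illustrated with a dense NumPy analogue of a GraphBLAS breadth-first search (real GraphBLAS would use sparse matrices and a boolean semiring, but the structure is the same):

```python
import numpy as np

# adjacency matrix of a small directed graph: 0->1, 0->2, 1->3, 2->3
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

def bfs_levels(A, src):
    n = A.shape[0]
    level = np.full(n, -1)
    level[src] = 0
    frontier = np.zeros(n, dtype=int)
    frontier[src] = 1
    d = 0
    while frontier.any():
        d += 1
        # one matrix-vector product per level: neighbours of the current
        # frontier, masked by the set of still-unvisited vertices
        nxt = (frontier @ A > 0) & (level < 0)
        level[nxt] = d
        frontier = nxt.astype(int)
    return level
```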
Scaling out logistic regression with SparkBarak Gitsis
This document discusses scaling out logistic regression with Apache Spark. It describes the need to classify a large number of websites using machine learning. Several approaches to logistic regression were tried, including a single machine Java implementation and moving to Spark for better scalability. Spark's L-BFGS algorithm was chosen for its out of the box distributed logistic regression solution. Challenges implementing logistic regression at large scale are discussed, such as overfitting and regularization. Methods used to address these challenges include L2 regularization, cross-validation to select the regularization parameter, and extensions made to Spark's LBFGS implementation.
Stockage, manipulation et analyse de données matricielles avec PostGIS RasterACSG Section Montréal
La plus importantes nouveautés de la base de données spatiale open source PostgreSQL/PostGIS 2.0 est le support pour les données raster. PostGIS Raster comprend un outil d’importation similaire à shp2pgsql basé sur GDAL et une série d’opérateurs SQL pour la manipulation et l'analyse des données matricielles. Le nouveau type RASTER est géoréférencé, multi-résolutions et multi-bandes et il supporte une valeur nulle (nodata) et un type de valeur de pixel par bande. PostGIS raster s’inspire de la simplicité de l’expérience vecteur offerte par PostGIS pour rendre toutes les opérations raster aussi simples que possible. Comme pour une couverture vecteur, une couverture raster est divisée en un ensemble d’enregistrements (une ligne = une tuile) stockés dans une seule table (contrairement à Oracle Spatial qui utilise deux types et donc deux tables ou plus). Il est possible d’importer une couverture complète et de la retuiler en une seule commande avec l’outil d’importation et de multiples résolutions de la même couverture peuvent être importées dans des tables adjacentes. Les propriétés des objets raster et de chacune des bandes peuvent être consultées et modifiées ainsi que les valeurs des pixels. Des fonctions existent pour obtenir le minimum, le maximum, la somme, la moyenne, la déviation standard, l’histogramme d’une tuile ou d’une couverture complète. Les fonctions ST_Intersection() et ST_Intersects() fonctionnent pratiquement de manière transparente entre des données raster et vecteur et une série de fonctions pour l’algèbre matricielle (ST_MapAlgebra()) permet de faire de l’analyse de type raster. Il est possible de reclasser les bandes et de les convertir en n’importe quel format d’écriture GDAL. Des fonctions pour générer des rasters et des bandes existent également pour du développement PL/pgSQL. Un driver GDAL pour convertir les couvertures raster en fichiers images est en développement et des plugins pour QGIS et svSIG existent déjà pour les visualiser.
Gradient descent optimization with simple examples. covers sgd, mini-batch, momentum, adagrad, rmsprop and adam.
Made for people with little knowledge of neural network.
Stochastic gradient descent and its tuningArsalan Qadri
This paper talks about optimization algorithms used for big data applications. We start with explaining the gradient descent algorithms and its limitations. Later we delve into the stochastic gradient descent algorithms and explore methods to improve it it by adjusting learning rates.
Big Data Analysis with Signal Processing on GraphsMohamed Seif
This document discusses signal processing on graphs and big data analysis using graph theory concepts. It begins with introducing fundamental graph theory terms like nodes, edges, and adjacency matrices. It then explains how to define graph signals and how signal processing concepts like shifting, filtering, and Fourier transforms can be generalized to graphs. In particular, it describes how the graph shift replaces time shifts, graph filters are polynomials of the graph shift matrix, and the graph Fourier transform uses the eigenvectors of the graph shift matrix as the basis. The document concludes by discussing how eigenvalues represent frequencies on graphs and how filters affect the frequency content of graph signals.
This document introduces Mahout Scala and Spark bindings, which aim to provide an R-like environment for machine learning on Spark. The bindings define algebraic expressions for distributed linear algebra using Spark and provide optimizations. They define data types for scalars, vectors, matrices and distributed row matrices. Features include common linear algebra operations, decompositions, construction/collection functions, HDFS persistence, and optimization strategies. The goal is a high-level semantic environment that can run interactively on Spark.
The Shortest Path Tour Problem is an extension to the normal Shortest Path Problem and appeared in the scientific literature in Bertsekas's dynamic programming and optimal control book in 2005, for the first time. This paper gives a description of the problem, two algorithms to solve it. Results to the numeric experimentation are given in terms of graphs. Finally, conclusion and discussions are made.
Sparse Kernel Learning for Image AnnotationSean Moran
The document describes an approach called Sparse Kernel Continuous Relevance Model (SKL-CRM) for image annotation. SKL-CRM learns data-adaptive visual kernels to better combine different image features like GIST, SIFT, color, and texture. It introduces a binary kernel-feature alignment matrix to learn which kernel functions are best suited to which features by directly optimizing annotation performance on a validation set. Evaluation on standard datasets shows SKL-CRM improves over baselines with fixed 'default' kernels, achieving a relative gain of 10-15% in F1 score.
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
45 min talk about collecting home network performance measures, analyzing and forecasting time series data, and building anomaly detection system.
In this talk, we will go through the whole process of data mining and knowledge discovery. Firstly we write a script to run speed test periodically and log the metric. Then we parse the log data and convert them into a time series and visualize the data for a certain period.
Next we conduct some data analysis; finding trends, forecasting, and detecting anomalous data. There will be several statistic or deep learning techniques used for the analysis; ARIMA (Autoregressive Integrated Moving Average), LSTM (Long Short Term Memory).
Dijkstra's algorithm is used to find the shortest paths between nodes in a graph. It works by maintaining a tentative distances priority queue, updating distances if shorter paths are found, and repeating until all shortest paths are determined. The time complexity is O(E+V log V) if using a Fibonacci heap, or O(E log E + V) for other priority queues. Dijkstra's algorithm only works for graphs without negative edge costs. It has applications in mapping, routing, and other networks.
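The procedure described above can be sketched with Python's `heapq` as the priority queue (a binary heap, so the O(E log E + V) bound applies rather than the Fibonacci-heap one). The example graph is an illustrative assumption.

```python
import heapq

def dijkstra(graph, source):
    """graph: {u: [(v, weight), ...]}; returns shortest distances from source."""
    dist = {source: 0}
    pq = [(0, source)]                      # (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):   # stale queue entry; skip it
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd                # found a shorter path to v
                heapq.heappush(pq, (nd, v))
    return dist

g = {'a': [('b', 2), ('c', 5)], 'b': [('c', 1), ('d', 4)], 'c': [('d', 1)]}
print(dijkstra(g, 'a'))   # {'a': 0, 'b': 2, 'c': 3, 'd': 4}
```

Note the stale-entry check: instead of decreasing keys in place, the sketch pushes duplicates and discards outdated ones on pop, which is the usual idiom with `heapq`.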
This document discusses parallel algorithms for tree problems. It introduces the Euler tour technique for representing trees as lists to allow parallel processing. The technique converts trees to Eulerian graphs by adding edges. It also discusses parallel depth-first search using Euler tours. Tree contraction is presented as a method to evaluate arithmetic expressions represented as binary trees in parallel by successively merging nodes.
Injecting image priors into Learnable Compressive SubsamplingMartino Ferrari
My master thesis work extends the problem formulation of learnable compressive subsampling [1], which focuses on learning the best sampling operator in the Fourier domain adapted to the spectral properties of a training set of images. I formulated the problem as a reconstruction from a finite number of sparse samples with a prior learned from an external dataset or learned on the fly from the images to be reconstructed. In more detail, I developed two very different methods: one using multiband coding in the spectral domain and the second using a neural network.
The new methods can be applied to many different fields of spectroscopy and Fourier optics, for example in medical (computerized tomography, magnetic resonance spectroscopy) and astronomy (the Square Kilometre Array) imaging, where the capability to reconstruct high-quality images, in the pixel domain, from a limited number of samples, in the frequency domain, is a key issue.
The proposed methods have been tested on diverse datasets covering facial images, medical and multi-band astronomical data, using the mean square error and SSIM as a perceptual measure of the quality of the reconstruction.
Finally, I explored possible applications in data acquisition systems such as computed tomography and radio astronomy. The obtained results demonstrate that the properties of the proposed methods have very promising potential for future research and extensions.
For such reason, the work was both presented at the poster session of the EUSIPCO 2018 conference in Rome and submitted for a EU patent.
[1] L. Baldassarre, Y.-H. Li, J. Scarlett, B. Gözcü, I. Bogunovic, and V. Cevher, “Learning-based compressive subsampling,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 809–822, 2016.
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15MLconf
Estimating the Number of Clusters in Big Data with the Aligned Box Criterion: Finding the number, k, of clusters in a dataset is a fundamental problem in unsupervised learning. It is also an important business problem, e.g. in market segmentation. Existing approaches include the silhouette measure, the gap statistic and Dirichlet process clustering. For thirty years SAS procedures have included the option of using the cubic clustering criterion (CCC) to estimate k. While CCC remains competitive, we propose a significant and original improvement, referred to herein as the aligned box criterion (ABC). Like CCC, ABC is based on a hypothesis-testing framework, but instead of a heuristic measure we use data-adaptive reference distributions to generate more realistic null hypotheses in a scalable and easily parallelizable manner. We have implemented ABC using SAS’ High Performance Analytics platform, and achieve state-of-the-art accuracy in the estimation of k.
This document summarizes a presentation given by Nesreen K. Ahmed on graph sampling techniques. It discusses previous work on sampling large graphs to estimate properties like triangle counts. Existing methods either require multiple passes over the data or make assumptions about the graph stream order. The presentation introduces a new single-pass Graph Priority Sampling framework that can estimate properties in an unbiased way using a fixed-size sample. It assigns edge weights and priorities to sample edges proportional to their contribution to graph structures. Estimates can be updated incrementally during the stream or retrospectively after it ends. The framework is evaluated on real-world graphs with billions of edges to estimate triangle counts, wedge counts, and clustering coefficients with low variance.
Algorithm Design and Complexity - Course 12Traian Rebedea
This document provides an overview of algorithm design and complexity, specifically focusing on flow networks. It defines key concepts related to flow networks including positive flow, net flow, flow properties, maximum flow problems, and minimum cut. It also describes the Ford-Fulkerson method and Edmonds-Karp algorithm for solving maximum flow problems using the concept of a residual network given an initial flow.
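The Edmonds-Karp variant summarized above (Ford-Fulkerson with BFS-chosen augmenting paths over a residual network) can be sketched compactly; the capacity graph below is an illustrative assumption.

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """cap: dict-of-dicts of edge capacities; returns max flow from s to t."""
    # Build residual capacities, adding zero-capacity reverse edges.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                   # no augmenting path left
        # Recover the path, find its bottleneck capacity, and augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck       # residual back-edge gains capacity
        flow += bottleneck

caps = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}, 't': {}}
print(edmonds_karp(caps, 's', 't'))   # 5
```

Choosing the shortest augmenting path at each step is what gives Edmonds-Karp its O(V E^2) bound, independent of the capacity values.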
Co-occurrence Based Recommendations with Mahout, Scala and Sparksscdotopen
This document discusses techniques for co-occurrence-based recommendations using Apache Mahout, Scala, and Spark. It describes how Mahout computes the co-occurrence matrix ATA using a row-outer product formulation that executes in a single pass over the row-partitioned matrix A. It also explains how the computation is optimized physically by using specialized operators like Transpose-Times-Self to avoid repartitioning the matrix. Finally, it provides examples of how the distributed computation of ATA is implemented across worker nodes.
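The row-outer-product formulation mentioned above rests on the identity that A^T A equals the sum, over the rows a_i of A, of the outer products a_i a_i^T, which is what allows computing it in a single pass over a row-partitioned matrix. A minimal dense sketch (the matrix is an illustrative assumption; Mahout operates on sparse distributed matrices):

```python
def ata_by_outer_products(A):
    """Compute A^T A as the sum of outer products of the rows of A."""
    n = len(A[0])
    ata = [[0] * n for _ in range(n)]
    for row in A:                       # a single pass over the rows of A
        for i, ri in enumerate(row):
            if ri == 0:
                continue                # skip zeros, as one would for sparse rows
            for j, rj in enumerate(row):
                ata[i][j] += ri * rj    # accumulate the outer product row*row^T
    return ata

A = [[1, 0, 2],
     [0, 3, 1]]
print(ata_by_outer_products(A))   # [[1, 0, 2], [0, 9, 3], [2, 3, 5]]
```

In the distributed setting each worker computes the outer products for its partition of rows and the partial A^T A matrices are summed, which is why no repartitioning of A is needed.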
This document provides an overview of separation logic, including:
- Applications include program analysis, verified software, and axiomatic semantics.
- Future work may focus on logics beyond pre/post conditions to specify order of actions or observable program states.
- SpaceInvader is an implementation of compositional shape analysis via bi-abduction that uses separation logic to reason about mutable data structures.
- Smallfoot is an earlier tool that used symbolic execution and a decidable fragment of separation logic to perform automatic reasoning with Hoare logic for a toy language.
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsMatt Moores
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
This document discusses implementing various graph algorithms using GraphBLAS kernels. It describes how degree filtered breadth-first search, k-truss detection, calculating the Jaccard index, and non-negative matrix factorization can be expressed using operations like sparse matrix multiplication, element-wise multiplication, scaling and reduction. The goal is to demonstrate how fundamental graph problems can be solved within the GraphBLAS framework using linear algebraic formulations of graph computations.
This document discusses implementing various graph algorithms using GraphBLAS kernels. It describes how degree filtered breadth-first search, k-truss detection, calculating the Jaccard index, and non-negative matrix factorization can be expressed using operations like SpGEMM, SpMV, element-wise multiplication, and scaling. The goal is to demonstrate how common graph analytics can utilize the linear algebra approach of the GraphBLAS framework.
Support Vector Machines in MapReduce presented an overview of support vector machines (SVMs) and how to implement them in a MapReduce framework to handle large datasets. The document discussed the theory behind basic linear SVMs and generalized multi-classification SVMs. It explained how to parallelize SVM training using stochastic gradient descent and randomly distributing samples across mappers and reducers. The document also addressed handling non-linear SVMs using kernel methods and approximations that allow SVMs to be treated as a linear problem in MapReduce. Finally, examples were given of large companies using SVMs trained on MapReduce to perform customer segmentation and improve inventory value.
The document describes several graph layout programs including dot, neato, twopi, circo, fdp, and sfdp. These programs take graph files as input and output drawings of the graphs in various formats like PostScript, SVG, and bitmap images. The programs use different algorithms to determine the layout, such as hierarchies for dot, springs for neato and fdp, radial layouts for twopi, and circular layouts for circo. The document provides details on the command line syntax, input graph file format, and attributes that control the graph drawing output.
Advanced Data Science with Apache Spark-(Reza Zadeh, Stanford)Spark Summit
The document discusses graph computations and Pregel, an API for graph processing. It introduces Pregel's vertex-centric programming model where computation is organized into supersteps and depends only on neighboring vertices. Examples like PageRank are shown implemented in Pregel. GraphX is also introduced as a library providing Pregel-like abstractions on Spark. The document then discusses distributing matrix computations, covering partitioning schemes for matrices and how to distribute operations like multiplication and singular value decomposition (SVD) across a cluster.
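The vertex-centric superstep model described above can be illustrated with a minimal PageRank sketch: in each superstep every vertex sends its rank along its out-edges, then gathers the incoming messages. The tiny link graph is an illustrative assumption, and the sketch assumes every vertex has at least one out-link (no dangling nodes).

```python
def pagerank(out_edges, damping=0.85, steps=30):
    """out_edges: {v: [targets]}; assumes every vertex has out-links."""
    verts = list(out_edges)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    for _ in range(steps):                      # one superstep per iteration
        msgs = {v: 0.0 for v in verts}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for t in targets:                   # scatter rank along out-edges
                msgs[t] += share
        rank = {v: (1 - damping) / n + damping * msgs[v] for v in verts}
    return rank

links = {'a': ['b'], 'b': ['c'], 'c': ['a']}
r = pagerank(links)   # on a 3-cycle every vertex converges to rank 1/3
```

The same scatter/gather structure is what Pregel and GraphX's Pregel API express, with the message exchange happening between supersteps.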
CS 354 Transformation, Clipping, and CullingMark Kilgard
This document summarizes a lecture on graphics transformations, clipping, and culling. It discusses how vertex positions are transformed from object space to normalized device coordinates space using the modelview and projection matrices. It also covers generalized clipping against the view frustum and user-defined clip planes, as well as back face culling. The lecture provides examples of translation, rotation, scaling, orthographic, and perspective transformations.
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
http://flink-forward.org/kb_sessions/to-petascale-and-beyond-apache-flink-in-the-clouds/
Apache Flink performs with low latency but can also scale to great heights. Gelly is Flink’s laboratory for building and tuning scalable graph algorithms and analytics. In this talk we’ll discuss writing algorithms optimized for the Flink architecture, assembling and configuring a cloud compute cluster, and boosting performance through benchmarking and system profiling. This talk will cover recent developments in the Gelly library to include scalable graph generators and a mixed collection of modular algorithms written with native Flink operators. We’ll think like a data stream, keep a cool cache, and send the garbage collector on holiday. To this we’ll add a lightweight benchmarking harness to stress and validate core Flink and to identify and refactor hot code with aplomb.
NV_path_rendering is an OpenGL extension for CUDA-capable NVIDIA GPUs for performing resolution-independent 2D rendering. Standards such as Scalable Vector Graphics (SVG), PostScript, PDF, Adobe Flash, and TrueType fonts rely on path rendering. With NV_path_rendering, this important class of rendering is accelerated by the GPU in a way that co-exists with conventional 3D rendering.
For more information see:
http://developer.nvidia.com/nv-path-rendering
Here are the steps to plot the given functions using MATLAB:
1. Plot y = 0.4x + 1.8 for 0 ≤ x ≤ 35 and 0 ≤ y ≤ 3.5:
x = 0:35;
y = 0.4.*x + 1.8;
plot(x,y)
xlim([0 35])
ylim([0 3.5])
2. Plot imaginary vs real parts of 0.2 + 0.8i*n for 0 ≤ n ≤ 20:
n = 0:20;
z = 0.2 + 0.8i*n;
plot(real(z),imag(z))
xlabel('Real Part')
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
Real-world graphs are seldom static. Applications that generate graph-structured data today do so continuously, giving rise to an underlying graph whose structure evolves over time. Mining these time-evolving graphs can be insightful, both from research and business perspectives. While several works have focused on some individual aspects, there exists no general purpose time-evolving graph processing engine.
We present Tegra, a time-evolving graph processing system built on a general-purpose dataflow framework. We introduce Timelapse, a flexible abstraction that enables efficient analytics on evolving graphs by allowing graph-parallel stages to iterate over the complete history of nodes. We use Timelapse to present two computational models: a temporal analysis model for performing computations on multiple snapshots of an evolving graph, and a generalized incremental computation model for efficiently updating the results of computations.
This document discusses several graph algorithms:
1) Topological sort is an ordering of the vertices of a directed acyclic graph (DAG) such that for every edge from vertex u to v, u comes before v in the ordering. It can be used to find a valid schedule respecting dependencies.
2) Strongly connected components are maximal subsets of vertices in a directed graph such that there is a path between every pair of vertices. An algorithm uses depth-first search to find SCCs in linear time.
3) Minimum spanning trees find a subset of edges that connects all vertices at minimum total cost. Prim's and Kruskal's algorithms find minimum spanning trees using greedy strategies in O(E
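The topological sort described in point 1 above can be sketched with Kahn's algorithm: repeatedly emit a vertex with no remaining unsatisfied prerequisites. The dependency graph below is an illustrative assumption.

```python
from collections import deque

def topological_sort(edges):
    """edges: {u: [v, ...]} meaning u must come before v; returns an ordering."""
    indeg = {u: 0 for u in edges}
    for targets in edges.values():
        for v in targets:
            indeg[v] = indeg.get(v, 0) + 1
    q = deque(sorted(u for u, d in indeg.items() if d == 0))
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in edges.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:          # all prerequisites of v are now placed
                q.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph has a cycle; no topological order exists")
    return order

deps = {'shirt': ['tie'], 'tie': ['jacket'], 'trousers': ['jacket'], 'jacket': []}
print(topological_sort(deps))
```

Because only vertices whose in-degree has dropped to zero are ever emitted, every edge (u, v) is guaranteed to have u before v in the result, which is exactly the scheduling property described above.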
This document discusses algorithms that can be implemented using MapReduce, including sorting, searching, TF-IDF, breadth-first search, PageRank, and more advanced algorithms. It provides details on how sorting, searching, TF-IDF, breadth-first search, and PageRank algorithms work in MapReduce, including explaining the map and reduce phases. It also discusses graph representations that can be used for algorithms like breadth-first search and PageRank and how the algorithms are distributed across parallel tasks in MapReduce.
The document provides an introduction to MATLAB and Simulink. It describes MATLAB as a numerical computing environment and matrix laboratory that is used for data analysis, algorithm development, modeling, and more across many disciplines. Simulink is introduced as a block diagram environment for multi-domain simulation and model-based design. Key features and uses of MATLAB and Simulink are outlined, including acquiring and analyzing data, developing functions and algorithms, modeling and simulation.
Scalable and Adaptive Graph Querying with MapReduceKyong-Ha Lee
This document summarizes a research paper that proposes a distributed graph querying algorithm called MR-Graph that employs MapReduce. MR-Graph uses a filter-and-verify scheme to first filter graphs based on contained features before verifying subgraph isomorphism. It also adaptively tunes the feature size at runtime by sampling data graphs to determine the most appropriate size. The experiments showed MR-Graph outperforms conventional algorithms in scalability and efficiency for processing multiple graph queries over massive datasets.
The document provides an algorithm and sample program to implement Bresenham's circle drawing algorithm in C.
The algorithm reads the radius of the circle, initializes the starting points and decision variable, and then uses a do-while loop to plot pixels on the circle by incrementing x and conditionally incrementing or decrementing y based on the decision variable.
The sample program includes code to read the radius, initialize graphics mode, set starting points, and implement the do-while loop to plot pixels and delay between each pixel for visualization. It plots all four quadrants of the circle.
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonChristopher Conlan
This talk will provide a very brief overview of graph algorithms and their expression using sparse linear algebra, followed by a high-level description of the GraphBLAS library and its usage.
Graphs are among the most important abstract data types in computer science, and the algorithms that operate on them are critical to modern life. Algorithms on graphs are applied in many ways in today's world—from Web rankings to metabolic networks, from finite element meshes to semantic graphs. Graphs have been shown to be powerful tools for modeling these complex problems because of their simplicity and generality. GraphBLAS is an API specification that defines standard building blocks for graph algorithms in the language of linear algebra. Graph algorithms have long taken advantage of the idea that a graph can be represented as a matrix, and graph operations can be performed as linear transformations and other linear algebraic operations on sparse matrices. For example, matrix-vector multiplication can be used to perform a step in a breadth-first search. The GraphBLAS specification (and the various libraries that implement it) provides data structures and functions to compute these linear algebraic operations. In particular, the GraphBLAS specifies sparse matrix objects which map well to graphs where vertices are likely connected to relatively few neighbors (i.e. the degree of a vertex is significantly smaller than the total number of vertices in the graph). The benefits of this approach are reduced algorithmic complexity, ease of implementation, and improved performance.
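The claim above that a matrix-vector multiplication performs a breadth-first-search step can be shown with a toy sketch: multiplying the current frontier vector through the adjacency matrix over a Boolean semiring (OR in place of addition, AND in place of multiplication) yields the next frontier. The small graph is an illustrative assumption; GraphBLAS implementations do this on sparse structures.

```python
def bfs_step(adj, frontier, visited):
    """One BFS step: next[v] is True iff an unvisited v has an edge from the frontier."""
    n = len(adj)
    nxt = [False] * n
    for u in range(n):
        if frontier[u]:
            for v in range(n):
                # Boolean semiring: OR over (frontier[u] AND adj[u][v])
                nxt[v] = nxt[v] or (bool(adj[u][v]) and not visited[v])
    return nxt

# Edges: 0 -> 1, 0 -> 2, 1 -> 2
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
frontier = [True, False, False]      # BFS starts at vertex 0
visited = [True, False, False]
print(bfs_step(adj, frontier, visited))   # [False, True, True]
```

Masking out already-visited vertices, as done here, corresponds to the complemented-mask argument of GraphBLAS operations.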
Similar to ABSTRACT GRAPH MACHINE: MODELING ORDERINGS IN ASYNCHRONOUS DISTRIBUTED-MEMORY PARALLEL GRAPH ALGORITHMS (20)
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...AbdullaAlAsif1
The pygmy halfbeak Dermogenys colletei, is known for its viviparous nature, this presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the Pygmy Halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that the Pygmy halfbeak, D. colletei, may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study lends to a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
ESPP presentation to EU Waste Water Network, 4th June 2024 “EU policies driving nutrient removal and recycling
and the revised UWWTD (Urban Waste Water Treatment Directive)”
3. Graphs
• A graph is an ordered pair G = (V, E), where V is a set of vertices and E ⊆ V × V is a set of edges.
Abstract Graph Machine
Our Focus.
4. Graph Applications (Graph Problems)
• Single Source Shortest Paths
• Breadth First Search
• Connected Components
• Maximal Independent Set
• Minimum Spanning Tree
• Graph Coloring
• And many more …
Multiple algorithms can solve a graph problem. E.g., for SSSP:
- Dijkstra’s algorithm
- Bellman-Ford
- Etc.
5. Why is Performance in Parallel Graph Algorithms Challenging?
• Low compute/communication ratio (caused by irregularity)
• Synchronization overhead
• Higher amount of work
• Many dependencies
- Input graph
- The algorithm logic
- The underlying runtime
- Etc.
6. Regular vs. Irregular
• Regular: communication only takes place at the data boundary, so the communication/computation ratio is low.
• Irregular: communication takes place everywhere, so the communication/computation ratio is high.
7. Synchronization Overhead
• Many existing parallel graph algorithms are designed with shared-memory parallel architectures in mind.
• When we “extend” these algorithms for distributed-memory execution, they end up with many synchronization phases. Synchronization overhead is significant when processing a large graph on many distributed nodes.
8. Shared-Memory Parallel Graph Algorithms
• Shared-memory parallel algorithms can be described abstractly using the Parallel Random Access Machine (PRAM) model.
[Figure: independent processors, each with its own memory (e.g., registers), attached to a shared memory.]
9. Shared-Memory Parallel Graph Algorithms
[Figure: a PRAM algorithm consists of multiple parallel phases separated by synchronization steps.]
These algorithms can be naturally extended to distributed-memory using Bulk Synchronous Parallel (BSP).
10. Shared-Memory to Distributed-Memory
[Figure: in the distributed-memory version, each synchronization step becomes a global barrier.]
This executes many global barriers. Synchronization overhead is significant, especially when executing the program on many nodes.
13. Asynchronous Graph Algorithms
[Figure: level-synchronous BFS, using “S” as the source, synchronizing after each level.]
Once a label is set, there are no corrections and no additional work.
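The level-synchronous BFS on this slide can be sketched as follows: every vertex in the current frontier is expanded, then execution synchronizes before the next level begins, and a label, once set, is never corrected. The sample graph and source “S” are illustrative assumptions.

```python
def level_sync_bfs(graph, source):
    """graph: {u: [v, ...]}; returns the BFS level of every reachable vertex."""
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        # ---- barrier: every vertex of level `depth` is done before we go on ----
        depth += 1
        nxt = set()
        for u in frontier:
            for v in graph.get(u, []):
                if v not in level:        # label set exactly once, never corrected
                    level[v] = depth
                    nxt.add(v)
        frontier = nxt
    return level

g = {'S': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': []}
print(sorted(level_sync_bfs(g, 'S').items()))   # [('S', 0), ('a', 1), ('b', 1), ('c', 2)]
```

The number of barriers equals the number of levels, i.e., the BFS depth of the graph, which is why this strategy suits low-diameter graphs.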
16. Asynchronous Graph Algorithms
• Asynchronous graph algorithms are better in the sense that they avoid the overhead of synchronization.
• BUT, they tend to generate a lot of redundant work.
- High runtime
17. How much Synchronization?
• Low diameter, enough parallel work in each level, fewer barriers (diameter ~ barriers). E.g., Twitter, ER. Level-synchronous execution is better.
• High diameter, not enough parallel work in a level. E.g., Road NW. Asynchronous execution is better.
We need a way to control the amount of synchronization needed by a graph algorithm. The Abstract Graph Machine (AGM) is a model that can control the level of synchronization of a graph algorithm using an ordering.
18. SSSP as an Example
• In an undirected weighted graph G = (V, E), the Single-Source Shortest Path problem is to find a path to every vertex such that the sum of edge weights along that path is minimized.
19. Algorithms
void Chaotic(Graph g, Vertex source) {
  For each Vertex v in G {
    state[v] <- INFINITY;
  }
  Relax(source, 0);
}
void Relax(Vertex v, Distance d) {  // processes vertices in arbitrary order
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      Relax(u, d + weight[e]);
    }
  }
}

void Dijkstra(Graph g, Vertex source) {
  For each Vertex v in G {
    state[v] <- INFINITY;
  }
  pq.insert(<source, 0>);
  While pq is not empty {
    <v, d> = vertex-distance pair with minimum distance in pq;
    Relax(v, d);
  }
}
void Relax(Vertex v, Distance d) {  // puts vertices in a priority queue
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      pq.insert(<u, d + weight[e]>);
    }
  }
}

void DeltaStepping(Graph g, Vertex source) {
  For each Vertex v in G {
    state[v] <- INFINITY;
  }
  insert <source, 0> into the appropriate bucket;
  While some bucket is non-empty {
    bucket = smallest non-empty bucket in buckets;
    For each <v, d> in bucket {
      Relax(v, d);
    }
  }
}
void Relax(Vertex v, Distance d) {  // puts vertices in ∆ buckets
  If (d < state[v]) {
    state[v] <- d;
    For each edge e of v {
      Vertex u = target_vertex(e);
      insert <u, d + weight[e]> into the appropriate bucket;
    }
  }
}
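The ∆-Stepping pseudocode above can be sketched as runnable sequential C++ (the real algorithm processes each bucket in parallel); the bucket map and the edge-list layout are illustrative choices, not the framework's implementation.

```cpp
// Sequential sketch of ∆-Stepping. Bucket b holds <v, d> workitems
// with d in [b*delta, (b+1)*delta); the smallest non-empty bucket is
// processed first. Workitems relaxed back into the current distance
// range re-create the bucket under the same key and run next.
#include <limits>
#include <map>
#include <utility>
#include <vector>

using Vertex = int;
using Distance = int;
using Edge = std::pair<Vertex, Distance>;  // (target vertex, weight)

std::vector<Distance> delta_stepping(const std::vector<std::vector<Edge>>& g,
                                     Vertex source, Distance delta) {
  const Distance INF = std::numeric_limits<Distance>::max();
  std::vector<Distance> state(g.size(), INF);
  std::map<Distance, std::vector<std::pair<Vertex, Distance>>> buckets;
  buckets[0].push_back({source, 0});
  while (!buckets.empty()) {
    auto it = buckets.begin();           // smallest non-empty bucket
    auto bucket = std::move(it->second);
    buckets.erase(it);
    for (auto [v, d] : bucket) {         // Relax each <v, d> in the bucket
      if (d < state[v]) {
        state[v] = d;
        for (auto [u, w] : g[v])         // insert generated workitems
          buckets[(d + w) / delta].push_back({u, d + w});
      }
    }
  }
  return state;
}
```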
20. SSSP Algorithms Processing Work
• Chaotic: vertex-distance pairs generated by the Relax function are fed back to Relax directly.
• ∆-Stepping: generated vertex-distance pairs are ordered based on ∆; each generated pair is inserted into the appropriate bucket, and the pairs in the first bucket are processed first.
• KLA: vertex-distance pairs are ordered based on the level from the source vertex; buckets are created based on a level interval k.
• Dijkstra: vertex-distance pairs are ordered based on the distance; the pairs with the smallest distance are processed first.
In all four algorithms, the Relax state update is the same; only the ordering of updates differs.
21. Ordering of SSSP Algorithms
• Chaotic: generated vertex-distance pairs are selected randomly and not ordered. A single partition; unordered.
• ∆-Stepping: generated pairs are separated into several partitions based on a relation defined on the distance and ∆. Multiple partitions: unordered within a ∆ bucket, ordered between buckets.
• KLA: generated pairs are separated into partitions based on the level and k. Multiple partitions: unordered within k levels, ordered between them.
• Dijkstra: generated pairs are separated into partitions based on the distinct distance values. Multiple partitions, ordered by distance.
22. Distributed Memory Implementations
Setup: two ranks (R0 and R1); vertices are distributed (e.g., even vertices in R0 and odd vertices in R1).
• Chaotic: barriers are executed only at the beginning and at the end of the algorithm.
• ∆-Stepping: a barrier is executed after processing each ∆ bucket.
• KLA: a barrier is executed after processing k levels.
• Dijkstra: a barrier is executed after processing the vertex-distance pairs of each distinct distance.
23. Abstract Graph Machine for SSSP
A strict weak ordering relation is defined on work units. The relation creates equivalence classes and induces an ordering on the generated equivalence classes.
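A minimal sketch of such a relation on <vertex, distance> workitems: the ∆-bucket comparator below (DELTA is an illustrative constant, not the framework's) is a strict weak ordering, and two workitems are equivalent exactly when neither precedes the other, i.e., when they fall in the same ∆ bucket.

```cpp
// Sketch: a strict weak ordering on workitems and the equivalence it
// induces. The equivalence classes are exactly the ∆ buckets.
#include <tuple>

using WorkItem = std::tuple<int, int>;   // <vertex, distance>
constexpr int DELTA = 10;                // illustrative bucket width

struct delta_order {                     // strict weak ordering
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) / DELTA < std::get<1>(b) / DELTA;
  }
};

// Two workitems are in the same equivalence class when neither
// precedes the other under the strict weak ordering.
bool equivalent(const WorkItem& a, const WorkItem& b) {
  delta_order lt;
  return !lt(a, b) && !lt(b, a);
}
```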
24. Abstract Graph Machine
• Abstract model that represents graph algorithms as a processing function and a strict weak ordering relation
• In AGM for SSSP, the (vertex, distance) pair is called a workitem
• An AGM consists of:
A definition of a graph (vertices, edges, vertex attributes, edge attributes)
A set of states (of the computation, e.g., distances)
A definition of workitems
A processing function
A strict weak ordering relation
An initial workitem set (to start the algorithm)
25. The Abstract Graph Machine (AGM)
An AGM consists of:
- Definition of the graph, including vertex and edge attributes (SSSP: edge weight)
- Definition of workitems
- Set of states (SSSP: distance from source)
- Processing function (SSSP: relax)
- Strict weak ordering (SSSP: depends on the algorithm!)
- Initial set of workitems (SSSP: starting vertex)
26. SSSP Algorithms in AGM
• In general, an SSSP algorithm's workitem consists of a vertex and a distance. The set of workitems for the AGM is defined accordingly.
• A workitem w is a pair (e.g., w = <v, d>). The "[]" operator retrieves values associated with w: w[0] returns the vertex associated with the workitem and w[1] returns the distance.
• The state of the algorithm (the shortest distance calculated at a point in time) is stored in a state we call "distance".
27. The Processing Function for SSSP
The input workitem is a pair: w[0] returns the vertex associated with the workitem and w[1] returns the associated distance.
1. Check the condition: if the input workitem's distance is better than the stored distance, go to 2.
2. Update the distance and go to 3.
3. Generate new workitems.
This is relax() in AGM notation.
28. State Update & Work Generation
State update logic.
Work generation
logic.
30. Algorithm Family
• All algorithms share the same processing function but with different
orderings.
• The collection of algorithms that share the same processing function
is called an algorithm family.
31. Distributed ∆–Stepping Algorithm
• Partitions are created by the global ordering, based on the strict weak ordering relation.
• The first partition corresponds to the first bucket created by the delta-stepping algorithm.
• Work within a bucket is not ordered; the partitions themselves are ordered.
• Shown: the ∆-Stepping algorithm on two ranks (R0, R1).
• Huge opportunity for acceleration.
32. Extended AGM (EAGM): ∆–Stepping + thread level ordering
• The AGM creates buckets with a global ordering, based on the strict weak ordering relation. For a specific algorithm, work within a bucket is not ordered.
• But why stay unordered? Each rank can process its bucket with a better ordering, and the ranks can do this in parallel.
33. Extended AGM (EAGM): Hybrid Hierarchical Ordering
• As we distribute/parallelize, we create different spatial domains for processing.
• Extended AGM: use different orderings in each domain.
34. Extended AGM in General
• An AGM takes a processing function and a strict weak ordering relation.
• An EAGM takes a processing function and a hierarchy of strict weak ordering relations attached to each spatial domain: global ordering, node ordering, NUMA ordering, and thread ordering.
35. Extended AGM (EAGM): E.g., ∆–Stepping with NUMA-level ordering
• Work within each bucket is further ordered using NUMA-local priority queues.
(Shown: the ordering for the B1 bucket and NUMA domain 1 in Rank 0.)
36. Extended AGM (EAGM): ∆–Stepping with node-level ordering
• Work within each bucket is further ordered using node-local priority queues.
(Shown: the ordering for the B1 bucket in Rank 0.)
38. AGM Graph Processing Framework
// The processing function for SSSP. A "WorkItem" is a vertex and a distance.
template<typename buckets>
void PF(const WorkItem& wi,
        int tid,
        buckets& outset) {
  Vertex v = std::get<0>(wi);
  Distance d = std::get<1>(wi);
  Distance old_dist = vdistance[v];
  while (d < old_dist) {
    // CAS = atomic compare & swap on the stored distance.
    Distance din = CAS(d, &vdistance[v]);
    if (din == old_dist) {
      // State updated: generate new workitems and push them for ordering.
      FORALL_OUTEDGES_T(v, e, g, Graph) {
        Vertex u = boost::target(e, g);
        Weight we = boost::get(vweight, e);
        WorkItem wigen(u, (d + we));
        outset.push(wigen, tid);
      }
      break;
    } else {
      old_dist = din;
    }
  }
}
//================== Dijkstra Ordering =====================
template<int index>
struct dijkstra : public base_ordering {
public:
static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_dijkstra;
template <typename T>
bool operator()(T i, T j) {
return (std::get<index>(i) < std::get<index>(j));
}
eagm_ordering name() {
return ORDERING_NAME;
}
};
The strict weak ordering partitions WorkItems; in this case the ordering is based on the distance. In PF, CAS is an atomic compare & swap; after executing a workitem, the processing function generates new workitems and pushes them into the out set for ordering.
39. Invoking the Framework
// SSSP (AGM) algorithm
typedef agm<Graph,
WorkItem,
ProcessingFunction,
StrictWeakOrdering,
RuntimeModelGen> sssp_agm_t;
sssp_agm_t ssspalgo(rtmodelgen,
ordering,
pf,
initial);
Creating the AGM for SSSP. Template parameters: the WorkItem type, the processing function (pre-order) type, the strict weak ordering relation type, and the underlying runtime (the AGM framework abstracts out the runtime).
Constructor arguments: the runtime instance, the specific ordering function (e.g., delta ordering), the processing function instance, and the initial workitem set (e.g., the source vertex).
40. Mapping AGM to an Implementation
Q1. How should we do the ordering?
Q2. How should we arrange the processing function and the ordering?
(From the abstract model to an actual implementation, including the communication method.)
41. Ordering Implementation
• Every compute node keeps a set of equivalence classes locally in a data structure.
• Equivalence classes are ordered according to the strict weak ordering (swo); since the swo is transitive, ordering the classes by their representatives is well defined.
• Every equivalence class has:
- A representative work-item
- An append buffer that holds the work items for the equivalence class, i.e., all the work items that are not comparable to the representative
• Equivalence classes are processed by shared-memory parallel threads (discussed later).
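A single-threaded sketch of this layout (the real structure is concurrent): a std::map keyed by a strict weak ordering treats mutually incomparable keys as equal, so each map entry is one equivalence class, with the key as the representative and the value as the append buffer. The delta_order comparator and DELTA constant are illustrative.

```cpp
// Sketch: equivalence classes held in a map ordered by the strict
// weak ordering. Inserting a workitem either lands in the class whose
// representative it is equivalent to, or creates a new class with the
// workitem as representative. begin() is the minimum class.
#include <map>
#include <tuple>
#include <vector>

using WorkItem = std::tuple<int, int>;   // <vertex, distance>
constexpr int DELTA = 10;                // illustrative bucket width

struct delta_order {                     // strict weak ordering
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) / DELTA < std::get<1>(b) / DELTA;
  }
};

// key = representative workitem, value = append buffer
using Classes = std::map<WorkItem, std::vector<WorkItem>, delta_order>;

void insert(Classes& classes, const WorkItem& wi) {
  // operator[] finds the class equivalent to wi under delta_order,
  // or creates a new class with wi as the representative.
  classes[wi].push_back(wi);
}
```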
42. Ordering Implementation: Inserts to an Equivalence Class
• Find the equivalence class by comparing with the representative work items: the inserting element belongs to the class whose representative it is not comparable with.
• If there is no such equivalence class, create one with the inserting work item as the representative.
- Must deal with concurrency.
43. Ordering Implementation: Processing an Equivalence Class
• Find the equivalence class with the minimum representative workitem.
- Every rank sends the representative work item of the first equivalence class in its list (an all-reduce communicates the minimum work items).
- Select the smallest representative.
• Equivalence classes are not distributed uniformly: if a particular node does not have an equivalence class matching the global minimum representative, it inserts one.
44. Ordering Implementation: Termination of an Equivalence Class
• While processing an equivalence class, more work can be added to the same equivalence class.
• Every node keeps track of the number of work items pushed into the equivalence class and the number of work items processed in it.
• The difference between those two numbers is globally reduced (sum); when the sum is zero, the framework decides that the equivalence class is fully processed.
• Counts are reduced after making sure all messages have been exchanged (i.e., after an epoch).
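A minimal sketch of the counting logic, with the all-reduce simulated as a plain sum over per-rank counters (struct and function names are illustrative, not the framework's API):

```cpp
// Sketch: per-rank pushed/processed counters for one equivalence
// class. The class terminates when the globally summed difference
// (pushed - processed) reaches zero; in the real implementation this
// sum is an all-reduce performed after an epoch.
#include <vector>

struct ClassCounters {
  long pushed = 0;     // workitems pushed into the class on this rank
  long processed = 0;  // workitems processed from it on this rank
};

bool terminated(const std::vector<ClassCounters>& ranks) {
  long pending = 0;    // simulated all-reduce (sum) over all ranks
  for (const auto& c : ranks) pending += c.pushed - c.processed;
  return pending == 0;
}
```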
45. AGM Implementation: Data Structure
• A data structure is needed to hold the equivalence classes.
- Later, this data structure will be further extended for use at the EAGM levels.
• Requirements of the data structure:
- Should order the equivalence classes by representative work items
- Fast lookups are important
- Ability to concurrently insert/delete and lookup/search data
Sounds like a Dictionary ADT.
46. Data Structure
• Binary Search Trees (BST), with locks: e.g., RB-trees. Rebalancing often requires locking the whole data structure for insert/delete.
• Linked list, with locks and atomics: not much difference between the locked and lock-free versions; no need to lock the whole structure (only the node being modified), but linear lookup time.
• Concurrent SkipList: a probabilistic alternative with the same complexity guarantees as a BST and no rebalancing, but much higher contention.
• Partitioning scheme: explained on the next slide.
47. Data Structure: Partitioning Scheme
• Every node maintains a set of vectors, one per thread; push back is O(1) (assuming no resize).
• Every thread maintains the minimum work item seen so far.
• To extract the next equivalence class: find the global minimum representative, then partition each vector by the global minimum using the strict weak ordering relation and move the partitioned work to a new vector (O(n)).
• "Thread buckets" is a data structure with one vector per thread; it represents the next equivalence class to be processed.
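A sketch of the extraction step for one thread's vector, assuming rep is the already-computed global minimum representative; std::partition moves the workitems equivalent to rep (under an illustrative ∆ ordering) into the bucket to be processed next, leaving the rest behind:

```cpp
// Sketch: split the first equivalence class out of a per-thread
// vector. Because rep is the global minimum, no workitem precedes it,
// so "equivalent to rep" reduces to "rep does not precede it".
#include <algorithm>
#include <tuple>
#include <vector>

using WorkItem = std::tuple<int, int>;   // <vertex, distance>
constexpr int DELTA = 10;                // illustrative bucket width

struct delta_order {                     // strict weak ordering
  bool operator()(const WorkItem& a, const WorkItem& b) const {
    return std::get<1>(a) / DELTA < std::get<1>(b) / DELTA;
  }
};

std::vector<WorkItem> take_min_class(std::vector<WorkItem>& v, WorkItem rep) {
  delta_order lt;
  // O(n) pass: workitems equivalent to the global minimum move to the front.
  auto mid = std::partition(v.begin(), v.end(),
                            [&](const WorkItem& w) { return !lt(rep, w); });
  std::vector<WorkItem> bucket(v.begin(), mid);  // next thread bucket
  v.erase(v.begin(), mid);                       // remainder stays behind
  return bucket;
}
```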
48. Data Structure: Partitioning Scheme
• Processing thread buckets avoids load-imbalance issues.
• But we cannot insert work items arriving for the same bucket (we cannot afford a vector resize; if we need that, we need locks).
• Work items pushed for the same bucket are sent directly.
49. Data Structure: Summary
Data Structure | Execution Time (sec.)
Linked List (Atomics) | ~61
Binary Search Tree with Locks | ~82
Concurrent SkipList | ~89
Partitioning Scheme | ~44
• Pre-order processing function, SSSP delta ordering.
• Two nodes, 16 shared-memory parallel threads, Graph500 scale 24.
50. Mapping AGM to an Implementation
Q1. How should we do the ordering?
Q2. How should we arrange the processing function and the ordering?
(From the abstract model to an actual implementation, including the communication method.)
51. Communication Paradigms
• Push Communication: ranks push workitems to their peers (e.g., R1 pushes workitems to R0).
• Pull Communication: ranks pull workitems from their peers (e.g., R0 pulls workitems from R1).
• GAS Communication: gather, apply, and scatter phases.
52. Placing Processing Function: Pre-Order & Post-Order
• Pre-Order: after receiving the workitem, the processing function is executed and the generated workitems are inserted into the data structure.
• Post-Order: after receiving the workitem, it is inserted into the data structure for ordering.
53. Placing Processing Function: Split-Order
• The processing function contains logic for updating state/s and for generating new work.
- We split the processing function into two functions: a state update function and a work generation function.
• The workitem is inserted for ordering only if the state is updated; new work is generated only if the state has not changed since. This allows us to prune work further.
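A sequential sketch of the split (function names are illustrative, not the framework's API): the state-update half decides whether a workitem enters the ordering, and the work-generation half runs at extraction time and prunes workitems whose state has changed in the meantime:

```cpp
// Sketch of split-order for SSSP relax: update_state runs on receipt,
// generate_work runs when the workitem is extracted from the ordering.
#include <tuple>
#include <utility>
#include <vector>

using Vertex = int;
using Distance = int;
using WorkItem = std::tuple<Vertex, Distance>;
using Edge = std::pair<Vertex, Distance>;   // (target vertex, weight)

// State update: true only if the workitem improved the state
// (and should therefore be inserted for ordering).
bool update_state(std::vector<Distance>& dist, const WorkItem& wi) {
  auto [v, d] = wi;
  if (d < dist[v]) { dist[v] = d; return true; }
  return false;
}

// Work generation: generates nothing if the state changed since the
// update, i.e., a better distance arrived in the meantime (pruning).
std::vector<WorkItem> generate_work(const std::vector<Distance>& dist,
                                    const std::vector<std::vector<Edge>>& g,
                                    const WorkItem& wi) {
  auto [v, d] = wi;
  std::vector<WorkItem> out;
  if (dist[v] == d)                         // state unchanged: still valid
    for (auto [u, w] : g[v]) out.push_back({u, d + w});
  return out;
}
```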
54. Placing Processing Function: Summary
Ordering | Run Time (sec.) | Total Inserts | Additional WorkItems Processed
Split-Order | 14.54 | 17139284 | 21334
Pre-Order | 62.55 | 409388039 | 2469025
Post-Order | 44.09 | 398344581 | 2284918
• Split-order has the best timing: the total work items pushed into the data structure and the additional relaxes are much smaller than in the other two implementations, because of pruning.
• Post-order is slightly better than pre-order because of reduced contention.
58. EAGMConfig
// strict weak orderings
//================== Chaotic =====================
struct chaotic : public base_ordering {
public:
static const eagm_ordering ORDERING_NAME = eagm_ordering::enum_chaotic;
template <typename T>
bool operator()(T i, T j) {
return false;
}
eagm_ordering name() {
return ORDERING_NAME;
}
};
Any two workitems
are not comparable
to each other.
CHAOTIC_ORDERING_T ch;
DELTA_ORDERING_T delta(agm_params.delta);
auto config = boost::graph::agm::create_eagm_config(delta,
ch,
ch,
ch);
Global level: delta ordering
Node level: chaotic ordering
NUMA level: chaotic ordering
Thread level: chaotic ordering
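A small sketch of why the chaotic ordering yields a single equivalence class: a comparator that always returns false makes every pair of workitems mutually incomparable, so a map keyed by it collapses all workitems into one entry.

```cpp
// Sketch: with the chaotic comparator (always false, as in the
// framework code above), no workitem precedes any other, so every
// workitem falls into the same equivalence class.
#include <map>
#include <tuple>
#include <vector>

using WorkItem = std::tuple<int, int>;   // <vertex, distance>

struct chaotic {
  bool operator()(const WorkItem&, const WorkItem&) const { return false; }
};

int count_classes(const std::vector<WorkItem>& items) {
  std::map<WorkItem, int, chaotic> classes;
  for (const auto& wi : items) ++classes[wi];
  return static_cast<int>(classes.size()); // 1 for any non-empty input
}
```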
59. EAGM Execution:
(Figure: three nodes N0-N2, each with two NUMA domains and six threads, executing over time.)
• Execution is the same as AGM: globally synchronous, with barrier synchronization between global equivalence classes.
• Execution is asynchronous at the process, NUMA, and thread levels.
60. EAGM Execution:
• Execution is globally asynchronous: a single global equivalence class.
• Execution is node-level synchronous: equivalence classes at the process level, with local thread barriers.
• Execution is asynchronous at the NUMA and thread levels.
61. EAGM Execution:
(Figure: three nodes N0-N2, each with two NUMA domains and six threads; global, node, NUMA, and thread equivalence classes over time, each closed by a barrier at its own level: global, NUMA, or thread. The vertical axis shows spatial ordering; the horizontal axis shows temporal ordering.)
• This execution arises if we have "non-chaotic" orderings defined for every memory level.
62. EAGM Implementation
• The heart of the EAGM implementation is the nested data structure that holds
equivalence classes.
The nested structure: global-level classes contain node-level classes, which contain NUMA-level classes.
64. EAGM Implementation: Static Optimizations
• Chaotic ordering always creates a single equivalence class. Therefore, nested levels with chaotic orderings are removed from the data structure, yielding an optimized data structure.
• The optimizations are performed statically at compile time with the help of template meta-programming and template specialization.
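A sketch of this compile-time collapse (type names are illustrative, not the framework's): a partial specialization removes nested levels whose ordering is chaotic, so a <delta, chaotic, chaotic> hierarchy reduces to a single delta level.

```cpp
// Sketch: removing chaotic levels from a nested ordering hierarchy at
// compile time via template specialization.
#include <type_traits>

struct chaotic {};                          // no ordering at this level
struct delta_ord {};                        // e.g., a ∆ ordering
struct leaf {};                             // innermost level

template <typename Ordering, typename Inner>
struct nested {};                           // one level of the hierarchy

template <typename T>
struct collapse { using type = T; };        // leaf: nothing to do

template <typename Ordering, typename Inner>
struct collapse<nested<Ordering, Inner>> {  // ordered level: keep it
  using type = nested<Ordering, typename collapse<Inner>::type>;
};

template <typename Inner>
struct collapse<nested<chaotic, Inner>> {   // chaotic level: drop it
  using type = typename collapse<Inner>::type;
};

// <delta, chaotic, chaotic> collapses to a single delta level:
static_assert(std::is_same<
    collapse<nested<delta_ord, nested<chaotic, nested<chaotic, leaf>>>>::type,
    nested<delta_ord, leaf>>::value, "chaotic levels removed");
```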
65. EAGM Implementation: Static Optimizations
Remove nested levels for chaotic orderings. This is really an AGM.
68. Single Source Shortest Paths: Pre/Post/Split Orderings
Weak scaling results carried out on 2 Broadwell 16-core Intel Xeon processors. 2^30 vertices & 2^34 edges. Shared-memory execution.
Split-order is ~5 times faster than pre-order & post-order.
69. Single Source Shortest Paths: Weak Scaling
Graphs: Graph500 graph, Graph500-proposed SSSP graph, and Erdos-Renyi graph.
• Framework algorithms show better scaling compared to PowerGraph.
• Thread-level Dijkstra ordering shows better scaling behavior for power-law graphs.
• Globally synchronous delta-stepping has better scaling for ER graphs.
70. Single Source Shortest Paths: Strong Scaling (BRII+)
Relative Speedup = time for the fastest sequential algorithm / parallel execution time on "n" PEs.
• Synchronization overhead becomes significant when executing on a higher number of nodes.
• Just ordering by level does not reduce the work, but if we apply Dijkstra ordering at the thread level we see better performance.
71. SSSP More Results: Weak Scaling (BR II)
Neither PowerGraph nor PBGL-1 scales well at larger scales.
72. Breadth First Search: Weak Scaling
K(2) global ordering and level global ordering reduce more redundant work compared to thread-level ordering.
73. BFS: In-node Performance (Weak Scaling)
Configurations: global-level synchronous and node-level synchronous. Within a node, both configurations execute in the same way; therefore, their performance is similar.
74. BFS: Strong Scaling
Globally level-synchronous versions show better speed-up than thread-level synchronous versions.
75. BFS: Road Networks
Strong scaling with road networks. Road networks have a high diameter (~850-900); therefore, the globally level-synchronous version shows poor scaling because of the synchronization overhead.
Configurations: global-level synchronous vs. globally asynchronous with thread-level synchronous ordering.
76. Other Graph Applications
• Connected Components
• Maximal Independent Set
• Triangle Counting
77. Conclusions
• Using the AGM abstraction, we showed that we can generate families of algorithms by keeping the processing function intact and changing the ordering.
• With the EAGM we can achieve spatial and temporal orderings.
• We discussed the challenges in mapping the AGM framework to an implementation and described how we addressed them.
• We developed an efficient implementation of the AGM and EAGM frameworks.
• Weak and strong scaling results show that AGM/EAGM algorithms outperform graph algorithms in other frameworks.
Work within each bucket is further ordered using thread local priority queues (threadq).
Interesting duality: expressing the algorithm with AGM lets us pick an ordering which exposes parallelizability, but once parallelized, we can order in different ways. Usually it makes sense to start as loosely ordered as possible and then add more ordering.