Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverage multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we also investigate single-GPU and multi-GPU methods that leverage all available GPUs simultaneously for computing induced subgraph statistics. Both of these methods use GPU devices only, whereas the hybrid framework leverages all available multi-core CPUs and GPUs for computing graphlets in large networks. Compared to recent approaches, our methods are orders of magnitude faster, while also being more cost-effective, with superior performance per processing unit and per watt. In particular, the methods are more than 300 times faster than a recent state-of-the-art method. To the best of our knowledge, this is the first work to leverage multiple CPUs and GPUs simultaneously for computing induced subgraph statistics.
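For intuition, the simplest connected graphlet statistic is the triangle count (the connected 3-vertex graphlet). The sketch below is purely illustrative of what is being counted; the paper's hybrid CPU-GPU algorithms parallelize much richer k-vertex counts and are not reproduced here.

```python
def triangle_count(edges):
    """Count triangles (the connected 3-vertex graphlet) in an
    undirected graph given as a list of edges. Sequential sketch only."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for u, v in edges:
        # triangles through edge (u, v) = common neighbors of u and v
        count += len(adj[u] & adj[v])
    return count // 3  # each triangle is counted once per edge

# a 4-clique on {0, 1, 2, 3} contains exactly 4 triangles
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(triangle_count(edges))  # 4
```

The per-edge loop is the natural unit of parallel work: each edge's common-neighbor intersection is independent, which is one reason such counts map well onto many-core devices.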
Crescent Womb is seeking funding to expand their ergonomic infant sleeper globally. Their patented design keeps babies safe and comfortable and helps reduce the risk of SIDS. It has received positive reviews from parents and medical professionals. With additional funding, they plan to launch on more Amazon marketplaces, introduce a portable model, and increase marketing to reach their 3-year growth goal of 5% market share in the US. The funding would allow them to better enforce their intellectual property and continue their mission of making their product accessible to all families.
The three best Gamestorming exercises: 6-8-5 for ideation, Poster Session for envisioning the future, and Start-Stop-Continue for decision making.
Dynamic PageRank using Evolving Teleportation (Ryan Rossi)
This document proposes a dynamic generalization of PageRank called Dynamic PageRank. It describes PageRank as modeling a random walk over a static graph, but notes that real networks are dynamic with importance continuously changing. Dynamic PageRank models this by formulating PageRank as a dynamical system where the teleportation vector evolves over time based on external influences like pageviews. This provides flexibility to study dynamic network problems and seamlessly extends static PageRank to incorporate network dynamics. The dynamical system can then be evolved over time using numerical methods to determine how importance values change as the network and teleportation vector evolve.
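As a toy illustration of the idea (not the paper's formulation), one can iterate standard PageRank while swapping in a teleportation vector that changes over time; node importance then tracks the evolving external signal. The graph and teleportation vectors below are made-up assumptions.

```python
def pagerank_step(x, out_links, v, alpha=0.85):
    """One PageRank iteration with teleportation vector v.
    out_links maps node -> list of successors; x and v are dicts."""
    nxt = {n: (1 - alpha) * v[n] for n in x}
    for n, succs in out_links.items():
        share = alpha * x[n] / len(succs)
        for s in succs:
            nxt[s] += share
    return nxt

graph = {0: [1], 1: [2], 2: [0]}          # a 3-node cycle
x = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
# teleportation evolves over "time": e.g. pageviews shift toward node 2
for t, v in enumerate([{0: 1, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 1}]):
    for _ in range(50):                    # relax under the current v
        x = pagerank_step(x, graph, v)
    print(t, max(x, key=x.get))            # the top-ranked node follows v
```

In the dynamical-system view described above, the teleportation vector would be updated continuously rather than in discrete phases, but the mechanism is the same: v(t) steers where importance accumulates.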
Time-Evolving Relational Classification and Ensemble Methods (Ryan Rossi)
This document proposes a temporal-relational classification framework for predicting node attributes in dynamic networks. It represents networks as temporal graphs that capture how edges and attributes change over time. It uses weighting functions to assign more importance to recent or frequent events. Classification is done using relational classifiers on the weighted temporal graphs. Experimental evaluation is done on two real-world networks to predict node attributes at future timesteps based on past network structure and attributes.
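The weighting idea can be sketched with a simple exponential-decay kernel; the half-life parameter and the toy edge list below are illustrative assumptions, not the paper's exact weighting functions.

```python
def temporal_weight(t_event, t_now, half_life=2.0):
    """Exponential-decay weighting: recent events count more.
    half_life is in the same time units as the timestamps (assumed)."""
    return 0.5 ** ((t_now - t_event) / half_life)

# edges observed at several timesteps; weight them as of t = 10
edges = [("a", "b", 10), ("a", "b", 9), ("a", "c", 2)]
weights = {}
for u, v, t in edges:
    # repeated (frequent) edges accumulate weight, matching the intent
    # of favoring both recent and frequent events
    weights[(u, v)] = weights.get((u, v), 0.0) + temporal_weight(t, 10)
print(weights[("a", "b")] > weights[("a", "c")])  # True
```

A relational classifier would then consume these edge weights in place of the raw (unweighted) temporal graph.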
This document discusses knowledge discovery and machine learning on graph data. It makes three main observations:
1) Graphs are typically constructed from input data rather than given directly, as relationships must be inferred.
2) Graph data management is challenging due to issues like large size, dynamic nature, heterogeneity and attribution.
3) Useful insights and accurate modeling depend on the representation of the data as a graph, such as through decomposition, feature learning or other techniques.
Large-scale Recommendation Systems on Just a PC (Aapo Kyrölä)
Aapo Kyrölä presented on running large-scale recommender systems on a single PC using GraphChi, a framework for graph computation on disk. GraphChi uses parallel sliding windows to efficiently process graphs that do not fit in memory by only loading subsets of the graph into RAM at a time. Kyrölä demonstrated training recommender models like ALS matrix factorization and item-based collaborative filtering on large graphs like Twitter using GraphChi on a single laptop. He concluded that very large recommender algorithms can now be run on a single machine and that GraphChi and similar frameworks hide the low-level optimizations needed for efficient single machine graph computation.
Graph Analysis: New Algorithm Models, New Architectures (Jason Riedy)
The document discusses new algorithms and architectures for graph analysis on streaming data. It introduces a new algorithm model that allows analysis to run concurrently with graph changes without locking the graph. Some algorithms like degree computation are valid under this model while others like shortest paths may not be. A new architecture called Emu uses lightweight threads that migrate to data, showing better performance on pointer-chasing benchmarks than CPUs by better utilizing memory bandwidth. Overall the document explores new approaches for analyzing massive streaming graphs in real-time.
Every year the financial industry loses billions to fraud, while fraudsters come up with ever more sophisticated patterns.
Financial institutions have to balance fraud protection against a degraded customer experience. Fraudsters bury their patterns in large volumes of data, and traditional technologies are not designed to detect fraud in real time or to see patterns beyond the individual account.
Analyzing relations with graph databases helps uncover these larger complex patterns and speeds up suspicious behavior identification.
Furthermore, graph databases enable fast and effective real-time link queries and allow context to be passed to machine learning models.
The earlier a fraud pattern or network is identified, the faster the activity can be blocked. As a result, losses and fines are minimized.
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis (Jason Riedy)
Applications in many areas analyze an ever-changing environment. On graphs with billions of vertices, providing snapshots imposes a large performance cost. We propose the first formal model for graph analysis running concurrently with streaming data updates. We consider an algorithm valid if its output is correct for the initial graph plus some implicit subset of concurrent changes. We show theoretical properties of the model, demonstrate the model on various algorithms, and extend it to updating results incrementally.
The document describes research on distributed graph summarization algorithms. It introduces three distributed graph summarization algorithms (DistGreedy, DistRandom, DistLSH) that can scale to large graphs by distributing computation across machines. The algorithms share a common framework of iteratively merging super-nodes representing aggregated subsets of nodes, but differ in how they select candidate pairs of super-nodes to merge. Experimental evaluation on real-world graphs demonstrates the ability of the proposed distributed algorithms to summarize large graphs in a parallelized manner.
The document discusses N-gram graphs, which represent the proximity or co-occurrence of items in a text by modeling them as a graph. An N-gram graph is constructed by extracting n-grams from a text, determining their neighborhood based on a window size, and assigning edge weights based on co-occurrence frequencies. The document outlines the process for constructing N-gram graphs and describes their potential uses, including representing sets of items with a single graph, comparing graphs through clustering, and defining similarity measures between graphs. N-gram graphs aim to capture proximity information in a way that is domain-agnostic, allows different analysis levels, and can represent multiple texts with a single graph structure.
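A minimal sketch of the construction, assuming character n-grams and a symmetric co-occurrence window; the actual formulation also admits word n-grams and other window semantics.

```python
def ngram_graph(text, n=3, window=2):
    """Build an n-gram graph: nodes are character n-grams, and edge
    weights count co-occurrences within `window` positions."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = {}
    for i, g in enumerate(grams):
        # connect this n-gram to the next `window` n-grams in sequence
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            key = tuple(sorted((g, grams[j])))   # undirected edge
            edges[key] = edges.get(key, 0) + 1
    return edges

g = ngram_graph("banana", n=2, window=1)
# bigrams of "banana" are ba, an, na, an, na; "an"/"na" co-occur repeatedly
print(g[("an", "na")])  # 3
```

Two texts can then be compared by a similarity measure over their edge sets (e.g., weighted edge overlap), which is what makes the representation domain-agnostic.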
Design and Implementation of Multiplier Using Kcm and Vedic Mathematics by Us... (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
The document proposes a model for dynamically organizing edge computing nodes into micro clouds to provide edge computing as a service. The model involves grouping nodes into clusters, clusters into regions, and regions into a topology. Micro clouds are ephemeral cloud-like structures serving local requests before reaching the traditional cloud. Protocols for health checking, cluster formation, and listing the system state are proposed. The model is inspired by cloud architecture and aims to lower latency by processing data closer to its source.
This document describes a web-based application called "Path Finding Visualizer" that visualizes shortest path algorithms like Dijkstra's algorithm and A* algorithm. It discusses the motivation, objectives and implementation of the project. The implementation involves creating a graph from a maze, building an adjacency matrix to represent the graph, and applying Dijkstra's algorithm to find the shortest path between nodes. Screenshots show the visualization of Dijkstra's algorithm finding the shortest path between a source and destination node. The technologies used include Visual Studio Code. The project aims to help users better understand how shortest path algorithms work through visualization.
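The maze-to-graph step can be sketched as follows, assuming a grid of open and wall cells with 4-connected movement. The project reportedly builds an adjacency matrix; an adjacency list is shown here for brevity, and the maze itself is a made-up example.

```python
def maze_to_adjacency(maze):
    """Convert a grid maze (0 = open, 1 = wall) into an adjacency list
    keyed by (row, col); neighbors are open 4-connected cells."""
    rows, cols = len(maze), len(maze[0])
    adj = {}
    for r in range(rows):
        for c in range(cols):
            if maze[r][c] == 1:
                continue                       # walls are not graph nodes
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0:
                    nbrs.append((nr, nc))
            adj[(r, c)] = nbrs
    return adj

maze = [[0, 0, 1],
        [1, 0, 0]]
adj = maze_to_adjacency(maze)
print(adj[(0, 1)])  # open neighbors of the middle-top cell
```

Dijkstra's algorithm (or A* with a grid-distance heuristic) can then run directly on this structure, with each visualization frame drawn from the set of settled cells.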
Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are; and trust me, I'm not talking about Myspace! Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!
Because networks take a relationship-centered view of the world, the data structures that we will analyze model real-world behaviors and communities. Through a suite of algorithms derived from mathematical graph theory, we are able to compute and predict the behavior of individuals and communities through these types of analyses. Clearly this has a number of practical applications, from recommendation to law enforcement to election prediction, and more.
Design and Implementation of Mobile Map Application for Finding Shortest Dire... (Eswar Publications)
The shortest path problem concerns finding the shortest and quickest path or route from a starting point to a final destination. Four major algorithms are commonly used to solve the shortest path problem: Dijkstra's Algorithm, the Floyd-Warshall Algorithm, the Bellman-Ford Algorithm, and the Alternative Path Algorithm. This research work is focused on the design of a mobile map application for finding the shortest route from one location to another within Yaba College of Technology and its environs. The design was based on Dijkstra's algorithm, which takes the source node as the first permanent node and assigns it a cost of 0, checks all neighbor nodes of the most recent permanent node, calculates the cumulative cost of each neighbor and marks it temporary, then chooses the node with the smallest cumulative cost and makes it permanent. The different nodes that lead to a particular destination were identified, and the distance and time from a source to a destination were calculated using Google Maps. The application then recommends the shortest and quickest route to the destination.
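The permanent/temporary bookkeeping described above maps directly onto a textbook Dijkstra implementation; this sketch uses a binary heap for the temporary set, an implementation choice not stated in the abstract.

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm as described: the source starts as the first
    permanent node with cost 0, and each round the temporary node with
    the smallest cumulative cost is made permanent.
    graph: node -> [(neighbor, edge_cost), ...]."""
    permanent = {}                 # node -> final cumulative cost
    temporary = [(0, source)]      # min-heap of (cumulative cost, node)
    while temporary:
        cost, node = heapq.heappop(temporary)
        if node in permanent:
            continue               # stale entry; node already permanent
        permanent[node] = cost
        for nbr, w in graph.get(node, []):
            if nbr not in permanent:
                heapq.heappush(temporary, (cost + w, nbr))
    return permanent

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2)], "C": []}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3}
```

In the map application, edge costs would come from road distances or travel times rather than the unit weights of this toy graph.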
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013 (Amazon Web Services)
GraphLab is like Hadoop for graphs in that it enables users to easily express and execute machine learning algorithms on massive graphs. In this session, we illustrate how GraphLab leverages Amazon EC2 and advances in graph representation, asynchronous communication, and scheduling to achieve orders-of-magnitude performance gains over systems like Hadoop on real-world data.
Graph Sample and Hold: A Framework for Big Graph Analytics (Nesreen K. Ahmed)
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs, social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph analytics, called Graph Sample and Hold (gSH), which samples from massive graphs sequentially in a single pass, one edge at a time, while maintaining a small state in memory. We use a Horvitz-Thompson construction in conjunction with a scheme that samples arriving edges without adjacencies to previously sampled edges with probability p and holds edges with adjacencies with probability q. Our sample and hold framework facilitates the accurate estimation of subgraph patterns by enabling the dependence of the sampling process to vary based on previous history. Within our framework, we show how to produce statistically unbiased estimators for various graph properties from the sample. Given that the graph analytics will run on a sample instead of the whole population, the runtime complexity is kept under control. Moreover, given that the estimators are unbiased, the approximation error is also kept under control.
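A rough single-pass sketch of the sample-and-hold rule, with a naive Horvitz-Thompson-style edge-count estimate. The real gSH estimators and their unbiasedness analysis are considerably more involved than this toy version, and the stream below is a made-up path graph.

```python
import random

def graph_sample_and_hold(edge_stream, p, q, seed=0):
    """Single-pass sketch of gSH: an arriving edge with no adjacency to
    already-sampled edges is sampled with probability p; an edge adjacent
    to the sample is held with probability q. Keeps O(sample) state."""
    random.seed(seed)
    sample = []
    touched = set()                 # vertices of sampled edges
    for u, v in edge_stream:
        prob = q if (u in touched or v in touched) else p
        if random.random() < prob:
            sample.append(((u, v), prob))   # keep prob for HT weighting
            touched.update((u, v))
    return sample

# Horvitz-Thompson-style estimate of the total edge count
stream = [(i, i + 1) for i in range(1000)]
sample = graph_sample_and_hold(stream, p=0.1, q=0.5)
estimate = sum(1.0 / prob for _, prob in sample)
print(round(estimate))   # close to the true count of 1000
```

The key property illustrated: each edge's inclusion probability is known at the moment it arrives, so sampled edges can be inverse-probability weighted even though the probabilities depend on stream history.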
Graph theory, which studies the properties of graphs, is widely accepted as a core subject in computer science. In this paper, we present a method for developing an algorithm. The effectiveness of testing is the most important factor in determining the cost and duration of developing large software products with a given quality; the cost of testing for detecting errors in software reaches 30-40% of the total cost of its development and largely determines its quality. The most commonly used testing methods are regression, function, load, module, and optimization tests, applied when the graph is sufficiently complex. The graph accelerates the testing process by showing the paths that need to be tested: when the tests cover all graph paths, the algorithm of the program is fully tested and needs no further development.
This document provides a summary of practical machine learning on big data platforms. It begins with an introduction and agenda, then provides a quick brief on the machine learning process. It discusses the current landscape of open source tools, including evolutionary drivers and examples. It covers case studies from Twitter and their experience. Finally, it discusses architectural forces like Moore's Law and Kryder's Law that are shaping the field. The document aims to present a unified approach for machine learning on big data platforms and discuss how industry leaders are implementing these techniques.
Start From A MapReduce Graph Pattern-recognize Algorithm (Yu Liu)
This document summarizes a presentation on developing a MapReduce algorithm to recognize patterns in large graphs by finding connected components. It discusses:
- Motivation to study parallel graph algorithms and frameworks like MapReduce and Pregel
- The problem of finding link patterns in graphs by extracting connected components
- Background on semantic web and linked open data modeled as RDF graphs
- A naive O(2Ck)-iteration MapReduce algorithm to find connected components between pairs of datasets
- Examples and analysis of the algorithm's complexity and communication costs
GraphX is Apache Spark's library for graph analytics. It allows users to analyze large graphs in parallel across a cluster. Some key capabilities include calculating centrality metrics like PageRank to identify important nodes, finding shortest and longest paths between nodes, and breaking large graphs into smaller subgraphs for individual analysis. The library represents graphs as vertices connected by edges and can be used to model many real-world networks from social networks to citation networks to computer architectures.
This document summarizes a presentation given by Nesreen K. Ahmed on graph sampling techniques. It discusses previous work on sampling large graphs to estimate properties like triangle counts. Existing methods either require multiple passes over the data or make assumptions about the graph stream order. The presentation introduces a new single-pass Graph Priority Sampling framework that can estimate properties in an unbiased way using a fixed-size sample. It assigns edge weights and priorities to sample edges proportional to their contribution to graph structures. Estimates can be updated incrementally during the stream or retrospectively after it ends. The framework is evaluated on real-world graphs with billions of edges to estimate triangle counts, wedge counts, and clustering coefficients with low variance.
This document discusses different types of geometric modeling methods including wireframe, surface, and solid modeling. Wireframe modeling uses points and lines to define objects but does not represent actual surfaces or volumes. Surface modeling defines the outer surfaces of an object. Solid modeling precisely defines the enclosed volume of an object using its faces, edges, and vertices. Constructive solid geometry and boundary representation are two common solid modeling techniques. CSG uses Boolean operations to combine primitive shapes, while boundary representation stores topological information about faces, edges, and vertices. Feature-based modeling allows shapes to be created through operations like extruding, revolving, sweeping, and filling.
Computational steering Interactive Design-through-Analysis for Simulation Sci... (SURFevents)
The document discusses computational steering and interactive design-through-analysis. It provides a vision of a unified computational framework that allows for rapid prototyping and accurate analysis of engineering designs. This framework would combine physics-informed machine learning for initial design exploration with isogeometric analysis for detailed analysis and optimization. The document then demonstrates some of the key concepts behind isogeometric analysis, including its use of B-spline basis functions to represent geometry, solutions, and right-hand sides, as well as its formulation as an abstract linear system.
Novel Graph Modeling Framework for Feature Importance Determination in Unsupe... (Neo4j)
The document describes a novel graph modeling framework for determining feature importance in unsupervised learning. It proposes converting datasets into directed graphs and applying a modified PageRank algorithm to rank features based on their importance. The approach involves 7 steps: 1) converting data to a directed graph, 2) calculating node ranks with PageRank, 3) rebuilding the graph based on ranks, 4) iterating this process and tracking ranks, 5) summarizing ranks, 6) sorting ranks, and 7) outputting ranked features. The approach is validated on several datasets and shown to produce similar feature importance rankings as supervised learning methods. Potential applications include knowledge graphs, disease progression modeling, and disaster recovery system analysis.
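Step 2 (ranking nodes with PageRank) can be sketched with a plain power iteration over a hypothetical feature graph. How edges are derived from the dataset, and the paper's modifications to PageRank, are not reproduced here; the feature graph below is invented for illustration.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Plain PageRank by power iteration; adj maps node -> successors."""
    nodes = list(adj)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            succs = adj[n] or nodes        # dangling nodes spread evenly
            for s in succs:
                nxt[s] += damping * rank[n] / len(succs)
        rank = nxt
    return rank

# hypothetical feature graph: an edge f -> g means f's values help
# predict g (deriving these edges from data is the framework's step 1)
adj = {"age": ["income"], "income": ["spend"], "zip": ["income"], "spend": []}
rank = pagerank(adj)
print(sorted(rank, key=rank.get, reverse=True))
```

Features that many other features point to accumulate rank, which is the intuition behind reading high-rank nodes as important features.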
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
Similar to Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
- Examples and analysis of the algorithm's complexity and communication costs
GraphX is Apache Spark's library for graph analytics. It allows users to analyze large graphs in parallel across a cluster. Some key capabilities include calculating centrality metrics like PageRank to identify important nodes, finding shortest and longest paths between nodes, and breaking large graphs into smaller subgraphs for individual analysis. The library represents graphs as vertices connected by edges and can be used to model many real-world networks from social networks to citation networks to computer architectures.
This document summarizes a presentation given by Nesreen K. Ahmed on graph sampling techniques. It discusses previous work on sampling large graphs to estimate properties like triangle counts. Existing methods either require multiple passes over the data or make assumptions about the graph stream order. The presentation introduces a new single-pass Graph Priority Sampling framework that can estimate properties in an unbiased way using a fixed-size sample. It assigns edge weights and priorities to sample edges proportional to their contribution to graph structures. Estimates can be updated incrementally during the stream or retrospectively after it ends. The framework is evaluated on real-world graphs with billions of edges to estimate triangle counts, wedge counts, and clustering coefficients with low variance.
This document discusses different types of geometric modeling methods including wireframe, surface, and solid modeling. Wireframe modeling uses points and lines to define objects but does not represent actual surfaces or volumes. Surface modeling defines the outer surfaces of an object. Solid modeling precisely defines the enclosed volume of an object using its faces, edges, and vertices. Constructive solid geometry and boundary representation are two common solid modeling techniques. CSG uses Boolean operations to combine primitive shapes, while boundary representation stores topological information about faces, edges, and vertices. Feature-based modeling allows shapes to be created through operations like extruding, revolving, sweeping, and filling.
Computational steering Interactive Design-through-Analysis for Simulation Sci...SURFevents
The document discusses computational steering and interactive design-through-analysis. It provides a vision of a unified computational framework that allows for rapid prototyping and accurate analysis of engineering designs. This framework would combine physics-informed machine learning for initial design exploration with isogeometric analysis for detailed analysis and optimization. The document then demonstrates some of the key concepts behind isogeometric analysis, including its use of B-spline basis functions to represent geometry, solutions, and right-hand sides, as well as its formulation as an abstract linear system.
Novel Graph Modeling Framework for Feature Importance Determination in Unsupe...Neo4j
The document describes a novel graph modeling framework for determining feature importance in unsupervised learning. It proposes converting datasets into directed graphs and applying a modified PageRank algorithm to rank features based on their importance. The approach involves 7 steps: 1) converting data to a directed graph, 2) calculating node ranks with PageRank, 3) rebuilding the graph based on ranks, 4) iterating this process and tracking ranks, 5) summarizing ranks, 6) sorting ranks, and 7) outputting ranked features. The approach is validated on several datasets and shown to produce similar feature importance rankings as supervised learning methods. Potential applications include knowledge graphs, disease progression modeling, and disaster recovery system analysis.
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
1.
2. Graphs – rich and powerful data representation
• Social networks
• Human Disease Network [Barabasi 2007]
• Food Web [2007]
• Terrorist Network [Krebs 2002]
• Internet (AS) [2005]
• Gene Regulatory Network [Decourty 2008]
• Protein Interactions [breast cancer]
• Political blogs
• Power grid
3. Graphlets
Small induced subgraphs
[Figure: the 17 graphlets H1–H17 on 2–4 vertices]
Network Motifs: Simple Building Blocks of Complex Networks [Milo et al. – Science 2002]
The Structure and Function of Complex Networks [Newman – SIAM Review 2003]
4. Graphlets
Small induced subgraphs
[Figure: graphlets H1–H17, grouped into connected and disconnected graphlets]
5. Graphlets
k-graphlets = family of graphlets of size k
[Figure: 2-graphlets, 3-graphlets, and 4-graphlets H1–H17, connected and disconnected]
6. Graphlets
k-graphlets = family of graphlets of size k
motifs = frequently occurring subgraphs
7. Graphlets
Applied to food web, genetic, neural, web, and other networks
Found distinct graphlets in each case
8. Applications of Graphlets
• Biological networks
  ⎻ network alignment, protein function prediction
    [Pržulj 2007] [Milenković-Pržulj 2008] [Hulovatyy-Solava-Milenković 2014]
    [Shervashidze et al. 2009] [Vishwanathan et al. 2010]
• Social networks
  ⎻ triad analysis, role discovery, community detection
    [Granovetter 1983] [Holland-Leinhardt 1976] [Rossi-Ahmed 2015]
    [Ahmed et al. 2015] [Xie-Kelley-Szymanski 2013]
• Internet AS [Feldman et al. 2008]
• Spam detection [Becchetti et al. 2008] [Ahmed et al. 2016]
Useful for various machine learning tasks, e.g., anomaly detection, role discovery, relational learning, clustering, etc.
9. Useful for a variety of ML tasks
• Graph-based anomaly detection
⎻ Unusual/malicious behavior detection
⎻ Emerging event and threat identification, …
• Graph-based semi-supervised learning, classification, …
• Link prediction and relationship strength estimation
• Graph similarity queries
⎻ Find similar nodes, edges, or graphs
• Subgraph detection and matching
10. Applications: higher-order network analysis and modeling
Higher-order network structures
• Visualization – “spotting anomalies” [Ahmed et al. – ICDM 2014]
• Finding large cliques, stars, and other larger network structures [Ahmed et al. – KAIS 2015]
• Spectral clustering [Benson et al. – Science 2016]
• Role discovery [Ahmed et al. 2016]
...
11. How CPUs and GPUs compare
CPU                                     GPU
Large memory                            Memory is very limited
Few fast/powerful processing units      Thousands of smaller processing units
Handles unbalanced jobs better          Performs best with “balanced” workloads
Optimized for general computations      Optimized for simple repetitive calculations at a very fast rate
12. How CPUs and GPUs compare
(Same comparison as above)
Combine the advantages of both
13. Problem: global graphlet counting (macro-level)
INPUT: a large graph G = (V, E) and a set of graphlets ℋ
PROBLEM: find the number of embeddings (appearances) of each graphlet H_k ∈ ℋ in G
14. Problem: global graphlet counting (macro-level)
Given an input graph G:
- How many triangles are in G?
- How many 4-node cliques are in G?
- How many 4-node cycles are in G?
15. Problem: global graphlet counting (macro-level)
Many applications require counting all k-vertex graphlets.
Recent research:
- Exact/approximate global counts [Rahman et al. – TKDE 2014] [Jha et al. – WWW 2015]
- Scalable to massive graphs (billions of nodes/edges) [Ahmed et al. – ICDM 2015, KAIS 2016]
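As a concrete (if naive) baseline for the global counting problem above, the sketch below enumerates every 3-vertex induced subgraph of a toy graph and tallies triangles and induced 2-stars. This brute force is exactly what scalable methods must avoid; the function name `count_3graphlets` and the example graph are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

def count_3graphlets(n, edges):
    """Brute-force global counts of the connected 3-graphlets (triangles
    and induced 2-stars/wedges) by enumerating all 3-vertex induced
    subgraphs of a graph on vertices 0..n-1. For illustration only:
    this scales as O(n^3) and is infeasible for large networks."""
    E = {frozenset(e) for e in edges}
    tri = wedge = 0
    for trip in combinations(range(n), 3):
        # number of edges present among this vertex triple
        m = sum(1 for pair in combinations(trip, 2) if frozenset(pair) in E)
        if m == 3:
            tri += 1       # induced triangle
        elif m == 2:
            wedge += 1     # induced 2-star (path of length 2)
    return tri, wedge
```

For example, on a 4-cycle with one chord (5 edges) this yields 2 triangles and 2 induced 2-stars.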
16. Problem: local graphlet counting (micro-level)
INPUT: a large graph G = (V, E) and a set of graphlets ℋ
PROBLEM: find the number of occurrences in which edge i is contained within H_k, for all k = 1, …, |ℋ|
Applications: role discovery, relational learning, multi-label classification
17. Current work
Sequential
• Enumerate all possible graphlets
  - Exhaustive enumeration is too expensive
• Count graphlets for each node
  - Expensive for large k [Shervashidze et al. – AISTATS 2009] [Hočevar et al. – Bioinformatics 2013]
Not practical – scales only to graphs with a few hundred/thousand nodes/edges
18. Current work
Parallel
• Edge-centric graphlet counting (PGD) [Ahmed et al. – ICDM 2014, KAIS 2015]
  ⎻ Multi-core CPUs, large graphs
19. Current work
Parallel (continued)
• Node-centric graphlet counting (ORCA-GPU) [Milinković et al.]
  ⎻ Single GPU, handles only tiny graphs
20. Our approach
Hybrid parallel graphlet counting framework that leverages all available CPUs and GPUs
Algorithm classes:
• Single-GPU methods
• Multi-GPU methods
• Hybrid CPU-GPU methods
21. Our approach
Hybrid parallel graphlet counting framework that leverages all available CPUs & GPUs
Other key advantages:
• Edge-centric parallelization
  ⎻ Improved load balancing & lock-free
• Global and local graphlet counts
• Connected and disconnected graphlets
• Fine-grained parallelization
• Space-efficient
24. Our Approach (edge-centric, parallel, space-efficient)
Step 1: Searching edge neighborhoods
  For each edge, find the triangles
25. Our Approach (edge-centric, parallel, space-efficient)
Step 2: Count a few k-graphlets
  For each edge, count only: k-cliques, k-cycles, tailed triangles
26. Our Approach (edge-centric, parallel, space-efficient)
Step 3: Count all other graphlets
  For each edge, use combinatorial relationships to derive the counts of all other graphlets in constant time, O(1)
27. Our Approach (edge-centric, parallel, space-efficient)
Step 4: Merge all counts
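The steps above can be sketched for 3-graphlets: step 1 intersects the two endpoint neighborhoods of each edge to find its triangles, and step 3 derives the remaining (2-star) counts in O(1) per edge from combinatorial identities. A minimal Python illustration of the edge-centric idea, not the full algorithm; all names are hypothetical.

```python
def edge_centric_counts(adj):
    """Global triangle and induced 2-star counts via edge-centric
    counting. `adj` maps each vertex to its set of neighbors.
    Step 1: per-edge triangle count by neighborhood intersection.
    Step 3: per-edge 2-star count derived in O(1) from degrees and
    the triangle count. Simplified sketch, not the full PGD method."""
    tri_sum = wedge_sum = 0
    for u in adj:
        for v in adj[u]:
            if u < v:                      # visit each undirected edge once
                t = len(adj[u] & adj[v])   # triangles containing edge (u, v)
                tri_sum += t
                # induced 2-stars having (u, v) as one of their two edges
                wedge_sum += (len(adj[u]) - 1 - t) + (len(adj[v]) - 1 - t)
    # every triangle has 3 edges; every 2-star has 2 edges
    return tri_sum // 3, wedge_sum // 2
```

On the 4-cycle-with-chord example this again gives 2 triangles and 2 induced 2-stars, but each edge is processed independently, which is what makes the per-edge loop trivially parallelizable.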
28. Key Observations
Neighborhood runtimes are power-lawed: the distribution of graphlet runtimes for edge neighborhoods obeys a power law.
[Figure: neighborhood runtimes (CPU)]
29. Key Observations
Most edge neighborhoods are fast, with runtimes that are approximately equal.
30. Key Observations
HOWEVER, a handful of neighborhoods are hard and take significantly longer.
31. Key Observations
QUESTION: What is the “best” way to partition neighborhoods among CPUs and GPUs?
• “hardness” proxy: edge degree, volume, ...
32. Our approach
• Order edges by “hardness” and partition into 3 sets:
  Γ(e_1), ⋯, Γ(e_j), ⋯, Γ(e_k), ⋯, Γ(e_M)
  with the hardest neighborhoods assigned to Π_cpu and the easiest to Π_gpu
33. Our approach
• Order edges by “hardness” and partition into 3 sets: Π_cpu, Π_gpu, Π_unproc
• Compute induced subgraphs centered at each edge
  • CPU workers: use a hash table for O(1) lookups (O(N) space)
  • GPU workers: use binary search for O(log d) lookups
• When a worker finishes, it dequeues the next b edges:
  • CPU: get b edges from the FRONT of Π_unproc
  • GPU: get b edges from the BACK of Π_unproc
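The scheduling policy above can be mimicked in a few lines: sort edges by a hardness proxy (here the sum of endpoint degrees, one plausible choice among the proxies the talk mentions), then let CPU workers take batches from the front of the shared queue and GPU workers from the back. A hedged sketch; `partition_edges` and `dequeue` are illustrative names.

```python
from collections import deque

def partition_edges(edges, degree):
    """Order edges from hardest to easiest using a 'hardness' proxy
    (here: sum of endpoint degrees) and return them as a shared queue.
    Sketch of the scheduling idea, not the actual implementation."""
    ordered = sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]],
                     reverse=True)
    return deque(ordered)

def dequeue(queue, worker, b):
    """CPU workers take the next b hard edges from the FRONT;
    GPU workers take the next b easy edges from the BACK."""
    batch = []
    for _ in range(min(b, len(queue))):
        batch.append(queue.popleft() if worker == "cpu" else queue.pop())
    return batch
```

Serving both ends of one queue means no edge is assigned twice and the two processor types naturally meet in the middle, which is the load-balancing point of the design.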
34. Preprocessing steps
Three simple and efficient preprocessing steps:
1) Sort vertices from smallest to largest degree f(·) and relabel them s.t. f(v_1) ≤ ⋯ ≤ f(v_N)
35. Preprocessing steps
2) For each Γ(v_i), ∀i = 1, …, N, order the set of neighbors Γ(v_i) = {…, w_j, …, w_k, …} from smallest to largest degree
36. Preprocessing steps
3) Given an edge (v, u) ∈ E, ensure that f(v) ≥ f(u)
  ⎻ hence, v is always the vertex with the largest degree, d_v ≥ d_u
37. Preprocessing steps
• None of these steps is strictly required, but they significantly improve performance
• Each step is extremely fast and lends itself to easy parallelization
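The three preprocessing steps might be prototyped as follows, under the assumption that the graph is given as an adjacency-set dict; `preprocess` and its return values are hypothetical names:

```python
def preprocess(adj):
    """Sketch of the three preprocessing steps:
    1) relabel vertices in nondecreasing degree order,
    2) sort each adjacency list by degree (after relabeling, label
       order coincides with degree order, so sorting labels suffices),
    3) orient each edge (v, u) so that deg(v) >= deg(u)."""
    old = sorted(adj, key=lambda v: len(adj[v]))           # step 1
    new = {v: i for i, v in enumerate(old)}                # old -> new label
    radj = {new[v]: sorted(new[w] for w in adj[v]) for v in adj}  # step 2
    # step 3: the larger label has the larger (or equal) degree
    edges = [(max(a, b), min(a, b)) for a in radj for b in radj[a] if a > b]
    return radj, edges
```

On a star graph, the hub receives the largest label and every oriented edge lists it first, which is exactly the invariant d_v ≥ d_u that the later counting steps rely on.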
38. Fine Granularity & Work Stealing
For a single edge (v, u) ∈ E:
I. Compute the sets T and S_u
II. Find the total 4-cliques using T
III. Find the total 4-cycles using S_u
NOTE: (II) and (III) are independent → parallelize
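Steps (I)–(III) above admit a direct sketch: T holds the triangle vertices of edge (v, u), S_u and S_v the wedge endpoints on each side, and the per-edge 4-clique and 4-cycle counts follow by checking adjacencies between these sets. An illustrative reconstruction (adding S_v, which the slide leaves implicit), not the authors' code:

```python
def edge_sets(adj, v, u):
    """Step I: triangle vertices T and wedge endpoints S_u, S_v
    for the edge (v, u). `adj` maps vertices to neighbor sets."""
    T  = (adj[v] & adj[u]) - {v, u}   # w adjacent to both v and u
    Su = adj[u] - adj[v] - {v}        # w adjacent to u only
    Sv = adj[v] - adj[u] - {u}        # w adjacent to v only
    return T, Su, Sv

def cliques4_cycles4(adj, v, u):
    """Steps II and III for a single edge; the two counts use disjoint
    sets and are independent, so they could run in parallel."""
    T, Su, Sv = edge_sets(adj, v, u)
    # 4-cliques through (v, u): edges inside T
    k4 = sum(1 for w in T for x in T if w < x and x in adj[w])
    # 4-cycles through (v, u): edges between S_u and S_v
    c4 = sum(1 for w in Su for x in Sv if x in adj[w])
    return k4, c4
```

Because the 4-clique count touches only T and the 4-cycle count only S_u × S_v, the two loops share no state, which is the independence the slide exploits for fine-grained parallelism.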
41. Time Complexity
K = number of edges
Δ = maximum degree
T_max = maximum number of triangles incident to an edge in G
S_max = maximum number of 2-stars incident to an edge in G
43. Connected 4-graphlet frequencies for a variety of the real-world networks investigated, from different domains.
Network types:
• Facebook networks
• Social networks
• Interaction networks
• Collaboration networks
• Brain networks
• Web graphs
• Technological/IP networks
• Dense hard benchmark graphs
44. Validating edge partitioning
[Figure: per-neighborhood graphlet runtimes (ms) for GPU and CPU workers across ~6×10^4 edge neighborhoods]
• Edges partitioned by “hardness”
• GPUs assigned sparser neighborhoods
• Assigns edge neighborhoods to the “best” processor type
• Importance of the initial ordering
GPU workers are assigned easy & balanced edge neighborhoods (approximately equal runtimes); CPU workers are assigned difficult, unbalanced/skewed neighborhoods.
45. Experiments: Improvement
Runtime improvement over the state-of-the-art
GPU: uses a single multi-core GPU
Multi-GPU: uses all available GPUs
Hybrid: leverages all multi-core CPUs & GPUs
Hardware: 2 Intel Xeon E5-2687 CPUs (8 cores, 3.10 GHz); 8 NVIDIA Titan Black GPUs (2880 cores, 889 MHz, ~6 GB)
46. Experiments: Improvement
Improvement over the state-of-the-art: statistically significant at α = 0.01
47. Experiments: Improvement
MEAN improvement: 8x (GPU), 40x (Multi-GPU), 126x (Hybrid)
48. Comparing against ORCA-GPU
Many problems with ORCA-GPU:
• No “effective parallelization”; many parts are dependent
• Requires synchronization throughout (locks)
• No fine-grained parallelization
[Figure: improvement over ORCA-GPU, measured as ORCA-GPU runtime / runtime of the proposed method]
• Significant improvement over ORCA-GPU (at α = 0.01)
49. Varying the edge ordering
Ordering strategy significantly impacts performance
58. Summary
Framework & Algorithms
• Introduced a hybrid graphlet counting approach that leverages all available CPUs & GPUs
• First hybrid CPU-GPU approach for graphlet counting
• On average 126x faster than current methods
  - Edge-centric computations (only requires access to an edge neighborhood)
• Time- and space-efficient
Applications
• Visual analytics and real-time graphlet mining