Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These methods are widely successful at problems including community detection and label propagation.
Localized methods for diffusions in large graphsDavid Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can design efficient matrix computations to evaluate them.
Spacey random walks and higher order Markov chainsDavid Gleich
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use these models for analyzing big data as tensors.
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Higher-order organization of complex networksDavid Gleich
A talk I gave at the Park City Institute of Mathematics about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
PageRank Centrality of dynamic graph structuresDavid Gleich
A talk I gave at the SIAM Annual Meeting Mini-symposium on the mathematics of the power grid organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
This talk covers the idea of anti-differentiating approximation algorithms, which is an idea to explain the success of widely used heuristic procedures. Formally, this involves finding an optimization problem solved exactly by an approximation algorithm or heuristic.
Spacey random walks and higher-order data analysisDavid Gleich
My talk at TMA 2016 (The workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors.
Spectral clustering with motifs and higher-order structuresDavid Gleich
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
This is my KDD2015 talk on robustness in semi-supervised learning. The paper is available on Michael Mahoney's website: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf. See the KDD paper for all the details, which this talk is a bit light on.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods including new optimization routines and results.
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
In a talk at the Chinese Academy of Sciences Institute of Automation, I discuss some of the MapReduce and community detection methods I've worked on.
Fast relaxation methods for the matrix exponential David Gleich
The matrix exponential is a matrix computing primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
Correlation clustering and community detection in graphs and networksDavid Gleich
We show a new relationship between various community detection objectives and a correlation clustering framework. These enable us to detect communities with good bounds on the solution.
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
I discuss some runtimes for the personalized PageRank vector and how it relates to open questions in how we should tackle these network based measures via matrix computations.
Relaxation methods for the matrix exponential on large networksDavid Gleich
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
Presentation at OM-2017, the Twelfth International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference ISWC-2017, October 21st, 2017, Vienna, Austria
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Pattern-based classification of demographic sequencesDmitrii Ignatov
We have proposed prefix-based gapless sequential patterns for classification of demographic sequences. In comparison to black-box machine learning techniques, this one provides interpretable patterns suitable for treatment by professional demographers. As for the language, we have used Pattern Structures as an extension of Formal Concept Analysis for the case of complex data like sequences, graphs, intervals, etc.
1. Y. Gal, Uncertainty in Deep Learning, 2016
2. P. McClure, Representing Inferential Uncertainty in Deep Neural Networks Through Sampling, 2017
3. G. Khan et al., Uncertainty-Aware Reinforcement Learning from Collision Avoidance, 2016
4. B. Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2017
5. A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017
6. S. Choi et al., Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling, 2017
7. Anonymous, Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, 2017
Overlapping clusters for distributed computationDavid Gleich
My talk from WSDM2012. See the paper on my webpage: http://www.cs.purdue.edu/homes/dgleich/publications/Andersen%202012%20-%20overlapping.pdf
And the codes http://www.cs.purdue.edu/homes/dgleich/codes/overlapping/
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction and also some non-standard uses of the nuclear norm for ranking.
A history of PageRank from the numerical computing perspectiveDavid Gleich
We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.
There are some slight liberties I've taken to make it more accessible.
This talk is a new update based on some of our recent results on doing Tall and Skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
A talk at the SIMONS workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations on MapReduce using a communication avoiding algorithm.
How does Google Google: A journey into the wondrous mathematics behind your f...David Gleich
A talk I gave at the annual meeting for the MetroNY section of the MAA about how Google works from a link-ranking perspective. (http://sections.maa.org/metrony/)
Based on a talk by Margot Gerritsen (which used elements from another talk I gave years ago, yay co-author improvements!)
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...David Gleich
My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
This presentation shows an overview of the main concepts introduced in the EDBT2015 Summer School, which took place in Palamós. For each area, we summarize the main issues and current approaches. We also describe the challenges and main activities that were undertaken in the summer school.
Reading Group @ Kyoto University
Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal Common-sense Inference. Transactions of the Association for Computational Linguistics (TACL) (To Appear)
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
A talk I gave at ancestry.com on Hadoop, SQL, recommendation and graph algorithms. It's a tutorial overview; there are better algorithms than those I describe, but these are a simple starting point.
Slides from our PacificVis 2015 presentation.
The paper tackles the problems of the “giant hairballs”, the dense and tangled structures often resulting from visualization of large social graphs. Proposed is a high-dimensional rotation technique called AGI3D, combined with an ability to filter elements based on social centrality values. AGI3D is targeted for a high-dimensional embedding of a social graph and its projection onto 3D space. It allows the user to rotate the social graph layout in the high-dimensional space by mouse dragging of a vertex. Its high-dimensional rotation effects give the user an illusion that he/she is destructively reshaping the social graph layout but in reality, it assists the user to find a preferred positioning and direction in the high-dimensional space to look at the internal structure of the social graph layout, keeping it unmodified. A prototype implementation of the proposal called Social Viewpoint Finder is tested with about 70 social graphs and this paper reports four of the analysis results.
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in RRevolution Analytics
Everything happens somewhere and spatial analysis attempts to use location as an explanatory variable. Such analysis is made complex by the very many ways we habitually record spatial location, the complexity of spatial data structures, and the wide variety of possible domain-driven questions we might ask. One option is to develop and use software for specific types of spatial data, another is to use a purpose-built geographical information system (GIS), but determined work by R enthusiasts has resulted in a multiplicity of packages in the R environment that can also be used.
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
20180526@Taiwan AI Academy, Professional Managers Class.
Covering important concepts of classical machine learning, in preparation for deep learning topics to follow. Topics include regression (linear, polynomial, gaussian and sigmoid basis functions), dimension reduction (PCA, LDA, ISOMAP), clustering (K-means, GMM, Mean-Shift, DBSCAN, Spectral Clustering), classification (Naive Bayes, Logistic Regression, SVM, kNN, Decision Tree, Classifier Ensembles, Bagging, Boosting, Adaboost) and Semi-Supervised learning techniques. Emphasis on sampling, probability, curse of dimensionality, decision theory and classifier generalizability.
The “Local Ranking Problem” (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a webserver has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been mainly overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP problem on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, being able to achieve an average rank correlation as high as 0.8.
A spatio-temporal scientometrics framework for exploring the citation impact ...Song Gao
Gao, S., Hu, Y., Janowicz, K., & McKenzie, G. (2013, November). A spatiotemporal scientometrics framework for exploring the citation impact of publications and scientists. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 204-213). ACM.
Presentation slides for the paper 'Structural Patterns and Generative Models of Real-world Hypergraphs'. Published in KDD2020, the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Localized methods in graph mining
1. Localized methods in graph mining
David F. Gleich, Purdue University
Joint work with Kyle Kloster @ Purdue and Michael Mahoney @ Berkeley
Supported by NSF CAREER CCF-1149756
David Gleich · Purdue
4. Localized methods in graph mining use the local structure of a network (and not the global structure).
[Figure: two network drawings, labeled “USE THIS” and “NOT THIS”]
5. Point 1: Localized methods are the right thing to use for large graph mining.
Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
6. Some graphs have global structure.
[Image by R. Rossi from our paper on clique detection for temporal strong components]
10. At large scales, real networks look random (or slightly better).
11. Localized methods only operate on meaningful local structures in the data.
12. CAVEATS
There are large-scale global structures, BUT they don’t look like what your small-scale intuition would predict. Continents exist in Facebook, but they don’t look like small-scale structures.
Leskovec, Lang, Dasgupta, Mahoney. Internet Math, 2009.
Ugander, Backstrom. WSDM 2013.
Jeub, Balachandran, Porter, Mucha, Mahoney. Phys. Rev. E, 2015.
13. Point 1: Localized methods are the right thing to use for large graph mining.
Point 2: Localized methods are still the right thing to use even if you don't believe my answer to Point 1.
14. Local algorithms give fast answers to global queries (for small-source diffusions).
16. Pictures from the Sparse Matrix Repository (Davis & Hu): www.cise.ufl.edu/research/sparse/matrices/
17. Graph diffusions
[Figure: a network, or mesh, from a typical problem in scientific computing, colored from low to high]
Diffusions show how {importance, rank, information, status, …} flows from a source to target nodes via edges.
18. Graph diffusions
f = \sum_{k=0}^{\infty} \alpha_k P^k s
A – adjacency matrix
D – degree matrix
P – column-stochastic operator: P = A D^{-1}, so (Px)_i = \sum_{j \to i} x_j / d_j
s – the “seed” (a sparse vector)
f – the diffusion result
\alpha_k – the path weights
Graph diffusions help with:
1. Attribute prediction
2. Community detection
3. “Ranking”
4. Finding small-conductance sets
5. Graph label propagation
19. Graph diffusions
PageRank: x = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k s, equivalently (I - \alpha P) x = (1-\alpha) s
Heat kernel: h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s
with P = A D^{-1} and (Px)_i = \sum_{j \to i} x_j / d_j
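The two series on this slide can be checked numerically. Below is a minimal sketch (the toy path graph and all variable names are my own, not from the talk) that verifies the PageRank power series against its linear-system form, and that the heat-kernel series preserves total mass:

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (illustrative choice).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=0)                # column-stochastic P = A D^{-1}
s = np.array([1.0, 0.0, 0.0, 0.0])   # seed vector e_0

# PageRank: x = (1-alpha) * sum_k alpha^k P^k s ...
alpha = 0.85
x_series = np.zeros(4)
term = s.copy()                      # holds P^k s
for k in range(200):
    x_series += (1 - alpha) * alpha**k * term
    term = P @ term

# ... equals the solution of (I - alpha P) x = (1-alpha) s.
x_solve = np.linalg.solve(np.eye(4) - alpha * P, (1 - alpha) * s)
assert np.allclose(x_series, x_solve, atol=1e-10)

# Heat kernel: h = e^{-t} sum_k (t^k / k!) P^k s. Since P is column
# stochastic and s sums to 1, h sums to 1 as well.
t = 5.0
h, term, coef = np.zeros(4), s.copy(), np.exp(-t)
for k in range(100):
    h += coef * term
    coef *= t / (k + 1)              # e^{-t} t^{k+1} / (k+1)!
    term = P @ term
assert abs(h.sum() - 1.0) < 1e-10
```

Truncating each series is safe here because the coefficients decay geometrically (PageRank) or factorially (heat kernel), which is the point of the weight-vs-length plot on slide 20.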
20. Graph diffusions
PageRank: x = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k s, equivalently (I - \alpha P) x = (1-\alpha) s
Heat kernel: h = e^{-t} \sum_{k=0}^{\infty} \frac{t^k}{k!} P^k s = e^{-t} \exp(tP) s
[Plot: term weight vs. path length for the heat kernel (t = 1, 5, 15) and PageRank (\alpha = 0.85, 0.99)]
22. Diffusion-based community detection
1. Given a seed, approximate the diffusion.
2. Extract the community.
Both are local operations.
23. Conductance communities
Conductance is one of the most important community scores [Schaeffer07].
The conductance of a set of vertices is the ratio of edges leaving the set to the total edges in the set:
\phi(S) = \frac{\mathrm{cut}(S)}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}
Equivalently, it’s the probability that a random edge leaves the set.
Small conductance ⇔ good community.
Example: cut(S) = 7, vol(S) = 33, vol(\bar{S}) = 11, so \phi(S) = 7/11.
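The conductance score is simple to compute directly from the definition. A minimal sketch (function name and the two-triangle example are my own illustrations, not from the slides):

```python
import numpy as np

def conductance(A, S):
    """phi(S) = cut(S) / min(vol(S), vol(S-bar)) for adjacency matrix A."""
    S = set(S)
    Sbar = set(range(A.shape[0])) - S
    cut = sum(A[i, j] for i in S for j in Sbar)   # edges leaving S
    vol_S = sum(A[i, :].sum() for i in S)         # edge endpoints inside S
    vol_Sbar = A.sum() - vol_S
    return cut / min(vol_S, vol_Sbar)

# Two triangles joined by a single edge: the natural split cuts one
# edge, with vol(S) = vol(S-bar) = 7, so phi = 1/7.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(conductance(A, {0, 1, 2}))   # 1/7 ≈ 0.1429
```

Note that vol(S) counts edge endpoints (degrees), so internal edges count twice, matching the standard definition used on the slide.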
25. Sweep cuts find small-conductance sets
[Figure 4: A study of the paradoxical effects of value-based rounding on diffusions. Panels show (a) the adjacency structure of a sample with three unbalanced classes indicated, and results for Zhou and for Andersen–Lang with 3 labels and with 15 labels.]
Check the conductance for all “prefixes” of the diffusion vector sorted by value – there is a fast update – O(sum of degrees) work. This identifies several good sets (GOOD SET 1, 2, 3).
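The prefix-checking procedure above can be sketched in a few lines. This is my own illustration (helper names and the toy graph are assumptions, not from the talk): sort vertices by degree-normalized diffusion value, then track cut and volume incrementally as each vertex joins the prefix set.

```python
import numpy as np

def sweep_cut(A, f):
    """Best-conductance prefix of f sorted by f_i / d_i (decreasing)."""
    d = A.sum(axis=1)
    order = np.argsort(-f / d)
    total_vol = A.sum()
    best_phi, best_set = np.inf, None
    in_S = np.zeros(len(f), dtype=bool)
    cut = vol = 0.0
    for v in order[:-1]:                  # skip the full vertex set
        # Adding v: its edges to S become internal, the rest join the cut.
        cut += d[v] - 2 * A[v, in_S].sum()
        vol += d[v]
        in_S[v] = True
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_set = phi, set(np.flatnonzero(in_S))
    return best_set, best_phi

# Two triangles joined by one edge; PageRank diffusion seeded at node 0.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
P = A / A.sum(axis=0)
f = np.linalg.solve(np.eye(6) - 0.85 * P, 0.15 * np.eye(6)[:, 0])
S, phi = sweep_cut(A, f)
print(S, phi)   # the seed triangle {0, 1, 2}, phi = 1/7
```

The incremental cut update is what makes the whole sweep cost only the sum of the degrees, as stated on the slide.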
26. Diffusions are localized, and we have algorithms to find their local regions.
28. Our mission: find the solution with work roughly proportional to the localization, not the matrix.
29. Our point: the push procedure gives localized algorithms for diffusions in a pleasingly wide variety of settings.
Our results: new empirical and theoretical insights into why and how “push” is so effective.
30. The Push Algorithm for PageRank
Proposed (in closest form) by Andersen, Chung, and Lang (also by McSherry, Jeh & Widom, and Berkhin) for fast approximate PageRank; derived to show improved runtimes for balanced solvers.
1. Used for empirical studies of “communities”
2. Local Cheeger inequality.
3. Used for “fast PageRank approximation”
4. Works on massive graphs: O(1 second) for a 4-billion-edge graph on a laptop.
5. It yields weakly localized PageRank approximations!
Example: Newman’s netscience graph, 379 vertices, 1828 nonzeros.
Produces an ε-accurate entrywise localized PageRank vector in work O(1/(ε(1−α))).
31. Gauss-Seidel and Gauss-Southwell
Methods to solve Ax = b.

Update x^(k+1) = x^(k) + ρ_j e_j such that [A x^(k+1)]_j = [b]_j.

In words: “relax” or “free” the jth coordinate of your solution vector
in order to satisfy the jth equation of your linear system.

Gauss-Seidel: repeatedly cycle through j = 1 to n.
Gauss-Southwell: use the value of j that has the highest-magnitude
residual r^(k) = b − A x^(k).
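A dense sketch of the Gauss-Southwell rule just described, assuming a symmetric positive-definite (or diagonally dominant) A so that coordinate relaxation converges; the function name and tolerances are mine.

```python
import numpy as np

def gauss_southwell(A, b, tol=1e-8, maxit=10000):
    """Repeatedly relax the coordinate j with the largest-magnitude
    residual so that equation j of A x = b holds exactly."""
    x = np.zeros(len(b))
    r = b - A @ x                # residual r = b - A x
    for _ in range(maxit):
        j = int(np.argmax(np.abs(r)))
        if abs(r[j]) <= tol:
            break
        rho = r[j] / A[j, j]     # step that zeroes residual j
        x[j] += rho
        r -= rho * A[:, j]       # keep the residual consistent
    return x
```

The push method below is this same iteration, specialized so that each relaxation touches only one node and its neighbors.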
32. Almost “the push” method
The Push Method (parameters ε, ρ):

1. x^(1) = 0, r^(1) = (1 − α) e_i, k = 1
2. while any r_j > ε d_j   (d_j is the degree of node j)
3.   x^(k+1) = x^(k) + (r_j − ε d_j ρ) e_j
4.   r_i^(k+1) = ε d_j ρ                           if i = j
     r_i^(k+1) = r_i^(k) + α (r_j − ε d_j ρ)/d_j   if i ∼ j
     r_i^(k+1) = r_i^(k)                           otherwise
5.   k ← k + 1

Only push “some” of the residual – if we want tolerance ε, then push to
tolerance ε and no further.
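Specialized to the PageRank system (I − αP)x = (1 − α)e_i on an undirected graph, the update above can be written as short working code. This is an illustrative reimplementation, not the authors' code; it assumes P = AD⁻¹, so a push at j sends α·(pushed mass)/d_j to each neighbor, and it pushes r_j down to ε d_j ρ and no further.

```python
import collections

def pagerank_push(G, seed, alpha=0.85, eps=1e-4, rho=0.5):
    """Localized PageRank push: on exit every residual satisfies
    r_v <= eps * d_v, so ||D^{-1}(x* - x)||_inf <= eps/(1 - alpha)."""
    x, r = {}, {seed: 1.0 - alpha}
    Q = collections.deque([seed])
    while Q:
        j = Q.popleft()
        dj = len(G[j])
        if r.get(j, 0.0) <= eps * dj:   # stale queue entry
            continue
        push = r[j] - eps * dj * rho    # push only "some" residual
        x[j] = x.get(j, 0.0) + push
        r[j] = eps * dj * rho
        for u in G[j]:
            r[u] = r.get(u, 0.0) + alpha * push / dj
            if r[u] > eps * len(G[u]):
                Q.append(u)
    return x
```

Only the entries of x and r that the diffusion actually touches are ever stored, which is what makes the work proportional to the localization rather than the graph.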
33. Push is fast!
For the PageRank diffusion, (I − αP)x = (1 − α)e_i, push gives constant
work (entry-wise). Andersen, Chung, Lang, FOCS 2006.

1. For the Katz diffusion, (I − αA)x = (1 − α)e_i, push works
   empirically fast. Bonchi, Gleich, et al., 2012, Internet Math.
2. For the exponential, x = exp(P)e_i, push gives uniform localization
   on power-law graphs and fast runtimes. Gleich and Kloster, 2014,
   Internet Math.
3. For the heat-kernel diffusion, x = exp(tP)e_i, push gives constant
   work (entry-wise). Kloster and Gleich, 2014, KDD.
4. For the PageRank diffusion, push yields sparsity regularization.
   Gleich and Mahoney, ICML 2014.
5. For a general class of diffusions, there is a Cheeger inequality
   like before. Ghosh, Teng, et al., KDD 2014.
6. For the PageRank diffusion, push gives the solution path in constant
   work (entry-wise). Kloster and Gleich, arXiv:1503.00322.
34. Push is useful!
1. Push implicitly regularizes semi-supervised learning. Gleich and
   Mahoney, submitted.
2. Push gives state-of-the-art results for overlapping community
   detection. Whang, Gleich, Dhillon, CIKM 2013; Whang, Gleich,
   Dhillon, in prep.
3. Push for overlapping clusters decreases communication in parallel
   solutions. Andersen, Gleich, Mirrokni, WSDM 2012.

[Figure 3: F1 and F2 measures on DBLP comparing algorithmic communities
from demon, bigclam, graclus centers, spread hubs, random, and egonet
seedings; higher indicates better communities. Also shown: run times,
and the three phases of the method – seeding, seed-set expansion, and
propagation. Source slide: Joyce Jiyoung Whang, The University of Texas
at Austin, Conference on Information and Knowledge Management.]
35. Heat Kernel Based Community Detection
Kyle Kloster and David F. Gleich, Purdue University. KDD 2014.

                 HK (us)     PPR (prev. best)
This set:        F1 0.87     F1 0.34
                 size 14     size 67
Amazon (avg.):   F1 0.33     F1 0.14
                 size 192    size 15293

f = exp{tP} s = Σ_{k=0}^{∞} (t^k / k!) P^k s

Convert to a linear system, and solve in constant time.
36. Heat kernel localization
General recipe:
1. Take problem X, convert into a linear system.
2. Apply “push” to that linear system.
3. Analyze and bound total work.

Heat kernel recipe:
1. Convert x = exp(tP)e_i into the block bidiagonal linear system

   [   I                    ] [ v_0 ]   [ e_i ]
   [ −tP/1    I             ] [ v_1 ]   [  0  ]
   [        −tP/2   ⋱       ] [  ⋮  ] = [  ⋮  ]
   [                ⋱    I  ] [  ⋮  ]   [  ⋮  ]
   [              −tP/N   I ] [ v_N ]   [  0  ]

2. Apply “push” to that system.
3. Analyze the work bound.
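Forward substitution on the block system above gives v_0 = e_i and v_k = (t/k) P v_{k−1}, so x = v_0 + ⋯ + v_N is exactly the N-term Taylor approximation of exp(tP)e_i. A minimal dense sketch of that recurrence (the localized algorithm replaces this with push); `hk_taylor` is an illustrative name.

```python
import numpy as np

def hk_taylor(P, i, t=2.0, N=30):
    """N-term Taylor approximation of exp(t*P) e_i via the
    forward-substitution recurrence v_k = (t/k) P v_{k-1}."""
    v = np.zeros(P.shape[0])
    v[i] = 1.0               # v_0 = e_i
    x = v.copy()
    for k in range(1, N + 1):
        v = (t / k) * (P @ v)
        x += v               # accumulate x = sum_k v_k
    return x
```

For column-stochastic P the total mass of x converges to e^t, which is the normalization that appears in the theorem on the next slide.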
37. There is a fast deterministic adaptation of the push method
Kloster & Gleich, KDD 2014

import collections, math
# G is graph as dictionary-of-sets,
# seed is an array of seeds,
# t, eps, N, psis are precomputed
x = {}  # store x, r as dictionaries
r = {}  # initialize residual
Q = collections.deque()  # initialize queue
for s in seed:
    r[(s, 0)] = 1. / len(seed)
    Q.append((s, 0))
while len(Q) > 0:
    (v, j) = Q.popleft()  # v has r[(v,j)] ...
    rvj = r[(v, j)]
    # perform the hk-relax step
    if v not in x: x[v] = 0.
    x[v] += rvj
    r[(v, j)] = 0.
    mass = (t * rvj / (float(j) + 1.)) / len(G[v])
    for u in G[v]:  # for neighbors of v
        next = (u, j + 1)  # in the next block
        if j + 1 == N:  # last step, add to soln
            x[u] = x.get(u, 0.) + rvj / len(G[v])
            continue
        if next not in r: r[next] = 0.
        thresh = math.exp(t) * eps * len(G[u])
        thresh = thresh / (N * psis[j + 1]) / 2.
        if (r[next] < thresh and
                r[next] + mass >= thresh):
            Q.append(next)  # add u to queue
        r[next] = r[next] + mass

Figure 2: Pseudo-code for our algorithm as working python code. The
graph is stored as a dictionary-of-sets.

THEOREM (MMDS 2014)
Let h = e^{−t} exp{tP} s, and let x be the output of hk-push(ε). Then
‖D^{−1}(x − h)‖_∞ ≤ ε after looking at 2N e^t / ε edges. We believe
that the bound N ≥ 2t log(1/ε) suffices.
38. Analysis, three pages to one slide
Kloster & Gleich, KDD 2014

1. State the approximation error that results from approximating using
   the linear system. A “standard” matrix-approximation result.
2. Bound the work involved in doing push.
   The iterate y ≥ 0 and residual r ≥ 0; each step moves “mass” from r
   to y and keeps the non-negative and increasing property.
   Each step moves at least deg(i)·ε mass in deg(i) work, so in T steps
   we “push” Σ_{i ∈ steps} ε·deg(i), which is at most e^t.
   But we can only push “so much”, so we can bound this sum from above
   and invert to get a total work bound.
41. PageRank solution paths
These take about a second to compute with our “new” push-based
algorithm on graphs with millions of nodes and edges. The computation
is related to the LARS method for 1-norm regularized problems.
42. Use “centers” of graph partitions to seed for overlapping
communities

[Panels (a) AstroPh and (d) Flickr: maximum conductance versus coverage
(percentage) for the egonet, graclus centers, spread hubs, random, and
bigclam seedings.]

Flickr social network: 2M vertices, 22M edges. We can cover 95% of the
network with communities of conductance ~0.15.
43. References and ongoing work
Gleich and Kloster – Relaxation methods for the matrix exponential,
J. Internet Math.
Kloster and Gleich – Heat kernel based community detection, KDD 2014.
Gleich and Mahoney – Algorithmic anti-differentiation, ICML 2014.
Gleich and Mahoney – Regularized diffusions, submitted.
Whang, Gleich, Dhillon – Seeds for overlapping communities, CIKM 2013.

www.cs.purdue.edu/homes/dgleich/codes/nexpokit
www.cs.purdue.edu/homes/dgleich/codes/l1pagerank

Ongoing work:
• Improved localization bounds for functions of matrices
• Asynchronous and parallel “push”-style methods
• Localized methods beyond conductance

Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich