Spacey random walks and higher-order data analysisDavid Gleich
My talk at TMA 2016 (The workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These are widely successful at all sorts of problems, including community detection, label propagation, and a few others.
Analysis and design of algorithms part 3Deepak John
Graphs and graph traversals. Strongly connected components of a directed graph. Biconnected components of an undirected graph.
Transitive closure of a binary relation. Warshall's algorithm for transitive closure. All-pairs shortest paths in graphs. Dynamic programming. Constructing optimal binary search trees.
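Warshall's algorithm from this outline fits in a few lines; a minimal sketch, where the boolean-matrix representation of the relation is an illustrative choice:

```python
def transitive_closure(reach):
    """Warshall's algorithm. reach[i][j] is True when the relation contains
    (i, j); returns the boolean matrix of the transitive closure."""
    n = len(reach)
    R = [row[:] for row in reach]          # copy so the input is untouched
    for k in range(n):                     # allow k as an intermediate vertex
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = True
    return R

# Chain 0 -> 1 -> 2: the closure adds 0 -> 2.
closure = transitive_closure([[False, True, False],
                              [False, False, True],
                              [False, False, False]])
```

The triple loop runs in O(n^3), the same recurrence used by all-pairs shortest paths (Floyd-Warshall) with Boolean OR/AND in place of min/plus.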
Branch and-bound nearest neighbor searching over unbalanced trie-structured overlaysMichail Argyriou
Master's thesis presentation by Mike Argyriou at the Technical University of Crete about
branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays.
Sensors and Samples: A Homological ApproachDon Sheehy
In their seminal work on homological sensor networks, de Silva and Ghrist showed the surprising fact that it's possible to certify the coverage of a coordinate-free sensor network even with very minimal knowledge of the space to be covered. We give a new, simpler proof of the de Silva-Ghrist Topological Coverage Criterion that eliminates any assumptions about the smoothness of the boundary of the underlying space, allowing the results to be applied to much more general problems. The new proof factors the geometric, topological, and combinatorial aspects of this approach. This factoring reveals an interesting new connection between the topological coverage condition and the notion of weak feature size in geometric sampling theory. We then apply this connection to the problem of showing that for a given scale, if one knows the number of connected components and the distance to the boundary, one can also infer the higher Betti numbers or provide strong evidence that more samples are needed. This is in contrast to previous work, which merely assumed a good sample and gives no guarantees if the sampling condition is not met.
In topological inference, the goal is to extract information about a shape given only a sample of points from it. There are many approaches to this problem, but the one we focus on is persistent homology. We get a view of the data at different scales by imagining the points are balls and considering different radii. The shape information we want comes in the form of a persistence diagram, which describes the components, cycles, bubbles, etc., in the space that persist over a range of different scales.
To actually compute a persistence diagram in the geometric setting, previous work required complexes of size n^O(d). We reduce this complexity to O(n) (hiding some large constants depending on d) by using ideas from mesh generation.
This talk will not assume any knowledge of topology. This is joint work with Gary Miller, Benoit Hudson, and Steve Oudot.
Correlation clustering and community detection in graphs and networksDavid Gleich
We show a new relationship between various community detection objectives and a correlation clustering framework. These enable us to detect communities with good bounds on the solution.
Similar to Computing Local and Global Centrality (20)
Spectral clustering with motifs and higher-order structuresDavid Gleich
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Higher-order organization of complex networksDavid Gleich
A talk I gave at the Park City Institute of Mathematics about our recent work on using motifs to analyze and cluster networks. This involves a higher-order cheeger inequality in terms of motifs.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods including new optimization routines and results.
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
This is my KDD2015 talk on robustness in semi-supervised learning. The paper is already on Michael Mahoney's website: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf See the KDD paper for all the details, which this talk is a bit light on.
Spacey random walks and higher order Markov chainsDavid Gleich
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
PageRank Centrality of dynamic graph structuresDavid Gleich
A talk I gave at the SIAM Annual Meeting Mini-symposium on the mathematics of the power grid organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
In a talk at the Chinese Academy of Sciences Institute of Automation, I discuss some of the MapReduce and community detection methods I've worked on.
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
This talk covers the idea of anti-differentiating approximation algorithms, which is an idea to explain the success of widely used heuristic procedures. Formally, this involves finding an optimization problem solved exactly by an approximation algorithm or heuristic.
Localized methods for diffusions in large graphsDavid Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can use efficient matrix computations to evaluate them.
Fast relaxation methods for the matrix exponential David Gleich
The matrix exponential is a matrix computation primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction and also some non-standard uses of the nuclear norm for ranking.
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
I discuss some runtimes for the personalized PageRank vector and how it relates to open questions in how we should tackle these network based measures via matrix computations.
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
A talk at the SIMONS workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations on MapReduce using a communication avoiding algorithm.
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
A talk I gave at ancestry.com on Hadoop, SQL, recommendation, and graph algorithms. It's a tutorial overview; there are better algorithms than those I describe, but these are a simple starting point.
Relaxation methods for the matrix exponential on large networksDavid Gleich
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
This talk is a new update based on some of our recent results on doing Tall and Skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides from my and Rik Marselis's talk at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The new frontiers of AI in RPA with UiPath Autopilot™UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
5. Graph centrality
This talk: path summation,
Σ_ℓ f(paths of length ℓ)
and the local Katz score,
Σ_ℓ α^ℓ · (number of paths of length ℓ between i and j).
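The path-summation view of Katz can be checked numerically: the weighted path counts Σ_ℓ α^ℓ A^ℓ converge to the resolvent (I − αA)⁻¹ whenever α < 1/‖A‖₂. A minimal sketch, where the triangle graph and α = 0.2 are my own illustrative choices:

```python
import numpy as np

# Adjacency matrix of a triangle; alpha is below the convergence
# limit 1/||A||_2 = 0.5 for this graph.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
alpha = 0.2

# (A^l)[i, j] counts paths of length l from i to j, so the truncated
# path sum approaches the resolvent (I - alpha*A)^{-1}.
K_paths = sum(alpha**l * np.linalg.matrix_power(A, l) for l in range(60))
K_resolvent = np.linalg.inv(np.eye(3) - alpha * A)
```

Sixty terms suffice here because the tail is bounded by (α‖A‖₂)^60 / (1 − α‖A‖₂), which is astronomically small for α‖A‖₂ = 0.4.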
6. A – adjacency matrix
L – Laplacian matrix
P – random walk transition matrix
Katz score: K_ij = [(I − αAᵀ)⁻¹]_ij
Commute time: C_ij = vol(G)(L⁺_ii + L⁺_jj − 2L⁺_ij)
PageRank: (I − αPᵀ)x = (1 − α)e/n, with X_ij = (1 − α)[(I − αPᵀ)⁻¹]_ij
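These three definitions are easy to evaluate on a toy graph. The sketch below is my own illustration; the 4-node graph, α = 0.85 for PageRank, and α = 0.25 for Katz are assumed values, not from the slides:

```python
import numpy as np

# A small connected undirected graph (illustrative choice).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)

# Commute time: C[i,j] = vol(G) * (L+[i,i] + L+[j,j] - 2 L+[i,j]).
L = np.diag(d) - A
Lplus = np.linalg.pinv(L)
C = d.sum() * (np.diag(Lplus)[:, None] + np.diag(Lplus)[None, :] - 2 * Lplus)

# Personalized PageRank: column i of X solves (I - alpha*P^T) x = (1-alpha) e_i.
P = A / d[:, None]                  # random-walk transition matrix
alpha = 0.85
X = (1 - alpha) * np.linalg.inv(np.eye(4) - alpha * P.T)

# Katz: K[i,j] = [(I - alpha_k * A^T)^{-1}][i,j], alpha_k < 1/||A||_2.
alpha_k = 0.25
K = np.linalg.inv(np.eye(4) - alpha_k * A.T)
```

Sanity checks that follow from the definitions: commute times are symmetric with zero diagonal, and each personalized PageRank column sums to one because P is row-stochastic.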
7. USES FOR CENTRALITY
Ranking features for web-search/classification
Najork, M. A.; Zaragoza, H. & Taylor, M. J. HITS on the web: How does it compare?
Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S. Link analysis for Web spam detection
Interesting nodes
GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank
8. USES FOR CENTRALITY
Ranking networks of comparisons
Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E. Sensitivity and Stability of Ranking Vectors
Clustering or community detection
Andersen, R.; Chung, F. & Lang, K. Local Graph Partitioning using PageRank Vectors
Link prediction
Savas et al. Hold on about 90 minutes.
10. MATRICES, MOMENTS, QUADRATURE
Estimate a quadratic form: bounds l ≤ xᵀ f(Z) x ≤ u.
Commute: (e_i − e_j)ᵀ L⁺ (e_i − e_j)
Katz: ¼ (e_i + e_j)ᵀ (I − αAᵀ)⁻¹ (e_i + e_j) − ¼ (e_i − e_j)ᵀ (I − αAᵀ)⁻¹ (e_i − e_j)
Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential.
11. MMQ - THE BIG IDEA
Quadratic form → weighted sum (think: A is s.p.d., use the EVD)
→ Stieltjes integral ("a tautology")
→ quadrature approximation
→ matrix equation (Lanczos)
David F. Gleich (Purdue), Univ. Chicago SSCS Seminar
12. MMQ PROCEDURE
Goal: bound a quadratic form, given the matrix and a starting vector.
1. Run k steps of Lanczos on the matrix, starting with the vector.
2. Compute the quadrature rule with an additional eigenvalue prescribed at u; this corresponds to a Gauss-Radau rule with u as a prescribed node.
3. Compute the quadrature rule with an additional eigenvalue prescribed at l; this corresponds to a Gauss-Radau rule with l as a prescribed node.
4. Output the two estimates as lower and upper bounds on the quadratic form.
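The plain Gauss-quadrature half of this procedure (without the Gauss-Radau node prescription, so it gives an estimate rather than two-sided bounds) can be sketched with Lanczos. This is my own reconstruction of the standard matrices-moments-quadrature recipe, not the speaker's code, and it assumes no Lanczos breakdown (k below the Krylov dimension):

```python
import numpy as np

def lanczos(A, v, k):
    """Run k steps of Lanczos on symmetric A starting from v.
    Returns the k-by-k tridiagonal (Jacobi) matrix T_k."""
    n = len(v)
    alphas, betas = [], []
    q_prev = np.zeros(n)
    q = v / np.linalg.norm(v)
    beta = 0.0
    for _ in range(k):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        beta = np.linalg.norm(w)        # assumed nonzero: no breakdown
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, w / beta
    return (np.diag(alphas)
            + np.diag(betas[:-1], 1)
            + np.diag(betas[:-1], -1))

def gauss_estimate(A, v, k, f):
    """Gauss-quadrature estimate of v^T f(A) v from the Lanczos
    tridiagonal: ||v||^2 * sum_i (e1^T y_i)^2 f(theta_i)."""
    T = lanczos(A, v, k)
    theta, Y = np.linalg.eigh(T)
    return (v @ v) * np.sum(Y[0, :] ** 2 * f(theta))
```

The eigenvalues of T_k are the quadrature nodes and the squared first components of its eigenvectors are the weights; with k nodes the rule is exact for polynomials of degree up to 2k − 1.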
13. How well does it work?
[Figure: upper/lower bounds and the error versus the number of matrix-vector products (5 to 30), for Katz scores on the arxiv graph with a hard α = 1/(‖A‖₂ + 1).]
15. KATZ SCORES ARE LOCALIZED
The solutions of (I − αAᵀ)k = e_i are highly localized. Up to 50 neighbors is 99.65% of the total mass.
17. TOP-K ALGORITHM FOR KATZ
Approximate the solution, where the approximation is sparse. Keep the residual sparse too. Ideally, don't "touch" all of the graph.
18. TOP-K ALGORITHM FOR KATZ
Approximate the solution, where the approximation is sparse. Keep the residual sparse too. Ideally, don't "touch" all of the graph. This is possible for personalized PageRank!
19. Richardson for Ax = b
x(k+1) = x(k) + r(k)
r(k+1) = b − Ax(k+1)
When A = Aᵀ and A ⪰ 0, this is equivalent to gradient descent on min xᵀAx − 2xᵀb.
What about coordinate descent?
Gauss-Southwell for Ax = b
x(k+1) = x(k) + r_j(k) e_j
r(k+1) = r(k) − r_j(k) Ae_j
How to pick j?
Frequently "rediscovered" for PageRank: McSherry (WWW2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006).
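A minimal dense-numpy sketch of Gauss-Southwell specialized to the personalized PageRank system (a practical implementation keeps x and the residual sparse; the 3-node graph and α = 0.85 are illustrative assumptions):

```python
import numpy as np

def gauss_southwell_ppr(P, i, alpha=0.85, tol=1e-10):
    """Solve (I - alpha*P^T) x = (1 - alpha)*e_i by Gauss-Southwell:
    repeatedly relax the coordinate with the largest residual."""
    n = P.shape[0]
    x = np.zeros(n)
    r = np.zeros(n)
    r[i] = 1.0 - alpha
    while r.max() > tol:
        j = int(np.argmax(r))       # pick j with the largest residual
        rho = r[j]
        x[j] += rho                 # x <- x + r_j e_j
        r[j] = 0.0                  # r <- r - r_j e_j ...
        r += alpha * rho * P[j, :]  # ... + alpha r_j P^T e_j (row j of P)
    return x

# Illustrative 3-node cycle graph.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
x = gauss_southwell_ppr(P, 0)
x_direct = np.linalg.solve(np.eye(3) - 0.85 * P.T, 0.15 * np.eye(3)[:, 0])
```

Each step pushes the residual mass at j into x and spreads α times it to j's out-neighbors; the total residual shrinks geometrically because the largest entry is at least the average.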
21. NEW CONVERGENCE THEORY
Katz and PageRank are equivalent if α < 1/‖A‖₁.
Gauss-Southwell converges when α < 1/‖A‖₂ (Luo and Tseng 1992) if j is picked as the largest residual.
Read all about it: Fast matrix computations for pair-wise and column-wise commute times and Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet Mathematics (to appear).
23. OPEN QUESTIONS
I can't find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts?
How do we show that the method converges for a non-symmetric matrix when (I − αPᵀ) is not diagonally dominant?
27. Overlapping Clusters
Use the redundancy to reduce communication when solving a PageRank problem.
Overlapping clusters for distributed computation. Andersen, Gleich, Mirrokni, WSDM2012 (to appear).
29. KEY POINTS
Utilize personalized PageRank vectors to find the clusters with "good" conductance scores.
Define "core" vertices for each cluster. Find a good way to cover the graph with these clusters.
Use restricted additive Schwarz to solve (thanks Prof. Szyld and Prof. Frommer!)
30. All nodes solve locally using the coordinate descent method.
31. All nodes solve locally using the coordinate descent method. [Figure highlights a core vertex for the gray cluster.]
32. All nodes solve locally using the coordinate descent method. Red sends residuals to white. White sends residuals to red.
33. White then uses the coordinate descent method to adjust its solution. This will cause communication to red/blue.
34. It works!
[Figure: relative work versus volume ratio (how much more of the graph we need to store), showing swapping probability and PageRank communication for usroads and web-Google, compared against the Metis partitioner.]
35. PERSONALIZED PAGERANK CLUSTERS
Solve (I − αPᵀ)x = (1 − α)e_i to a large degree-weighted tolerance ε.
Sweep over the vertices in order of their degree-normalized rank. Find the best conductance set.
A Cheeger-like inequality. (Not a heuristic.)
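The sweep step can be sketched as follows; the implementation and the test graph are my own illustration, not the slide's code:

```python
def sweep_cut(adj, score, deg):
    """Sweep vertices in decreasing degree-normalized score; return the
    prefix set S with the best conductance cut(S)/min(vol(S), vol(G)-vol(S))."""
    order = sorted(range(len(score)), key=lambda v: -score[v] / deg[v])
    vol_G = sum(deg)
    in_set, cut, vol = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, v in enumerate(order[:-1]):   # skip the trivial full-graph set
        in_set.add(v)
        vol += deg[v]
        for u in adj[v]:                 # edges to the set become internal
            cut += -1 if u in in_set else 1
        phi = cut / min(vol, vol_G - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k + 1]), best_phi

# Two triangles joined by the edge (2, 3); scores mimic a diffusion from 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
deg = [2, 2, 3, 3, 2, 2]
score = [3.0, 2.0, 2.0, 1.0, 0.5, 0.5]
S, phi = sweep_cut(adj, score, deg)
```

On this graph the sweep recovers the first triangle, which cuts the single bridge edge and attains conductance 1/7.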
36. CORE VERTICES
Compute the expected "leave time" for each vertex in a cluster.
Keep increasing the threshold for a "good" vertex until every vertex is core in some cluster.
Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until …
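The set-cover step mentioned above admits the classic greedy approximation; a minimal sketch with assumed names (the clusters are illustrative, not from the slides):

```python
def greedy_cover(universe, clusters):
    """Greedy set cover: repeatedly pick the cluster that covers the most
    still-uncovered vertices (the classic ln(n)-approximation)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(clusters)),
                   key=lambda c: len(clusters[c] & uncovered))
        if not clusters[best] & uncovered:
            break  # remaining vertices appear in no cluster
        chosen.append(best)
        uncovered -= clusters[best]
    return chosen

# Illustrative clusters over six vertices.
clusters = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0}]
picked = greedy_cover(range(6), clusters)
```

Greedy selection is a natural fit here because each cluster's marginal gain only shrinks as more vertices are covered.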