My talk from WSDM2012. See the paper on my webpage: http://www.cs.purdue.edu/homes/dgleich/publications/Andersen%202012%20-%20overlapping.pdf
The code is available at http://www.cs.purdue.edu/homes/dgleich/codes/overlapping/
2. Problem
Find a good way to distribute a big graph for solving things like linear systems and simulating random walks
Contributions
Theoretical demonstration that overlap helps
Proof of concept procedure to find overlapping partitions to reduce communication (~20%)
All code available
http://www.cs.purdue.edu/~dgleich/codes/overlapping
David Gleich · Purdue
WSDM2012
3. The problem
WHAT OUR NETWORKS LOOK LIKE
WHAT OUR OTHER NETWORKS LOOK LIKE
4. The problem
COMBINING NETWORKS AND GRAPHS IS A MESS
5. “Good” data distributions are a fundamental problem in distributed computation.
How to divide the communication graph!
Balance work
Balance communication
Balance data
Balance programming complexity too
6. Current solutions
                              Work        Comm.               Data          Programming
Disjoint vertex partitions    Excellent   Okay to Good        Excellent     “Think like a vertex”
2d or Edge Partitions         Excellent   Excellent           Good          “Impossible”
Where we fit!
Overlapping partitions        Okay        Good to Excellent   “Let’s see”   “Think like a cached vertex”
7. Goals
Find a set of overlapping clusters where
random walks stay in a cluster for a long time
solving diffusion-like problems requires little communication
(think PageRank, Katz, hitting times, semi-supervised learning)
8. Related work
Domain decomposition, Schwarz methods
How to solve a linear system with overlap. Szyld et al.
Communication avoiding algorithms
k-step matrix-vector products (Demmel et al.) and growing overlap around partitions (Fritzsche, Frommer, Szyld)
Overlapping communities and link partitioning algorithms for social network analysis
Link communities (Ahn et al.); surveys by Fortunato and Satu
P2P based PageRank algorithms
Parreira, Castillo, Donato et al.
9. Overlapping clusters
Each vertex
is in at least one cluster
has one home cluster
Formally, an overlapping cover is $(\mathcal{C}, \tau)$
$\mathcal{C} = \{C_1, C_2, C_3\}$ = set of clusters
$\tau : V \mapsto \mathcal{C}$ = map to homes
$\tau$ is a partition!
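A minimal Python sketch of this definition may help make it concrete. The representation (a list of vertex sets plus a `home` dictionary) and the names `check_cover` and `home_partition` are illustrative assumptions, not taken from the talk or its code.

```python
def check_cover(vertices, clusters, home):
    """Return True if (clusters, home) is a valid overlapping cover.

    clusters: list of sets of vertices (sets may overlap)
    home:     dict mapping each vertex to the index of its home cluster
    """
    for v in vertices:
        # every vertex needs a home cluster ...
        if v not in home:
            return False
        # ... and must actually be a member of that cluster
        if v not in clusters[home[v]]:
            return False
    return True


def home_partition(vertices, clusters, home):
    """Group vertices by home cluster; the groups partition the vertex set."""
    parts = [set() for _ in clusters]
    for v in vertices:
        parts[home[v]].add(v)
    return parts


vertices = range(6)
clusters = [{0, 1, 2, 3}, {2, 3, 4, 5}]       # vertices 2 and 3 are shared
home = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # but each has a single home
```

Here vertices 2 and 3 belong to both clusters, while the home map still splits the vertex set into two disjoint groups.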
10. Random walks in overlapping clusters
Each vertex
is in at least one cluster
has one home cluster
Random walks go to the home cluster after leaving
[Figure labels: “red cluster keeps the walk”; “red cluster sends the walk to gray cluster”]
11. An evaluation metric: Swapping probability
Is $(\mathcal{C}, \tau)$ a good overlapping cover?
Does a random walk swap clusters often?
[Figure labels: “red cluster keeps the walk”; “red cluster sends the walk to gray cluster”]
$\rho_\infty$ = probability that a walk changes clusters on each step
computable expression in the paper
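The swapping probability can also be estimated by simulation. The sketch below is a Monte Carlo estimate, not the closed-form expression from the paper: it follows a random walk, keeps it in its current cluster while it stays inside, and sends it to the home cluster of the new vertex when it leaves. The function name, the cycle-graph example, and the parameters `steps` and `seed` are assumptions made for illustration.

```python
import random


def estimate_swapping_probability(adj, clusters, home, steps=200_000, seed=0):
    """Fraction of random-walk steps on which the walk changes clusters."""
    rng = random.Random(seed)
    v = next(iter(adj))          # start at an arbitrary vertex
    current = home[v]            # ... inside its home cluster
    swaps = 0
    for _ in range(steps):
        v = rng.choice(adj[v])   # move to a uniformly random neighbor
        if v not in clusters[current]:
            current = home[v]    # the walk left: go to the new vertex's home
            swaps += 1
    return swaps / steps


# Cycle graph with 20 vertices split into 4 blocks of 5 consecutive vertices.
n, ell = 20, 5
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
home = {v: v // ell for v in range(n)}
blocks = [set(range(i * ell, (i + 1) * ell)) for i in range(n // ell)]
# Grow every block by two vertices on each side to create overlap.
grown = [{(v + d) % n for v in b for d in range(-2, 3)} for b in blocks]

rho_partition = estimate_swapping_probability(adj, blocks, home)
rho_overlap = estimate_swapping_probability(adj, grown, home)
```

On this example the estimate for the overlapping cover should come out well below the estimate for the plain partition, because a walk that enters a grown cluster starts several steps away from its boundary.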
12. Overlapping clusters
Each vertex
is in at least one cluster
has one home cluster
Vol(C) = sum of degrees of vertices in cluster C
MaxVol = upper bound on Vol(C)
TotalVol($\mathcal{C}$) = sum of Vol(C) for all clusters
VolRatio = TotalVol($\mathcal{C}$) / Vol(G)
how much extra data!
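These quantities are simple to compute from an adjacency list. The following sketch assumes an undirected graph stored as a dictionary of neighbor lists; the function names `volume` and `vol_ratio` are chosen here for illustration.

```python
def volume(adj, cluster):
    """Vol(C): sum of the degrees of the vertices in the cluster."""
    return sum(len(adj[v]) for v in cluster)


def vol_ratio(adj, clusters):
    """VolRatio: total volume of all clusters divided by Vol(G)."""
    total_vol = sum(volume(adj, c) for c in clusters)
    graph_vol = sum(len(nbrs) for nbrs in adj.values())
    return total_vol / graph_vol


# Cycle on 6 vertices: every vertex has degree 2, so Vol(G) = 12.
adj = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
partition = [{0, 1, 2}, {3, 4, 5}]
overlapping = [{0, 1, 2, 3}, {2, 3, 4, 5}]
```

A partition always has VolRatio 1; in this example the overlapping cover stores each of the two shared vertices twice, so its VolRatio is 16/12.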
13. Swapping probability & partitioning
No overlap in this figure!
$\mathcal{P}$ is a partition
$$\rho_\infty(\mathcal{P}) = \frac{1}{\mathrm{Vol}(G)} \sum_{P \in \mathcal{P}} \mathrm{Cut}(P)$$
Much like a classical graph partitioning metric
14. Overlapping clusters vs. Partitioning in theory
Take a cycle graph
M groups of $\ell$ vertices
MaxVol = $2\ell$
for partitioning
$\rho_\infty = \frac{1}{\ell}$ (Optimal!)
for overlapping
$\rho_\infty = \frac{4}{\Omega(\ell^2)}$
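The partitioning value can be checked directly. For a partition, the swapping probability is the sum of Cut(P) over all parts divided by Vol(G), and on a cycle split into M groups of $\ell$ consecutive vertices this works out to $1/\ell$. The sketch below verifies that for one choice of M and $\ell$; the helper names and the chosen sizes are assumptions for illustration.

```python
def cycle_graph(n):
    """Adjacency lists of the cycle on n vertices."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def partition_swapping_probability(adj, parts):
    """Sum of Cut(P) over the parts, divided by Vol(G)."""
    vol_g = sum(len(nbrs) for nbrs in adj.values())
    total_cut = 0
    for part in parts:
        # Cut(P): number of edges with exactly one endpoint in P
        total_cut += sum(1 for u in part for v in adj[u] if v not in part)
    return total_cut / vol_g


M, ell = 4, 5
adj = cycle_graph(M * ell)
parts = [set(range(i * ell, (i + 1) * ell)) for i in range(M)]
rho = partition_swapping_probability(adj, parts)
```

Each group has two cut edges and volume $2\ell$, so the ratio is $2M / (2M\ell) = 1/\ell$.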
15. Heuristics for finding good overlapping clusters
NP-hard for optimal solution :(
Our multi-stage heuristic!
1. Find a large set of good clusters
Use personalized PageRank clusters
2. Find “well contained” nodes (cores)
Compute expected “leave time”
3. Cover the graph with core vertices
Approximately solve a min set-cover problem
4. Combine clusters up to MaxVol
The swapping probability is sub-modular
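Step 1 relies on personalized PageRank clusters. One common way to obtain such a cluster is the push procedure followed by a sweep cut; the sketch below uses that approach as an assumption, since the slides do not say which variant the authors' code implements. The function names and the values of `alpha` and `eps` are illustrative.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-6):
    """Approximate personalized PageRank from `seed` by repeated pushes."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue                      # residual already small enough
        # push: keep an alpha fraction at u, spread the rest with a lazy walk
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + (1 - alpha) * ru / (2 * deg_u)
        for w in [u] + adj[u]:
            if r[w] >= eps * len(adj[w]):
                queue.append(w)
    return p


def sweep_cut(adj, p):
    """Best-conductance prefix when vertices are sorted by p[v] / degree."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    inside, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for v in order:
        inside.add(v)
        vol += len(adj[v])
        # edges to vertices already inside stop being cut edges
        cut += len(adj[v]) - 2 * sum(1 for w in adj[v] if w in inside)
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best, best_phi = set(inside), cut / denom
    return best, best_phi


def two_cliques():
    """Two 4-cliques (0-3 and 4-7) joined by the single edge 3-4."""
    adj = {v: [] for v in range(8)}
    for group in (range(0, 4), range(4, 8)):
        for u in group:
            adj[u].extend(v for v in group if v != u)
    adj[3].append(4)
    adj[4].append(3)
    return adj
```

Seeding the procedure at vertex 0 of `two_cliques()` should recover the first clique, whose only cut edge is the bridge.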
16. Heuristics for finding good overlapping clusters
NP-hard for optimal solution :(
Our multi-stage heuristic!
1. Find a large set of good clusters
Use personalized PageRank clusters, or metis
(Each cluster takes “< MaxVol” work)
2. Find “well contained” nodes (cores)
Compute expected “leave time”
(Takes O(Vol) work per cluster)
3. Cover the graph with core vertices
Approximately solve a min set-cover problem
(Fast enough)
4. Combine clusters up to MaxVol
The swapping probability is sub-modular
(Fast enough)
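Step 3 asks for an approximate solution of a min set-cover problem. The standard greedy approximation is sketched below as one plausible way to do this; whether the authors' code uses exactly this rule is an assumption. The name `greedy_set_cover` and the toy `core_sets` are made up for the example.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidate sets greedily until every element is covered.

    Returns the indices of the chosen sets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the candidate covering the most still-uncovered vertices
        # (ties go to the lowest index)
        best = max(range(len(candidates)),
                   key=lambda i: len(candidates[i] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Each set stands for the core vertices of one candidate cluster.
core_sets = [{0, 1, 2, 3}, {2, 3, 4}, {4, 5}, {0, 5}]
cover = greedy_set_cover(range(6), core_sets)
```

In this toy instance the greedy rule first takes the set with four vertices and then the set that covers both remaining vertices.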
18. Solving linear systems
Like PageRank, Katz, and semi-supervised learning
19. All nodes solve locally using the coordinate descent method.
20. All nodes solve locally using the coordinate descent method.
A core vertex for the gray cluster.
21. All nodes solve locally using the coordinate descent method.
Red sends residuals to white.
White sends residuals to red.
22. White then uses the coordinate
descent method to adjust its solution.
Will cause communication to red/blue.
23. That algorithm is called restricted additive Schwarz.
We look at PageRank!
PageRank
Katz scores
semi-supervised learning
any spd or M-matrix linear system
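A small serial sketch of restricted additive Schwarz for the PageRank system $(I - \alpha P)x = (1 - \alpha)v$ is given below. It is a simplified illustration under stated assumptions, not the authors' distributed implementation: the graph is undirected with no self-loops, every cluster does a few coordinate (Gauss-Seidel) sweeps on its own block, and all names and parameter values (`pagerank_ras`, `outer`, `inner`, the 8-cycle example) are chosen here.

```python
def pagerank_ras(adj, clusters, home, teleport, alpha=0.85, outer=200, inner=5):
    """Solve (I - alpha*P) x = (1 - alpha)*teleport with overlapping clusters.

    P is the column-stochastic random-walk matrix of the undirected graph.
    """
    deg = {v: len(adj[v]) for v in adj}
    b = {v: (1 - alpha) * teleport.get(v, 0.0) for v in adj}
    x = {v: 0.0 for v in adj}
    for _ in range(outer):
        # global residual r = b - (I - alpha*P) x
        r = {u: b[u] - x[u] + alpha * sum(x[v] / deg[v] for v in adj[u])
             for u in adj}
        new_x = dict(x)
        for i, cluster in enumerate(clusters):
            # approximate local solve on the cluster's block with a few
            # coordinate sweeps, starting from a zero correction
            d = {u: 0.0 for u in cluster}
            for _ in range(inner):
                for u in cluster:
                    d[u] = r[u] + alpha * sum(d[v] / deg[v]
                                              for v in adj[u] if v in d)
            # "restricted": a cluster only updates vertices it is home to
            for u in cluster:
                if home[u] == i:
                    new_x[u] = x[u] + d[u]
        x = new_x
    return x


def residual_norm(adj, x, teleport, alpha=0.85):
    """Largest absolute entry of b - (I - alpha*P) x."""
    deg = {v: len(adj[v]) for v in adj}
    return max(abs((1 - alpha) * teleport.get(u, 0.0) - x[u]
                   + alpha * sum(x[v] / deg[v] for v in adj[u]))
               for u in adj)


# 8-cycle, two clusters that each extend one vertex past both ends
n = 8
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
clusters = [{0, 1, 2, 3, 4, 7}, {3, 4, 5, 6, 7, 0}]
home = {v: 0 if v < 4 else 1 for v in range(n)}
x = pagerank_ras(adj, clusters, home, teleport={0: 1.0})
```

In a distributed setting each cluster would live on its own machine and only the residual entries on cluster boundaries would be exchanged; the serial loop above recomputes the full residual for simplicity.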
24. It works!
[Figure: relative work and relative communication (y-axis, 0 to 2) versus Volume Ratio (x-axis, 1 to 1.7). Curves: Swapping Probability (usroads), PageRank Communication (usroads), Swapping Probability (web-Google), PageRank Communication (web-Google). The Metis Partitioner is the partitioning baseline.]
Volume Ratio: how much more of the graph we need to store.
25. Edges are counted twice and some graphs have self-loops. The first group are geometric networks and the second are information networks.
Graph          |V|        |E|         max deg   |E|/|V|
onera          85567      419201      5         4.9
usroads        126146     323900      7         2.6
annulus        500000     2999258     19        6.0

email-Enron    33696      361622      1383      10.7
soc-Slashdot   77360      1015667     2540      13.1
dico           111982     2750576     68191     24.6
lcsh           144791     394186      1025      2.7
web-Google     855802     8582704     6332      10.0
as-skitter     1694616    22188418    35455     13.1
cit-Patents    3764117    33023481    793       8.8
[Figure: three conductance plots; y-axis: Conductance, 0 to 1]
26. The communication ratio of our best result for the PageRank communication volume compared to METIS or GRACLUS shows that the method works for 6 of them (perf. ratio < 1). The [...] communication result is not a bug.
Graph          Comm. of Partition   Comm. of Overlap   Perf. Ratio   Vol. Ratio
onera          18654                48                 0.003         2.82
usroads        3256                 0                  0.000         1.49
annulus        12074                2                  0.000         0.01
email-Enron    194536*              235316             1.210         1.7
soc-Slashdot   875435*              1.3 × 10^6         1.480         1.78
dico           1.5 × 10^6*          2.0 × 10^6         1.320         1.53
lcsh           73000*               48777              0.668         2.17
web-Google     201159*              167609             0.833         1.57
as-skitter     2.4 × 10^6           3.9 × 10^6         1.645         1.93
cit-Patents    8.7 × 10^6           7.3 × 10^6         0.845         1.34
* means Graclus gave a better partition than Metis
Finally, we evaluate our heuristic.
At left, the cluster combine procedure reduces 10^6 clusters to around 10^2. Middle, combining clusters can decrease the volume
27. Summary
Overlap helps reduce communication in a distributed process!
Proof of concept procedure to find overlapping partitions to reduce communication
Future work
Truly distributed implementation and evaluation
Can we exploit data redundancy to solve problems on large graphs faster?
[Figure: two copies of the data, “Copy 1” and “Copy 2”, each listing src -> dst edges]
All code available
http://www.cs.purdue.edu/~dgleich/codes/overlapping