My talk at TMA 2016 (The workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors
Higher-order organization of complex networks (David Gleich)
A talk I gave at the Park City Mathematics Institute about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
Spectral clustering with motifs and higher-order structures (David Gleich)
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Spacey random walks and higher order Markov chains (David Gleich)
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
Localized methods in graph mining exploit the local structure of a graph instead of attempting to find global structure. These methods are widely successful at problems including community detection and label propagation, among others.
Anti-differentiating approximation algorithms: A case study with min-cuts, sp... (David Gleich)
This talk covers anti-differentiating approximation algorithms, an idea to explain the success of widely used heuristic procedures. Formally, this involves finding an optimization problem solved exactly by an approximation algorithm or heuristic.
Localized methods for diffusions in large graphs (David Gleich)
I describe a few ongoing research projects on diffusions in large graphs and how we can use efficient matrix computations to evaluate them.
Correlation clustering and community detection in graphs and networks (David Gleich)
We show a new relationship between various community detection objectives and a correlation clustering framework. This enables us to detect communities with good bounds on the solution.
PageRank Centrality of dynamic graph structures (David Gleich)
A talk I gave at the SIAM Annual Meeting mini-symposium on the mathematics of the power grid, organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Anti-differentiating Approximation Algorithms: PageRank and MinCut (David Gleich)
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Big data matrix factorizations and overlapping community detection in graphs (David Gleich)
In a talk at the Chinese Academy of Sciences Institute of Automation, I discuss some of the MapReduce and community detection methods I've worked on.
Fast relaxation methods for the matrix exponential (David Gleich)
The matrix exponential is a matrix computing primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods including new optimization routines and results.
Gaps between the theory and practice of large-scale matrix-based network comp... (David Gleich)
I discuss some runtimes for the personalized PageRank vector and how they relate to open questions in how we should tackle these network-based measures via matrix computations.
Using Local Spectral Methods to Robustify Graph-Based Learning (David Gleich)
This is my KDD2015 talk on robustness in semi-supervised learning. The paper is already on Michael Mahoney's website: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf See the KDD paper for all the details, which this talk is a bit light on.
Relaxation methods for the matrix exponential on large networks (David Gleich)
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
Exact Matrix Completion via Convex Optimization Slide (PPT) (Joonyoung Yi)
Slides for the paper "Exact Matrix Completion via Convex Optimization" by Emmanuel J. Candès and Benjamin Recht. We presented these slides in the KAIST CS592 class, April 2018.
- Code: https://github.com/JoonyoungYi/MCCO-numpy
- Abstract of the paper: We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys
$m \ge C\,n^{1.2}\,r \log n$
for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
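As a concrete illustration of the program described in the abstract, here is a minimal sketch using numpy and cvxpy (an assumption of this write-up, not the code in the repository linked above):

```python
# A minimal sketch of the convex program from the abstract: minimize the
# nuclear norm subject to matching the observed entries.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r target
mask = (rng.random((n, n)) < 0.5).astype(float)                # ~half observed

X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == cp.multiply(mask, M)])
problem.solve()
print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```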
Tensor Train (TT) decomposition [3] is a generalization of the SVD from matrices to tensors (i.e., multidimensional arrays).
It represents a tensor compactly in terms of factors and allows one to work with the tensor via its factors without materializing the tensor itself.
For example, we can find the elementwise product of two TT-tensors of size 2^100 and get the result in the TT-format as well.
In the talk, we will show how Tensor Train decomposition can be used to represent parameters of neural networks [1] and polynomial models [2].
This parametrization allows exponentially many 'virtual' parameters while working only with small factors of the TT-format.
To train the model, i.e. optimize the objective subject to the constraint that the parameters are in the TT-format, [2] uses stochastic Riemannian optimization.
[1] Novikov, A., Podoprikhin, D., Osokin, A., & Vetrov, D. P. (2015). Tensorizing neural networks. In Advances in Neural Information Processing Systems.
[2] Novikov, A., Trofimov, M., & Oseledets, I. (2016). Tensor Train polynomial models via Riemannian optimization. arXiv:1605.03795.
[3] Oseledets, I. (2011). Tensor-train decomposition. SIAM Journal on Scientific Computing.
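For a sense of how the factors are computed, here is a minimal TT-SVD sketch in numpy, a simplified version of the algorithm in [3] (the truncation tolerance and rank cap are arbitrary choices here):

```python
# A minimal TT-SVD sketch: sequential SVDs of unfoldings produce cores
# G[k] of shape (r_{k-1}, n_k, r_k).
import numpy as np

def tt_svd(A, max_rank):
    dims, d = A.shape, A.ndim
    cores, r, C = [], 1, A.reshape(1, -1)
    for k in range(d - 1):
        C = C.reshape(r * dims[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        rk = min(max_rank, int((s > 1e-12 * s[0]).sum()))  # truncated TT-rank
        cores.append(U[:, :rk].reshape(r, dims[k], rk))
        C = s[:rk, None] * Vt[:rk]                         # carry the remainder
        r = rk
    cores.append(C.reshape(r, dims[-1], 1))
    return cores

def tt_full(cores):
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=(out.ndim - 1, 0))  # contract TT ranks
    return out.reshape([c.shape[1] for c in cores])

A = np.random.rand(4, 5, 6)
print(np.allclose(tt_full(tt_svd(A, max_rank=30)), A))      # exact at full rank
```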
In this talk we consider the question of how to use QMC with an empirical dataset, such as a set of points generated by MCMC. Using ideas from partitioning for parallel computing, we apply recursive bisection to reorder the points, and then interleave the bits of the QMC coordinates to select the appropriate point from the dataset. Numerical tests show that in the case of known distributions this is almost as effective as applying QMC directly to the original distribution. The same recursive bisection can also be used to thin the dataset, by recursively bisecting down to many small subsets of points, and then randomly selecting one point from each subset. This makes it possible to reduce the size of the dataset greatly without significantly increasing the overall error. Co-author: Fei Xie
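A hedged sketch of the thinning step (not the authors' code), under the assumption that splitting at the median of the widest coordinate is an acceptable stand-in for the recursive bisection described above:

```python
# Recursively bisect the point set at the median of its widest coordinate,
# then keep one randomly chosen point per leaf subset.
import numpy as np

def bisection_thin(points, leaf_size, rng):
    if len(points) <= leaf_size:
        return points[rng.integers(len(points))][None, :]   # one point per leaf
    widest = np.ptp(points, axis=0).argmax()                # largest-spread axis
    order = np.argsort(points[:, widest])
    half = len(points) // 2
    return np.vstack([bisection_thin(points[order[:half]], leaf_size, rng),
                      bisection_thin(points[order[half:]], leaf_size, rng)])

rng = np.random.default_rng(1)
sample = rng.standard_normal((10_000, 2))        # stand-in for MCMC output
thinned = bisection_thin(sample, leaf_size=16, rng=rng)
print(sample.shape, "->", thinned.shape)
```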
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon... (MLconf)
Anima Anandkumar has been a faculty member in the EECS Dept. at U.C. Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT from 2009 to 2010. She is the recipient of the Microsoft Faculty Fellowship, the ARO Young Investigator Award, the NSF CAREER Award, and the IBM Fran Allen PhD Fellowship.
A history of PageRank from the numerical computing perspective (David Gleich)
We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.
There are some slight liberties I've taken to make it more accessible.
This talk is a new update based on some of our recent results on doing Tall and Skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
MapReduce Tall-and-skinny QR and applications (David Gleich)
A talk at the Simons Institute workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations on MapReduce using a communication-avoiding algorithm.
How does Google Google: A journey into the wondrous mathematics behind your f... (David Gleich)
A talk I gave at the annual meeting for the MetroNY section of the MAA about how Google works from a link-ranking perspective. (http://sections.maa.org/metrony/)
Based on a talk by Margot Gerritsen (which used elements from another talk I gave years ago, yay co-author improvements!)
Vertex neighborhoods, low conductance cuts, and good seeds for local communit... (David Gleich)
My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
Recommendation and graph algorithms in Hadoop and SQL (David Gleich)
A talk I gave at ancestry.com on Hadoop, SQL, recommendation, and graph algorithms. It's a tutorial overview; there are better algorithms than those I describe, but these are a simple starting point.
Fast matrix primitives for ranking, link-prediction and more (David Gleich)
I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction and also some non-standard uses of the nuclear norm for ranking.
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In... (Amro Elfeki)
Park, E., Elfeki, A. M. M., Dekking, F. M. (2003). Characterization of subsurface heterogeneity: Integration of soft and hard information using multi-dimensional coupled Markov chain approach. Underground Injection Science and Technology Symposium, Lawrence Berkeley National Lab., October 22-25, 2003. p. 49. Eds. Tsang, Chin-Fu and Apps, John A.
http://www.lbl.gov/Conferences/UIST/index.html#topics
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP.
We apply our results to the problem of learning near-optimal policies in the reward-free setting. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible asymptotic rate. Our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of contexts.
Dynamical Systems Methods in Early-Universe Cosmologies (Ikjyot Singh Kohli)
Talk I gave at The Southern Ontario Numerical Analysis Day (SONAD): http://www.math.yorku.ca/sonad2014/ on General Relativity, Dynamical Systems, and Early-Universe Cosmologies.
These are slides for my tutorial talk on network dynamics. (The colors are fine in the downloaded version, though there seem to be color issues if you view the slides directly in slideshare.)
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on... (Shu Tanaka)
Our paper entitled “Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on Square Lattice" was published in Journal of the Physical Society of Japan. This work was done in collaboration with Dr. Ryo Tamura (NIMS).
http://journals.jps.jp/doi/abs/10.7566/JPSJ.82.053002
Complex systems are characterized by constituents -- from neurons in the brain to individuals in a social network -- which exhibit special structural organization and nonlinear dynamics. As a consequence, a complex system cannot be understood by studying its units separately because their interactions lead to unexpected emerging phenomena, from collective behavior to phase transitions.
Recently, we have discovered that a new level of complexity characterizes a variety of natural and artificial systems, where units interact, simultaneously, in distinct ways. For instance, this is the case of multimodal transportation systems (e.g., metro, bus, and train networks) or of biological molecules, whose interactions might be of different type (e.g., physical, chemical, genetic) or functionality (e.g., regulatory, inhibitory, etc.). The unprecedented newfound wealth of multivariate data allows us to categorize a system's interdependencies by defining distinct "layers", each one encoding a different network representation of the system. The result is a multilayer network model.
Analyzing data from different domains -- including molecular biology, neuroscience, urban transport, and telecommunications -- we will show that neglecting or disregarding multivariate information might lead to poor results. Conversely, multilayer models provide a suitable framework for complex data analytics, allowing us to quantify the resilience of a system to perturbations (e.g., localized failures or targeted attacks) and improving the forecasting of spreading processes and the accuracy of classification problems.
Spacey random walks and higher-order data analysis
1. Spacey random walks for higher-order data analysis
David F. Gleich, Purdue University
May 20, 2016, TMA 2016
Joint work with Austin Benson, Lek-Heng Lim, and Tao Wu; supported by NSF CAREER CCF-1149756, IIS-1422918, and DARPA SIMPLEX.
Papers: arXiv:1602.02102, arXiv:1603.00395
2. Markov chains, matrices, and eigenvectors have a long relationship.
Kemeny and Snell, 1976 (Finite Mathematics, Chapter V, Section 8): "In the Land of Oz they never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this change to a nice day." We form a three-state Markov chain with states R, N, and S for rain, nice, and snow, respectively. The transition matrix (column-stochastic in my talk) is then
$P = \begin{pmatrix} 1/2 & 1/2 & 1/4 \\ 1/4 & 0 & 1/4 \\ 1/4 & 1/2 & 1/2 \end{pmatrix}$ with rows and columns ordered R, N, S.
The stationary distribution x satisfies $x_i = \sum_j P(i,j)\,x_j$, with $x_i \ge 0$ and $\sum_i x_i = 1$; here $x = (2/5,\ 1/5,\ 2/5)$.
x is an eigenvector.
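A quick numerical check of the example (a sketch assuming numpy; not part of the original slides):

```python
# The stationary distribution is the eigenvector of the column-stochastic P
# for eigenvalue 1, normalized to sum to one.
import numpy as np

P = np.array([[1/2, 1/2, 1/4],   # rows/columns ordered R, N, S
              [1/4, 0.0, 1/4],
              [1/4, 1/2, 1/2]])
vals, vecs = np.linalg.eig(P)
x = np.real(vecs[:, np.argmax(np.real(vals))])
x /= x.sum()
print(x)                         # [0.4 0.2 0.4], i.e., (2/5, 1/5, 2/5)
```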
3. Markov chains, matrices, and eigenvectors have a long relationship.
1. Start with a Markov chain $X_1, X_2, \dots, X_t, X_{t+1}, \dots$
2. Inquire about the stationary distribution.
3. This question gives rise to an eigenvector problem on the transition matrix.
$x_i = \lim_{N\to\infty} \frac{1}{N}\sum_{t=1}^{N} \mathrm{Ind}[X_t = i]$ is the limiting fraction of time the chain spends in state i.
In general, $X_t$ will be a stochastic process in this talk.
4. Higher-order Markov chains are more useful for modern data problems.
Higher-order means more history!
Rosvall et al. (Nature Comm. 2014) found
• Higher-order Markov chains were critical to finding multidisciplinary journals in citation data and patterns in air traffic networks.
Chierichetti et al. (WWW 2012) found
• Higher-order Markov models capture browsing behavior more accurately than first-order models.
(and more!)
[Figure from Rosvall et al. 2014, "From pathway data to networks with and without memory": (a) itineraries weighted by passenger number; (b) aggregated bigrams for links between physical nodes; (c) aggregated trigrams for links between memory nodes; (d) network without memory; (e) network with memory. The example contrasts first- and second-order Markov dynamics for air traffic among Atlanta, Chicago, New York, San Francisco, and Seattle; e.g., a memory node represents passengers who come to Chicago from New York.]
5. Stationary dist. of higher-order Markov chains are still matrix eqns.
$P[X_{t+1} = i \mid X_t = j, X_{t-1} = k] = P(i,j,k)$, the probability of state i given history j, k.
Convert into a first-order Markov chain on pairs of states:
$X_{i,j} = \sum_k P(i,j,k)\,X_{j,k}$, with $X_{i,j} \ge 0$ and $\sum_{i,j} X_{i,j} = 1$.
$x_i = \sum_j X(i,j)$ is the marginal for the stationary distribution.
Last state          |     1       |      2       |      3
Current state       |  1   2   3  |  1    2   3  |  1    2    3
P[next state = 1]   |  0   0   0  | 1/4   0   0  | 1/4   0   3/4
P[next state = 2]   | 3/5 2/3  0  | 1/2   0  1/2 |  0   1/2   0
P[next state = 3]   | 2/5 1/3  1  | 1/4   1  1/2 | 3/4  1/2  1/4
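A sketch of this reduction, assuming numpy; the tensor entries below encode the table above:

```python
# Encode P[i, j, k] = P[next = i | current = j, last = k], build the
# first-order chain on pairs (current, last), and marginalize its
# stationary distribution.
import numpy as np

P = np.zeros((3, 3, 3))
P[:, 0, 0] = [0, 3/5, 2/5]; P[:, 1, 0] = [0, 2/3, 1/3]; P[:, 2, 0] = [0, 0, 1]
P[:, 0, 1] = [1/4, 1/2, 1/4]; P[:, 1, 1] = [0, 0, 1]; P[:, 2, 1] = [0, 1/2, 1/2]
P[:, 0, 2] = [1/4, 0, 3/4]; P[:, 1, 2] = [0, 1/2, 1/2]; P[:, 2, 2] = [3/4, 0, 1/4]

n = 3
M = np.zeros((n * n, n * n))                 # pair (i, j) has index i*n + j
for i in range(n):
    for j in range(n):
        for k in range(n):
            M[i * n + j, j * n + k] = P[i, j, k]   # transition (j,k) -> (i,j)

vals, vecs = np.linalg.eig(M)
X = np.real(vecs[:, np.argmax(np.real(vals))]); X /= X.sum()
x = X.reshape(n, n).sum(axis=1)              # marginal: x_i = sum_j X_{i,j}
print(x)
```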
6. Stationary dist. of higher-order Markov chains are still matrix eqns.
The implicit Markov chain: $P[X_{t+1} = i \mid X_t = j, X_{t-1} = k] = P(i,j,k)$, the probability of state i given history j, k.
[Figure: the implicit first-order chain on the pairs (1,1), (2,1), (3,1), (1,2), (2,2), (3,2), with transition probabilities taken from the table on the previous slide, and example trajectories such as 1, 1, 3, 3, 1, ...; 2, 3, 3, ...; and 1, 2, 3, 2, ...]
(Same transition table as on the previous slide.)
7. Hypermatrices, tensors, and tensor eigenvectors have been used too
For a tensor $A : n \times n \times n$, a tensor eigenvector x satisfies
$\sum_{j,k} A(i,j,k)\,x_j x_k = x_i$, i.e., $Ax^2 = x$.
Z-eigenvectors of this type were proposed by Lim (2005) and Qi (2005). There are many references to using tensors for data analysis (1970+).
Anandkumar et al. 2014
• Tensor eigenvector decompositions are optimal to recover latent variable models based on higher-order moments.
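A minimal sketch, assuming numpy, of the basic power-type iteration for this equation; as later slides discuss, this simple iteration need not converge in general:

```python
# Normalize A x^2 each step; check the z-eigenpair residual at the end.
import numpy as np

def tensor_apply(A, x):
    return np.einsum('ijk,j,k->i', A, x, x)  # (A x^2)_i = sum_{j,k} A_ijk x_j x_k

rng = np.random.default_rng(0)
A = rng.random((4, 4, 4))
x = np.ones(4) / np.linalg.norm(np.ones(4))
for _ in range(500):
    y = tensor_apply(A, x)
    x = y / np.linalg.norm(y)
lam = x @ tensor_apply(A, x)                 # Rayleigh-type eigenvalue estimate
print("residual:", np.linalg.norm(tensor_apply(A, x) - lam * x))
```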
8. But there were few results connecting hypermatrices, tensors, and higher-order Markov chains.
9. Li and Ng proposed a link between tensors and high-order MC (Li and Ng 2014)
1. Start with a higher-order Markov chain.
2. Look at the stationary distribution: $X_{i,j} = \sum_k P(i,j,k)\,X_{j,k}$, $X_{i,j} \ge 0$, $\sum_{i,j} X_{i,j} = 1$.
3. Assume/approximate it as rank 1: $X_{i,j} = x_i x_j$.
4. ... and we have a tensor eigenvector: $x_i = \sum_{j,k} P(i,j,k)\,x_j x_k$.
10. Li and Ng proposed an algebraic link between tensors and high-order MC
The Li and Ng stationary distribution (Li and Ng 2014) satisfies $x_i = \sum_{j,k} P(i,j,k)\,x_j x_k$, i.e., $Px^2 = x$. It
• is a tensor z-eigenvector;
• is non-negative and sums to one;
• can sometimes be computed [Li and Ng, 14; Chu and Wu, 14; Gleich, Lim, Yu 15];
• may or may not be unique;
• almost always exists.
Our question: Is there a stochastic process underlying this tensor eigenvector?
11. Intro
Markov chain → matrix equation: $X_1, X_2, \dots$ → $Px = x$
Markov chain → matrix equation → approximation (Li & Ng, Multilinear PageRank): $X_1, X_2, \dots$ → "$PX = X$" → $Px^2 = x$
Desired: stochastic process → approx. equations: $X_1, X_2, \dots$ → $Px^2 = x$
Our question: Is there a stochastic process underlying this tensor eigenvector?
12. The spacey random walk
Consider a higher-order Markov chain: $P[X_{t+1} = i \mid \text{history}] = P[X_{t+1} = i \mid X_t = j, X_{t-1} = k]$.
If we were perfect, we'd figure out the stationary distribution of that. But we are spacey!
• On arriving at state j, we promptly "space out" and forget we came from k.
• But we still believe we are "higher-order".
• So we invent a state k by drawing a random state from our history.
走神 ("to space out"), according to my students.
Benson, Gleich, Lim arXiv:2016
13. The Spacey Random Walk
Higher-order Markov: $P[X_{t+1} = i \mid X_t = j, X_{t-1} = k] = P(i,j,k)$
Spacey random walk: $P[X_{t+1} = i \mid X_t = j, Y_t = g] = P(i,j,g)$, where $Y_t$ is the invented history state.
[Figure: a sample trajectory of states illustrating $X_{t-1}$, $X_t$, and the spaced-out history state $Y_t$.]
Key insight: the limiting distributions of this process are tensor eigenvectors.
Benson, Gleich, Lim arXiv:2016
14. The spacey random walk process
$P(X_{t+1} = i \mid \mathcal{F}_t) = \sum_k P_{i,X_t,k}\,C_k(t)/(t+n)$
Let $C_k(t) = 1 + \sum_{s=1}^{t} \mathrm{Ind}\{X_s = k\}$ count how often we've visited state k in the past, and let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the history $\{X_s : 0 \le s \le t\}$.
This is a reinforced stochastic process, or a (generalized) vertex-reinforced random walk! Diaconis; Pemantle, 1992; Benaïm, 1997; Pemantle, 2007.
Benson, Gleich, Lim arXiv:2016
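A simulation sketch of this process, assuming numpy and an arbitrary random column-stochastic table:

```python
# Guess the past state Y_t from the occupation counts C_k(t), then
# transition with the column P(:, X_t, Y_t).
import numpy as np

def spacey_walk(P, steps, rng):
    n = P.shape[0]
    counts = np.ones(n)                              # C_k(t) = 1 + #visits to k
    state = rng.integers(n)
    for _ in range(steps):
        y = rng.choice(n, p=counts / counts.sum())   # spaced-out history state
        state = rng.choice(n, p=P[:, state, y])      # higher-order transition
        counts[state] += 1
    return counts / counts.sum()                     # empirical occupation c(t)

rng = np.random.default_rng(0)
n = 3
P = rng.random((n, n, n)); P /= P.sum(axis=0)        # random column-stochastic P
print(spacey_walk(P, 50_000, rng))
```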
15. Generalized vertex-reinforced random walks (VRRW)
A vertex-reinforced random walk at time t transitions according to a Markov matrix M given the observed frequencies of visiting each state:
$P(X_{t+1} = i \mid \mathcal{F}_t) = [M(c(t))]_{i,X_t}$, where $c(t)$ is the vector of observed visit frequencies.
The map $c \mapsto M(c)$ from the simplex of probability distributions to Markov chains is key to VRRWs: how often we've been where determines where we are going next.
M. Benaïm 1997
16. Stationary distributions of VRRWs correspond to ODEs
THEOREM [Benaïm, 1997], paraphrased: The sequence of empirical observation probabilities $c(t)$ is an asymptotic pseudo-trajectory for the dynamical system
$\frac{dx}{dt} = \pi[M(x)] - x$, where $\pi(M(x))$ is the map to the stationary distribution.
Thus, convergence of the ODE to a fixed point is equivalent to stationary distributions of the VRRW.
• M must always have a unique stationary distribution!
• The map to M must be very continuous.
• Asymptotic pseudo-trajectories satisfy $\lim_{t\to\infty}\,\lVert c(t+T) - x(t+T)\rVert = 0$, where $x(\cdot)$ solves the ODE with $x(t) = c(t)$.
17. The Markov matrix for Spacey Random Walks
$M(c) = \sum_k P(:,:,k)\,c_k$
This is the transition probability associated with guessing the last state based on history!
A necessary condition for a stationary distribution of $\frac{dx}{dt} = \pi[M(x)] - x$ (otherwise it makes no sense):
Property B. Let $P$ be an order-m, n-dimensional probability table. Then $P$ has property B if there is a unique stationary distribution associated with all stochastic combinations of the last $m-2$ modes. That is, $M = \sum_{k,\ell,\dots} P(:,:,k,\ell,\dots)\,\gamma_{k,\ell,\dots}$ defines a Markov chain with a unique Perron root when all the weights $\gamma$ are positive and sum to one.
Benson, Gleich, Lim arXiv:2016
18. Stationary points of the ODE for the Spacey Random Walk are tensor evecs
With $M(c) = \sum_k P(:,:,k)\,c_k$ and $\frac{dx}{dt} = \pi[M(x)] - x$:
$\frac{dx}{dt} = 0 \iff \pi(M(x)) = x \iff M(x)\,x = x \iff \sum_{j,k} P(i,j,k)\,x_j x_k = x_i$
But not all tensor eigenvectors are stationary points!
Benson, Gleich, Lim arXiv:2016
19. Some results on spacey random walk models
1. If you give it a Markov chain hidden in a hypermatrix, then it works like a Markov chain.
2. All 2 x 2 x 2 x ... x 2 problems have a stationary distribution (with a few corner cases).
3. This shows that an "exotic" class of Pólya urns always converges.
4. Spacey random surfer models have unique stationary distributions in some regimes.
5. Spacey random walks model Hardy-Weinberg laws in population genetics.
6. Spacey random walks are a plausible model of taxicab behavior.
Benson, Gleich, Lim arXiv:2016
20. All 2-state spacey random walk models have a stationary distribution
Key idea: reduce to a one-dimensional ODE.
If we unfold $P(i,j,k)$ for a 2 x 2 x 2 problem, then
$R = \begin{bmatrix} a & b & c & d \\ 1-a & 1-b & 1-c & 1-d \end{bmatrix}$
$M(x) = R(x \otimes I) = \begin{bmatrix} c - x_1(c-a) & d - x_1(d-b) \\ 1-c+x_1(c-a) & 1-d+x_1(d-b) \end{bmatrix}$
$\pi\left(\begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix}\right)_1 = \frac{1-q}{2-p-q}$
Benson, Gleich, Lim arXiv:2016
21. The one-dimensional ODE has a really simple structure
[Figure: a plot of dx1/dt versus x1 with stable fixed points near the ends and an unstable fixed point between them.]
In general, dx1/dt(0) ≥ 0 and dx1/dt(1) ≤ 0, so there must be a stable point by continuity.
Benson, Gleich, Lim arXiv:2016
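A small sketch, assuming numpy and arbitrary probabilities a, b, c, d, that evaluates dx1/dt from the formulas on slide 20 and exhibits the sign pattern above:

```python
# dx1/dt = pi(M(x))_1 - x1, with p = M(x)[0,0] and q = M(x)[1,1].
import numpy as np

a, b, c, d = 0.9, 0.1, 0.3, 0.6

def dx1dt(x1):
    p = c - x1 * (c - a)            # M(x)[0, 0]
    q = 1 - (d - x1 * (d - b))      # M(x)[1, 1]
    return (1 - q) / (2 - p - q) - x1

grid = np.linspace(0, 1, 11)
print(np.round([dx1dt(t) for t in grid], 3))   # sign change locates a fixed point
print("dx1/dt(0) =", dx1dt(0.0), " dx1/dt(1) =", dx1dt(1.0))
```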
22. With multiple states, the situation is more complicated
If $P$ is irreducible, there always exists a fixed point of the algebraic equation $Px^2 = x$, by Li and Ng 2013 using Brouwer's theorem.
State-of-the-art computation:
• Power method [Li and Ng]; more analysis in [Chu & Wu; Gleich, Lim, Yu] and more today.
• Shifted iteration, Newton iteration [Gleich, Lim, Yu].
New idea:
• Integrate the ODE $\frac{dx}{dt} = \pi[M(x)] - x$ with $M(c) = \sum_k P(:,:,k)\,c_k$.
Benson, Gleich, Lim arXiv:2016
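A sketch of the "integrate the ODE" idea, assuming numpy and scipy (not the talk's Matlab ode45 setup):

```python
# Follow dx/dt = pi[M(x)] - x, where pi extracts the stationary
# distribution of the column-stochastic matrix M(x) = sum_k P(:, :, k) x_k.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
n = 4
P = rng.random((n, n, n)); P /= P.sum(axis=0)   # column-stochastic tensor

def stat_dist(M):
    vals, vecs = np.linalg.eig(M)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def rhs(t, x):
    return stat_dist(np.einsum('ijk,k->ij', P, x)) - x

sol = solve_ivp(rhs, (0.0, 100.0), np.ones(n) / n, rtol=1e-8, atol=1e-10)
x = sol.y[:, -1]
print("residual:", np.linalg.norm(np.einsum('ijk,j,k->i', P, x, x) - x))
```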
23. Spacey random surfers are a refined model with some structure
Akin to the PageRank modification of a Markov chain:
1. With probability α, follow the spacey random walk.
2. With probability 1−α, teleport based on a distribution v.
The solution of $x = \alpha P x^2 + (1-\alpha)v$ is unique if α < 0.5. [Gleich, Lim, Yu, SIMAX 2015]
THEOREM (Benson, Gleich, Lim): The spacey random surfer model always has a stationary dist. if α < 0.5. In other words, the ODE $\frac{dx}{dt} = (1-\alpha)[I - \alpha R(x \otimes I)]^{-1} v - x$ always converges to a stable point. [Benson, Gleich, Lim, arXiv:2016]
Yongyang Yu, Purdue
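A sketch, assuming numpy, of the simple fixed-point iteration for the spacey random surfer equation; α < 0.5 is the regime where the solution is unique:

```python
# Iterate x <- alpha * P x^2 + (1 - alpha) * v until the change is tiny.
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.45
P = rng.random((n, n, n)); P /= P.sum(axis=0)   # column-stochastic tensor
v = np.ones(n) / n                               # teleportation distribution

x = v.copy()
for _ in range(1000):
    x_new = alpha * np.einsum('ijk,j,k->i', P, x, x) + (1 - alpha) * v
    if np.linalg.norm(x_new - x, 1) < 1e-13:
        break
    x = x_new
print(x, x.sum())                                # stationary dist.; sums to 1
```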
24. Some nice open problems in this model
• For all the problems we have, Matlab's ode45 has never failed to converge to an eigenvector. (Even when all other algorithms will not converge.)
• Can we show that if the power method converges to a fixed point, then the ODE converges? (The converse is false.)
• There is also a family of models (e.g., pick the "second" state based on history instead of the "third"); how can we use this fact?
25. Here's what we are using spacey random walks to do!
1. Model the behavior of taxicabs in a large city. Involves fitting transition probabilities to data. Benson, Gleich, Lim arXiv:2016
2. Cluster higher-order data in a type of "generalized" spectral clustering. Involves a useful asymptotic property of spacey random walks. Benson, Gleich, Leskovec SDM 2015; Wu, Benson, Gleich arXiv:2016
26. Taxicabs are a plausible spacey random walk model
Model people by locations. Example location trajectories:
1,2,2,1,5,4,4,...
1,2,3,2,2,5,5,...
2,2,3,3,3,3,2,...
5,4,5,5,3,3,1,...
1. A passenger with location k is drawn at random.
2. The taxi picks up the passenger at location j.
3. The taxi drives the passenger to location i with probability P(i,j,k).
Approximating locations by history gives a spacey random walk.
(Beijing taxi image from Yu Zheng, Urban Computing, Microsoft Asia; image from nyc.gov)
Benson, Gleich, Lim arXiv:2016
27. NYC taxi data support the spacey random walk hypothesis
One year of 1000 taxi trajectories in NYC. States are neighborhoods in Manhattan. P(i,j,k) = probability of the taxi going from j to i when the passenger is from location k.
Evaluation (RMSE):
First-order Markov    0.846
Second-order Markov   0.835
Spacey                0.835
Benson, Gleich, Lim arXiv:2016
28. A property of spacey random walks makes the connection to clustering
Spacey random walks (with stationary distributions) are asymptotically Markov chains:
• once the occupation vector c converges, future transitions are according to the Markov chain M(c).
This makes a connection to clustering:
• spectral clustering methods can be derived by looking for partitions of reversible Markov chains (and there is research on non-reversible ones too).
We had an initial paper on using this idea for "motif-based clustering" of a graph, but there is a much better technique we have now.
Benson, Leskovec, Gleich. SDM 2015; Wu, Benson, Gleich. arXiv:2016
Jure Leskovec, Stanford
29. Given data bricks, we can cluster them using these ideas, with one more step
If the data is a symmetric cube, indexed by $[i_1, i_2, \dots, i_n]$ in all three modes, we can normalize it to get a transition tensor.
If the data is a brick, indexed by $[i_1, \dots, i_{n_1}] \times [j_1, \dots, j_{n_2}] \times [k_1, \dots, k_{n_3}]$, we symmetrize using Ragnarsson and Van Loan's idea, a generalization of
$A \to \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$
Wu, Benson, Gleich arXiv:2016
30. The clustering methodology
1. Symmetrize the brick (if necessary).
2. Normalize to be a column-stochastic tensor.
3. Estimate the stationary distribution of the spacey random walk (spacey random surfer), or a generalization (the super-spacey RW).
4. Form the asymptotic Markov model.
5. Bisect using eigenvectors or properties of that asymptotic Markov model; then recurse.
A schematic sketch of these steps follows.
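A schematic sketch of this pipeline, assuming numpy; the spectral bisection in step 5 here is a crude stand-in of my own choosing, and the real method (super-spacey RW, sweep cuts, recursion) lives in the repositories on the final slide:

```python
# One level of the pipeline: normalize, find the surfer's stationary
# distribution, form the asymptotic chain M(x), and split on an eigenvector.
import numpy as np

def cluster_once(T, alpha=0.45):
    n = T.shape[0]
    P = T / np.maximum(T.sum(axis=0), 1e-12)        # 2. column-stochastic tensor
    v = np.ones(n) / n
    x = v.copy()
    for _ in range(2000):                            # 3. stationary distribution
        x = alpha * np.einsum('ijk,j,k->i', P, x, x) + (1 - alpha) * v
    M = np.einsum('ijk,k->ij', P, x)                 # 4. asymptotic Markov model
    vals, vecs = np.linalg.eigh(M + M.T)             # 5. crude spectral bisection
    f = vecs[:, -2]                                  #    second-largest eigenvector
    return f > np.median(f)

T = np.random.rand(6, 6, 6)                          # stand-in data brick
print(cluster_once(T))
```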
31. Clustering airport-airport-airline networks
[Figure: the airport-airport-airline network, unclustered (no structure apparent) and clustered (diagonal structure evident).]
Name             Airports  Airlines  Notes
World Hubs       250       77        Beijing, JFK
Europe           184       32        Europe, Morocco
United States    137       9         U.S. and Cancún
China/Taiwan     170       33        China, Taiwan, Thailand
Oceania/SE Asia  302       52        Canadian airlines too
Mexico/Americas  399       68
32. Clusters in symmetrized three-gram and four-gram data
Data: 3- and 4-gram data from COCA (ngrams.info).
"Best clusters": pronouns & articles (the, we, he, ...); prepositions & linking verbs (in, of, as, to, ...).
Fun 3-gram clusters:
{cheese, cream, sour, low-fat, frosting, nonfat, fat-free}
{bag, plastic, garbage, grocery, trash, freezer}
{church, bishop, catholic, priest, greek, orthodox, methodist, roman, priests, episcopal, churches, bishops}
Fun 4-gram clusters:
{german, chancellor, angela, merkel, gerhard, schroeder, helmut, kohl}
33. Clusters in 3-gram Chinese text
社会 – society
经济 – economy
发展 – develop
主义 – "ism"
国家 – nation
政府 – government
We also get stop words in the Chinese text (highly occurring words). But then we also get some strange words. Reason: Google's Chinese corpus has a bias in its books.
34. One more problem
Previous work from the PI tackled network alignment with matrix methods for edge overlap; this proposal is for matching triangles using tensor methods.
[Figure: edge overlap matches edges (i,i') and (j,j') across networks A and B through the alignment graph L; triangle overlap matches a triangle i, j, k in A to a triangle i', j', k' in B.]
If $x_i$, $x_j$, and $x_k$ are indicators associated with the edges $(i,i')$, $(j,j')$, and $(k,k')$, then $\sum_{i \in L}\sum_{j \in L}\sum_{k \in L} x_i x_j x_k T_{i,j,k}$ is the triangle overlap term.
Triangular Alignment (TAME): A Tensor-based Approach for Higher-order Network Alignment. Joint with Shahin Mohammadi, Ananth Grama, and Tamara Kolda. http://arxiv.org/abs/1510.06482
Edge version: $\max\ x^T(A \otimes B)x$ s.t. $\lVert x \rVert = 1$, where A, B are edge adjacency matrices.
Triangle version: $\max\ (A \otimes B)x^3$ s.t. $\lVert x \rVert = 1$, where A, B are triangle hypergraph adjacencies.
"Solved" with x of dimension 86 million; $A \otimes B$ has 5 trillion non-zeros.
35. www.cs.purdue.edu/homes/dgleich
Summary
Spacey random walks are a new type of stochastic process that provides a direct interpretation of tensor eigenvectors of higher-order Markov chain probability tables.
We are excited!
• Many potential new applications of the spacey random walk process.
• Many open theoretical questions for us (and others) to follow up on.
Code
https://github.com/dgleich/mlpagerank
https://github.com/arbenson/tensor-sc
https://github.com/arbenson/spacey-random-walks
https://github.com/wutao27/GtensorSC
Papers
Gleich, Lim, Yu. Multilinear PageRank. SIMAX 2015.
Benson, Gleich, Leskovec. Tensor spectral clustering. SDM 2015.
Benson, Gleich, Lim. Spacey random walks. arXiv:1602.02102.
Wu, Benson, Gleich. Tensor spectral co-clustering. arXiv:1603.00395.