The document discusses low-rank matrix optimization and heuristics for solving rank minimization problems. It first motivates extracting low-dimensional structure from high-dimensional data via rank minimization, then surveys several heuristics for approximating the non-convex rank objective: replacing the rank with the nuclear norm, using the log-det heuristic as a smooth surrogate, matrix factorization methods, and iteratively solving a sequence of rank-constrained convex problems. Applications mentioned include the Netflix Prize and video intrusion detection.
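As a concrete illustration of the nuclear-norm heuristic mentioned above, here is a minimal Python sketch (my own toy example, not from the document) of singular value thresholding, the proximal operator of the nuclear norm:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink singular values by tau.
    This is the proximal operator of the nuclear norm, the standard
    convex surrogate for the (non-convex) rank function."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy illustration: thresholding a noisy rank-1 matrix recovers low rank.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(20), rng.standard_normal(15)
M = np.outer(u, v) + 0.01 * rng.standard_normal((20, 15))
L = svt(M, tau=1.0)
print(np.linalg.matrix_rank(L, tol=1e-6))  # → 1
```

The noise singular values sit well below the threshold, so only the dominant rank-1 component survives.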
The document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. It discusses using stochastic diffusions and geometric concepts to improve MCMC methods. Specifically, it proposes using discretized Langevin and Hamiltonian diffusions across a Riemann manifold as an adaptive proposal mechanism. This is founded on deterministic geodesic flows on the manifold. Examples presented include a warped bivariate Gaussian, Gaussian mixture model, and log-Gaussian Cox process.
Delayed acceptance for Metropolis-Hastings algorithms (Christian Robert)
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
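The splitting idea can be sketched in a few lines (a minimal Python illustration with my own toy factorization of the target into two equal pieces; in the paper's motivating example the costly factor is the prior density):

```python
import math, random

def delayed_acceptance_step(x, logpi_cheap, logpi_expensive, proposal):
    """One delayed-acceptance Metropolis-Hastings step (symmetric proposal).
    The ratio pi(y)/pi(x) is factored as cheap * expensive; the expensive
    factor is evaluated only if the cheap stage has already accepted."""
    y = proposal(x)
    # Stage 1: cheap factor.
    if math.log(random.random()) >= logpi_cheap(y) - logpi_cheap(x):
        return x  # early rejection: the expensive factor is never computed
    # Stage 2: expensive factor.
    if math.log(random.random()) >= logpi_expensive(y) - logpi_expensive(x):
        return x
    return y

# Toy target: standard normal, split into two identical log-density pieces.
random.seed(1)
cheap = lambda x: -0.25 * x * x
expensive = lambda x: -0.25 * x * x
prop = lambda x: x + random.gauss(0.0, 1.0)
chain = [0.0]
for _ in range(5000):
    chain.append(delayed_acceptance_step(chain[-1], cheap, expensive, prop))
print(sum(chain) / len(chain))  # near 0 for this symmetric target
```

Each stage is a valid Metropolis-Hastings accept/reject in its own right, which is what makes the composed chain reversible.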
1) The document presents a spectral sum rule for conformal field theories relating a weighted integral of the spectral density to one-point functions of the stress tensor operator.
2) It regularizes the retarded Green's function to remove divergent pieces, arriving at a difference of spectral densities that is analytic and well-behaved.
3) The sum rule constrains the parameters of the three-point function of the stress tensor in terms of the one-point function, and is checked against holographic calculations in anti-de Sitter space.
This document discusses rank-aware algorithms for joint sparse recovery from multiple measurement vectors (MMV). It begins by introducing the MMV problem and showing that when the rank of the signal matrix is r, the necessary and sufficient conditions for unique recovery are less restrictive than in the single measurement vector case. Classical MMV algorithms like SOMP and l1/lq minimization are not rank-aware. The document then proposes two rank-aware pursuit algorithms:
1) Rank-Aware OMP, which modifies the atom selection step of SOMP but still suffers from rank degeneration over iterations.
2) Rank-Aware Order Recursive Matching Pursuit (RA-ORMP), which forces the sparsity
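The rank-aware selection step these algorithms share can be sketched as follows (a simplified Python illustration with a hypothetical dictionary A and signal matrix Y of my own; orthonormalizing the residual's column space before correlating is what makes the selection rank-aware):

```python
import numpy as np

def rank_aware_select(A, R):
    """Rank-aware atom selection: correlate dictionary atoms with an
    orthonormal basis of the residual column space (via SVD) rather than
    with the raw residual matrix, in the spirit of RA-OMP / RA-ORMP."""
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    U = U[:, s > 1e-10 * s[0]]            # orthonormal basis of range(R)
    scores = np.linalg.norm(A.T @ U, axis=1)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))
A /= np.linalg.norm(A, axis=0)            # unit-norm atoms
X = np.zeros((50, 4))
X[[3, 17], :] = rng.standard_normal((2, 4))  # jointly 2-sparse signals
Y = A @ X
print(rank_aware_select(A, Y) in {3, 17})  # → True
```

Atoms in the span of the residual achieve projection norm 1, so a true support atom is selected regardless of how energy is distributed across the measurement vectors.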
This document discusses using the Wasserstein distance for inference in generative models. It begins by introducing ABC methods that use a distance between samples to compare observed and simulated data. It then discusses using the Wasserstein distance as an alternative distance metric that has lower variance than the Euclidean distance. The document covers computational aspects of calculating the Wasserstein distance, asymptotic properties of minimum Wasserstein estimators, and applications to time series data.
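In one dimension the Wasserstein distance between equal-size empirical samples reduces to matching sorted order statistics, which makes a quick sketch possible (toy Python example of mine, not the document's):

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two empirical distributions on R
    with equal sample sizes: pair sorted samples, which is the optimal
    coupling in one dimension."""
    x, y = np.sort(x), np.sort(y)
    return (np.mean(np.abs(x - y) ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 10000)
b = rng.normal(2.0, 1.0, 10000)
print(wasserstein_1d(a, b))  # close to the mean shift, 2.0
```

For two normals differing only by a location shift, the 1-Wasserstein distance equals the shift, so the estimate should land near 2.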
RSS discussion of Girolami and Calderhead, October 13, 2010 (Christian Robert)
1. The document discusses discretizing Hamiltonians for Markov chain Monte Carlo (MCMC) methods. Specifically, it examines reproducing Hamiltonian equations through discretization, such as via generalized leapfrog.
2. However, the invariance and stability properties of the continuous-time process may not carry over to the discretized version. Approximations can be corrected with a Metropolis-Hastings step, so exactly reproducing the continuous behavior is not necessarily useful.
3. Discretization induces a calibration problem of determining the appropriate step size. Convergence issues for the MCMC algorithm should not be impacted by imperfect renderings of the continuous-time process in discrete time.
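A minimal sketch of the leapfrog discretization with a Metropolis-Hastings correction (standard HMC on a toy Gaussian target; the step size and path length are illustrative, not calibrated):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """Leapfrog discretization of Hamiltonian dynamics for
    H(q, p) = U(q) + p^2 / 2. It is reversible and volume-preserving,
    which is what makes the Metropolis-Hastings correction valid."""
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(L - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, -p  # momentum flip makes the map an involution

def hmc_step(q, U, grad_U, rng, eps=0.2, L=10):
    """One HMC step: the accept/reject correction removes discretization
    bias, so the step size affects efficiency, not the invariant law."""
    p = rng.standard_normal()
    q_new, p_new = leapfrog(q, p, grad_U, eps, L)
    dH = (U(q) + 0.5 * p ** 2) - (U(q_new) + 0.5 * p_new ** 2)
    return q_new if np.log(rng.random()) < dH else q

rng = np.random.default_rng(0)
chain = [0.0]
for _ in range(2000):
    chain.append(hmc_step(chain[-1], lambda q: 0.5 * q * q, lambda q: q, rng))
print(np.mean(chain))  # near 0 for a standard normal target
```

This illustrates point 2 above: an imperfect discretization is harmless once corrected, and only the step size eps remains to be calibrated.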
Tensor Decomposition and its Applications (Keisuke OTAKI)
This document discusses tensor factorizations and decompositions and their applications in data mining. It introduces tensors as multi-dimensional arrays and covers 2nd order tensors (matrices) and 3rd order tensors. It describes how tensor decompositions like the Tucker model and CANDECOMP/PARAFAC (CP) model can be used to decompose tensors into core elements to interpret data. It also discusses singular value decomposition (SVD) as a way to decompose matrices and reduce dimensions while approximating the original matrix.
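The matrix case mentioned last is the easiest to demonstrate: the truncated SVD gives the best rank-r approximation in Frobenius norm (Eckart-Young). A toy sketch of mine:

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M in Frobenius norm (Eckart-Young);
    the matrix (2nd-order tensor) special case of the decompositions above."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))  # exactly rank 3
err = np.linalg.norm(M - truncated_svd(M, 3))
print(err)  # essentially zero, since M is exactly rank 3
```

The Tucker and CP models generalize this idea to 3rd- and higher-order tensors, where no single decomposition is simultaneously optimal in all senses.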
Notes on intersection theory written for a seminar in Bonn in 2010.
Following Fulton's book the following topics are covered:
- Motivation of intersection theory
- Cones and Segre Classes
- Chern Classes
- Gauss-Bonnet Formula
- Segre classes under birational morphisms
- Flat pull back
This document presents a closed-form solution for a class of discrete-time algebraic Riccati equations (DTAREs) under certain assumptions. It begins with background on Riccati equations and their importance in control theory. It then provides the assumptions considered, including that the A matrix eigenvalues are distinct. The main result is a closed-form solution for the DTARE when R=1. Extensions discussed include the solution's behavior as Q approaches zero and for repeated eigenvalues. Comparisons with numerical solutions verify the closed-form solution's accuracy.
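For intuition, the scalar case of the DTARE can be solved by simple fixed-point iteration (a toy Python sketch with illustrative values a = 0.9, b = q = r = 1; the document's closed form concerns the matrix case under its stated assumptions):

```python
def dare_scalar(a, b, q, r, iters=200):
    """Fixed-point iteration for the scalar discrete-time algebraic
    Riccati equation  p = a^2 p - (a b p)^2 / (r + b^2 p) + q.
    The Riccati map is a contraction here, so iteration converges."""
    p = q
    for _ in range(iters):
        p = a * a * p - (a * b * p) ** 2 / (r + b * b * p) + q
    return p

a, b, q, r = 0.9, 1.0, 1.0, 1.0
p = dare_scalar(a, b, q, r)
resid = p - (a * a * p - (a * b * p) ** 2 / (r + b * b * p) + q)
print(p, resid)  # resid essentially zero at the fixed point
```

Closed-form solutions like the document's are typically validated against exactly this kind of numerical fixed point.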
This document summarizes results on analyzing stochastic gradient descent (SGD) algorithms for minimizing convex functions. It shows that a continuous-time version of SGD (SGD-c) can strongly approximate the discrete-time version (SGD-d) under certain conditions. It also establishes that SGD achieves the minimax optimal convergence rate of O(t^-1/2) for α=1/2 by using an "averaging from the past" procedure, closing the gap between previous lower and upper bound results.
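The averaging device can be illustrated with plain Polyak-Ruppert iterate averaging, a standard scheme related to (though not necessarily identical to) the paper's "averaging from the past" (toy quadratic objective and noisy gradient oracle of my own):

```python
import numpy as np

def sgd_averaged(grad, x0, steps, alpha=0.5, c=1.0, rng=None):
    """SGD with step sizes c * t^(-alpha) plus a running average of the
    iterates; with alpha = 1/2 the averaged iterate attains the
    O(t^(-1/2)) rate on strongly convex problems."""
    rng = rng or np.random.default_rng(0)
    x, avg = x0, 0.0
    for t in range(1, steps + 1):
        g = grad(x) + rng.standard_normal()   # noisy gradient oracle
        x = x - c * t ** (-alpha) * g
        avg += (x - avg) / t                  # running iterate average
    return avg

# Minimize f(x) = (x - 3)^2 / 2, so grad(x) = x - 3 and x* = 3.
est = sgd_averaged(lambda x: x - 3.0, x0=0.0, steps=20000)
print(est)  # close to the minimizer x* = 3
```

The last iterate alone fluctuates at the scale of the step size; averaging smooths those fluctuations down to the statistical limit.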
This document presents a modified Mann iteration method for finding common fixed points of a countable family of multivalued mappings in Banach spaces. It introduces the iteration (1.2) via the best approximation operator P_{T_n}, where x_{n+1} is defined as a convex combination of the previous iterate x_n and an element of P_{T_n} x_n. The paper aims to establish weak and strong convergence theorems for this iterative method toward a common fixed point of a countable family of nonexpansive multivalued mappings. It provides relevant background on fixed points, nonexpansive mappings, and best approximation operators in Banach and Hilbert spaces.
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms (Christian Robert)
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
Amirim Project - Threshold Functions in Random Simplicial Complexes (Avichai Cohen)
This document presents an overview of prior work on threshold functions in random simplicial complexes. It discusses questions about the threshold for the existence of cycles and collapsibility in higher dimensional simplicial complexes. Specifically, it summarizes previous results that established coarse threshold functions of 1/n for the existence of cycles in dimensions greater than 1. It also reviews upper bounds on the critical probability for this threshold. The document outlines prior studies on the collapsability threshold and presents lower bounds on the critical probability. The author's own work is aimed at improving the bounds and determining sharp threshold functions for these properties in random simplicial complexes.
1) The document discusses instanton solutions in AdS4/CFT3 correspondence. Instantons represent exact solutions that allow probing the CFT away from the conformal vacuum.
2) For a conformally coupled scalar field, the bulk admits instanton solutions that correspond to an instability of AdS4. The boundary effective action is a 2+1 conformally coupled scalar with a φ6 interaction, representing a vacuum instability.
3) Tunneling probabilities between vacua can be computed from the instanton solutions, representing gravitational tunneling effects in the bulk and phase transitions in the boundary CFT.
Tutorial of topological_data_analysis_part_1(basic) (Ha Phuong)
This document provides an overview of topological data analysis (TDA) concepts, including:
- Simplicial complexes which represent topological spaces and holes of different dimensions
- Persistent homology which tracks the appearance and disappearance of holes over different scales
- Applications of TDA concepts like using persistent homology to analyze protein compressibility.
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...) (Katsuya Ito)
In this presentation, we explain the monograph "Functional Analysis and Optimization" by Kazufumi Ito
https://kito.wordpress.ncsu.edu/files/2018/04/funa3.pdf
Our goal in this presentation is to:
- Understand the basic notions of functional analysis: lower semicontinuity, the subdifferential, the conjugate functional
- Understand the formulation of the duality problem: the primal (P), perturbed (Py), and dual (P∗) problems
- Understand the primal-dual relationships: weak duality sup(P∗) ≤ inf(P), strong duality inf(P) = sup(P∗), and the minimax inequality sup inf L ≤ inf sup L
Propositional resolution is a powerful rule of inference for propositional logic that allows building a sound and complete theorem prover. The chapter covers clausal form, which expressions must be converted to for resolution to apply. A simple set of conversion rules is provided. Resolution works by cancelling out literals and their negations from two clauses to infer a new clause. Several examples are worked through. Validity checking and proving entailment using resolution are also discussed.
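The core resolution step is compact enough to sketch directly (clauses as frozensets of string literals, with negation marked by a leading '~'; this encoding is mine, not the chapter's):

```python
def resolve(c1, c2):
    """All resolvents of two clauses: for each literal in c1 whose
    negation appears in c2, cancel the complementary pair and union
    the remaining literals into a new inferred clause."""
    neg = lambda l: l[1:] if l.startswith('~') else '~' + l
    out = []
    for lit in c1:
        if neg(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {neg(lit)}))
    return out

# (p or q) and (~p or r) resolve on p to give (q or r).
print(resolve(frozenset({'p', 'q'}), frozenset({'~p', 'r'})))
```

Deriving the empty clause by repeated resolution is the refutation that underlies both validity checking and entailment proofs.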
The EM algorithm is explained carefully here, starting from Jensen's inequality.
After reading this, the variational method used to train LDA becomes much easier to understand.
This is excerpted from Andrew Ng's old lecture notes; the fact that I still go back to material I first saw five years ago shows what an excellent lecture it was.
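A minimal EM sketch for a two-component Gaussian mixture with unit variances (the E-step is exactly where the Jensen lower bound is made tight; toy data and setup are my own):

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variances.
    E-step: responsibilities (posterior component probabilities).
    M-step: re-estimate means and mixture weights."""
    mu = np.array([x.min(), x.max()])     # crude initialization
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities r[i, k] ∝ w_k * N(x_i | mu_k, 1).
        d = np.exp(-0.5 * (x[:, None] - mu) ** 2) * w
        r = d / d.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and weights.
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        w = r.mean(axis=0)
    return np.sort(mu)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
print(em_gmm_1d(x))  # means near -2 and 2
```

Each iteration provably does not decrease the log-likelihood, which is the consequence of the Jensen-inequality argument the notes develop.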
This document summarizes key concepts from Sobolev spaces and their applications in mechanics problems. It introduces Sobolev spaces Wm,p(Ω) whose norms involve integrals of function and derivative values. These spaces allow generalized notions of derivatives. Sobolev's imbedding theorem establishes continuity properties of mappings between Sobolev and other function spaces. These properties are important for analyzing mechanical models that involve elements in Sobolev spaces.
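For reference, the norm on W^{m,p}(Ω) alluded to above is the standard one, built from weak derivatives D^α u:

```latex
\|u\|_{W^{m,p}(\Omega)}
  = \Bigl( \sum_{|\alpha| \le m} \int_{\Omega} |D^{\alpha} u(x)|^{p} \, dx \Bigr)^{1/p},
  \qquad 1 \le p < \infty .
```

Finiteness of this norm requires all weak derivatives up to order m to be p-integrable, which is what the imbedding theorems trade against continuity or compactness of the inclusion maps.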
This document summarizes Chapter 2 of a textbook on functional analysis in mechanics. It introduces Sobolev spaces, which are function spaces used to model mechanical problems. Sobolev spaces allow for generalized notions of derivatives of functions. The chapter discusses imbedding theorems for Sobolev spaces, which describe how functions in one Sobolev space can be mapped continuously or compactly to other function spaces. It provides examples of imbedding properties for specific Sobolev spaces over different domains.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
On the k-Riemann-Liouville fractional integral and applications (Premier Publishers)
Fractional calculus generalizes ordinary differentiation and integration to arbitrary non-integer order. The subject is as old as differential calculus itself, going back to the era when G.W. Leibniz and I. Newton invented calculus. Fractional integrals and derivatives arise in many engineering and scientific disciplines through the mathematical modeling of systems and processes in physics, chemistry, aerodynamics, and the electrodynamics of complex media. Very recently, Mubeen and Habibullah introduced the k-Riemann-Liouville fractional integral, defined using the k-Gamma function, a generalization of the classical Gamma function. In this paper, we present a new fractional integral that generalizes the k-Riemann-Liouville fractional integral, prove its commutativity and semigroup properties, and give Chebyshev inequalities for the k-Riemann-Liouville fractional integral. Using it, we establish some new integral inequalities.
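For reference, the definitions involved are roughly as follows (standard in the k-calculus literature, though normalization conventions vary slightly between papers, so treat this as a sketch):

```latex
\Gamma_k(\alpha) = \int_0^\infty s^{\alpha-1} e^{-s^k/k} \, ds ,
\qquad
\bigl(I_k^{\alpha} f\bigr)(t)
  = \frac{1}{k\,\Gamma_k(\alpha)} \int_0^t (t-\tau)^{\frac{\alpha}{k}-1} f(\tau)\, d\tau .
```

Setting k = 1 recovers the classical Gamma function and the classical Riemann-Liouville fractional integral.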
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
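The simplest of these ideas, importance sampling for a normalizing constant Z = ∫ f, can be sketched in a few lines (toy example with a proposal of my own choosing; not any specific method from the workshop):

```python
import numpy as np

def log_normalizer_is(log_f, sample_q, log_q, n=100000, rng=None):
    """Importance-sampling estimate of log Z for an unnormalized density
    f, using Z = E_q[f(X)/q(X)] for any proposal q covering the support."""
    rng = rng or np.random.default_rng(0)
    x = sample_q(rng, n)
    lw = log_f(x) - log_q(x)                 # log importance weights
    m = lw.max()
    return m + np.log(np.mean(np.exp(lw - m)))  # log-sum-exp for stability

# Unnormalized standard normal f(x) = exp(-x^2/2); true Z = sqrt(2*pi).
est = log_normalizer_is(
    log_f=lambda x: -0.5 * x ** 2,
    sample_q=lambda rng, n: rng.normal(0, 2, n),            # N(0, 4) proposal
    log_q=lambda x: -0.5 * (x / 2) ** 2 - np.log(2 * np.sqrt(2 * np.pi)),
)
print(np.exp(est))  # close to sqrt(2*pi) ≈ 2.5066
```

Reverse logistic regression and bridge-type estimators refine exactly this weighting idea when samples from several distributions are available.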
Comments on exponential ergodicity of the bouncy particle sampler (Christian Robert)
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
A note on arithmetic progressions in sets of integers (Lukas Nabergall)
This document presents a new upper bound on r3(n), the maximum size of a set of integers between 1 and n that contains no three elements in arithmetic progression. The author proves that r3(n) = O(n/log^h n) for any arbitrarily large h, improving on previous bounds. The proof uses the fundamental theorem of discrete calculus and the pigeonhole principle to show that any sufficiently dense set of integers must contain arbitrarily long arithmetic progressions. This verifies a 1936 conjecture of Erdos and improves understanding of a major problem in combinatorics.
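For very small n the quantity r3(n) can be computed by brute force, which makes the definition concrete (my own sketch, obviously unrelated to the paper's analytic argument):

```python
from itertools import combinations

def is_3ap_free(s):
    """True if the set contains no three-term arithmetic progression
    a, b, c with 2b = a + c."""
    return not any(2 * b == a + c for a, b, c in combinations(sorted(set(s)), 3))

def r3(n):
    """Brute-force r3(n): the size of the largest 3-AP-free subset
    of {1, ..., n}. Exponential; only viable for tiny n."""
    for k in range(n, 0, -1):
        if any(is_3ap_free(c) for c in combinations(range(1, n + 1), k)):
            return k
    return 0

print([r3(n) for n in range(1, 9)])  # → [1, 2, 2, 3, 4, 4, 4, 4]
```

For example {1, 2, 4, 5} is 3-AP-free in {1, ..., 5}, while every 5-element subset of {1, ..., 8} contains a progression.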
This document presents a new immune algorithm called IMMALG for solving the chromatic number problem. IMMALG incorporates a stochastic aging operator and local search procedure to improve performance. It is characterized and parameterized using Kullback entropy. Experimental results show IMMALG is competitive with state-of-the-art evolutionary algorithms for chromatic number problem instances.
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin... (mathsjournal)
The following document presents some novel numerical methods, valid in one and several variables, which use the fractional derivative to find solutions of some nonlinear systems in the complex space from real initial conditions. The origin of these methods is the fractional Newton-Raphson method, but unlike the latter, the orders proposed here for the fractional derivatives are functions. In the first method, a function is used to guarantee an order of convergence that is (at least) quadratic; in the other, a function is used to avoid the discontinuity generated when the fractional derivative of constants is taken, with which the method attains at most an order of convergence that is (at least) linear.
The document discusses multiobjective optimization and evolutionary algorithms. It defines multiobjective optimization problems as having multiple objective functions to minimize subject to constraints. Pareto optimal solutions are those that are not dominated by any other solutions in terms of all objectives. Evolutionary algorithms are used to approximate the Pareto front and find Pareto optimal solutions. Non-dominated sorting and crowding distance are used to select the next population in NSGA-II. The hypervolume indicator measures the size of the space covered by the Pareto front approximations.
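Pareto dominance and the first front of non-dominated sorting, the selection primitives named above, can be sketched directly (minimization convention; the points are a toy example of mine):

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): a is no worse in every
    objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """First front of non-dominated sorting: the points that no other
    point in the set dominates (the Pareto front approximation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(non_dominated(pts))  # → [(1, 5), (2, 2), (4, 1)]
```

NSGA-II repeats this peeling to rank successive fronts, then breaks ties within a front by crowding distance to preserve spread along the Pareto front.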
This document discusses rank-aware algorithms for joint sparse recovery from multiple measurement vectors (MMV). It begins by introducing the MMV problem and showing that when the rank of the signal matrix is r, the necessary and sufficient conditions for unique recovery are less restrictive than in the single measurement vector case. Classical MMV algorithms like SOMP and l1/lq minimization are not rank-aware. The document then proposes two rank-aware pursuit algorithms:
1) Rank-Aware OMP, which modifies the atom selection step of SOMP but still suffers from rank degeneration over iterations.
2) Rank-Aware Order Recursive Matching Pursuit (RA-ORMP), which forces the sparsity
Notes on intersection theory written for a seminar in Bonn in 2010.
Following Fulton's book the following topics are covered:
- Motivation of intersection theory
- Cones and Segre Classes
- Chern Classes
- Gauss-Bonet Formula
- Segre classes under birational morphisms
- Flat pull back
This document presents a closed-form solution for a class of discrete-time algebraic Riccati equations (DTAREs) under certain assumptions. It begins with background on Riccati equations and their importance in control theory. It then provides the assumptions considered, including that the A matrix eigenvalues are distinct. The main result is a closed-form solution for the DTARE when R=1. Extensions discussed include the solution's behavior as Q approaches zero and for repeated eigenvalues. Comparisons with numerical solutions verify the closed-form solution's accuracy.
This document summarizes results on analyzing stochastic gradient descent (SGD) algorithms for minimizing convex functions. It shows that a continuous-time version of SGD (SGD-c) can strongly approximate the discrete-time version (SGD-d) under certain conditions. It also establishes that SGD achieves the minimax optimal convergence rate of O(t^-1/2) for α=1/2 by using an "averaging from the past" procedure, closing the gap between previous lower and upper bound results.
This document presents a modified Mann iteration method for finding common fixed points of a countable family of multivalued mappings in Banach spaces. It introduces using the best approximation operator PTn to define the iteration (1.2), where xn+1 is defined as a convex combination of the previous iterate xn and an element in PTn xn . The paper aims to establish weak and strong convergence theorems for this iterative method to find a common fixed point for a countable family of nonexpansive multivalued mappings. It provides relevant background on fixed points, nonexpansive mappings, and best approximation operators in Banach and Hilbert spaces.
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
Amirim Project - Threshold Functions in Random Simplicial Complexes - Avichai...Avichai Cohen
This document presents an overview of prior work on threshold functions in random simplicial complexes. It discusses questions about the threshold for the existence of cycles and collapsibility in higher dimensional simplicial complexes. Specifically, it summarizes previous results that established coarse threshold functions of 1/n for the existence of cycles in dimensions greater than 1. It also reviews upper bounds on the critical probability for this threshold. The document outlines prior studies on the collapsability threshold and presents lower bounds on the critical probability. The author's own work is aimed at improving the bounds and determining sharp threshold functions for these properties in random simplicial complexes.
1) The document discusses instanton solutions in AdS4/CFT3 correspondence. Instantons represent exact solutions that allow probing the CFT away from the conformal vacuum.
2) For a conformally coupled scalar field, the bulk admits instanton solutions that correspond to an instability of AdS4. The boundary effective action is a 2+1 conformally coupled scalar with a φ6 interaction, representing a vacuum instability.
3) Tunneling probabilities between vacua can be computed from the instanton solutions, representing gravitational tunneling effects in the bulk and phase transitions in the boundary CFT.
Tutorial of topological_data_analysis_part_1(basic)Ha Phuong
This document provides an overview of topological data analysis (TDA) concepts, including:
- Simplicial complexes which represent topological spaces and holes of different dimensions
- Persistent homology which tracks the appearance and disappearance of holes over different scales
- Applications of TDA concepts like using persistent homology to analyze protein compressibility.
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Katsuya Ito
In this presentation, we explain the monograph ”Functional Analysis and Optimization” by Kazufumi Ito
https://kito.wordpress.ncsu.edu/files/2018/04/funa3.pdf
Our goal in this presentation is to
-Understand the basic notions of functional analysis
lower-semicontinuous, subdifferential, conjugate functional
- Understand the formulation of duality problem
primal (P), perturbed (Py), and dual (P∗) problem
-Understand the primal-dual relationships
inf(P)≤sup(P∗), inf(P) = sup(P∗), inf supL≤sup inf L
Propositional resolution is a powerful rule of inference for propositional logic that allows building a sound and complete theorem prover. The chapter covers clausal form, which expressions must be converted to for resolution to apply. A simple set of conversion rules is provided. Resolution works by cancelling out literals and their negations from two clauses to infer a new clause. Several examples are worked through. Validity checking and proving entailment using resolution are also discussed.
EM 알고리즘을 jensen's inequality부터 천천히 잘 설명되어있다
이것을 보면, LDA의 Variational method로 학습하는 방식이 어느정도 이해가 갈 것이다.
옛날 Andrew Ng 선생님의 강의노트에서 발췌한 건데 5년전에 본 것을
아직도 찾아가면서 참고하면서 해야 된다는 게 그 강의가 얼마나 명강의였는지 새삼 느끼게 된다.
This document summarizes key concepts from Sobolev spaces and their applications in mechanics problems. It introduces Sobolev spaces Wm,p(Ω) whose norms involve integrals of function and derivative values. These spaces allow generalized notions of derivatives. Sobolev's imbedding theorem establishes continuity properties of mappings between Sobolev and other function spaces. These properties are important for analyzing mechanical models that involve elements in Sobolev spaces.
This document summarizes Chapter 2 of a textbook on functional analysis in mechanics. It introduces Sobolev spaces, which are function spaces used to model mechanical problems. Sobolev spaces allow for generalized notions of derivatives of functions. The chapter discusses imbedding theorems for Sobolev spaces, which describe how functions in one Sobolev space can be mapped continuously or compactly to other function spaces. It provides examples of imbedding properties for specific Sobolev spaces over different domains.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
On the k-Riemann-Liouville fractional integral and applications Premier Publishers
Fractional calculus is a generalization of ordinary differentiation and integration to arbitrary non-integer order. The subject is as old as differential calculus and goes back to times when G.W. Leibniz and I. Newton invented differential calculus. Fractional integrals and derivatives arise in many engineering and scientific disciplines as the mathematical modeling of systems and processes in the fields of physics, chemistry, aerodynamics, electrodynamics of a complex medium. Very recently, Mubeen and Habibullah have introduced the k-Riemann-Liouville fractional integral defined by using the -Gamma function, which is a generalization of the classical Gamma function. In this paper, we presents a new fractional integration is called k-Riemann-Liouville fractional integral, which generalizes the k-Riemann-Liouville fractional integral. Then, we prove the commutativity and the semi-group properties of the -Riemann-Liouville fractional integral and we give Chebyshev inequalities for k-Riemann-Liouville fractional integral. Later, using k-Riemann-Liouville fractional integral, we establish some new integral inequalities.
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
comments on exponential ergodicity of the bouncy particle samplerChristian Robert
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
A note on arithmetic progressions in sets of integersLukas Nabergall
This document presents a new upper bound on r3(n), the maximum size of a set of integers between 1 and n that contains no three elements in arithmetic progression. The author proves that r3(n) = O(n/log^h n) for any arbitrarily large h, improving on previous bounds. The proof uses the fundamental theorem of discrete calculus and the pigeonhole principle to show that any sufficiently dense set of integers must contain arbitrarily long arithmetic progressions. This verifies a 1936 conjecture of Erdos and improves understanding of a major problem in combinatorics.
This document presents a new immune algorithm called IMMALG for solving the chromatic number problem. IMMALG incorporates a stochastic aging operator and local search procedure to improve performance. It is characterized and parameterized using Kullback entropy. Experimental results show IMMALG is competitive with state-of-the-art evolutionary algorithms for chromatic number problem instances.
Fractional Newton-Raphson Method and Some Variants for the Solution of Nonlin... (mathsjournal)
This document presents some novel numerical methods, valid in one and several variables, which use the fractional derivative to find solutions of nonlinear systems in the complex space from real initial conditions. These methods originate from the fractional Newton-Raphson method but, unlike the latter, take the orders of the fractional derivatives to be functions. In the first method, the order function is chosen to guarantee an order of convergence (at least) quadratic; in the other, it is chosen to avoid the discontinuity generated when the fractional derivative of constants is used, with which the method attains an order of convergence that is at most (at least) linear.
The document discusses multiobjective optimization and evolutionary algorithms. It defines multiobjective optimization problems as having multiple objective functions to minimize subject to constraints. Pareto optimal solutions are those that are not dominated by any other solutions in terms of all objectives. Evolutionary algorithms are used to approximate the Pareto front and find Pareto optimal solutions. Non-dominated sorting and crowding distance are used to select the next population in NSGA-II. The hypervolume indicator measures the size of the space covered by the Pareto front approximations.
This document discusses rank-aware algorithms for joint sparse recovery from multiple measurement vectors (MMV). It begins by introducing the MMV problem and showing that when the rank of the signal matrix is r, the necessary and sufficient conditions for unique recovery are less restrictive than in the single measurement vector case. Classical MMV algorithms like SOMP and l1/lq minimization are not rank-aware. The document then proposes two rank-aware pursuit algorithms:
1) Rank-Aware OMP, which modifies the atom selection step of SOMP but still suffers from rank degeneration over iterations.
2) Rank-Aware Order Recursive Matching Pursuit (RA-ORMP), which forces the sparsity
Maximizing Submodular Function over the Integer Lattice (Tasuku Soma)
The document describes generalizations of submodular function maximization and submodular cover problems from sets to integer lattices. It presents polynomial-time approximation algorithms for maximizing monotone diminishing return (DR) submodular functions subject to constraints like cardinality, polymatroid and knapsack on the integer lattice. It also presents an algorithm for the DR-submodular cover problem of minimizing cost subject to achieving a quality threshold. The results provide useful extensions of submodular optimization to settings that cannot be modeled as set functions.
The document discusses inapproximability theory for NP optimization problems. It provides an overview of approximation ratios and approximation-preserving reductions. The key ingredients for obtaining inapproximability results are approximation-preserving reductions, gap problems, and the probabilistic checking proof (PCP) theorem. The PCP theorem shows that any language in NP can be reduced to approximating a gap problem for MAX-3SAT, allowing efficient computation of gap problems and derivation of inapproximability results.
Maximum likelihood estimation of regularisation parameters in inverse problem... (Valentin De Bortoli)
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
The document discusses extending the barrier method to optimization problems with generalized inequalities. It introduces logarithmic barrier functions for generalized inequalities and defines the central path. Points on the central path give dual feasible solutions. The barrier method can be applied to find an epsilon-optimal solution in O(log(1/epsilon)) iterations, each requiring O(sqrt(n)) Newton iterations under self-concordance assumptions. Complexity is analyzed using self-concordant functions and properties of generalized logarithms.
This document discusses using the sequence of iterates generated by inertial methods to minimize convex functions. It introduces inertial methods and how they can be used to generate sequences that converge to the minimum. While the last iterate is often used, sometimes averaging over iterates or using extrapolations like Aitken acceleration can provide better estimates of the minimum. Inertial methods allow for more exploration of the function space than gradient descent alone. The geometry of the function may provide opportunities to analyze the iterate sequence and obtain improved convergence estimates.
The document summarizes a presentation on minimizing tensor estimation error using alternating minimization. It begins with an introduction to tensor decompositions including CP, Tucker, and tensor train decompositions. It then discusses nonparametric tensor estimation using an alternating minimization method. The method iteratively updates components while holding other components fixed, achieving efficient computation. The analysis shows that after t iterations, the estimation error is bounded by the sum of a statistical error term and an optimization error term decaying exponentially in t. Real data analysis uses the method for multitask learning.
This document presents a study on minima distribution and its application to explaining the convergence and convergence speed of optimization algorithms. The author proposes weaker conditions for the convergence and monotonicity properties of minima distribution compared to previous work. Specifically, the convergence condition requires the function τ(x) to be positive and non-minimizing everywhere except at minimizers of the target function f(x). The monotonicity condition requires f(x) and the derivative of τ(x) to be inversely proportional. The author then defines convergence speed as the derivative of the integral of f(x) with respect to the optimization iterations and shows it depends on the slope of the derivative of τ(x).
This document discusses the existence and uniqueness of renormalized solutions to a nonlinear multivalued elliptic problem with homogeneous Neumann boundary conditions and L1 data. Specifically, it considers the problem β(u) - div a(x, Du) ∋ f in Ω, with a(x, Du).η = 0 on ∂Ω, where f is an L1 function. It provides definitions of renormalized solutions and entropy solutions. The main result is the existence and uniqueness of renormalized solutions to this problem, which is proved using a priori estimates and a compactness argument with doubling of variables.
This document discusses the existence and uniqueness of renormalized solutions to a nonlinear multivalued elliptic problem with homogeneous Neumann boundary conditions and L1 data. Specifically, it considers the problem β(u) - div a(x, Du) ∋ f in Ω, with a(x, Du).η = 0 on ∂Ω. It defines renormalized solutions and entropy solutions for this problem. The main result is that under certain assumptions on the data, there exists a unique renormalized solution to the problem. The proof uses approximate methods, showing existence and uniqueness for a penalized approximation problem, and passing to the limit.
The low-rank basis problem for a matrix subspace (Tasuku Soma)
This document summarizes a presentation on finding low-rank bases for matrix subspaces. It introduces the low-rank basis problem, describes a greedy algorithm to solve it using two phases - rank estimation and alternating projection, and proves local convergence guarantees for the algorithm. Experimental results on synthetic and image data demonstrate the algorithm can recover known low-rank bases and separate mixed images. Comparisons are made to tensor decomposition methods for the special case of rank-1 bases.
In this talk, I address two new ideas in sampling geometric objects. The first is a new take on adaptive sampling with respect to the local feature size, i.e., the distance to the medial axis. We recently proved that such samples can be viewed as uniform samples with respect to an alternative metric on the Euclidean space. The second is a generalization of Voronoi refinement sampling. There, one also achieves an adaptive sample while simultaneously "discovering" the underlying sizing function. We show how to construct such samples that are spaced uniformly with respect to the $k$th nearest neighbor distance function.
The document describes sparse matrix reconstruction using a matrix completion algorithm. It begins with an overview of the matrix completion problem and formulation. It then describes the algorithm which uses soft-thresholding to impose a low-rank constraint and iteratively finds the matrix that agrees with the observed entries. The algorithm is proven to converge to the desired solution. Extensions to noisy data and generalized constraints are also discussed.
Conditional random fields (CRFs) are probabilistic models for segmenting and labeling sequence data. CRFs address limitations of previous models like hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs). CRFs allow incorporation of arbitrary, overlapping features of the observation sequence and label dependencies. Parameters are estimated to maximize the conditional log-likelihood using iterative scaling or tracking partial feature expectations. Experiments show CRFs outperform HMMs and MEMMs on synthetic and real-world tasks by addressing label bias problems and modeling dependencies beyond the previous label.
Image sciences, image processing, image restoration, photo manipulation. Image and videos representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noises in digital CCD imagery: photon, thermal and readout noises. Sources and models of blurs. Convolutions and point spread functions. Overview of other standard models, problems and tasks: salt-and-pepper and impulse noises, half toning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, Sonar, ultrasound, CT and MRI. Linear and ill-posed restoration problems.
This document proposes a linear programming (LP) based approach for solving maximum a posteriori (MAP) estimation problems on factor graphs that contain multiple-degree non-indicator functions. It presents an existing LP method for problems with single-degree functions, then introduces a transformation to handle multiple-degree functions by introducing auxiliary variables. This allows applying the existing LP method. As an example, it applies this to maximum likelihood decoding for the Gaussian multiple access channel. Simulation results demonstrate the LP approach decodes correctly with polynomial complexity.
The document provides an overview and outline of the course "Optimization for Machine Learning". Key points:
- The course covers topics like convexity, gradient methods, constrained optimization, proximal algorithms, stochastic gradient descent, and more.
- Mathematical modeling and computational optimization for machine learning are discussed. Optimization algorithms like gradient descent and stochastic gradient descent are important for learning model parameters.
- Convex optimization problems have desirable properties like every local minimum being a global minimum. Gradient descent and related algorithms are guaranteed to converge for convex problems.
- Convex sets and functions are introduced, including characterizations using epigraphs and subgradients. Convex functions have useful properties like continuity and satisfying Jensen's inequality.
This document discusses methods for performing sparse time-frequency representation (STFR) on signals in 2 dimensions. STFR decomposes signals into intrinsic mode functions (IMFs) by finding the sparsest representation of the signal over a redundant dictionary. The 1D version has been implemented successfully, but extending to 2D is challenging due to the need to update in two directions simultaneously. Several attempted algorithms are described, including applying the 1D algorithm to slices of the 2D signal in different directions and averaging the results. The most recent attempt uses bi-directional slicing to overcome issues with previous global approaches.
We start with motivation, few examples of uncertainties. Then we discretize elliptic PDE with uncertain coefficients, apply TT format for permeability, the stochastic operator and for the solution. We compare sparse multi-index set approach with full multi-index+TT.
Tensor Train format allows us to keep the whole multi-index set, without any multi-index set truncation.
Implementation of a Localization System for Sensor Networks-berkley (Farhad Gholami)
This dissertation discusses the implementation of a localization system for sensor networks. It addresses two main tasks: establishing relationships to reference points (e.g. distance measurements) and using those relationships and reference point positions to calculate sensor positions algorithmically.
The dissertation first presents various centralized and distributed localization algorithms from existing research. It then focuses on implementing a distributed, least-squares-based localization algorithm and designing an ultra-low power hardware architecture for it. Measurement errors due to fixed-point arithmetic are also analyzed.
The second part of the dissertation proposes, designs and prototypes an RF signal-based time-of-flight ranging system. The prototype achieves a measurement error within -0.5m to 2m at 100
Zero-padding a signal involves appending artificial zeros to increase the length of the signal. This increases the frequency resolution of the discrete Fourier transform (DFT) by changing the implicit periodicity assumption made about the signal. Specifically, zero-padding moves the DFT closer to approximating the true discrete-time Fourier transform (DTFT) by changing the assumption from periodicity to assuming the signal is zero outside the observed range. While zero-padding does not provide new information, it can help reveal features of a signal by modifying the implicit assumptions of the DFT.
This document presents a new method called the Real-valued Iterative Adaptive Approach (RIAA) for estimating the power spectral density of nonuniformly sampled data. It aims to improve upon the periodogram, which suffers from poor resolution and leakage. RIAA is an iteratively weighted least squares periodogram that uses an adaptive weighting matrix built from the most recent spectral estimate. It is shown to have significantly less leakage than the least squares periodogram through its use of an adaptive filter. The Bayesian Information Criterion is also discussed as a way to test the significance of peaks in the estimated spectrum.
The document discusses weighted nuclear norm minimization and its applications to image denoising. It provides background on key concepts from linear algebra and optimization theory needed to understand the denoising problem, such as convex optimization, affine transformations, singular value decomposition, and eigendecomposition. The objective of denoising is to extract the low-rank original image from a noisy high-dimensional image, modeled as the sum of the original image and white noise.
This document describes an RSSI (received signal strength indicator) based localization algorithm for wireless sensor networks. It discusses using RSSI values measured from reference nodes to estimate distances and perform trilateration to locate a target sensor node. The algorithm design includes RSSI to distance conversion using a path loss model, trilateration implementation using circle intersections, and simplifying computations for resource-limited sensor node processors through techniques like Taylor series approximations of exponential functions. Pseudocode is provided for RSSI to distance conversion and trilateration calculations.
1. The wavelet transform can be used to detect singularities or discontinuities in signals by identifying large wavelet coefficients around points of abrupt change across multiple scales.
2. The wavelet transform modulus maxima (WTMM) method uses successive derivative wavelets to identify singularities by removing lower order polynomial terms from the signal at each scale.
3. Local maxima of the continuous wavelet transform are related to singularities in the signal, and their behavior across scales can be used to characterize the point-wise regularity of the signal, detect noise, and reconstruct the signal from its singularities.
This project report discusses applying a Sobel edge detection algorithm and median filtering to colour JPEG images. It introduces the Sobel edge detection algorithm, which uses two 3x3 kernels to approximate horizontal and vertical derivatives in an image. It also discusses median filtering to reduce impulsive noise without blurring edges. The report outlines using the libjpeg library to read, write and process JPEG images in C code. It includes the source code for implementing Sobel edge detection and median filtering on JPEG images.
1. Low-Rank Matrix Optimization Problems
Junxiao Song and Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)
ELEC 5470 - Convex Optimization
Fall 2013-14, HKUST, Hong Kong
J. Song and D. Palomar (HKUST) Low-Rank Optimization ELEC 5470 1 / 38
2. Outline of Lecture
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
3. Outline
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
4. High Dimensional Data
Data becomes increasingly massive, high dimensional...
Images: compression, denoising, recognition. . .
Videos: streaming, tracking, stabilization. . .
User data: clustering, classification, recommendation. . .
Web data: indexing, ranking, search. . .
5. Low Dimensional Structures in High Dimensional Data
Low dimensional structures in visual data
User Data: profiles of different users may share some common factors
How to extract low dimensional structures from such high
dimensional data?
6. Outline
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
7. Rank Minimization Problem (RMP)
In many scenarios, low dimensional structure is closely related to low rank.
But in real applications, the true rank is usually unknown. A natural approach is to formulate the task as a rank minimization problem (RMP), i.e., finding the matrix of lowest rank that satisfies some constraint:

minimize_X   rank(X)
subject to   X ∈ C,

where X ∈ R^{m×n} is the optimization variable and C is a convex set denoting the constraints.
When X is restricted to be diagonal, rank(X) = ‖diag(X)‖_0 and the rank minimization problem reduces to the cardinality minimization problem (ℓ0-norm minimization).
8. Matrix Rank
The rank of a matrix X ∈ R^{m×n} is
the number of linearly independent rows of X,
the number of linearly independent columns of X,
the number of nonzero singular values of X, i.e., ‖σ(X)‖_0,
the smallest number r such that there exist an m × r matrix F and an r × n matrix G with X = FG.
It can be shown that any nonsquare matrix X can be associated with a positive semidefinite matrix whose rank is exactly twice the rank of X.
9. Semidefinite Embedding Lemma
Lemma
Let X ∈ R^{m×n} be a given matrix. Then rank(X) ≤ r if and only if there exist matrices Y = Y^T ∈ R^{m×m} and Z = Z^T ∈ R^{n×n} such that

[ Y    X ]
[ X^T  Z ]  ⪰ 0,    rank(Y) + rank(Z) ≤ 2r.

Based on the semidefinite embedding lemma, minimizing the rank of a general nonsquare matrix X is equivalent to minimizing the rank of the positive semidefinite, block diagonal matrix blkdiag(Y, Z):

minimize_{X,Y,Z}   (1/2) rank(blkdiag(Y, Z))
subject to         [ Y    X ]
                   [ X^T  Z ]  ⪰ 0,
                   X ∈ C.
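The lemma is easy to check numerically. A small numpy sketch (all sizes here are illustrative): for X = FG of rank r, the witnesses Y = FF^T and Z = G^T G from the proof make the block matrix positive semidefinite with rank(Y) + rank(Z) ≤ 2r.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2

# Build a rank-r matrix X = F G.
F = rng.standard_normal((m, r))
G = rng.standard_normal((r, n))
X = F @ G

# Witness matrices: Y = F F^T, Z = G^T G, and the block matrix [Y X; X^T Z].
Y = F @ F.T
Z = G.T @ G
B = np.block([[Y, X], [X.T, Z]])

# The block matrix is PSD (smallest eigenvalue nonnegative up to round-off)...
eigs = np.linalg.eigvalsh(B)
assert eigs.min() > -1e-8

# ...and rank(Y) + rank(Z) <= 2r, as the lemma requires.
rank_sum = np.linalg.matrix_rank(Y) + np.linalg.matrix_rank(Z)
assert rank_sum <= 2 * r
```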
10. Outline
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
11. Solving Rank Minimization Problem
In general, the rank minimization problem is NP-hard, and there is
little hope of finding the global minimum efficiently in all instances.
What we are going to talk about, instead, are efficient heuristics,
categorized into two groups:
1 Approximate the rank function with some surrogate functions
Nuclear norm heuristic
Log-det heuristic
2 Solving a sequence of rank-constrained feasibility problems
Matrix factorization based method
Rank constraint via convex iteration
12. Nuclear Norm Heuristic
A well-known heuristic for the rank minimization problem is to replace the rank function in the objective with the nuclear norm:

minimize_X   ‖X‖_*
subject to   X ∈ C

Proposed by Fazel (2002) [Fazel, 2002].
The nuclear norm ‖X‖_* is defined as the sum of the singular values, i.e., ‖X‖_* = Σ_{i=1}^r σ_i.
If X = X^T ⪰ 0, then ‖X‖_* is just Tr(X) and the “nuclear norm heuristic” reduces to the “trace heuristic”.
13. Why Nuclear Norm?
The nuclear norm can be viewed as the ℓ1-norm of the vector of singular values.
Just as the ℓ1-norm promotes sparsity, the nuclear norm promotes a sparse singular value vector, i.e., low rank.
When X is restricted to be diagonal, ‖X‖_* = ‖diag(X)‖_1 and the nuclear norm heuristic for the rank minimization problem reduces to the ℓ1-norm heuristic for the cardinality minimization problem.
‖x‖_1 is the convex envelope of card(x) over {x : ‖x‖_∞ ≤ 1}.
Similarly, ‖X‖_* is the convex envelope of rank(X) on the convex set {X : ‖X‖_2 ≤ 1}.
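The identities above can be checked directly with numpy: the nuclear norm equals the ℓ1-norm of the singular value vector, and for a symmetric PSD matrix it collapses to the trace.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))

# Nuclear norm = sum of singular values = l1-norm of the singular value vector.
sigma = np.linalg.svd(X, compute_uv=False)
nuc = sigma.sum()
assert np.isclose(nuc, np.linalg.norm(sigma, 1))
assert np.isclose(nuc, np.linalg.norm(X, ord='nuc'))  # numpy's built-in nuclear norm

# For a symmetric PSD matrix, the nuclear norm reduces to the trace
# (singular values coincide with eigenvalues), so the heuristic becomes the trace heuristic.
A = X.T @ X
assert np.isclose(np.linalg.norm(A, ord='nuc'), np.trace(A))
```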
14. Equivalent SDP Formulation
Lemma
For X ∈ R^{m×n} and t ∈ R, we have ‖X‖_* ≤ t if and only if there exist matrices Y ∈ R^{m×m} and Z ∈ R^{n×n} such that

[ Y    X ]
[ X^T  Z ]  ⪰ 0,    Tr(Y) + Tr(Z) ≤ 2t.

Based on the above lemma, the nuclear norm minimization problem is equivalent to

minimize_{X,Y,Z}   (1/2) Tr(Y + Z)
subject to         [ Y    X ]
                   [ X^T  Z ]  ⪰ 0,
                   X ∈ C.

This SDP formulation can also be obtained by applying the “trace heuristic” to the PSD form of the RMP.
15. Log-det Heuristic
In the log-det heuristic, the log-det function is used as a smooth surrogate for the rank function.
Symmetric positive semidefinite case:

minimize_X   log det(X + δI)
subject to   X ∈ C,

where δ > 0 is a small regularization constant.
Note that log det(X + δI) = Σ_i log(σ_i(X + δI)), rank(X) = ‖σ(X)‖_0, and log(s + δ) can be seen as a surrogate function of card(s).
However, the surrogate function log det(X + δI) is not convex (in fact, it is concave).
16. Log-det Heuristic
An iterative linearization and minimization scheme (called majorization-minimization) is used to find a local minimum.
Let X^(k) denote the kth iterate of the optimization variable X. The first-order Taylor series expansion of log det(X + δI) about X^(k) is given by

log det(X + δI) ≈ log det(X^(k) + δI) + Tr((X^(k) + δI)^{-1}(X − X^(k))).

Then, one can minimize log det(X + δI) by iteratively minimizing the local linearization, which leads to

X^(k+1) = argmin_{X ∈ C} Tr((X^(k) + δI)^{-1} X).
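Because log det is concave on the PSD cone, its linearization at X^(k) is a global upper bound on the surrogate, which is exactly what makes this a valid majorization-minimization scheme. A quick numerical check (toy dimension and δ chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 5, 1e-2

def logdet(M):
    # All matrices passed in are positive definite, so the sign is +1.
    sign, val = np.linalg.slogdet(M)
    return val

def rand_psd():
    A = rng.standard_normal((n, n))
    return A @ A.T

# Concavity of log det implies: for any X, X_k >= 0,
#   log det(X + dI) <= log det(X_k + dI) + Tr((X_k + dI)^{-1} (X - X_k)).
for _ in range(100):
    X, Xk = rand_psd(), rand_psd()
    lhs = logdet(X + delta * np.eye(n))
    W = np.linalg.inv(Xk + delta * np.eye(n))   # the weight matrix of the heuristic
    rhs = logdet(Xk + delta * np.eye(n)) + np.trace(W @ (X - Xk))
    assert lhs <= rhs + 1e-9                    # linearization majorizes the surrogate
```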
17. Interpretation of Log-det Heuristic
If we choose X^(0) = I, the first iteration is equivalent to minimizing the trace of X, which is just the trace heuristic. The iterations that follow try to reduce the rank further. In this sense, we can view this heuristic as a refinement of the trace heuristic.
At each iteration we solve a weighted trace minimization problem, with weights W^(k) = (X^(k) + δI)^{-1}. Thus, the log-det heuristic can be considered as an extension of the iterative reweighted ℓ1-norm heuristic to the matrix case.
18. Log-det Heuristic for General Matrix
For a general nonsquare matrix X, we can apply the log-det heuristic to the equivalent PSD form and obtain

minimize_{X,Y,Z}   log det(blkdiag(Y, Z) + δI)
subject to         [ Y    X ]
                   [ X^T  Z ]  ⪰ 0,
                   X ∈ C.

Linearizing as before, at iteration k we solve the following problem to get X^(k+1), Y^(k+1) and Z^(k+1):

minimize_{X,Y,Z}   Tr((blkdiag(Y^(k), Z^(k)) + δI)^{-1} blkdiag(Y, Z))
subject to         [ Y    X ]
                   [ X^T  Z ]  ⪰ 0,
                   X ∈ C.
19. Matrix Factorization based Method
The idea behind factorization based methods is that rank(X) ≤ r if
and only if X can be factorized as X = FG, where F ∈ Rm×r and
G ∈ Rr×n.
For each given r, we check if there exists a feasible X of rank less
than or equal to r by checking if any X ∈ C can be factored as above.
The expression X = FG is not convex in X, F and G simultaneously,
but it is convex in (X, F) when G is fixed and convex in (X, G) when
F is fixed.
Various heuristics can be applied to handle this non-convex equality
constraint, but it is not guaranteed to find an X with rank r even if
one exists.
20. Matrix Factorization based Method
Coordinate descent method: Fix F and G one at a time and
iteratively solve a convex problem at each iteration.
Choose F^(0) ∈ R^{m×r}. Set k = 1.
repeat
    (X̃^(k), G^(k)) = argmin_{X ∈ C, G ∈ R^{r×n}}  ‖X − F^(k−1) G‖_F
    (X^(k), F^(k)) = argmin_{X ∈ C, F ∈ R^{m×r}}  ‖X − F G^(k)‖_F
    e^(k) = ‖X^(k) − F^(k) G^(k)‖_F
until e^(k) ≤ ε, or e^(k−1) and e^(k) are approximately equal.
21. Rank Constraint via Convex Iteration
Consider a semidefinite rank-constrained feasibility problem

find_{X ∈ S^n}   X
subject to       X ∈ C, X ⪰ 0, rank(X) ≤ r.

It is proposed in [Dattorro, 2005] to solve this problem by iteratively solving the following two convex problems:

minimize_X   Tr(W* X)
subject to   X ∈ C, X ⪰ 0,

and

minimize_W   Tr(W X*)
subject to   0 ⪯ W ⪯ I, Tr(W) = n − r,

where W* is the optimal solution of the second problem and X* is the optimal solution of the first problem.
22. Rank Constraint via Convex Iteration
An optimal solution to the second problem is known in closed form. Given the non-increasingly ordered diagonalization X* = QΛQ^T, the matrix W* = U*U*^T is optimal, where U* = Q(:, r+1 : n) ∈ R^{n×(n−r)}, and

Tr(W* X*) = Σ_{i=r+1}^{n} λ_i(X*).

We start from W = I and iteratively solve the two convex problems. Note that in the first iteration the first problem is just the “trace heuristic”.
Suppose at convergence Tr(W* X*) = τ. If τ = 0, then rank(X*) ≤ r and X* is a feasible point. But this is not guaranteed; only local convergence can be established, i.e., convergence to some τ ≥ 0.
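The closed-form W* is easy to verify with numpy on a random PSD matrix (sizes illustrative): it is a projector onto the trailing eigenvectors, it is feasible for the second problem, and Tr(W* X*) picks out the n − r smallest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 2

# A PSD matrix X* with non-increasingly ordered eigendecomposition X* = Q Lambda Q^T.
A = rng.standard_normal((n, n))
Xstar = A @ A.T
lam, Q = np.linalg.eigh(Xstar)       # eigh returns ascending order
lam, Q = lam[::-1], Q[:, ::-1]       # reorder to non-increasing

# Optimal direction matrix: W* = U* U*^T with U* the trailing n - r eigenvectors.
U = Q[:, r:]
W = U @ U.T

# Optimality identity: Tr(W* X*) = sum of the n - r smallest eigenvalues.
assert np.isclose(np.trace(W @ Xstar), lam[r:].sum())

# Feasibility for the second problem: 0 <= W <= I and Tr(W) = n - r.
wl = np.linalg.eigvalsh(W)
assert wl.min() > -1e-10 and wl.max() < 1 + 1e-10
assert np.isclose(np.trace(W), n - r)
```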
23. Rank Constraint via Convex Iteration
For a general nonsquare matrix X ∈ R^{m×n}, we have an equivalent PSD form:

find_{X ∈ R^{m×n}}   X
subject to           X ∈ C, rank(X) ≤ r

is equivalent to

find_{X,Y,Z}   X
subject to     X ∈ C,
               G = [ Y    X ]
                   [ X^T  Z ]  ⪰ 0,
               rank(G) ≤ r.

The same convex iterations can be applied now. Note that if we start from W = I, the first problem in the first iteration is just the “nuclear norm heuristic”.
24. Outline
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
25. Recommender Systems
How does Amazon recommend commodities?
How does Netflix recommend movies?
26. Netflix Prize
Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings
to highest accuracy
17,770 total movies, 480,189 total users
How to fill in the blanks?
Can you improve the recommendation accuracy by 10% over what
Netflix was using? =⇒ One million dollars!
27. Abstract Setup: Matrix Completion
Consider a rating matrix R ∈ R^{m×n} with R_ij representing the rating user i gives to movie j.
But some R_ij are unknown, since no one watches all movies:

          Movies
      [ 2  3  ?  ?  5  ? ]
R =   [ 1  ?  ?  4  ?  3 ]   Users
      [ ?  ?  3  2  ?  5 ]
      [ 4  ?  3  ?  2  4 ]

We would like to predict how users will like unwatched movies.
28. Structure of the Rating Matrix
The rating matrix is very big: 480,189 (number of users) times 17,770 (number of movies) in the Netflix case.
But there are far fewer types of people and movies than there are people and movies.
So it is reasonable to assume that for each user i there is a k-dimensional vector p_i explaining the user's movie taste, and for each movie j there is a k-dimensional vector q_j explaining the movie's appeal, with the inner product between these two vectors being the rating user i gives to movie j, i.e., R_ij = p_i^T q_j. Equivalently, in matrix form, R is factorized as R = P^T Q, where P ∈ R^{k×m}, Q ∈ R^{k×n}, and k ≪ min(m, n).
This is the same as assuming the matrix R is of low rank.
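The factor model can be sketched directly in numpy (toy sizes here, not the Netflix dimensions): stacking taste vectors into P and appeal vectors into Q gives a rating matrix whose rank is at most the latent dimension k.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 100, 80, 3   # users, movies, latent "taste" dimensions (illustrative sizes)

P = rng.standard_normal((k, m))    # column i = taste vector p_i of user i
Q = rng.standard_normal((k, n))    # column j = appeal vector q_j of movie j
R = P.T @ Q                        # R_ij = p_i^T q_j

# The factorization R = P^T Q forces rank(R) <= k, i.e., R is low rank.
assert np.linalg.matrix_rank(R) <= k
```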
29. Matrix Completion
The true rank is unknown; a natural approach is to find the minimum rank solution

minimize_X   rank(X)
subject to   X_ij = R_ij, ∀(i, j) ∈ Ω,

where Ω is the set of observed entries.
In practice, instead of requiring strict equality for the observed entries, one may allow some error, and the formulation becomes

minimize_X   rank(X)
subject to   Σ_{(i,j)∈Ω} (X_ij − R_ij)² ≤ ε.

Then all the heuristics can be applied, e.g., the log-det heuristic or matrix factorization.
30. What did the Winners Use?
What algorithm did the final winner of the Netflix Prize use?
You can find the report on the Netflix Prize website. The winning
solution is really a cocktail of many methods combined, with thousands
of model parameters fine-tuned specifically to the training set provided
by Netflix.
But one key idea they used is precisely the factorization of the rating
matrix as the product of two low-rank matrices [Koren et al., 2009],
[Koren and Bell, 2011].
32. Background Extraction from Video
Given a video sequence F_i, i = 1, . . . , n.
The objective is to extract the background of the video sequence, i.e.,
to separate the static background from the human activities.
33. Low-Rank Matrix + Sparse Matrix
Stack the vectorized frames of the video sequence as columns:
Y = [vec(F_1), . . . , vec(F_n)].
The background component is of low rank, since the background is
static within a short period (ideally it is rank one, as the image would
be the same in every frame).
The foreground component is sparse, as activities occupy only a small
fraction of the scene.
The problem fits the following signal model:
Y = X + E,
where Y is the observation, X is a low-rank matrix (the low-rank
background), and E is a sparse matrix (the sparse foreground).
34. Low-Rank and Sparse Matrix Recovery
Low-rank and sparse matrix recovery:

    minimize_{X,E}   rank(X) + γ ||vec(E)||_0
    subject to       Y = X + E.

Applying the nuclear norm heuristic and the ℓ1-norm heuristic
simultaneously:

    minimize_{X,E}   ||X||_* + γ ||vec(E)||_1
    subject to       Y = X + E.

Theoretical results indicate that when X is of low rank and E is sparse
enough, exact recovery happens with high probability
[Wright et al., 2009].
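Both norms in the convex formulation have cheap proximal operators: singular-value thresholding for the nuclear norm and entrywise soft thresholding for the ℓ1 norm. The sketch below alternates the two on a quadratic-penalty version of the problem; the penalty weight `mu` and the simple alternating scheme are illustrative assumptions (practical solvers typically use augmented-Lagrangian methods instead):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: prox of tau * ||vec(.)||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(Y, gamma, mu=1.0, n_iter=200):
    """Alternately minimize ||X||_* + gamma*||vec(E)||_1
    + (1/(2*mu)) * ||Y - X - E||_F^2 over X and E."""
    X = np.zeros_like(Y)
    E = np.zeros_like(Y)
    for _ in range(n_iter):
        X = svt(Y - E, mu)           # low-rank update, E fixed
        E = soft(Y - X, gamma * mu)  # sparse update, X fixed
    return X, E

# Synthetic example: static "background" plus a few large "foreground" spikes.
L = 10.0 * np.ones((20, 20))             # rank-one background
S = np.zeros((20, 20))
S[3, 4] = S[7, 12] = S[15, 2] = 50.0     # sparse foreground
X, E = rpca(L + S, gamma=1 / np.sqrt(20))
```

On this toy input the recovered X is rank one and close to the true background, while E isolates the three spikes; because of the quadratic penalty, the split Y = X + E holds only approximately, with a small shrinkage bias.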
35. Background Extraction Result
Row 1: the original video sequences.
Row 2: the extracted low-rank background.
Row 3: the extracted sparse foreground.
36. Outline
1 Motivation
2 Problem Formulation
3 Heuristics for Rank Minimization Problem
Nuclear Norm Heuristic
Log-det Heuristic
Matrix Factorization based Method
Rank Constraint via Convex Iteration
4 Real Applications
Netflix Prize
Video Intrusion Detection
5 Summary
37. Summary
We have introduced the rank minimization problem.
We have developed different heuristics to solve the rank minimization
problem:
Nuclear norm heuristic
Log-det heuristic
Matrix factorization based method
Rank constraint via convex iteration
Real applications have been presented.
38. References
Candès, E. J. and Recht, B. (2009).
Exact matrix completion via convex optimization.
Foundations of Computational Mathematics, 9(6):717–772.
Dattorro, J. (2005).
Convex Optimization & Euclidean Distance Geometry.
Meboo Publishing USA.
Fazel, M. (2002).
Matrix rank minimization with applications.
PhD thesis, Stanford University.
Koren, Y. and Bell, R. (2011).
Advances in collaborative filtering.
In Recommender Systems Handbook, pages 145–186. Springer.
Koren, Y., Bell, R., and Volinsky, C. (2009).
Matrix factorization techniques for recommender systems.
Computer, 42(8):30–37.
Wright, J., Ganesh, A., Rao, S., Peng, Y., and Ma, Y. (2009).
Robust principal component analysis: Exact recovery of corrupted low-rank
matrices via convex optimization.
In Advances in Neural Information Processing Systems, pages 2080–2088.