The document discusses the method of multiplicities, an algebraic technique for combinatorial problems: one finds a polynomial that vanishes on a given set with high multiplicity. The technique is applied to list decoding of Reed-Solomon codes, to lower bounds on the size of Kakeya sets, and to the construction of randomness extractors: it yields improved list-decoding bounds, shows that certain Kakeya sets must be large, and allows more randomness to be extracted from weak sources. Propagating multiplicity information through derivatives permits a tighter analysis in each of these problems.
Computational Information Geometry: A quick review (ICMS) - Frank Nielsen
From the workshop
Computational information geometry for image and signal processing
Sep 21, 2015 - Sep 25, 2015
ICMS, 15 South College Street, Edinburgh
http://www.icms.org.uk/workshop.php?id=343
This document summarizes Frank Nielsen's talk on divergence-based center clustering and its applications. Some key points:
- Center-based clustering minimizes an objective that assigns each data point to its closest cluster center. Computing an optimal clustering is NP-hard in general (for instance, k-means remains NP-hard even for points in the plane).
- Mixed divergences use dual centroids per cluster to define cluster assignments. Total Jensen divergences are proposed as a way to make divergences more robust by incorporating a conformal factor.
- When centroids do not have closed-form solutions, initialization methods like k-means++ can be used, since they select initial seeds by random sampling without ever computing centroids. Total Jensen k-means++ applies this seeding strategy to total Jensen divergences.
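The k-means++ seeding rule mentioned above fits in a few lines. The sketch below is illustrative rather than Nielsen's implementation: squared Euclidean distance stands in for whatever divergence is used, and the function and variable names are made up.

```python
import random

def kmeans_pp_seeds(points, k, dist, rng=random.Random(0)):
    """k-means++ seeding: pick the first seed uniformly at random, then
    pick each further seed with probability proportional to its distance
    (or divergence) to the closest seed chosen so far."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # weight of each point = distance to its nearest current seed
        weights = [min(dist(x, s) for s in seeds) for x in points]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for x, w in zip(points, weights):
            acc += w
            if acc >= r:
                seeds.append(x)
                break
    return seeds

# Three well-separated groups on the line; seeding tends to pick one per group.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0]
d2 = lambda x, y: (x - y) ** 2
print(kmeans_pp_seeds(pts, 3, d2))
```

Because no centroid is ever computed, the same routine works unchanged for divergences whose centroids lack a closed form: only `dist` changes.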
The dual geometry of Shannon information - Frank Nielsen
The document discusses the dual geometry of Shannon information. It covers:
1. Shannon entropy and related concepts like maximum entropy principle and exponential families.
2. The properties of Kullback-Leibler divergence including its interpretation as a statistical distance and relation to maximum entropy.
3. How maximum likelihood estimation for exponential families can be viewed as minimizing Kullback-Leibler divergence between the empirical distribution and model distribution.
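Point 3 can be made concrete with the simplest exponential family, the Bernoulli: the MLE is the sample mean, and it is exactly the model minimizing the KL divergence from the empirical distribution. A toy sketch (the data and the candidate grid are made up):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Ber(p) || Ber(q)) in nats."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(p, q) + term(1 - p, 1 - q)

data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # toy coin flips
p_hat = sum(data) / len(data)            # MLE for the Bernoulli mean

# Scanning candidate models: the KL divergence from the empirical
# distribution is minimized exactly at the maximum-likelihood estimate.
candidates = [i / 100 for i in range(1, 100)]
best = min(candidates, key=lambda q: kl_bernoulli(p_hat, q))
print(p_hat, best)
```

The same identity (MLE = KL projection of the empirical distribution onto the model family) holds for every exponential family, which is the dual-geometry view the summary refers to.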
Classification with mixtures of curved Mahalanobis metrics - Frank Nielsen
This document discusses curved Mahalanobis distances in Cayley-Klein geometries and their application to classification. Specifically:
1. It introduces Mahalanobis distances and generalizes them to curved distances in Cayley-Klein geometries, which can model both elliptic and hyperbolic geometries.
2. It describes how to learn these curved Mahalanobis metrics using an adaptation of Large Margin Nearest Neighbors (LMNN) to the elliptic and hyperbolic cases.
3. Experimental results on several datasets show that curved Mahalanobis distances can achieve comparable or better classification accuracy than standard Mahalanobis distances.
Linear Discriminant Analysis (LDA) Under f-Divergence Measures - Anmol Dwivedi
For more details, please have a look at:
1. https://www.mdpi.com/1099-4300/24/2/188
2. https://ieeexplore.ieee.org/document/9518004
Abstract:
In statistical inference, the information-theoretic performance limits can often be expressed in terms of a notion of divergence between the underlying statistical models (e.g., in binary hypothesis testing, the total error probability is equal to the total variation between the models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) face complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising the performance (divergence reduces due to the data processing inequality for divergence). This paper considers linear dimensionality reduction such that the divergence between the models is \emph{maximally} preserved. Specifically, the paper focuses on the Gaussian models and characterizes an optimal projection of the data onto a lower-dimensional subspace with respect to four $f$-divergence measures (Kullback-Leibler, $\chi^2$, Hellinger, and total variation). There are two key observations. First, projections are not necessarily along the dominant modes of the covariance matrix of the data, and even in some situations, they can be along the least dominant modes. Secondly, under specific regimes, the optimal design of subspace projection is identical under all the $f$-divergence measures considered, rendering a degree of universality to the design independent of the inference problem of interest.
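A small numeric sketch of the paper's setting, reduced to two dimensions, a single projection direction, and the KL divergence only. The means and covariance below are made-up illustrations, not from the paper; they are chosen so that the best direction is the *least* dominant covariance axis, matching the paper's first observation.

```python
import math

def kl_gauss_1d(m0, v0, m1, v1):
    """Closed-form KL(N(m0,v0) || N(m1,v1))."""
    return 0.5 * (v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0))

def kl_gauss_2d(mu0, S0, mu1, S1):
    """Closed-form KL between two 2-D Gaussians (2x2 nested-list covariances)."""
    det = lambda S: S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d1 = det(S1)
    P = [[S1[1][1] / d1, -S1[0][1] / d1],      # inverse of S1
         [-S1[1][0] / d1, S1[0][0] / d1]]
    tr = sum(P[i][k] * S0[k][i] for i in range(2) for k in range(2))
    dm = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    quad = sum(dm[i] * P[i][k] * dm[k] for i in range(2) for k in range(2))
    return 0.5 * (tr + quad - 2.0 + math.log(d1 / det(S0)))

# Shared anisotropic (diagonal) covariance; the means differ along the
# LOW-variance axis.
S = [[4.0, 0.0], [0.0, 0.25]]
mu0, mu1 = (0.0, 0.0), (0.0, 1.0)
full = kl_gauss_2d(mu0, S, mu1, S)

def projected_kl(theta):
    """KL after projecting both models onto the unit direction u(theta);
    u^T S u simplifies because S is diagonal."""
    u = (math.cos(theta), math.sin(theta))
    var = u[0] * u[0] * S[0][0] + u[1] * u[1] * S[1][1]
    m0 = u[0] * mu0[0] + u[1] * mu0[1]
    m1 = u[0] * mu1[0] + u[1] * mu1[1]
    return kl_gauss_1d(m0, var, m1, var)

angles = [k * math.pi / 180 for k in range(180)]
best = max(angles, key=projected_kl)
# Data processing inequality: projection never increases the divergence...
assert all(projected_kl(t) <= full + 1e-9 for t in angles)
# ...and here the best 1-D direction is the least dominant covariance axis.
print(best, projected_kl(best), full)
```

In this example the optimal direction (90 degrees, the low-variance axis) even preserves the divergence exactly, while projecting onto the dominant axis destroys it entirely.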
QMC algorithms usually rely on a choice of $N$ evenly distributed integration nodes in $[0,1)^d$. A common means to assess such an equidistribution property for a point set or sequence is the so-called discrepancy function, which compares the actual number of points to the expected number of points (assuming uniform distribution on $[0,1)^{d}$) that lie within an arbitrary axis-parallel rectangle anchored at the origin. The dependence of the integration error of QMC rules on various norms of the discrepancy function is made precise by the well-known Koksma--Hlawka inequality and its variations. In many cases, such as the $L^{p}$ spaces with $1<p<\infty$, the best growth rate in terms of the number of points $N$, as well as corresponding explicit constructions, are known. In the classical setting $p=\infty$, sharp results are absent already for $d\geq3$ and appear intriguingly hard to obtain. This talk serves as a survey of discrepancy theory with special emphasis on the $L^{\infty}$ setting. Furthermore, it highlights the evolution of recent techniques and presents the latest results.
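In one dimension the $L^\infty$ star discrepancy has a simple exact formula, which makes the definition above concrete. The point sets below are illustrative:

```python
def star_discrepancy_1d(points):
    """Exact L-infinity star discrepancy of a 1-D point set in [0,1),
    via the classical formula  max_i max(i/N - x_(i), x_(i) - (i-1)/N)
    over the sorted points x_(1) <= ... <= x_(N)."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

# The centered regular grid {(2i-1)/(2N)} attains the optimal value 1/(2N);
# a clumped point set of the same size does far worse.
grid = [(2 * i + 1) / 16 for i in range(8)]
clumped = [0.05, 0.06, 0.07, 0.08, 0.55, 0.56, 0.57, 0.58]
print(star_discrepancy_1d(grid), star_discrepancy_1d(clumped))
```

It is exactly in higher dimensions, where no such closed formula exists, that the sharp $L^\infty$ bounds discussed in the talk become hard.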
Patch Matching with Polynomial Exponential Families and Projective Divergences - Frank Nielsen
This document presents a method called Polynomial Exponential Family-Patch Matching (PEF-PM) to solve the patch matching problem. PEF-PM models patch colors using polynomial exponential families (PEFs), which are universal smooth positive densities. It estimates PEFs using a score matching estimator and accelerates batch estimation using summed area tables. Patch similarity is measured with a statistical projective divergence, the symmetrized γ-divergence. Experiments show that PEF-PM handles noise and symmetries robustly and outperforms baseline methods.
Tutorial on Belief Propagation in Bayesian Networks - Anmol Dwivedi
The goal of this mini-project is to implement belief propagation algorithms for posterior probability inference and most probable explanation (MPE) inference for the Bayesian Network with binary values in which the Conditional Probability Table for each random-variable/node is given.
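A minimal illustration of posterior inference by message passing, on a three-node binary chain rather than a full network; the CPT values below are made up. On a chain (or any tree), belief propagation is exact, which the brute-force check confirms.

```python
# CPTs for a binary chain A -> B -> C (values 0/1).
pA = [0.6, 0.4]                       # P(A)
pB_A = [[0.7, 0.3], [0.2, 0.8]]       # pB_A[a][b] = P(B=b | A=a)
pC_B = [[0.9, 0.1], [0.4, 0.6]]       # pC_B[b][c] = P(C=c | B=b)

# Belief propagation: pass the evidence C=1 backward as lambda-messages,
# then normalise at the query node A.
lam_B = [pC_B[b][1] for b in (0, 1)]                              # C -> B
lam_A = [sum(pB_A[a][b] * lam_B[b] for b in (0, 1)) for a in (0, 1)]
post = [pA[a] * lam_A[a] for a in (0, 1)]     # unnormalised P(A, C=1)
z = sum(post)
post = [p / z for p in post]                  # posterior P(A | C=1)

# Brute-force enumeration gives the same posterior (BP is exact on trees).
joint = lambda a, b, c: pA[a] * pB_A[a][b] * pC_B[b][c]
brute = [sum(joint(a, b, 1) for b in (0, 1)) for a in (0, 1)]
zb = sum(brute)
brute = [p / zb for p in brute]
print(post, brute)
```

The project's general algorithm does the same thing with pi- and lambda-messages flowing in both directions over an arbitrary polytree; MPE inference replaces the sums in the messages by maximisations.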
The Probability that a Matrix of Integers Is Diagonalizable - Jay Liew
The Probability that a Matrix of Integers Is Diagonalizable
Andrew J. Hetzel, Jay S. Liew, and Kent E. Morrison
1. INTRODUCTION. It is natural to use integer matrices for examples and exercises when teaching a linear algebra course, or, for that matter, when writing a textbook in the subject. After all, integer matrices offer a great deal of algebraic simplicity for particular problems. This, in turn, lets students focus on the concepts. Of course, to insist on integer matrices exclusively would certainly give the wrong idea about many important concepts. For example, integer matrices with integer matrix inverses are quite rare, although invertible integer matrices (over the rational numbers) are relatively common. In this article, we focus on the property of diagonalizability for integer matrices and pose the question of the likelihood that an integer matrix is diagonalizable. Specifically, we ask: What is the probability that an n × n matrix with integer entries is diagonalizable over the complex numbers, the real numbers, and the rational numbers, respectively?
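For 2 × 2 matrices the complex case can be checked directly: a matrix is diagonalizable over C iff its characteristic polynomial has distinct roots (nonzero discriminant) or the matrix is already scalar. A quick exhaustive count over a finite entry range (an illustration of the question, not the article's derivation):

```python
from itertools import product

def diagonalizable_2x2_over_C(a, b, c, d):
    """A 2x2 matrix [[a,b],[c,d]] is diagonalizable over C iff its
    eigenvalues are distinct or it is a scalar multiple of the identity."""
    disc = (a - d) ** 2 + 4 * b * c   # discriminant of the char. polynomial
    return disc != 0 or (b == 0 and c == 0 and a == d)

def fraction_diagonalizable(n):
    """Fraction of 2x2 matrices with entries in {-n,...,n} that are
    diagonalizable over the complex numbers."""
    rng = range(-n, n + 1)
    hits = total = 0
    for a, b, c, d in product(rng, repeat=4):
        total += 1
        hits += diagonalizable_2x2_over_C(a, b, c, d)
    return hits / total

# The fraction grows toward 1 as the entry range widens.
print(fraction_diagonalizable(2), fraction_diagonalizable(5))
```

The non-diagonalizable matrices all lie on the discriminant surface, a lower-dimensional set, which is why the fraction tends to 1; over R and Q the count is more delicate, as the article discusses.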
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to make use of a truncated Karhunen-Loeve (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the need is to obtain an expected L2 error of say 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
This document discusses methods for summarizing Lego-like sphere and torus maps. It begins by introducing the concept of ({a,b},k)-maps, which are k-valent maps with faces of size a or b. It then discusses several challenges in enumerating and drawing such maps, including enumerating all possible Lego decompositions. Specific enumeration methods are described, such as using exact covering problems or satisfiability problems. The document also discusses challenges in graph drawing representations, and suggests using primal-dual circle packings as a promising approach.
This document provides an overview of Independent Components Analysis (ICA). It discusses how ICA can be used to separate mixed signals, like separating different speakers' voices from recordings of a conversation. The document introduces the ICA problem and assumptions, discusses ambiguities in the ICA solution, and presents the Bell-Sejnowski algorithm for performing ICA using maximum likelihood estimation.
The document describes the iterative compression technique for designing fixed-parameter tractable algorithms. It shows how to use a vertex/edge cover, feedback vertex set, or odd cycle transversal of size k+1 as "compression" to design a branching algorithm with running time $O^*(c^k)$ for some constant c. This technique involves guessing how an optimal solution of size k interacts with the given size-(k+1) solution, and branching based on including or excluding vertices in the optimal solution.
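The compression step can be sketched concretely for Vertex Cover: given a cover of size k+1, guess which part of it survives into a size-k solution, committing the rest to exclusion. The function below is an illustrative brute force over the 2^(k+1) guesses (names are made up):

```python
from itertools import combinations

def compress_vertex_cover(edges, cover, k):
    """One compression step: given a vertex cover of size k+1, look for a
    cover of size at most k by guessing, for every subset S of the old
    cover, that exactly S stays in the new solution."""
    cover = set(cover)
    for r in range(len(cover) + 1):
        for S in combinations(sorted(cover), r):
            S = set(S)
            W = cover - S          # vertices we commit to EXCLUDING
            # an edge with both endpoints excluded cannot be covered
            if any(u in W and v in W for u, v in edges):
                continue
            # edges touching an excluded vertex must be covered from outside
            forced = {v if u in W else u
                      for u, v in edges if (u in W) ^ (v in W)}
            candidate = S | forced
            if len(candidate) <= k:
                return candidate
    return None

# Path a-b-c-d has the size-3 cover {a,b,c}; compression finds size 2.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(compress_vertex_cover(edges, {"a", "b", "c"}, 2))
```

Iterative compression runs this step n times, adding one vertex at a time, which gives the $O^*(c^k)$ shape mentioned above (here c = 2, from the subset guessing).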
The document describes an algorithm for finding a (p,q)-partition of a graph G. A (p,q)-partition partitions the vertices of G into clusters such that each cluster has size at most p and cut size at most q. The algorithm works by first showing that finding a (p,q)-partition can be reduced to finding a (p,q)-cluster for each vertex. It then gives a randomized algorithm for finding a (p,q)-cluster for a given vertex v in time $2^{O(q)} n^{O(1)}$ by reducing the problem to an instance of the satellite problem, which can be solved in polynomial time. The reduction works by sampling important cuts in the graph to construct an instance of the satellite problem.
A fundamental numerical problem in many sciences is to compute integrals. These integrals can often be expressed as expectations and then approximated by sampling methods. Monte Carlo sampling is very competitive in high dimensions, but has a slow rate of convergence. One reason for this slowness is that the MC points form clusters and gaps. Quasi-Monte Carlo methods greatly reduce such clusters and gaps, and under modest smoothness demands on the integrand they can greatly improve accuracy. This can even take place in problems of surprisingly high dimension. This talk will introduce the basics of QMC and randomized QMC. It will include discrepancy and the Koksma-Hlawka inequality, some digital constructions and some randomized QMC methods that allow error estimation and sometimes bring improved accuracy.
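The cluster-and-gap effect is easy to see numerically by comparing plain Monte Carlo with the van der Corput sequence, the simplest QMC construction (toy integrand, not from the talk):

```python
import random

def van_der_corput(n, base=2):
    """n-th radical-inverse (van der Corput) point in [0, 1)."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        n, r = divmod(n, base)
        q += r * bk
        bk /= base
    return q

f = lambda x: x * x          # integral of x^2 over [0,1] is exactly 1/3
N = 2 ** 12
rng = random.Random(0)
mc = sum(f(rng.random()) for _ in range(N)) / N
qmc = sum(f(van_der_corput(i)) for i in range(N)) / N
# For N = 2^m the first N van der Corput points are a permutation of
# {k/N}: no clusters, no gaps, and the error here is about 1/(2N).
print(abs(mc - 1 / 3), abs(qmc - 1 / 3))
```

On a typical run the QMC error is orders of magnitude below the roughly $N^{-1/2}$ Monte Carlo error, which is the smoothness-driven gain the talk describes; randomized QMC then re-randomizes such points to recover error estimates.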
The document discusses algorithms and techniques for analyzing graphs and polynomials. It describes how an adjacency matrix A can be used to count walks and common neighbors in a graph. It also explains how to construct a polynomial that is equal to zero if and only if a graph G contains a path of length k, by creating terms for all walks in G and exploiting the property that a+a=0 in finite fields of characteristic two. This allows evaluating whether the polynomial is zero to determine if G contains the desired path.
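The walk-counting fact is quick to verify: the (i, j) entry of A^k counts the walks of length k from i to j, so the off-diagonal entries of A^2 count common neighbours. A small check on the 4-cycle:

```python
def mat_mul(A, B):
    """Plain dense matrix product for small integer matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Adjacency matrix of the 4-cycle 0-1-2-3-0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

# (A^2)[i][j] counts common neighbours of i and j (for i != j), and
# (A^3)[i][j] counts walks of length 3 from i to j.
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)
print(A2[0][2], A3[0][1])   # vertices 0 and 2 share the 2 neighbours {1, 3}
```

The path-detection polynomial in the document refines this idea: over a field of characteristic two, walk terms that revisit a vertex pair up and cancel (a + a = 0), so a nonzero evaluation certifies a genuine path rather than just a walk.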
This document summarizes key concepts from a PhD dissertation on uncertainty in deep learning:
1) There are two types of uncertainties - epistemic uncertainty from lack of knowledge that decreases with more data, and aleatoric uncertainty from inherent noise that cannot be reduced. Deep learning models need to estimate both to provide predictive uncertainty.
2) Variational inference allows approximating intractable Bayesian posteriors by minimizing the KL divergence between an approximating distribution and the true posterior. Dropout can be seen as a Bayesian approximation where weights follow a Bernoulli distribution.
3) With dropout as the variational distribution, predictive uncertainty in regression is estimated from multiple stochastic forward passes, with the aleatoric part captured by a learned noise term and the epistemic part by the variance across the passes.
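A toy sketch of point 3 with made-up fixed weights and no training: keep dropout active at test time, run T stochastic passes, and read the epistemic uncertainty off the spread of the predictions.

```python
import random
import statistics

# A toy fixed one-hidden-layer regressor; the weights are arbitrary.
# In MC dropout the dropout used in training stays active at test time.
W1 = [[0.9, -0.4], [0.3, 0.8], [-0.5, 0.6]]   # 3 hidden units, 2 inputs
W2 = [0.7, -0.2, 0.5]
relu = lambda z: max(0.0, z)

def stochastic_forward(x, p_drop, rng):
    """One forward pass with Bernoulli dropout on the hidden layer."""
    h = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    keep = 1.0 - p_drop
    h = [hi * (0.0 if rng.random() < p_drop else 1.0 / keep) for hi in h]
    return sum(w * hi for w, hi in zip(W2, h))

def predict_with_uncertainty(x, T=1000, p_drop=0.5, seed=0):
    rng = random.Random(seed)
    ys = [stochastic_forward(x, p_drop, rng) for _ in range(T)]
    # mean of the passes = prediction; their variance = epistemic part
    # (a learned observation-noise term would supply the aleatoric part)
    return statistics.mean(ys), statistics.variance(ys)

mu, var = predict_with_uncertainty([1.0, 2.0])
print(mu, var)
```

With dropout switched off the passes coincide and the epistemic variance collapses to zero; with more training data the dissertation's argument is that this variance shrinks, while the aleatoric noise term does not.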
The document discusses the theory of NP-completeness. It begins by defining the complexity classes P, NP, NP-hard, and NP-complete. It then explains the concept of polynomial-time reduction and notes that no NP-complete problem is known to be solvable deterministically in polynomial time. The document provides examples of NP-complete problems such as satisfiability (SAT), vertex cover, and the traveling salesman problem. It shows how nondeterministic algorithms can solve these problems and how the problems can be transformed into SAT instances. Finally, it proves that SAT is the first NP-complete problem by showing it is both in NP and NP-hard.
This document discusses different geometric structures and distances that can be used for clustering probability distributions that live on the probability simplex. It reviews four main geometries: Fisher-Rao Riemannian geometry based on the Fisher information metric, information geometry based on Kullback-Leibler divergence, total variation distance and l1-norm geometry, and Hilbert projective geometry based on the Hilbert metric. It compares how k-means clustering performs using distances derived from these different geometries on the probability simplex.
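The four geometries give four different notions of distance on the simplex, all with short closed forms; the probability vectors below are illustrative.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence (information geometry)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """Total variation distance = half the l1 distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def fisher_rao(p, q):
    """Fisher-Rao geodesic distance: pull back to the sphere via sqrt."""
    c = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, c))

def hilbert(p, q):
    """Hilbert projective metric on the open simplex."""
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return math.log(max(ratios) / min(ratios))

p, q = (0.5, 0.3, 0.2), (0.4, 0.4, 0.2)
for name, d in [("KL", kl), ("TV", total_variation),
                ("Fisher-Rao", fisher_rao), ("Hilbert", hilbert)]:
    print(f"{name:10s} {d(p, q):.4f}")
```

Swapping one of these distances into a k-means loop changes both the assignment step and the centroid that minimises the within-cluster cost, which is exactly the comparison the document carries out.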
In this talk, we give an overview of results on numerical integration in Hermite spaces. These spaces contain functions defined on $\mathbb{R}^d$, and can be characterized by the decay of their Hermite coefficients. We consider the case of exponentially as well as polynomially decaying Hermite coefficients. For numerical integration, we either use Gauss-Hermite quadrature rules or algorithms based on quasi-Monte Carlo rules. We present upper and lower error bounds for these algorithms, and discuss their dependence on the dimension $d$. Furthermore, we comment on open problems for future research.
The document discusses uncertainty quantification (UQ) using quasi-Monte Carlo (QMC) integration methods. It introduces parametric operator equations for modeling input uncertainty in partial differential equations. Both forward and inverse UQ problems are considered. QMC methods like interlaced polynomial lattice rules are discussed for approximating high-dimensional integrals arising in UQ, with convergence rates superior to standard Monte Carlo. Algorithms for single-level and multilevel QMC are presented for solving forward and inverse UQ problems.
Accelerating Metropolis Hastings with Lightweight Inference Compilation - Feynman Liang
This document summarizes research on accelerating Metropolis-Hastings sampling with lightweight inference compilation. It discusses background on probabilistic programming languages and Bayesian inference techniques like variational inference and sequential importance sampling. It introduces the concept of inference compilation, where a neural network is trained to construct proposals for MCMC that better match the posterior. The paper proposes a lightweight approach to inference compilation for imperative probabilistic programs that trains proposals conditioned on execution prefixes to address issues with sequential importance sampling.
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg... - zukun
The document discusses using divergence measures like the Jensen-Shannon divergence to align multiple point sets represented as probability density functions. It motivates using the JS divergence by modeling point sets as mixtures of density functions, and shows how the likelihood ratio between models leads to the JS divergence. It then formulates the problem of group-wise point set registration as minimizing the JS divergence between density functions, combined with a regularization term. Experimental results on aligning multiple 3D hippocampus point sets are also presented.
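The Jensen-Shannon divergence itself is easy to compute: it is the average KL divergence of each distribution to their midpoint mixture, which makes it symmetric and bounded (by 1 bit in base 2). The two-point distributions below are illustrative:

```python
import math

def kl(p, q):
    """KL divergence in bits (base-2 logarithm)."""
    return sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """JS divergence: symmetrised, bounded KL to the midpoint mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = (0.9, 0.1), (0.1, 0.9)
print(jensen_shannon(p, q))
```

For point-set registration the p and q above become kernel density estimates of the point sets, and the JS divergence between the densities is what the registration objective minimises.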
The document discusses achieving higher-order convergence for integration on $\mathbb{R}^s$ using quasi-Monte Carlo (QMC) rules. It describes the problem that when using tensor-product QMC rules on truncated domains, the convergence rate scales with the dimension s as $(\alpha \log N)^s N^{-\alpha}$. The goal is to obtain a convergence rate independent of the dimension s. The document proposes using a multivariate decomposition method (MDM) to decompose an infinite-dimensional integral into a sum of finite-dimensional integrals, then applying QMC rules to each integral to achieve the desired higher-order convergence rate.
1. Representation theory studies how algebraic structures like groups, algebras, and Lie algebras can be represented by linear transformations on vector spaces. Quiver representations assign vector spaces to vertices and linear maps to arrows of a quiver.
2. Hall algebras were introduced to study representations of quivers. The document outlines representation theory, quivers, Hall algebras, and connections between quivers, Lie theory, and quantum groups.
3. Representation theory has applications in many areas of mathematics including algebra, analysis, algebraic geometry, and topology. Dynkin diagrams classify semisimple Lie algebras and Kac-Moody algebras. Quantum groups are quantized enveloping algebras generalizing the structure of universal enveloping algebras.
X2 T01 09 geometrical representation of complex numbers - Nigel Simmons
The document discusses representing complex numbers geometrically using vectors on the Argand diagram. It explains that complex numbers can be represented as vectors, with the advantage that vectors can be moved around without changing their modulus (length) or argument (angle from the x-axis). It describes how to perform addition and subtraction of complex numbers by placing the vectors head to tail or head to head, respectively, and discusses properties like the parallelogram law.
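Python's built-in complex type makes these vector facts directly checkable; the sample numbers are arbitrary.

```python
import cmath

z1 = 3 + 4j
z2 = 1 - 2j

# Modulus (vector length) and argument (angle from the positive x-axis).
print(abs(z1), cmath.phase(z1))   # modulus of 3+4i is 5

# Addition is head-to-tail vector addition (the parallelogram law):
# real and imaginary parts add componentwise.
s = z1 + z2
assert (s.real, s.imag) == (z1.real + z2.real, z1.imag + z2.imag)

# Translating a vector (adding the same complex number to both endpoints
# of a segment) changes neither the modulus nor the argument of the
# difference, which is why vectors can be "moved around" freely.
w = 2 + 7j
assert abs((z1 + w) - (z2 + w)) == abs(z1 - z2)
print(s)
```

Subtraction z1 - z2 is the head-to-head construction from the slides: the vector from z2 to z1 on the Argand diagram.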
This document summarizes the concept of bidimensionality and how it can be used to design subexponential algorithms for graph problems on planar and other graph classes. It discusses how bidimensionality can be defined for parameters that are closed under minors or contractions by relating their behavior on grid graphs. It presents examples like vertex cover and dominating set that are bidimensional. It also discusses how bidimensionality can be extended to bounded genus graphs and H-minor free graphs using grid-minor/contraction theorems.
Relaxation methods for the matrix exponential on large networks - David Gleich
My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
This document discusses Bayesian inference on mixtures models. It covers several key topics:
1. Density approximation and consistency results for mixtures as a way to approximate unknown distributions.
2. The "scarcity phenomenon" where the posterior probabilities of most component allocations in mixture models are zero, concentrating on just a few high probability allocations.
3. Challenges with Bayesian inference for mixtures, including identifiability issues, label switching, and complex combinatorial calculations required to integrate over all possible component allocations.
The document discusses uncertainty quantification (UQ) using quasi-Monte Carlo (QMC) integration methods. It introduces parametric operator equations for modeling input uncertainty in partial differential equations. Both forward and inverse UQ problems are considered. QMC methods like interlaced polynomial lattice rules are discussed for approximating high-dimensional integrals arising in UQ, with convergence rates superior to standard Monte Carlo. Algorithms for single-level and multilevel QMC are presented for solving forward and inverse UQ problems.
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
This document summarizes research on accelerating Metropolis-Hastings sampling with lightweight inference compilation. It discusses background on probabilistic programming languages and Bayesian inference techniques like variational inference and sequential importance sampling. It introduces the concept of inference compilation, where a neural network is trained to construct proposals for MCMC that better match the posterior. The paper proposes a lightweight approach to inference compilation for imperative probabilistic programs that trains proposals conditioned on execution prefixes to address issues with sequential importance sampling.
CVPR2010: Advanced ITinCVPR in a Nutshell: part 5: Shape, Matching and Diverg...zukun
The document discusses using divergence measures like the Jensen-Shannon divergence to align multiple point sets represented as probability density functions. It motivates using the JS divergence by modeling point sets as mixtures of density functions, and shows how the likelihood ratio between models leads to the JS divergence. It then formulates the problem of group-wise point set registration as minimizing the JS divergence between density functions, combined with a regularization term. Experimental results on aligning multiple 3D hippocampus point sets are also presented.
The document discusses achieving higher-order convergence for integration on RN using quasi-Monte Carlo (QMC) rules. It describes the problem that when using tensor product QMC rules on truncated domains, the convergence rate scales with the dimension s as (α log N)sN-α. The goal is to obtain a convergence rate independent of the dimension s. The document proposes using a multivariate decomposition method (MDM) to decompose an infinite-dimensional integral into a sum of finite-dimensional integrals, then applying QMC rules to each integral to achieve the desired higher-order convergence rate.
1. Representation theory studies how algebraic structures like groups, algebras, and Lie algebras can be represented by linear transformations on vector spaces. Quiver representations assign vector spaces to vertices and linear maps to arrows of a quiver.
2. Hall algebras were introduced to study representations of quivers. The document outlines representation theory, quivers, Hall algebras, and connections between quivers, Lie theory, and quantum groups.
3. Representation theory has applications in many areas of mathematics including algebra, analysis, algebraic geometry, and topology. Dynkin diagrams classify semisimple Lie algebras and Kac-Moody algebras. Quantum groups are quantized enveloping algebras generalizing the structure of universal enveloping algebras.
X2 T01 09 geometrical representation of complex numbersNigel Simmons
The document discusses representing complex numbers geometrically using vectors on the Argand diagram. It explains that complex numbers can be represented as vectors, with the advantage that vectors can be moved around without changing their modulus (length) or argument (angle from the x-axis). It describes how to perform addition and subtraction of complex numbers by placing the vectors head to tail or head to head, respectively, and discusses properties like the parallelogram law.
This document summarizes the concept of bidimensionality and how it can be used to design subexponential algorithms for graph problems on planar and other graph classes. It discusses how bidimensionality can be defined for parameters that are closed under minors or contractions by relating their behavior on grid graphs. It presents examples like vertex cover and dominating set that are bidimensional. It also discusses how bidimensionality can be extended to bounded genus graphs and H-minor free graphs using grid-minor/contraction theorems.
Relaxation methods for the matrix exponential on large networksDavid Gleich
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using the a fast algorithm for the matrix exponential on graph problems.
This document discusses Bayesian inference on mixtures models. It covers several key topics:
1. Density approximation and consistency results for mixtures as a way to approximate unknown distributions.
2. The "scarcity phenomenon" where the posterior probabilities of most component allocations in mixture models are zero, concentrating on just a few high probability allocations.
3. Challenges with Bayesian inference for mixtures, including identifiability issues, label switching, and complex combinatorial calculations required to integrate over all possible component allocations.
Need for Controllers having Integer Coefficients in Homomorphically Encrypted D...CDSL_at_SNU
The document discusses the problem of finite operation times for controllers with non-integer coefficients in homomorphically encrypted dynamic systems. It proposes converting non-integer coefficient controllers to integer coefficient controllers through optimization to avoid issues from repeated operations on ciphertexts. Simulation results show the encrypted controller with integer coefficients can operate for an infinite time horizon, unlike previous non-integer approaches.
Slides: Total Jensen divergences: Definition, Properties and k-Means++ Cluste...Frank Nielsen
The document defines total Jensen divergences, which are a generalization of total Bregman divergences. Total Jensen divergences incorporate a double-sided conformal factor that makes them invariant to rotations. They reduce to total Bregman divergences when distributions are close. The square root of the total Jensen-Shannon divergence is not a metric. Jensen centroids are not always robust. However, total Jensen k-means++ clustering does not require calculating centroids and provides approximation guarantees.
The document discusses the planted clique problem in graph theory. It introduces the problem and describes how previous research has found polynomial-time algorithms to solve the problem when the size of the planted clique k is O(√n). The document then summarizes two algorithms - Kucera's algorithm and the Low Degree Removal (LDR) algorithm - that have been used to approach the problem. It describes implementing the algorithms in a C++ program to simulate random graphs with planted cliques and test the ability of the algorithms to recover the planted clique.
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
This document provides an overview of Frank Nielsen's talk on pattern learning and recognition using information geometry and statistical manifolds. The talk focuses on departing from vector space representations and dealing with (dis)similarities that do not have Euclidean or metric properties. This poses new theoretical and computational challenges for pattern recognition. The talk describes using exponential family mixture models defined on dually flat statistical manifolds induced by convex functions. On these manifolds, dual coordinate systems and dual affine geodesics allow for computing-friendly representations of divergences and similarities between probabilistic patterns. The techniques aim to achieve statistical invariance and enable algorithmic approaches to problems like Gaussian mixture modeling, shape retrieval, and diffusion tensor imaging analysis.
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
Despite the title the methods are appropriate for more general dynamical models (including state-space models). Presentation given at Nordstat 2012, Umeå. Relevant research paper at http://arxiv.org/abs/1204.5459 and software code at https://sourceforge.net/projects/abc-sde/
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
This document contains information about data structures and algorithms taught at KTH Royal Institute of Technology. It includes code templates for a contest, descriptions and implementations of common data structures like an order statistic tree and hash map, as well as summaries of mathematical and algorithmic concepts like trigonometry, probability theory, and Markov chains.
3. Z. Dvir, S. Kopparty, S. Saraf '09. June 13-18, 2011, Multiplicities @ CSR.
4. Agenda. A technique for combinatorics, via algebra: the polynomial (interpolation) method plus the multiplicity method. Applications: list-decoding of Reed-Solomon codes; bounding the size of Kakeya sets; extractor constructions; (won't cover) locally decodable codes.
5. Part I: Decoding Reed-Solomon Codes. Reed-Solomon codes: commonly used codes to store information (on CDs, DVDs, etc.). Message: C_0, C_1, ..., C_d ∈ F (a finite field). Encoding: view the message as a polynomial M(x) = Σ_{i=0}^{d} C_i x^i; the encoding is the list of evaluations {M(α)}_{α ∈ F}. Decoding problem: given (x_1, y_1), ..., (x_n, y_n) ∈ F × F and integers t, d, find a degree-d polynomial through t of the n points.
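As a concrete illustration of the encoding map, here is a minimal sketch in Python; the field size p = 17, the sample message, and all names are my own choices, not from the talk:

```python
# Toy Reed-Solomon encoder over a small prime field F_p.
p = 17

def rs_encode(message, p):
    """Encode message (C_0, ..., C_d) as the evaluations of
    M(x) = sum_i C_i * x^i at every point x of F_p."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(message)) % p
            for x in range(p)]

# Degree-2 message polynomial M(x) = 3 + x + 4x^2:
codeword = rs_encode([3, 1, 4], p)
assert codeword[:3] == [3, 8, 4]   # M(0)=3, M(1)=8, M(2)=21 mod 17 = 4
```

Any d+1 = 3 agreeing positions determine M uniquely, which is exactly the shape of the decoding problem stated on this slide: recover a degree-d polynomial through t of the n given points.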
6. List-decoding? If the number of errors (n - t) is very large, then several polynomials may agree with t of the n points. List-decoding problem: report all such polynomials. Combinatorial obstacle: there may be too many such polynomials. Hope: this can't happen. To analyze: focus on polynomials P_1, ..., P_L and sets of agreements S_1, ..., S_L. Combinatorial question: can S_1, ..., S_L all be large while n = |∪_j S_j| is small?
7. List-decoding of Reed-Solomon codes. Given L polynomials P_1, ..., P_L of degree d, and sets S_1, ..., S_L ⊆ F × F s.t. |S_i| = t and S_i ⊆ {(x, P_i(x)) | x ∈ F}: how small can n = |S| be, where S = ∪_i S_i? The algebraic analysis from [Sudan '96, Guruswami-Sudan '98] is the basis of the decoding algorithms.
8. List-decoding analysis [Sudan '96]. Construct Q(x, y) ≠ 0 s.t. deg_y(Q) < L, deg_x(Q) < n/L, and Q(x, y) = 0 for every (x, y) ∈ S = ∪_i S_i. Can show: such a Q exists (interpolation/counting). Implies: t > n/L + dL ⇒ (y - P_i(x)) | Q. Conclude: n ≥ L · (t - dL). (Can also be proved combinatorially, using inclusion-exclusion.) Taking L ≈ t/(2d) yields n ≥ t^2/(4d).
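The interpolation/counting step can be carried out mechanically for small parameters. The following is my own toy sketch (the prime p = 13, the sample point set, and all names are mine, not the talk's): it finds a nonzero Q with the stated degree bounds vanishing on S by solving the homogeneous linear system over F_p, which must have a nonzero solution once the number of unknown coefficients D·L exceeds |S|.

```python
# Interpolation step: find Q(x, y) != 0 with deg_y(Q) < L and deg_x(Q) < D
# vanishing on a point set S in F_p x F_p, by linear algebra mod p.
p = 13
L, D = 2, 4                                   # D * L = 8 unknown coefficients
monomials = [(i, j) for i in range(D) for j in range(L)]
S = [(x, (x * x + 1) % p) for x in range(7)]  # 7 points on y = x^2 + 1 (mod 13)

def nonzero_kernel_vector(rows, ncols, p):
    """Gaussian elimination mod p; return a nonzero kernel vector
    (one exists because rank <= len(rows) < ncols)."""
    m = [row[:] for row in rows]
    pivot_cols, r = [], 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        inv = pow(m[r][c], p - 2, p)          # inverse via Fermat's little theorem
        m[r] = [v * inv % p for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivot_cols)
    v = [0] * ncols
    v[free] = 1                               # set one free variable to 1
    for row, c in zip(m, pivot_cols):
        v[c] = -row[free] % p                 # back-substitute pivot variables
    return v

rows = [[pow(x, i, p) * pow(y, j, p) % p for (i, j) in monomials] for (x, y) in S]
coeffs = nonzero_kernel_vector(rows, D * L, p)

def Q(x, y):
    return sum(c * pow(x, i, p) * pow(y, j, p)
               for c, (i, j) in zip(coeffs, monomials)) % p

assert any(coeffs) and all(Q(x, y) == 0 for (x, y) in S)
```

With 8 unknowns and only 7 vanishing constraints, a nonzero Q is guaranteed; here the system even admits Q(x, y) = y - x^2 - 1, which fits the degree bounds.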
9. Focus: The Polynomial Method. To analyze the size of an "algebraically nice" set S: Find a polynomial Q vanishing on S. (Can prove the existence of Q by counting coefficients; the degree of Q grows with |S|.) Use the "algebraic niceness" of S to prove Q vanishes at other places as well (in our case, whenever y = P_i(x)). Conclude that Q is zero too often (unless S is large). (Abstraction based on [Dvir]'s work.)
10. Improved list-decoding analysis [Guruswami-Sudan '98]. Can we improve on the inclusion-exclusion bound? One that works when n > t^2/(4d)? Idea: try fitting a polynomial Q that passes through each point with "multiplicity" 2. Can find such Q with deg_y < L, deg_x < 3n/L. If 2t > 3n/L + dL then (y - P_i(x)) | Q. Yields n ≥ (L/3)·(2t - dL). Taking L ≈ t/d gives n ≥ t^2/(3d). Optimizing Q, letting the multiplicity → ∞, gives n ≥ t^2/d.
11. Aside: is the factor of 2 important? It results in some improvement in [GS] (allowed us to improve list-decoding for codes of high rate). But it is crucial to subsequent work: the [Guruswami-Rudra] construction of rate-optimal codes couldn't afford to lose this factor of 2 (or any constant > 1).
12. Focus: The Multiplicity Method. To analyze the size of an "algebraically nice" set S: Find a polynomial Q that is zero on S with high multiplicity. (Can prove the existence of Q by counting coefficients; the degree of Q grows with |S|.) Use the "algebraic niceness" of S to prove Q vanishes at other places as well (in our case, whenever y = P_i(x)). Conclude that Q is zero too often (unless S is large).
13. Multiplicity = ?
Over the reals: Q(x,y) has a root of multiplicity m+1 at (a,b) if every partial derivative of order up to m vanishes at (a,b).
Over finite fields? Derivatives don't work, but "Hasse derivatives" do. What are these? Later…
There are C(m+n, n) such derivatives (of order up to m) for n-variate polynomials; each is a linear function of the coefficients of Q.
14. Part II: Kakeya Sets
15. Kakeya Sets
K ⊆ F^n is a Kakeya set if it has a line in every direction, i.e., ∀ y ∈ F^n ∃ x ∈ F^n s.t. {x + t·y | t ∈ F} ⊆ K.
F is a field (could be the reals, the rationals, or finite).
Our interest: F = F_q (the finite field of cardinality q), and lower bounds.
Simple/obvious: q^(n/2) ≤ |K| ≤ q^n. Do better? Mostly open till [Dvir 2008].
16. Kakeya set analysis [Dvir '08]
Find Q(x1,…,xn) ≠ 0 s.t. the total degree of Q is < q (say deg = d) and Q(x) = 0 for every x ∈ K. (Exists if |K| < q^n/n!.)
Prove that the (homogeneous degree-d part of) Q vanishes at y if there is a line in direction y contained in K:
Line L ⊆ K ⇒ Q|_L = 0. The highest-degree coefficient of Q|_L is the homogeneous part of Q evaluated at y.
Conclude: the homogeneous part of Q = 0. Contradiction.
Yields |K| ≥ q^n/n!.
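The existence step is pure counting: a nonzero Q exists as soon as the number of coefficients exceeds the number of vanishing constraints. A minimal sketch of that arithmetic (n and q are illustrative values of my choosing):

```python
from math import comb, factorial

# A nonzero n-variate Q of total degree < q vanishing on K exists as soon
# as |K| < C(n+q-1, n): that is the number of coefficients, versus one
# linear constraint per point of K.  Since C(n+q-1, n) >= q^n/n!, this is
# where the q^n/n! bound comes from.
def num_coefficients(n, q):
    return comb(n + q - 1, n)  # monomials of total degree <= q - 1

n, q = 4, 101
assert num_coefficients(n, q) * factorial(n) >= q ** n
```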
17. Multiplicities in Kakeya [Saraf, S. '08]
Fit a Q that vanishes often? Good choice: multiplicity m = n.
Can find Q ≠ 0 of individual degree < q that vanishes at each point of K with multiplicity n, provided |K|·4^n < q^n.
Q|_L is of degree < qn, but it vanishes with multiplicity n at q points! So it is identically zero ⇒ its highest-degree coefficient is zero. Contradiction.
Conclude: |K| ≥ (q/4)^n.
18. Comparing the bounds
Simple: |K| ≥ q^(n/2). [Dvir]: |K| ≥ q^n/n!. [SS]: |K| ≥ q^n/4^n.
[SS] improves on Simple even when q is a (large) constant and n → ∞ (in particular, it allows q < n).
[Mockenhaupt-Tao, Dvir]: ∃ K s.t. |K| ≤ q^n/2^(n−1) + O(q^(n−1)).
Can we do even better?
19. Part III: Randomness Mergers & Extractors
20. Context
One of the motivations for Dvir's work: build better "randomness extractors", an approach proposed in [Dvir-Shpilka].
Following [Dvir], a new "randomness merger" and its analysis were given by [Dvir-Wigderson].
This led to extractors matching known constructions, but not improving them …
What are extractors? Mergers? Can we improve them?
21. Randomness in Computation
[diagram: an algorithm Alg needs uniform randomness X, but only a dirty distribution is readily available; a "support industry" of randomness processors converts one into the other: PRGs, (seeded) extractors, limited-independence generators, epsilon-biased generators, condensers, mergers]
22. Randomness Extractors and Mergers
Extractors: dirty randomness (biased, correlated) + small pure seed → pure randomness (uniform, independent … nearly).
Mergers: a general primitive useful in the context of manipulating randomness.
k random variables (one of them uniform / high entropy; don't know which, others potentially correlated) + small pure seed → 1 random variable.
23. Merger Analysis Problem
Merger(X1,…,Xk; s) = f(s), where X1,…,Xk ∈ F_q^n, s ∈ F_q, and f is the degree-(k−1) function mapping F → F^n s.t. f(i) = Xi. (f is the curve through X1,…,Xk.)
Question: for what choices of q, n, k is the Merger's output close to uniform?
Arises from [Dvir-Shpilka '05, Dvir-Wigderson '08]. A "statistical high-degree version" of the Kakeya problem.
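A minimal sketch of the merger itself, assuming a prime q (so field arithmetic is just mod q) and q > k; the curve is coordinatewise Lagrange interpolation through the nodes 1,…,k, matching the slide's f(i) = Xi. Needs Python 3.8+ for the modular inverse via pow.

```python
# Illustrative merger sketch: f is the coordinatewise degree-(k-1) curve
# with f(i) = X_i; the output is f evaluated at the seed s.
def merger(points, s, q):
    """points[i-1] is X_i in F_q^n (q prime, q > k); returns f(s) in F_q^n."""
    k, n = len(points), len(points[0])
    out = [0] * n
    for i, X in enumerate(points, start=1):
        # Lagrange basis l_i(s) = prod_{j != i} (s - j)/(i - j) mod q
        num, den = 1, 1
        for j in range(1, k + 1):
            if j != i:
                num = num * (s - j) % q
                den = den * (i - j) % q
        coeff = num * pow(den, -1, q) % q
        for c in range(n):
            out[c] = (out[c] + coeff * X[c]) % q
    return out

# At a node the curve returns that point exactly: f(2) = X_2.
assert merger([[3, 5], [1, 4], [2, 2]], 2, 7) == [1, 4]
```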
24. Concerns from Merger Analysis
The [DW] analysis worked only if q > n, so seed length = log₂ q > log₂ n.
Not good enough for the setting where k = O(1) and n → ∞. (Would like seed length O(log k).)
The multiplicity technique seems bottlenecked at mult = n.
25. General obstacle in the multiplicity method
Can't force the polynomial Q to vanish with too high a multiplicity; it gives no benefit.
E.g., in the Kakeya problem: why stop at mult = n? The most we can hope from Q is that it vanishes on all of F_q^n. Once this happens, Q = 0 if its degree is < q in each variable. So Q|_L is of degree at most qn, and mult n suffices.
Using larger multiplicity can't help! Or can it?
26. Extended method of multiplicities (in the Kakeya context)
Perhaps vanishing of Q with high multiplicity at each point shows that higher-degree polynomials (degree > q in each variable) are identically zero? (Needed: a condition on the multiplicity of zeroes of multivariate polynomials.)
Perhaps Q can be shown to vanish with high multiplicity at each point of F^n? (Technical question: how?)
27. Vanishing of high-degree polynomials
mult(Q,a) = multiplicity of the zero of Q at a. I(Q,a) = 1 if mult(Q,a) > 0 and 0 otherwise; i.e., I(Q,a) = min{1, mult(Q,a)}.
Schwartz-Zippel: for any S ⊆ F, Σ_{a ∈ S^n} I(Q,a) ≤ d·|S|^(n−1).
Can we replace I with mult above? That would strengthen S-Z, and be useful in our case.
[DKSS '09]: Yes … (simple inductive proof … that I can never remember).
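The strengthened inequality can be brute-force checked on a small instance. A sketch with my own choices throughout: bivariate polynomials over F_5, multiplicity computed as the least total degree of a nonzero term of the shifted polynomial Q(x + a) (equivalent to the Hasse-derivative definition):

```python
from itertools import product
from math import comb

p = 5  # F_5, n = 2 variables; small enough to enumerate everything

def shift(Q, a):
    """Coefficients of Q(x + a), for Q a dict {(e1, e2): coeff} over F_p."""
    R = {}
    for (e1, e2), c in Q.items():
        for i in range(e1 + 1):
            for j in range(e2 + 1):
                t = (c * comb(e1, i) * comb(e2, j)
                     * pow(a[0], e1 - i, p) * pow(a[1], e2 - j, p))
                R[(i, j)] = (R.get((i, j), 0) + t) % p
    return R

def mult(Q, a):
    """mult(Q, a) = least total degree of a nonzero term of Q(x + a)."""
    nz = [e1 + e2 for (e1, e2), c in shift(Q, a).items() if c]
    return min(nz) if nz else float("inf")

# Q = (x - 1)^2 * y has total degree d = 3.  Summing multiplicities over
# all of F_5^2 must give at most d * p^(n-1) = 15 (here, exactly 15).
Q = {(2, 1): 1, (1, 1): 3, (0, 1): 1}
total = sum(mult(Q, a) for a in product(range(p), repeat=2))
assert total <= 3 * p
```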
28. Multiplicities?
Q(X1,…,Xn) has a zero of multiplicity m at a = (a1,…,an) if all (Hasse) derivatives of order < m vanish at a.
Hasse derivative = ? Formally defined in terms of the coefficients of Q, various multinomial coefficients, and a. But really …
The i = (i1,…,in)-th derivative is the coefficient of z1^i1 … zn^in in Q(z + a).
Even better … the coefficient of z^i in Q(z + x). (This defines the i-th derivative Q_i as a function of x; can evaluate at x = a.)
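A univariate sketch of that definition (p = 5 is my illustrative choice): expanding Q(z + x) shows that for a monomial c·x^e the i-th Hasse derivative is c·C(e,i)·x^(e−i), a binomial coefficient in place of the falling factorial of the classical derivative, which is why it survives reduction mod p.

```python
from math import comb

p = 5  # illustrative prime

def hasse(Q, i):
    """i-th Hasse derivative of Q = [c0, c1, ...] (coeff of x^e at index e)."""
    return [c * comb(e, i) % p for e, c in enumerate(Q) if e >= i]

# Q = x^5 over F_5 has a zero of multiplicity exactly 5 at 0.  The
# classical derivative 5x^4 vanishes identically mod 5, so iterated
# classical derivatives cannot detect this; Hasse derivatives can.
Q = [0, 0, 0, 0, 0, 1]
orders = [hasse(Q, i)[0] for i in range(1, 6)]  # value at x = 0 is the constant term
assert orders == [0, 0, 0, 0, 1]  # first nonvanishing Hasse derivative: order 5
```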
29. Key Properties
Each derivative is a linear function of the coefficients of Q: (Q+R)_i = Q_i + R_i. [Used in [GS '98], [SS '09].]
If Q has a zero of mult m at a, and S is a curve that passes through a, then Q|_S has a zero of mult m at a. [Used for lines in prior work.]
Q_i is a polynomial of degree deg(Q) − |i|, where |i| = Σ_j i_j. (Not used in prior works.)
(Q_i)_j ≠ Q_{i+j}, but Q_{i+j}(a) = 0 ⇒ (Q_i)_j(a) = 0.
Q vanishes with mult m at a ⇒ Q_i vanishes with mult m − |i| at a.
30. Propagating multiplicities (in Kakeya)
Find Q that vanishes with mult m on K.
For every i of order ≤ m/2, Q_i vanishes with mult m/2 on K.
Conclude: Q, as well as all derivatives of Q of order ≤ m/2, vanish on F^n ⇒ Q vanishes with multiplicity m/2 on F^n.
Next question: when is a polynomial (of degree > qn, or even much larger) that vanishes with high multiplicity on F_q^n identically zero?
31. Back to Kakeya
Find Q of degree d vanishing on K with mult m. (Can do if (m/n)^n·|K| < (d/n)^n, i.e., d^n > m^n·|K|.)
Conclude Q vanishes on F^n with mult m/2.
Apply extended Schwartz-Zippel to conclude (m/2)·q^n ≤ d·q^(n−1), i.e., (m/2)·q ≤ d, i.e., (m/2)^n·q^n ≤ d^n ≈ m^n·|K|.
Conclude: |K| ≥ (q/2)^n. Tight to within a 2+o(1) factor!
32. Consequences for Mergers
Can analyze the [DW] merger when q > k, with q very small and n growing; the analysis is similar, with more calculations.
Yields: seed length log q (independent of n).
Combining it with every other ingredient in the extractor construction: extract all but vanishing entropy (k − o(k) bits of randomness from (n,k)-sources) using an O(log n) seed (for the first time).
33. Conclusions
Combinatorics does have many "techniques" … the polynomial method + multiplicity method adds to the body.
Supporting evidence: list decoding, Kakeya sets, extractors/mergers, others …
34. Other applications
[Woodruff-Yekhanin '05]: an elegant construction of novel LDCs (locally decodable codes). [Outclassed by more recent Yekhanin/Efremenko constructions.]
[Kopparty-Lev-Saraf-S. '09]: higher-dimensional Kakeya problems.
[Kopparty-Saraf-Yekhanin '11]: locally decodable codes with rate → 1.
35. Conclusions
A new (?) technique in combinatorics … the polynomial method + multiplicity method.
Supporting evidence: list decoding, Kakeya sets, extractors/mergers, locally decodable codes … More?
37. Multiplicity = ?
Q vanishes with multiplicity 2 at (a,b): Q(a,b) = 0; ∂Q/∂x(a,b) = 0; ∂Q/∂y(a,b) = 0.
∂Q/∂x? A Hasse derivative; a linear function of the coefficients of Q.
If Q is zero with multiplicity m at (a,b) and C = (x(t), y(t)) is a curve passing through (a,b), then Q|_C has a zero of multiplicity m at (a,b).
38. Concerns from Merger Analysis
Recall Merger(X1,…,Xk; s) = f(s), where X1,…,Xk ∈ F_q^n, s ∈ F_q, and f is the degree-(k−1) curve s.t. f(i) = Xi.
[DW08]: Say X1 is random. Let K be a small set such that an ε-fraction of the choices of X1,…,Xk lead to "bad" curves, i.e., curves on which, for an ε-fraction of the s's, the Merger outputs a value in K with high probability.
Build a low-degree poly Q vanishing on K; prove that for "bad" curves Q vanishes on the whole curve, and so Q vanishes on an ε-fraction of the X1's (and so on an ε-fraction of the domain). Apply Schwartz-Zippel. Contradiction.
39. What is common?
Given a set in F_q^n with nice algebraic properties, we want to understand its size.
Kakeya problem: the Kakeya set.
Merger problem: any set T ⊆ F^n that contains an ε-fraction of the points on an ε-fraction of the merger curves. (If such a T is small, the output is non-uniform; else the output is uniform.)
List-decoding problem: the union of the sets Si.
40. Randomness Extractors and Mergers
Extractors: dirty randomness (biased, correlated) + small pure seed → pure randomness (uniform, independent … nearly).
Mergers: a general primitive useful in the context of manipulating randomness.
Given: k (dependent) random variables X1,…,Xk, such that one is uniform. Add: a small seed s (additional randomness). Output: a uniform random variable Y.