The document discusses probabilistic segmentation using mixture models and the expectation-maximization (EM) algorithm. It addresses image segmentation and line fitting applications.
For image segmentation, the missing data is an (n × g) matrix of indicator variables showing which pixel belongs to which segment. The E-step computes the probability that each pixel belongs to each segment. The M-step re-estimates the mixture model parameters to maximize the expected complete data log-likelihood.
For line fitting, the missing data is similarly an (n × g) matrix showing which point belongs to which line. The E-step computes the probability that each point was drawn from each line. The M-step then re-estimates the line parameters.
1. Probabilistic Segmentation
Computer Science and Engineering,
Indian Institute of Technology Kharagpur
2. Mixture Model Image Segmentation
Probability of generating a pixel measurement vector:
p(x) = \sum_l p(x \mid \theta_l)\, \pi_l
The mixture model has the form:
p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)
Component densities:
p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu_l)^\top \Sigma_l^{-1} (x - \mu_l) \right)
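To make the model concrete, here is a minimal NumPy/SciPy sketch (not part of the original slides) that evaluates the mixture density above for a set of pixel feature vectors; the function name `mixture_density` and the array layout are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(X, alphas, mus, Sigmas):
    """Evaluate p(x | Theta) = sum_l alpha_l * p_l(x | theta_l) for each row of X.

    X      : (n, d) array of pixel feature vectors
    alphas : (g,) mixing weights summing to 1
    mus    : (g, d) component means
    Sigmas : (g, d, d) component covariances
    """
    n, g = X.shape[0], len(alphas)
    p = np.zeros(n)
    for l in range(g):
        p += alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=Sigmas[l])
    return p
```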
6. Mixture Model Line Fitting
p(W) = \sum_l \pi_l \, p(W \mid a_l)
Likelihood for a set of observations:
\prod_{j \in \text{observations}} \sum_{l=1}^{g} \pi_l \, p_l(W_j \mid a_l)
8. Missing data problems
L_c(x \,; u) = \log \prod_j p_c(x_j \,; u) = \sum_j \log p_c(x_j \,; u)
The incomplete data space:
p_i(y \,; u) = \int_{\{x \,\mid\, f(x) = y\}} p_c(x \,; u)\, d\eta
where \eta measures volume on the space of x such that f(x) = y
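For the mixture models used throughout these slides, the missing part of the complete data is a discrete component label, so the integral above reduces to a sum over components; a worked special case (my own restatement, consistent with the later slides):

```latex
% Incomplete-data density when the missing data is the component indicator
p_i(y \,; u) = \sum_{l=1}^{g} \pi_l \, p(y \mid a_l)
```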
15. Missing data problems
The incomplete data likelihood:
\prod_{j \in \text{observations}} p_i(y_j \,; u)
L_i(y \,; u) = \log \prod_j p_i(y_j \,; u) = \sum_j \log p_i(y_j \,; u) = \sum_j \log \int_{\{x \,\mid\, f(x) = y_j\}} p_c(x \,; u)\, d\eta
18. EM for mixture models
The complete data is a composition of the incomplete data and the missing data:
x_j = (y_j, z_j)
Mixture model:
p(y) = \sum_l \pi_l \, p(y \mid a_l)
Complete data log-likelihood:
\sum_{j \in \text{observations}} \sum_{l=1}^{g} z_{lj} \log p(y_j \mid a_l)
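Since z_{lj} is a 0/1 indicator, its conditional expectation given y_j is just a posterior probability; a one-line restatement (my own, but consistent with the image segmentation slides that follow):

```latex
\bar{z}_{lj} = E\left[ z_{lj} \mid y_j, u^{(s)} \right]
            = \frac{\pi_l \, p(y_j \mid a_l^{(s)})}{\sum_{k=1}^{g} \pi_k \, p(y_j \mid a_k^{(s)})}
```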
21. EM
E-step: Compute the expected value \bar{z}_j^{(s)} of z_j for each j. This results in \bar{x}^s = [y, \bar{z}^s].
M-step: Maximize the complete data log-likelihood with respect to u:
u^{s+1} = \arg\max_u L_c(\bar{x}^s \,; u) = \arg\max_u L_c([y, \bar{z}^s] \,; u)
24. EM in General Case
Expected value of the complete data log-likelihood:
Q(u \,; u^{(s)}) = \int L_c(x \,; u)\, p(x \mid u^{(s)}, y)\, dx
We maximize with respect to u to get:
u^{s+1} = \arg\max_u Q(u \,; u^{(s)})
25. Image Segmentation
What is missing data? An (n × g) matrix I of indicator variables.
Expectation step:
E(I_{lm}) = \bar{I}_{lm} = 1 \cdot P(l\text{th pixel comes from } m\text{th blob}) + 0 \cdot P(l\text{th pixel does not come from } m\text{th blob}) = P(l\text{th pixel comes from } m\text{th blob})
We get:
\bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}
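A hedged sketch of this expectation step in NumPy; the name `e_step`, the array shapes, and the use of `scipy.stats.multivariate_normal` are choices made for the example rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, Sigmas):
    """Compute responsibilities resp[l, m] = P(pixel l comes from segment m)."""
    n, g = X.shape[0], len(alphas)
    weighted = np.zeros((n, g))
    for m in range(g):
        # alpha_m^(s) * p_m(x_l | theta_m^(s)) for every pixel l
        weighted[:, m] = alphas[m] * multivariate_normal.pdf(X, mean=mus[m], cov=Sigmas[m])
    # divide by the sum over all segments (the denominator on the slide)
    return weighted / weighted.sum(axis=1, keepdims=True)
```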
29. Image Segmentation
Complete data log-likelihood:
L_c([x, \bar{I}_{lm}] \,; \Theta^{(s)}) = \sum_{l \in \text{all pixels}} \sum_{m=1}^{g} \bar{I}_{lm} \log p(x_l \mid \theta_m)
Maximization step:
\Theta^{(s+1)} = \arg\max_{\Theta} L_c([x, \bar{I}_{lm}] \,; \Theta)
31. Image Segmentation
Maximization step:
\alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})
\mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l \, p(m \mid x_l, \Theta^{(s)})}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}
\Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)}) \, (x_l - \mu_m^{(s)}) (x_l - \mu_m^{(s)})^\top}{\sum_{l=1}^{n} p(m \mid x_l, \Theta^{(s)})}
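The corresponding maximization step might be sketched as follows; note that this version plugs the freshly updated mean into the covariance update (a common variant), whereas the slide keeps the previous iterate \mu_m^{(s)}.

```python
import numpy as np

def m_step(X, resp):
    """Re-estimate (alphas, mus, Sigmas) from responsibilities resp[l, m] = p(m | x_l, Theta^(s))."""
    n, d = X.shape
    g = resp.shape[1]
    Nm = resp.sum(axis=0)                      # effective number of pixels per segment
    alphas = Nm / n
    mus = (resp.T @ X) / Nm[:, None]
    Sigmas = np.zeros((g, d, d))
    for m in range(g):
        diff = X - mus[m]                      # uses the freshly updated mean (a common variant)
        Sigmas[m] = (resp[:, m, None] * diff).T @ diff / Nm[m]
    return alphas, mus, Sigmas
```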
32. How EM works for Image Segmentation
E-step:
\bar{I}_{lm} = \frac{\alpha_m^{(s)} \, p_m(x_l \mid \theta_m^{(s)})}{\sum_{k=1}^{K} \alpha_k^{(s)} \, p_k(x_l \mid \theta_k^{(s)})}
For each pixel, compute the values \alpha_m^{(s)} p_m(x_l \mid \theta_m^{(s)}) for each segment m.
For each pixel, compute the sum \sum_{k=1}^{K} \alpha_k^{(s)} p_k(x_l \mid \theta_k^{(s)}), i.e. perform the summation over all K segments.
Divide the former by the latter.
M-step: Compute \alpha_m^{(s+1)}, \mu_m^{(s+1)}, \Sigma_m^{(s+1)}.
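Putting the pieces together, a possible end-to-end loop reusing `mixture_density`, `e_step`, and `m_step` from the earlier sketches; the random initialization, tolerance, and iteration cap are assumptions for the example, not taken from the slides.

```python
import numpy as np

def em_segmentation(X, g, n_iter=100, tol=1e-6, seed=0):
    """Fit a g-component Gaussian mixture to pixel features X with EM (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alphas = np.full(g, 1.0 / g)
    mus = X[rng.choice(n, size=g, replace=False)]        # random pixels as initial means
    Sigmas = np.tile(np.cov(X.T) + 1e-6 * np.eye(d), (g, 1, 1))
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp = e_step(X, alphas, mus, Sigmas)            # E-step: responsibilities
        alphas, mus, Sigmas = m_step(X, resp)            # M-step: parameter updates
        ll = np.sum(np.log(mixture_density(X, alphas, mus, Sigmas)))
        if ll - prev_ll < tol:                           # stop when the likelihood stalls
            break
        prev_ll = ll
    labels = resp.argmax(axis=1)                         # hard segment assignment
    return alphas, mus, Sigmas, labels
```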
33. Line Fitting Expectation Maximization
What is missing data? An (n × g) matrix M of indicator variables:
m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases}
\sum_l P(m_{kl} = 1 \mid \text{point } k, \text{ line } l\text{'s parameters}) = 1
How to formulate the likelihood?
\exp\left( -\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2} \right)
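A sketch of the corresponding E- and M-steps for line fitting, assuming lines parameterized as ax + by + c = 0 with a^2 + b^2 = 1 and a fixed sigma; the weighted total-least-squares refit is one reasonable choice for the M-step, not the only one.

```python
import numpy as np

def line_distances(P, lines):
    """Perpendicular distances of points P (n, 2) to lines (g, 3) given as (a, b, c) with a^2+b^2=1."""
    return np.abs(P @ lines[:, :2].T + lines[:, 2])        # (n, g)

def e_step_lines(P, lines, pis, sigma):
    """Responsibilities P(point k drawn from line l), using exp(-d^2 / 2 sigma^2)."""
    w = pis * np.exp(-line_distances(P, lines) ** 2 / (2 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def m_step_lines(P, resp):
    """Weighted total-least-squares refit of each line from the soft assignments."""
    g = resp.shape[1]
    lines = np.zeros((g, 3))
    for l in range(g):
        w = resp[:, l]
        mean = (w[:, None] * P).sum(axis=0) / w.sum()
        C = (w[:, None] * (P - mean)).T @ (P - mean)       # weighted scatter matrix
        normal = np.linalg.eigh(C)[1][:, 0]                # eigenvector of the smallest eigenvalue
        lines[l] = [normal[0], normal[1], -normal @ mean]
    return lines, resp.mean(axis=0)                        # new line parameters and mixing weights
```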
37. Motion Segmentation EM
What is missing data? It is the motion field to which each pixel belongs. The indicator variable V_{xy,l} is the (xy, l)th entry of V:
V_{xy,l} = \begin{cases} 1 & \text{if the } xy\text{th pixel belongs to the } l\text{th motion field} \\ 0 & \text{otherwise} \end{cases}
How to formulate the likelihood?
L(V, \Theta) = -\sum_{xy,l} V_{xy,l} \, \frac{\left( I_1(x, y) - I_2\big(x + m_1(x, y \,; \theta_l),\; y + m_2(x, y \,; \theta_l)\big) \right)^2}{2\sigma^2}
where \Theta = \{\theta_1, \theta_2, \ldots, \theta_g\}
39. Motion Segmentation EM
The quantity needed in the E-step is P(V_{xy,l} = 1 \,; I_1, I_2, \Theta).
A common choice is the affine motion model:
\begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y \,; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}
where \theta_l = (a_{11}, a_{12}, \ldots, a_{23}). This corresponds to a layered representation.
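For illustration, evaluating this affine motion model over a pixel grid could look like the following; the parameter ordering \theta_l = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}) and the example values are assumptions for the sketch.

```python
import numpy as np

def affine_motion(theta, xs, ys):
    """Evaluate (m1, m2) = A [x, y]^T + t for an affine motion field theta = (a11, a12, a13, a21, a22, a23)."""
    a11, a12, a13, a21, a22, a23 = theta
    m1 = a11 * xs + a12 * ys + a13
    m2 = a21 * xs + a22 * ys + a23
    return m1, m2

# Example: motion vectors on a 4x4 pixel grid for one layer's parameters
ys, xs = np.mgrid[0:4, 0:4]
m1, m2 = affine_motion((0.01, 0.0, 0.5, 0.0, 0.01, -0.2), xs, ys)
```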
40. Identifying Outliers EM
We construct an explicit model of the outliers:
(1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers})
Here \lambda \in [0, 1] models the frequency with which the outliers occur, and P(\text{outliers}) is the probability model for the outliers.
What is missing data? A variable that indicates which component generated each point.
Complete data likelihood:
\prod_j \left[ (1 - \lambda)\, P(\text{measurement}_j \mid \text{model}) + \lambda\, P(\text{measurement}_j \mid \text{outliers}) \right]
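A minimal sketch of the E-step for this inlier/outlier mixture, assuming a one-dimensional Gaussian inlier model and a uniform outlier density over a known data range (both modelling choices are assumptions for the example):

```python
import numpy as np
from scipy.stats import norm

def outlier_responsibilities(y, lam, mu, sigma, y_range):
    """P(point j is an outlier) under (1 - lam) * N(mu, sigma^2) + lam * Uniform over y_range."""
    p_inlier = (1 - lam) * norm.pdf(y, loc=mu, scale=sigma)
    p_outlier = lam * (1.0 / y_range)            # uniform outlier density
    return p_outlier / (p_inlier + p_outlier)
```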
42. Background Subtraction EM
For each pixel we get a series of observations over the successive frames.
The source of these observations is a mixture model with two components: the background and the noise (foreground).
The background can be modeled as a Gaussian.
The noise can come from some uniform source.
Any pixel which belongs to the noise component is not background.
43. Difficulties with Expectation Maximization
Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.
44. Model Selection
Should we consider minimizing the negative of the log-likelihood?
We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
-2 L(x \,; \Theta^*) + 2p
where p is the number of free parameters.
Bayesian Information Criterion (BIC):
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.
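A small helper implementing the two criteria above for a fitted g-component Gaussian mixture; the count of free parameters for full covariances (mixing weights, means, symmetric covariances) is my own bookkeeping and should be treated as an illustrative assumption.

```python
import numpy as np

def gmm_free_parameters(g, d):
    """Free parameters of a g-component Gaussian mixture in d dimensions with full covariances."""
    return (g - 1) + g * d + g * d * (d + 1) // 2   # weights + means + covariances

def aic_bic(log_lik, g, d, n):
    """Return (AIC, BIC) using the conventions on the slide."""
    p = gmm_free_parameters(g, d)
    aic = -2.0 * log_lik + 2.0 * p
    bic = -log_lik + 0.5 * p * np.log(n)
    return aic, bic
```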
48. Bayesian Information Criterion (BIC)
P(M \mid D) = P(M)\, \frac{P(D \mid M)}{P(D)} = P(M)\, \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}
Maximizing the posterior P(M \mid D) yields:
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.
49. Minimum Description Length (MDL) criterion
It yields a selection criterion which is the same as BIC:
-L(D \,; \theta^*) + \frac{p}{2} \log N
where p is the number of free parameters.