Through a single linear Gaussian generative model, we can unify a family of Gaussian linear models, including factor analysis, PCA, Kalman filters, mixtures of Gaussians, and hidden Markov models.
This document discusses probabilistic planning domains and solutions. It defines a probabilistic planning domain as one where actions have multiple possible outcomes, each with a probability. A solution is safe if it reaches the goal with probability 1, and unsafe if it reaches the goal with probability strictly between 0 and 1. Safe policies can be either acyclic or cyclic, while unsafe policies can get stuck, reaching the goal with some probability less than 1.
We study a purely functional quantum extension of the lambda calculus, that is, an extension of the lambda calculus that expresses some quantum features while abstracting away the quantum memory. This calculus is a typed extension of the first-order linear-algebraic lambda calculus. The type system is linear on superpositions, forbidding their cloning, while allowing basis vectors to be cloned. We give the Deutsch algorithm and quantum teleportation as examples, and prove subject reduction for the calculus. In addition, we provide a denotational semantics where superposed types are interpreted as vector spaces and non-superposed types as their bases.
The document discusses temporal planning and modeling of actions over time. It introduces the idea of representing planning problems with a time-oriented view based on timelines rather than a state-oriented view. A timeline consists of temporal assertions about state variables over time intervals, together with constraints. Actions are modeled as triples containing a name, a set of temporal assertions describing the effects over time, and constraints. This representation accommodates overlapping actions and supports reasoning about how state-variable values change over time.
This document discusses probabilistic planning domains and solutions. It introduces the concept of actions having probabilistic outcomes in a probabilistic planning domain. A solution to a stochastic shortest path problem is safe if it reaches the goal with probability 1, and unsafe if it does so with probability strictly between 0 and 1. Safe policies can be either acyclic or cyclic, while unsafe policies can get stuck in implicit or explicit dead ends, failing to reach the goal with some non-zero probability. Examples of different types of policies illustrate safe versus unsafe solutions.
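To make the safe versus unsafe distinction concrete, here is a minimal numpy sketch (our own illustration; the four-state chain and its probabilities are invented, not from the documents) that computes each state's probability of eventually reaching the goal under a fixed policy:

```python
# Sketch: deciding whether a fixed policy is "safe" by solving for each
# state's probability of eventually reaching the goal in the induced
# Markov chain. States: 0 (start), 1, 2 (goal), 3 (dead end).
import numpy as np

P = np.array([              # P[s, s'] under the fixed policy
    [0.0, 0.8, 0.0, 0.2],   # the start state risks a dead end
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],   # goal is absorbing
    [0.0, 0.0, 0.0, 1.0],   # dead end is absorbing
])
goal, transient = 2, [0, 1]

# Solve (I - Q) h = b, where Q restricts P to the transient states and
# b holds the one-step probabilities of jumping to the goal.
Q = P[np.ix_(transient, transient)]
b = P[transient, goal]
h = np.linalg.solve(np.eye(len(transient)) - Q, b)
print(h)  # h < 1 everywhere, so this policy is unsafe
```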
This document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces an approximation of the posterior distribution by simulating data under different parameter values and accepting simulations that match the observed data. The document provides background on how ABC originated from population genetics models and outlines some of the advances in ABC, including how it can be used as an inference machine to estimate parameters from simulated data.
This document discusses approximate Bayesian computation (ABC). ABC allows Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. It introduces ABC, describes how it originated from population genetics models, and outlines some of its limitations and advances, including related computational methods such as ABC with empirical likelihoods. The document also examines how ABC relates to other simulation-based statistical methods and considers perspectives on how Bayesian ABC really is.
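As a concrete illustration of the accept/reject mechanism both summaries describe, here is a minimal rejection-ABC sketch (the toy model, summary statistic, and tolerance are our own illustrative choices, not from either document):

```python
# Minimal rejection ABC: simulate from the prior, keep parameters whose
# simulated summary statistic lands within eps of the observed one.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=50)   # stand-in for the real data
s_obs = observed.mean()                    # summary statistic: sample mean

def abc_rejection(n_sims=100_000, eps=0.05):
    theta = rng.normal(0.0, np.sqrt(10.0), size=n_sims)  # prior draws
    sims = rng.normal(theta[:, None], 1.0, size=(n_sims, 50))
    s_sim = sims.mean(axis=1)
    return theta[np.abs(s_sim - s_obs) < eps]  # approximate posterior draws

post = abc_rejection()
print(post.mean(), post.std())  # close to the analytic Gaussian posterior
```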
This document discusses representing planning domains for automated planning and acting. It describes using a state-transition model with states, actions, and a prediction function to model deterministic environments. Planning domains are represented by describing states in terms of objects and properties, and actions in terms of preconditions and effects that change property values. A state-variable representation is introduced where varying properties are represented by state variables that can take on different values in different states. An example domain models robots, containers, and locations using state variables.
This document discusses regular expressions and finite automata. It begins by defining regular expressions, which are sequences of characters that define search patterns. It then discusses how regular expressions are used to define formal languages and how finite automata can be constructed to recognize these languages. Specific topics covered include the definition of regular expressions, building regular expressions, constructing finite automata from regular expressions, applying Arden's theorem to find regular expressions from finite automata, and proving languages are non-regular using the pumping lemma. Examples are provided to demonstrate how to construct finite automata from regular expressions and apply these concepts.
RuleML2015: Learning Characteristic Rules in Geographic Information Systems (RuleML)
We provide a general framework for learning characterization rules for a set of objects in Geographic Information Systems (GIS), relying on the definition of distance-quantified paths. Such expressions specify how to navigate between the different layers of the GIS, starting from the target set of objects to characterize. We define a generality relation between quantified paths and prove that it is monotone with respect to the notion of coverage, which allows us to develop an interactive and effective algorithm to explore the search space of possible rules. We describe GISMiner, an interactive system that we have developed based on our framework. Finally, we present our experimental results from a real GIS about mineral exploration.
This document summarizes a presentation on computing polytopes via a vertex oracle. It discusses:
1. Using a vertex oracle that takes a direction vector as input and outputs a vertex of the resultant polytope that is extremal in that direction.
2. An incremental algorithm that starts with an inner approximation of the resultant polytope and iteratively calls the oracle to extend illegal facets until the approximation equals the resultant polytope.
3. The oracle works by lifting the point set to construct a regular subdivision, then refining to a triangulation to extract the vertex of the resultant polytope extremal in the given direction.
The document discusses methods for performing spatial statistics on large datasets. Standard maximum likelihood estimation is computationally infeasible for datasets with tens of thousands of observations due to the need to compute and store large covariance matrices. The document outlines several approximation methods that can accommodate large datasets, including variogram fitting, pairwise likelihood approximations, independent block approximations, tapering of the covariance function, low-rank approximations using basis functions, and approximations based on stochastic partial differential equations. These methods allow inference for large spatial datasets by avoiding direct computation and storage of large covariance matrices.
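As one concrete instance of the techniques listed above, here is a hedged sketch of covariance tapering (the exponential covariance and Wendland-type taper are illustrative choices, not prescribed by the document):

```python
# Sketch: covariance tapering. Multiplying an exponential covariance by a
# compactly supported Wendland-type taper yields exact zeros beyond the
# taper range, so sparse matrix algebra can replace dense computations.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
locs = rng.uniform(0, 10, size=(2000, 2))   # spatial locations
d = cdist(locs, locs)                       # pairwise distances

cov = np.exp(-d / 2.0)                      # exponential covariance
gamma = 1.0                                 # taper range
t = np.clip(1 - d / gamma, 0.0, None)
taper = t**4 * (1 + 4 * d / gamma)          # Wendland-type taper
tapered = csr_matrix(cov * taper)           # sparse tapered covariance

print(f"nonzeros: {tapered.nnz} of {d.size}")
```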
To make reinforcement learning algorithms work in the real world, one has to get around (what Sutton calls) the "deadly triad": the combination of bootstrapping, function approximation, and off-policy evaluation. The first step here is to understand the value-function vector space and its geometry, and then make one's way into gradient TD algorithms (a big breakthrough in overcoming the "deadly triad").
We present a proof of the Generalized Riemann hypothesis (GRH) based on asymptotic expansions and operations on series. The advantage of our method is that it uses only undergraduate maths, which makes it accessible to a wider audience.
The document summarizes the policy gradient theorem, which provides a way to perform policy improvement in reinforcement learning using gradient ascent on the expected returns with respect to the policy parameters. It begins by motivating policy gradients as a way to do policy improvement when the action space is large or continuous. It then defines the necessary notation, the expected-returns objective function, and the discounted state visitation measure. The main part of the document proves the policy gradient theorem, which expresses the policy gradient as an expectation over the discounted state visitation measure and the action-value function. It notes that in practice the action-value function must be estimated, and proves the compatible function approximation theorem, which ensures the policy gradient is computed correctly when using an estimated action-value function.
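The score-function form of the theorem, $\nabla_\theta J = \mathbb{E}[\nabla_\theta \log \pi_\theta(a)\, Q(a)]$, can be illustrated with a minimal REINFORCE sketch on a two-armed bandit (our toy example, not the document's code; the raw reward stands in for the action-value estimate):

```python
# REINFORCE on a 2-armed bandit: stochastic gradient ascent on J using
# grad J = E[grad log pi(a) * reward] with a softmax policy.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([1.0, 2.0])      # arm 1 is better
theta = np.zeros(2)                    # softmax policy parameters
alpha = 0.1

for step in range(2000):
    pi = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=pi)
    r = rng.normal(true_means[a], 1.0)
    grad_log_pi = -pi                  # d log pi(a) / d theta = e_a - pi
    grad_log_pi[a] += 1.0
    theta += alpha * grad_log_pi * r   # unbiased gradient ascent step

print(np.exp(theta) / np.exp(theta).sum())  # mass shifts toward arm 1
```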
This presentation gives a short introduction to time series and the overall procedure required for time series modelling, including general terminology and algorithms. The detailed mathematics is excluded from the slides; the presentation is meant as a starting point for understanding time series modelling before going into the detailed statistics.
Taylor's Theorem for Matrix Functions and Pseudospectral Bounds on the Condition Number (Sam Relton)
We describe a generalization of Taylor's theorem to matrix functions, with an explicit remainder term. We then apply pseudospectral theory to bound the condition number of the matrix function, using the previous theorem.
Discussion of Fearnhead and Prangle, RSS, Dec. 14, 2011 (Christian Robert)
The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure. The key challenges are choosing a sufficient summary statistic of the data and setting the tolerance level. Later sections discuss using a noisy ABC approach, where the summary statistic is perturbed, and calibrating the method so that the ABC posterior converges to the true parameter as the number of simulations increases. The document examines issues around choosing optimal summary statistics and tolerance levels to minimize errors in the ABC approximation.
This document discusses pushdown automata and context-free grammars. It begins by defining a pushdown automaton as a finite state machine with an input tape and stack. There are two ways for a PDA to accept a string - by reaching a final state or emptying the stack. Algorithms are provided for constructing a PDA from a CFG and vice versa. A two-stack PDA is also introduced, which has the computational power of a Turing machine. Examples are given to illustrate PDA constructions.
Presentation of the paper "An output-sensitive algorithm for computing (projections of) resultant polytopes" in the Annual Symposium on Computational Geometry (SoCG 2012)
Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences (Oana Tifrea-Marciuska)
This document outlines an introduction to Datalog+/–, which is an ontology language that can represent tuple-generating dependencies (TGDs). It describes how queries are answered over Datalog+/– ontologies by using the chase procedure to apply TGDs. As an example, it shows applying the chase to an ontology with TGDs describing travel activities and its initial database. The chase results in adding inferred atoms with null values to represent existential variables.
This document outlines a presentation on query answering in probabilistic Datalog+/– ontologies under group preferences. It begins with an introduction that motivates the need to model group preferences and uncertainty on the semantic web. It then provides preliminaries on Datalog+/– and the chase procedure. Finally, it outlines the components of the proposed model for handling group preferences and different strategies for answering top-k ranked disjunctive atomic queries under the model.
Gibbs flow transport for Bayesian inference (JeremyHeng10)
Minisymposium on "Selected topics in computation and dynamics: machine learning and multiscale methods" at SciCADE 2019, Innsbruck, July 2019.
https://scicade2019.uibk.ac.at/
Slides are based on the article in https://arxiv.org/abs/1509.08787
This document provides an overview of separation logic, including:
- Applications include program analysis, verified software, and axiomatic semantics.
- Future work may focus on logics beyond pre/post conditions to specify order of actions or observable program states.
- SpaceInvader is an implementation of compositional shape analysis via bi-abduction that uses separation logic to reason about mutable data structures.
- Smallfoot is an earlier tool that used symbolic execution and a decidable fragment of separation logic to perform automatic reasoning with Hoare logic for a toy language.
This document discusses using the sequence of iterates generated by inertial methods to minimize convex functions. It introduces inertial methods and how they can be used to generate sequences that converge to the minimum. While the last iterate is often used, sometimes averaging over iterates or using extrapolations like Aitken acceleration can provide better estimates of the minimum. Inertial methods allow for more exploration of the function space than gradient descent alone. The geometry of the function may provide opportunities to analyze the iterate sequence and obtain improved convergence estimates.
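A minimal sketch of the Aitken acceleration mentioned above, applied to a linearly converging fixed-point iteration (our illustrative example, not from the document):

```python
# Sketch: Aitken's delta-squared extrapolation of a convergent sequence,
# x'_n = x_n - (x_{n+1} - x_n)^2 / (x_{n+2} - 2 x_{n+1} + x_n).
def aitken(x):
    return [x[n] - (x[n + 1] - x[n]) ** 2 / (x[n + 2] - 2 * x[n + 1] + x[n])
            for n in range(len(x) - 2)]

# Fixed-point iterates x_{n+1} = 0.5 x_n + 1 converge linearly to 2.
xs = [0.0]
for _ in range(10):
    xs.append(0.5 * xs[-1] + 1.0)

print(xs[-1] - 2.0)           # last iterate: error about 2**-10
print(aitken(xs)[-1] - 2.0)   # extrapolation: essentially exact here
```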
This document summarizes Frank Nielsen's talk on divergence-based center clustering and its applications. Some key points:
- Center-based clustering aims to minimize an objective function that assigns data points to their closest cluster centers. This is an NP-hard problem when the number of dimensions and data points are greater than 1.
- Mixed divergences use dual centroids per cluster to define cluster assignments. Total Jensen divergences are proposed as a way to make divergences more robust by incorporating a conformal factor.
- For clustering when centroids do not have closed-form solutions, initialization methods like k-means++ can be used, which randomly select initial seeds without computing centroids. Total Jensen k-means++ extends this seeding strategy to total Jensen divergences.
This document discusses Renyi's entropy, a generalization of Shannon entropy. It was developed by Alfred Renyi who sought a definition of information measures that preserved additivity for independent events and was compatible with probability axioms. Renyi's entropy includes Shannon entropy as a special case and allows a more flexible notion of entropy than Shannon through a parameter α. The document focuses on Renyi's quadratic entropy (α=2) and describes how it can be estimated directly from samples using a kernel density estimation approach called the information potential. Several properties of Renyi's entropy and its estimators are also outlined.
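A minimal sketch of the information-potential estimator of Renyi's quadratic entropy, assuming a Gaussian kernel with an illustrative bandwidth sigma (our code, not the document's):

```python
# Sketch: the "information potential" estimator of Renyi's quadratic
# entropy H2 = -log(integral of p(x)^2 dx) via a Gaussian-KDE plug-in.
# Pairwise kernels convolve, so the effective variance doubles to 2*sigma^2.
import numpy as np

def renyi_quadratic_entropy(x, sigma=0.5):
    n = len(x)
    diff = x[:, None] - x[None, :]
    s2 = 2.0 * sigma**2
    g = np.exp(-diff**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return -np.log(g.sum() / n**2)   # information potential -> entropy

samples = np.random.default_rng(0).normal(0.0, 1.0, size=500)
print(renyi_quadratic_entropy(samples))  # roughly log(2 sqrt(pi)) for N(0,1)
```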
This document provides a unifying review of linear Gaussian models. It discusses both continuous and discrete state static and dynamic models. The basic models involve hidden states that evolve linearly over time with Gaussian noise and observations of these states that are also corrupted by Gaussian noise. Common techniques like Kalman filtering, expectation-maximization, and principal component analysis are described as special cases of inference and learning solutions to these basic models. The review highlights relationships between models and opportunities for future extensions to other distributions or mixture state formulations.
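The shared generative model behind these special cases is a linear Gaussian state space, $x_{t+1} = A x_t + w_t$, $y_t = C x_t + v_t$ with Gaussian noise $w_t, v_t$. Here is a minimal numpy sketch (parameter values invented for illustration, not from the paper) that samples the model and filters it with the Kalman recursion:

```python
# Sample a linear Gaussian state-space model and run a Kalman filter,
# i.e. exact inference for the hidden state in the same model.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9]])
C = np.array([[1.0]])
Q = np.array([[0.1]])   # state noise covariance
R = np.array([[0.5]])   # observation noise covariance

T = 100
x = np.zeros((T, 1))
y = np.zeros((T, 1))
for t in range(T):
    if t > 0:
        x[t] = A @ x[t - 1] + rng.multivariate_normal([0.0], Q)
    y[t] = C @ x[t] + rng.multivariate_normal([0.0], R)

mu = np.zeros(1)
P = np.eye(1)
for t in range(T):
    mu, P = A @ mu, A @ P @ A.T + Q    # predict
    S = C @ P @ C.T + R                # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu = mu + K @ (y[t] - C @ mu)      # update with observation
    P = (np.eye(1) - K @ C) @ P

print(mu[0], x[-1, 0])   # filtered mean should track the true final state
```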
We approach the screening problem, i.e. detecting which inputs of a computer model significantly impact the output, from a formal Bayesian model selection point of view. That is, we place a Gaussian process prior on the computer model and consider the $2^p$ models that result from assuming that each of the subsets of the $p$ inputs affects the response. The goal is to obtain the posterior probabilities of each of these models. In this talk, we focus on the specification of objective priors on the model-specific parameters and on convenient ways to compute the associated marginal likelihoods. These two problems, normally seen as unrelated, have challenging connections, since the priors proposed in the literature are specifically designed to have posterior modes on the boundary of the parameter space, hence precluding the application of approximate integration techniques based on e.g. Laplace approximations. We explore several ways of circumventing this difficulty, comparing different methodologies with synthetic examples taken from the literature.
Authors: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha) and Rui Paulo (Universidade de Lisboa)
This talk will report briefly on some findings from the problem of picking the weights for a weighted function space in QMC. Then it will be mostly about importance sampling. We want to estimate the probability $\mu$ of a union of $J$ rare events. The method uses $n$ samples, each of which picks one of the rare events at random, samples conditionally on that rare event happening, and counts the total number of rare events that happen. It was used by Naiman and Priebe for scan statistics, Shi, Siegmund and Yakir for genomic scans, and Adler, Blanchet and Liu for extrema of Gaussian processes. We call it ALOE, for 'at least one event'. The ALOE estimate is unbiased and we find that it has a coefficient of variation no larger than $\sqrt{(J + J^{-1} - 2)/(4n)}$. The coefficient of variation is also no larger than $\sqrt{(\bar{\mu}/\mu - 1)/n}$, where $\bar{\mu}$ is the union bound. Our motivating problem comes from power system reliability, where the phase differences between connected nodes have a joint Gaussian distribution and the $J$ rare events arise from unacceptably large phase differences. In the grid reliability problems, even some events defined by 5772 constraints in 326 dimensions, with probability below $10^{-22}$, are estimated with a coefficient of variation of about 0.0024 with only $n = 10{,}000$ sample values. In a genomic context, the rare events become false discoveries. There we are interested in the possibility of a large number of simultaneous events, not just one or more. Some work with Kenneth Tay will be presented on that problem.
Joint work with Yury Maximov and Michael Chertkov (Los Alamos National Laboratory) and Kenneth Tay (Stanford).
Maximum likelihood estimation of regularisation parameters in inverse problems (Valentin De Bortoli)
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter (Lambda), which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feedforward backpropagation networks of the same size when tested on real data sets.
Linear Bayesian update surrogate for updating PCE coefficients (Alexander Litvinenko)
This is joint work with colleagues from TU Braunschweig. Prof. H. G. Matthies had the excellent idea of developing a Bayesian surrogate formula for updating not probability densities (as in the classical Bayesian formula) but the PCE coefficients of the given random variable. Bojana Rosic implemented the linear case; I (with help from Elmar Zander) implemented the non-linear case. Later, Elmar significantly simplified the algorithm.
This document introduces modern variational inference techniques. It discusses:
1. The goal of variational inference is to approximate the posterior distribution p(θ|D) over latent parameters θ given data D.
2. This is done by positing a variational distribution qλ(θ) and optimizing its parameters λ to minimize the KL divergence between qλ(θ) and p(θ|D).
3. The evidence lower bound (ELBO) is used as a variational objective that can be optimized using stochastic gradient descent, with gradients estimated using Monte Carlo sampling and reparametrization.
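Item 3's reparametrized Monte Carlo estimate of the ELBO can be made concrete with a short numpy sketch on a toy conjugate model (prior theta ~ N(0,1), likelihood x | theta ~ N(theta,1); all names and values are our own illustration, not from the document):

```python
# Sketch: Monte Carlo ELBO with reparametrization, q(theta) = N(mu, sigma^2),
# sampling theta = mu + sigma * eps with eps ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=20)   # observed D

def elbo(mu, log_sigma, n_samples=1000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=n_samples)
    theta = mu + sigma * eps           # reparametrized samples from q
    log_prior = -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)
    log_lik = np.sum(
        -0.5 * (data[None, :] - theta[:, None])**2 - 0.5 * np.log(2 * np.pi),
        axis=1)
    log_q = -0.5 * eps**2 - 0.5 * np.log(2 * np.pi) - log_sigma
    return np.mean(log_prior + log_lik - log_q)  # E_q[log p(D,th) - log q]

print(elbo(0.9, np.log(0.25)))    # well-placed q: higher ELBO
print(elbo(-2.0, np.log(0.25)))   # poorly placed q: lower ELBO
```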
Learning to discover Monte Carlo algorithms on spin ice manifolds (Kai-Wen Zhao)
A global-update Monte Carlo sampler can be discovered naturally by a trained machine using the policy gradient method in a topologically constrained environment.
1. Ryan White presented a dissertation defense on random walks on random lattices and their applications.
2. The presentation included models of stochastic cumulative loss processes with delayed observation, where losses arrive randomly over time and the process is observed at random observation times.
3. A time-insensitive analysis was performed to derive a joint functional of the process at successive observation times, allowing properties like the distribution of the first observed threshold crossing to be determined.
This document provides a summary of supervised learning techniques including linear regression, logistic regression, support vector machines, naive Bayes classification, and decision trees. It defines key concepts such as hypothesis, loss functions, cost functions, and gradient descent. It also covers generative models like Gaussian discriminant analysis, and ensemble methods such as random forests and boosting. Finally, it discusses learning theory concepts such as the VC dimension, PAC learning, and generalization error bounds.
A walk through the intersection between machine learning and mechanistic models (JuanPabloCarbajal3)
Talk at EURECOM, France.
It overviews regression in several of its forms: regularized, constrained, and mixed. It builds the bridge between machine learning and dynamical models.
Beginning with a review of Bayes' theorem and the chain rule, the presentation then explains MAP (maximum a posteriori) estimation.
In the framework of MAP estimation, we can describe many famous models: naive Bayes, regularized ridge regression, logistic regression, the log-linear model, and Gaussian processes.
MAP estimation is a powerful framework for understanding the above models from a Bayesian point of view, and it opens the possibility of extending them to semi-supervised ones.
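As a worked instance of this framework (our illustration, not from the slides), ridge regression falls out of MAP estimation with a Gaussian likelihood and a Gaussian prior on the weights:

```latex
% MAP with Gaussian likelihood y | X, w ~ N(Xw, sigma^2 I) and
% Gaussian prior w ~ N(0, tau^2 I) recovers ridge regression:
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \; \log p(y \mid X, w) + \log p(w)
  = \arg\min_w \; \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2,
\qquad \lambda = \sigma^2 / \tau^2 .
```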
The document provides an introduction to Markov Chain Monte Carlo (MCMC) methods. It discusses using MCMC to sample from distributions when direct sampling is difficult. Specifically, it introduces Gibbs sampling and the Metropolis-Hastings algorithm. Gibbs sampling updates variables one at a time based on their conditional distributions. Metropolis-Hastings proposes candidate samples and accepts or rejects them to converge to the target distribution. The document provides examples and outlines the algorithms to construct Markov chains that sample distributions of interest.
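A minimal random-walk Metropolis-Hastings sketch (the bimodal target and step size are our illustrative choices, not the document's code):

```python
# Random-walk Metropolis-Hastings: propose symmetrically, accept with
# probability min(1, target(prop) / target(x)); works with an
# unnormalized density.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # unnormalized log-density of a mixture of two Gaussians
    return np.logaddexp(-0.5 * (x - 2)**2, -0.5 * (x + 2)**2)

x, chain = 0.0, []
for _ in range(20_000):
    prop = x + rng.normal(0.0, 1.0)                  # symmetric proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                                     # accept
    chain.append(x)                                  # else keep current x

print(np.mean(chain), np.std(chain))                 # ~0 mean, bimodal spread
```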
Gaussian process regression is a non-parametric Bayesian approach for supervised learning problems. It can be used to model an unknown function by specifying a prior directly over functions, such that the posterior incorporates the constraints from the training data. The kernel trick allows specifying an infinite-dimensional feature space without explicitly defining features. This allows selecting valid covariance functions that implicitly define features and incorporate prior knowledge about the solution.
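A minimal sketch of GP regression with an RBF covariance function, computing the posterior mean and variance at test inputs (hyperparameter values are illustrative assumptions, not from the document):

```python
# GP regression posterior: mean = K*^T (K + noise I)^-1 y,
# cov = K** - K*^T (K + noise I)^-1 K*.
import numpy as np

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=20)
y = np.sin(X) + 0.1 * rng.normal(size=20)
Xs = np.linspace(-3, 3, 100)

noise = 0.1**2
K = rbf(X, X) + noise * np.eye(len(X))
Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)

alpha = np.linalg.solve(K, y)
mean = Ks.T @ alpha                          # posterior mean
cov = Kss - Ks.T @ np.linalg.solve(K, Ks)    # posterior covariance
print(mean[:3], np.sqrt(np.diag(cov))[:3])
```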
This document describes methods for modeling spatial extremes using semiparametric approaches. It introduces a spatial skew-t process model that exhibits asymptotic dependence while maintaining computational tractability. It then extends this to a semiparametric Dirichlet process mixture of spatial skew-t processes that can flexibly model both the bulk distribution and tails without requiring a threshold. This flexible model is shown to perform well compared to parametric alternatives in simulations for both spatial prediction and modeling extremes.
This document summarizes an approach to perform online Gaussian process regression using random feature selection in order to address the computational challenges of traditional GPR. It proposes combining random feature mapping with online Bayesian linear regression to develop a fast approximate GPR model that can perform online learning from streaming data. The goal is to apply this method to motion planning for a 7-DOF robotic arm. The algorithm will be implemented in MATLAB/Octave and tested on inverse dynamics problems using a Barrett Technology robot arm.
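The random-feature idea can be sketched as follows: random Fourier features make an RBF-kernel GP approximable by Bayesian linear regression, whose posterior admits cheap per-observation updates. This is our own illustrative sketch under assumed sizes and hyperparameters, not the project's code:

```python
# Random Fourier features + online Bayesian linear regression as a fast
# approximate GP: z(x).z(x') approximates the RBF kernel k(x, x').
import numpy as np

rng = np.random.default_rng(0)
d, D, ell, noise = 1, 200, 0.5, 0.1**2
W = rng.normal(0.0, 1.0 / ell, size=(d, D))      # spectral frequencies
b = rng.uniform(0.0, 2 * np.pi, size=D)

def features(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Online updates: maintain the posterior precision and weighted sum,
# solve for the posterior mean of the weights only when needed.
precision = np.eye(D)                            # prior N(0, I) on weights
zty = np.zeros(D)
for _ in range(500):                             # streaming observations
    x = rng.uniform(-3, 3, size=(1, d))
    y = np.sin(x[0, 0]) + 0.1 * rng.normal()
    z = features(x)[0]
    precision += np.outer(z, z) / noise
    zty += z * y / noise
w_mean = np.linalg.solve(precision, zty)         # posterior mean weights
print(features(np.array([[1.0]])) @ w_mean)      # roughly sin(1.0)
```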
The document discusses computational methods for Bayesian statistics when direct simulation from the target distribution is not possible or efficient. It introduces Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm and Gibbs sampler, which generate dependent samples that approximate the target distribution. The Metropolis-Hastings algorithm uses a proposal distribution to randomly walk through the parameter space. Approximate Bayesian computation (ABC) is also introduced as a method that approximates the posterior distribution when the likelihood is intractable.
The document discusses time series analysis and model selection. It introduces partial autocorrelation as a tool to determine the order of autoregressive (AR) models, as the autocorrelation does not directly reveal the order of AR models. The partial autocorrelation of an AR(p) model will be zero after lag p. Model selection criteria like the Akaike Information Criterion (AIC) provide a way to trade off model fit and complexity. Cross-validation for time series requires evaluating models on a rolling forecast origin rather than random data partitions. Linear processes provide a connection between autoregressive (AR) and moving average (MA) models, as both can be represented as linear combinations of noise terms.
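A short sketch of the order-selection idea: fit AR(p) by least squares for several p and compare AIC (our illustration with a simulated AR(2) series, not the document's code):

```python
# Fit AR(p) for several p and pick the order that minimizes AIC,
# which trades off residual variance against model complexity.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
for t in range(2, n):                    # simulate an AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

def ar_aic(x, p):
    y = x[p:]
    X = np.column_stack([x[p - k - 1: n - k - 1] for k in range(p)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    return len(y) * np.log(sigma2) + 2 * (p + 1)   # fit vs. complexity

for p in range(1, 6):
    print(p, round(ar_aic(x, p), 1))     # AIC should bottom out near p = 2
```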
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
Similar to A Unifying Review of Gaussian Linear Models (Roweis 1999) (20)
Accelerating Metropolis Hastings with Lightweight Inference Compilation (Feynman Liang)
This document summarizes research on accelerating Metropolis-Hastings sampling with lightweight inference compilation. It discusses background on probabilistic programming languages and Bayesian inference techniques like variational inference and sequential importance sampling. It introduces the concept of inference compilation, where a neural network is trained to construct proposals for MCMC that better match the posterior. The paper proposes a lightweight approach to inference compilation for imperative probabilistic programs that trains proposals conditioned on execution prefixes to address issues with sequential importance sampling.
Detecting paraphrases using recursive autoencoders (Feynman Liang)
Presentation on deep learning applied to natural language processing, presented at University of Cambridge Machine Learning Group's Research and Communication Club 2-11-2015 meeting.
This document summarizes a method called transplantation that can be used to show two planar domains have the same spectrum and are therefore isospectral. Transplantation takes a Dirichlet eigenfunction on one domain and constructs a corresponding eigenfunction on the other domain with the same eigenvalue. This is done by dividing the domains into congruent triangles and piecing together the restrictions of the eigenfunction in a way that satisfies continuity and boundary conditions. Numerical computation of the discretized Laplacian spectrum on sample isospectral domains verifies the transplanted eigenfunctions have identical eigenvalues, demonstrating the domains are isospectral.
Recursive Autoencoders for Paraphrase Detection (Socher et al.) (Feynman Liang)
1) The document presents an approach called unfolding recursive autoencoders (RAEs) to detect paraphrases between sentences.
2) RAEs learn vector representations of phrases and sentences by reconstructing parse trees, and an unfolding approach is introduced to better capture the meaning of longer phrases.
3) A dynamic pooling layer is used to create fixed-size similarity matrices for variable length sentences to classify as paraphrases or not.
4) Experimental results on a paraphrase detection dataset show the unfolding RAE with dynamic pooling achieves state-of-the-art performance at the time in 2011.
Engineered histone acetylation using DNA-binding domains (DBD), chemical inducers of dimerization, and histone acetyltransferases (Feynman Liang)
Feynman Liang proposes engineering histone acetylation using DNA-binding domains, chemical inducers of dimerization, and histone acetyltransferases. He will construct a DNA-binding domain that mimics CLOCK:BMAL1, recruit a histone acetyltransferase using chemical inducers of dimerization, and toggle histone acetylation to disrupt circadian rhythms at the epigenetic level in mouse cell cultures. This modular method could specifically modify the histone code by recruiting histone modifiers to targeted DNA sequences.
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellular Function (Feynman Liang)
The document describes the development of a photosensitive degron (psd) module that allows for light-activated and reversible protein degradation. The psd module is based on a light-oxygen-voltage (LOV) domain from plants and a degradation domain. Testing showed that the psd module enables specific, cryptic, quantitative, and reversible degradation of target proteins in yeast cells. The technique was applied to control the yeast cell cycle and conditionally degrade various yeast genes. Compartmentalized modeling supported the experimental findings.
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network (Feynman Liang)
The document summarizes a study that systematically perturbed the galactose metabolic pathway in yeast through genetic and environmental modifications. It integrated observations from the perturbations with an initial pathway model and a global interaction network to formulate new hypotheses. Gene and protein expression profiles were analyzed through clustering and compared to predicted responses to refine the pathway model and interaction network.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W..., by Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Learn SQL from Basic Queries to Advanced Queries, by manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
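As a small taste of that progression from basic to advanced queries, here is a self-contained sketch using Python's built-in sqlite3 module (the table and data are invented; the window-function query needs SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 'widget', 120.0), ('north', 'gadget', 80.0),
        ('south', 'widget', 200.0), ('south', 'gadget', 50.0);
""")

# Basic: retrieval with filtering and aggregation.
for row in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)

# Advanced: a window function ranking products within each region.
for row in conn.execute("""
    SELECT region, product,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
"""):
    print(row)
```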
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
A Unifying Review of Linear Gaussian Models (Roweis 1999)
1. A Unifying Review of Linear Gaussian Models¹
Sam Roweis, Zoubin Ghahramani
Feynman Liang
Application #: 10342444
November 11, 2014
¹ Roweis, Sam, and Zoubin Ghahramani. "A Unifying Review of Linear Gaussian Models." Neural Computation 11.2 (1999): 305-345. Print.
3. Superficially disparate models…
Figure: (a) Factor Analysis, (b) PCA, (c) Mixture of Gaussians, (d) Hidden Markov Models
4. Outline
- Basic model
- Inference and learning problems
- EM algorithm
- Various specializations of the basic model:
  - Continuous state, A = 0, R diagonal: Factor Analysis
  - Continuous state, A = 0, R = εI: SPCA; R = lim_{ε→0} εI: PCA
  - Continuous state, A ≠ 0: Kalman Filter
  - Discrete state, A = 0: Gaussian Mixture Model; R = lim_{ε→0} εR₀: Vector Quantization (1-NN)
  - Discrete state, A ≠ 0: HMM
5. The Basic (Generative) Model
Goal: model P({x_t}_{t=1}^T, {y_t}_{t=1}^T)
Assumptions:
- Linear dynamics with additive Gaussian noise:
  x_{t+1} = A x_t + w,  w ~ N(0, Q)
  y_t = C x_t + v,  v ~ N(0, R)
  (wlog E[w] = E[v] = 0)
- Markov property
- Time homogeneity
Figure: The Basic Model as a DBN (x_t → x_{t+1} through A plus noise w; x_t → y_t through C plus noise v)
The joint then factorizes as
  P({x_t}_{t=1}^T, {y_t}_{t=1}^T) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1} | x_t) ∏_{t=1}^{T} P(y_t | x_t)
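A minimal sketch of ancestral sampling from this generative model (all parameter values are invented for illustration; x_1 is drawn from N(0, Q_1), taking Q_1 = Q for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (k = 2 latent dims, p = 3 observed dims).
A = np.array([[0.9, 0.1], [0.0, 0.8]])    # latent dynamics
C = rng.normal(size=(3, 2))               # observation matrix
Q = 0.1 * np.eye(2)                       # state noise covariance
R = 0.05 * np.eye(3)                      # observation noise covariance
T = 100

x = rng.multivariate_normal(np.zeros(2), Q)   # x_1 ~ N(0, Q_1), with Q_1 = Q
xs, ys = [], []
for _ in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(3), R)   # y_t = C x_t + v
    xs.append(x)
    ys.append(y)
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)   # x_{t+1} = A x_t + w
```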
6. Why Gaussians?
- The Gaussian family is closed under affine transforms: for independent
  x ~ N(μ_x, Σ_x), y ~ N(μ_y, Σ_y) and a, b, c ∈ ℝ,
  ax + by + c ~ N(aμ_x + bμ_y + c, a²Σ_x + b²Σ_y)
- Gaussian is the conjugate prior for a Gaussian likelihood:
  P(x) Normal, P(y | x) Normal ⟹ P(x | y) Normal
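A quick Monte Carlo sanity check of the closure property in the scalar case (all constants chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = 2.0, -1.0, 3.0
mu_x, var_x = 1.0, 4.0
mu_y, var_y = -2.0, 0.25

z = a * rng.normal(mu_x, np.sqrt(var_x), 10**6) \
    + b * rng.normal(mu_y, np.sqrt(var_y), 10**6) + c

# Empirical moments should match a*mu_x + b*mu_y + c and a^2*var_x + b^2*var_y.
print(z.mean(), a * mu_x + b * mu_y + c)     # both ≈ 7.0
print(z.var(), a**2 * var_x + b**2 * var_y)  # both ≈ 16.25
```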
7. The Inference Problem
Given the system model and initial distribution ({A, C, Q, R, μ_1, Q_1}):
- Filtering: P(x_t | {y_i}_{i=1}^t)
- Smoothing: P(x_t | {y_i}_{i=1}^T), where t < T
If we had the partition function
  P({y_i}_{i=1}^T) = ∫ P({x_i}, {y_i}) d{x_i},
then the posterior would follow by Bayes' rule,
  P({x_i} | {y_i}) = P({x_i}, {y_i}) / P({y_i}),
with P(x_t | {y_i}) obtained by marginalizing out the remaining states.
8. The Learning Problem
Let θ = {A, C, Q, R, μ_1, Q_1}, X = {x_i}_{i=1}^T, Y = {y_i}_{i=1}^T.
Given (several) observable sequences Y:
  argmax_θ L(θ) = argmax_θ log P(Y | θ)
Solved by expectation maximization.
9. Expectation Maximization
For any distribution Q on S_x:
  L(θ) ≥ F(Q, θ) = ∫ Q(X) log P(X, Y | θ) dX − ∫ Q(X) log Q(X) dX
                 = L(θ) − H(Q, P(· | Y, θ)) + H(Q)
                 = L(θ) − D_KL(Q ∥ P(· | Y, θ))
Monotonically increasing coordinate ascent on F(Q, θ):
- E step: Q_{k+1} ← argmax_Q F(Q, θ_k) = P(X | Y, θ_k)
- M step: θ_{k+1} ← argmax_θ F(Q_{k+1}, θ)
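To make the coordinate ascent concrete, here is a small EM loop for a two-component 1-D Gaussian mixture (a standard instantiation chosen for brevity, not the deck's general algorithm; data and initialization are arbitrary). The E step computes P(X | Y, θ_k) as responsibilities; the M step re-maximizes F:

```python
import numpy as np

rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Initialize theta = (weights, means, variances).
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E step: Q(X) = P(X | Y, theta_k), i.e. per-point component responsibilities.
    dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = w * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M step: theta_{k+1} = argmax F(Q_{k+1}, theta), the weighted ML updates.
    n_k = resp.sum(axis=0)
    w = n_k / len(y)
    mu = (resp * y[:, None]).sum(axis=0) / n_k
    var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)   # ≈ [0.3, 0.7], [-2, 3], [1, 1]
```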
10. Continuous-State Static Modeling
Assumptions:
- x is continuously supported
- A = 0, so x = w ~ N(0, Q) ⟹ y = Cx + v ~ N(0, CQCᵀ + R)
- wlog Q = I
Efficient inference using sufficient statistics: Gaussian is the conjugate prior for a Gaussian likelihood, so
  P(x | y) = N(βy, I − βC), where β = Cᵀ(CCᵀ + R)⁻¹
Learning: R must be constrained to avoid degenerate solutions…
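A numeric sketch of this posterior computation (random C and diagonal R, both invented; nothing here is specific to the paper beyond the β formula above):

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 2, 5
C = rng.normal(size=(p, k))
R = np.diag(rng.uniform(0.1, 0.5, size=p))   # diagonal observation noise

# Generate one observation from the model x ~ N(0, I), y = C x + v.
x = rng.normal(size=k)
y = C @ x + rng.multivariate_normal(np.zeros(p), R)

# Posterior P(x | y) = N(beta y, I - beta C), beta = C^T (C C^T + R)^{-1}.
beta = C.T @ np.linalg.inv(C @ C.T + R)
post_mean = beta @ y
post_cov = np.eye(k) - beta @ C
print(post_mean, x)   # posterior mean should sit near the true latent x
```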
14. Continuous-State Static Modeling: Factor Analysis
  y = Cx + v ~ N(0, CCᵀ + R)
Additional assumption:
- R diagonal ⟹ observation noise v is independent along each basis direction of y
Interpretation:
- R: variance along each basis direction
- C: correlation structure of the latent factors
Properties:
- Scale invariant
- Not rotation invariant
15. Continuous-State Static Modeling: SPCA and PCA
  y = Cx + v ~ N(0, CCᵀ + R)
Additional assumptions:
- R = εI, ε ∈ ℝ
- For PCA: R = lim_{ε→0} εI
Interpretation:
- ε: global noise level
- Columns of C: principal components (optimizes three equivalent objectives)
Properties:
- Rotation invariant
- Not scale invariant
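A sketch checking the ε → 0 claim numerically (all sizes and values arbitrary): in the near-zero-noise regime, the top principal subspace of samples from the model coincides with the span of C.

```python
import numpy as np

rng = np.random.default_rng(3)
k, p, n = 2, 5, 10_000
C = rng.normal(size=(p, k))

# Sample y = C x + v with tiny isotropic noise (the near-PCA regime).
X = rng.normal(size=(n, k))
Y = X @ C.T + 1e-3 * rng.normal(size=(n, p))

# Ordinary PCA: top-k eigenvectors of the sample covariance.
evals, evecs = np.linalg.eigh(np.cov(Y.T))
top = evecs[:, -k:]                          # principal subspace of the data

# The subspaces spanned by `top` and by the columns of C should coincide:
proj = top @ top.T
print(np.allclose(proj @ C, C, atol=1e-2))   # True, up to the small noise
```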
17. Dynamic Continuous-State Modeling: Kalman Filter
…the Kalman filter, assuming linearity and normality (conjugate prior)
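Since the Kalman-filter slides were largely lost in extraction, here is a minimal predict/update step consistent with the basic model above (these are the standard textbook equations, not necessarily the deck's notation):

```python
import numpy as np

def kalman_step(mu, Sigma, y, A, C, Q, R):
    """One filtering step: P(x_{t+1} | y_{1..t+1}) from P(x_t | y_{1..t})."""
    # Predict: push the filtered belief through the linear dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: condition on the new observation y_{t+1} = C x_{t+1} + v.
    S = C @ Sigma_pred @ C.T + R                  # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new
```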
18. Discrete-State Modeling: Winner-Takes-All (WTA) Non-Linearity
Assume: x is discretely supported, so integrals become sums (∫ ↦ Σ)
Winner-takes-all non-linearity: WTA[x] = e_i where i = argmax_j x_j
  x_{t+1} = WTA[A x_t + w],  w ~ N(μ, Q)
  y_t = C x_t + v,  v ~ N(0, R)
x ~ WTA[N(μ, Σ)] defines a probability vector π, where
  π_i = P(x = e_i) = probability mass assigned by N(μ, Σ) to {z ∈ S_x : ∀ j ≠ i, z_i ≥ z_j}
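A direct translation of the WTA operator, with π estimated by Monte Carlo (μ and Σ chosen arbitrarily):

```python
import numpy as np

def wta(x):
    """WTA[x] = e_i with i = argmax_j x_j."""
    e = np.zeros_like(x)
    e[np.argmax(x)] = 1.0
    return e

rng = np.random.default_rng(4)
mu = np.array([0.5, 0.0, -0.5])
Sigma = np.eye(3)

# Estimate pi_i = P(WTA[N(mu, Sigma)] = e_i) by sampling.
draws = rng.multivariate_normal(mu, Sigma, size=100_000)
pi = np.bincount(draws.argmax(axis=1), minlength=3) / len(draws)
print(pi)   # most mass on index 0, since mu_0 is largest
```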
20. Static Discrete-State Modeling: Mixture of Gaussians and Vector Quantization
  x = WTA[w],  w ~ N(μ, Q)
  y = Cx + v,  v ~ N(0, R)
Additional assumption: A = 0
Mixture of Gaussians:
  P(y) = Σ_i P(x = e_i, y) = Σ_i N(y; C_i, R) π_i
All Gaussians share the same covariance R (C_i denotes the i-th column of C).
Inference:
  P(x = e_j | y) = P(x = e_j, y) / P(y) = N(y; C_j, R) π_j / Σ_i N(y; C_i, R) π_i
Vector Quantization: R = lim_{ε→0} εR₀
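The inference line is the usual "responsibility" computation; a minimal sketch (component means, shared covariance, and weights invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Three components in 2-D: the columns of C are the component means.
C = np.array([[0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])
R = 0.5 * np.eye(2)            # shared covariance across all components
pi = np.array([0.5, 0.3, 0.2])

def responsibilities(y):
    """P(x = e_j | y) for each component j."""
    lik = np.array([multivariate_normal.pdf(y, mean=C[:, j], cov=R)
                    for j in range(C.shape[1])])
    joint = lik * pi
    return joint / joint.sum()

print(responsibilities(np.array([2.5, 0.2])))  # mass mostly on component 1
```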
21. Dynamic Discrete-State Modeling: Hidden Markov Models
  x_{t+1} = WTA[A x_t + w],  w ~ N(0, Q)
  y_t = C x_t + v,  v ~ N(0, R)
Theorem. Any Markov chain transition dynamics T can be equivalently modeled using A and Q in the above model, and vice versa.
- All states have the same emission covariance R
- Learning: EM algorithm (Baum-Welch)
- Inference: Viterbi algorithm for the MAP estimate
- In the discrete case, the MAP estimate ≠ the least-squares estimate
- Approaches the Kalman filter as the state discretization grows finer
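A compact log-space Viterbi sketch for the MAP path (the transition matrix and per-step emission log-likelihoods are assumed precomputed; this is the standard algorithm, not the deck's code):

```python
import numpy as np

def viterbi(log_T, log_emit, log_init):
    """MAP state sequence. log_T: (n, n) transition log-probs,
    log_emit: (T, n) per-step emission log-likelihoods,
    log_init: (n,) initial state log-probs."""
    n_steps, n_states = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((n_steps, n_states), dtype=int)
    for t in range(1, n_steps):
        cand = score[:, None] + log_T        # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Backtrack from the best final state.
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```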
24. Conclusions
- Linearity and normality ⟹ computational tractability
- The universal basic model generalizes idiosyncratic special cases and highlights relationships (e.g., static vs. dynamic, zero-noise limits, hyperparameter selection)
- Unified set of equations and algorithms for inference and learning
27. Limitations and Future Work
- The unified algorithms are not the most efficient
- Can only model y with support ℝ^p and x with support ℝ^k or {1, …, n}
Future work:
- Increase the hierarchy beyond two levels (e.g., Speech → n-gram → PCFG)
- Relax the time-homogeneity assumption (e.g., Extended Kalman Filter)
- Extend to other distributions
- Try other (likelihood, conjugate prior) pairs
- Approximate inference (MH-MCMC)
28. References
S. Roweis, Z. Ghahramani. A Unifying Review of Linear Gaussian Models. Neural Computation, 11(2):305-345, 1999.
Image attributions:
http://www.robots.ox.ac.uk/~parg/projects/ica/riz/Thesis/Figs/var/MoG.jpeg
https://github.com/echen/restricted-boltzmann-machines
http://upload.wikimedia.org/wikipedia/commons/1/15/GaussianScatterPCA.png
http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/img15.gif
http://commons.wikimedia.org/wiki/File:Basic concept of Kalman