My presentation at University of Nottingham "Fast low-rank methods for solvin... - Alexander Litvinenko
Overview of my (with co-authors) low-rank tensor methods for solving PDEs with uncertain coefficients. Connection with Bayesian Update. Solving a coupled system: stochastic forward and stochastic inverse.
Quantitative Propagation of Chaos for SGD in Wide Neural Networks - Valentin De Bortoli
The document discusses quantitative analysis of stochastic gradient descent (SGD) for training wide neural networks. It presents two different regimes - a deterministic regime where the limiting dynamics is described by an ordinary differential equation, and a stochastic regime where the limiting dynamics is a stochastic differential equation. Experiments on MNIST classification show that the stochastic regime with larger step sizes exhibits better regularization properties. The analysis provides insights into the behavior of neural network training as the number of neurons becomes large.
OLC assembly involves three main steps:
1. Overlap - Compute all overlaps between reads to construct an overlap graph
2. Layout - Bundle stretches of the overlap graph into contigs
3. Consensus - Pick the most likely nucleotide sequence for each contig by determining consensus from the underlying reads
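The overlap step above can be sketched as follows. This is a minimal illustration, not any particular assembler's implementation: the function names, the `min_len` threshold, and the toy reads are all assumptions made for the example, and the all-pairs comparison is quadratic in the number of reads.

```python
def overlap_length(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_len=3):
    """Directed overlap graph as a dict: (read_a, read_b) -> overlap length."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_length(a, b, min_len)
                if k > 0:
                    graph[(a, b)] = k
    return graph


# Toy reads (hypothetical) that chain together with 3-base overlaps.
reads = ["ACGTAC", "TACGGA", "GGATTC"]
graph = overlap_graph(reads)
```

The layout step would then walk unambiguous paths in `graph` to form contigs.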
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ... - Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
1. The document presents Plug-and-Play priors for Bayesian imaging using Langevin-based sampling methods.
2. It introduces the Bayesian framework for image restoration and discusses challenges in modeling the prior.
3. A Plug-and-Play approach is proposed that uses an implicit prior defined by a denoising network in conjunction with Langevin sampling, termed PnP-ULA. Experiments demonstrate its effectiveness on image deblurring and inpainting tasks.
This document discusses macrocanonical models for texture synthesis. It begins by introducing the goal of texture synthesis and providing a brief history. It then describes the parametric question of combining randomness and structure in images. Specifically, it discusses maximizing entropy under geometric constraints. The document goes on to discuss links to statistical physics, defining microcanonical and macrocanonical models. It focuses on studying the macrocanonical model, describing how to find optimal parameters through gradient descent and how to sample from the model using Langevin dynamics. The document provides examples of texture synthesis and compares results to other methods.
After applying the stochastic Galerkin method to a stochastic PDE and solving the resulting large linear system, we obtain a stochastic solution (a random field) represented in the Karhunen-Loève expansion (KLE) and polynomial chaos expansion (PCE) bases. No sampling error is involved, only algebraic truncation error. To avoid the classical MCMC route to the posterior, we develop a Bayesian update formula for the KLE-PCE coefficients.
bayesImageS: Bayesian computation for medical Image Segmentation using a hidd... - Matt Moores
This document summarizes an R package called bayesImageS that enables Bayesian computation for medical image segmentation using a hidden Potts model. It discusses the statistical model, which involves a hidden Markov random field with a Potts prior on the latent labels. Bayesian computation methods like Gibbs sampling and Metropolis-Hastings using pseudolikelihood approximation are implemented in C++ for efficiency. Experimental results demonstrate the package on a CT electron density phantom and patient radiotherapy data.
Bayesian Inference and Uncertainty Quantification for Inverse Problems - Matt Moores
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
We apply the tensor train (TT) data format to solve an elliptic PDE with uncertain coefficients. This reduces complexity and storage from exponential to linear. Post-processing in TT format is also provided.
This document discusses mixture models and the Expectation Maximization (EM) algorithm. It begins by introducing mixture models like Gaussian mixture models (GMMs) which model data as a mixture of distributions. Learning the parameters of these models is difficult because the component assignments are latent variables. The EM algorithm addresses this by iteratively computing expectations of the latent variables given the current parameters (E-step) and maximizing the expected complete log likelihood (M-step). This provides a way to learn the parameters of mixture models when latent variables are involved.
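The E-step and M-step described above can be sketched for a two-component, one-dimensional Gaussian mixture. This is an illustrative sketch only: the initialization (means at the data extremes, unit variances), the fixed iteration count, and the synthetic data are assumptions of the example, not taken from the summarized document.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, n_iter=50):
    """EM for a 2-component 1D Gaussian mixture (assumed initialization)."""
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities = posterior probability of each component.
        resp = []
        log_lik = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            log_lik += math.log(total)
            resp.append([pk / total for pk in p])
        log_liks.append(log_lik)
        # M-step: maximize the expected complete-data log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against collapse
    return weights, means, variances, log_liks


# Synthetic data (hypothetical): two clusters centred at -2 and 3.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)]
data += [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances, log_liks = em_gmm_1d(data)
```

The recorded `log_liks` should be non-decreasing across iterations, which is a convenient sanity check for any EM implementation.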
Maximum likelihood estimation of regularisation parameters in inverse problem... - Valentin De Bortoli
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
ICML2013 reading group: Large-Scale Learning with Less RAM via Randomization - Hidekazu Oiwa
Large-Scale Learning with Less RAM via Randomization proposes algorithms that reduce memory usage for machine learning models during training and prediction while maintaining prediction accuracy. It introduces a method called randomized rounding that represents model weights with fewer bits by randomly rounding values to the nearest representation. An algorithm is proposed that uses randomized rounding and adaptive learning rates on a per-coordinate basis, providing theoretical guarantees on regret bounds. Memory usage is reduced by 50% during training and 95% during prediction compared to standard floating point representation.
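One common form of randomized rounding can be sketched as below: a value is rounded to one of its two neighbouring grid points, rounding up with probability proportional to its distance from the lower point, so the rounded value is unbiased in expectation. The grid resolution `eps` and the function name are assumptions for illustration; the exact scheme in the paper may differ.

```python
import math
import random


def randomized_round(x, eps, rng):
    """Round x to a multiple of eps so that the expected result equals x."""
    lo = math.floor(x / eps) * eps
    p_up = (x - lo) / eps  # probability of rounding up to lo + eps
    return lo + eps if rng.random() < p_up else lo


# Example (hypothetical values): store 0.3 on a grid with spacing 0.25.
rng = random.Random(0)
samples = [randomized_round(0.3, 0.25, rng) for _ in range(20000)]
mean_value = sum(samples) / len(samples)
```

Each sample is either 0.25 or 0.5, while the average stays close to the original 0.3, which is why a coarse representation need not bias the learned weights.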
This document discusses various methods for calculating Wasserstein distance between probability distributions, including:
- Sliced Wasserstein distance, which projects distributions onto lower-dimensional spaces to enable efficient 1D optimal transport calculations.
- Max-sliced Wasserstein distance, which focuses sampling on the most informative projection directions.
- Generalized sliced Wasserstein distance, which uses more flexible projection functions than simple slicing, like the Radon transform.
- Augmented sliced Wasserstein distance, which applies a learned transformation to distributions before projecting, allowing more expressive matching between distributions.
These sliced/generalized Wasserstein distances have been used as loss functions for generative models with promising
- The document discusses methods for determining when to stop sampling in Monte Carlo integration to achieve a desired error tolerance.
- For independent and identically distributed (IID) sampling, the central limit theorem can be used to determine the necessary sample size based on the variance of the integrand.
- Quasi-Monte Carlo sampling can achieve faster convergence rates by using low-discrepancy point sets that more uniformly sample the domain. The error can be analyzed in the frequency domain based on the decay of the true Fourier coefficients.
- Bayesian cubature methods model the integrand as a Gaussian process, allowing inference of hyperparameters from sample points to improve integration accuracy.
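For the IID case, the central-limit-theorem argument above leads to the usual sample-size rule n >= (z * sigma / tol)^2. The sketch below assumes the standard deviation `sigma` of the integrand is known, which is rarely true in practice (it is normally estimated from a pilot sample); the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def iid_sample_size(sigma, tol, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= tol under the CLT approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    return math.ceil((z * sigma / tol) ** 2)


# Example (hypothetical values): sigma = 2, absolute tolerance 0.1.
n_needed = iid_sample_size(sigma=2.0, tol=0.1)
```

Halving the tolerance roughly quadruples the required sample size, which is the slow O(n^(-1/2)) rate that quasi-Monte Carlo methods aim to improve on.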
Hierarchical matrix techniques for maximum likelihood covariance estimation - Alexander Litvinenko
1. We apply hierarchical matrix techniques (HLIB, hlibpro) to approximate huge covariance matrices. We are able to work with 250K-350K non-regular grid nodes.
2. We maximize a non-linear, non-convex Gaussian log-likelihood function to identify hyper-parameters of covariance.
Delayed acceptance for Metropolis-Hastings algorithms - Christian Robert
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
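The sequential evaluation of the acceptance probability can be sketched for a target that factorizes into a cheap and an expensive part, with a symmetric random-walk proposal. The particular split of a standard normal target used here, and all function names, are assumptions chosen so the example is self-contained; it is not the mixture example from the document.

```python
import math
import random


def log_cheap(x):
    """Hypothetical cheap factor: unnormalized N(0, 1.5^2) log-density."""
    return -x * x / 4.5


def log_expensive(x):
    """Remaining factor, chosen so both factors multiply to exp(-x^2 / 2)."""
    return -x * x / 2.0 - log_cheap(x)


def delayed_acceptance_mh(n_steps, step, rng):
    x = 0.0
    chain = []
    expensive_evals = 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric proposal
        # Stage 1: screen the proposal with the cheap factor only.
        u1 = 1.0 - rng.random()  # in (0, 1], keeps log() defined
        if math.log(u1) < log_cheap(y) - log_cheap(x):
            # Stage 2: evaluate the expensive factor only for survivors.
            expensive_evals += 1
            u2 = 1.0 - rng.random()
            if math.log(u2) < log_expensive(y) - log_expensive(x):
                x = y
        chain.append(x)
    return chain, expensive_evals


chain, expensive_evals = delayed_acceptance_mh(50000, 1.0, random.Random(1))
chain_mean = sum(chain) / len(chain)
chain_var = sum((v - chain_mean) ** 2 for v in chain) / len(chain)
```

Proposals rejected at stage 1 never touch `log_expensive`, so `expensive_evals` is smaller than the number of iterations; the price is a lower overall acceptance rate than standard Metropolis-Hastings.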
To describe the dynamics taking place in networks that structurally change over time, we propose an approach to search for attributes whose value changes impact the topology of the graph. In several applications, it appears that the variations of a group of attributes are often followed by some structural changes in the graph that one may assume they generate. We formalize the triggering pattern discovery problem as a method jointly rooted in sequence mining and graph analysis. We apply our approach on three real-world dynamic graphs of different natures - a co-authoring network, an airline network, and a social bookmarking system - assessing the relevancy of the triggering pattern mining approach.
Low-rank matrix approximations in Python by Christian Thurau PyData 2014 - PyData
Low-rank approximations of data matrices have become an important tool in machine learning and data mining. They allow for embedding high dimensional data in lower dimensional spaces and can therefore mitigate effects due to noise, uncover latent relations, or facilitate further processing. These properties have been proven successful in many application areas such as bio-informatics, computer vision, text processing, recommender systems, social network analysis, among others. Present day technologies are characterized by exponentially growing amounts of data. Recent advances in sensor technology, internet applications, and communication networks call for methods that scale to very large and/or growing data matrices. In this talk, we will describe how to efficiently analyze data by means of matrix factorization using the Python Matrix Factorization Toolbox (PyMF) and HDF5. We will briefly cover common methods such as k-means clustering, PCA, or Archetypal Analysis which can be easily cast as a matrix decomposition, and explain their usefulness for everyday data analysis tasks.
The document discusses algorithms for solving various optimization problems related to knapsack problems and scheduling problems. It begins by describing an efficient linear-time algorithm to find the largest subrectangle of 1s in a binary matrix using dynamic programming. It then discusses improvements to the space complexity of the 0/1 knapsack problem and algorithms for variants where items have unlimited quantities or values. Finally, it proposes algorithms for problems involving scheduling jobs on a single machine to maximize profit while meeting deadlines and partitioning a list into subsets with minimal sum difference.
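A standard way to reduce the space used by the 0/1 knapsack dynamic program is to keep a single array indexed by capacity and update it from high capacity to low. The sketch below shows that textbook technique; it is an assumption that this is the improvement the document has in mind, and the item data are hypothetical.

```python
def knapsack_01(values, weights, capacity):
    """Maximum total value with O(capacity) memory instead of O(n * capacity)."""
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Example (hypothetical items).
best_value = knapsack_01([60, 100, 120], [10, 20, 30], 50)
```

Iterating the inner loop upwards instead would allow an item to be reused, which turns the same code into the unlimited-quantity variant.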
My PhD talk "Application of H-matrices for computing partial inverse" - Alexander Litvinenko
This document describes a hierarchical domain decomposition (HDD) method for solving stochastic elliptic boundary value problems with oscillatory or jumping coefficients. HDD constructs mappings between boundary and interface values that allow the solution to be computed locally in each subdomain. These mappings are represented as H-matrices to reduce computational costs. The total storage cost of HDD is O(k n_h log^2 n_h) and the complexity is O(k^2 n_h log^3 n_h), where n_h is the number of degrees of freedom, k is the H-matrix rank, and h is the mesh size. HDD can also be used to compute solutions when the right-hand side is represented on a coarser grid.
Convex Optimization Modelling with CVXOPT - andrewmart11
An introduction to convex optimization modelling using cvxopt in an IPython environment. The facility location problem is used as an example to demonstrate modelling in cvxopt.
Master thesis job shop generic time lag max plus - Siddhartha Verma
The document discusses solving the job shop scheduling problem with time lags using max-plus algebra. It begins with motivations and an outline. It then provides background on max-plus algebra, the job shop problem with time lags, and previous related works. The document models the job shop problem with time lags constraints using max-plus algebra, representing the constraints as potential inequalities and disjunctive constraints. It presents case studies, including a naval industry example of a 3x3 job shop problem with time lags, modeled with max-plus matrices. The document analyzes feasibility of schedules using spectral theory and max-plus algebra properties.
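In max-plus algebra, "addition" is the maximum and "multiplication" is ordinary addition, with minus infinity as the neutral element for the maximum. The sketch below shows a max-plus matrix-vector product, the basic operation behind recursions of the form x(k+1) = A (x) x(k). The matrix and vector are hypothetical and are not the thesis's job shop model.

```python
NEG_INF = float("-inf")  # neutral element of max-plus "addition"


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max over j of (A[i][j] + x[j])."""
    return [max(a_ij + x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical 2x2 system: A[i][j] is the delay from event j to event i,
# and NEG_INF means "no precedence constraint".
A = [[2.0, NEG_INF],
     [3.0, 1.0]]
x0 = [0.0, 0.0]
x1 = maxplus_matvec(A, x0)
x2 = maxplus_matvec(A, x1)
```

Each entry of the result is the earliest time an event can start once all of its predecessors, shifted by their delays, have completed.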
Low-rank tensor methods for stochastic forward and inverse problems - Alexander Litvinenko
The document discusses low-rank tensor methods for solving partial differential equations (PDEs) with uncertain coefficients. It covers two parts: (1) using the stochastic Galerkin method to discretize an elliptic PDE with an uncertain diffusion coefficient represented by tensors, and (2) efficiently computing quantities of interest, such as the maximum value, from the tensor solution. Specifically, it describes representing the diffusion coefficient, forcing term, and solution of the discretized PDE using tensors, and computing the maximum value and corresponding indices by solving an eigenvalue problem involving the tensor solution.
In this work we discuss how to compute the KLE with complexity O(k n log n), how to approximate large covariance matrices in H-matrix format, and how to use the Lanczos method.
We solve an elliptic PDE with uncertain coefficients. We apply the Karhunen-Loève expansion to separate the stochastic part from the spatial part. The corresponding eigenvalue problem with the covariance function is solved via the hierarchical matrix technique. We also demonstrate how low-rank tensor methods can be applied to high-dimensional problems (e.g., to compute higher-order statistical moments). We provide explicit formulas to compute statistical moments of order k with linear complexity.
This document discusses Bayesian inference on mixture models. It covers several key topics:
1. Density approximation and consistency results for mixtures as a way to approximate unknown distributions.
2. The "scarcity phenomenon" where the posterior probabilities of most component allocations in mixture models are zero, concentrating on just a few high probability allocations.
3. Challenges with Bayesian inference for mixtures, including identifiability issues, label switching, and complex combinatorial calculations required to integrate over all possible component allocations.
The asynchronous parallel algorithms are developed to solve massive optimization problems in a distributed data system, which can be run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully implemented to solve a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays, and cannot explain the convergence and speedup properties of such algorithms. In this talk we will give an overview on distributed optimization, and discuss some new theoretical results on the convergence of asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data will be used to demonstrate the practical implication of these theoretical results.
Solving inverse problems via non-linear Bayesian Update of PCE coefficientsAlexander Litvinenko
We derive a non-linear approximation of the Bayesian update for PCE coefficients. This avoids using Markov chain Monte Carlo (MCMC) to compute the posterior.
The document discusses a robust hp-adaptation method for discontinuous Galerkin discretizations applied to aerodynamic flows. It presents a constrained pseudo-transient continuation approach to enforce physical realizability constraints during the solution process. It also describes output-based error estimation techniques to drive anisotropic hp-mesh adaptation and identify regions important for accurate output prediction. The goal is to obtain quantitatively reliable computational fluid dynamics solutions on coarse grids for engineering analysis applications.
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Alexander Litvinenko
1. Solved a time-dependent density-driven flow problem with uncertain porosity and permeability in 2D and 3D.
2. Computed propagation of uncertainties in porosity into the mass fraction.
3. Computed the mean, variance, exceedance probabilities, quantiles, risks.
4. QoIs such as the number of fingers, their size, shape, and propagation time can be unstable.
5. For moderate perturbations, our gPCE surrogate results are similar to qMC results.
6. Used a highly scalable solver on up to 800 computing nodes.
MVPA with SpaceNet: sparse structured priorsElvis DOHMATOB
The GraphNet (aka S-Lasso), as well as other “sparsity + structure” priors like TV (Total-Variation), TV-L1, etc., are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) Early-stopping, whereby one halts the optimization process when the test score (performance on leftout data) for the internal cross-validation for model-selection stops improving, and (b) univariate feature-screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1, etc.
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Alexander Litvinenko
Just some ideas how low-rank matrices/tensors can be useful in spatial and environmental statistics, where one usually has to deal with very large data
Identification of the Mathematical Models of Complex Relaxation Processes in ...Vladimir Bakhrushin
The approach to solving the problem of complex relaxation spectra is presented.
Presentation for the XI International Conference on Defect interaction and anelastic phenomena in solids. Tula, 2007.
This document summarizes a thesis on numerical methods for stochastic systems subject to generalized Levy noise. It includes:
1) Motivation for studying such systems from both mathematical and applicational perspectives, such as in mathematical finance and chaotic flows.
2) An introduction to Levy processes and the probability collocation method (PCM) for uncertainty quantification (UQ).
3) Details on improving PCM through a multi-element approach and constructing orthogonal polynomials for discrete measures.
This document summarizes a talk given by Heiko Strathmann on using partial posterior paths to estimate expectations from large datasets without full posterior simulation. The key ideas are:
1. Construct a path of "partial posteriors" by sequentially adding mini-batches of data and computing expectations over these posteriors.
2. "Debias" the path of expectations to obtain an unbiased estimator of the true posterior expectation using a technique from stochastic optimization literature.
3. This approach allows estimating posterior expectations with sub-linear computational cost in the number of data points, without requiring full posterior simulation or imposing restrictions on the likelihood.
Experiments on synthetic and real-world examples demonstrate competitive performance versus standard M
In this work, we propose to apply trust region optimization to deep reinforcement
learning using a recently proposed Kronecker-factored approximation to
the curvature. We extend the framework of natural policy gradient and propose
to optimize both the actor and the critic using Kronecker-factored approximate
curvature (K-FAC) with trust region; hence we call our method Actor Critic using
Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this
is the first scalable trust region natural gradient method for actor-critic methods.
It is also a method that learns non-trivial tasks in continuous control as well as
discrete control policies directly from raw pixel inputs. We tested our approach
across discrete domains in Atari games as well as continuous domains in the MuJoCo
environment. With the proposed methods, we are able to achieve higher
rewards and a 2- to 3-fold improvement in sample efficiency on average, compared
to previous state-of-the-art on-policy actor-critic methods. Code is available at
https://github.com/openai/baselines.
Similar to Developing fast low-rank tensor methods for solving PDEs with uncertain coefficients (20)
Poster to be presented at the Stochastic Numerics and Statistical Learning: Theory and Applications Workshop 2024, KAUST, Saudi Arabia, https://cemse.kaust.edu.sa/stochnum/events/event/snsl-workshop-2024.
In this work we have considered a setting that mimics the Henry problem \cite{Simpson2003,Simpson04_Henry}, modeling seawater intrusion into a 2D coastal aquifer. The pure water recharge from the ``land side'' resists the salinisation of the aquifer due to the influx of saline water through the ``sea side'', thereby achieving some equilibrium in the salt concentration. In our setting, following \cite{GRILLO2010}, we consider a fracture on the sea side that significantly increases the permeability of the porous medium.
The flow and transport essentially depend on the geological parameters of the porous medium, including the fracture. We investigated the effects of various uncertainties on saltwater intrusion. We assumed uncertainties in the fracture width, the porosity of the bulk medium, its permeability and the pure water recharge from the land side. The porosity and permeability were modeled by random fields, the recharge by a random but periodic intensity and the thickness by a random variable. We calculated the mean and variance of the salt mass fraction, which is also uncertain.
The main question we investigated in this work was how well the MLMC method can be used to compute statistics of different QoIs. We found that the answer depends on the choice of the QoI. First, not every QoI requires a hierarchy of meshes and MLMC. Second, MLMC requires stable convergence rates for $\EXP{g_{\ell} - g_{\ell-1}}$ and $\Var{g_{\ell} - g_{\ell-1}}$. These rates should be independent of $\ell$. If these convergence rates vary for different $\ell$, then it will be hard to estimate $L$ and $m_{\ell}$, and MLMC will either not work or be suboptimal. We were not able to get stable convergence rates for all levels $\ell=1,\ldots,5$ when the QoI was an integral as in \eqref{eq:integral_box}. We found that the rate $\alpha$ for $\ell=1,\ldots,4$ differed from the rate for $\ell=5$. Further investigation is needed to find the reason for this. Another difficulty is the dependence on time, i.e. the number of levels $L$ and the number of samples $m_{\ell}$ depend on $t$. At the beginning the variability is small, then it increases, and after the process of mixing salt and fresh water has stopped, the variance decreases again.
The number of random samples required at each level was estimated by calculating the decay of the variances and the computational cost for each level. These estimates depend on the minimisation function in the MLMC algorithm.
To achieve the efficiency of the MLMC approach presented in this work, it is essential that the complexity of the numerical solution of each random realisation is proportional to the number of grid vertices on the grid levels.
We investigated the applicability and efficiency of the MLMC approach to the Henry-like problem with uncertain porosity, permeability and recharge. These uncertain parameters were modelled by random fields with three independent random variables. Permeability is a function of porosity. Both functions are time-dependent, have multi-scale behaviour and are defined for two layers. The numerical solution for each random realisation was obtained using the well-known ug4 parallel multigrid solver. The number of random samples required at each level was estimated by calculating the decay of the variances and the computational cost for each level.
The MLMC method was used to compute the expected value and variance of several QoIs, such as the solution at a few preselected points $(t,\bx)$, the solution integrated over a small subdomain, and the time evolution of the freshwater integral. We have found that some QoIs require only 2-3 mesh levels and samples from finer meshes would not significantly improve the result. Other QoIs require more grid levels.
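The telescoping idea behind the MLMC estimator described above can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration and is unrelated to the Henry problem or the ug4 solver: the QoI `g` is a trapezoidal-rule approximation of the integral of exp(omega x) over [0, 1] with 2^level intervals, omega is uniform on [0, 1], and the sample counts per level are fixed by hand instead of being estimated from variance decay and cost.

```python
import math
import random

def g(level, omega):
    """Toy QoI on mesh `level`: trapezoidal rule with 2**level intervals
    for the integral of exp(omega * x) over [0, 1]."""
    n = 2 ** level
    h = 1.0 / n
    total = 0.5 * (1.0 + math.exp(omega))
    total += sum(math.exp(omega * j * h) for j in range(1, n))
    return h * total

def mlmc_mean(samples_per_level, rng):
    """Estimate E[g_L] as E[g_0] + sum over levels of E[g_l - g_{l-1}]."""
    estimate = 0.0
    for level, m in enumerate(samples_per_level):
        acc = 0.0
        for _ in range(m):
            omega = rng.random()  # the same random input is used on both meshes
            fine = g(level, omega)
            coarse = g(level - 1, omega) if level > 0 else 0.0
            acc += fine - coarse
        estimate += acc / m  # sample mean of the level correction
    return estimate

rng = random.Random(0)
# Hand-picked sample counts (an assumption): many coarse samples, few fine ones.
estimate = mlmc_mean([4000, 1000, 250, 60], rng)
print(estimate)
```

Because the corrections between neighbouring levels have a small variance, few samples are needed on the fine (expensive) meshes, which is the source of the savings reported above.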
1. Investigated the efficiency of MLMC for the Henry problem with uncertain porosity, permeability, and recharge.
2. Uncertainties are modeled by random fields.
3. MLMC could be much faster than MC: 3200 times faster!
4. The time dependence is challenging.
Remarks:
1. Check if MLMC is needed.
2. The optimal number of samples depends on the point $(t,\bx)$.
3. An advanced MLMC may give better estimates of $L$ and $m_{\ell}$.
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityAlexander Litvinenko
In this work, we solved the density driven groundwater flow problem with uncertain porosity and permeability. An accurate solution of this time-dependent and non-linear problem is impossible because of the presence of natural uncertainties in the reservoir such as porosity and permeability.
Therefore, we estimated the mean value and the variance of the solution, as well as the propagation of uncertainties from the random input parameters to the solution.
We started by defining the Elder-like problem. Then we described the multi-variate polynomial approximation (\gPC) approach and used it to estimate the required statistics of the mass fraction.
Utilizing the \gPC method allowed us to reduce the computational cost compared to the classical quasi Monte Carlo method.
\gPC assumes that the output function $\sol(t,\bx,\btheta)$ is square-integrable and smooth w.r.t. the uncertain input variables $\btheta$.
Many factors, such as non-linearity, multiple solutions, multiple stationary states, time dependence and complicated solvers, make the investigation of the convergence of the \gPC method a non-trivial task.
We used an easy-to-implement, but only sub-optimal \gPC technique to quantify the uncertainty. For example, it is known that Runge's phenomenon appears when the degree of global polynomials (Hermite, Lagrange and similar) is increased. Here, local polynomials, splines or their mixtures would probably be better. Additionally, we used an easy-to-parallelise quadrature rule, which was also only suboptimal. For instance, adaptive choice of sparse grid (or collocation) points \cite{ConradMarzouk13,nobile-sg-mc-2015,Sudret_sparsePCE,CONSTANTINE12,crestaux2009polynomial} would be better, but we were limited by the usage of parallel methods. Adaptive quadrature rules are not (so well) parallelisable. In conclusion, we can report that: a) we developed a highly parallel method to quantify uncertainty in the Elder-like problem; b) with the \gPC of degree 4 we can achieve similar results as with the \QMC method.
In the numerical section we considered two different aquifers - a solid parallelepiped and a solid elliptic cylinder. One of our goals was to see how the domain geometry influences the formation, the number and the shape of fingers.
Since the considered problem is nonlinear, a high variance in the porosity may result in totally different solutions; for instance, the number of fingers, their intensity and shape, the propagation time, and the velocity may vary considerably.
The number of cells in the presented experiments varied from $241{,}152$ to $15{,}433{,}728$ for the cylindrical domain and from $524{,}288$ to $4{,}194{,}304$ for the parallelepiped. The maximal number of parallel processing units was $600\times 32$, where $600$ is the number of parallel nodes and $32$ is the number of computing cores on each node. The total computing time varied from 2 hours for the coarse mesh to 24 hours for the finest mesh.
Saltwater intrusion occurs when sea levels rise and saltwater moves onto the land. Usually, this occurs during storms, high tides, droughts, or when saltwater penetrates freshwater aquifers and raises the groundwater table. Since groundwater is an essential nutrition and irrigation resource, its salinization may lead to catastrophic consequences. Many acres of farmland may be lost because they can become too wet or salty to grow crops. Therefore, accurate modeling of different scenarios of saline flow is essential to help farmers and researchers develop strategies to improve the soil quality and decrease saltwater intrusion effects.
Saline flow is density-driven and described by a system of time-dependent nonlinear partial differential equations (PDEs). It features convection dominance and can demonstrate very complicated behavior.
As a specific model, we consider a Henry-like problem with uncertain permeability and porosity.
These parameters may strongly affect the flow and transport of salt.
We consider a class of density-driven flow problems. We are particularly interested in the problem of the salinization of coastal aquifers. We consider the Henry saltwater intrusion problem with uncertain porosity, permeability, and recharge parameters as a test case.
The reason for the presence of uncertainties is the lack of knowledge, inaccurate measurements, and inability to measure parameters at each spatial or time location. This problem is nonlinear and time-dependent. The solution is the salt mass fraction, which is uncertain and changes in time. Uncertainties in porosity, permeability, recharge, and mass fraction are modeled using random fields. This work investigates the applicability of the well-known multilevel Monte Carlo (MLMC) method for such problems. The MLMC method can reduce the total computational and storage costs. Moreover, the MLMC method runs multiple scenarios on different spatial and time meshes and then estimates the mean value of the mass fraction.
The parallelization is performed in both the physical space and stochastic space. To solve every deterministic scenario, we run the parallel multigrid solver ug4 in a black-box fashion.
We use the solution obtained from the quasi-Monte Carlo method as a reference solution.
We investigated the applicability and efficiency of the MLMC approach for the Henry-like problem with uncertain porosity, permeability, and recharge. These uncertain parameters were modeled by random fields with three independent random variables. The numerical solution for each random realization was obtained using the well-known ug4 parallel multigrid solver. The number of required random samples on each level was estimated by computing the decay of the variances and computational costs for each level. We also computed the expected value and variance of the mass fraction in the whole domain, the evolution of the pdfs, the solutions at a few preselected points $(t,\bx)$, and the time evolution of the freshwater integral value. We have found that some QoIs require only 2-3 of the coarsest mesh levels, and samples from finer meshes would not significantly improve the result. Note that a different type of porosity may lead to a different conclusion.
The results show that the MLMC method is faster than the QMC method at the finest mesh. Thus, sampling at different mesh levels makes sense and helps to reduce the overall computational cost.
Here the interest is mainly to compute characterisations like the entropy,
the Kullback-Leibler divergence, more general $f$-divergences, or other such characteristics based on
the probability density. The density is often not available directly,
and it is a computational challenge to just represent it in a numerically
feasible fashion in case the dimension is even moderately large. It
is an even stronger numerical challenge to then actually compute said characteristics
in the high-dimensional case.
The task considered here was the numerical computation of characterising statistics of
high-dimensional pdfs, as well as their divergences and distances,
where the pdf in the numerical implementation was assumed discretised on some regular grid.
We have demonstrated that high-dimensional pdfs,
pcfs, and some functions of them
can be approximated and represented in a low-rank tensor data format.
Utilisation of low-rank tensor techniques helps to reduce the computational complexity
and the storage cost from exponential $\C{O}(n^d)$ to linear in the dimension $d$, e.g.
$\C{O}(d n r^2)$ for the TT format. Here $n$ is the number of discretisation
points in one direction, $r \ll n$ is the maximal tensor rank, and $d$ the problem dimension.
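The storage reduction quoted above can be made concrete with a small hand-built tensor-train. This is a minimal sketch under stated assumptions: the tensor A[i_1,...,i_d] = i_1 + ... + i_d is chosen only because its exact TT cores of rank 2 are easy to write down, and the helper names (`tt_cores_for_sum`, `tt_entry`, `tt_storage`) are hypothetical, not part of any tensor library.

```python
def tt_cores_for_sum(n, d):
    """Rank-2 TT cores of A[i_1,...,i_d] = i_1 + ... + i_d (requires d >= 2).
    Each core is a list of n matrix slices, one per index value."""
    first = [[[float(i), 1.0]] for i in range(n)]                # slices of shape 1 x 2
    middle = [[[1.0, 0.0], [float(i), 1.0]] for i in range(n)]   # slices of shape 2 x 2
    last = [[[1.0], [float(i)]] for i in range(n)]               # slices of shape 2 x 1
    return [first] + [middle] * (d - 2) + [last]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def tt_entry(cores, index):
    """Evaluate one tensor entry as a product of d small matrices."""
    result = cores[0][index[0]]
    for core, i in zip(cores[1:], index[1:]):
        result = matmul(result, core[i])
    return result[0][0]

def tt_storage(cores):
    """Number of stored floats: sum over cores of n * r_left * r_right."""
    return sum(len(core) * len(core[0]) * len(core[0][0]) for core in cores)

n, d = 10, 6
cores = tt_cores_for_sum(n, d)
print(tt_entry(cores, (1, 2, 3, 4, 5, 6)))   # entry of the tensor at one index
print(tt_storage(cores), "floats in TT format versus", n ** d, "in full format")
```

The TT storage stays below the bound d n r^2 with r = 2, while the full tensor needs n^d entries.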
This document proposes a method for weakly supervised regression on uncertain datasets. It combines graph Laplacian regularization and cluster ensemble methodology. The method solves an auxiliary minimization problem to determine the optimal solution for predicting uncertain parameters. It is tested on artificial data to predict target values using a mixture of normal distributions with labeled, inaccurately labeled, and unlabeled samples. The method is shown to outperform a simplified version by reducing mean Wasserstein distance between predicted and true values.
Computing f-Divergences and Distances of High-Dimensional Probability Density...Alexander Litvinenko
Poster presented at the Stochastic Numerics and Statistical Learning: Theory and Applications Workshop at KAUST, Saudi Arabia.
The task considered here was the numerical computation of characterising statistics of
high-dimensional pdfs, as well as their divergences and distances,
where the pdf in the numerical implementation was assumed discretised on some regular grid.
Even for moderate dimension $d$, the full storage and computation with such objects become very quickly infeasible.
We have demonstrated that high-dimensional pdfs,
pcfs, and some functions of them
can be approximated and represented in a low-rank tensor data format.
Utilisation of low-rank tensor techniques helps to reduce the computational complexity
and the storage cost from exponential $\C{O}(n^d)$ to linear in the dimension $d$, e.g.
$\C{O}(d n r^2)$ for the TT format. Here $n$ is the number of discretisation
points in one direction, $r \ll n$ is the maximal tensor rank, and $d$ the problem dimension.
The particular data format is rather unimportant,
any of the well-known tensor formats (CP, Tucker, hierarchical Tucker, tensor-train (TT)) can be used,
and we used the TT data format. Much of the presentation and in fact the central train
of discussion and thought is actually independent of the actual representation.
In the beginning it was motivated through three possible ways how one may
arrive at such a representation of the pdf. One was if the pdf was given in some approximate
analytical form, e.g. like a function tensor product of lower-dimensional pdfs with a
product measure, or from an analogous representation of the pcf and subsequent use of the
Fourier transform, or from a low-rank functional representation of a high-dimensional
RV, again via its pcf.
The theoretical underpinnings of the relation between pdfs and pcfs as well as their
properties were recalled in Section: Theory, as they are important to be preserved in the
discrete approximation. This also introduced the concepts of the convolution and of
the point-wise multiplication Hadamard algebra, concepts which become especially important if
one wants to characterise sums of independent RVs or mixture models,
a topic we did not touch on for the sake of brevity but which follows very naturally from
the developments here. Especially the Hadamard algebra is also
important for the algorithms to compute various point-wise functions in the sparse formats.
Computing f-Divergences and Distances of High-Dimensional Probability Densi...Alexander Litvinenko
Talk presented at the SIAM IS 2022 conference.
Very often, in the course of uncertainty quantification tasks or
data analysis, one has to deal with high-dimensional random variables (RVs)
(with values in $\Rd$). Just like any other RV,
a high-dimensional RV can be described by its probability density (\pdf) and/or
by the corresponding probability characteristic functions (\pcf),
or a more general representation as
a function of other, known, random variables.
Here the interest is mainly to compute characterisations like the entropy, the Kullback-Leibler divergence, or more general
$f$-divergences. These are all computed from the \pdf, which is often not available directly,
and it is a computational challenge to even represent it in a numerically
feasible fashion in case the dimension $d$ is even moderately large. It
is an even stronger numerical challenge to then actually compute said characterisations
in the high-dimensional case.
To make this task computationally feasible, we propose
to approximate the density by a low-rank tensor.
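As a one-dimensional illustration of the quantities named above, the entropy and the Kullback-Leibler divergence of pdfs discretised on a regular grid can be approximated by Riemann sums. This is a sketch under assumptions: the two Gaussian densities, the grid [-10, 11] and the step size are chosen here for illustration, and no low-rank format is used.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def entropy(p, h):
    """Riemann-sum approximation of -integral p(x) log p(x) dx."""
    return -h * sum(pi * math.log(pi) for pi in p if pi > 0.0)

def kl_divergence(p, q, h):
    """Riemann-sum approximation of integral p(x) log(p(x) / q(x)) dx."""
    return h * sum(pi * math.log(pi / qi)
                   for pi, qi in zip(p, q) if pi > 0.0 and qi > 0.0)

h = 0.01
grid = [-10.0 + h * i for i in range(2101)]      # regular grid on [-10, 11]
p = [gaussian_pdf(x, 0.0, 1.0) for x in grid]    # discretised pdf of N(0, 1)
q = [gaussian_pdf(x, 1.0, 1.0) for x in grid]    # discretised pdf of N(1, 1)

print(entropy(p, h))           # analytic value: 0.5 * log(2 * pi * e)
print(kl_divergence(p, q, h))  # analytic value for these two Gaussians: 0.5
```

In $d$ dimensions the same sums run over $n^d$ grid points, which is the cost that a low-rank tensor representation of the density is meant to avoid.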
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
This document summarizes a presentation on computing divergences and distances between high-dimensional probability density functions (pdfs) represented using tensor formats. It discusses:
1) Motivating the problem using examples from stochastic PDEs and functional representations of uncertainties.
2) Computing Kullback-Leibler divergence and other divergences when pdfs are not directly available.
3) Representing probability characteristic functions and approximating pdfs using tensor decompositions like CP and TT formats.
4) Numerical examples computing Kullback-Leibler divergence and Hellinger distance between Gaussian and alpha-stable distributions using these tensor approximations.
Identification of unknown parameters and prediction of missing values. Compar...Alexander Litvinenko
H-matrix approximation of large Mat\'{e}rn covariance matrices, Gaussian log-likelihoods.
Identifying unknown parameters and making predictions
Comparison with machine learning methods.
kNN is easy to implement and shows promising results.
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
This document describes using the Continuation Multi-Level Monte Carlo (CMLMC) method to compute electromagnetic fields scattered from dielectric objects of uncertain shapes. CMLMC optimally balances statistical and discretization errors using fewer samples on fine meshes and more on coarse meshes. The method is tested by computing scattering cross sections for randomly perturbed spheres under plane wave excitation and comparing results to the unperturbed sphere. Computational costs and errors are analyzed to demonstrate the efficiency of CMLMC for this scattering problem with uncertain geometry.
Identification of unknown parameters and prediction with hierarchical matrice...Alexander Litvinenko
We compare four numerical methods for the prediction of missing values in four different datasets.
These methods are 1) the hierarchical maximum likelihood estimation (H-MLE), and three machine learning (ML) methods, which include 2) k-nearest neighbors (kNN), 3) random forest, and 4) Deep Neural Network (DNN).
Among the ML methods, the best results (for the considered datasets) were obtained by the kNN method with three (or seven) neighbors.
On one dataset, the MLE method showed a smaller error than the kNN method, whereas, on another, the kNN method was better.
The MLE method requires a lot of linear algebra computations and works fine on almost all datasets. Its result can be improved by taking a smaller threshold and more accurate hierarchical matrix arithmetic. To our surprise, the well-known kNN method produced results similar to H-MLE and ran much faster.
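The kNN prediction of a missing value mentioned above can be sketched in a few lines. This is a generic illustration, not the code used in the study: the function name `knn_predict`, the plain averaging of neighbour values, and the tiny dataset are all assumptions.

```python
import math

def knn_predict(train_points, train_values, query, k=3):
    """Predict the value at `query` as the mean of the k nearest known values."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = order[:k]
    return sum(train_values[i] for i in nearest) / k

# Hypothetical measurements at four spatial locations.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
values = [1.0, 2.0, 3.0, 100.0]

# The three nearest neighbours of the query are the first three points.
print(knn_predict(points, values, (0.2, 0.2), k=3))
```

Unlike the H-MLE approach, this needs no covariance model or linear algebra, which is consistent with the observation above that kNN ran much faster.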
1. Motivation: why do we need low-rank tensors
2. Tensors of the second order (matrices)
3. CP, Tucker and tensor train tensor formats
4. Many classical kernels have (or can be approximated in) low-rank tensor format
5. Post processing: Computation of mean, variance, level sets, frequency
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
Computational tools for characterizing electromagnetic scattering from objects with uncertain shapes are needed in various applications ranging from remote sensing at microwave frequencies to Raman spectroscopy at optical frequencies. Often, such computational tools use the Monte Carlo (MC) method to sample a parametric space describing geometric uncertainties. For each sample, which corresponds to a realization of the geometry, a deterministic electromagnetic solver computes the scattered fields. However, for an accurate statistical characterization the number of MC samples has to be large. In this work, to address this challenge, the continuation multilevel Monte Carlo (\CMLMC) method is used together with a surface integral equation solver.
The \CMLMC method optimally balances statistical errors due to sampling of
the parametric space, and numerical errors due to the discretization of the geometry using a hierarchy of discretizations, from coarse to fine.
The number of realizations of finer discretizations can be kept low, with most samples
computed on coarser discretizations to minimize computational cost.
Consequently, the total execution time is significantly reduced, in comparison to the standard MC scheme.
Propagation of Uncertainties in Density Driven Groundwater FlowAlexander Litvinenko
Major Goal: estimate risks of the pollution in a subsurface flow.
How?: we solve density-driven groundwater flow with uncertain porosity and permeability.
We set up density-driven groundwater flow problem,
review stochastic modeling and stochastic methods, use UG4 framework (https://gcsc.uni-frankfurt.de/simulation-and-modelling/ug4),
model uncertainty in porosity and permeability,
2D and 3D numerical experiments.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Developing fast low-rank tensor methods for solving PDEs with uncertain coefficients
1. Low-rank tensors for PDEs with uncertain coefficients
Alexander Litvinenko
Center for Uncertainty Quantification
http://sri-uq.kaust.edu.sa/
Extreme Computing Research Center, KAUST
3. My interests and collaborations
4. Motivation to do Uncertainty Quantification (UQ)
Motivation: there is an urgent need to quantify and reduce the
uncertainty in multiscale-multiphysics applications.
UQ and its relevance: Nowadays computational predictions are
used in critical engineering decisions. But, how reliable are
these predictions?
Example: Saudi Aramco currently has a simulator,
GigaPOWERS, which runs with 9 billion cells. How sensitive
are these simulations w.r.t. unknown reservoir properties?
My goal is the systematic, mathematically founded development of UQ methods and low-rank algorithms relevant for applications.
5. PDE with uncertain coefficient
Consider
\[ -\operatorname{div}\bigl(\kappa(x,\omega)\,\nabla u(x,\omega)\bigr) = f(x,\omega) \quad \text{in } G \times \Omega,\ G \subset \mathbb{R}^d, \]
\[ u = 0 \quad \text{on } \partial G, \]
where $\kappa(x,\omega)$ is the uncertain diffusion coefficient.
1. Efficient Analysis of High Dimensional Data in Tensor Formats, Espig, Hackbusch, Litvinenko, Matthies, Zander, 2012.
2. Efficient low-rank approx. of the stoch. Galerkin matrix in tensor formats, Wähnert, Espig, Hackbusch, Litvinenko, Matthies, 2013.
3. PCE of random coefficients and the solution of stochastic PDEs in the Tensor Train format, Dolgov, Litvinenko, Khoromskij, Matthies, 2016.
4. Application of H-matrices for computing the KL expansion, Khoromskij, Litvinenko, Matthies, Computing 84 (1-2), 49-67, 2009.
[Figure: 50 realizations of the solution u, the mean and quantiles]
Related work by R. Scheichl, Chr. Schwab, A. Teckentrup, F. Nobile, D. Kressner,...
6. Canonical and Tucker tensor formats
[Pictures are taken from B. Khoromskij and A. Auer lecture course]
Storage: $O(n^d) \to O(dRn)$ for the canonical format and $O(R^d + dRn)$ for the Tucker format.
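7. Karhunen-Loève and Polynomial Chaos Expansions
Apply both.
Truncated Karhunen-Loève Expansion (KLE):
\[ \kappa(x,\omega) \approx \kappa_0(x) + \sum_{j=1}^{L} \kappa_j\, g_j(x)\, \xi_j(\theta(\omega)), \]
where $\theta = \theta(\omega) = (\theta_1(\omega), \theta_2(\omega), \dots)$ and
\[ \xi_j(\theta) = \frac{1}{\kappa_j} \int_G \bigl(\kappa(x,\omega) - \kappa_0(x)\bigr)\, g_j(x)\, dx. \]
Truncated Polynomial Chaos Expansion (PCE):
\[ \kappa(x,\omega) \approx \sum_{\alpha \in \mathcal{J}_{M,p}} \kappa^{(\alpha)}(x)\, H_\alpha(\theta), \qquad \xi_j(\theta) \approx \sum_{\alpha \in \mathcal{J}_{M,p}} \xi_j^{(\alpha)} H_\alpha(\theta). \]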
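A minimal numerical sketch of the truncated KLE may help here. It assumes a 1D domain G = [0, 1] on a uniform grid, a Gaussian covariance kernel with a hypothetical correlation length 0.3, a constant mean κ0 = 1 and standard normal ξ_j; none of these choices come from the slides. The eigenpairs of the discretized covariance operator play the roles of κ_j (square roots of the eigenvalues) and g_j.

```python
import numpy as np

def truncated_kle(x, cov_fn, n_terms):
    """Return (kappa_j, g) for the n_terms largest KLE modes on a uniform grid x."""
    h = x[1] - x[0]                          # grid spacing used as quadrature weight
    C = cov_fn(x[:, None], x[None, :])       # covariance matrix on the grid
    lam, vec = np.linalg.eigh(C * h)         # discrete eigenproblem of the covariance operator
    idx = np.argsort(lam)[::-1][:n_terms]    # keep the largest eigenvalues
    lam = np.clip(lam[idx], 0.0, None)
    g = vec[:, idx] / np.sqrt(h)             # eigenfunctions normalized in L2(G)
    return np.sqrt(lam), g

def kle_realization(kappa0, kappa_j, g, xi):
    """kappa(x) = kappa0 + sum_j kappa_j g_j(x) xi_j."""
    return kappa0 + g @ (kappa_j * xi)

# Hypothetical setup (assumptions, not from the slides).
x = np.linspace(0.0, 1.0, 200)
gauss_cov = lambda s, t: np.exp(-((s - t) ** 2) / 0.3**2)
kappa_j, g = truncated_kle(x, gauss_cov, n_terms=5)

rng = np.random.default_rng(0)
xi = rng.standard_normal(5)
field = kle_realization(1.0, kappa_j, g, xi)

# Project back: xi_j = (1/kappa_j) * integral of (kappa - kappa0) g_j dx.
h = x[1] - x[0]
xi_recovered = (h * g.T @ (field - 1.0)) / kappa_j
```

The projection at the end mirrors the formula for ξ_j: integrating the centered field against g_j and dividing by κ_j recovers the random coordinates.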
8. Discretization of elliptic PDE
$Ku = f$, where
\[ K := \sum_{\ell=1}^{L} K_\ell \otimes \bigotimes_{\mu=1}^{M} \Delta_{\ell\mu}, \qquad K_\ell \in \mathbb{R}^{N \times N},\ \Delta_{\ell\mu} \in \mathbb{R}^{R_\mu \times R_\mu}, \]
\[ u := \sum_{j=1}^{r} u_j \otimes \bigotimes_{\mu=1}^{M} u_{j\mu}, \qquad u_j \in \mathbb{R}^{N},\ u_{j\mu} \in \mathbb{R}^{R_\mu}, \]
\[ f := \sum_{k=1}^{R} f_k \otimes \bigotimes_{\mu=1}^{M} g_{k\mu}, \qquad f_k \in \mathbb{R}^{N},\ g_{k\mu} \in \mathbb{R}^{R_\mu}. \]
Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats, Wähnert, Espig, Hackbusch, Litvinenko, Matthies, 2013.
In [2] we analyzed tensor ranks (compression properties) of the stochastic Galerkin operator K.
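The practical benefit of this format is that K can be applied to u factor by factor, without ever forming the full Kronecker products. The sketch below illustrates this with tiny, hypothetical sizes and random factors (all names and dimensions are assumptions for illustration); it relies only on the identity (A ⊗ B)(a ⊗ b) = (Aa) ⊗ (Bb).

```python
import numpy as np

def kron_all(factors):
    """Kronecker product of a list of vectors or of matrices, left to right."""
    out = np.ones(1) if factors[0].ndim == 1 else np.eye(1)
    for f in factors:
        out = np.kron(out, f)
    return out

def apply_operator(op_terms, vec_terms):
    """Apply sum_l (K_l x D_l1 x ...) to sum_j (u_j x u_j1 x ...) term by term.

    The result is again a sum of elementary tensors, with L * r terms.
    """
    return [[A @ v for A, v in zip(mats, vecs)]
            for mats in op_terms for vecs in vec_terms]

rng = np.random.default_rng(0)
N, R_mu, M, L, r = 4, 3, 2, 2, 2   # tiny hypothetical sizes

op_terms = [[rng.standard_normal((N, N))]
            + [rng.standard_normal((R_mu, R_mu)) for _ in range(M)]
            for _ in range(L)]
vec_terms = [[rng.standard_normal(N)]
             + [rng.standard_normal(R_mu) for _ in range(M)]
             for _ in range(r)]

Ku_terms = apply_operator(op_terms, vec_terms)

# Dense reference, feasible only because the sizes are tiny.
K_full = sum(kron_all(t) for t in op_terms)
u_full = sum(kron_all(t) for t in vec_terms)
Ku_full = sum(kron_all(t) for t in Ku_terms)
```

The representation rank of the product grows to L·r, which is why iterative solvers in such formats need a rank truncation step after each operator application.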
9. Numerical Experiments
2D L-shape domain, N = 557 dofs.
Total stochastic dimension is $M_u = M_k + M_f = 20$, there are $|\mathcal{J}_{M,p}| = 231$ PCE coefficients
\[ u = \sum_{j=1}^{231} u_{j,0} \otimes \bigotimes_{\mu=1}^{20} u_{j\mu} \in \mathbb{R}^{557} \otimes \bigotimes_{\mu=1}^{20} \mathbb{R}^{3}. \]
Tensor u has $3^{20} \cdot 557 \approx 2 \cdot 10^{12}$ entries ≈ 16 TB of memory.
Instead we store only $231 \cdot (557 + 20 \cdot 3) \approx 143000$ entries ≈ 1.14 MB.
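The storage figures can be checked with a few lines of arithmetic. The sketch assumes 8-byte double precision entries and a total polynomial degree p = 2, which is an assumption chosen because it reproduces the 231 PCE coefficients for M = 20.

```python
from math import comb

N = 557          # spatial degrees of freedom
M = 20           # stochastic dimension
n_mu = 3         # size of each stochastic mode
p = 2            # assumed total polynomial degree
bytes_per_entry = 8

n_pce = comb(M + p, p)                     # number of multi-indices of degree <= p
full_entries = n_mu**M * N                 # full tensor
cp_entries = n_pce * (N + M * n_mu)        # canonical format with 231 terms

full_terabytes = full_entries * bytes_per_entry / 1e12
cp_megabytes = cp_entries * bytes_per_entry / 1e6

print(n_pce, full_entries, cp_entries)
print(f"{full_terabytes:.1f} TB vs {cp_megabytes:.2f} MB")
```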
10. Level sets
Now we compute level sets
\[ \{ u_{\mathbf{i}} : u_{\mathbf{i}} > b \cdot \max_{\mathbf{i}} u_{\mathbf{i}} \}, \qquad \mathbf{i} := (i_1, \dots, i_{M+1}), \]
for $b \in \{0.2, 0.4, 0.6, 0.8\}$.
The computing time for each b was 10 minutes.
11. Part II: Developing a cheap Bayesian update surrogate
1. Rosic, Litvinenko, Pajonk, Matthies, J. Comp. Ph. 231 (17), 5761-5787, 2013
2. Inverse Problems in a Bayesian Setting, Matthies, Zander, Pajonk, Rosic, Litvinenko, Comp. Meth. for Solids and Fluids Multiscale Analysis, 2016
Related work by A. Stuart, Chr. Schwab, A. El Sheikh, Y.
Marzouk, H. Najm, O. Ernst
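12. Numerical computation of Bayesian Update surrogate
Notation: $\hat{y}$: measurements from engineers, $y(\xi)$: forecast from the simulator, $\varepsilon(\omega)$: the Gaussian noise.
Look for $\varphi$ such that $q(\xi) = \varphi(z(\xi))$, $z(\xi) = y(\xi) + \varepsilon(\omega)$:
\[ \varphi \approx \tilde{\varphi} = \sum_{\alpha \in \mathcal{J}_p} \varphi_\alpha \Phi_\alpha(z(\xi)) \]
and minimize $\| q(\xi) - \tilde{\varphi}(z(\xi)) \|^2_{L_2}$, where $\Phi_\alpha$ are known polynomials (e.g. Hermite).
Taking derivatives with respect to $\varphi_\alpha$:
\[ \frac{\partial}{\partial \varphi_\alpha} \bigl\langle q(\xi) - \tilde{\varphi}(z(\xi)),\ q(\xi) - \tilde{\varphi}(z(\xi)) \bigr\rangle = 0 \quad \forall \alpha \in \mathcal{J}_p. \]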
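13. Numerical computation of NLBU
\begin{align*}
&\frac{\partial}{\partial \varphi_\alpha} E\Bigl[ q^2(\xi) - 2 \sum_{\beta \in \mathcal{J}} q\, \varphi_\beta \Phi_\beta(z) + \sum_{\beta,\gamma \in \mathcal{J}} \varphi_\beta \varphi_\gamma \Phi_\beta(z) \Phi_\gamma(z) \Bigr] \\
&\quad = 2 E\Bigl[ -q\, \Phi_\alpha(z) + \sum_{\beta \in \mathcal{J}} \varphi_\beta \Phi_\beta(z) \Phi_\alpha(z) \Bigr] \\
&\quad = 2 \Bigl( \sum_{\beta \in \mathcal{J}} E[\Phi_\beta(z) \Phi_\alpha(z)]\, \varphi_\beta - E[q\, \Phi_\alpha(z)] \Bigr) = 0 \quad \forall \alpha \in \mathcal{J}.
\end{align*}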
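14. Numerical computation of NLBU
Now, rewriting the last sum in matrix form, we obtain the linear system of equations (with matrix A) to compute the coefficients $\varphi_\beta$:
\[ \begin{pmatrix} \ddots & \vdots & \\ \cdots & E[\Phi_\alpha(z(\xi)) \Phi_\beta(z(\xi))] & \cdots \\ & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \vdots \\ \varphi_\beta \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ E[q(\xi) \Phi_\alpha(z(\xi))] \\ \vdots \end{pmatrix}, \]
where $\alpha, \beta \in \mathcal{J}$, and A is of size $|\mathcal{J}| \times |\mathcal{J}|$.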
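As an illustration, the system can be assembled from samples and solved directly. The sketch below assumes scalar q and z, probabilists' Hermite polynomials He_α as the basis Φ_α, and Monte Carlo averages in place of the expectations; the slides do not state how the expectations are evaluated, so this is only one possible choice.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def fit_update_surrogate(q_samples, z_samples, degree):
    """Solve A phi = b with A[a, b] ~ E[Phi_a(z) Phi_b(z)] and b[a] ~ E[q Phi_a(z)]."""
    Phi = hermevander(z_samples, degree)        # column a holds He_a(z) per sample
    n = len(z_samples)
    A = Phi.T @ Phi / n                         # sample estimate of the Gram matrix
    rhs = Phi.T @ q_samples / n                 # sample estimate of the right-hand side
    return np.linalg.solve(A, rhs)

# Synthetic check (assumption): q is an exact degree-2 polynomial of z.
rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
q = 1.0 + 2.0 * z + 0.5 * (z**2 - 1.0)          # = 1*He_0 + 2*He_1 + 0.5*He_2
phi = fit_update_surrogate(q, z, degree=2)
```

Because q lies exactly in the span of the basis here, the solve recovers the coefficients (1, 2, 0.5) up to rounding; for a real forecast model the result is the best polynomial approximation in the mean-square sense.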
15. Numerical computation of NLBU
Finally, the assimilated parameter $q_a$ will be
\[ q_a = q_f + \tilde{\varphi}(\hat{y}) - \tilde{\varphi}(z), \tag{1} \]
\[ z(\xi) = y(\xi) + \varepsilon(\omega), \qquad \tilde{\varphi} = \sum_{\beta \in \mathcal{J}_p} \varphi_\beta \Phi_\beta(z(\xi)). \]
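The update formula (1) can be tried on a toy problem. The sketch assumes a scalar standard normal prior, an identity forward model y = q, Gaussian noise with a hypothetical standard deviation 0.5, a linear surrogate (degree 1) and a sample-based fit; all of these are illustrative assumptions rather than the setup used in the talk.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander, hermeval

rng = np.random.default_rng(1)
n = 200_000
sigma = 0.5                                   # assumed noise standard deviation

q_f = rng.standard_normal(n)                  # samples of the prior (forecast) parameter
y = q_f                                       # identity forward model (assumption)
z = y + sigma * rng.standard_normal(n)        # z = y + eps

# Fit the linear surrogate phi~(z) = phi_0 He_0(z) + phi_1 He_1(z).
Phi = hermevander(z, 1)
coeffs = np.linalg.solve(Phi.T @ Phi, Phi.T @ q_f)

# Assimilate one hypothetical measurement y_hat via q_a = q_f + phi~(y_hat) - phi~(z).
y_hat = 1.2
q_a = q_f + hermeval(y_hat, coeffs) - hermeval(z, coeffs)

print(q_a.mean(), q_a.var())
```

In this linear-Gaussian case the sample mean and variance of q_a should approach ŷ/(1+σ²) and σ²/(1+σ²), which is the classical Kalman result; higher-degree surrogates extend the same recipe to non-linear updates.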
16. Example: 1D elliptic PDE with uncertain coeffs
\[ -\nabla \cdot \bigl(\kappa(x,\xi)\, \nabla u(x,\xi)\bigr) = f(x,\xi), \qquad x \in [0,1], \]
+ Dirichlet random b.c. g(0, ξ) and g(1, ξ).
3 measurements: u(0.3) = 22, s.d. 0.2; u(0.5) = 28, s.d. 0.3; u(0.8) = 18, s.d. 0.3.
κ(x, ξ): N = 100 dofs, M = 5, number of KLE terms 35, beta distribution for κ, Gaussian $\mathrm{cov}_\kappa$, cov. length 0.1, multi-variate Hermite polynomial of order $p_\kappa = 2$;
RHS f(x, ξ): $M_f = 5$, number of KLE terms 40, beta distribution for κ, exponential $\mathrm{cov}_f$, cov. length 0.03, multi-variate Hermite polynomial of order $p_f = 2$;
b.c. g(x, ξ): $M_g = 2$, number of KLE terms 2, normal distribution for g, Gaussian $\mathrm{cov}_g$, cov. length 10, multi-variate Hermite polynomial of order $p_g = 1$;
$p_\varphi = 3$ and $p_u = 3$
17. Example: Updating of the parameter
[Figure: Prior and posterior (updated) parameter κ.]
Collaboration with Y. Marzouk, MIT, and TU Braunschweig.
Together with H. Najm, Sandia Lab, we try to compare our technique with his advanced MCMC technique for the chemical combustion equation.
18. Example: updating of the solution u
[Figure: Original and updated solutions, mean value plus/minus 1, 2, 3 standard deviations. Number of available measurements {0, 1, 2, 3, 5}.]
[Graphics were built with the stochastic Galerkin library sglib, written by E. Zander at TU Braunschweig.]
19. Part III: My contribution to MUNA project
MUNA = Management and minimization of Uncertainties in Numerical Aerodynamics.
1. Quantification of airfoil geometry-induced aerodynamic uncertainties - comparison of approaches, Liu, Litvinenko, Schillings, Schulz, JUQ 2017
2. Numerical Methods for Uncertainty Quantification and Bayesian Update in Aerodynamics, Litvinenko, Matthies, chapter in Springer book, Vol 122, pp 262-285, 2013.
20. Example: uncertainties in free stream turbulence
[Figure: velocity diagram; labels: α, α', u, u', v, v1, v2]
Random vectors v1(θ) and v2(θ) model free stream turbulence
21. Example: 3σ intervals
Figure: 3σ interval, σ standard deviation, at each point of the RAE2822 airfoil for the pressure (cp) and friction (cf) coefficients.
22. Mean and variance of density, tke, xv, zv, pressure
23. Example: Kriging and geostat. optimal design
Domain: 20m × 20m × 20m, 25,000 × 25,000 × 25,000 dofs, 4000 measurements.
Log-Permeability. Color scale: showing the 95% confidence interval.
Kriging and spatial design accelerated by orders of magnitude: Combining low-rank with FFT, Nowak, Litvinenko, Mathematical Geosciences 45 (4), 411-435, 2013
24. Numerics on a computer with 16 GB RAM:
1. 2D Kriging with 270 million estimation points and 100 measurement values (0.25 sec.),
2. to compute the estimation variance (< 1 sec.),
3. to evaluate the spatial average of the estimation variance (the A-criterion of geostat. optimal design) for $2 \cdot 10^{12}$ estim. points (30 sec.),
4. to compute the C-criterion of geostat. optimal design for $2 \cdot 10^{15}$ estim. points (30 sec.),
5. solve 3D Kriging problem with $15 \cdot 10^{12}$ estim. points and 4000 measurement data values (20 sec.)
Collaboration with Stuttgart University, hydrology and geosciences.
25. Example from spatial statistics
Goal: To improve estimation of unknown statistical parameters in a spatial soil moisture field, Mississippi basin, [−85°, −73°] × [32°, 43°].
Log-likelihood function with $C = e^{-|x-y|/\theta}$ and Z the available (satellite) data:
\[ L(\theta) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log |C(\theta)| - \frac{1}{2} Z^{\top} C(\theta)^{-1} Z. \]
Collaboration with statisticians: M. Genton, Y. Sun, R. Huser, H. Rue from KAUST.
n = 512K, matrix setup 261 sec., compression rate 99.98% (0.4 GB against 2006 GB). H-LU is done in 843 sec., error $2 \cdot 10^{-3}$.
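For small n the log-likelihood above can be evaluated with dense linear algebra. The sketch below uses a dense Cholesky factorization in place of the H-matrix LU reported on the slide (a deliberate substitution that only scales to a few thousand points), and the point locations and data are synthetic assumptions.

```python
import numpy as np

def exp_covariance(points, theta):
    """C_ij = exp(-|x_i - x_j| / theta) for points given as rows."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.exp(-dist / theta)

def gaussian_loglik(theta, points, Z):
    """L(theta) = -n/2 log(2 pi) - 1/2 log|C| - 1/2 Z^T C^{-1} Z via dense Cholesky."""
    n = len(Z)
    C = exp_covariance(points, theta)
    chol = np.linalg.cholesky(C)                     # C = chol @ chol.T
    w = np.linalg.solve(chol, Z)                     # w = chol^{-1} Z, so w.w = Z^T C^{-1} Z
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))     # log|C| from the Cholesky diagonal
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * (w @ w)

# Synthetic data (assumption): 30 random locations in the unit square.
rng = np.random.default_rng(0)
points = rng.random((30, 2))
Z = rng.standard_normal(30)
value = gaussian_loglik(0.4, points, Z)

# Reference value computed directly from the formula.
C_ref = exp_covariance(points, 0.4)
_, logdet_ref = np.linalg.slogdet(C_ref)
reference = (-0.5 * 30 * np.log(2.0 * np.pi) - 0.5 * logdet_ref
             - 0.5 * Z @ np.linalg.solve(C_ref, Z))
```

Maximizing this function over θ gives the parameter estimate; the H-matrix approach replaces the O(n³) factorization by an approximate one with far lower cost and storage.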
26. Conclusion
Introduced:
1. Low-rank tensor methods to solve PDEs with uncertain
coefficients,
2. Post-processing in low-rank tensor format, computing level
sets
3. Bayesian update surrogate ϕ (as a linear, quadratic,...
approximation)
4. Quantification of uncertainties in Numerical Aerodynamics
5. Applications in geosciences
6. Estimating unknown coefficients in spatial statistics
(moisture example)
28. Possible collaboration
1. Dominic Breit (error estimates for UQ applications to
balance statistical and discretization errors)
2. Gabriel Lord (num. meth. for PDEs with uncertainties;
combination of multiscale methods, UQ techniques and
Bayesian inference for reservoir modeling; low-rank tensor
methods for high-dimensional problems).
3. Lehel Banjai (computation of electromagnetic fields
scattered from dielectric objects of uncertain shapes;
balancing of the Runge-Kutta time discretization step and
the H-matrix approximation rank in time-dependent PDEs),
29. Possible collaboration
1. BGS (CO2 storage, reservoir modeling, spatial statistics in
geology/geophysics),
2. Lyell Institute (subsurface flow under uncertainties, EOR,
Bayesian techniques for data assimilations)
3. EGIS:
3.1. Mike Christie (reservoir modeling under uncertainties,
EOR, seismic wave propagation in uncertain media)
3.2. Vasily Demyanov (uncertainty quantification and
low-rank approximations in geostatistics)
3.3. Dan Arnold (modeling of random geology, multi-scale,
Bayesian inference)
3.4. Ahmed El Sheikh (fast Bayesian update methods,
advanced UQ, surrogates for BU, big data from spatial
statistics).
31. Explanation of Bayesian Update surrogate
Let the stochastic model of the measurement be given by
\[ y = M(q) + \varepsilon, \qquad \varepsilon \text{ the measurement noise.} \]
Best estimator $\tilde{\varphi}$ for q given z, i.e.
\[ \tilde{\varphi} = \operatorname{argmin}_{\varphi} E\bigl[ \| q(\cdot) - \varphi(z(\cdot)) \|_2^2 \bigr]. \]
The best estimate (or predictor) of q given the measurement model is
\[ q_M(\xi) = \tilde{\varphi}(z(\xi)). \]
The remainder, i.e. the difference between q and $q_M$, is given by
\[ q_M^{\perp}(\xi) = q(\xi) - q_M(\xi). \]
Due to the minimisation property of the MMSE estimator, it is orthogonal to $q_M(\xi)$, i.e. $\operatorname{cov}(q_M^{\perp}, q_M) = 0$.
[Thanks to Elmar Zander, TU Braunschweig]
32. In other words,
\[ q(\xi) = q_M(\xi) + q_M^{\perp}(\xi) \tag{2} \]
yields an orthogonal decomposition of q.
Actual measurement $\hat{y}$, prediction $\hat{q} = \tilde{\varphi}(\hat{y})$. Part $q_M$ of q can be "collapsed" to $\hat{q}$. The updated stochastic model $q'$ is thus given by
\[ q'(\xi) = \hat{q} + q_M^{\perp}(\xi) \tag{3} \]
\[ q'(\xi) = q(\xi) + \bigl( \tilde{\varphi}(\hat{y}) - \tilde{\varphi}(z(\xi)) \bigr). \tag{4} \]
33. Future plans, Idea N1
Possible collaboration work 1: To develop a low-rank adaptive goal-oriented Bayesian update technique. The solution of the forward and inverse problems will be considered as a whole adaptive process, controlled by error/uncertainty estimators.
[Diagram: repeated chain of "forward" and "update" steps, each marked "low-rank and adaptive"; labels: f, ε, y, z, (y - z), q. Error sources listed: stochastic forward problem (spatial discret., stochastic discret., low-rank approx.) and inverse problem (inverse operator approx.).]
34. Future plans, Idea N2
Edge between Green functions in PDEs and covariance
matrices.
Possible collaboration with statistical group, Doug Nychka
(NCAR), Haavard Rue
35. Future plans, Idea N3
Data assimilation techniques, Bayesian update surrogate.
Develop non-linear, non-Gaussian Bayesian update
approximation for gPCE coefficients.
Possible collaboration with Kody Law (OAK NL), Y. Marzouk
(MIT), H. Najm (Sandia NL), TU Braunschweig and KAUST.
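36. Example: Canonical rank d, whereas TT rank 2
The d-Laplacian over a uniform tensor grid is known to have the Kronecker rank-d representation
\[ \Delta_d = A \otimes I_N \otimes \dots \otimes I_N + I_N \otimes A \otimes \dots \otimes I_N + \dots + I_N \otimes I_N \otimes \dots \otimes A \in \mathbb{R}^{I^{\otimes d} \otimes I^{\otimes d}} \tag{5} \]
with $A = \Delta_1 = \operatorname{tridiag}\{-1, 2, -1\} \in \mathbb{R}^{N \times N}$, and $I_N$ being the N × N identity. Notice that for the canonical rank we have $\operatorname{rank}_C(\Delta_d) = d$, while the TT-rank of $\Delta_d$ is equal to 2 for any dimension due to the explicit representation
\[ \Delta_d = \begin{pmatrix} \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \dots \times \begin{pmatrix} I & 0 \\ \Delta_1 & I \end{pmatrix} \times \begin{pmatrix} I \\ \Delta_1 \end{pmatrix}, \tag{6} \]
where the rank product operation "×" is defined as a regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product. A similar bound is true for the Tucker rank: $\operatorname{rank}_{Tuck}(\Delta_d) = 2$.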
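Both representations can be compared numerically for small sizes. The sketch below builds the d-Laplacian once from the Kronecker sum (5) and once from the block cores in (6), with the rank product implemented as a block matrix product whose blocks are combined by `np.kron`. The helper names and the sizes d = 3, N = 4 are assumptions for illustration.

```python
import numpy as np

def laplace_1d(n):
    """tridiag{-1, 2, -1} of size n x n."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def kron_all(mats):
    out = np.eye(1)
    for m in mats:
        out = np.kron(out, m)
    return out

def laplace_kronecker_sum(d, n):
    """Canonical (rank-d) representation: A in position mu, identities elsewhere."""
    A, I = laplace_1d(n), np.eye(n)
    return sum(kron_all([A if k == mu else I for k in range(d)]) for mu in range(d))

def rank_product(left, right):
    """Block matrix product where blocks are multiplied by the Kronecker product."""
    return [[sum(np.kron(left[i][k], right[k][j]) for k in range(len(right)))
             for j in range(len(right[0]))]
            for i in range(len(left))]

def laplace_tt(d, n):
    """Evaluate the rank-2 TT representation for d >= 2."""
    A, I, Z = laplace_1d(n), np.eye(n), np.zeros((n, n))
    result = [[A, I]]                                  # first core (1 x 2 blocks)
    for _ in range(d - 2):
        result = rank_product(result, [[I, Z], [A, I]])  # middle cores (2 x 2 blocks)
    result = rank_product(result, [[I], [A]])          # last core (2 x 1 blocks)
    return result[0][0]

d, n = 3, 4
L_sum = laplace_kronecker_sum(d, n)
L_tt = laplace_tt(d, n)
print(np.allclose(L_sum, L_tt))
```

Every core has at most 2 × 2 blocks regardless of d, which is what the TT-rank 2 statement expresses, whereas the Kronecker sum needs d terms.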
37. Advantages and disadvantages
Denote k - rank, d - dimension, n = # dofs in 1D:
1. CP: ill-posed approximation algorithm, storage $O(dnk)$, approximations are hard to compute
2. Tucker: reliable arithmetic based on SVD, $O(dnk + k^d)$
3. Hierarchical Tucker: based on SVD, storage $O(dnk + dk^3)$, truncation $O(dnk^2 + dk^4)$
4. TT: based on SVD, $O(dnk^2)$ or $O(dnk^3)$, stable
5. Quantics-TT: $O(n^d) \to O(d \log_q n)$
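38. How to compute the variance in CP format
Let $u \in \mathcal{R}_r$ and
\[ \tilde{u} := u - \bar{u} \bigotimes_{\mu=1}^{d} \frac{1}{n_\mu} \mathbf{1} = \sum_{j=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{j\mu} \in \mathcal{R}_{r+1}, \tag{7} \]
then the variance var(u) of u can be computed as follows
\[ \operatorname{var}(u) = \frac{\langle \tilde{u}, \tilde{u} \rangle}{\prod_{\mu=1}^{d} n_\mu} = \frac{1}{\prod_{\mu=1}^{d} n_\mu} \Bigl\langle \sum_{i=1}^{r+1} \bigotimes_{\mu=1}^{d} \tilde{u}_{i\mu},\ \sum_{j=1}^{r+1} \bigotimes_{\nu=1}^{d} \tilde{u}_{j\nu} \Bigr\rangle = \sum_{i=1}^{r+1} \sum_{j=1}^{r+1} \prod_{\mu=1}^{d} \frac{1}{n_\mu} \langle \tilde{u}_{i\mu}, \tilde{u}_{j\mu} \rangle. \]
Numerical cost is $O\bigl( (r+1)^2 \cdot \sum_{\mu=1}^{d} n_\mu \bigr)$.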
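A small sketch of this computation follows. It stores a CP tensor as a list of factor matrices of shape (n_μ, r), which is an assumed data layout, and interprets the centering step as subtracting the entry-wise mean times the all-ones tensor, added as one extra rank-1 term. The variance is then a sum over entries of the Hadamard product of the scaled Gram matrices, so the full tensor is never formed except in the reference check.

```python
import numpy as np

def cp_to_full(factors):
    """Assemble the full tensor from CP factor matrices (only for tiny checks)."""
    full = 0.0
    for j in range(factors[0].shape[1]):
        term = factors[0][:, j]
        for U in factors[1:]:
            term = np.multiply.outer(term, U[:, j])
        full = full + term
    return full

def cp_mean(factors):
    """Mean over all entries: sum_j prod_mu mean(u_jmu)."""
    return np.prod([U.mean(axis=0) for U in factors], axis=0).sum()

def cp_variance(factors):
    """Variance over all entries, computed from Gram matrices of the factors."""
    mean = cp_mean(factors)
    centered = []
    for mu, U in enumerate(factors):
        extra = np.ones((U.shape[0], 1))
        if mu == 0:
            extra = -mean * extra              # extra rank-1 term: -mean * (all-ones tensor)
        centered.append(np.hstack([U, extra]))
    size = centered[0].shape[1]                # r + 1
    gram = np.ones((size, size))
    for U in centered:
        gram *= (U.T @ U) / U.shape[0]         # <u_imu, u_jmu> / n_mu
    return gram.sum()

rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]   # rank 3, order 3
var_cp = cp_variance(factors)
var_full = cp_to_full(factors).var()
```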
39. Computing QoI in low-rank tensor format
Now, we consider how to find "level sets", for instance, all entries of tensor u from interval [a, b].
40. Definitions of characteristic and sign functions
1. To compute level sets and frequencies we need the characteristic function.
2. To compute the characteristic function we need the sign function.
The characteristic $\chi_I(u) \in \mathcal{T}$ of $u \in \mathcal{T}$ in $I \subset \mathbb{R}$ is for every multi-index $i \in \mathcal{I}$ pointwise defined as
\[ (\chi_I(u))_i := \begin{cases} 1, & u_i \in I, \\ 0, & u_i \notin I. \end{cases} \]
Furthermore, $\operatorname{sign}(u) \in \mathcal{T}$ is for all $i \in \mathcal{I}$ pointwise defined by
\[ (\operatorname{sign}(u))_i := \begin{cases} 1, & u_i > 0; \\ -1, & u_i < 0; \\ 0, & u_i = 0. \end{cases} \]
41. sign(u) is needed for computing $\chi_I(u)$
Lemma
Let $u \in \mathcal{T}$, $a, b \in \mathbb{R}$, and $\mathbf{1} = \bigotimes_{\mu=1}^{d} \tilde{1}_\mu$, where $\tilde{1}_\mu := (1, \dots, 1)^T \in \mathbb{R}^{n_\mu}$.
(i) If $I = \mathbb{R}_{<b}$, then we have $\chi_I(u) = \frac{1}{2}(\mathbf{1} + \operatorname{sign}(b\mathbf{1} - u))$.
(ii) If $I = \mathbb{R}_{>a}$, then we have $\chi_I(u) = \frac{1}{2}(\mathbf{1} - \operatorname{sign}(a\mathbf{1} - u))$.
(iii) If $I = (a, b)$, then we have $\chi_I(u) = \frac{1}{2}(\operatorname{sign}(b\mathbf{1} - u) - \operatorname{sign}(a\mathbf{1} - u))$.
Computing sign(u), $u \in \mathcal{R}_r$, via hybrid Newton-Schulz iteration with rank truncation after each iteration.
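The lemma can be illustrated on a plain array. The sketch below uses the basic Newton-Schulz iteration x ← x(3 − x²)/2 applied entrywise to a full array, after scaling so that all entries lie in [−1, 1]; it is not the hybrid variant with rank truncation mentioned above, and in a low-rank format the entrywise products would be Hadamard products followed by truncation. The test vector and the interval are assumptions.

```python
import numpy as np

def sign_newton_schulz(u, n_iter=60):
    """Entrywise sign via the Newton-Schulz iteration x <- x (3 - x^2) / 2."""
    x = np.asarray(u, dtype=float)
    scale = np.max(np.abs(x))
    if scale == 0.0:
        return np.zeros_like(x)
    x = x / scale                      # entries now lie in [-1, 1]
    for _ in range(n_iter):
        x = 0.5 * x * (3.0 - x * x)    # only additions and entrywise products
    return x

def characteristic_open_interval(u, a, b):
    """Part (iii) of the lemma: chi_(a,b)(u) = (sign(b - u) - sign(a - u)) / 2."""
    return 0.5 * (sign_newton_schulz(b - u) - sign_newton_schulz(a - u))

u = np.linspace(0.0, 1.0, 11)
chi = characteristic_open_interval(u, a=0.25, b=0.65)
expected = ((u > 0.25) & (u < 0.65)).astype(float)
```

The iteration uses only sums and products, which is why it carries over to tensor formats where entrywise comparisons are not available. Entries very close to a or b converge slowly, so the number of iterations is a tuning parameter.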
42. Level Set, Frequency
Definition (Level Set, Frequency)
Let $I \subset \mathbb{R}$ and $u \in \mathcal{T}$. The level set $L_I(u) \in \mathcal{T}$ of u with respect to I is pointwise defined by
\[ (L_I(u))_i := \begin{cases} u_i, & u_i \in I; \\ 0, & u_i \notin I, \end{cases} \]
for all $i \in \mathcal{I}$.
The frequency $F_I(u) \in \mathbb{N}$ of u with respect to I is defined as
\[ F_I(u) := \# \operatorname{supp} \chi_I(u). \]
43. Computation of level sets and frequency
Proposition
Let $I \subset \mathbb{R}$, $u \in \mathcal{T}$, and $\chi_I(u)$ its characteristic. We have
\[ L_I(u) = \chi_I(u) \odot u \]
and $\operatorname{rank}(L_I(u)) \le \operatorname{rank}(\chi_I(u)) \operatorname{rank}(u)$.
The frequency $F_I(u) \in \mathbb{N}$ of u with respect to I is
\[ F_I(u) = \langle \chi_I(u), \mathbf{1} \rangle, \]
where $\mathbf{1} = \bigotimes_{\mu=1}^{d} \tilde{1}_\mu$, $\tilde{1}_\mu := (1, \dots, 1)^T \in \mathbb{R}^{n_\mu}$.
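The rank bound comes from how the Hadamard product acts on CP factors: every pair of terms yields one new term, whose factors are entrywise products. A sketch under assumed conventions (CP tensors stored as lists of factor matrices of shape (n_μ, r), and a hand-made rank-1 indicator standing in for a computed characteristic):

```python
import numpy as np

def cp_to_full(factors):
    """Assemble the full tensor from CP factor matrices (only for tiny checks)."""
    full = 0.0
    for j in range(factors[0].shape[1]):
        term = factors[0][:, j]
        for U in factors[1:]:
            term = np.multiply.outer(term, U[:, j])
        full = full + term
    return full

def cp_hadamard(fa, fb):
    """Entrywise product of two CP tensors; the result has rank r_a * r_b."""
    out = []
    for A, B in zip(fa, fb):
        prod = A[:, :, None] * B[:, None, :]       # all pairwise column products
        out.append(prod.reshape(A.shape[0], -1))
    return out

def cp_inner_with_ones(factors):
    """<u, 1> = sum_j prod_mu (sum of the entries of u_jmu)."""
    return np.prod([U.sum(axis=0) for U in factors], axis=0).sum()

rng = np.random.default_rng(0)
shape = (3, 4, 5)
u = [rng.standard_normal((n, 2)) for n in shape]          # rank-2 tensor

# Hypothetical rank-1 characteristic: indicator of an index box.
chi = [np.array([[1.0], [1.0], [0.0]]),
       np.array([[0.0], [1.0], [1.0], [0.0]]),
       np.ones((5, 1))]

level_set = cp_hadamard(chi, u)            # L_I(u) = chi * u (entrywise)
frequency = cp_inner_with_ones(chi)        # F_I(u) = <chi, 1>
```

Here the level set has representation rank 1 · 2 = 2 and the frequency equals the number of ones in the indicator, both obtained without forming the full tensor.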
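44. Definition of tensor of order d
A tensor of order d is a multidimensional array over a d-tuple index set $\mathcal{I} = I_1 \times \dots \times I_d$,
\[ A = [a_{i_1 \dots i_d} : i_\ell \in I_\ell] \in \mathbb{R}^{\mathcal{I}}, \qquad I_\ell = \{1, \dots, n_\ell\},\ \ell = 1, \dots, d. \]
A is an element of the linear space
\[ V_n = \bigotimes_{\ell=1}^{d} V_\ell, \qquad V_\ell = \mathbb{R}^{I_\ell}, \]
equipped with the Euclidean scalar product $\langle \cdot, \cdot \rangle : V_n \times V_n \to \mathbb{R}$, defined as
\[ \langle A, B \rangle := \sum_{(i_1 \dots i_d) \in \mathcal{I}} a_{i_1 \dots i_d}\, b_{i_1 \dots i_d}, \qquad \text{for } A, B \in V_n. \]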
45. Examples of rank-1 and rank-2 tensors
Rank-1:
\[ f(x_1, \dots, x_d) = \exp(f_1(x_1) + \dots + f_d(x_d)) = \prod_{j=1}^{d} \exp(f_j(x_j)) \]
Rank-2: $f(x_1, \dots, x_d) = \sin\bigl(\sum_{j=1}^{d} x_j\bigr)$, since
\[ 2i \cdot \sin\Bigl(\sum_{j=1}^{d} x_j\Bigr) = e^{i \sum_{j=1}^{d} x_j} - e^{-i \sum_{j=1}^{d} x_j} \]
The rank-d function $f(x_1, \dots, x_d) = x_1 + x_2 + \dots + x_d$ can be approximated by rank 2 with any prescribed accuracy:
\[ f \approx \frac{\prod_{j=1}^{d} (1 + \varepsilon x_j)}{\varepsilon} - \frac{\prod_{j=1}^{d} 1}{\varepsilon} + O(\varepsilon), \quad \text{as } \varepsilon \to 0 \]
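The last approximation is easy to check numerically. The sketch evaluates the difference of the two separable (rank-1) terms at one hypothetical point and compares it with the exact sum for decreasing ε; the point and the ε values are arbitrary choices for illustration.

```python
import numpy as np

def rank2_sum_approx(x, eps):
    """(prod_j (1 + eps x_j) - prod_j 1) / eps, a difference of two separable terms."""
    return (np.prod(1.0 + eps * x) - 1.0) / eps

x = np.array([0.3, -0.5, 0.8, 0.1])       # hypothetical evaluation point
exact = x.sum()

errors = {eps: abs(rank2_sum_approx(x, eps) - exact) for eps in (1e-1, 1e-2, 1e-3)}
for eps, err in errors.items():
    print(f"eps = {eps:g}: error = {err:.2e}")
```

The error shrinks roughly in proportion to ε, in line with the O(ε) term. In floating point ε cannot be taken arbitrarily small, because the two large terms of size 1/ε nearly cancel.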
46. Conditional probability and expectation
Classically, Bayes's theorem gives conditional probability
\[ P(I_q | M_z) = \frac{P(M_z | I_q)}{P(M_z)} P(I_q) \qquad \Bigl(\text{or } \pi_q(q|z) = \frac{p(z|q)}{Z_s}\, p_q(q)\Bigr); \]
Expectation with this posterior measure is conditional expectation.
Kolmogorov starts from the conditional expectation $E(\cdot | M_z)$ and obtains the conditional probability from it via $P(I_q | M_z) = E(\chi_{I_q} | M_z)$.
47. Conditional expectation
The conditional expectation is defined as the orthogonal projection onto the closed subspace $L_2(\Omega, P, \sigma(z))$:
\[ E(q | \sigma(z)) := P_{Q_\infty} q = \operatorname{argmin}_{\tilde{q} \in L_2(\Omega, P, \sigma(z))} \| q - \tilde{q} \|^2_{L_2} \]
The subspace $Q_\infty := L_2(\Omega, P, \sigma(z))$ represents the available information.
The update, also called the assimilated value $q_a(\omega) := P_{Q_\infty} q = E(q | \sigma(z))$, is a Q-valued RV and represents the new state of knowledge after the measurement.
Doob-Dynkin: $Q_\infty = \{ \varphi \in Q : \varphi = \phi \circ z,\ \phi \text{ measurable} \}$.