This document discusses methods for estimating derivatives of the likelihood function in intractable models, such as hidden Markov models, where the likelihood does not have a closed form. It presents three key ideas:
1) Iterated filtering, which approximates the score by perturbing the parameter and tracking the evolution of the perturbation through sequential updates.
2) Proximity mapping, which, via Moreau's approximation, relates the score to the shift of the posterior mode away from the prior mode as the prior variance goes to zero.
3) Posterior concentration induced by a normal prior concentrating at a point, where Taylor expansions show that the shift in posterior moments is of the order of the prior variance, relating it to the derivatives of the log-likelihood.
Stochastic Frank-Wolfe for Constrained Finite Sum Minimization @ Montreal Opt... — Geoffrey Négiar
We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.
A new Perron-Frobenius theorem for nonnegative tensors — Francesco Tudisco
Based on the concept of dimensional partition we consider a general tensor spectral problem that includes all known tensor spectral problems as special cases. We formulate irreducibility and symmetry properties in terms of the dimensional partition and use the theory of multi-homogeneous order-preserving maps to derive a general and unifying Perron-Frobenius theorem for nonnegative tensors that either includes previous results of this kind or improves them by weakening the assumptions considered there.
Talk presented at the SIAM Applied Linear Algebra conference, Hong Kong, 2018
Small updates of matrix functions used for network centrality — Francesco Tudisco
Many relevant measures of importance for nodes and edges of a network are defined in terms of suitable entries of matrix functions $f(A)$, for different choices of $f$ and $A$. Addressing the entries of $f(A)$ can be computationally challenging and this is particularly prohibitive when $A$ undergoes a perturbation $A+\delta A$ and the entries of $f(A)$ have to be updated. Given the adjacency matrix $A$ of a graph $G=(V,E)$, in this talk we consider the case where $\delta A$ is a sparse matrix that yields a small perturbation of the edge structure of $G$.
In particular, we present a bound showing that the variation of the entry $f(A)_{u,v}$ decays exponentially with the distance in $G$ that separates either $u$ or $v$ from the set of nodes touched by the edges that are perturbed. Our bound depends only on the distances in the original graph $G$ and on the field of values of the perturbed matrix $A+\delta A$. We show several numerical examples in support of the proposed result.
Talk presented at the IMA Numerical Analysis and Optimization conference, Birmingham 2018
The talk is based on the paper:
S. Pozza and F. Tudisco, On the stability of network indices defined by means of matrix functions, SIMAX, 2018
Nodal Domain Theorem for the p-Laplacian on Graphs and the Related Multiway C... — Francesco Tudisco
We consider the p-Laplacian on discrete graphs, a nonlinear operator that generalizes the standard graph Laplacian (obtained for p=2). We consider a set of variational eigenvalues of this operator and analyze the nodal domain count of the corresponding eigenfunctions. In particular, we show that the famous Courant's nodal domain theorem for the linear Laplacian carries over almost unchanged to the nonlinear case. Moreover, we use the nodal domains to prove a higher-order Cheeger inequality that relates the k-way graph cut to the k-th variational eigenvalue of the p-Laplacian.
Numerical Fourier transform based on hyperfunction theory — Hidenori Ogata
These are the slides of a contributed talk at the conference "ECMI2018 (The 20th European Conference on Mathematics for Industry)", 18-20 June 2018, Budapest, Hungary. In this talk, we propose a numerical method for Fourier transforms based on hyperfunction theory.
We start with motivation and a few examples of uncertainties. Then we discretize an elliptic PDE with uncertain coefficients and apply the TT format to the permeability, the stochastic operator, and the solution. We compare the sparse multi-index set approach with the full multi-index set combined with TT.
The Tensor Train format allows us to keep the whole multi-index set, without any multi-index set truncation.
Low rank tensor approximation of probability density and characteristic funct... — Alexander Litvinenko
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density function (pdf) and/or by the corresponding probability characteristic function (pcf), or by a function representation. Here the interest is mainly to compute characterisations like the entropy, relations between two distributions such as their Kullback-Leibler divergence, or more general measures such as $f$-divergences, among others. These are all computed from the pdf, which is often not available directly, and it is a computational challenge to even represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case. In this regard, in order to achieve a computationally feasible task, we propose to represent the density by a high-order tensor product and approximate it in a low-rank format.
We provide a comprehensive convergence analysis of the asymptotic preserving implicit-explicit particle-in-cell (IMEX-PIC) methods for the Vlasov–Poisson system with a strong magnetic field. This study is of utmost importance for understanding the behavior of plasmas in magnetic fusion devices such as tokamaks, where such a large magnetic field needs to be applied in order to keep the plasma particles on desired tracks.
A 3-hour introductory lecture on Approximate Bayesian Computation (ABC), given as part of a PhD course at Lund University, February 2016. For sample codes see http://www.maths.lu.se/kurshemsida/phd-course-fms020f-nams002-statistical-inference-for-partially-observed-stochastic-processes/
Random Matrix Theory and Machine Learning - Part 1 — Fabian Pedregosa
ICML 2021 tutorial on random matrix theory and machine learning. Part 1 covers: 1. A brief history of Random Matrix Theory, 2. Classical Random Matrix Ensembles (basic building blocks)
In this talk, we give an overview of results on numerical integration in Hermite spaces. These spaces contain functions defined on $\mathbb{R}^d$, and can be characterized by the decay of their Hermite coefficients. We consider the case of exponentially as well as polynomially decaying Hermite coefficients. For numerical integration, we either use Gauss-Hermite quadrature rules or algorithms based on quasi-Monte Carlo rules. We present upper and lower error bounds for these algorithms, and discuss their dependence on the dimension $d$. Furthermore, we comment on open problems for future research.
On Twisted Paraproducts and some other Multilinear Singular Integrals — Vjekoslav Kovac
Presentation.
9th International Conference on Harmonic Analysis and Partial Differential Equations, El Escorial, June 12, 2012.
The 24th International Conference on Operator Theory, Timisoara, July 3, 2012.
Computing f-Divergences and Distances of High-Dimensional Probability Densi... — Alexander Litvinenko
Talk presented at the SIAM IS 2022 conference.
Very often, in the course of uncertainty quantification tasks or data analysis, one has to deal with high-dimensional random variables (RVs) with values in $\mathbb{R}^d$. Just like any other RV, a high-dimensional RV can be described by its probability density function (pdf) and/or by the corresponding probability characteristic function (pcf), or by a more general representation as a function of other, known, random variables. Here the interest is mainly to compute characterisations like the entropy, the Kullback-Leibler divergence, or more general $f$-divergences. These are all computed from the pdf, which is often not available directly, and it is a computational challenge to even represent it in a numerically feasible fashion when the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case. In this regard, in order to achieve a computationally feasible task, we propose to approximate the density by a low-rank tensor.
The task considered here was the numerical computation of characterising statistics of high-dimensional pdfs, as well as their divergences and distances, where the pdf in the numerical implementation was assumed discretised on some regular grid. We have demonstrated that high-dimensional pdfs, pcfs, and some functions of them can be approximated and represented in a low-rank tensor data format. Utilisation of low-rank tensor techniques helps to reduce the computational complexity and the storage cost from exponential $\mathcal{O}(n^d)$ to linear in the dimension $d$, e.g. $\mathcal{O}(d n r^2)$ for the TT format. Here $n$ is the number of discretisation points in one direction, $r \ll n$ is the maximal tensor rank, and $d$ is the problem dimension.
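To make the storage claim concrete, here is a back-of-the-envelope sketch (illustrative arithmetic only; the grid size, dimension, and rank values are made up, not taken from the talk):

```python
# Entries needed to store a function on an n^d grid: full tensor vs. TT format.
n, d, r = 100, 10, 5
full_tensor = n ** d          # O(n^d): 10^20 entries, hopeless to store
tt_cores = d * n * r ** 2     # O(d n r^2): 25,000 entries in the TT cores
print(f"full: {full_tensor:.1e} entries, TT: {tt_cores} entries")
```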
Image sciences, image processing, image restoration, photo manipulation. Image and video representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noise in digital CCD imagery: photon, thermal, and readout noise. Sources and models of blur. Convolutions and point spread functions. Overview of other standard models, problems, and tasks: salt-and-pepper and impulse noise, halftoning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, sonar, ultrasound, CT, and MRI. Linear and ill-posed restoration problems.
Diffusion Schrödinger bridges for score-based generative modeling — Jeremy Heng
Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
Bayesian Inference and Uncertainty Quantification for Inverse Problems — Matt Moores
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
The time scale Fibonacci sequences satisfy the Friedmann-Lemaître-Robertson-Walker (FLRW) dynamic equation on time scales, an exact solution of Einstein's field equations of general relativity for an expanding homogeneous and isotropic universe. We show that the equations of motion correspond to the one-dimensional motion of a particle of position $F(t)$ in an inverted harmonic potential. For the dynamic equations on time scales describing the Fibonacci numbers $F(t)$, we present the Lagrangian and Hamiltonian formalism. Identifying these with the equations that describe the scale factors, we conclude that for a certain granulation, for both the continuous and the discrete universe, we have the same dynamics.
Similar to Estimation of the score vector and observed information matrix in intractable models
Susie Bayarri Plenary Lecture given at the ISBA (International Society for Bayesian Analysis) World Meeting in Montreal, Canada, on June 30, 2022, by Pierre E. Jacob (https://sites.google.com/site/pierrejacob/)
Talk on the design of non-negative unbiased estimators, useful to perform exact inference for intractable target distributions.
Corresponds to the article http://arxiv.org/abs/1309.6473
SMC^2: an algorithm for sequential analysis of state-space models — Pierre Jacob
In these slides I presented the SMC^2 method (see the article here: http://arxiv.org/abs/1101.1528 ) to an audience of marine biogeochemistry people, emphasizing the model evidence estimation aspect.
This is a short presentation for a 15-minute talk at Bayesian Inference for Stochastic Processes 7, on the SMC^2 algorithm.
http://arxiv.org/abs/1101.1528
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA:
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that were causing the silencing, via RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA: 23-25 nt long; trans-acting; binds its target mRNA with mismatches; causes translation inhibition.
siRNA: 21 nt long; cis-acting; binds its target mRNA as a perfectly complementary sequence.
piRNA: 25-36 nt long; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex which triggers mRNA degradation.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease which cleaves the target mRNA.
DICER: an endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... — Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Richard's entangled adventures in wonderland — Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
This presentation gives a brief overview of the structural and functional attributes of nucleotides, the structure and function of genetic materials, and the impact of UV rays and pH upon them.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... — University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
This pdf is about schizophrenia.
For more details, visit the SELF-EXPLANATORY channel on YouTube:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks...!
Cancer cell metabolism: special reference to the lactate pathway — AADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvates made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELLS:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use many more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
WARBURG EFFECT: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme".
The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
What are greenhouse gases, and how many gases affect the Earth? — moosaasad1975
What are greenhouse gases, how do they affect the Earth and its environment, and what is the future of the environment and the Earth as the weather and the climate change?
Nutraceutical market, scope and growth: Herbal drug technology — Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is growing quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Estimation of the score vector and observed information matrix in intractable models
1. Estimation of the score vector and observed information matrix in intractable models
Arnaud Doucet (University of Oxford)
Pierre E. Jacob (University of Oxford)
Sylvain Rubenthaler (Université Nice Sophia Antipolis)
November 28th, 2014
2. Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
4. Motivation
Derivatives of the likelihood help with optimization and sampling.
For many models they are not available.
One can resort to approximation techniques.
5. Using derivatives in sampling algorithms
Metropolis Adjusted Langevin Algorithm (MALA)
At step t, given a point θt, propose
$$\theta^\star \sim q(d\theta \mid \theta_t) \equiv \mathcal{N}\left(\theta_t + \frac{\sigma^2}{2}\nabla_\theta \log \pi(\theta_t),\ \sigma^2\right),$$
then with probability
$$1 \wedge \frac{\pi(\theta^\star)\,q(\theta_t \mid \theta^\star)}{\pi(\theta_t)\,q(\theta^\star \mid \theta_t)}$$
set $\theta_{t+1} = \theta^\star$; otherwise set $\theta_{t+1} = \theta_t$.
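As a concrete illustration of the proposal and acceptance step above, here is a minimal sketch of one MALA transition in Python (not from the slides; `log_pi` and `grad_log_pi` stand for a user-supplied target and its gradient):

```python
import numpy as np

def mala_step(theta, log_pi, grad_log_pi, sigma2, rng):
    """One MALA transition targeting pi, following the proposal above."""
    drift = theta + 0.5 * sigma2 * grad_log_pi(theta)
    proposal = drift + np.sqrt(sigma2) * rng.standard_normal(theta.shape)
    # the backward density q(theta_t | theta*) uses the drift at the proposal
    back_drift = proposal + 0.5 * sigma2 * grad_log_pi(proposal)
    log_q_forward = -np.sum((proposal - drift) ** 2) / (2 * sigma2)
    log_q_backward = -np.sum((theta - back_drift) ** 2) / (2 * sigma2)
    log_accept = log_pi(proposal) - log_pi(theta) + log_q_backward - log_q_forward
    return proposal if np.log(rng.uniform()) < log_accept else theta

# usage on a standard normal target, where log pi(t) = -t.t/2 up to a constant
rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(1000):
    theta = mala_step(theta, lambda t: -0.5 * t @ t, lambda t: -t, 0.5, rng)
```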
6. Using derivatives in sampling algorithms
Figure: Proposal mechanism for random walk Metropolis–Hastings.
7. Using derivatives in sampling algorithms
Figure: Proposal mechanism for MALA.
8. Using derivatives in sampling algorithms
In what sense is MALA better than MH?
Scaling with the dimension d of the state space:
For Metropolis–Hastings, optimal scaling leads to $\sigma^2 = O(d^{-1})$.
For MALA, optimal scaling leads to $\sigma^2 = O(d^{-1/3})$.
Roberts & Rosenthal, Optimal Scaling for Various Metropolis-Hastings Algorithms, 2001.
9. Hidden Markov models
Figure: Graph representation of a general hidden Markov model, with hidden states $X_0, X_1, \ldots, X_T$, observations $y_1, \ldots, y_T$, and parameter $\theta$.
Hidden process: initial distribution $\mu_\theta$, transition $f_\theta$.
Observations conditional upon the hidden process, from $g_\theta$.
10. Assumptions
Input:
Parameter θ: unknown, prior distribution p.
Initial condition $\mu_\theta(dx_0)$: can be sampled from.
Transition $f_\theta(dx_t \mid x_{t-1})$: can be sampled from.
Measurement $g_\theta(y_t \mid x_t)$: can be evaluated point-wise.
Observations $y_{1:T} = (y_1, \ldots, y_T)$.
Goals:
score: $\nabla_\theta \log L(\theta; y_{1:T})$ for any θ,
observed information matrix: $-\nabla^2_\theta \log L(\theta; y_{1:T})$ for any θ.
Note: throughout the talk, the observations, and thus the likelihood, are fixed.
11. Why is it an intractable model?
The likelihood function does not admit a closed-form expression:
$$L(\theta; y_1, \ldots, y_T) = \int_{\mathsf{X}^{T+1}} p(y_1, \ldots, y_T \mid x_0, \ldots, x_T, \theta)\, p(dx_0, \ldots, dx_T \mid \theta) = \int_{\mathsf{X}^{T+1}} \prod_{t=1}^{T} g_\theta(y_t \mid x_t)\, \mu_\theta(dx_0) \prod_{t=1}^{T} f_\theta(dx_t \mid x_{t-1}).$$
Hence the likelihood can only be estimated, e.g. by standard Monte Carlo, or by particle filters.
What about the derivatives of the likelihood?
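A bootstrap particle filter returning such a likelihood estimate might look as follows. This is a minimal sketch under the assumptions of slide 10 (the model is specified through sampling functions for $\mu_\theta$ and $f_\theta$ and a pointwise-evaluable $g_\theta$); all function names are illustrative:

```python
import numpy as np

def bootstrap_log_likelihood(y, sample_mu, sample_f, log_g, n_particles, rng):
    """Particle estimate of log L(theta; y_{1:T}).

    sample_mu(n)  -> n initial particles drawn from mu_theta
    sample_f(x)   -> particles propagated through f_theta
    log_g(y_t, x) -> log g_theta(y_t | x) evaluated for each particle
    """
    x = sample_mu(n_particles)
    log_lik = 0.0
    for y_t in y:
        x = sample_f(x)                       # propagate
        log_w = log_g(y_t, x)                 # weight
        m = log_w.max()
        w = np.exp(log_w - m)
        log_lik += m + np.log(w.mean())       # log of the average weight
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]                            # multinomial resampling
    return log_lik
```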
12. Fisher and Louis’ identities
Write the score as
$$\nabla \ell(\theta) = \int \nabla \log p(x_{0:T}, y_{1:T} \mid \theta)\, p(dx_{0:T} \mid y_{1:T}, \theta),$$
which is an integral, with respect to the smoothing distribution $p(dx_{0:T} \mid y_{1:T}, \theta)$, of
$$\nabla \log p(x_{0:T}, y_{1:T} \mid \theta) = \nabla \log \mu_\theta(x_0) + \sum_{t=1}^{T} \nabla \log f_\theta(x_t \mid x_{t-1}) + \sum_{t=1}^{T} \nabla \log g_\theta(y_t \mid x_t).$$
However, pointwise evaluations of $\nabla \log \mu_\theta(x_0)$ and $\nabla \log f_\theta(x_t \mid x_{t-1})$ are not always available.
13. New kid on the block: Iterated Filtering
Perturbed model
Hidden states $\tilde{X}_t = (\tilde{\theta}_t, X_t)$, with
$$\tilde{\theta}_0 \sim \mathcal{N}(\theta_0, \tau^2 \Sigma), \quad X_0 \sim \mu_{\tilde{\theta}_0}(\cdot)$$
and
$$\tilde{\theta}_t \sim \mathcal{N}(\tilde{\theta}_{t-1}, \sigma^2 \Sigma), \quad X_t \sim f_{\tilde{\theta}_t}(\cdot \mid X_{t-1} = x_{t-1}).$$
Observations $\tilde{Y}_t \sim g_{\tilde{\theta}_t}(\cdot \mid X_t)$.
Score estimate
Consider $V_{P,t} = \mathrm{Cov}[\tilde{\theta}_t \mid y_{1:t-1}]$ and $\tilde{\theta}_{F,t} = E[\tilde{\theta}_t \mid y_{1:t}]$. Then
$$\sum_{t=1}^{T} V_{P,t}^{-1}\left(\tilde{\theta}_{F,t} - \tilde{\theta}_{F,t-1}\right) \approx \nabla \ell(\theta_0)$$
when $\tau \to 0$ and $\sigma/\tau \to 0$. Ionides, Breto, King, PNAS, 2006.
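Given filtered means and prediction variances for the perturbed parameter (obtained, e.g., from a particle filter on the extended state), the score estimate above is a one-liner. A sketch for a scalar parameter, with illustrative names:

```python
import numpy as np

def iterated_filtering_score(theta_filtered, v_predicted):
    """Sum_t V_{P,t}^{-1} (theta_{F,t} - theta_{F,t-1}) in one dimension.

    theta_filtered : array of length T+1, with theta_filtered[0] = theta_0
    v_predicted    : array of length T of prediction variances V_{P,t}
    """
    return np.sum(np.diff(theta_filtered) / v_predicted)
```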
14. Iterated Filtering: the mystery
Why is it valid?
Is it related to known techniques?
Can it be extended to estimate the second derivatives (i.e. the Hessian, i.e. the observed information matrix)?
How does it compare to other methods such as finite differences?
15. Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
16. Iterated Filtering
Given a log-likelihood ℓ and a given point θ0, consider a prior $\theta \sim \mathcal{N}(\theta_0, \sigma^2)$.
Posterior expectation when the prior variance goes to zero
First-order moments give first-order derivatives:
$$\left|\sigma^{-2}\left(E[\theta \mid Y] - \theta_0\right) - \nabla \ell(\theta_0)\right| \leq C\sigma^2.$$
Phrased simply,
$$\frac{\text{posterior mean} - \text{prior mean}}{\text{prior variance}} \approx \text{score}.$$
Result from Ionides, Bhadra, Atchadé, King, Iterated filtering, 2011.
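This identity is easy to check numerically. A sketch using self-normalized Monte Carlo over prior draws, for the toy log-likelihood $\ell(\theta) = \log \mathcal{N}(y; \theta, 1)$, whose score at $\theta_0$ is $y - \theta_0$ (the values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
y, theta0, sigma2 = 1.3, 0.2, 1e-2
theta = theta0 + np.sqrt(sigma2) * rng.standard_normal(10**6)  # prior draws
log_w = -0.5 * (y - theta) ** 2          # l(theta) up to an additive constant
w = np.exp(log_w - log_w.max())
posterior_mean = np.sum(w * theta) / np.sum(w)
print((posterior_mean - theta0) / sigma2)  # ~ score = y - theta0 = 1.1
```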
17. Extension of Iterated Filtering
Posterior variance when the prior variance goes to zero
Second-order moments give second-order derivatives:
$$\left|\sigma^{-4}\left(\mathrm{Cov}[\theta \mid Y] - \sigma^2\right) - \nabla^2 \ell(\theta_0)\right| \leq C\sigma^2.$$
Phrased simply,
$$\frac{\text{posterior variance} - \text{prior variance}}{\text{prior variance}^2} \approx \text{Hessian}.$$
Result from Doucet, Jacob, Rubenthaler on arXiv, 2013.
18. Proximity mapping
Given a real function f and a point θ0, consider for any $\sigma^2 > 0$ the function
$$\theta \mapsto f(\theta) \exp\left\{-\frac{1}{2\sigma^2}(\theta - \theta_0)^2\right\}.$$
Figure: Example for $f: \theta \mapsto \exp(-|\theta|)$ and three values of $\sigma^2$.
19. Proximity mapping
Proximity mapping
The σ²-proximity mapping is defined by
$$\mathrm{prox}_f : \theta_0 \mapsto \operatorname{argmax}_{\theta \in \mathbb{R}}\, f(\theta) \exp\left\{-\frac{1}{2\sigma^2}(\theta - \theta_0)^2\right\}.$$
Moreau approximation
The σ²-Moreau approximation is defined by
$$f_{\sigma^2} : \theta_0 \mapsto C \sup_{\theta \in \mathbb{R}}\, f(\theta) \exp\left\{-\frac{1}{2\sigma^2}(\theta - \theta_0)^2\right\},$$
where C is a normalizing constant.
20. Proximity mapping
Figure: $\theta \mapsto f(\theta)$ and $\theta \mapsto f_{\sigma^2}(\theta)$ for three values of $\sigma^2$.
21. Proximity mapping
Property
Those objects are such that
$$\frac{\mathrm{prox}_f(\theta_0) - \theta_0}{\sigma^2} = \nabla \log f_{\sigma^2}(\theta_0) \xrightarrow[\sigma^2 \to 0]{} \nabla \log f(\theta_0).$$
Moreau (1962), Fonctions convexes duales et points proximaux dans un espace Hilbertien.
Pereyra (2013), Proximal Markov chain Monte Carlo algorithms.
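A numerical sanity check of this property, as a sketch for the smooth choice $f(\theta) = \exp(-\theta^2/2)$ (so $\nabla \log f(\theta_0) = -\theta_0$), computing the prox with a generic one-dimensional optimizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox(log_f, theta0, sigma2):
    """argmax_theta f(theta) exp(-(theta - theta0)^2 / (2 sigma^2))."""
    objective = lambda t: -(log_f(t) - (t - theta0) ** 2 / (2 * sigma2))
    return minimize_scalar(objective).x

log_f = lambda t: -0.5 * t ** 2
theta0 = 0.7
for sigma2 in (1e-1, 1e-2, 1e-3):
    print(sigma2, (prox(log_f, theta0, sigma2) - theta0) / sigma2)  # -> -0.7
```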
22. Proximity mapping
Bayesian interpretation
If f is seen as a likelihood function, then
$$\theta \mapsto f(\theta) \exp\left\{-\frac{1}{2\sigma^2}(\theta - \theta_0)^2\right\}$$
is an unnormalized posterior density function based on a Normal prior with mean θ0 and variance σ². Hence
$$\frac{\mathrm{prox}_f(\theta_0) - \theta_0}{\sigma^2} \xrightarrow[\sigma^2 \to 0]{} \nabla \log f(\theta_0)$$
can be read as
$$\frac{\text{posterior mode} - \text{prior mode}}{\text{prior variance}} \approx \text{score}.$$
23. Stein’s lemma
Stein’s lemma states that
θ ∼ N(θ0, σ2
)
if and only if for any function g such that E [|∇g(θ)|] < ∞,
E [(θ − θ0) g (θ)] = σ2
E [∇g (θ)] .
If we choose the function g : θ → exp ℓ (θ) /Z with
Z = E [exp ℓ (θ)] and apply Stein’s lemma we obtain
1
Z
E [θ exp ℓ(θ)] − θ0 =
σ2
Z
E [∇ℓ (θ) exp (ℓ (θ))]
⇔ σ−2
(E [θ | Y ] − θ0) = E [∇ℓ (θ) | Y ] .
Notation: E[φ(θ) | Y ] := E[φ(θ) exp ℓ(θ)]/Z.
Pierre Jacob Derivative estimation 19/ 39
24. Stein’s lemma
For the second derivative, we consider
$$h : \theta \mapsto (\theta - \theta_0) \exp \ell(\theta)/Z.$$
Then
$$E\left[(\theta - \theta_0)^2 \mid Y\right] = \sigma^2 + \sigma^4\, E\left[\nabla^2 \ell(\theta) + \nabla \ell(\theta)^2 \mid Y\right].$$
Adding and subtracting terms also yields
$$\sigma^{-4}\left(\mathbb{V}[\theta \mid Y] - \sigma^2\right) = E\left[\nabla^2 \ell(\theta) \mid Y\right] + \left\{E\left[\nabla \ell(\theta)^2 \mid Y\right] - \left(E[\nabla \ell(\theta) \mid Y]\right)^2\right\}.$$
. . . but what we really want is
$$\nabla \ell(\theta_0), \quad \nabla^2 \ell(\theta_0)$$
and not
$$E[\nabla \ell(\theta) \mid Y], \quad E\left[\nabla^2 \ell(\theta) \mid Y\right].$$
25. Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
26. Core Idea
The prior is essentially a normal distribution N(θ0, σ²), but in general has a density denoted by κ.
Posterior concentration induced by the prior
Under some assumptions, when σ → 0:
the posterior looks more and more like the prior,
the shift in posterior moments is in O(σ²).
Our arXived proof suffers from an overdose of Taylor expansions.
27. Details
Introduce a test function h such that $|h(u)| \leq c|u|^\alpha$ for some c, α.
We start by writing
$$E\{h(\theta - \theta_0) \mid y\} = \frac{\int h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du}{\int \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du}$$
using $u = (\theta - \theta_0)/\sigma$, and then focus on the numerator
$$\int h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du,$$
since the denominator is a particular instance of this expression with $h: u \mapsto 1$.
28. Details
For the numerator
$$\int h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du$$
we use a Taylor expansion of ℓ around θ0 and a Taylor expansion of exp around 0, and then take the integral with respect to κ.
Notation:
$$\ell^{(k)}(\theta).u^{\otimes k} = \sum_{1 \leq i_1, \ldots, i_k \leq d} \frac{\partial^k \ell(\theta)}{\partial \theta_{i_1} \cdots \partial \theta_{i_k}}\, u_{i_1} \cdots u_{i_k},$$
which in one dimension becomes
$$\ell^{(k)}(\theta).u^{\otimes k} = \frac{d^k \ell(\theta)}{d\theta^k}\, u^k.$$
29. Details
Main expansion:
$$\int h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du = \int h(\sigma u)\, \kappa(u)\, du + \sigma \int h(\sigma u)\, \ell^{(1)}(\theta_0).u\; \kappa(u)\, du$$
$$+ \sigma^2 \int h(\sigma u) \left\{\frac{1}{2}\, \ell^{(2)}(\theta_0).u^{\otimes 2} + \frac{1}{2}\left(\ell^{(1)}(\theta_0).u\right)^2\right\} \kappa(u)\, du$$
$$+ \sigma^3 \int h(\sigma u) \left\{\frac{1}{3!}\left(\ell^{(1)}(\theta_0).u\right)^3 + \frac{1}{2}\left(\ell^{(1)}(\theta_0).u\right)\left(\ell^{(2)}(\theta_0).u^{\otimes 2}\right) + \frac{1}{3!}\, \ell^{(3)}(\theta_0).u^{\otimes 3}\right\} \kappa(u)\, du + O(\sigma^{4+\alpha}).$$
In general, assumptions on the tails of the prior and the likelihood are used to control the remainder terms and to ensure they are $O(\sigma^{4+\alpha})$.
30. Details
We cut the integral into two parts:
$$\int h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du = \int_{\sigma|u| \leq \rho} h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du + \int_{\sigma|u| > \rho} h(\sigma u) \exp\{\ell(\theta_0 + \sigma u) - \ell(\theta_0)\}\, \kappa(u)\, du.$$
The expansion stems from the first term, where σ|u| is small.
The second term ends up in the remainder in $O(\sigma^{4+\alpha})$ using the assumptions.
This is a classic technique in Bayesian asymptotics theory.
31. Details
To get the score from the expansion, choose
$$h: u \mapsto u.$$
To get the observed information matrix from the expansion, choose
$$h: u \mapsto u^2,$$
and surprisingly (?) further assume that κ is mesokurtic, i.e.
$$\int u^4 \kappa(u)\, du = 3\left(\int u^2 \kappa(u)\, du\right)^2$$
⇒ choose a Gaussian prior to obtain the Hessian.
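The mesokurtic condition is exactly the Gaussian fourth-moment identity; a quick Monte Carlo sanity check (a sketch):

```python
import numpy as np

u = np.random.default_rng(0).standard_normal(10**6)
print(np.mean(u**4), 3 * np.mean(u**2)**2)  # both ~ 3: the Gaussian is mesokurtic
```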
32. Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
34. Hidden Markov models
Direct application of the previous results:
1 Prior distribution N(θ0, σ²) on the parameter θ.
2 The derivative approximations involve E[θ|Y] and Cov[θ|Y].
3 Posterior moments for HMMs can be estimated by particle MCMC, SMC², ABC, or your favourite method.
Ionides et al. proposed another approach.
35. Iterated Filtering
Modification of the model: θ is time-varying. The associated log-likelihood is
$$\bar{\ell}(\theta_{1:T}) = \log p(y_{1:T}; \theta_{1:T}) = \log \int_{\mathsf{X}^{T+1}} \prod_{t=1}^{T} g(y_t \mid x_t, \theta_t)\, \mu(dx_1 \mid \theta_1) \prod_{t=2}^{T} f(dx_t \mid x_{t-1}, \theta_t).$$
Introducing $\theta \mapsto (\theta, \theta, \ldots, \theta) := \theta^{[T]} \in \mathbb{R}^T$, we have
$$\bar{\ell}(\theta^{[T]}) = \ell(\theta)$$
and the chain rule yields
$$\frac{d\ell(\theta)}{d\theta} = \sum_{t=1}^{T} \frac{\partial \bar{\ell}(\theta^{[T]})}{\partial \theta_t}.$$
37. Iterated Filtering
Applying the general results for this prior yields, with $|x| = \sum_{t=1}^{T} |x_t|$:
$$\left|\nabla \bar{\ell}(\theta_0^{[T]}) - \Sigma_T^{-1}\left(E[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right| \leq C\tau^2.$$
Moreover we have
$$\left|\sum_{t=1}^{T} \frac{\partial \bar{\ell}(\theta^{[T]})}{\partial \theta_t} - \sum_{t=1}^{T} \left\{\Sigma_T^{-1}\left(E[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t\right| \leq \sum_{t=1}^{T} \left|\frac{\partial \bar{\ell}(\theta^{[T]})}{\partial \theta_t} - \left\{\Sigma_T^{-1}\left(E[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t\right|$$
and
$$\frac{d\ell(\theta)}{d\theta} = \sum_{t=1}^{T} \frac{\partial \bar{\ell}(\theta^{[T]})}{\partial \theta_t}.$$
38. Iterated Filtering
The estimator of the score is thus given by
$$\sum_{t=1}^{T} \left\{\Sigma_T^{-1}\left(E[\theta_{1:T} \mid Y] - \theta_0^{[T]}\right)\right\}_t,$$
which can be reduced to
$$S_{\tau,\rho,T}(\theta_0) = \frac{\tau^{-2}}{1+\rho}\left[(1-\rho)\left\{\sum_{t=2}^{T-1} E(\theta_t \mid Y)\right\} - \left\{(1-\rho)T + 2\rho\right\}\theta_0 + E(\theta_1 \mid Y) + E(\theta_T \mid Y)\right],$$
given the form of $\Sigma_T^{-1}$. Note that in the quantities $E(\theta_t \mid Y)$, $Y = Y_{1:T}$ is the complete dataset, thus those expectations are with respect to the smoothing distribution.
39. Iterated Filtering
If ρ = 1, then the parameters follow a random walk:
$$\theta_1 = \theta_0 + \mathcal{N}(0, \tau^2) \quad \text{and} \quad \theta_{t+1} = \theta_t + \mathcal{N}(0, \sigma^2).$$
In this case Ionides et al. proposed the estimator
$$S_{\tau,\sigma,T} = \tau^{-2}\left(E(\theta_T \mid Y) - \theta_0\right)$$
as well as
$$S^{(bis)}_{\tau,\sigma,T} = \sum_{t=1}^{T} V_{P,t}^{-1}\left(\tilde{\theta}_{F,t} - \tilde{\theta}_{F,t-1}\right)$$
with $V_{P,t} = \mathrm{Cov}[\tilde{\theta}_t \mid y_{1:t-1}]$ and $\tilde{\theta}_{F,t} = E[\theta_t \mid y_{1:t}]$.
Those expressions only involve expectations with respect to filtering distributions.
40. Iterated Filtering
If ρ = 0, then the parameters are i.i.d.:
$$\theta_1 = \theta_0 + \mathcal{N}(0, \tau^2) \quad \text{and} \quad \theta_{t+1} = \theta_0 + \mathcal{N}(0, \tau^2).$$
In this case the expression of the score estimator reduces to
$$S_{\tau,T} = \tau^{-2} \sum_{t=1}^{T} \left(E(\theta_t \mid Y) - \theta_0\right),$$
which involves smoothing distributions.
There is only one parameter τ² to choose for the prior.
However, smoothing for general hidden Markov models is difficult, and typically resorts to "fixed-lag approximations".
41. Iterated Smoothing
Only for the case ρ = 0 are we able to obtain simple expressions for the observed information matrix. We propose the following estimator:
$$I_{\tau,T}(\theta_0) = -\tau^{-4}\left\{\sum_{s=1}^{T} \sum_{t=1}^{T} \mathrm{Cov}(\theta_s, \theta_t \mid Y) - \tau^2 T\right\},$$
for which we can show that
$$\left|I_{\tau,T} - \left(-\nabla^2 \ell(\theta_0)\right)\right| \leq C\tau^2.$$
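Given posterior (smoothing) draws of $\theta_{1:T}$ under the i.i.d. (ρ = 0) prior, both $S_{\tau,T}$ and $I_{\tau,T}$ have direct empirical counterparts. A sketch for a scalar parameter, with illustrative names:

```python
import numpy as np

def score_and_information(theta_draws, theta0, tau2):
    """Empirical S_{tau,T} and I_{tau,T} from draws of theta_{1:T} given Y.

    theta_draws : array of shape (n_draws, T)
    """
    smoothing_means = theta_draws.mean(axis=0)            # E(theta_t | Y)
    score = np.sum(smoothing_means - theta0) / tau2
    T = theta_draws.shape[1]
    cov = np.cov(theta_draws, rowvar=False)               # (T, T) covariance matrix
    information = -(cov.sum() - tau2 * T) / tau2**2
    return score, information
```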
42. Numerical results
Linear Gaussian state space model, where the ground truth is available through the Kalman filter:
$$X_0 \sim \mathcal{N}(0, 1), \quad X_t = \rho X_{t-1} + \mathcal{N}(0, V), \quad Y_t = \eta X_t + \mathcal{N}(0, W).$$
Generate T = 100 observations and set ρ = 0.9, V = 0.7, η = 0.9 and W = 0.1, 0.2, 0.4, 0.9.
240 independent runs, matching the computational costs between methods in terms of number of calls to the transition kernel.
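For reference, a sketch of this setup: simulating the model and computing the exact log-likelihood with a scalar Kalman filter, from which the ground-truth score can be obtained by numerical or automatic differentiation. This is a reconstruction under the stated parameter values, not the authors' code:

```python
import numpy as np

def simulate(T, rho, V, eta, W, rng):
    x = rng.standard_normal()                            # X_0 ~ N(0, 1)
    y = np.empty(T)
    for t in range(T):
        x = rho * x + np.sqrt(V) * rng.standard_normal()
        y[t] = eta * x + np.sqrt(W) * rng.standard_normal()
    return y

def kalman_log_likelihood(y, rho, V, eta, W):
    m, P, log_lik = 0.0, 1.0, 0.0
    for y_t in y:
        m_pred, P_pred = rho * m, rho**2 * P + V         # predict
        S = eta**2 * P_pred + W                          # innovation variance
        log_lik += -0.5 * (np.log(2 * np.pi * S) + (y_t - eta * m_pred)**2 / S)
        K = P_pred * eta / S                             # Kalman gain
        m = m_pred + K * (y_t - eta * m_pred)            # update
        P = (1 - K * eta) * P_pred
    return log_lik

y = simulate(100, 0.9, 0.7, 0.9, 0.2, np.random.default_rng(42))
print(kalman_log_likelihood(y, 0.9, 0.7, 0.9, 0.2))
```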
45. Bibliography
Main references:
Inference for nonlinear dynamical systems, Ionides, Breto, King, PNAS, 2006.
Iterated filtering, Ionides, Bhadra, Atchadé, King, Annals of Statistics, 2011.
Efficient iterated filtering, Lindström, Ionides, Frydendall, Madsen, 16th IFAC Symposium on System Identification.
Derivative-Free Estimation of the Score Vector and Observed Information Matrix, Doucet, Jacob, Rubenthaler, 2013 (on arXiv).