The document discusses derivative-free optimization and evolutionary algorithms. It begins with an introduction to derivative-free optimization, explaining why it is useful when derivatives are unavailable or functions are noisy. Evolutionary algorithms are then discussed, including their fundamental elements like populations, selection, and variation operators. Specific evolutionary algorithms are presented, such as the estimation of distribution algorithm (EDA) and the (1+1)-ES algorithm with 1/5th success rule adaptation. The slides note that evolutionary algorithms are robust to noise and difficult optimization problems but are generally slower than derivative-based methods.
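The (1+1)-ES with 1/5th success rule mentioned above can be sketched in a few lines. This is a minimal illustration rather than the slides' exact algorithm; the adaptation window (20 iterations) and the step-size factors (1.5 and 0.6) are illustrative choices:

```python
import random

def one_plus_one_es(f, x0, sigma=1.0, iters=500, seed=0):
    """(1+1)-ES minimizing f, with 1/5th success rule step-size adaptation."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    successes = 0
    for t in range(1, iters + 1):
        # Mutate: add isotropic Gaussian noise scaled by sigma.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:            # the offspring replaces the parent only if better
            x, fx = y, fy
            successes += 1
        # Periodically adapt sigma toward a success rate of about 1/5:
        # too many successes -> steps too timid, enlarge; too few -> shrink.
        if t % 20 == 0:
            rate = successes / 20
            sigma *= 1.5 if rate > 0.2 else 0.6
            successes = 0
    return x, fx

best, val = one_plus_one_es(lambda v: sum(z * z for z in v), [5.0, -3.0])
```

Note that only comparisons of f-values are used, which is why such algorithms tolerate noise and monotone transformations of the objective.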
05 History of CV: a machine learning (theory) perspective on computer vision (zukun)
This document provides an overview of machine learning algorithms used in computer vision from the perspective of a machine learning theorist. It discusses how the theorist got involved in a computer vision project in 2002 and summarizes key algorithms at that time like boosting, support vector machines, and their developments. It also provides historical context and comparisons of algorithms like perceptron and Winnow. The document uses examples to explain concepts like kernels and the kernel trick in support vector machines.
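The perceptron/Winnow comparison alluded to above comes down to additive versus multiplicative weight updates. A minimal sketch, assuming labels in {-1, +1} and, for Winnow, binary features in {0, 1}:

```python
def perceptron_update(w, x, y, lr=1.0):
    """Additive update: on a mistake, shift w by lr * y * x."""
    if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def winnow_update(w, x, y, threshold, alpha=2.0):
    """Multiplicative update (Winnow): on a mistake, active features are
    promoted (times alpha) or demoted (divided by alpha), never shifted."""
    yhat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else -1
    if yhat != y:
        w = [wi * (alpha ** (y * xi)) for wi, xi in zip(w, x)]
    return w
```

The multiplicative scheme is what gives Winnow its logarithmic dependence on the number of irrelevant features.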
EM algorithm and its application in probabilistic latent semantic analysis (zukun)
The document discusses the EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA). It begins by introducing the parameter estimation problem and comparing frequentist and Bayesian approaches. It then describes the EM algorithm, which iteratively computes lower bounds to the log-likelihood function. Finally, it applies the EM algorithm to pLSA by modeling documents and words as arising from a mixture of latent topics.
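The pLSA E- and M-steps described above can be sketched as follows. This is a generic implementation of the standard updates, not code from the document; the count matrix and the number of topics in the example are illustrative:

```python
import math
import random

def plsa(counts, K=2, iters=50, seed=0):
    """EM for pLSA: counts[d][w] are word counts; learn P(z|d) and P(w|z)."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    norm = lambda v: [x / sum(v) for x in v]
    # Random normalized initialization of the two parameter tables.
    p_z_d = [norm([rng.random() for _ in range(K)]) for _ in range(D)]
    p_w_z = [norm([rng.random() for _ in range(W)]) for _ in range(K)]
    for _ in range(iters):
        new_zd = [[0.0] * K for _ in range(D)]
        new_wz = [[0.0] * W for _ in range(K)]
        for d in range(D):
            for w in range(W):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over latent topics for this (d, w) pair.
                q = [p_z_d[d][k] * p_w_z[k][w] for k in range(K)]
                s = sum(q)
                q = [x / s for x in q]
                # M-step accumulators, weighted by the observed count.
                for k in range(K):
                    new_zd[d][k] += counts[d][w] * q[k]
                    new_wz[k][w] += counts[d][w] * q[k]
        p_z_d = [norm(row) for row in new_zd]
        p_w_z = [norm(row) for row in new_wz]
    loglik = sum(counts[d][w] * math.log(sum(p_z_d[d][k] * p_w_z[k][w]
                                             for k in range(K)))
                 for d in range(D) for w in range(W) if counts[d][w])
    return p_z_d, p_w_z, loglik

_, _, llk1 = plsa([[5, 2, 0], [1, 0, 4]], iters=1)
_, _, llk50 = plsa([[5, 2, 0], [1, 0, 4]], iters=50)
```

Each EM sweep cannot decrease the log-likelihood, which is the lower-bound property mentioned above.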
Label propagation - Semisupervised Learning with Applications to NLP (David Przybilla)
Label propagation is a semi-supervised learning algorithm that propagates labels from a small set of labeled data points to unlabeled data points. The algorithm constructs a graph with nodes for each data point and weighted edges representing similarity between points. It then iteratively propagates the labels across the graph from labeled to unlabeled points until convergence, resulting in "soft" probabilistic labels for all points. The algorithm aims to minimize an energy function that encourages points connected by strong edges to receive similar labels. It performs well with limited labeled data by leveraging the graph structure to make predictions for unlabeled points.
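The iterative propagation described above can be sketched as follows, with labeled points clamped at each sweep. This is a minimal version of the standard algorithm; the convergence test is replaced by a fixed iteration count:

```python
def label_propagation(W, labels, iters=200):
    """Propagate labels on a graph. W[i][j] is the edge weight between
    points i and j; labels[i] is a class index for labeled points and
    None for unlabeled ones. Returns soft labels F (one row per point)."""
    n = len(W)
    classes = sorted({c for c in labels if c is not None})
    F = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]
    for _ in range(iters):
        newF = []
        for i in range(n):
            if labels[i] is not None:
                newF.append(F[i][:])          # clamp labeled points
                continue
            deg = sum(W[i])
            # Each unlabeled point averages its neighbors' soft labels,
            # weighted by edge strength.
            newF.append([sum(W[i][j] * F[j][c] for j in range(n)) / deg
                         for c in range(len(classes))])
        F = newF
    return F

# Chain graph 0-1-2-3 with node 0 labeled 0 and node 3 labeled 1.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
F = label_propagation(W, [0, None, None, 1])
```

On the chain, the interior nodes converge to soft labels proportional to their graph distance from each labeled endpoint.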
Optimal control of coupled PDE networks with automated code generation (Delta Pi Systems)
This document summarizes an approach for optimal control of coupled partial differential equation (PDE) networks using automated code generation. It discusses representing PDE networks as graphs, formulating the optimal control problem, deriving adjoint equations to compute gradients, discretizing control variables, and generating code to solve the direct and adjoint problems. Tools used include the DOT language for graph representation, SymPy for symbolic math, Cog for code generation, SfePy for PDE solvers, and SciPy for numerics.
CVPR2010: higher order models in computer vision: Part 1, 2 (zukun)
This document discusses tractable higher order models in computer vision using random field models. It introduces Markov random fields (MRFs) and factor graphs as graphical models for computer vision problems. Higher order models that include factors over cliques of more than two variables can model problems more accurately but are generally intractable. The document discusses various inference techniques for higher order models such as relaxation, message passing, and decomposition methods. It provides examples of how higher order and global models can be used in problems like segmentation, stereo matching, reconstruction, and denoising.
Solution of linear and nonlinear partial differential equations using mixture... (Alexander Decker)
This document discusses solving linear and nonlinear partial differential equations using a combination of the Elzaki transform and projected differential transform methods.
[1] It introduces the Elzaki transform and defines its properties for solving partial differential equations. Properties include formulas for the Elzaki transform of partial derivatives.
[2] It then introduces the projected differential transform method and defines its basic properties and theorems for decomposing nonlinear terms.
[3] As an example application, it shows how to use the two methods together to solve a general nonlinear, non-homogeneous partial differential equation with initial conditions. The Elzaki transform is used to obtain an expression for the solution, and the projected differential transform then decomposes the nonlinear terms.
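The transform properties referred to in [1] follow from the standard definition of the Elzaki transform as stated in the literature; written for a function f(x, t) with the transform taken in t:

```latex
T(x,v) = E\big[f(x,t)\big] = v \int_0^{\infty} f(x,t)\, e^{-t/v}\, dt,
\qquad
E\!\left[\frac{\partial f}{\partial t}\right] = \frac{1}{v}\,T(x,v) - v\, f(x,0),
\qquad
E\!\left[\frac{\partial^2 f}{\partial t^2}\right] = \frac{1}{v^2}\,T(x,v) - f(x,0) - v\,\frac{\partial f}{\partial t}(x,0),
```

so applying the transform to a PDE in t converts the time derivatives into algebraic terms involving the initial conditions.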
11. Solution of linear and nonlinear partial differential equations using mixt... (Alexander Decker)
This document discusses solving linear and nonlinear partial differential equations using a combination of the Elzaki transform and projected differential transform methods.
[1] It introduces the Elzaki transform and defines its properties for handling partial derivatives.
[2] It then introduces the projected differential transform method and its fundamental theorems for decomposing nonlinear terms.
[3] As an example application, it shows how to use the two methods together to solve a general nonlinear, non-homogeneous partial differential equation with initial conditions. The Elzaki transform is used to obtain an expression in terms of the nonlinear terms, which are then decomposed using the projected differential transform method.
Estimation of the score vector and observed information matrix in intractable... (Pierre Jacob)
This document discusses methods for estimating the score vector and observed information matrix for intractable models. It begins with an overview of using derivatives in sampling algorithms. It then discusses iterated filtering, a method for estimating derivatives in hidden Markov models when the likelihood is not available in closed form. Iterated filtering introduces a perturbed model and relates the posterior mean to the score and posterior variance to the observed information matrix. The document outlines proofs that show this relationship as the prior concentration increases.
Spectral Learning Methods for Finite State Machines with Applications to Na... (LARCA UPC)
The document summarizes a spectral learning method for probabilistic finite-state machines (FSMs). It introduces observable operator models that represent probabilistic transducers using conditional probabilities between inputs, outputs, and hidden states. A key contribution is a spectral algorithm that learns the parameters of these models from data in linear time, with theoretical PAC-style guarantees. Experimental results on synthetic data show the method outperforms baselines like HMMs and k-HMMs on learning tasks.
This document discusses using graphics processing units (GPUs) to perform approximate Bayesian computation (ABC) for parameter estimation of complex models. It describes how GPUs are well-suited for ABC due to their ability to perform linear computations on many threads in parallel. The document provides examples of applying ABC to GPUs for problems involving dynamical systems, network evolution models, and parameter estimation for protein interaction networks.
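The "embarrassingly parallel" structure that makes ABC a good fit for GPUs is visible even in a serial sketch: every trial below is independent, so trials can be distributed across threads. The toy example (inferring a Gaussian mean, an assumed setup not taken from the document) uses plain rejection ABC:

```python
import random

def abc_rejection(observed, prior, simulate, eps, trials=20000, seed=1):
    """ABC rejection sampling: draw a parameter from the prior, simulate a
    summary statistic, and keep the parameter if the statistic lands within
    eps of the observed value. Trials are fully independent of one another."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(trials):
        theta = prior(rng)
        if abs(simulate(theta, rng) - observed) < eps:
            accepted.append(theta)
    return accepted

# Assumed toy model: mean of a Gaussian with known sd = 1, 50 observations,
# sample mean as the summary statistic.
prior = lambda rng: rng.uniform(-5.0, 5.0)
simulate = lambda theta, rng: sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50
posterior = abc_rejection(observed=2.0, prior=prior, simulate=simulate, eps=0.2)
```

On a GPU, each thread would run one (or many) of these prior-draw/simulate/compare trials; only the acceptance bookkeeping needs coordination.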
Influence of the sampling on Functional Data Analysis (tuxette)
This document discusses the influence of sampling on functional data analysis. It notes that functional data is typically observed through discrete sampling rather than as true continuous functions. This sampling must be accounted for in functional data analysis methods. Specifically, the document discusses using spline approximations to represent sampled functions as elements of a reproducing kernel Hilbert space. This allows building estimators of the true functions from their sampling and understanding how sampling impacts estimators and errors in functional models.
A current perspectives of corrected operator splitting (OS) for systems (Alexander Decker)
This document discusses operator splitting methods for solving systems of convection-diffusion equations. It begins by introducing operator splitting, where the time evolution is split into separate steps for convection and diffusion. While efficient, operator splitting can produce significant errors near shocks.
The document then examines the nonlinear error mechanism that causes issues for operator splitting near shocks. When a shock develops in the convection step, it introduces a local linearization that neglects self-sharpening effects. This leads to splitting errors.
To address this, the document discusses corrected operator splitting, which uses the wave structure from the convection step to identify where nonlinear splitting errors occur. Correction terms are added to the diffusion step to compensate for these errors.
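For a linear model problem u_t + a u_x = nu u_xx, one splitting step can be sketched as below (upwind convection followed by explicit diffusion, with periodic boundaries). This is only the plain splitting; the correction terms for nonlinear self-sharpening discussed above are not included:

```python
def split_step(u, dt, dx, a=1.0, nu=0.1):
    """One operator-splitting step for u_t + a*u_x = nu*u_xx:
    an upwind convection substep, then an explicit diffusion substep.
    Assumes a > 0 and periodic boundaries (Python's u[-1] wraps around)."""
    n = len(u)
    # Convection substep (first-order upwind); stable for c <= 1.
    c = a * dt / dx
    v = [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]
    # Diffusion substep (explicit centered differences); stable for r <= 1/2.
    r = nu * dt / dx ** 2
    return [v[i] + r * (v[(i + 1) % n] - 2 * v[i] + v[i - 1]) for i in range(n)]

u0 = [0.0] * 4 + [1.0, 1.0] + [0.0] * 4
u1 = split_step(u0, dt=0.5, dx=1.0)
```

With these coefficient bounds, each substep is a convex combination of neighboring values, so total mass is conserved and no new extrema appear; the splitting error described above arises from applying the two substeps sequentially rather than simultaneously.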
11. [104-111] Analytical solution for telegraph equation by modified of sumudu ... (Alexander Decker)
1) The document presents a new integral transform method called the Elzaki transform to solve the general linear telegraph equation.
2) The Elzaki transform is used to obtain analytical solutions for the telegraph equation. Definitions and properties of the transform are provided.
3) Several examples are presented to demonstrate the method. Exact solutions to examples of the telegraph equation are obtained using the Elzaki transform.
Artificial Intelligence and Optimization with Parallelism (Olivier Teytaud)
This document discusses parallelism in artificial intelligence and evolutionary computation. It explains that comparison-based optimization algorithms, which include many evolutionary algorithms, can be naturally parallelized by speculatively running multiple branches in parallel with a branching factor of 3 or more. This allows theoretical logarithmic speedups to be achieved in practice through simple parallelization tricks.
This document describes a clustering procedure and nonparametric mixture estimation. It introduces a mixture density model where the goal is to efficiently estimate the mixture weights (αi) and component densities (fi). A two-stage clustering algorithm is proposed: 1) perform clustering on covariates (X) to estimate labels (Ik), and 2) estimate component densities (fi) using kernel density estimation within each cluster. The performance of this approach depends on the clustering method's misclassification error. A toy example with two components having disjoint support densities for X is provided to illustrate the model.
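The two-stage procedure can be sketched as follows for one-dimensional data; the clustering rule, the bandwidth h, and the disjoint-support toy data are illustrative assumptions, not the document's exact setup:

```python
import math
import random

def two_stage_mixture(xs, split, h=0.3):
    """Two-stage estimator sketch: (1) assign points to clusters with the
    given rule, (2) estimate each component density f_i by a Gaussian KDE
    within its cluster, and each weight alpha_i by the cluster proportion."""
    clusters = {}
    for x in xs:
        clusters.setdefault(split(x), []).append(x)

    def kde(points):
        def f(t):
            return sum(math.exp(-0.5 * ((t - p) / h) ** 2) for p in points) \
                   / (len(points) * h * math.sqrt(2 * math.pi))
        return f

    alphas = {k: len(v) / len(xs) for k, v in clusters.items()}
    densities = {k: kde(v) for k, v in clusters.items()}
    return alphas, densities

# Toy example with disjoint supports: component 0 on [0, 1], component 1 on [2, 3].
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(100)] + [rng.uniform(2, 3) for _ in range(50)]
alphas, densities = two_stage_mixture(xs, split=lambda x: 0 if x < 1.5 else 1)
```

With disjoint supports the clustering rule makes no misclassification errors, which is the favorable case the document's toy example illustrates; overlapping components would contaminate the per-cluster KDEs.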
The document outlines research on developing optimal finite difference grids for solving elliptic and parabolic partial differential equations (PDEs). It introduces the motivation to accurately compute Neumann-to-Dirichlet (NtD) maps. It then summarizes the formulation and discretization of model elliptic and parabolic PDE problems, including deriving the discrete NtD map. It presents results on optimal grid design and the spectral accuracy achieved. Future work is proposed on extending the NtD map approach to non-uniformly spaced boundary data.
This document provides an overview of a 2004 CVPR tutorial on nonlinear manifolds in computer vision. The tutorial is divided into four parts that cover: (1) motivation for studying nonlinear manifolds and how differential geometry can be useful in vision, (2) tools from differential geometry like manifolds, tangent spaces, and geodesics, (3) statistics on manifolds like distributions and estimation, and (4) algorithms and applications in computer vision like pose estimation, tracking, and optimal linear projections. Nonlinear manifolds are important in computer vision as the underlying spaces in problems involving constraints like objects on circles or matrices with orthogonality constraints are nonlinear. Differential geometry provides a framework for generalizing tools from vector spaces to nonlinear
Forecasting electricity consumption with adaptive GAM (Cdiscount)
The document discusses generalized additive models (GAM) for short-term electricity load forecasting. GAMs are smooth additive models that decompose a response variable into additive components like trends, cyclic patterns, and nonlinear effects. They summarize how GAMs can model various drivers of electricity consumption, including temperature effects, day-of-week patterns, and lagged load values. Big additive models (BAM) allow applying GAMs to large electricity load datasets. BAMs use QR decomposition and online updating to efficiently estimate high-dimensional additive models.
This document discusses using Gaussian process models for change point detection in atmospheric dispersion problems. It proposes using multiple kernels in a Gaussian process to model different regimes indicated by change points. A two-stage process is used to first estimate the change point (release time) and then estimate the source location. Simulation results show the approach outperforms existing techniques in estimating change points and source locations from concentration sensor measurements. The approach is applied to model real concentration data to estimate a CBRN release scenario.
The document discusses using structured support vector machines to predict structured outputs by learning a scoring function F(x, y) = w·φ(x, y) that is maximized over y to make predictions. As an example, it applies this approach to category-level object localization in images by representing image-box pairs as joint features and learning to localize objects.
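Prediction with such a scoring function reduces to an argmax over candidate structures. A minimal sketch, in which the feature map phi, the candidate boxes, and the weights are all hypothetical:

```python
def s_svm_predict(x, candidates, phi, w):
    """Structured prediction: return argmax over y of F(x, y) = <w, phi(x, y)>."""
    score = lambda y: sum(wi * fi for wi, fi in zip(w, phi(x, y)))
    return max(candidates, key=score)

# Toy localization: x is a 1-D strip of per-pixel scores, candidates are
# (start, end) "boxes", and phi(x, y) = [score mass inside the box, box width].
x = [0.1, 0.9, 0.8, 0.2]
candidates = [(0, 2), (1, 3), (2, 4)]
phi = lambda x, y: [sum(x[y[0]:y[1]]), float(y[1] - y[0])]
box = s_svm_predict(x, candidates, phi, w=[1.0, 0.0])
```

Training a structured SVM amounts to choosing w so that the correct y wins this argmax by a margin; here the box covering the high-score pixels is selected.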
The document proposes a new method called Simplex Volume Analysis based on Triangular Factorization (SVATF) for hyperspectral unmixing. SVATF simplifies the endmember extraction process by using Cholesky factorization to calculate the volume of a simplex, allowing endmembers to be identified by maximizing values along the diagonal of the factorization. This reduces computational costs compared to existing methods. The paper evaluates SVATF on both synthetic and real hyperspectral data, finding the new approach can perform endmember extraction faster than alternatives with or without dimensionality reduction.
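The Cholesky-based volume computation rests on a standard identity: for a d-simplex whose edge vectors have Gram matrix G = L L^T, the volume is prod(diag(L)) / d!. A sketch of that identity (not the paper's SVATF algorithm itself):

```python
import math

def cholesky(G):
    """Cholesky factorization G = L * L^T for symmetric positive-definite G."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(G[i][i] - s) if i == j else (G[i][j] - s) / L[j][j]
    return L

def simplex_volume(vertices):
    """Volume of a d-simplex via the Gram matrix of its edge vectors:
    sqrt(det(G)) / d!, with sqrt(det(G)) read off the Cholesky diagonal."""
    v0 = vertices[0]
    edges = [[vi - a for vi, a in zip(v, v0)] for v in vertices[1:]]
    d = len(edges)
    G = [[sum(ei * ej for ei, ej in zip(e1, e2)) for e2 in edges] for e1 in edges]
    L = cholesky(G)
    vol = 1.0
    for i in range(d):
        vol *= L[i][i]
    return vol / math.factorial(d)
```

Maximizing the diagonal entries of the factorization is therefore equivalent to maximizing the simplex volume, which is how the endmember search avoids repeated determinant evaluations.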
This document discusses unconditionally stable finite-difference time-domain (FDTD) methods for solving Maxwell's equations numerically. It outlines FDTD algorithms such as Yee's method from 1966 which discretize the equations on a staggered grid. It also discusses the von Neumann stability analysis and compares implicit Crank-Nicolson and alternating-direction implicit methods to conventional explicit FDTD methods. The document notes the advantages of unconditionally stable methods but also mentions potential disadvantages.
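Yee's explicit leapfrog update, which the implicit methods above are compared against, can be sketched in 1D (normalized units; this illustrative sketch is only stable when the Courant number is at most 1, which is exactly the restriction the unconditionally stable methods remove):

```python
import math

def fdtd_1d(steps, n=200, courant=0.5):
    """1D Yee scheme for the normalized Maxwell curl equations: E and H live
    on staggered grid points and are updated in a leapfrog fashion."""
    # Gaussian pulse as the initial E field; H starts at zero.
    E = [math.exp(-((i - n // 2) / 8.0) ** 2) for i in range(n)]
    H = [0.0] * n
    for _ in range(steps):
        for i in range(n - 1):                 # update H from the curl of E
            H[i] += courant * (E[i + 1] - E[i])
        for i in range(1, n):                  # update E from the curl of H
            E[i] += courant * (H[i] - H[i - 1])
    return E, H

E, H = fdtd_1d(100)
```

Implicit schemes such as Crank-Nicolson replace these pointwise updates with a linear solve per step, trading per-step cost for freedom in choosing the time step.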
Undecidability in partially observable deterministic games (Olivier Teytaud)
Undecidability in partially observable deterministic games
Presented in Dagstuhl 2010
Accepted in IJFCS:
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {David, Auger and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever maybe the strategies of the players.}},
language = {Anglais},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scinet},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press },
audience = {internationale },
year = {2012},
}
Reinforcement learning for difficult settings (Olivier Teytaud)
1) Subgoal learning, macro-actions, partial observation, and clustering of features were explored for difficult reinforcement learning settings.
2) MCTS was adapted for these settings by incorporating simulations, action categorization, macro-action building, feature clustering, and a tree of subgoals with nodes containing goals.
3) This approach combines techniques from the state of the art and was tested on problems from other projects, showing successful application to complex real-world problems.
This document summarizes a paper on the comparison-based complexity of multi-objective optimization. The paper presents complexity upper and lower bounds for finding the Pareto front and Pareto set. For upper bounds, it shows the complexity is O(Nd log(1/ε)) for finding the whole Pareto set and O(N+1-d log(1/ε)) for finding a single point, where N is the dimension, d is the number of objectives, and ε is the target precision. For lower bounds, it transfers a proof technique based on packing numbers from the mono-objective case to the multi-objective case to derive tight complexity bounds.
3) This approach combines techniques from the state of the art and was tested on problems from other projects, showing successful application to complex real-world problems.
This document summarizes a paper on the comparison-based complexity of multi-objective optimization. It outlines that the paper presents complexity upper and lower bounds for finding the Pareto front and Pareto set in multi-objective optimization problems. For upper bounds, it shows the complexity is O(Nd log(1/e)) for finding the whole Pareto set and O(N+1-d log(1/e)) for finding a single point, where N is the dimension and d is the number of objectives. For lower bounds, it applies a proof technique using packing numbers from the mono-objective case to the multi-objective case to derive tight complexity bounds.
- Version control systems like SVN, Git, and Mercurial allow multiple developers to work together on projects by tracking changes over time.
- SVN is recommended for beginners as it is relatively easy to use. Popular options for hosting SVN repositories include SourceForge, Google Code, and setting up your own server.
- More advanced distributed version control systems like Git and Mercurial give developers more flexibility but may be more complicated to set up and use. Common projects using these include Linux kernel (Git) and Mozilla Firefox (Mercurial).
This document discusses using quasi-random numbers rather than random numbers in evolutionary strategies (ES). Previous work found that quasi-random numbers worked better than random for initialization, restarts, and mutations on highly multimodal functions. The current work finds that quasi-random mutations also improve performance over random mutations on the Bbob test functions. Scatterplots show the quasi-random points are more uniformly distributed than random points. Further work includes implementing quasi-random numbers in open-source ES code and testing on more multimodal problems and algorithms.
Machine learning 2016: deep networks and Monte Carlo Tree SearchOlivier Teytaud
Deep networks and Monte Carlo tree search (MCTS) are two recent machine learning algorithms. Deep networks use multiple layers of neural networks trained in an unsupervised manner layer-by-layer before fine-tuning the entire network in a supervised manner, achieving better generalization than shallow networks. MCTS uses Monte Carlo simulations and the Upper Confidence Trees algorithm to achieve strong performance at games like Go, where it is difficult to define an evaluation function. Both algorithms were important breakthroughs that performed well in applications with minimal reliance on mathematical theory.
Why power system studies (and many others!) should be open data / open sourceOlivier Teytaud
1. The document discusses intellectual property and code in the contexts of games and power systems.
2. For games, the author's work on Go and Urban Rivals is discussed, noting challenges with commercializing or open sourcing the code.
3. For power systems, the author argues that open data and debate is needed given proprietary codes and studies currently used in the energy transition debate that receives public funding.
Published in ECML 2010
@inproceedings{DBLP:conf/pkdd/RoletT10,
author = {Philippe Rolet and
Olivier Teytaud},
title = {Complexity Bounds for Batch Active Learning in Classification},
booktitle = {ECML/PKDD (3)},
year = {2010},
pages = {293-305},
ee = {http://dx.doi.org/10.1007/978-3-642-15939-8_19},
crossref = {DBLP:conf/pkdd/2010-3},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/pkdd/2010-3,
editor = {Jos{\'e} L. Balc{\'a}zar and
Francesco Bonchi and
Aristides Gionis and
Mich{\`e}le Sebag},
title = {Machine Learning and Knowledge Discovery in Databases, European
Conference, ECML PKDD 2010, Barcelona, Spain, September
20-24, 2010, Proceedings, Part III},
booktitle = {ECML/PKDD (3)},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
volume = {6323},
year = {2010},
isbn = {978-3-642-15938-1},
ee = {http://dx.doi.org/10.1007/978-3-642-15939-8},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
1. The document discusses various methods for continuous optimization, including rates of convergence for noise-free and noisy settings.
2. In noise-free settings, methods like Newton's method and BFGS have quadratic or superlinear convergence rates, while evolutionary strategies (ES) have linear convergence rates.
3. Lower bounds on optimization complexity are also discussed, showing minimum comparisons or evaluations needed depending on problem properties like domain size and precision required.
it contains the detail information about Dynamic programming, Knapsack problem, Forward / backward knapsack, Optimal Binary Search Tree (OBST), Traveling sales person problem(TSP) using dynamic programming
Nature-Inspired Optimization Algorithms Xin-She Yang
This document discusses nature-inspired optimization algorithms. It begins with an overview of the essence of optimization algorithms and their goal of moving to better solutions. It then discusses some issues with traditional algorithms and how nature-inspired algorithms aim to address these. Several nature-inspired algorithms are described in detail, including particle swarm optimization, firefly algorithm, cuckoo search, and bat algorithm. These are inspired by behaviors in swarms, fireflies, cuckoos, and bats respectively. Examples of applications to engineering design problems are also provided.
This document provides an overview and summary of Numerical Python (NumPy), an extension to the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays. It describes how to install NumPy, test the installation, and introduces some of the key features like array objects, universal functions (ufuncs), and convenience functions for array creation and manipulation.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter (Lambda), which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feed forward back propagation networks with the same size when tested on real data sets.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter Lambda, which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feed forward back propagation networks with the same size when tested on real data sets.
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...npinto
Abstract:
Machine learning researchers and practitioners develop computer
algorithms that "improve performance automatically through
experience". At Google, machine learning is applied to solve many
problems, such as prioritizing emails in Gmail, recommending tags for
YouTube videos, and identifying different aspects from online user
reviews. Machine learning on big data, however, is challenging. Some
"simple" machine learning algorithms with quadratic time complexity,
while running fine with hundreds of records, are almost impractical to
use on billions of records.
In this talk, I will describe lessons drawn from various Google
projects on developing large scale machine learning systems. These
systems build on top of Google's computing infrastructure such as GFS
and MapReduce, and attack the scalability problem through massively
parallel algorithms. I will present the design decisions made in
these systems, strategies of scaling and speeding up machine learning
systems on web scale data.
Speaker biography:
Max Lin is a software engineer with Google Research in New York City
office. He is the tech lead of the Google Prediction API, a machine
learning web service in the cloud. Prior to Google, he published
research work on video content analysis, sentiment analysis, machine
learning, and cross-lingual information retrieval. He had a PhD in
Computer Science from Carnegie Mellon University.
Metaheuristic Algorithms: A Critical AnalysisXin-She Yang
The document discusses metaheuristic algorithms and their application to optimization problems. It provides an overview of several nature-inspired algorithms including particle swarm optimization, firefly algorithm, harmony search, and cuckoo search. It describes how these algorithms were inspired by natural phenomena like swarming behavior, flashing fireflies, and bird breeding. The document also discusses applications of these algorithms to engineering design problems like pressure vessel design and gear box design optimization.
This document discusses automatic Bayesian cubature for numerical integration. It begins with an introduction to multivariate integration and the challenges it poses. It then describes an automatic cubature algorithm that generates sample points and computes error bounds iteratively until a tolerance threshold is met. Next, it covers Bayesian cubature, which treats integrands as random functions to obtain probabilistic error bounds. It defines a Bayesian trio identity relating the integration error to discrepancies, variations, and alignments. The document concludes with discussions of future work.
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis including linear and non-linear least-squares and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier and provide a quick overview of the new Scikits package statsmodels whose API is maturing in a separate package but should be incorporated into SciPy in the future.
This document discusses algorithms for rendering lines in raster graphics. It begins by introducing common line primitives in OpenGL and reviewing basic line drawing math. It then describes the Digital Differential Analyzer (DDA) line algorithm and its limitations. The document introduces Bresenham's midpoint line algorithm as a faster alternative that uses integer arithmetic. It explains how Bresenham's algorithm works by tracking the sign of a decision variable to select the next pixel along the line. The document concludes by generalizing Bresenham's algorithm and discussing optimizations.
Cuckoo Search Algorithm: An IntroductionXin-She Yang
This presentation explains the fundamental ideas of the standard Cuckoo Search (CS) algorithm, which also contains the links to the free Matlab codes at Mathswork file exchanges and the animations of numerical simulations (video at Youtube). An example of multi-objective cuckoo search (MOCS) is also given with link to the Matlab code.
Introduction to Neural Networks and Deep Learning from ScratchAhmed BESBES
If you're willing to understand how neural networks work behind the scene and debug the back-propagation algorithm step by step by yourself, this presentation should be a good starting point.
We'll cover elements on:
- the popularity of neural networks and their applications
- the artificial neuron and the analogy with the biological one
- the perceptron
- the architecture of multi-layer perceptrons
- loss functions
- activation functions
- the gradient descent algorithm
At the end, there will be an implementation FROM SCRATCH of a fully functioning neural net.
code: https://github.com/ahmedbesbes/Neural-Network-from-scratch
This document provides an introduction to machine learning, covering key topics such as what machine learning is, common learning algorithms and applications. It discusses linear models, kernel methods, neural networks, decision trees and more. It also addresses challenges in machine learning like balancing fit and robustness, and evaluating model performance using techniques like ROC curves. The goal of machine learning is to build models that can learn from data to make predictions or decisions.
This document discusses randomized algorithms. It begins by listing different categories of algorithms, including randomized algorithms. Randomized algorithms introduce randomness into the algorithm to avoid worst-case behavior and find efficient approximate solutions. Quicksort is presented as an example randomized algorithm, where randomness improves its average runtime from quadratic to linear. The document also discusses the randomized closest pair algorithm and a randomized algorithm for primality testing. Both introduce randomness to improve efficiency compared to deterministic algorithms for the same problems.
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...Laurent Duval
This paper jointly addresses the problems of chromatogram baseline correction and noise reduction. The proposed approach is based on modeling the series of chromatogram peaks as sparse with sparse derivatives, and on modeling the baseline as a low-pass signal. A convex optimization problem is formulated so as to encapsulate these non-parametric models. To account for the positivity of chromatogram peaks, an asymmetric penalty functions is utilized. A robust, computationally efficient, iterative algorithm is developed that is guaranteed to converge to the unique optimal solution. The approach, termed Baseline Estimation And Denoising with Sparsity (BEADS), is evaluated and compared with two state-of-the-art methods using both simulated and real chromatogram data.
This document discusses object detection using Adaboost and various techniques. It begins with an overview of the Adaboost algorithm and provides a toy example to illustrate how it works. Next, it describes how Viola and Jones used Adaboost with Haar-like features and an integral image representation for rapid face detection in images. It achieved high detection rates with very low false positives. The document also discusses how Schneiderman and Kanade used a parts-based representation with localized wavelet coefficients as features for object detection and used statistical independence of parts to obtain likelihoods for classification.
The document discusses algorithms and their analysis. It defines an algorithm as a well-defined computational procedure that takes inputs and produces outputs. It discusses analyzing algorithms based on their time complexity, space complexity, and correctness. It provides examples of analyzing simple algorithms and calculating their complexity based on the number of elementary operations.
Swift is proposed as a next-generation platform for TensorFlow that could provide benefits over Python like improved performance, type safety, and enabling automatic differentiation at compile time. However, Python currently dominates the machine learning ecosystem. Swift and Python are intended to have a complementary relationship, with each suited to different use cases. Examples show comparable MNIST implementations in both Swift and Python for TensorFlow.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Derivative Free Optimization
1. DERIVATIVE-FREE OPTIMIZATION
http://www.lri.fr/~teytaud/dfo.pdf
(or Quentin's web page ?)
Olivier Teytaud
Inria Tao, visiting the beautiful city of Liège; also using slides from A. Auger
2. The next slide is the most important of all.
3. In case of trouble, interrupt me.
4. In case of trouble, interrupt me.
Further discussion needed:
- R82A, Montefiore institute
- olivier.teytaud@inria.fr
- or after the lessons (the 25th, not the 18th)
6. I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
9. Derivative-free optimization of f
No gradient!
Only depends on the x's and f(x)'s
13. Derivative-free optimization of f
Why derivative-free optimization?
Ok, it's slower
But sometimes you have no derivative
It's simpler (by far) ==> fewer bugs
14. Derivative-free optimization of f
Why derivative-free optimization?
Ok, it's slower
But sometimes you have no derivative
It's simpler (by far)
It's more robust (to noise, to strange functions...)
15. Derivative-free optimization of f
Optimization algorithms:
==> Newton optimization
==> Quasi-Newton (BFGS)
==> Gradient descent
Why derivative-free optimization?
Ok, it's slower
But sometimes you have no derivative
It's simpler (by far)
It's more robust (to noise, to strange functions...)
16. Derivative-free optimization of f
Optimization algorithms
==> Derivative-free optimization (don't need gradients)
Why derivative-free optimization?
Ok, it's slower
But sometimes you have no derivative
It's simpler (by far)
It's more robust (to noise, to strange functions...)
17. Derivative-free optimization of f
Optimization algorithms
==> Derivative-free optimization
==> Comparison-based optimization (coming soon): just needing comparisons, including evolutionary algorithms
Why derivative-free optimization?
Ok, it's slower
But sometimes you have no derivative
It's simpler (by far)
It's more robust (to noise, to strange functions...)
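Since a derivative-free optimizer only sees the x's and the f(x)'s, its whole interface fits in a few lines. A minimal sketch follows (pure random search, not one of the algorithms from these slides; the search domain [-5, 5]^dim and the budget are arbitrary illustrative choices):

```python
import random

def random_search(f, dim, budget, seed=0):
    """Minimal derivative-free optimizer: it uses only the values f(x),
    never a gradient. Here: pure random search over [-5, 5]^dim."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(budget):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        y = f(x)  # the only information the optimizer ever receives
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# The sphere function is smooth, but we never touch its derivative.
sphere = lambda x: sum(xi * xi for xi in x)
x, y = random_search(sphere, dim=3, budget=2000)
```

Any derivative-free method discussed later (evolution strategies in particular) can be dropped in behind the same f-only interface.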
18. I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
19. II. Evolutionary algorithms
a. Fundamental elements
b. Algorithms
c. Math. analysis
22. Preliminaries:
- Gaussian distribution: density K exp( - p(x) ) with
  - p(x) a degree-2 polynomial (negative dominant coefficient)
  - K a normalization constant
- Multivariate Gaussian distribution
- Non-isotropic Gaussian distribution
- Markov chains
23. Preliminaries:
- Gaussian distribution: density K exp( - p(x) ) with
  - p(x) a degree-2 polynomial (negative dominant coefficient); its coefficients set the translation of the Gaussian and the size of the Gaussian
  - K a normalization constant
- Multivariate Gaussian distribution
- Non-isotropic Gaussian distribution
- Markov chains
25. Preliminaries:
- Gaussian distribution
- Multivariate Gaussian distribution
- Non-isotropic Gaussian distribution
- Markov chains
Isotropic case:
==> density = K exp( - || x - μ ||² / (2σ²) )
==> level sets are rotationally invariant
==> completely defined by μ and σ
==> "isotropic" Gaussian
(do you understand why K is fixed by σ ?)
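The isotropic density above can be sketched directly (Python is used here purely for illustration; the deck itself has no code). The helper name is hypothetical; K is the constant, fixed by σ and the dimension, that makes the density integrate to 1, and the rotational invariance of level sets shows up as the density depending only on the distance to μ:

```python
import math

def isotropic_gaussian_density(x, mu, sigma):
    """Density K * exp(-||x - mu||^2 / (2 sigma^2)) in dimension d = len(x).

    K = (2 pi sigma^2)^(-d/2) is fixed by sigma: it is the unique
    constant making the density integrate to 1 over R^d.
    """
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    K = (2 * math.pi * sigma ** 2) ** (-d / 2)
    return K * math.exp(-sq_dist / (2 * sigma ** 2))

# Level sets are rotationally invariant: the density depends only on
# the distance to mu, so two points at the same distance from mu
# have exactly the same density.
mu = (0.0, 0.0)
p1 = isotropic_gaussian_density((1.0, 0.0), mu, 0.5)
p2 = isotropic_gaussian_density((0.0, 1.0), mu, 0.5)
```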
34. Population-based comparison-based algorithms ?
Abstract notations: x(i) is a population, I(i) is an
internal state of the algorithm.
x(1),I(1) = Opt()
x(2),I(2) = Opt(x(1), sign(y(1,1)-y(1,2)), I(1) )
… … ...
x(n),I(n) = Opt( x(n-1), sign(y(n-1,1)-y(n-1,2)), I(n-1) )
Auger, Fournier, Hansen, Rolet, Teytaud, Teytaud parallel evolution 34
35. Population-based comparison-based algorithms ?
Abstract notations: x(i) is a population, I(i) is an
internal state of the algorithm.
x(1),I(1) = Opt()
x(2),I(2) = Opt(x(1), (1), I(1) )
… … ...
x(n),I(n) = Opt(x(n-1),(n-1) ,I(n-1))
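A minimal elitist instance of the abstract scheme above can be sketched as follows (an illustrative Python sketch, not the authors' code): at each step the algorithm receives only sign(y1 - y2), never the fitness values themselves. The `sphere` objective is a hypothetical toy function.

```python
import random

def sphere(x):
    """Toy objective; the optimizer never sees its values, only signs."""
    return sum(t * t for t in x)

def sign(v):
    return (v > 0) - (v < 0)

# Comparison-based optimization: the only information passed to the
# update is sign(y(best) - y(candidate)), as in the abstract notation.
random.seed(0)
best = (1.0, 1.0)     # internal state I(n): current best point
sigma = 0.3           # step-size, kept fixed for simplicity
for n in range(200):
    candidate = tuple(b + random.gauss(0, sigma) for b in best)
    comparison = sign(sphere(best) - sphere(candidate))
    if comparison > 0:  # the candidate is strictly better: keep it
        best = candidate
```

Because only the sign is used, the same run would occur on any monotone transformation of the objective.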
36. Comparison-based optimization
An algorithm is comparison-based if it uses only comparisons
of fitness values, never the values themselves.
==> Same behavior on many functions
(invariant under monotone transformations of the objective)
Quasi-Newton methods are very poor on this.
38. Why comparison-based algorithms ?
==> more robust
==> this can be mathematically formalized:
comparison-based optimization algorithms are slow
( d log ||x(n) - x*|| / n ~ constant )
but robust (optimal for some worst-case analyses)
39. II. Evolutionary algorithms
a. Fundamental elements
b. Algorithms
c. Math. analysis
40. Basic scheme of an Evolution Strategy
Parameters: x, σ
- Generate λ points around x
( x + σN where N is a standard Gaussian )
- Compute their fitness values
- Select the μ best
- Let x = average of these best
45. Obviously parallel (multi-cores, clusters, grids...).
Really simple. Not a negligible advantage.
When I accessed, for the 1st time, a crucial industrial
code of an important company, I believed
that it would be clean and bug free.
(I was young)
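The basic scheme above can be sketched in a few lines (an illustrative Python sketch; the parameter values λ=20, μ=5 and the fixed step-size are assumptions for the example, and `sphere` is a hypothetical toy objective):

```python
import random

def evolution_strategy(fitness, x0, sigma=0.3, lam=20, mu=5,
                       iterations=100, seed=0):
    """Basic ES scheme: generate lambda points around x as x + sigma*N
    (N standard Gaussian), evaluate, select the mu best, and let x be
    the average of these best. Step-size adaptation is omitted here."""
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    for _ in range(iterations):
        # Generate lambda points around x: embarrassingly parallel step.
        points = [[xi + sigma * rng.gauss(0, 1) for xi in x]
                  for _ in range(lam)]
        # Compute their fitness values and select the mu best.
        points.sort(key=fitness)
        best = points[:mu]
        # Let x = average of these best.
        x = [sum(p[i] for p in best) / mu for i in range(d)]
    return x

def sphere(x):
    return sum(t * t for t in x)

x = evolution_strategy(sphere, [5.0, 5.0])
```

The fitness evaluations inside one generation are independent, which is exactly the "obviously parallel" point made above.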
48. The (1+1)-ES with the 1/5th rule
Parameters: x, σ
- Generate 1 point x' around x
( x + σN where N is a standard Gaussian )
- Compute its fitness value
- Keep the best (x or x'): x = best(x, x')
- σ = 2σ if x' best
- σ = 0.84σ otherwise
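A sketch of this algorithm, using the multiplicative constants 2 and 0.84 from the slide (the `sphere` objective and parameter values are illustrative assumptions):

```python
import random

def one_plus_one_es(fitness, x0, sigma=1.0, iterations=500, seed=0):
    """(1+1)-ES with the 1/5th-rule step-size adaptation from the slide:
    keep the best of x and x' = x + sigma*N; multiply sigma by 2 when
    x' wins and by 0.84 otherwise (2 * 0.84**4 is close to 1, so a
    success rate near 1/5 keeps sigma roughly stable)."""
    rng = random.Random(seed)
    x = list(x0)
    fx = fitness(x)
    for _ in range(iterations):
        xp = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fxp = fitness(xp)
        if fxp < fx:        # x' best: accept it and enlarge the step
            x, fx = xp, fxp
            sigma *= 2.0
        else:               # rejected mutation: shrink the step
            sigma *= 0.84
    return x, sigma

def sphere(x):
    return sum(t * t for t in x)

x, sigma = one_plus_one_es(sphere, [5.0, 5.0])
```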
61. Estimation of Multivariate Normal Algorithm
65. EMNA is usually non-isotropic
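The EMNA idea can be sketched as follows. This is a deliberately simplified version (an assumption of the example): it estimates only a diagonal covariance, i.e. one standard deviation per coordinate, which already makes it non-isotropic; the full algorithm re-estimates a complete covariance matrix from the selected individuals.

```python
import random

def emna_diagonal(fitness, x0, lam=50, mu=10, iterations=60, seed=0):
    """Simplified EMNA sketch: maintain a non-isotropic (here diagonal)
    Gaussian; each generation, sample lambda individuals, select the mu
    best, and re-estimate the mean and per-coordinate standard
    deviations from the selected individuals."""
    rng = random.Random(seed)
    d = len(x0)
    mean = list(x0)
    stdev = [2.0] * d
    for _ in range(iterations):
        pop = [[rng.gauss(mean[i], stdev[i]) for i in range(d)]
               for _ in range(lam)]
        pop.sort(key=fitness)
        best = pop[:mu]
        # Re-estimate the distribution from the selected individuals.
        mean = [sum(p[i] for p in best) / mu for i in range(d)]
        stdev = [max(1e-12,
                     (sum((p[i] - mean[i]) ** 2 for p in best) / mu) ** 0.5)
                 for i in range(d)]
    return mean

def sphere(x):
    return sum(t * t for t in x)

m = emna_diagonal(sphere, [5.0, 5.0])
```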
67. Self-adaptation (works in many frameworks)
68. Self-adaptation (works in many frameworks)
Can be used for non-isotropic
multivariate Gaussian
distributions.
69. Let's generalize.
We have seen algorithms which work as follows:
- we keep one search point in memory
(and one step-size)
- we generate individuals
- we evaluate these individuals
- we regenerate a search point and a step-size
Maybe we could keep more than one search point ?
70. Let's generalize.
We have seen algorithms which work as follows:
- we keep one search point in memory (and one step-size)
==> μ search points
- we generate individuals
==> λ generated individuals
- we evaluate these generated individuals
- we regenerate a search point and a step-size
Maybe we could keep more than one search point ?
71. Parameters: x1,...,xμ
- Generate λ points around x1,...,xμ
(e.g. each new x randomly generated from two points)
- Compute their fitness values
- Select the μ best
- Don't average: the selected points are the new x1,...,xμ
Obviously parallel. Really simple.
73. "Each x randomly generated from two points": this is a
cross-over.
74. Example of procedure for generating a point:
- Randomly draw k parents x1,...,xk
(truncation selection: randomly among selected individuals)
- For generating the i-th coordinate of the new individual z:
u = random(1,k)
z(i) = x(u)(i)
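The generation procedure above (coordinate-wise cross-over among k parents) can be sketched as follows; this is an illustrative Python sketch, with hypothetical names:

```python
import random

def crossover(parents, rng):
    """Generate a new individual z as on the slide: for each coordinate i,
    draw u = random(1, k) and set z(i) = x(u)(i), i.e. each coordinate
    comes independently from a uniformly random parent."""
    d = len(parents[0])
    child = []
    for i in range(d):
        u = rng.randrange(len(parents))   # u = random(1, k)
        child.append(parents[u][i])       # z(i) = x(u)(i)
    return child

rng = random.Random(0)
parents = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
child = crossover(parents, rng)
```

With these two parents, every coordinate of the child is either 1.0 or 2.0, one choice per coordinate.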
75. Let's summarize:
We have seen a general scheme for optimization:
- generate a population (e.g. from some distribution [EDA], or from
a set of search points [EA])
- select the best = new search points
==> Small difference between an
Evolutionary Algorithm (EA) and an
Estimation of Distribution Algorithm (EDA).
==> Some EA (older than the EDA acronym) are EDAs.
77. Gives a lot of freedom:
- choose your representation and operators
(depending on the problem)
- if you have a step-size, choose the adaptation rule
- choose your population size λ (depending on your
computer/grid)
- choose μ (carefully), e.g. μ = min(dimension, λ/4)
78. Can handle strange things:
- optimize a physical structure ?
- structure represented as a Voronoi diagram
- cross-over makes sense, benefits from local structure
- not so many algorithms can work on that
82. Voronoi representation:
- a family of points
- their labels
==> cross-over makes sense
==> you can optimize a shape
==> not that mathematical; but really useful
83. Mutations: each label is changed with proba 1/n
Cross-over: each point/label is randomly drawn from one of
the two parents
84. Alternative cross-over: randomly pick one split in the
representation:
- left part from parent 1
- right part from parent 2
==> related to biology
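The Voronoi-representation operators can be sketched like this (an illustrative Python sketch; representing an individual as (points, labels) with binary labels, e.g. 1 for "material present in that cell", is an assumption of the example):

```python
import random

def mutate(individual, rng):
    """Mutation from the slide: each label is flipped with proba 1/n."""
    points, labels = individual
    n = len(labels)
    new_labels = [1 - lab if rng.random() < 1.0 / n else lab
                  for lab in labels]
    return points, new_labels

def crossover(parent1, parent2, rng):
    """Cross-over from the slide: each point/label pair is randomly
    drawn from one of the two parents."""
    pts, labs = [], []
    for i in range(len(parent1[1])):
        src = parent1 if rng.random() < 0.5 else parent2
        pts.append(src[0][i])
        labs.append(src[1][i])
    return pts, labs

rng = random.Random(1)
# A shape encoded as Voronoi sites in [0,1]^2 plus binary labels.
p1 = ([(0.1, 0.1), (0.9, 0.9), (0.5, 0.5)], [1, 0, 1])
p2 = ([(0.2, 0.8), (0.8, 0.2), (0.4, 0.6)], [0, 1, 0])
child = crossover(p1, p2, rng)
mutant = mutate(child, rng)
```

Because the cross-over exchanges whole point/label pairs, a child inherits coherent local pieces of each parent's shape, which is why it benefits from local structure.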
86. II. Evolutionary algorithms
a. Fundamental elements
b. Algorithms
c. Math. Analysis
87. Consider the (1+1)-ES.
x(n) = x(n-1) or x(n-1) + σ(n-1)N
We want to maximize:
- E log || x(n) - x* ||
--------------------------
- E log || x(n-1) - x* ||
89. We don't know x*. How can we optimize this ?
We will observe the acceptance rate,
and we will deduce if σ is too large or too small.
90. On the norm function: for each step-size, evaluate this
"expected progress rate" and evaluate "P(acceptance)".
[plot: progress rate and acceptance probability vs step-size,
with rejected and accepted mutations marked]
98. The 1/5th rule
Based on maths showing that
good step-size <==> success rate ~ 1/5
99. I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
100. III. From math. programming
==>pattern search method
Comparison with ES:
- code more complicated
- same rate
- deterministic
- less robust
101. III. From math. programming
Also:
- Nelder-Mead algorithm (similar to pattern search,
better constant in the rate)
- NEWUOA (using function values and
not only comparisons)
103. I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
104. IV. Using machine learning
What if computing f takes days ?
==> parallelism
==> and “learn” an approximation of f
105. IV. Using machine learning
Statistical tools:
f'(x) = approximation( x, x1, f(x1), x2, f(x2), ..., xn, f(xn) )
y(n+1) = f'( x(n+1) )
e.g. f' = quadratic function closest to f on the x(i)'s.
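A minimal 1-D instance of "quadratic function closest to f on the x(i)'s" can be sketched with a hand-rolled least-squares fit (an illustrative Python sketch; the function names and sample data are hypothetical):

```python
def fit_quadratic_1d(xs, ys):
    """Least-squares fit of f'(x) = a*x^2 + b*x + c to the evaluated
    points (x(i), f(x(i))): build the 3x3 normal equations for the
    design [x^2, x, 1] and solve them by Gauss-Jordan elimination."""
    m = [[0.0] * 4 for _ in range(3)]   # augmented 3x4 system
    for x, y in zip(xs, ys):
        row = [x * x, x, 1.0]
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]   # (A^T A)[i][j]
            m[i][3] += row[i] * y            # (A^T y)[i]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for j in range(col, 4):
                    m[r][j] -= f * m[col][j]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c

# Build the surrogate from a few expensive evaluations of f,
# then query f' instead of f for new points.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x + 1.0 for x in xs]      # pretend each evaluation takes days
a, b, c = fit_quadratic_1d(xs, ys)
surrogate = lambda x: a * x * x + b * x + c
```

Since the sample data is itself quadratic, the fit is exact here; on real objectives f' is only an approximation, hence the need to periodically re-use the real f.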
106. IV. Using machine learning
==> keyword “surrogate models”
==> use f' instead of f
==> periodically, re-use the real f
107. I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
108. Derivative free optimization is fun.
==> nice maths
==> nice applications + easily parallel algorithms
==> can handle really complicated domains
(mixed continuous / integer, optimization
on sets of programs)
Yet,
often suboptimal on highly structured problems (when
BFGS is easy to use, thanks to fast gradients)
109. Keywords, readings
==> cross-entropy (so close to evolution strategies)
==> genetic programming (evolutionary algorithms for
automatically building programs)
==> H.-G. Beyer's book on ES = good starting point
==> many resources on the web
==> keep in mind that representation / operators are
often the key
==> we only considered isotropic algorithms; sometimes not
a good idea at all