The document discusses simulation as a tool for statistical computation. Simulation allows researchers to reproduce randomness on a computer by using deterministic pseudo-random number generators. It is useful for evaluating complex systems, determining properties of statistical procedures, validating models, approximating integrals, and maximizing functions. Simulation relies on generating uniform random variables between 0 and 1 using algorithms like congruential generators, and is widely used across many fields involving probability and statistics.
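As a concrete illustration of the congruential generators the summary mentions, here is a minimal sketch of a linear congruential generator. The multiplier, increment, and modulus are the well-known Numerical Recipes constants, chosen purely for illustration; the source names no specific parameters:

```python
# Minimal linear congruential generator (LCG): x_{n+1} = (a*x_n + c) mod m.
# The constants a, c, m are the Numerical Recipes defaults -- an assumption,
# since the text refers to "congruential generators" only generically.
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield an endless stream of pseudo-uniform values on [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=42)
sample = [next(gen) for _ in range(5)]  # five pseudo-uniform draws
```

Note the generator is fully deterministic: the same seed always reproduces the same stream, which is exactly what makes simulation experiments repeatable.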
AI Data Summit 2019 Thompson Sampling - Thompson Sampling Tutorial “The rediscovery of a swiss army knife” (Hanan Shteingart)
aka “Squeezing a 96-page review into a 25 min talk”
Based on Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends® in Machine Learning, 11(1), 1-96.
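The tutorial's canonical opening example, Thompson sampling on a Bernoulli bandit with Beta posteriors, fits in a few lines. The arm probabilities and round count below are illustrative, not taken from the talk:

```python
import random

# Thompson sampling for a Bernoulli bandit with Beta(1,1) priors.
# True arm success probabilities are illustrative placeholders.
def thompson_bandit(true_probs, n_rounds, seed=0):
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample a plausible mean reward for each arm from its posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])  # ...and act greedily on the sample
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bandit([0.3, 0.5, 0.8], n_rounds=2000)
# With enough rounds, pulls concentrate on the best arm (index 2).
```

Sampling from the posterior, rather than always exploiting the current best estimate, is what gives the method its built-in exploration.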
This document discusses computer vision applications using TensorFlow for deep learning. It introduces computer vision and convolutional neural networks. It then demonstrates how to build and train a CNN for MNIST handwritten digit recognition using TensorFlow. Finally, it shows how to load and run the pre-trained Google Inception model for image classification.
Simulation involves imitating the operation of a real-world process over time, usually on a computer. It is widely used for decision making and analyzing complex systems that cannot be solved mathematically. A simulation study involves problem formulation, model conceptualization, validation, experimentation, and implementation. Key aspects of a model include entities, attributes, resources, variables, events, and activities.
The document discusses simulation as an important tool for statistical computation. It describes how simulation allows researchers to (1) reproduce randomness on computers, (2) determine probabilistic properties of statistical procedures, and (3) approximate integrals and maximize functions. The key components of simulation include pseudo-random number generators that produce uniform random variables, which are then used to simulate other probabilistic processes and systems.
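A standard way uniforms are "used to simulate other probabilistic processes", as the summary puts it, is inverse-transform sampling: if U is Uniform(0,1) and F is a target CDF, then F⁻¹(U) has distribution F. A sketch for the exponential case (the rate is chosen for illustration):

```python
import math
import random

# Inverse-transform sampling for Exponential(rate):
# F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.
def exponential_sample(rate, rng):
    u = rng.random()                 # one uniform draw on [0, 1)
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
draws = [exponential_sample(rate=2.0, rng=rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)       # should approach 1/rate = 0.5
```

The same recipe works for any distribution whose inverse CDF is available in closed form or can be evaluated numerically.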
AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXAMINATION SYSTEMS (cscpconf)
The document proposes an alternative approach for selecting pseudo-random numbers for online examination systems. It compares three random number generators: a procedural language random number generator, the PHP random number generator, and an atmospheric noise-based true random number generator. It tests the randomness quality of patterns generated by each using the Diehard statistical tests. The results show that the true random number generator passes all tests, while the procedural language and PHP generators fail most tests, indicating their patterns have lower randomness quality than the true random generator.
Monte Carlo methods use random sampling to solve quantitative problems. They were first used by Stanislaw Ulam and Nicholas Metropolis to solve non-random problems by transforming them into random forms. Monte Carlo simulations play a major role in experimental physics by designing experiments, evaluating potential outputs and risks, and validating results. Random numbers are generated using pseudorandom number generators or by transforming uniform random variables using probability distribution functions. The accuracy of Monte Carlo simulations improves as the number of samples increases, with the standard error declining proportionally with the square root of the number of samples.
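The square-root error scaling described above can be demonstrated with the classic Monte Carlo estimate of pi: sample points uniformly in the unit square and count the fraction that lands inside the quarter circle. Sample sizes below are illustrative:

```python
import random

# Monte Carlo estimate of pi. The standard error of the estimate shrinks
# like 1/sqrt(n): quadrupling the sample size roughly halves the error.
def estimate_pi(n, seed=0):
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

rough = estimate_pi(1_000)     # noisy estimate
finer = estimate_pi(100_000)   # error roughly 10x smaller on average
```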
This document discusses machine learning and related concepts. It provides early definitions of machine learning from Arthur Samuel in 1959 and Tom Mitchell in 1998. It also discusses key topics in machine learning including heuristics, supervised vs unsupervised learning, and performance measures. Examples are provided to illustrate concepts like training models on labeled data and making predictions on new test data. Concerns about overfitting are also mentioned.
Machine learning is a branch of artificial intelligence concerned with building systems that can learn from data. The document discusses various machine learning concepts including what machine learning is, related fields, the machine learning workflow, challenges, different types of machine learning algorithms like supervised learning, unsupervised learning and reinforcement learning, and popular Python libraries used for machine learning like Scikit-learn, Pandas and Matplotlib. It also provides examples of commonly used machine learning algorithms and datasets.
This document discusses static and dynamic models, deterministic and stochastic models, and various methods for studying systems with uncertainty. Deterministic models use differential equations to exactly predict outcomes, while stochastic models use random variables and can only compute probabilities. Numerical methods and simulation are introduced as ways to study more complex systems. Simulation models represent real systems and allow experiments to be performed faster and safer. Monte Carlo methods and discrete event simulation are discussed as techniques for simulation.
Monte Carlo Simulations (UC Berkeley School of Information; July 11, 2019), by Ivan Corneillet
My guest lecture on Monte Carlo simulations [or "how to be approximately right, now vs. precisely wrong, later (or never…)"] for the Managing Cyber Risk course of UC Berkeley School of Information's Cybersecurity Master.
Artificial Agents Without Ontological Access to Reality (ogeorgeon)
The document discusses agents without ontological access to reality (AWOAs). It argues that most artificial intelligence and machine learning algorithms assume that input data represents reality, but this may not be accurate based on philosophical perspectives. The document proposes that agents begin with outputting data and receiving results, rather than assuming input data represents states of reality. This embodied model could generate more interesting behaviors than traditional models that treat agents as passive observers of reality.
Two methods are described for optimizing cognitive model parameters: differential evolution (DE) and high-throughput computing with HTCondor. DE is a genetic algorithm that uses a population of models to explore the parameter space in parallel. It is well-suited for models with few parameters or short run times. HTCondor allows running a population of models over a computer network, making it suitable for larger, more complex models or simulating many participants. Examples of using each method with an ACT-R paired associate model are provided.
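Differential evolution as described above can be sketched compactly. This is the common DE/rand/1/bin variant minimizing a toy quadratic; the settings for F and CR are widely used defaults, not values from the source, and the objective stands in for an actual model-fit criterion:

```python
import random

# Minimal differential evolution (DE/rand/1/bin): a population of candidate
# parameter vectors explores the space in parallel, as described above.
def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           n_gen=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(n_gen):
        for i in range(pop_size):
            # Pick three distinct other members to build a mutant vector.
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            # Mutation + binomial crossover produce one trial vector.
            trial = [a[d] + F * (b[d] - c[d]) if rng.random() < CR else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            if f(trial) <= f(pop[i]):   # greedy selection keeps the better vector
                pop[i] = trial
    return min(pop, key=f)

best = differential_evolution(lambda v: sum(x * x for x in v),
                              bounds=[(-5, 5), (-5, 5)])
# best approaches [0, 0], the minimizer of this sphere function
```

In the cognitive-modeling setting, `f` would wrap a full model run, which is why the independent per-member evaluations also parallelize well across a cluster.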
Statistical software for Sampling from Finite Populations: an analysis using Monte Carlo simulations (michele de meo)
The thesis analyzes statistical software for probability proportional to size (PPS) sampling without replacement using Monte Carlo simulations. It tests if software like SAS, SPSS, and R correctly implement the Hanurav-Vijayan and Rao-Sampford algorithms by comparing estimated joint selection probabilities from simulations to the true probabilities. The results show R estimates match the true probabilities, but SAS and SPSS estimates differ significantly, suggesting their implementations are incorrect or use a suboptimal pseudorandom number generator.
This document discusses the use of machine learning in official statistics. It begins by contrasting the traditional "data modeling" approach used in statistics with the newer "algorithmic modeling" approach used in machine learning. It then discusses how machine learning can be applied both in traditional statistical production processes based on primary survey data, as well as in new multi-source production processes that incorporate alternative data sources like big data. Specifically, machine learning can be used for tasks like imputation, outlier detection, and estimation. The document concludes that machine learning represents a paradigm shift for official statistics and is particularly well-suited for new data sources, as it prioritizes prediction over interpretability and generalizability.
06-01 Machine Learning and Linear Regression.pptx (SaharA84)
This document discusses machine learning and linear regression. It provides examples of supervised learning problems like predicting housing prices and classifying cancer as malignant or benign. Unsupervised learning is used to discover patterns in unlabeled data, like grouping customers for market segmentation. Linear regression finds the linear function that best fits some training data to make predictions on new data. It can be extended to nonlinear functions by adding polynomial features. More complex models may overfit the training data and not generalize well to new examples.
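The fitting step the slides describe can be illustrated with single-feature ordinary least squares in closed form. The toy data below are made up to make the recovered line exact:

```python
# Ordinary least squares for one feature, via the closed-form
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Toy data generated from y = 3x + 2, so the fit recovers it exactly.
xs = [1, 2, 3, 4, 5]
ys = [5, 8, 11, 14, 17]
slope, intercept = fit_line(xs, ys)   # -> (3.0, 2.0)
```

The polynomial extension mentioned in the summary amounts to adding columns x², x³, ... and fitting a multivariate version of the same least-squares criterion.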
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. It proposes:
1. Using random forests to infer a model from summary statistics, as random forests can handle a large number of statistics and find efficient combinations.
2. Replacing estimates of posterior model probabilities, which are poorly approximated, with posterior predictive expected losses to evaluate models.
3. An example comparing MA(1) and MA(2) time series models using two autocorrelations as summaries, finding embedded models and that random forests perform similarly to other methods on small problems.
Computational Biology, Part 4 Protein Coding Regions (butest)
The document discusses different machine learning approaches for supervised classification and sequence analysis. It describes several classification algorithms like k-nearest neighbors, decision trees, linear discriminants, and support vector machines. It also discusses evaluating classifiers using cross-validation and confusion matrices. For sequence analysis, it covers using position-specific scoring matrices, hidden Markov models, cobbling, and family pairwise search to identify new members of protein families. It compares the performance of these different machine learning methods on sequence analysis tasks.
This document provides an overview of machine learning concepts and techniques including linear regression, logistic regression, unsupervised learning, and k-means clustering. It discusses how machine learning involves using data to train models that can then be used to make predictions on new data. Key machine learning types covered are supervised learning (regression, classification), unsupervised learning (clustering), and reinforcement learning. Example machine learning applications are also mentioned such as spam filtering, recommender systems, and autonomous vehicles.
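The k-means clustering mentioned above can be sketched as Lloyd's algorithm on one-dimensional points; the data are illustrative:

```python
import random

# Plain k-means (Lloyd's algorithm) on 1-D points: alternate between
# assigning points to their nearest center and moving each center to
# the mean of its assigned points.
def kmeans(points, k, n_iter=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Update step: move each center to its cluster mean.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centers = kmeans(points, k=2)   # two centers, near 1.0 and 10.0
```

Because the result depends on the random initialization, practical use typically runs several restarts and keeps the clustering with the lowest within-cluster variance.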
Solving problems with graphs (“La résolution de problèmes à l'aide de graphes”, Data2B)
This document discusses how network science can be used to analyze and draw insights from different types of data. It describes how network science is the study of networks representing physical, biological, and social phenomena. It provides examples of how network science can be applied to geographic, temporal, social, and semantic network data. The document also discusses how network science combined with data science and machine learning techniques can enable machines to perform more human-like reasoning about ambiguous or uncertain concepts.
PHStat Notes Using the PHStat Stack Data and .docx (ShiraPrater50)
PHStat Notes
Using the PHStat Stack Data and Unstack Data Tools p. 28
One‐ and Two‐Way Tables and Charts p. 63
Normal Probability Tools p. 97
Generating Probabilities in PHStat p. 98
Confidence Intervals for the Mean p. 136
Confidence Intervals for Proportions p. 136
Confidence Intervals for the Population Variance p. 137
Determining Sample Size p. 137
One‐Sample Test for the Mean, Sigma Unknown p. 169
One‐Sample Test for Proportions p. 169
Using Two‐Sample t ‐Test Tools p. 169
Testing for Equality of Variances p. 170
Chi‐Square Test for Independence p. 171
Using Regression Tools p. 209
Stepwise Regression p. 211
Best-Subsets Regression p. 212
Creating x ‐ and R ‐Charts p. 267
Creating p ‐Charts p. 268
Using the Expected Monetary Value Tool p. 375
Excel Notes
Creating Charts in Excel 2010 p. 29
Creating a Frequency Distribution and Histogram p. 61
Using the Descriptive Statistics Tool p. 61
Using the Correlation Tool p. 62
Creating Box Plots p. 63
Creating PivotTables p. 63
Excel‐Based Random Sampling Tools p. 134
Using the VLOOKUP Function p. 135
Sampling from Probability Distributions p. 135
Single‐Factor Analysis of Variance p. 171
Using the Trendline Option p. 209
Using Regression Tools p. 209
Using the Correlation Tool p. 211
Forecasting with Moving Averages p. 243
Forecasting with Exponential Smoothing p. 243
Using CB Predictor p. 244
Creating Data Tables p. 298
Data Table Dialog p. 298
Using the Scenario Manager p. 298
Using Goal Seek p. 299
Net Present Value and the NPV Function p. 299
Using the IRR Function p. 375
Crystal Ball Notes
Customizing Define Assumption p. 338
Sensitivity Charts p. 339
Distribution Fitting with Crystal Ball p. 339
Correlation Matrix Tool p. 341
Tornado Charts p. 341
Bootstrap Tool p. 342
TreePlan Note
Constructing Decision Trees in Excel p. 376
Useful Statistical Functions in Excel 2010
AVERAGE( data_range ) Computes the average value (arithmetic mean) of a set of data.
BINOM.DIST( number_s, trials, probability_s, cumulative ) Returns the individual term binomial distribution.
BINOM.INV( trials, probability_s, alpha ) Returns the smallest value for which the cumulative binomial distribution is greater than or equal to a criterion value.
CHISQ.DIST( x, deg_freedom, cumulative ) Returns the left-tailed probability of the chi-square distribution.
CHISQ.DIST.RT( x, deg_freedom ) Returns the right-tailed probability of the chi-square distribution.
CHISQ.TEST( actual_range, expected_range ) Returns the test for independence; the value of the chi-square distribution and the appropriate degrees of freedom.
CONFIDENCE.NORM( alpha, standard_dev, size ) Returns the confidence interval for a population mean, using a normal distribution.
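For readers without Excel at hand, what BINOM.DIST computes can be reproduced in a few lines of Python. This is a sketch of the same quantity (pmf when `cumulative` is false, cdf when true), not Excel's implementation:

```python
from math import comb

# Pure-Python equivalent of the computation behind Excel's BINOM.DIST:
# the binomial pmf for cumulative=False, and the cdf for cumulative=True.
def binom_dist(number_s, trials, probability_s, cumulative):
    def pmf(k):
        return comb(trials, k) * probability_s**k * (1 - probability_s)**(trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

p = binom_dist(2, 10, 0.5, False)   # P(X = 2) for X ~ Binomial(10, 0.5) = 45/1024
```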
Automatically Tolerating And Correcting Memory Errors (Emery Berger)
Applications written in unsafe languages like C and C++ are vulnerable to memory errors such as buffer overflows, dangling pointers, and reads of uninitialized data. These errors, which lead to program crashes, security vulnerabilities, and unpredictable behavior, are both difficult to avoid and costly to repair.
This talk presents two systems that automatically harden unaltered C and C++ programs against heap-based memory errors. The first, DieHard, uses randomization and replication to make programs probabilistically resistant to a wide range of memory errors. Instead of crashing or running amok, DieHard lets programs run correctly in the face of memory errors with high probability. DieHard trades a modest increase in memory consumption (and optionally, the extra processing power of multicore CPUs) for dramatically increased reliability.
While DieHard tolerates errors, our second system, Exterminator, automatically isolates and corrects them. Exterminator exploits randomization to pinpoint errors with high precision. From this information, Exterminator generates patches that fix these errors in current and subsequent executions. In addition, Exterminator enables collaborative bug correction by merging patches generated by multiple users.
The fundamental problem of Forensic Statistics (Giulia Cereda)
When using Y-chromosome DNA profiles, it often happens that a DNA profile found at a crime scene and matching the suspect’s profile does not appear in the relevant database. This poses a serious challenge to the analyst, who is required to supply a likelihood ratio (LR) or match probability in order to quantify the evidential value of the match. Sensible estimation of the LR seems to rely on sensible estimation of the population frequency of this previously unseen haplotype.
There are three existing proposals of quite different nature: Roewer et al. (2000), based on Bayesian estimation of the haplotype frequency with a Beta prior; Brenner (2010), based on the number of singletons observed in the database; and Andersen et al. (2013) using a mixture of independent discrete Laplace distributions as a parametric approximation of the distribution of allelic frequencies.
We add two new methods. One is similar to Brenner’s and, like Brenner’s, is strongly related to the Good-Turing estimator. A second method is based on Anevski, Gill and Zohren’s (arXiv.org/math.ST:1312.1200) study of a non-parametric maximum-likelihood estimator. It is intermediate between the parametric approach of Andersen and non-parametric methods based on Good-Turing estimators. We believe it avoids the disadvantages of both while also providing a supplementary means of evaluating their accuracy.
For all methods it is imperative to assess two further levels of uncertainty, beyond the uncertainty about which hypothesis is true given the evidence that would remain even if we knew everything about the population probability distribution. The LR is a ratio of probabilities that are usually based on a model which is at best a good approximation to the truth; moreover, we only estimate the parameters of that model by fitting it to the data in our database.
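The singleton-based idea underlying Brenner's and Good-Turing-style estimators can be illustrated in miniature. The sketch below computes only the Good-Turing estimate of the total probability mass of unseen types (the number of singletons divided by the sample size); it is not any of the paper's full estimators, and the haplotype labels are made up:

```python
from collections import Counter

# Good-Turing estimate of the total probability mass of types not yet
# observed: n1 / N, where n1 is the number of types seen exactly once
# and N is the total number of observations in the database.
def unseen_mass(observations):
    counts = Counter(observations)
    n1 = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    return n1 / len(observations)

db = ["h1", "h1", "h2", "h3", "h3", "h3", "h4"]  # h2 and h4 are singletons
mass = unseen_mass(db)   # 2 / 7
```

Intuitively, a database full of singletons suggests many more haplotypes remain unobserved, so an unseen matching haplotype should not be treated as vanishingly rare.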
This document provides an overview of advanced statistics concepts including descriptive statistics, estimation, hypothesis testing, and power analysis. It begins with foundational concepts like scales of measurement and graphical data exploration. It then covers topics like characteristics of estimators, confidence intervals, the principles of hypothesis testing, types of errors, and how to calculate and increase statistical power. The document is intended to teach students how to apply statistical tools and interpret results in research contexts.
Monte Carlo methods use random sampling to solve quantitative problems. They were first used by Stanislaw Ulam and Nicholas Metropolis to solve non-random problems by transforming them into random forms. Monte Carlo simulations play a major role in experimental physics by designing experiments, evaluating potential outputs and risks, and validating results. Random numbers are generated using pseudorandom number generators or by transforming uniform random variables using probability distribution functions. The accuracy of Monte Carlo simulations improves as the number of samples increases, with the standard error declining proportionally with the square root of the number of samples.
This document discusses machine learning and related concepts. It provides early definitions of machine learning from Arthur Samuel in 1959 and Tom Mitchell in 1998. It also discusses key topics in machine learning including heuristics, supervised vs unsupervised learning, and performance measures. Examples are provided to illustrate concepts like training models on labeled data and making predictions on new test data. Concerns about overfitting are also mentioned.
Machine learning is a branch of artificial intelligence concerned with building systems that can learn from data. The document discusses various machine learning concepts including what machine learning is, related fields, the machine learning workflow, challenges, different types of machine learning algorithms like supervised learning, unsupervised learning and reinforcement learning, and popular Python libraries used for machine learning like Scikit-learn, Pandas and Matplotlib. It also provides examples of commonly used machine learning algorithms and datasets.
This document discusses static and dynamic models, deterministic and stochastic models, and various methods for studying systems with uncertainty. Deterministic models use differential equations to exactly predict outcomes, while stochastic models use random variables and can only compute probabilities. Numerical methods and simulation are introduced as ways to study more complex systems. Simulation models represent real systems and allow experiments to be performed faster and safer. Monte Carlo methods and discrete event simulation are discussed as techniques for simulation.
Monte Carlo Simulations (UC Berkeley School of Information; July 11, 2019)Ivan Corneillet
My guest lecture on Monte Carlo simulations [or "how to be approximately right, now vs. precisely wrong, later (or never…)"] for the Managing Cyber Risk course of UC Berkeley School of Information's Cybersecurity Master.
Artificial Agents Without Ontological Access to Realityogeorgeon
The document discusses agents without ontological access to reality (AWOAs). It argues that most artificial intelligence and machine learning algorithms assume that input data represents reality, but this may not be accurate based on philosophical perspectives. The document proposes that agents begin with outputting data and receiving results, rather than assuming input data represents states of reality. This embodied model could generate more interesting behaviors than traditional models that treat agents as passive observers of reality.
Two methods are described for optimizing cognitive model parameters: differential evolution (DE) and high-throughput computing with HTCondor. DE is a genetic algorithm that uses a population of models to explore the parameter space in parallel. It is well-suited for models with few parameters or short run times. HTCondor allows running a population of models over a computer network, making it suitable for larger, more complex models or simulating many participants. Examples of using each method with an ACT-R paired associate model are provided.
Statistical software for Sampling from Finite Populations: an analysis using ...michele de meo
The thesis analyzes statistical software for probability proportional to size (PPS) sampling without replacement using Monte Carlo simulations. It tests if software like SAS, SPSS, and R correctly implement the Hanurav-Vijayan and Rao-Sampford algorithms by comparing estimated joint selection probabilities from simulations to the true probabilities. The results show R estimates match the true probabilities, but SAS and SPSS estimates differ significantly, suggesting their implementations are incorrect or use a suboptimal pseudorandom number generator.
This document discusses the use of machine learning in official statistics. It begins by contrasting the traditional "data modeling" approach used in statistics with the newer "algorithmic modeling" approach used in machine learning. It then discusses how machine learning can be applied both in traditional statistical production processes based on primary survey data, as well as in new multi-source production processes that incorporate alternative data sources like big data. Specifically, machine learning can be used for tasks like imputation, outlier detection, and estimation. The document concludes that machine learning represents a paradigm shift for official statistics and is particularly well-suited for new data sources, as it prioritizes prediction over interpretability and generalizability.
06-01 Machine Learning and Linear Regression.pptxSaharA84
This document discusses machine learning and linear regression. It provides examples of supervised learning problems like predicting housing prices and classifying cancer as malignant or benign. Unsupervised learning is used to discover patterns in unlabeled data, like grouping customers for market segmentation. Linear regression finds the linear function that best fits some training data to make predictions on new data. It can be extended to nonlinear functions by adding polynomial features. More complex models may overfit the training data and not generalize well to new examples.
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. It proposes:
1. Using random forests to infer a model from summary statistics, as random forests can handle a large number of statistics and find efficient combinations.
2. Replacing estimates of posterior model probabilities, which are poorly approximated, with posterior predictive expected losses to evaluate models.
3. An example comparing MA(1) and MA(2) time series models using two autocorrelations as summaries, finding embedded models and that random forests perform similarly to other methods on small problems.
Computational Biology, Part 4 Protein Coding Regionsbutest
The document discusses different machine learning approaches for supervised classification and sequence analysis. It describes several classification algorithms like k-nearest neighbors, decision trees, linear discriminants, and support vector machines. It also discusses evaluating classifiers using cross-validation and confusion matrices. For sequence analysis, it covers using position-specific scoring matrices, hidden Markov models, cobbling, and family pairwise search to identify new members of protein families. It compares the performance of these different machine learning methods on sequence analysis tasks.
This document provides an overview of machine learning concepts and techniques including linear regression, logistic regression, unsupervised learning, and k-means clustering. It discusses how machine learning involves using data to train models that can then be used to make predictions on new data. Key machine learning types covered are supervised learning (regression, classification), unsupervised learning (clustering), and reinforcement learning. Example machine learning applications are also mentioned such as spam filtering, recommender systems, and autonomous vehicles.
La résolution de problèmes à l'aide de graphesData2B
This document discusses how network science can be used to analyze and draw insights from different types of data. It describes how network science is the study of networks representing physical, biological, and social phenomena. It provides examples of how network science can be applied to geographic, temporal, social, and semantic network data. The document also discusses how network science combined with data science and machine learning techniques can enable machines to perform more human-like reasoning about ambiguous or uncertain concepts.
PHStat Notes Using the PHStat Stack Data and .docxShiraPrater50
PHStat Notes
Using the PHStat Stack Data and Unstack Data Tools p. 28
One‐ and Two‐Way Tables and Charts p. 63
Normal Probability Tools p. 97
Generating Probabilities in PHStat p. 98
Confi dence Intervals for the Mean p. 136
Confi dence Intervals for Proportions p. 136
Confi dence Intervals for the Population Variance p. 137
Determining Sample Size p. 137
Presentation slides for my simulation course at Dauphine
1. Simulation: a ubiquitous tool for statistical computation
Exploratory statistics (NOISE)
introduction to computer language R
rudiments of simulation (aka Monte Carlo methods)
illustration by the bootstrap method
2. Simulation: a ubiquitous tool for statistical computation
Organisation
weekly computer labs using R and Rstudio
one computer account per student
midterm and final exams on anonymous account
exam made of programming problems (e.g., 3 out of 6)
exam evaluation based on executable R code
handouts and on-line help available on anonymous account
no other document
3. Simulation: a ubiquitous tool for statistical computation
Reference
documents on
https://www.ceremade.dauphine.fr/~xian/Noise.html
R. Drouilhet, P. Lafaye de Micheaux & B. Liquet (2010) Le
logiciel R, Springer, Paris
C. Robert & G. Casella (2007) Introduction to Monte Carlo
Methods with R, Springer, New York
Manuals on http://cran.r-project.org/manuals.html
(CRAN is the Comprehensive R Archive Network)
4. Simulation: a ubiquitous tool for statistical computation
Simulation: a ubiquitous tool for statistical
computation
Christian P. Robert
Université Paris-Dauphine
http://www.ceremade.dauphine.fr/~xian
September 26, 2013
5. Simulation: a ubiquitous tool for statistical computation
Outline
Simulation, what’s that for?!
Producing randomness by deterministic means
Monte Carlo principles
Simulated annealing
6. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Evaluation of the behaviour of a complex system (network,
computer program, queue, particle system, atmosphere,
epidemics, economic actions, &tc)
[© Office of Oceanic and Atmospheric Research]
7. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Production of changing landscapes, characters, behaviours in
computer games and flight simulators
[© guides.ign.com]
8. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Determine the probabilistic properties of a new statistical
procedure, or of a procedure under an unknown distribution [bootstrap]
(left) Estimation of the cdf F from a normal sample of 100 points;
(right) variation of this estimation over 200 normal samples
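The bootstrap idea illustrated by these figures can be sketched in a few lines. This is a minimal illustration of ours, in Python rather than the course's R, with sample size, seed, and number of resamples chosen arbitrarily:

```python
import random
import statistics

random.seed(42)

# Original sample of n = 100 standard-normal points
n = 100
sample = [random.gauss(0.0, 1.0) for _ in range(n)]

# Draw B bootstrap resamples (with replacement) and record each mean,
# to assess the variability of the sample mean without collecting new data
B = 200
boot_means = [statistics.mean(random.choices(sample, k=n)) for _ in range(B)]

# The spread of the bootstrap means approximates the standard error of the mean
boot_se = statistics.stdev(boot_means)
print(round(boot_se, 3))
```

The same resampling, applied to the empirical cdf rather than the mean, produces the variation bands of the right panel.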
9. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Validation of a probabilistic model
Histogram of 10^3 variates from a distribution and fit by this distribution’s density
10. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Approximation of an integral
[© my daughter’s math book]
11. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Maximisation of a weakly regular function/likelihood
[© Dan Rice Sudoku blog]
12. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Pricing of a complex financial product (exotic options)
Simulation of a GARCH(1,1) process and of its volatility (10^3 time units)
13. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Handling complex statistical problems by approximate
Bayesian computation (ABC)
core principle
Simulate a parameter value (at random) and pseudo-data from the
likelihood until the pseudo-data is “close enough” to the observed
data, then
keep the corresponding parameter value
[Griffith & al., 1997; Tavaré & al., 1999]
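The core principle translates into a short rejection sampler. A minimal sketch of ours (toy example: inferring a normal mean, with the sample mean as summary statistic; prior range, tolerance, and seed are illustrative choices, and the sketch is in Python rather than the course's R):

```python
import random
import statistics

random.seed(1)

# Observed data: 50 points from N(2, 1); the "unknown" parameter is the mean
observed = [random.gauss(2.0, 1.0) for _ in range(50)]
obs_summary = statistics.mean(observed)

accepted = []
tolerance = 0.1
while len(accepted) < 200:
    theta = random.uniform(-5.0, 5.0)                        # draw from the prior
    pseudo = [random.gauss(theta, 1.0) for _ in range(50)]   # pseudo-data
    if abs(statistics.mean(pseudo) - obs_summary) < tolerance:
        accepted.append(theta)                               # keep "close enough" draws

# The accepted values approximate the posterior distribution of the mean
print(round(statistics.mean(accepted), 2))
```

The accepted parameter values concentrate around the observed summary, which is the whole point: no likelihood evaluation was ever needed.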
14. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Handling complex statistical problems by approximate
Bayesian computation (ABC)
demo-genetic inference
Genetic model of evolution from a
most recent common ancestor (MRCA)
characterized by a set of parameters
that cover historical, demographic, and
genetic factors
Dataset of polymorphism (DNA sample)
observed at the present time
Approximate Bayesian Computation (ABC):
an example of applications to the Pygmies
Different possible scenarios; scenario choice by ABC
Scenario 1a is largely supported relative to the …
15. Simulation: a ubiquitous tool for statistical computation
Simulation, what’s that for?!
Illustrations
Necessity to “(re)produce chance” on a computer
Handling complex statistical problems by approximate
Bayesian computation (ABC)
Pygmy population demo-genetics
Pygmy populations: do they have a common origin?
When and how did they split from non-pygmy populations?
Were there more recent interactions between pygmy and non-pygmy populations?
16. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Pseudo-random generator
Pivotal element/building block of simulation: always requires
availability of uniform U (0, 1) random variables
[© MMP World]
17. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Pseudo-random generator
Pivotal element/building block of simulation: always requires
availability of uniform U (0, 1) random variables
0.1333139
0.3026299
0.4342966
0.2395357
0.3223723
0.8531162
0.3921457
0.7625259
0.1701947
0.2816627
...
18. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Pseudo-random generator
Pivotal element/building block of simulation: always requires
availability of uniform U (0, 1) random variables
Definition (Pseudo-random generator)
A pseudo-random generator is a deterministic function f from ]0, 1[
to ]0, 1[ such that, for any starting value u0 and any n, the
sequence
{u0, f(u0), f(f(u0)), . . . , f^n(u0)}
behaves (statistically) like an iid U (0, 1) sequence
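A concrete toy instance of this definition, in a minimal Python sketch (our choice of constants: the classic "minimal standard" Lehmer congruential scheme, rescaled to ]0, 1[; the course itself works in R):

```python
M = 2**31 - 1   # modulus (a Mersenne prime)
A = 48271       # multiplier of the "minimal standard" Lehmer generator

def f(u):
    """Deterministic map on ]0,1[ in the sense of the definition:
    recover the integer state, advance the congruence, rescale back."""
    x = round(u * M)        # invert the rescaling exactly
    return (A * x % M) / M

# Iterating f from a starting value u0 produces the sequence of the definition
u = 0.1234
sequence = []
for _ in range(1000):
    u = f(u)
    sequence.append(u)

# The iterates should "look" uniform: values spread over ]0,1[, mean near 1/2
mean = sum(sequence) / len(sequence)
print(round(mean, 2))
```

Nothing here is random: rerunning from the same u0 reproduces the sequence exactly, which is precisely the paradox discussed on the next slides.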
19. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Pseudo-random generator
Pivotal element/building block of simulation: always requires
availability of uniform U (0, 1) random variables
[figure: 10 steps (ut, ut+1) of a uniform generator]
20. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Philosophical foray
¡Paradox!
While avoiding randomness, the deterministic sequence
(u0, u1 = f(u0), . . . , un = f(un−1))
must resemble a random sequence!
Debate on whether or not true randomness does exist
(Laplace’s demon versus Schrödinger’s cat), in which case
pseudo-random generators are not random
(von Neumann’s state of sin)
24. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
True random generators
Intel circuit producing “truly random” numbers:
There is no reason physical generators should be
“more” random than congruential (deterministic)
pseudo-random generators, as those are valid
generators, i.e. their distribution is exactly known
(e.g., uniform) and, in the case of parallel
generations, completely independent
25. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
True random generators
Intel generator satisfies all benchmarks of
“randomness” maintained by NIST:
Skepticism about physical devices, when compared
with mathematical functions, because of (a)
non-reproducibility and (b) instability of the device,
which means that proven uniformity at time t does
not induce uniformity at time t + 1
26. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
A standard uniform generator
The congruential generator on {1, 2, . . . , M}
f(x) = (ax + b) mod (M)
has a period equal to M for proper choices of (a, b) and becomes a
generator on ]0, 1[ when dividing by M + 1
27. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
A standard uniform generator
The congruential generator on {1, 2, . . . , M}
f(x) = (ax + b) mod (M)
has a period equal to M for proper choices of (a, b) and becomes a
generator on ]0, 1[ when dividing by M + 1
Example
Take
f(x) = (69069069x + 12345) mod (2^32)
and produce
... 518974515 2498053016 1113825472 1109377984 ...
i.e.
... 0.1208332 0.5816233 0.2593327 0.2582972 ...
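The rescaling in this example can be checked numerically. In the Python sketch below (the slide does not give the seed that produced this run, so we simply rescale the listed states), dividing each state by 2^32 recovers the uniform values shown:

```python
# The slide's congruential generator f(x) = (69069069*x + 12345) mod 2^32,
# which becomes a generator on ]0,1[ when dividing by the modulus
M = 2**32

def f(x):
    return (69069069 * x + 12345) % M

# Integer states listed on the slide; dividing by 2^32 gives the uniform values
states = [518974515, 2498053016, 1113825472, 1109377984]
uniforms = [x / M for x in states]
print([round(u, 7) for u in uniforms])   # 0.1208332, 0.5816233, ...

# f can be iterated from any integer seed to produce further states
next_state = f(states[-1])
```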
30. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Approximating π
My daughter’s pseudo-code:
N = 1000
π̂ = 0
for I = 1, N do
    X = RDN(1), Y = RDN(1)
    if X² + Y² < 1 then
        π̂ = π̂ + 1
    end if
end for
return 4·π̂/N
34. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Approximating π
(same pseudo-code, result over 10^6 simulations)
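The pseudo-code above translates directly into runnable code. In this transcription of ours (in Python rather than the course's R), RDN(1) becomes a uniform draw on [0, 1):

```python
import random

def estimate_pi(n, seed=2013):
    """Monte Carlo estimate of pi: the fraction of uniform points in the
    unit square falling inside the quarter disc x^2 + y^2 < 1 tends to pi/4."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            hits += 1
    return 4 * hits / n

est = estimate_pi(100_000)
print(est)
```

The error of the estimate decreases like 1/√n, so the 10^6-simulation run of the slide gains roughly one further decimal over this one.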
35. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Distributions that differ from uniform distributions
Problem
Given a probability distribution with
density f, how can we produce
randomness according to f?!
algorithms implemented in resident
software are only available for
common distributions
new distributions may require fast
resolution
no approximation allowed
[Figure: an arbitrary density f(x)]
37. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
The Accept-Reject Algorithm (remember π?!)
Given a probability distribution with density f , how can we produce
randomness according to f ?!
The uniform distribution on the
subgraph region
Sf = {(x, u); 0 ≤ u ≤ f(x)}
produces a marginal distribution
in x with density f.
("Fundamental theorem of simulation")
38. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
The Accept-Reject Algorithm (remember π?!)
Given a probability distribution with density f , how can we produce
randomness according to f ?!
In practice, this means we are
back to throwing dots on a box
containing
Sf = {(x, u); 0 ≤ u ≤ f (x)}
and counting those inside!
[Figure: dots thrown uniformly in a box over f(x); those falling below the curve are kept]
39. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
The Accept-Reject Algorithm
Refinement of the above through construction of a proper box:
1. Find a hat on f , i.e. a density g that can be simulated and
such that
sup_x f(x)/g(x) = M < ∞
2. Generate dots on the subgraph of g, i.e. Y1, Y2, . . . ∼ g, and
U1, U2, . . . ∼ U ([0, 1])
3. Accept only the Yk’s such that
Uk ≤ f (Yk)/Mg(Yk)
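The three steps above can be sketched in Python (the deck's code is in R; the target Beta(2,2) density f(x) = 6x(1−x) and uniform hat g are assumptions chosen for illustration, giving M = sup f/g = 1.5):

```python
import random

rng = random.Random(0)

def accept_reject(n):
    """Accept-reject sampler for f(x) = 6x(1-x) on (0,1) with hat
    g = U(0,1): accept Y when U <= f(Y) / (M g(Y)), M = 1.5."""
    out = []
    while len(out) < n:
        y = rng.random()   # Y ~ g = U(0, 1)
        u = rng.random()   # U ~ U(0, 1)
        if u <= 6 * y * (1 - y) / 1.5:
            out.append(y)
    return out

xs = accept_reject(50_000)
```

On average 1.5 proposals are needed per accepted draw, matching the efficiency measure M quoted below.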
43. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Properties
Does not require the normalisation constant of f
Does not require an exact upper bound M
Requires on average M Yk’s for one simulated X (efficiency
measure)
46. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Slice sampler
There exist other ways of exploiting the fundamental idea of
simulation over the subgraph:
If direct uniform simulation on
Sf = {(x, u); 0 ≤ u ≤ f(x)}
is too complex [because of an unavailable hat], use instead a random
walk on Sf :
this means doing random jumps in the vertical then the horizontal
direction, accounting for the boundaries:
0 ≤ u ≤ f(x), i.e. u ∼ U(0, f(x))
f(x) ≥ u, i.e. x ∼ US(u)
49. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Slice sampler
Slice sampler algorithm
For t = 1, . . . , T
when at (x(t), ω(t)) simulate
1. ω(t+1) ∼ U[0,f (x(t))]
2. x(t+1) ∼ US(t+1), where
S(t+1) = {y; f(y) ≥ ω(t+1)}.
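For a target where the slice sets are available in closed form, the two steps above are easy to implement. A hedged Python sketch (the deck's code is in R; the exponential target f(x) = e^{−x} is an assumption chosen because its slice {x : f(x) ≥ ω} is exactly the interval [0, −log ω]):

```python
import math
import random

rng = random.Random(0)

def slice_sampler(n, x0=1.0):
    """Slice sampler for f(x) = exp(-x), x >= 0: alternate a vertical
    uniform draw under f(x) and a horizontal uniform draw on the slice."""
    xs, x = [], x0
    for _ in range(n):
        w = rng.uniform(0, math.exp(-x))   # 1. w ~ U[0, f(x)]
        x = rng.uniform(0, -math.log(w))   # 2. x ~ U on {y: f(y) >= w}
        xs.append(x)
    return xs

xs = slice_sampler(50_000)
```

The chain of x-values has the exponential distribution as its stationary law, so its long-run average should be close to 1.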
57. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
The Metropolis–Hastings algorithm
Further generalisation of the fundamental idea to situations when
the slice sampler cannot be easily implemented
Idea
Create a sequence (Xn) such that, for n ‘large enough’, the density
of Xn is close to f , only using a ‘local’ knowledge of f ...
This is the domain of Markovian
simulation methods: there is a Markov
dependence between the Xn’s
Andreï Markov
‘local’ exploration can produce ‘global’ vision:
[© Civilization 5]
61. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
The Metropolis–Hastings algorithm (2)
If f is the density of interest, we pick a proposal conditional density
q(y|xn)
such that
it connects with the current value xn
it is easy to simulate
it is positive everywhere f is positive
63. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Random walk Metropolis–Hastings
Proposal
Yn = Xn + εn,
where εn ∼ g, independent of Xn, and g symmetric
Motivation
local perturbation of Xn / myopic exploration of its neighbourhood
Yn accepted or rejected depending on the relative values of f (Xn)
and f (Yn)
66. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Random walk Metropolis–Hastings
Resulting algorithm
Random walk Metropolis–Hastings
Starting from X(t) = x(t)
1. Generate Yt ∼ g(y − x(t))
2. Take
X(t+1) = Yt with proba. min{1, f(Yt)/f(x(t))},
X(t+1) = x(t) otherwise
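The resulting algorithm fits in a few lines of Python (a sketch; the deck's code is in R, and the standard normal target, known here only up to its normalising constant, is an assumption for illustration):

```python
import math
import random

rng = random.Random(0)

def rwmh(n, scale=1.0, x0=0.0):
    """Random walk Metropolis-Hastings for f(x) proportional to
    exp(-x^2/2), with a Gaussian proposal of standard deviation scale."""
    logf = lambda x: -0.5 * x * x        # log of the unnormalised target
    xs, x = [], x0
    for _ in range(n):
        y = x + rng.gauss(0, scale)      # Y_t ~ g(y - x), g symmetric
        if math.log(rng.random()) < logf(y) - logf(x):
            x = y                        # accept; otherwise keep x
        xs.append(x)
    return xs

xs = rwmh(50_000, scale=2.0)
```

Note that only the ratio f(Y)/f(x) is used, so the normalising constant of f is never needed.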
67. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Properties
Always accepts higher points and sometimes lower points
(similar to gradient algorithms)
Depends on the dispersion of g
Average probability of acceptance
∫∫ min{f(x), f(y)} g(y − x) dx dy
close to 1 if g has a small variance [Danger!]
far from 1 if g has a large variance [Re-Danger!]
70. Simulation: a ubiquitous tool for statistical computation
Producing randomness by deterministic means
Properties
[Trace plots of the random walk sampler for proposal scales 1, 2 and 3]
73. Simulation: a ubiquitous tool for statistical computation
Monte Carlo principles
General purpose of Monte Carlo methods
Given a probability density f known up to a normalizing constant,
f(x) ∝ f̃(x), and an integrable function h, compute
I(h) = ∫ h(x) f(x) dx = ∫ h(x) f̃(x) dx / ∫ f̃(x) dx
when ∫ h(x) f̃(x) dx is intractable.
[Remember π!!]
75. Simulation: a ubiquitous tool for statistical computation
Monte Carlo principles
Monte Carlo basics
Generate a pseudo-random
sample x1, . . . , xN from f and
estimate I(h) by
Î_N^MC(h) = N^{-1} Σ_{i=1}^{N} h(x_i)
and let N grow to infinity...
Caveat
Often impossible or inefficient to simulate directly from f
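When f can be simulated directly, the basic estimator is a one-liner. A Python sketch (the choices f = U(0, 1) and h(x) = x², giving I(h) = 1/3, are assumptions for illustration):

```python
import random

rng = random.Random(0)

def mc_estimate(h, sampler, n):
    """Plain Monte Carlo: average h over n pseudo-random draws from f."""
    return sum(h(sampler()) for _ in range(n)) / n

# Estimate the integral of x^2 over [0, 1], whose exact value is 1/3.
est = mc_estimate(lambda x: x * x, rng.random, 200_000)
```

By the law of large numbers the estimator converges to I(h) as N grows, with error of order 1/√N.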
79. Simulation: a ubiquitous tool for statistical computation
Monte Carlo principles
Importance Sampling
For a proposal distribution with density q(x), alternative
representation
I(h) = ∫ h(x) {f/q}(x) q(x) dx
Principle
Generate an iid sample x1, . . . , xN ∼ q and estimate I(h) by
Î_{q,N}^IS(h) = N^{-1} Σ_{i=1}^{N} h(x_i) {f/q}(x_i).
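Importance sampling shines when f puts little mass where h matters. A hedged Python sketch (the tail-probability example, with f = N(0, 1), h(x) = 1{x > 3} and the shifted-exponential proposal q, is an assumption chosen for illustration, not taken from the deck):

```python
import math
import random

rng = random.Random(0)

def is_tail(n):
    """Importance sampling estimate of P(X > 3) for X ~ N(0,1), using
    the proposal q = 3 + Exp(1), which lives exactly on the tail."""
    phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    q = lambda x: math.exp(-(x - 3))         # density of 3 + Exp(1)
    total = 0.0
    for _ in range(n):
        x = 3 + rng.expovariate(1.0)         # x_i ~ q
        total += phi(x) / q(x)               # h(x_i) {f/q}(x_i), h = 1 here
    return total / n

est = is_tail(100_000)
```

Plain Monte Carlo would waste almost every draw (P(X > 3) ≈ 0.00135), while every importance draw lands in the region of interest.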
81. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Optimisation problems
A (genuine) puzzle
During a dinner with 20 couples sitting at four tables with ten
seats each, everyone wants to share a table with everyone else. The
assembly decides to switch seats after each serving towards this
goal. What is the minimal number of servings needed to ensure
that every couple has shared a table with every other couple at some
point? And what is the optimal switching strategy?
[http://xianblog.wordpress.com/2012/04/12/not-le-monde-puzzle/]
solution to the puzzle
82. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Stochastic minimisation
Example (Egg-box function)
Consider the function
h(x, y) = (x sin(20y) + y sin(20x))^2 cosh(sin(10x) x)
        + (x cos(10y) − y sin(10x))^2 cosh(cos(20y) y),
to be minimised.
(I know that the global minimum is 0 for (x, y) = (0, 0).)
85. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Stochastic minimisation
Example (egg-box function (2))
Instead of solving the first order equations
∂h(x, y)/∂x = 0 ,   ∂h(x, y)/∂y = 0
and of checking that the second order conditions are met, we can
generate a random sequence in R^2:
θ_{j+1} = θ_j − (α_j / 2β_j) ∆h(θ_j, β_j ζ_j) ζ_j
86. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Stochastic minimisation
Example (egg-box function (2))
we can generate a random sequence in R^2
θ_{j+1} = θ_j − (α_j / 2β_j) ∆h(θ_j, β_j ζ_j) ζ_j
where
the ζj ’s are uniform on the unit circle x^2 + y^2 = 1;
∆h(θ, ζ) = h(θ + ζ) − h(θ − ζ);
(αj ) and (βj ) converge to 0
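The recursion above can be sketched in Python (the deck's code is in R; to keep the behaviour easy to check, the simple quadratic h(x, y) = x² + y² is substituted for the egg-box function, and the sequences α_j = 1/(j+1), β_j = 1/(j+1)^0.1 follow the calibration table's third scenario):

```python
import math
import random

rng = random.Random(0)

def sg_minimise(h, theta, n=10_000):
    """Finite-difference stochastic gradient: move theta against the
    directional difference Delta_h along a random direction zeta."""
    for j in range(1, n + 1):
        a, b = 1 / (j + 1), 1 / (j + 1) ** 0.1
        ang = rng.uniform(0, 2 * math.pi)
        z = (math.cos(ang), math.sin(ang))        # zeta_j on the unit circle
        d = h(theta[0] + b * z[0], theta[1] + b * z[1]) \
            - h(theta[0] - b * z[0], theta[1] - b * z[1])   # Delta_h
        theta = (theta[0] - a / (2 * b) * d * z[0],
                 theta[1] - a / (2 * b) * d * z[1])
    return theta

h = lambda x, y: x * x + y * y
theta = sg_minimise(h, (2.0, 2.0))
```

Since ∆h/(2β) approximates the directional derivative of h along ζ, each step is a noisy descent step whose size is damped by α_j.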
87. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Calibration parameters
choice   1                2                  3             4
αj       1/log(j+1)       1/100 log(j+1)     1/(j+1)       1/(j+1)
βj       1/log(j+1)^.1    1/log(j+1)^.1      1/(j+1)^.5    1/(j+1)^.1
α ↓ 0 slowly, Σ_j α_j = ∞
β ↓ 0 more slowly, Σ_j (α_j/β_j)^2 < ∞
Scenarios 1-2: not enough energy
Scenarios 3-4: good
[© George Casella, 1951–2012]
88. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
The traveling salesman problem
A classical allocation problem:
a salesman needs to visit n cities in a row
traveling costs between pairs of cities are known
search for the optimal circuit
Aboriginal art, NGA, Melbourne
91. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
An NP-complete problem
The traveling salesman
problem is an example of
mathematical problems that
require explosive resolution
times
Aboriginal art, NGA, Melbourne
Number of possible circuits: n!,
with exact solutions available in O(2^n) time
Exact solution for 15,112 German cities
found in 2001 in 22.6 CPU years.
Numerous practical
consequences (networks,
integrated circuit design,
genomic sequencing, etc.)
Exact solution for the 24,978 Swedish cities
found in 2004 in 84.8 CPU years.
94. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Resolution via simulation
The simulated annealing algorithm:
the name is borrowed from metallurgy:
metal manufactured by a slow
decrease of temperature
(annealing) is stronger than when
manufactured by a fast decrease
[© Joachim Robert, ca. 2006]
95. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Resolution via simulation
Repeat:
random modification of parts of the original circuit, with cost C0
evaluation of the cost C of the new circuit
acceptance of the new circuit with probability
min{exp((C0 − C)/T), 1}
[Metropolis et al., 1953]
T, the temperature, is progressively reduced
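The accept/cool loop above can be sketched in Python before turning to the table and Sudoku examples (the deck's code is in R; the one-dimensional double-well cost and the logarithmic cooling schedule are assumptions chosen for illustration):

```python
import math
import random

rng = random.Random(0)

def anneal(cost, x0, n=20_000):
    """Simulated annealing: random moves accepted with probability
    min{exp((C0 - C)/T), 1}, with temperature T slowly decreased."""
    x, c = x0, cost(x0)
    best, best_c = x, c
    for t in range(1, n + 1):
        T = 1.0 / math.log(t + 1)            # slow, logarithmic cooling
        y = x + rng.gauss(0, 0.5)            # random modification
        cy = cost(y)
        # accept downhill moves always, uphill with the Metropolis weight
        if cy < c or rng.random() < math.exp((c - cy) / T):
            x, c = y, cy
        if c < best_c:
            best, best_c = x, c
    return best, best_c

cost = lambda x: (x * x - 1) ** 2 + 0.3 * x  # two wells, deeper near x = -1
best, best_c = anneal(cost, x0=3.0)
```

Uphill moves keep the chain from freezing in a local minimum while T is still large, which is what the "not enough energy" scenarios in the calibration table lack.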
97. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
A family meeting (1)
Recall the table puzzle
Definition of a target function
Set the story of the 20 couples’ tables during the 6 courses as a
(20, 6) matrix, e.g.,
S = | 1 1 3 2 4 2 |
    | 2 1 3 3 2 4 |
    | ...         |
    | 3 2 2 2 4 4 |
and define a penalty function as the number of missed encounters
98. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
A family meeting (1)
Recall the table puzzle
Definition of a target function
# random initial allocation: 4 tables x 5 couples, for each of 6 courses
I=sample(rep(1:4,5))
for (i in 2:6)
  I=cbind(I,sample(rep(1:4,5)))
# meet[i,j] = number of courses during which couples i and j share a table
meet=function(I){
  M=outer(I[,1],I[,1],"==")
  for (i in 2:6)
    M=M+outer(I[,i],I[,i],"==")
  M
}
# penalty = number of missed encounters
penalty=function(M){ sum(M==0) }
penat=penalty(meet(I))
99. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
A family meeting (2)
Random switch of couples
Pick two couples [among the 20 couples] at random with
probabilities proportional to the number of other couples they
have not seen
prob=apply(meet(I)==0,1,sum)
switch their respective position during one of the 6 courses
accept the switch with Metropolis–Hastings probability
log(runif(1))<(penalty(old)-penalty(new))/gamma
100. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
A family meeting (2)
Random switch of couples
For instance, propose to replace
S = | 1 1 3 2 4 2 |               | 1 1 3 2 2 2 |
    | 2 1 3 3 2 4 |   with S' =   | 2 1 3 3 4 4 |
    | ...         |               | ...         |
    | 3 2 2 2 4 4 |               | 3 2 2 2 4 4 |
101. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
A family meeting (2)
Random switch of couples
for (t in 1:N){
  # pick two couples, weighted by how many couples they have not yet met
  prop=sample(1:20,2,prob=apply(meet(I)==0,1,sum))
  cour=sample(1:6,1)   # pick one course
  Ip=I                 # switch their tables during that course
  Ip[prop[1],cour]=I[prop[2],cour]
  Ip[prop[2],cour]=I[prop[1],cour]
  penatp=penalty(meet(Ip))
  # Metropolis-type acceptance at temperature gamma
  if (log(runif(1))<(penat-penatp)/gamma){
    I=Ip
    penat=penatp}
}
103. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Solving sudokus
Solving a Sudoku grid as a minimisation problem:
[© Dan Rice Sudoku blog]
104. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Solving sudokus
Given a partly filled Sudoku grid (with a single solution),
define a random Sudoku grid by filling the empty slots at
random
s=matrix(0,ncol=9,nrow=9)
s[1,c(1,6,7)]=c(8,1,2)
s[2,c(2:3)]=c(7,5)
s[3,c(5,8,9)]=c(5,6,4)
s[4,c(3,9)]=c(7,6)
s[5,c(1,4)]=c(9,7)
s[6,c(1,2,6,8,9)]=c(5,2,9,4,7)
s[7,c(1:3)]=c(2,3,1)
s[8,c(3,5,7,9)]=c(6,2,1,9)
105. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Solving sudokus
Given a partly filled Sudoku grid (with a single solution),
define a random Sudoku grid by filling the empty slots at
random
define a penalty function corresponding to the number of
missed constraints
# local score: clashes of entry i with its row, column and 3x3 box
scor=function(i,s){
  a=((i-1)%%9)+1
  b=trunc((i-1)/9)
  boxa=3*trunc((a-1)/3)+1
  boxb=3*trunc(b/3)+1
  return(sum(s[i]==s[9*b+(1:9)])+
         sum(s[i]==s[a+9*(0:8)])+
         sum(s[i]==s[boxa:(boxa+2),boxb:(boxb+2)])-3)
}
106. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Solving sudokus
Given a partly filled Sudoku grid (with a single solution),
define a random Sudoku grid by filling the empty slots at
random
define a penalty function corresponding to the number of
missed constraints
fill “deterministic” slots
make simulated annealing moves
# random moves on the empty sites
i=sample((1:81)[as.vector(s)==0],
         sample(1:sum(s==0),1,prob=1/(1:sum(s==0))))
for (r in 1:length(i))
  prop[i[r]]=sample((1:9)[pool[i[r]+81*(0:8)]],1)
if (log(runif(1))/lcur<tarcur-target(prop)){
  nchange=nchange+(tarcur>target(prop))
  cur=prop
  points(t,tarcur,col="forestgreen",cex=.3,pch=19)
  tarcur=target(cur)
}
107. Simulation: a ubiquitous tool for statistical computation
Simulated annealing
Solving sudokus
many possible variants in the
proposals
rather slow and sometimes
really slow
may get stuck on a penalty of
2 (and never reach zero)
does not compete at all with
nonrandom solvers