1.
Stochastic models + quasi-random
(Teytaud, Tao (Inria), Lri (Paris-Sud), UMR-Cnrs 8623, France; OASE Lab, NUTN, Taiwan)
First part: randomness. What is a stochastic / randomized model? Terminology, tools.
Second part: quasi-random points. Random points can be very disappointing; sometimes quasi-random points are better.
2.
Useful maths: we will need these tools...
Prime numbers: 2, 3, 5, 7, 11, 13, 17, ...
P(A|B): conditioning in probability. P(dice=1 | dice in {1,2,3})? P(dice=3 | dice in {1,2})?
Frequency in data x(1), x(2), ..., x(n): for 1, 2, 6, 3, 7: frequency(odd)? frequency(x(i+1) > x(i))? frequency(x(i+1) > 3 | x(i) < 4)?
3.
Let's take time to understand random simulations.
I guess you all know how to simulate a random variable uniform in [0,1], e.g. double u=drand48();
But do you know how to simulate one year of weather in Tainan?
Not so simple. Let's see this in more detail.
4.
Random sequence in dimension 1.
What is a climate model? Define:
w1 = weather at time step 1
w2 = weather at time step 2
w3 = weather at time step 3
w4 = weather at time step 4
...
==> let's keep it simple: let's define the weather by one single number in [0,1].
5.
I want a generative model.
Just as I can repeat u=drand48() and generate a sample u1, u2, u3, I want to be able to generate
W1 = (w11, w12, w13, ..., w1T)
W2 = (w21, w22, w23, ..., w2T)
W3 = ...
==> think of a generator of curves.
6.
Random sequence in dimension 1.
What is a climate model? Define: w1 = weather at time step 1.
The model tells you how w1 is distributed. For example, it gives the density function g:
P(w1 in I) = integral of g on I
7.
Take-home message number 1: a random variable w on R is entirely defined by P(w in I) for each interval I.
8.
Random sequence in dimension 1.
P(w1 in I) = integral of g on I
P(w1 <= c) = integral of g on [-infinity, c] = G(c)
9.
Generating w1: easy with the inverse cumulative distribution.
P(w1 in I) = integral of g on I
P(w1 <= c) = integral of g on [-infinity, c] = G(c)
G = cumulative distribution; consider invG = inverse of G, i.e. G(invG(x)) = x.
Trick for generating w1:
u=drand48();
w1=invG(u)=invCDF(u);
15.
Take-home message number 2: a random variable w on R is more conveniently defined by P(w <= t) for each t, and the best is invCDF = inverse of (t → P(w <= t)).
Because then: w=invCDF(drand48());
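This take-home message can be sketched in a few lines of Python. The exponential law (rate 1) is used here only because its inverse CDF has a closed form, invCDF(u) = -log(1-u); Python's random() plays the role of drand48():

```python
import math
import random

# Take-home message 2 as code, sketched with the exponential law (rate 1),
# whose inverse CDF has the closed form invCDF(u) = -log(1 - u).
def inv_cdf_exponential(u):
    return -math.log(1.0 - u)

random.seed(1)
sample = [inv_cdf_exponential(random.random()) for _ in range(100000)]
print(sum(sample) / len(sample))  # close to 1.0, the true mean of this law
```

The same pattern works for any distribution whose invCDF you can evaluate, exactly or numerically.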
16.
Generating w2: also easy with the inverse cumulative distribution?
w1=invG1(drand48());
w2=invG2(drand48());    can we generate each wi independently?
w3=invG3(drand48());
==> very easy!
==> but very bad :-(
==> no correlation :-(
==> w4 very high and w5 very low is unrealistic; but in this model it happens very often!
17.
Generating wi: also easy with the inverse cumulative distribution?
(figure: a realistic curve, with large-scale variations, vs. an unrealistic independent one whose average value is almost constant)
18.
So how can we do it? The model should not give the (independent) distribution of w2, but the distribution of w2 given w1.
w1=invG1(drand48());
w2=invG2(drand48());
w3=invG3(drand48());
==> does this make sense? No: this is a Markov chain, and the wi should NOT be generated independently!
19.
Variant: the model gives the distribution of w2 given w1.
w1=invG(drand48());
w2=invG(w1, drand48());
w3=invG(w2, drand48());
w4=invG(w3, drand48());
==> a Markov chain of order 1 for today.
20.
Let's see an example. Assume that we have a plant.
This plant is a function: (Production, State, Benefit) = f(Demand, State, Weather)
Demand = g(weather, economy, noise)
(where Economy is the part of the economy which is not too dependent on the weather)
Benefit per year.
21.
Graphically:
Weather: w1, w2, w3, w4, w5, ... ==> random sequence ==> we assume a distribution of w(i) | w(i-1) ==> this is a Markov model (forget w(i-2))
Economy: e1, e2, e3, e4, e5, ... ==> random sequence ==> we assume a distribution of e(i) | e(i-1)
Noise = given distribution ==> n1, n2, n3, ...
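A minimal sketch of such an order-1 Markov model in Python. The three weather states and their transition probabilities below are invented for illustration (a real model would be fitted from an archive); the next state is drawn by inverting the conditional CDF, as in the first part:

```python
import bisect
import random

# A toy 3-state weather chain; the transition probabilities are invented
# for illustration, not taken from any real climate archive.
states = ["cold", "mild", "hot"]
P = {"cold": [0.70, 0.25, 0.05],
     "mild": [0.20, 0.60, 0.20],
     "hot":  [0.05, 0.25, 0.70]}

def step(current, u):
    """Draw the next state from P(. | current) by inverting its CDF at u."""
    cdf, acc = [], 0.0
    for p in P[current]:
        acc += p
        cdf.append(acc)
    return states[bisect.bisect_left(cdf, u)]

random.seed(0)
path = ["mild"]
for _ in range(9):
    path.append(step(path[-1], random.random()))  # w(i) depends only on w(i-1)
print(path)
```

Unlike independent draws, this path shows realistic persistence: a hot day tends to be followed by another hot day.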
22.
Graphically (the arrows mean dependency):
e1 e2 e3 e4 e5
d1 d2 d3 d4 d5
w1 w2 w3 w4 w5
The "model" should tell you how to generate d2, given d1, e2, w2.
(ei, di, wi) is a Markov chain. (di) is a hidden Markov chain: a part is hidden.
23.
How to build a stochastic model? It's about uncertainties.
Even without hidden models, it's complicated.
We have not discussed how to design a stochastic model (typically from a historical archive).
Typically, discretization: w(k) in I1 or I2 or I3 with I1=]-infinity,a], I2=]a,b], I3=]b,+infinity[
G(w',w) = frequency of w(k+1) <= w' for w(k) in the same interval as w
24.
Yet another take-home message.
Typically, discretization: w(k) in I1 or I2 or I3 with I1=]-infinity,a], I2=]a,b], I3=]b,+infinity[
G(w',w) = frequency of w(k+1) <= w' for w(k) in the same interval as w
(obviously more intervals in many real cases...)
==> However, this reduces extreme values.
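A sketch of this discretized estimation in Python. The interval bounds a and b and the i.i.d. toy "archive" are arbitrary choices for illustration; real data would come from historical records:

```python
import random

# Hypothetical interval bounds a=0.33, b=0.66 defining I1, I2, I3.
def interval(w, a=0.33, b=0.66):
    return 0 if w <= a else (1 if w <= b else 2)

def conditional_freq(series, threshold):
    """For each interval I: frequency of w(k+1) <= threshold given w(k) in I."""
    counts, hits = [0, 0, 0], [0, 0, 0]
    for wk, wk1 in zip(series, series[1:]):
        i = interval(wk)
        counts[i] += 1
        hits[i] += wk1 <= threshold
    return [h / c for h, c in zip(hits, counts)]

random.seed(2)
series = [random.random() for _ in range(10000)]  # i.i.d. toy "archive"
print(conditional_freq(series, 0.5))  # near [0.5, 0.5, 0.5] for i.i.d. data
```

On real, correlated data the three frequencies would differ, which is exactly what the discretized model G(w',w) captures.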
25.
A completely different approach?
Write p1, p2, p3, ..., pN for all the parameters of the model.
Collect data x1, ..., xD.
For each i in {1, 2, ..., D}, xi = (xi1, ..., xiT) = a curve.
Optimize p1, p2, p3, ..., pN so that all moments of order <= 2 are (nearly) the same as the moments of the archive.
Moment1(i) = (x1i + x2i + ... + xDi)/D ==> where is i?
Moment2(i,j) = average of xki * xkj over k = 1, ..., D.
26.
Example of a parametric HMM.
Parameters = {parameters of e, parameters of w, parameters of d} = {15 sets of parameters} = very big.
(diagram: e1..e5, d1..d5, w1..w5, as in the previous graph)
27.
Main troubles.
Ok, we know what a stochastic model is.
The case of HMMs is much more complicated (but tools exist).
But gathering data is not always so easy.
For example, climate: do you trust the last 50 years for predicting the next 10 years?
Even if you trust the past 50 years, do you think it's enough for building a sophisticated model?
We need a combination between...
28.
Validation.
Statistical models always lie:
because the structure is wrong;
because there is not enough data ==> typically, extreme values are rarer in models than in reality.
Check the extreme events.
Usually, it's good to have more extreme values than in the data (because all models tend to make them too rare...).
29.
Example: French climate.
France has a quiet climate: no big wind, no heavy rains, no heat waves.
But:
2003: huge heat wave. 15,000 died in France (6.2 times more than the 921 earthquake!)
1999: hurricane-like winds (96 died in Europe; gusts at 169 km/h in Paris)
1987: huge rainfall (96 mm in 24 hours)
30.
Example: 2003 heat wave.
Paris: 9 days with max temp > 35°C; 1 night during which the temperature never dropped below 25.5°C <== disaster
France: 15,000 died. Italy: 20,000 died.
==> European countries were not ready for this.
31.
Example: 2003 heat wave ==> plenty of take-home messages.
Bad model: air conditioning sometimes automatically stopped, because such high temperatures were considered measurement bugs ==> extreme values neglected.
Heat wave + no wind ==> increased pollution ==> old people die (babies carefully...
32.
Example: 2003 heat wave ==> plenty of take-home messages.
Be careful with extreme values:
==> extreme values are not always measurement bugs
==> removing air conditioning because it's too hot... (some systems were not ready...
33.
Example: 2003 heat wave ==> plenty of take-home messages.
Be careful with extreme values:
==> extreme values are not always measurement bugs.
Independence is a very strong assumption.
35.
Quasi-random points
(Teytaud, Tao (Inria), Lri (Paris-Sud), UMR-Cnrs 8623; collaborations with S. Gelly, J. Mary, S. Lallich, E. Prudhomme, ...)
Quasi-random points?
Dimension 1
Dimension n
Better in dimension n
Strange spaces
37.
Why do we need random / quasi-random points?
Numerical integration [thousands of papers; Niederreiter 92]: integral(f) ≈ (1/n) * sum of the f(xi)
Learning [Cervellera et al., IEEE TNN 2004; Mary, PhD 2005]
Optimization [Teytaud et al., EA 2005]
Modelization of random processes [Growe-Kruska et al., IEEE BPTP 2003]
Path planning [Tuffin]
38.
Where do we need numerical integration? Just everywhere.
Expected pollution (= average pollution...)
= integral of possible pollutions as a function of many random variables (weather, defects in parts, gasoline, use of the car...)
39.
Take-home message.
When optimizing the design of something which is built in a factory, take into account the variance in the production system ==> all cars are different.
==> very important effect
==> real part != specifications
40.
Why do we need numerical integration?
Expected benefit (= average benefit...)
= integral of possible benefits as a function of many random variables (weather, prices of raw materials...)
==> economic benefit (company)
==> overall welfare (state)
41.
Why do we need numerical integration?
Risk (= probability of failure...)
= integral of possible failures as a function of many random variables (quakes, floods, heat waves, electricity breakdowns, human error...)
42.
Take-home message.
Human error must be taken into account:
- difficult to model
- e.g. a minimum probability that action X is not performed (for all actions) (or that unexpected action Y is performed) (what about an adversarial human?)
==> protection by independent validations
43.
Why do we need numerical integration?
Expected benefit as a function of many prices / random variables.
Expected efficiency depending on machining vibrations.
Evaluating schedules in industry (with random events like faults, delays...) (e.g. processors).
44.
How to know if some points are well distributed?
I propose N points x = (x1, ..., xN). How to know if these points are well distributed?
A naive solution: f(x) = max over y of (min over i of ||y - xi||)
==> small f means well distributed: every y is close to some xi
(naive, but not always so bad)
45.
How to know if some points are well distributed?
I propose N points x = (x1, ..., xN). How to know if these points are well distributed?
A naive solution: g(x) = min over i of (min over j != i of ||xj - xi||_2), maximized
= "dispersion" (naive, but not always so bad)
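A quick Python illustration of this separation criterion; the 4x4 grid below is just a convenient well-spread reference set for comparison:

```python
import itertools
import math
import random

def separation(points):
    """g(x): the smallest pairwise distance (larger = better spread)."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

random.seed(3)
rand_pts = [(random.random(), random.random()) for _ in range(16)]
grid_pts = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
print(separation(rand_pts), separation(grid_pts))  # the grid separates far better
```

Random points almost always contain a very close pair, which is exactly the "disappointing" behaviour the second part of the talk is about.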
47.
Low discrepancy?
Discrepancy^2 = mean over rectangles of |Area - Frequency|^2
(Area = area of the rectangle; Frequency = fraction of the points falling in it)
48.
Is there better than random points for low discrepancy?
Random --> Discrepancy ~ sqrt(1/n)
Quasi-random --> Discrepancy ~ log(n)^d / n
Quasi-random with n known in advance --> Discrepancy ~ log(n)^(d-1) / n
Koksma & Hlawka: error in Monte-Carlo integration <= Discrepancy x V, with V = total variation (Hardy & Krause)
(many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1998)
==> sometimes V or log(n)^d is huge
==> don't always trust QR
55.
Dimension 1. What would you do? --> Van der Corput.
n = 1, 2, 3, ...
In base p = 2: n = 1, 10, 11, 100, 101, 110, ...
x = .1, .01, .11, .001, .101, ... (binary: mirror the digits around the point!)
56.
Dimension 1. What would you do? --> Van der Corput.
n = 1, 2, 3, ...
In base p = 3: n = 1, 2, 10, 11, 12, 20, ...
x = .1, .2, .01, .11, .21, .02, ... (ternary!)
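The Van der Corput construction on these two slides (write n in base p, mirror the digits around the point) can be sketched as:

```python
def van_der_corput(n, base=2):
    """Write n in the given base and mirror its digits around the point."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

print([van_der_corput(n) for n in range(1, 7)])
# base 2: 0.5, 0.25, 0.75, 0.125, 0.625, 0.375 (.1, .01, .11, .001, .101, .011)
```

Calling van_der_corput(n, 3) reproduces the ternary sequence of the second slide.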
57.
Dimension 1, more generally: p = 2, but also p = 3, 4, ... But p = 13 is not very nice:
58.
Dimension 2: maybe just use two Van der Corput sequences with the same p?
x --> (x, x)?
59.
Dimension 2: x --> (x, x)? With two different bases.
60.
Dimension 2 or n: Halton.
x --> (x, x) with different prime numbers is ok (needs maths...).
(As small numbers are better, use the n smallest primes...)
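A minimal Halton sketch along these lines: one Van der Corput coordinate per prime base, taking the smallest primes first as the slide suggests:

```python
def van_der_corput(n, base):
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

def halton(n, dim):
    """n-th Halton point: one Van der Corput coordinate per prime base."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # enough for dim <= 10
    return [van_der_corput(n, p) for p in primes[:dim]]

print([halton(n, 2) for n in range(1, 4)])  # bases 2 and 3 side by side
```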
62.
Dimension n: the trouble.
There are not so many small prime numbers.
63.
Dimension n: scrambling (here, random comes back).
Pi(p): [1, p-1] --> [1, p-1]
Pi(p) is applied to the coordinate with prime number p.
64.
Dimension n: scrambling.
Pi(p): [1, p-1] --> [1, p-1] (randomly chosen)
Pi(p) is applied to the coordinate with prime p (there exist much more complicated schemes).
65.
Beyond low discrepancy?
Other discrepancies: why rectangles?
Other solutions: lattices {x0 + n*x} modulo 1 (very fast and simple)
Let's see very different approaches: low discrepancy for other spaces than [0,1]^n; stratification; symmetries.
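The lattice {x0 + n*x} modulo 1 fits in one line of Python. The golden-ratio step used below is a classical well-spread choice in dimension 1 (our assumption, not something the slide prescribes):

```python
import itertools
import math

def lattice(n_points, x0=0.0, step=(math.sqrt(5) - 1) / 2):
    """The slide's {x0 + n*x} mod 1; the golden-ratio step is our choice."""
    return [(x0 + n * step) % 1.0 for n in range(n_points)]

pts = lattice(50)
# the points never cluster: the smallest pairwise gap stays comparatively large
print(min(abs(p - q) for p, q in itertools.combinations(pts, 2)))
```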
66.
Some animals are quite good at low discrepancy.
Why in the square? Other spaces/distributions: Gaussians, the sphere.
67.
Why in the square ?Uniformity in the square is okBut what about Gaussians distributions ?x in ]0,1[^dy(i) such that P( N > y(i) ) = x(i)with N standard gaussianthen y is quasi-random and gaussian ==> so you can have quasi-random Gaussian numbers
68.
Why in the square ?Other n-dimensionnal random variables by the “conditionning” trickConsider a QR point: (x1,....xn) in [0,1]^nYou want to simulate z with distribution Zz1=inf { z; P(Z1<z) >x1 } = invG1(x1)z2=inf { z; P(Z2<z|Z1=z1) > x2 } = invG2(z1,x2)z3=inf { z; P(Z3<z|Z1=z1,Z2=z2) > x3 } = invG2(z1,z2,x3)
69.
Why in the square ? Theorem: If x is random([0,1]n), then z is distributed as Z !==> convert the uniform square into strange spaces or variables
70.
Why not for random walks?
500 steps of a random walk ==> huge dimension.
Quasi-random basically does not work in huge dimension.
But the first coordinates of QR are ok; just use them for the most important coordinates!
==> change the order of variables and use conditioning!
71.
Why not for random walks?
Quasi-random number x in R^500 (e.g. Gaussian). Change the order: y(250) first (y(250) <--> x(1));
y(1) | y(250) <--> x(2);
y(500) | y(1) and y(250) <--> x(3).
72.
Why not for random walks?
500 steps of a random walk ==> huge dimension.
But strong derandomization is possible: start with y(250), then y(1), then y(500), then y(125), then y(375)...
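One standard way to implement this idea is the Brownian-bridge construction (endpoint first, then midpoints conditionally), a close relative of the ordering on the slide; this sketch assumes 2^levels equally spaced steps:

```python
import math
import random

def brownian_bridge_path(T=1.0, levels=3, rng=random):
    """W on a grid of 2**levels steps: draw the endpoint first, then fill in
    midpoints conditionally, so the first draws fix the large-scale shape."""
    n = 2 ** levels
    dt = T / n
    w = [0.0] * (n + 1)
    w[n] = math.sqrt(T) * rng.gauss(0.0, 1.0)  # the most important draw first
    step = n
    while step > 1:
        half = step // 2
        for left in range(0, n, step):
            right = left + step
            mid = left + half
            mean = 0.5 * (w[left] + w[right])
            var = half * dt / 2.0  # (t_m - t_l)(t_r - t_m)/(t_r - t_l)
            w[mid] = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        step = half
    return w

random.seed(6)
print(brownian_bridge_path())  # 9 grid values, from w(0)=0 to the endpoint
```

Replacing rng.gauss by quasi-random Gaussians (as on the previous slides) then spends the best QR coordinates on the large-scale draws.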
74.
Very different approaches for derandomization?
Symmetries: instead of x1 and x2 in [0,1], try x and 1-x.
Or more generally, just draw n/2 points and use their symmetries ==> in dimension d, n/2^d points and their 2^d symmetric copies.
75.
Free! Symmetries in Octave/Matlab:
x=rand(800,2); subplot(2,2,1); plot(x(:,1),x(:,2),'+');
x=rand(400,2); x=[x;1-x]; subplot(2,2,2); plot(x(:,1),x(:,2),'+');
x=rand(200,2); x=[x;1-x;x(:,1),1-x(:,2);1-x(:,1),x(:,2)]; subplot(2,2,3); plot(x(:,1),x(:,2),'+');
x=rand(100,2); x=[x;1-x;x(:,1),1-x(:,2);1-x(:,1),x(:,2)]; x=[x;x(:,2),x(:,1)]; subplot(2,2,4); plot(x(:,1),x(:,2),'+');
76.
Antithetic variables in Octave/Matlab: the same script as on the previous slide (x and 1-x are antithetic variables).
77.
Very different approaches for derandomization?
Control variates: instead of estimating E f(x),
choose g "looking like" f and estimate E (f-g)(x).
Then E f = E g + E(f-g) is much better.
Troubles: you need a good g; you must be able to evaluate E g.
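A control-variate sketch with f(u) = exp(u) on [0,1] and the hypothetical control g(u) = 1 + u (its first-order Taylor expansion), whose exact mean is E g = 1.5:

```python
import math
import random

# f(u) = exp(u) on [0,1]; control g(u) = 1 + u with known mean E g = 1.5.
# The true answer is E f = e - 1.
random.seed(4)
n = 10000
us = [random.random() for _ in range(n)]
plain = sum(math.exp(u) for u in us) / n
corrected = 1.5 + sum(math.exp(u) - (1.0 + u) for u in us) / n
print(plain, corrected)  # both near e - 1; the corrected one fluctuates less
```

The variance of f - g is much smaller than that of f, which is the whole point of the method.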
78.
Very different approaches for derandomization?
Importance sampling ("Pi-estimation"): instead of estimating E f(x) with x of density d,
sample y with density d' roughly proportional to |f|·d;
then E f(x) = E [ f(y) d(y) / d'(y) ]
==> the variance is much better.
Troubles:
you have to generate y;
you have to know (the shape of) f.
79.
Very different approaches for derandomization?
Stratification (jittering): instead of generating n points i.i.d., generate
k points in stratum 1,
k points in stratum 2,
...
k points in stratum m,
with m*k = n
==> more stable
==> depends on the choice of strata.
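A stratified (jittered) sketch in dimension 1, with k = 1 point drawn uniformly inside each of m equal strata:

```python
import random

def stratified(m, rng=random):
    """One uniform point inside each of the m equal strata of [0,1]."""
    return [(i + rng.random()) / m for i in range(m)]

random.seed(5)
pts = stratified(100)
print(len(pts))  # every stratum [i/100, (i+1)/100[ contains exactly one point
```

Unlike i.i.d. sampling, no stratum can end up empty or overcrowded, which is where the extra stability comes from.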
82.
Summary on MC improvements?
In many books you will read that quasi-random points are great.
Remember that people who spend their lives studying quasi-random numbers will rarely conclude that all this was a bit useless.
Sometimes it's really good. Sometimes it's similar to random.
Modern quasi-Monte-Carlo methods (randomized) are usually at least as good as random methods ==> no risk.
83.
Summary on MC improvements?
Carefully designing the model (from data) is often more important than the randomization.
Typically, neglecting dependencies is often a disaster.
Yet, there are cases in which improved MC is the key.
Remark on random search: dispersion is much better than discrepancy...
84.
Biblio (almost all on Google):
"Pi-estimation" books for stratification, symmetries, ...
Owen, A.B., "Quasi-Monte Carlo Sampling", a chapter on QMC for a SIGGRAPH 2003 course.
Hickernell, F.J., "A Generalized Discrepancy and Quadrature Error Bound", 1998.
Tuffin, B., "On the Use of Low-Discrepancy Sequences in Monte-Carlo Methods", 1996.
Matousek, J., "Geometric Discrepancy" (book, 1999).
These slides: http://www.lri.fr/~teytaud/btr2.pdf or http://www.lri.fr/~teytaud/btr2.ppt