Introduction of Stan
    Presentation Transcript

    • Introduction of Stan @Teito Nakagawa #TokyoBUGS 1st 29 September 2013
    • INDEX • Motivation • What Is Stan? • How to Install It (Windows) • Grammar of Stan • Rats Data Model • Execution from Command Line • Execution from R: RStan • Reference
    • Motivation: As an analyst, I work with SMALL DATA: census reports and deficit data.
    • Motivation: But the requirements are BIG. I have to build a model, and I have to say many things with it.
    • Motivation: That is why I started to learn BUGS. BUT IT TAKES A LOT OF TIME.
    • Motivation: So I started to learn Stan.
    • What Is Stan? • What Is Stan? • Who Develops Stan? • Sample Code of Stan • Execution of Stan
    • What Is Stan? • “Stan is a package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.” (Official site: http://mc-stan.org/)
      – Similar to BUGS, but more procedural
      – Still under active development
      – Fast: compiles the model to an executable file
      – Easy to use: has an R interface
      – Fast convergence: Hamiltonian Monte Carlo and NUTS
    • Who Develops Stan? • Andrew Gelman, his staff, Jiqiang Guo, and Marcus Brubaker
    • Sample Code of Stan – Similar to BUGS, but more procedural
      # http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
      # Page 3: Rats
      data {
        int<lower=0> N;
        int<lower=0> T;
        real x[T];
        real y[N,T];
        real xbar;
      }
      ...
      model {
        mu_alpha ~ normal(0, 100);
        mu_beta ~ normal(0, 100);
        sigmasq_y ~ inv_gamma(0.001, 0.001);
        ...
      }
      From https://github.com/stan-dev/stan/tree/master/src/models/bugs_examples/vol1/rats
    • Execution of Stan – Fast: compiles the model to an executable file
      1. stanc: translates the Stan program to C++
      2. make: compiles the resulting C++ to an executable
      3. exe: runs the Stan program
      Details are discussed later.
      >\bin\stanc --name=rats --o=rats.cpp .\rats.stan
      >make src/models/bugs_examples/vol1/rats/rats
      >.\rats --data=rats.data.R --init=rats.init.R
    • How to Install It (Windows) 1. Environment 2. Install Rtools 3. Install RStan 4. Install Stan 5. Build Stan
    • 1. Environment • I tested the installation and the model runs below on my PC:
      • Windows 8 64-bit
      • Intel(R) Core(TM) i7-2600 CPU, 3.4 GHz
      • 4 cores, 8 threads
      • 12.0 GB memory
      • R 3.0.1
      • Rtools 3.1
      • Stan 1.3.0
      • RStan 1.3.0
    • 2. Install Rtools • Rtools is a collection of resources for building packages for R under Microsoft Windows. • g++ is installed by Rtools. • Download the installer and run it. – http://cran.r-project.org/bin/windows/Rtools/ • Check the installation notes on the official site; in most cases you can install it by just clicking “Next”. (Screenshot: Rtools installer)
    • 3. Install RStan • RStan is a library for using Stan from R. • It is not registered on CRAN. • You can install it by running the following script from R. – The script is modified from the one at https://code.google.com/p/stan/wiki/RStanGettingStarted#Install_Rstan
    • 3. Install RStan
      # additional package installation
      install.packages('inline')
      install.packages('Rcpp')
      # check that Rcpp works: if it does, "hello world" is printed
      library(inline)
      library(Rcpp)
      src <- '
        std::vector<std::string> s;
        s.push_back("hello");
        s.push_back("world");
        return Rcpp::wrap(s);
      '
      hellofun <- cxxfunction(body = src, includes = '', plugin = 'Rcpp', verbose = FALSE)
      cat(hellofun(), '\n')
      # rstan installation
      Sys.setenv(R_MAKEVARS_USER = '')
      options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R"))
      install.packages('rstan', type = 'source')
      # load rstan
      library(rstan)
    • 4. Install Stan • To use Stan from the command line, install Stan itself with the following steps.
      1. Download the tar file stan-src-1.m.p.tgz
         – Download site: https://code.google.com/p/stan/downloads/list
      2. Unpack the file in the Documents directory with the following command
         – tar is already available on Windows if Rtools has been installed.
      > tar --no-same-owner -xzf stan-src-1.m.p.tgz
    • 5. Build Stan • Build Stan once after installing it.
      1. Make the library
      >cd <stan-home>
      >make bin/libstan.a
      2. Make the model parser and code generator
      >cd <stan-home>
      >make bin/stanc
      * <stan-home> is the directory created by the previous tar command. The build results are placed in <stan-home>/bin.
    • Grammar of Stan 1. Grammar of Stan 2. Blocks 3. Data Types 4. Scope of Variables
    • Stan program … • A Stan program defines a statistical model through conditional probability. • A Stan program consists of variable type declarations and statements. • A Stan program has specific blocks. • A Stan program can handle various variable types. • A Stan program is different from a BUGS program.
    • A Stan program consists of variable type declarations and statements. (rats_vec.stan)
      data {
        int<lower=0> N;
        int<lower=0> T;
        real x[T];
        real y[N,T];
        real xbar;
      }
      transformed data {
        real x_minus_xbar[T];
        real y_linear[N*T];
        for (t in 1:T)
          x_minus_xbar[t] <- x[t] - xbar;
        …
      – Block: each named { ... } section, such as data or transformed data
      – Variable type declaration: defines a variable
      – Statement: assignments, sampling, loops, conditions
    • A Stan program has specific blocks. • Skeletal Stan program • The order must be kept. • All blocks are optional except the model block.
      data {
        ... declarations ...
      }
      transformed data {
        ... declarations ... statements ...
      }
      parameters {
        ... declarations ...
      }
      transformed parameters {
        ... declarations ... statements ...
      }
      model {
        ... declarations ... statements ...
      }
      generated quantities {
        ... declarations ... statements ...
      }
      (Diagram labels: Order, Scope)
    • A Stan program has specific blocks.
      – Data: the given input data; executed first to load the data.
      – Transformed data: transforms variables for convenience.
      – Parameters: the parameters that are output as results; updated at every iteration.
    • A Stan program has specific blocks.
      – Transformed parameters: transforms parameters for convenience.
      – Model: the model itself; write it based on what you want to describe.
      – Generated quantities: generates quantities to monitor, computed from each draw without affecting sampling.
    • A Stan program can deal with various variable types. From http://stan.googlecode.com/files/stan-reference-1.3.0.pdf
    • A Stan program can deal with various variable types. • Scalar
      – int is a 32-bit integer scalar. Lower and upper constraints are allowed. e.g. int N; int<lower=0,upper=1> cond;
      – real is a 64-bit real scalar. e.g. real<lower=0> sigma; real<lower=-1,upper=1> rho;
      • Vector Data Types (only real values are allowed)
      – Vector: a general (column) vector. e.g. vector<lower=0>[3] u;
      – Unit simplex: for categorical or multinomial data; a vector of non-negative values that sum to 1. e.g. simplex[5] theta;
    • A Stan program can deal with various variable types. • Vector Data Types
      – Unit vector: a vector with a norm of one. e.g. unit_vector[5] theta;
      – Ordered vector: most often used as cut points in ordered logistic regression models. e.g. ordered[5] c;
      – Positive, ordered vector: e.g. positive_ordered[5] d;
      – Row vector: different from a vector; Stan distinguishes between row and column vectors. e.g. row_vector<lower=-1,upper=1>[10] u;
    • A Stan program can deal with various variable types. • Matrix Data Types
      – Matrix: a general matrix. e.g. matrix<upper=0>[3,4] B;
      – Correlation matrices: symmetric and positive definite with a unit diagonal; entries take values from -1 to 1. e.g. corr_matrix[3] Sigma;
      – Covariance matrices: symmetric and positive definite. e.g. cov_matrix[K] Omega;
      • Array Data Types
      – Arrays are declared by enclosing the dimensions in square brackets following the name of the variable.
      – An array’s elements may be any of the basic data types. e.g. cov_matrix[5] mu[2,3,4];
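The constrained types above can be read as checks on ordinary numeric vectors and matrices. The following is a minimal base-R sketch of those checks. The helper names (is_simplex, is_unit_vector, is_ordered, is_positive_ordered, is_cov_matrix, is_corr_matrix) and the tolerance are made up for illustration; they are not part of Stan or RStan.

```r
# Hypothetical helpers (not part of Stan or RStan) that mirror the
# constraints implied by some of Stan's constrained types.
tol <- 1e-8  # assumed numerical tolerance

# simplex: non-negative entries that sum to 1
is_simplex <- function(v) all(v >= 0) && abs(sum(v) - 1) < tol

# unit_vector: Euclidean norm equal to 1
is_unit_vector <- function(v) abs(sqrt(sum(v^2)) - 1) < tol

# ordered: strictly increasing entries
is_ordered <- function(v) all(diff(v) > 0)

# positive_ordered: ordered and all entries positive
is_positive_ordered <- function(v) is_ordered(v) && all(v > 0)

# cov_matrix: symmetric and positive definite
is_cov_matrix <- function(m) {
  isSymmetric(m) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

# corr_matrix: a covariance matrix with a unit diagonal
is_corr_matrix <- function(m) is_cov_matrix(m) && all(abs(diag(m) - 1) < tol)

theta <- c(0.2, 0.3, 0.5)                     # satisfies simplex[3]
Sigma <- matrix(c(1, 0.4, 0.4, 1), nrow = 2)  # satisfies corr_matrix[2]
print(is_simplex(theta))
print(is_corr_matrix(Sigma))
```

Checks like these can help validate data in R before it is passed to a model whose data block declares constrained types.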
    • Rats Data Model 1. Rats Data 2. Rats Model
    • Rats Data • The rats data and its model are included in WinBUGS Examples Volume I (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf). • The original article is Gelfand et al. (1990). • Weights of young rats measured weekly, used for a hierarchical model • Rows: individual rats (N=30) • Columns: days (M=5) • From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
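    • Rats Model • A hierarchical regression model that accounts for individual and time differences.
      $$Y_{ij} \sim \mathrm{Normal}\left(\alpha_i + \beta_i (x_j - \bar{x}),\ \tau_c\right), \qquad \alpha_i \sim \mathrm{Normal}(\alpha_c, \tau_\alpha), \qquad \beta_i \sim \mathrm{Normal}(\beta_c, \tau_\beta)$$
      $Y_{ij}$: observed data; $x_j$: days; $\bar{x}$: median of days (22); $\alpha_i$: effect of individual $i$; $\beta_i$: effect of day for individual $i$; $\tau$: precision of the Normal distribution
      From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf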
    • Rats Model
      model {
        mu_alpha ~ normal(0, 100);
        mu_beta ~ normal(0, 100);
        sigmasq_y ~ inv_gamma(0.001, 0.001);
        sigmasq_alpha ~ inv_gamma(0.001, 0.001);
        sigmasq_beta ~ inv_gamma(0.001, 0.001);
        alpha ~ normal(mu_alpha, sigma_alpha); // vectorized
        beta ~ normal(mu_beta, sigma_beta); // vectorized
        for (n in 1:N)
          for (t in 1:T)
            y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y);
      }
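To make the generative story of the model block concrete, the sketch below simulates data of the same shape in base R. The hyperparameter values and the seed are arbitrary assumptions chosen for illustration, not estimates from the rats data. The measurement days are the ones used in the WinBUGS example (8, 15, 22, 29, 36), which is consistent with xbar = 22. The variable n_days plays the role of T in the Stan code; it is renamed here only to avoid masking R's T (TRUE).

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
N <- 30                      # number of rats
x <- c(8, 15, 22, 29, 36)    # measurement days in the WinBUGS example
n_days <- length(x)          # corresponds to T in the Stan program
xbar <- 22

# Assumed (made-up) hyperparameters, for illustration only
mu_alpha <- 240; sigma_alpha <- 15
mu_beta  <- 6;   sigma_beta  <- 0.5
sigma_y  <- 6

# alpha ~ normal(mu_alpha, sigma_alpha); beta ~ normal(mu_beta, sigma_beta)
alpha <- rnorm(N, mu_alpha, sigma_alpha)
beta  <- rnorm(N, mu_beta, sigma_beta)

# Mean of y[n, t] is alpha[n] + beta[n] * (x[t] - xbar).
# outer() builds the N x n_days matrix of beta[n] * (x[t] - xbar);
# adding alpha recycles it down each column, so row n gets alpha[n].
mu <- outer(beta, x - xbar) + alpha

# y[n, t] ~ normal(mu[n, t], sigma_y)
y <- matrix(rnorm(N * n_days, mean = mu, sd = sigma_y),
            nrow = N, ncol = n_days)
print(dim(y))
```

Note that Stan's normal() takes a standard deviation as its second argument, as R's rnorm() does, whereas the BUGS formulation uses a precision.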
    • Execution from Command Line • Execution of Stan • stanc • make • execution
    • Execution of Stan
      1. stanc: translates the Stan program to C++
      2. make: compiles the resulting C++ to an executable
      3. exe: runs the Stan program
      >\bin\stanc --name=rats --o=rats.cpp .\rats.stan
      >make src/models/bugs_examples/vol1/rats/rats
      >.\rats --data=rats.data.R --init=rats.init.R
    • stanc • The model translation program stanc translates a .stan file into a .cpp file.
      USAGE: stanc [options] <model_file>
        --name=<string>  Model name (default = "$model_filename_model")
        --o=<file>       Output file for generated C++ code (default = "$name.cpp")
      >\bin\stanc --name=rats --o=rats.cpp .\rats.stan
    • make • We can compile the generated .cpp file with the make command.
      >make src/models/bugs_examples/vol1/rats/rats
    • execution • We can run the Stan sampler by executing the generated .exe file.
      USAGE: .\src\models\bugs_examples\vol1\rats\rats [options]
      OPTIONS:
        --data=<file>     Read data from specified dump-format file (required if model declares data)
        --init=<file>     Use initial values from specified file or zero values if <file>=0 (default is random initialization)
        --samples=<file>  File into which samples are written (default = samples.csv)
        --append_samples  Append samples to existing file if it exists (does not write header)
        --seed=<int>      Random number generation seed (default = randomly generated from time)
        --chain_id=<int>  Markov chain identifier (default = 1)
        --iter=<+int>     Total number of iterations, including warmup (default = 2000)
        --thin=<+int>     Period between saved samples after warm up (default = max(1, floor(iter - warmup) / 1000))
        --refresh=<int>   Period between samples updating progress report print (0 for no printing) (default = max(1,iter/200))
      >.\rats --data=rats.data.R --init=rats.init.R
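The sampler writes its draws to samples.csv by default, so a natural next step is to read that file back into R. The sketch below assumes the file is a CSV with one header row and with comment lines that start with '#'. That layout is an assumption about the output format, so check it against your own file. The file contents here are a made-up stand-in written to a temporary file only to make the sketch runnable; they are not real sampler output.

```r
# Stand-in for samples.csv: made-up numbers, NOT real sampler output.
mock <- c(
  "# mock comment line (assumed format)",
  "lp__,mu_alpha,mu_beta",
  "-10.1,242.3,6.1",
  "-9.8,243.0,6.2",
  "-10.4,241.7,6.3"
)
path <- tempfile(fileext = ".csv")
writeLines(mock, path)

# Assumption: comment lines start with '#', followed by one header row.
samples <- read.csv(path, comment.char = "#")

# Posterior means of selected columns
print(colMeans(samples[, c("mu_alpha", "mu_beta")]))
```

With a real run, path would point at the file given by --samples (samples.csv by default).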
    • Execution from R • RStan • Execution from R • plot(stanfit) • traceplot(stanfit) • fit using previous model • parallel execution from R
    • RStan • RStan is an interface to Stan.
      – Translates Stan code to C++ code, compiles it, and runs it from R
      – Provides visualization functions for Stan results (stanfit class)
      Architecture of RStan: Stan code → stanc() → C++ code → stan_model() → exe → sampling() → S4: stanfit → plot(), traceplot(), extract(). stan() covers the whole chain from Stan code to stanfit in one call.
    • Execution from R
      library(rstan)
      # set to the directory that contains the source files
      STAN_HOME <- "<STAN_HOME>"  # replace with your Stan directory
      dirpath <- paste0(STAN_HOME, "/include/stansrc/models/bugs_examples/vol1/rats")
      # load the data into a list: dat
      source(paste0(dirpath, "/rats.data.R"))
      dat <- list(y = y, x = x, xbar = xbar, N = N, T = T)
      # fit1: run the model as a one-liner
      fit1 <- stan(file = paste0(dirpath, "/rats.stan"), data = dat, iter = 1000, chains = 4)
      # fit2: run the model step by step
      # translate the Stan code to C++ code
      rt <- stanc(file = paste0(dirpath, "/rats.stan"), model_name = "stan", verbose = TRUE)
      # compile the C++ code for the model
      sm <- stan_model(stanc_ret = rt, verbose = FALSE)
      # run the sampler
      fit2 <- sampling(sm, data = dat, chains = 4, iter = 1000)
    • plot(stanfit) • We can check the estimate and R-hat of each parameter.
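R-hat compares between-chain and within-chain variance; values near 1 suggest that the chains agree. The following is a minimal base-R sketch of the classic (non-split) Gelman-Rubin statistic. The function name rhat and the simulated chains are assumptions for illustration, and RStan's own computation may differ in detail (for example by splitting chains).

```r
# draws: iterations x chains matrix for a single parameter
rhat <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(draws, 2, var))      # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(42)  # arbitrary seed
# Four simulated "chains" of 1000 draws from the same distribution
mixed <- matrix(rnorm(4000), ncol = 4)
# Same draws, but the fourth chain is shifted by 3 (a chain that disagrees)
stuck <- mixed + rep(c(0, 0, 0, 3), each = 1000)

print(rhat(mixed))
print(rhat(stuck))
```

The first value should be close to 1 and the second clearly larger, which is the pattern to look for in the R-hat column of a stanfit summary.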
    • traceplot(stanfit) • We can trace each chain.
    • fit using previous model • Once a model is fitted, we can pass the fitted result as an input to fit the model again with other data or settings. This saves the time of compiling the C++ code for the model. (https://code.google.com/p/stan/wiki/RStanGettingStarted)
      # fit again using the previous fit result
      fit3 <- stan(fit = fit1, data = dat, iter = 400, chains = 4)
    • Parallel Execution from R
      # parallel processing with doSNOW and foreach
      library(doSNOW)
      library(foreach)
      cl <- makeCluster(4)  # change 4 to your number of CPU cores
      registerDoSNOW(cl)
      # run each chain of Stan in parallel
      sflist1 <- foreach(i = 1:10, .packages = 'rstan') %dopar% {
        stan(fit = fit1, data = dat, chains = 1, chain_id = i, refresh = -1)
      }
      stopCluster(cl)  # release the worker processes
      # merge the chains
      f3 <- sflist2stanfit(sflist1)
    • Parallel Execution Performance
      # timing table: rows = number of processes, columns = number of iterations
      library(doSNOW)
      library(foreach)
      timecalc <- matrix(0, nrow = 4, ncol = 7)
      iter <- c(1000, 3000, 5000, 10000, 30000, 50000, 100000)
      numproc <- c(1, 2, 4, 8)
      # single processing
      for (i in 1:7) {
        cat("p:", 1, ", iter:", iter[i], "\r\n")
        t <- proc.time()
        a <- stan(fit = fit1, data = dat, chains = 8, refresh = -1, iter = iter[i])
        timecalc[1, i] <- (proc.time() - t)["elapsed"]
      }
      # parallel processing
      for (p in 2:4) {
        for (i in 1:7) {
          cat("proc:", numproc[p], "iter:", iter[i], "\r\n")
          t <- proc.time()
          cl <- makeCluster(numproc[p])
          registerDoSNOW(cl)
          # run each chain of Stan in parallel
          sflist1 <- foreach(k = 1:8, .packages = 'rstan') %dopar% {
            stan(fit = fit1, data = dat, chains = 1, chain_id = k, refresh = -1, iter = iter[i])
          }
          # merge the chains
          f3 <- sflist2stanfit(sflist1)
          timecalc[p, i] <- (proc.time() - t)["elapsed"]
          stopCluster(cl)  # release the workers before the next run
        }
      }
    • Performance result • A cluster of 4 processes was the BEST on my PC.
    • Reference – User's Guide and Reference Manual: grammar, differences from BUGS, and getting started (http://stan.googlecode.com/files/stan-reference-1.3.0.pdf) – Official site (http://mc-stan.org/)
    • End of Slides • Stanislaw Marcin Ulam (13 April 1909 – 13 May 1984) http://en.wikipedia.org/wiki/Stanislaw_Ulam