Have you met Julia?
Tommaso Rigon
May 2, 2016
Which software are we more likely to use?
A non-comprehensive list
In statistics many programming languages can be used. One could use:
1 C / Fortran.
   1 Low-level programming languages.
   2 General purpose languages, but very efficient for numeric computing.
2 Python
   1 Open source and general purpose language.
   2 Widespread in industry and among computer scientists.
3 Matlab
   1 Closed source (!)
   2 Optimized for numerical computing, fast and clear linear algebra.
4 R
   1 Open source: a lot of additional statistical packages are available.
   2 R is developed by statisticians for statisticians.
   3 Widespread among academics.
A typical workflow in R
Suppose we are going to analyze a real dataset:
1 Data management. We need to read the data into R, from a text file or
from a database. We also need to arrange them in a convenient form (the
dplyr package is awesome!).
2 Data visualization. Visualize the data (see the ggplot2 package).
3 Statistical modeling. First analyses are done using available packages.
4 Developing. We need to implement our new methodology.
5 Reporting. We need to communicate our results effectively, usually with
tables and graphs (see the Markdown and knitr projects).
1 The script is quickly developed in R, but it is often (very) slow. Sometimes
this precludes the use of the whole dataset.
2 The slow parts need to be written in C or Fortran and then interfaced to R
(package Rcpp helps!)
R is great but...
A vectorized language
1 The language encourages operating on whole objects (i.e. vectorized
code). However, some tasks (e.g. MCMC) cannot be easily vectorized.
2 Unvectorized R code (for and while loops) is slow.
A nested for loop compared to the same vectorized operation
system.time(for (i in 1:10^4) for (j in 1:10^3) runif(1))
# user system elapsed
# 17.689 0.000 17.528
system.time(runif(10^7))
# user system elapsed
# 0.410 0.000 0.424
What is Julia?
Julia according to its developers
Julia is a high-level, high-performance dynamic programming language for
technical computing, with syntax that is familiar to users of other technical
computing environments.
Julia in a nutshell
Julia was released recently, in 2012, by Jeff Bezanson, Stefan Karpinski, Viral
Shah and Alan Edelman. The latest stable version is 0.4.5.
1 Open-source, under the liberal MIT license.
2 High-level and familiar. It can work on the level of vectors, matrices, arrays.
The syntax is similar to Matlab and R and easy to read without a huge effort.
3 Technical computing. It is specifically optimized for scientific computing,
not necessarily statistics.
Why Julia?
Julia in a nutshell - Technical details
1 Julia has a REPL (read–eval–print loop). Exactly as in R, it is possible to
interact with the software, which facilitates debugging, testing and developing.
Conversely, languages like C usually follow an edit-compile-run loop.
2 Based on a sophisticated just-in-time (JIT) compiler built on LLVM.
3 Julia is fast. Its compiler is designed to approach the speed of C.
4 No need to vectorize code for performance; devectorized code is fast.
5 Efficient support for Unicode, including but not limited to UTF-8. It means,
for instance, that
µ = 10; σ = 5
are legitimate assignments in Julia.
6 Designed for parallelism and distributed computation.
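As a rough illustration of point 4, the nested loop from the R example can be written directly in Julia. This is only a sketch: the function name `draw_loop` is hypothetical, only base functions are used, and no timing is quoted because it depends on the machine and on the Julia version.

```julia
# Hypothetical helper: draw n*m uniforms one at a time in a nested loop
# and accumulate them, mirroring the R example with explicit loops.
function draw_loop(n, m)
    total = 0.0
    for i in 1:n
        for j in 1:m
            total += rand()   # one Uniform(0,1) draw per iteration
        end
    end
    total
end

draw_loop(10, 10)             # warm-up call, triggers JIT compilation
@time draw_loop(10^4, 10^3)   # timing of the already compiled loop
```

The warm-up call matters because the first call to a function includes compilation time, which would otherwise be mixed into the measurement.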
Packages available
Julia for statistics
These are some useful packages for statistical computing:
1 Distributions. Probability distributions and associated functions (similar but
not identical to the d-p-q-r system in R).
2 DataFrames. For handling datasets, possibly with missing values.
3 GLM. Generalized linear models, including the linear model.
4 StatsBase. Basic descriptive functions: sample mean, median, ...
5 ...and many others!
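To make the comparison with the d-p-q-r system concrete, here is a minimal sketch using the Distributions package, assuming it is installed. The R equivalents in the comments are given only as an analogy; the main difference is that in Julia the distribution is an object passed to generic functions.

```julia
using Distributions   # third-party package, assumed to be installed

d = Normal(0.0, 1.0)  # the distribution itself is an object

pdf(d, 0.0)           # density,        analogous to dnorm(0) in R
cdf(d, 1.96)          # c.d.f.,         analogous to pnorm(1.96)
quantile(d, 0.975)    # quantile,       analogous to qnorm(0.975)
rand(d, 5)            # random numbers, analogous to rnorm(5)
```

The same four functions apply unchanged to other distributions, for example `Gamma(2.0, 1.0)`, which is the design choice that replaces R's naming convention.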
Julia and R integration
1 rjulia. An R package that calls Julia functions and imports / exports objects
between the two environments. Currently under development, available only
on GitHub.
2 RCall. As the name itself suggests, it calls R from Julia.
Speeding up computations
Do we really need such fast and powerful tools?
In many cases, we do not. Suppose our implementation is badly written and
inefficient, but it takes about 1 second to execute. Is it worth improving
the code?
Where is efficient computation really necessary?
Just to mention some areas among others:
1 In almost any procedure applied to huge datasets (even linear models!).
2 In any procedure which involves cross-validation (both Lasso and CART
often use CV for model selection).
3 In “bootstrap-like” procedures (bootstrap, bagged trees,...).
4 In Bayesian statistics, when approximating the posterior distribution through
simulations (e.g. MCMC, importance sampling, ABC,...).
5 A combination of the previous.
Bootstrap example
What is the bootstrap? A (very) brief explanation
1 It is an inferential technique, which (usually!) makes use of simulation. For
instance, in a frequentist framework, it can be used for constructing confidence
intervals.
2 Let $\hat\theta(Y)$ be an estimator of $\theta$ and $Y_i \sim F$ i.i.d. random vectors, for
$i = 1, \dots, n$. The “true” c.d.f. $F$ is replaced with an estimate $\hat F$. Then, we
simulate $Y^*_r$ from $\hat F$ for $r = 1, \dots, R$ and we get
$\hat\theta^*_1, \dots, \hat\theta^*_R$, where $\hat\theta^*_r = \hat\theta(Y^*_r)$,
which is a bootstrap sample of the estimator.
3 The bootstrap sample can be used to make inference on $\theta$. This is usually
the main goal, but here we are mainly interested in simulating it quickly.
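The scheme in point 2 can be sketched generically in Julia. This is an illustration rather than the code used for the timings: the names `bootstrap` and `sample_mean` and the data are made up, $\hat F$ is taken to be the empirical distribution (so simulating from it means resampling with replacement), and only base functions are used.

```julia
# Hypothetical helper: nonparametric bootstrap of a scalar estimator.
# F-hat is the empirical distribution, so a draw from it is a resample
# of the observed data with replacement.
function bootstrap(estimator, y, R)
    n = length(y)
    theta_star = zeros(R)
    for r in 1:R
        y_star = y[rand(1:n, n)]            # Y*_r, a sample of size n from F-hat
        theta_star[r] = estimator(y_star)   # theta-hat*_r
    end
    theta_star
end

sample_mean(y) = sum(y) / length(y)

y = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]     # made-up data
theta_star = bootstrap(sample_mean, y, 1000)
```

The vector `theta_star` is the bootstrap sample of the estimator; its spread can then be summarized, for instance through its quantiles.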
Inference on the correlation coefficient
An example, using the “cars” dataset
I have considered the dataset cars available in R. Suppose that $Y_i = (Y_{1i}, Y_{2i})$
are i.i.d. I would like to make inference on the correlation coefficient
$$\hat\rho = \frac{\widehat{\mathrm{Cov}}(Y_1, Y_2)}{\sqrt{\widehat{\mathrm{Var}}(Y_1)\,\widehat{\mathrm{Var}}(Y_2)}}$$
using the so-called nonparametric bootstrap, that is, $\hat F$ is replaced by the empirical
distribution function. This operation can be vectorized and has been done both
in Julia and in R, for comparison.
In practice...
We need to “resample” the original data, with replacement. Then, we evaluate
the correlation coefficient for each bootstrap sample.
Listing 1: Bootstrap; R implementation
rho_boot <- function(R, dataset) {
  n <- NROW(dataset)
  # Sampling the indexes: one bootstrap sample per row
  index <- matrix(sample(1:n, R * n, replace = TRUE), R, n)
  # Bootstrap correlation estimate
  apply(index, 1, function(x) cor(dataset[x, 1], dataset[x, 2]))
}
Listing 2: Bootstrap; Julia implementation
function rho_boot(R,dataset)
    n = size(dataset)[1]
    # Sampling the indexes: one bootstrap sample per column
    index = rand(1:n,n,R)
    out = Array(Float64,R)
    for i in 1:R
        # Bootstrap correlation estimate
        out[i] = cor(dataset[index[:,i],:])[1,2]
    end
    out
end
Performance - Milliseconds in log-scale
[Figure: execution times in milliseconds (log scale) for Naive_R, Julia, Boot_library and Boot_library_cor2.]
Global Performance
And the winner is...
1 The R code is vectorized and therefore we expect good performance.
2 Despite this, for this particular problem Julia is ≈ 10 times faster than R. This
is true even if we use the boot package.
Speeding up the R code
The bottleneck of the calculations is the R cor function. It is designed for the
evaluation of an entire correlation matrix. It also checks for missing values before
performing the calculation. Therefore, we can easily improve the code by defining the
function cor2. With cor2, Julia is ≈ 5 times faster than R.
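cor2 <- function(x, y) {
  xbar <- x - mean(x)
  ybar <- y - mean(y)
  # Sample correlation of the centered vectors (no missing-value checks)
  sum(xbar * ybar) / sqrt(sum(xbar^2) * sum(ybar^2))
}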
Bootstrap final result
[Figure: bootstrap distribution of the correlation coefficient; horizontal axis "Correlation coefficient", from about 0.6 to 0.9.]
Tommaso Rigon Have you met Julia? May 2, 2016 14 / 25
Principal component analysis
Notation about PCA
Let $y_i = (y_{i1}, \dots, y_{ip})$ for $i = 1, \dots, n$, be i.i.d. realizations from a random vector
having covariance matrix $\Sigma$. Let $\hat\Sigma$ be the sample covariance matrix and $R$ the related
correlation matrix. The eigenvalues from the spectral decomposition of $R$ are collected in
$$\Lambda = \mathrm{Diag}(\lambda_1, \dots, \lambda_p), \qquad \lambda_1 > \lambda_2 > \dots > \lambda_p.$$
The quantity of interest
The quantity of interest is the cumulative percentage of the "total variance"
explained by the first $k$ principal components:
$$\hat\tau_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}.$$
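As a small worked example of this formula, the sketch below computes $\hat\tau_1, \dots, \hat\tau_p$ from a vector of eigenvalues. The function name `tau_cumulative` and the eigenvalues are made up for illustration; only base functions are used.

```julia
# Hypothetical helper: cumulative proportion of variance explained,
# given the eigenvalues of the correlation matrix in any order.
function tau_cumulative(lambda)
    sorted = sort(lambda, rev=true)   # lambda_1 >= ... >= lambda_p
    cumsum(sorted) / sum(sorted)      # tau_1, ..., tau_p
end

lambda = [0.5, 2.0, 1.0, 0.5]         # made-up eigenvalues, summing to p = 4
tau = tau_cumulative(lambda)          # → [0.5, 0.75, 0.875, 1.0]
```

Here the first component alone explains half of the total variance, and the last entry is always 1.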
The iris dataset
A famous example
The iris dataset was considered just for illustrative purposes. We would like to
assess the variability of the quantity
$$\hat\tau_1 = \frac{\lambda_1}{\sum_{i=1}^{p} \lambda_i} = \frac{\lambda_1}{p}$$
using a nonparametric bootstrap approach. The quantity $\hat\tau_1$ is the relative
importance of the first principal component. Without the bootstrap, it would be
difficult to assess the variability of this estimate. Also, notice that we are not
assuming a specific parametric family of distributions for $Y$.
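function tau_est(data)
    R = cor(data)
    lambda = eigvals(R) # Increasing order for a symmetric matrix: largest is last
    tau = (lambda/sum(lambda))[end] # Also lambda[end]/p is fine
end
function pca_boot(R,data)
    n = size(data)[1]
    # Sampling the indexes: one bootstrap sample per column
    index = rand(1:n,n,R)
    out = Array(Float64,R)
    for i in 1:R
        out[i] = tau_est(data[index[:,i],:])
    end
    out
end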
Bootstrap final result
[Figure: bootstrap distribution of the explained variance; horizontal axis "Explained variance", from about 0.70 to 0.78.]
A Bayesian logistic regression
The “shuttle” dataset
I have considered the famous “shuttle” dataset, having sample size n = 23. We
suppose the following Bayesian logistic regression:
$$Y_i \sim \mathrm{Bin}(6, \theta_i), \qquad \theta_i = \frac{1}{1 + e^{-\eta_i}}, \qquad \eta_i = \beta_0 + \beta_1 x_i,$$
where the $x_i$ are known constants and $i = 1, \dots, n$. Moreover, let $\beta_j \sim N(0, \sigma^2_\mu)$,
$j = 0, 1$, be the prior distributions, with $\sigma^2_\mu$ a hyperparameter.
MCMC posterior computation
I have approximated the posterior distribution of $\beta \mid Y$ using a Metropolis
algorithm. I have used a multivariate Gaussian random walk as proposal
distribution, having covariance matrix equal to the inverse of the observed
information matrix.
First step: the log-posterior
Listing 3: Julia implementation
using Distributions
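# Log-likelihood, up to an additive constant
# Column layout implied by the formula below:
# data[:,1] = number of trials, data[:,2] = counts, data[:,3] = covariate
function loglik(data::Matrix, beta::Vector)
    eta = beta[1] + beta[2]*data[:,3]
    theta = 1./(1 + exp(- eta))
    sum(data[:,2].*eta) + sum(data[:,1].*log(1-theta))
end
# Log-posterior up to an additive constant
function lpost(data::Matrix, beta::Vector, sigma_mu::Float64)
    prior = Normal(0,sigma_mu) # sigma_mu is the prior standard deviation
    loglik(data,beta) + logpdf(prior,beta[1]) + logpdf(prior,beta[2])
end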
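The expression coded in `loglik` can be recovered from the model. The derivation below assumes, as the listing suggests, that the first column of `data` holds the number of trials $m_i$ (here $m_i = 6$), the second the counts $y_i$ and the third the covariate $x_i$. Up to an additive constant:

```latex
\begin{aligned}
\ell(\beta)
  &= \sum_{i=1}^{n} \left\{ y_i \log\theta_i + (m_i - y_i)\log(1-\theta_i) \right\} \\
  &= \sum_{i=1}^{n} y_i \log\frac{\theta_i}{1-\theta_i} + \sum_{i=1}^{n} m_i \log(1-\theta_i) \\
  &= \sum_{i=1}^{n} y_i \eta_i + \sum_{i=1}^{n} m_i \log(1-\theta_i)
\end{aligned}
```

The last step uses $\theta_i / (1-\theta_i) = e^{\eta_i}$, which follows from $\theta_i = 1/(1+e^{-\eta_i})$. The two sums correspond to the two terms returned by `loglik`.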
Metropolis Algorithm
Listing 4: Julia implementation
using Optim # For numerical optimization
using ForwardDiff # For automatic differentiation
# Maximum likelihood estimate
beta_hat = optimize(x -> -loglik(data,x),[0.0, 0.0], method=:l_bfgs).minimum
# Inverse of the observed information matrix
Sigma = inv(ForwardDiff.hessian(x -> -loglik(data,x), beta_hat))
Listing 5: Julia implementation
# Note: data is taken from the enclosing (global) scope
function Metropolis(R::Int64, Sigma::Matrix, sigma_mu::Float64,start::Vector)
    out = zeros(R,2)
    beta = start #Initialization
    for r in 1:R
        beta_star = rand(MvNormal(beta,Sigma)) # Proposal distribution
        alpha = exp(lpost(data,beta_star,sigma_mu) - lpost(data,beta,sigma_mu))
        if rand() < alpha # rand() is a pseudo-random draw from a Uniform(0,1)
            beta = copy(beta_star) # Copy if accepted
        end
        out[r,:] = beta
    end
    out
end
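The accept/reject step can be seen in isolation on a toy problem. The sketch below is not the sampler used for the timings: it is a hypothetical one-dimensional random-walk Metropolis for a standard normal target, using only base functions, with the acceptance test done on the log scale (a common variant that avoids overflow in `exp`).

```julia
# Hypothetical toy example: random-walk Metropolis for a scalar target,
# given its log-density up to an additive constant.
function metropolis_1d(logtarget, R, start, step)
    out = zeros(R)
    x = start
    lp = logtarget(x)
    for r in 1:R
        x_star = x + step * randn()     # Gaussian random-walk proposal
        lp_star = logtarget(x_star)
        # Accept with probability min(1, exp(lp_star - lp)),
        # checked on the log scale
        if log(rand()) < lp_star - lp
            x = x_star
            lp = lp_star
        end
        out[r] = x                      # store the current state either way
    end
    out
end

log_std_normal(x) = -0.5 * x^2          # standard normal, up to a constant
chain = metropolis_1d(log_std_normal, 20000, 0.0, 1.0)
```

Caching the current log-density in `lp` means the target is evaluated once per iteration instead of twice.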
Performance - Milliseconds in log-scale
[Figure: execution times in milliseconds, log scale.]
Global performance
Julia now really shines!
1 For this particular problem Julia is ≈ 20 times faster than R. In fact, for
loops are used extensively and there is no way to vectorize this operation.
2 Also, Julia is ≈ 13 times faster than OpenBUGS. However, OpenBUGS does
not necessarily use our Gaussian random walk, but tries to select the “best”
way to do MCMC according to its own criteria. Therefore, a fair comparison
should take into account, at the very least, the autocorrelation of the
sampled chain.
3 Finally, Julia is ≈ 10 times faster than Stan but, as for OpenBUGS, we should
be careful about making a direct comparison.
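Since the autocorrelation of the chain is what makes raw timings hard to compare, here is a minimal sketch of how it could be measured at a given lag. The function name `autocor_lag` and the example chain are made up; only base functions are used.

```julia
# Hypothetical helper: sample autocorrelation of a chain at lag k.
function autocor_lag(x, k)
    n = length(x)
    xbar = sum(x) / n
    num = 0.0
    for t in 1:(n - k)
        num += (x[t] - xbar) * (x[t + k] - xbar)
    end
    den = 0.0
    for t in 1:n
        den += (x[t] - xbar)^2
    end
    num / den
end

x = [1.0, -1.0, 1.0, -1.0]   # made-up alternating chain
autocor_lag(x, 1)            # → -0.75
```

A sampler that is slower per iteration but produces a less autocorrelated chain may still deliver more information per second, which is why speed ratios alone should be read with care.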
Bayesian logistic final result
[Figure: results of the Bayesian logistic regression; horizontal axis from about −5 to 15.]

