Nimbus is a Ruby gem implementing the random forest algorithm for genomic selection contexts.
http://www.nimbusgem.org
Presentation from the EAAP 2011 conference in Stavanger, Norway.
4. The Problem
Massive amount of information from high-throughput genotyping platforms.
Need to extract knowledge from large, noisy, redundant, missing, and fuzzy data.
A massive amount of information consumes the attention of its recipients.
We need to allocate that attention efficiently.
7. Why Random Forest?
Using Machine Learning techniques we can extract hidden relationships that
exist in these huge volumes of data and do not follow a particular parametric
design.
Random Forest has desirable statistical properties.
Random Forest scales well computationally.
Random Forest performs extremely well in a variety of possible complex
domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).
12. The Algorithm
Ensemble methods:
- Combination of different methods (usually simple models).
- They have very good predictive ability because they exploit the additivity of the models' performances.
Based on Classification And Regression Trees (CART).
Uses randomization and bagging.
Performs feature subset selection.
Convenient for classification problems.
Fast computation.
Results are simple for humans to interpret.
Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011).
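The two randomization steps named above, bagging and feature subset selection, can be shown in a few lines of Ruby. This is an illustrative sketch only, not Nimbus code; the sample sizes and SNP counts are made-up examples.

```ruby
# Illustrative sketch of the two randomization steps in a random forest
# (not Nimbus internals; n, snp_count and mtry are made-up example sizes).

n = 100
rows = (0...n).to_a

# Bagging: draw n row indices with replacement to form a bootstrap sample.
bootstrap = Array.new(n) { rows.sample }

# Rows never drawn are "out-of-bag" and can estimate generalization error.
oob = rows - bootstrap.uniq

# Feature subset selection: at each node, consider only a random
# subset of mtry SNPs instead of all of them.
snp_count = 200
mtry = 14
candidate_snps = (0...snp_count).to_a.sample(mtry)
```

Because the bootstrap draws with replacement, roughly a third of the rows end up out-of-bag on average, which is what makes per-tree generalization error estimates possible without a separate validation set.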
13. The Algorithm
Perform a bootstrap on the data: Ψ* = (y, X).
Build a CART ( ft(y, X) = ht(x) ) using only an mtry proportion of the SNPs at each node.
Repeat M times to reduce the residual variance (by up to a factor of M for uncorrelated trees).
Average the estimates: ŷ(x) = c0 + Σt ct·ht(x), with c0 = μ and ct = 1/M.
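The steps above can be sketched as a toy regression forest in Ruby. This is a minimal illustration of the algorithm as described on the slide, not the Nimbus implementation; the class names, the MIN_NODE stopping rule, and the exhaustive threshold search are choices made for brevity.

```ruby
# Toy regression forest following the steps above (a sketch, not Nimbus):
# bootstrap the data, grow a CART on each sample using a random subset of
# mtry features at each node, repeat M times, and average the estimates.

class ToyTree
  MIN_NODE = 5  # stop splitting small nodes (arbitrary choice)

  def initialize(xs, ys, mtry)
    @mtry = mtry
    @root = build(xs, ys)
  end

  def predict(x)
    walk(@root, x)
  end

  private

  def build(xs, ys)
    return { value: mean(ys) } if ys.size <= MIN_NODE || ys.uniq.size == 1
    feat, thr = best_split(xs, ys)
    return { value: mean(ys) } if feat.nil?
    l = xs.each_index.select { |i| xs[i][feat] <= thr }
    r = xs.each_index.select { |i| xs[i][feat] >  thr }
    { feat: feat, thr: thr,
      left:  build(xs.values_at(*l), ys.values_at(*l)),
      right: build(xs.values_at(*r), ys.values_at(*r)) }
  end

  # Pick the (feature, threshold) minimizing the summed squared error,
  # searching only a random subset of mtry features (feature subset selection).
  def best_split(xs, ys)
    best = nil
    best_err = Float::INFINITY
    (0...xs.first.size).to_a.sample(@mtry).each do |f|
      xs.map { |x| x[f] }.uniq.each do |thr|
        l = ys.each_index.select { |i| xs[i][f] <= thr }
        r = ys.each_index.select { |i| xs[i][f] >  thr }
        next if l.empty? || r.empty?
        err = sse(ys.values_at(*l)) + sse(ys.values_at(*r))
        if err < best_err
          best = [f, thr]
          best_err = err
        end
      end
    end
    best
  end

  def sse(ys)
    m = mean(ys)
    ys.sum { |y| (y - m)**2 }
  end

  def mean(ys)
    ys.sum.to_f / ys.size
  end

  def walk(node, x)
    return node[:value] if node.key?(:value)
    x[node[:feat]] <= node[:thr] ? walk(node[:left], x) : walk(node[:right], x)
  end
end

class ToyForest
  def initialize(xs, ys, m_trees: 50, mtry: nil)
    mtry ||= Math.sqrt(xs.first.size).ceil
    @trees = Array.new(m_trees) do
      idx = Array.new(xs.size) { rand(xs.size) }  # bootstrap sample Psi*
      ToyTree.new(xs.values_at(*idx), ys.values_at(*idx), mtry)
    end
  end

  # Average of the M tree estimates.
  def predict(x)
    @trees.sum { |t| t.predict(x) } / @trees.size.to_f
  end
end
```

With mtry equal to the full feature count this reduces to plain bagging; Nimbus's real trees additionally handle SNP-coded genotype data and missing values, which this sketch omits.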
14. The Algorithm
Classification and regression trees:
Let Ψ = (y, X) be a set of data, with
y = vector of phenotypes (response variables)
X = (x1, x2) = matrix of features

    y    x1   x2
    y1   x11  x12
    …    …    …
    yi   xi1  xi2
    …    …    …
    yn   xn1  xn2
22. Nimbus library
Features and use cases:
Training of a prediction forest:
- Using a training set of individuals, Nimbus creates a reusable forest.
- Generalization errors are computed for every tree in the forest.
- Nimbus calculates SNP importances.
Genomic prediction of a testing sample:
- Using a newly trained forest.
- Using a custom/reused forest, specified via config.yml.
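For the config.yml mentioned above, a run typically means placing the data files and configuration together in one directory. The sketch below shows plausible fields only; the exact key names and file layout are assumptions on my part, so verify them against the Nimbus documentation for your version.

```yaml
# Hypothetical config.yml sketch -- key names are assumptions,
# check them against the Nimbus documentation before use.
input:
  training: training.data   # training set of individuals
  testing: testing.data     # sample to predict

forest:
  forest_size: 300          # number of trees (M)
  SNP_sample_size_mtry: 60  # SNPs considered at each node (mtry)
  SNP_total_count: 200      # total SNPs per individual
  node_min_size: 5          # stop splitting nodes below this size
```

With the files in place, running the `nimbus` executable in that directory would train the forest and write out predictions and SNP importances.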