An opensource library to implement random forests in genomic contexts      Oscar González-Recio     Juanjo Bazán       Sel...
The Problem
The ProblemMassive amount of information from high troughput genotyping platforms.
The ProblemMassive amount of information from high troughput genotyping platforms.Need to extract knowledge from large, no...
The ProblemMassive amount of information from high troughput genotyping platforms.Need to extract knowledge from large, no...
Why Random Forest?
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of...
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of...
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of...
Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of...
The Algorithm
The AlgorithmEnsemble methods:    - Combination of diferent methods (usually simple models).    - They have very good pred...
The AlgorithmPerform bootstrap on data: Ψ* = (y, X)Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs ...
The AlgorithmClassifcation and regression trees:Let Ψ = (y, X) be a set of data, withy = vector of phenotypes (response va...
The Algorithm
Nimbus library
Nimbus libraryWritten in Ruby            Open source programming language            Syntax focused on simplicity         ...
Nimbus libraryHow to install:         > gem install nimbusPrerequisites:        Ruby and Rubygems (default library manager...
Nimbus libraryHow to run:        > nimbusConfguration:        Via confg.yml fle
Nimbus libraryconfg.yml fle:
Nimbus libraryInput fles:                   training                    testing
Nimbus libraryFeatures, use cases:    Training of a prediction forest         Using a training set of individuals         ...
OutputsRandom Forest fle   In standard YAML format
OutputsPredictions for the training sample
OutputsPredictions for the testing sample
OutputsSNP importances
More info:Nimbus website:               www.nimbusgem.orgSource code:               www.github.com/xuanxu/nimbusReport bug...
Thank you!
Questions?
Upcoming SlideShare
Loading in …5
×

Nimbus

884 views

Published on

Nimbus is a ruby gem implementing random forest algorithm for genomic selection contexts.
http://www.nimbusgem.org

Presentation from the EAAP 2011 conference in Stavanger, Norway.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

Nimbus

  1. 1. An opensource library to implement random forests in genomic contexts Oscar González-Recio Juanjo Bazán Selma Forni
  2. 2. The Problem
  3. 3. The ProblemMassive amount of information from high troughput genotyping platforms.
  4. 4. The ProblemMassive amount of information from high troughput genotyping platforms.Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
  5. 5. The ProblemMassive amount of information from high troughput genotyping platforms.Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.Massive amount of information consumes the attention of its recipients.We need to allocate that attention efciently.
  6. 6. Why Random Forest?
  7. 7. Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of data and do not follow a particular parametricdesign.
  8. 8. Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of data and do not follow a particular parametricdesign.Random Forest have desirable statistical properties.
  9. 9. Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of data and do not follow a particular parametricdesign.Random Forest have desirable statistical properties.Random Forest scales well computationally.
  10. 10. Why Random Forest?Using Machine Learning techniques we can extract hidden relationships thatexist in these huge volumes of data and do not follow a particular parametricdesign.Random Forest have desirable statistical properties.Random Forest scales well computationally.Random Forest performs extremely well in a variety of possible complexdomains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).
  11. 11. The Algorithm
  12. 12. The AlgorithmEnsemble methods: - Combination of diferent methods (usually simple models). - They have very good predictive ability because use additivity of models performances.Based on Classifcation And Regression Trees (CART).Use Randomization and Bagging.Performs Feature Subset Selection.Convenient for classifcation problems.Fast computation.Simple interpretation of results for human minds.Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)
  13. 13. The AlgorithmPerform bootstrap on data: Ψ* = (y, X)Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs in each node.Repeat M times to reduce residuals by a factor of M.Average estimates c0 = μ ; ci = 1/M
  14. 14. The AlgorithmClassifcation and regression trees:Let Ψ = (y, X) be a set of data, withy = vector of phenotypes (response variables)X = (x1, x2) = matrix of featuresy1 x11 x12… … ...yi xi1 xi2… … ...yn xn1 xn2
  15. 15. The Algorithm
  16. 16. Nimbus library
  17. 17. Nimbus libraryWritten in Ruby Open source programming language Syntax focused on simplicity Natural to read and easy to write www.ruby-lang.org
  18. 18. Nimbus libraryHow to install: > gem install nimbusPrerequisites: Ruby and Rubygems (default library manager) installed in the system
  19. 19. Nimbus libraryHow to run: > nimbusConfguration: Via confg.yml fle
  20. 20. Nimbus libraryconfg.yml fle:
  21. 21. Nimbus libraryInput fles: training testing
  22. 22. Nimbus libraryFeatures, use cases: Training of a prediction forest Using a training set of individuals Nimbus creates a reutilizable forest Generalization error are computed for every tree in the forest Nimbus calculates SNP importances Genomic Prediction of a testing sample Using a new trained forest Using a custom/reused forest, specif ed via conf g.yml i i
  23. 23. OutputsRandom Forest fle In standard YAML format
  24. 24. OutputsPredictions for the training sample
  25. 25. OutputsPredictions for the testing sample
  26. 26. OutputsSNP importances
  27. 27. More info:Nimbus website: www.nimbusgem.orgSource code: www.github.com/xuanxu/nimbusReport bugs/request features: www.github.com/xuanxu/nimbus/issues
  28. 28. Thank you!
  29. 29. Questions?

×