SlideShare a Scribd company logo
1 of 29
Download to read offline
An opensource library to implement random forests in genomic contexts




      Oscar González-Recio     Juanjo Bazán       Selma Forni
The Problem
The Problem
Massive amount of information from high troughput genotyping platforms.
The Problem
Massive amount of information from high troughput genotyping platforms.

Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
The Problem
Massive amount of information from high troughput genotyping platforms.

Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.

Massive amount of information consumes the attention of its recipients.
We need to allocate that attention efciently.
Why Random Forest?
Why Random Forest?
Using Machine Learning techniques we can extract hidden relationships that
exist in these huge volumes of data and do not follow a particular parametric
design.
Why Random Forest?
Using Machine Learning techniques we can extract hidden relationships that
exist in these huge volumes of data and do not follow a particular parametric
design.

Random Forest have desirable statistical properties.
Why Random Forest?
Using Machine Learning techniques we can extract hidden relationships that
exist in these huge volumes of data and do not follow a particular parametric
design.

Random Forest have desirable statistical properties.


Random Forest scales well computationally.
Why Random Forest?
Using Machine Learning techniques we can extract hidden relationships that
exist in these huge volumes of data and do not follow a particular parametric
design.

Random Forest have desirable statistical properties.


Random Forest scales well computationally.


Random Forest performs extremely well in a variety of possible complex
domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).
The Algorithm
The Algorithm
Ensemble methods:
    - Combination of diferent methods (usually simple models).
    - They have very good predictive ability because use additivity of models performances.
Based on Classifcation And Regression Trees (CART).

Use Randomization and Bagging.

Performs Feature Subset Selection.

Convenient for classifcation problems.

Fast computation.

Simple interpretation of results for human minds.

Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)
The Algorithm
Perform bootstrap on data: Ψ* = (y, X)




Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs in each node.



Repeat M times to reduce residuals by a factor of M.


Average estimates c0 = μ ; ci = 1/M
The Algorithm
Classifcation and regression trees:



Let Ψ = (y, X) be a set of data, with
y = vector of phenotypes (response variables)

X = (x1, x2) = matrix of features

y1 x11 x12
… … ...
yi xi1 xi2
… … ...
yn xn1 xn2
The Algorithm
Nimbus library
Nimbus library
Written in Ruby


            Open source programming language

            Syntax focused on simplicity

            Natural to read and easy to write

            www.ruby-lang.org
Nimbus library
How to install:


         > gem install nimbus


Prerequisites:

        Ruby and Rubygems (default library manager) installed in the system
Nimbus library
How to run:


        > nimbus


Confguration:


        Via confg.yml fle
Nimbus library
confg.yml fle:
Nimbus library
Input fles:

                   training




                    testing
Nimbus library
Features, use cases:


    Training of a prediction forest
         Using a training set of individuals
         Nimbus creates a reutilizable forest
         Generalization error are computed for every tree in the forest
         Nimbus calculates SNP importances


    Genomic Prediction of a testing sample
         Using a new trained forest
         Using a custom/reused forest, specif ed via conf g.yml
                                            i           i
Outputs
Random Forest fle


   In standard YAML format
Outputs
Predictions for the training sample
Outputs
Predictions for the testing sample
Outputs
SNP importances
More info:
Nimbus website:
               www.nimbusgem.org
Source code:
               www.github.com/xuanxu/nimbus
Report bugs/request features:
               www.github.com/xuanxu/nimbus/issues
Thank you!
Questions?

More Related Content

Similar to Nimbus

32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..butest
 
Classification of chestnuts with feature selection by noise resilient classif...
Classification of chestnuts with feature selection by noise resilient classif...Classification of chestnuts with feature selection by noise resilient classif...
Classification of chestnuts with feature selection by noise resilient classif...Elena Roglia
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptManiMaran230751
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4c.titus.brown
 
Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Jameson Toole
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
Secrets of Supercomputing
Secrets of SupercomputingSecrets of Supercomputing
Secrets of SupercomputingMarcus Vannini
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson StudioSasha Lazarevic
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsShouvic Banik0139
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
13 random forest
13 random forest13 random forest
13 random forestVishal Dutt
 
what is Random-Forest-Machine-Learning.pptx
what is Random-Forest-Machine-Learning.pptxwhat is Random-Forest-Machine-Learning.pptx
what is Random-Forest-Machine-Learning.pptxAnupama Kate
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)SungminYou
 

Similar to Nimbus (20)

32_Nov07_MachineLear..
32_Nov07_MachineLear..32_Nov07_MachineLear..
32_Nov07_MachineLear..
 
Classification of chestnuts with feature selection by noise resilient classif...
Classification of chestnuts with feature selection by noise resilient classif...Classification of chestnuts with feature selection by noise resilient classif...
Classification of chestnuts with feature selection by noise resilient classif...
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.ppt
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4
 
Occurrence Prediction_NLP
Occurrence Prediction_NLPOccurrence Prediction_NLP
Occurrence Prediction_NLP
 
Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.Creating smaller, faster, production-ready mobile machine learning models.
Creating smaller, faster, production-ready mobile machine learning models.
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
presentation.ppt
presentation.pptpresentation.ppt
presentation.ppt
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
Random Forest
Random ForestRandom Forest
Random Forest
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
Secrets of Supercomputing
Secrets of SupercomputingSecrets of Supercomputing
Secrets of Supercomputing
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithms
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
13 random forest
13 random forest13 random forest
13 random forest
 
what is Random-Forest-Machine-Learning.pptx
what is Random-Forest-Machine-Learning.pptxwhat is Random-Forest-Machine-Learning.pptx
what is Random-Forest-Machine-Learning.pptx
 
2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 

More from Juanjo Bazán

How to manage an open source project
How to manage an open source projectHow to manage an open source project
How to manage an open source projectJuanjo Bazán
 
Rails para programadores Java
Rails para programadores JavaRails para programadores Java
Rails para programadores JavaJuanjo Bazán
 

More from Juanjo Bazán (6)

How to manage an open source project
How to manage an open source projectHow to manage an open source project
How to manage an open source project
 
Space software
Space softwareSpace software
Space software
 
Ruby & Ciencia
Ruby & CienciaRuby & Ciencia
Ruby & Ciencia
 
Ruby and Science
Ruby and ScienceRuby and Science
Ruby and Science
 
Contribuir a Rails
Contribuir a RailsContribuir a Rails
Contribuir a Rails
 
Rails para programadores Java
Rails para programadores JavaRails para programadores Java
Rails para programadores Java
 

Recently uploaded

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Nimbus

  • 1. An opensource library to implement random forests in genomic contexts Oscar González-Recio Juanjo Bazán Selma Forni
  • 3. The Problem Massive amount of information from high troughput genotyping platforms.
  • 4. The Problem Massive amount of information from high troughput genotyping platforms. Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.
  • 5. The Problem Massive amount of information from high troughput genotyping platforms. Need to extract knowledge from large, noisy, redundant, missing and fuzzy data. Massive amount of information consumes the attention of its recipients. We need to allocate that attention efciently.
  • 7. Why Random Forest? Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.
  • 8. Why Random Forest? Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design. Random Forest have desirable statistical properties.
  • 9. Why Random Forest? Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design. Random Forest have desirable statistical properties. Random Forest scales well computationally.
  • 10. Why Random Forest? Using Machine Learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design. Random Forest have desirable statistical properties. Random Forest scales well computationally. Random Forest performs extremely well in a variety of possible complex domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).
  • 12. The Algorithm Ensemble methods: - Combination of diferent methods (usually simple models). - They have very good predictive ability because use additivity of models performances. Based on Classifcation And Regression Trees (CART). Use Randomization and Bagging. Performs Feature Subset Selection. Convenient for classifcation problems. Fast computation. Simple interpretation of results for human minds. Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)
  • 13. The Algorithm Perform bootstrap on data: Ψ* = (y, X) Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs in each node. Repeat M times to reduce residuals by a factor of M. Average estimates c0 = μ ; ci = 1/M
  • 14. The Algorithm Classifcation and regression trees: Let Ψ = (y, X) be a set of data, with y = vector of phenotypes (response variables) X = (x1, x2) = matrix of features y1 x11 x12 … … ... yi xi1 xi2 … … ... yn xn1 xn2
  • 17. Nimbus library Written in Ruby Open source programming language Syntax focused on simplicity Natural to read and easy to write www.ruby-lang.org
  • 18. Nimbus library How to install: > gem install nimbus Prerequisites: Ruby and Rubygems (default library manager) installed in the system
  • 19. Nimbus library How to run: > nimbus Confguration: Via confg.yml fle
  • 21. Nimbus library Input fles: training testing
  • 22. Nimbus library Features, use cases: Training of a prediction forest Using a training set of individuals Nimbus creates a reutilizable forest Generalization error are computed for every tree in the forest Nimbus calculates SNP importances Genomic Prediction of a testing sample Using a new trained forest Using a custom/reused forest, specif ed via conf g.yml i i
  • 23. Outputs Random Forest fle In standard YAML format
  • 24. Outputs Predictions for the training sample
  • 25. Outputs Predictions for the testing sample
  • 27. More info: Nimbus website: www.nimbusgem.org Source code: www.github.com/xuanxu/nimbus Report bugs/request features: www.github.com/xuanxu/nimbus/issues