
Aug. 29, 2014

Data & Analytics

Random Forests are, without contest, one of the most robust, accurate and versatile tools for solving machine learning tasks. Implementing the algorithm properly and efficiently remains, however, a challenging task, with issues that are easily overlooked unless handled with care. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. Algorithmic and technical optimizations that have made this possible include:

- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines;
- A dedicated sorting procedure, taking into account the properties of data;
- Shared pre-computations whenever critical.

Overall, we believe that the lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anybody doing data analysis in Python.
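To make the role of sorting in tree induction concrete, here is a minimal pure-Python sketch of a split search on a single feature. It is not Scikit-Learn's Cython code: the function names `gini` and `best_split` and the choice of Gini impurity are assumptions made for illustration. The feature is sorted once, and class counts are then updated incrementally during a single scan, so each candidate threshold is evaluated without recounting the samples.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a node, given its per-class sample counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, impurity) minimizing the weighted Gini impurity.

    Hypothetical helper for illustration only. Returns a threshold of None
    when no split improves on the impurity of the unsplit node.
    """
    n = len(values)
    # Sort sample indices by feature value once, up front.
    order = sorted(range(n), key=values.__getitem__)
    left = Counter()          # class counts on the left of the threshold
    right = Counter(labels)   # class counts on the right of the threshold
    best_threshold, best_impurity = None, gini(right, n)
    for rank in range(n - 1):
        i, j = order[rank], order[rank + 1]
        # Move sample i from the right partition to the left one.
        left[labels[i]] += 1
        right[labels[i]] -= 1
        if values[i] == values[j]:
            continue  # no threshold can separate equal feature values
        n_left = rank + 1
        n_right = n - n_left
        impurity = (n_left * gini(left, n_left)
                    + n_right * gini(right, n_right)) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_threshold = (values[i] + values[j]) / 2.0
    return best_threshold, best_impurity


if __name__ == "__main__":
    print(best_split([2.0, 1.0, 3.0, 4.0], [0, 0, 1, 1]))
```

In a full tree builder this search would run for every candidate feature at every node, which is why the cost of sorting and the memory layout of the data matter so much for overall speed.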

Gilles Louppe

Postdoctoral Research Associate at NYU / CERN



- Accelerating Random Forests in Scikit-Learn. Gilles Louppe, Université de Liège, Belgium. August 29, 2014.
- Motivation: ... and many more applications!
- About Scikit-Learn: machine learning library for Python; classical and well-established algorithms; emphasis on code quality and usability. Myself: @glouppe; PhD student (Liège, Belgium); core developer on Scikit-Learn since 2011; chief tree hugger.
- Outline: 1. Basics; 2. Scikit-Learn implementation; 3. Python improvements.
- Machine Learning 101: Data comes as... a set of samples $\mathcal{L} = \{(x_i, y_i) \mid i = 0, \dots, N$
