CudaTree
Training Random Forests on the GPU
Alex Rubinsteyn (NYU, Mount Sinai)
@ GTC on March 25th, 2014
What’s a Random Forest?
trees = []
for i in 1 .. T:
    Xi, Yi = random_sample(X, Y)
    t = RandomizedDecisionTree()
    t.fit(Xi, Yi)
    trees.append(t)
A bagged ensemble of randomized decision trees. Learning by uncorrelated memorization.
Few free parameters; popular for "data science".
Decision Tree Training
Random Forest trees are trained like normal decision trees (CART), but use a random subset of features at each split.

bestScore, bestThresh = inf, None
for i in RandomSubset(nFeatures):
    Xi, Yi = sort(X[:, i], Y)
    for nLess, thresh in enumerate(Xi):
        score = Impurity(nLess, Yi)
        if score < bestScore:
            bestScore = score
            bestThresh = thresh
Random Forests: Easy to Parallelize?
• Each tree gets trained independently!
• Simple parallelization strategy: "each processor trains a tree"
• Works great for multi-core implementations (wiseRF, scikit-learn, &c)
• Terrible for the GPU
Random Forest GPU Training Strategies
• Tree per Thread: naive/slow
• Depth-First: data-parallel threshold selection for one tree node
• Breadth-First: learn a whole level of the tree simultaneously (one threshold search per thread block)
• Hybrid: use Depth-First training until a node has too few samples, then switch to Breadth-First
Depth-First Training
Thread block = subset of samples
Depth-First Training: Algorithm Sketch
(1) Parallel Prefix Scan: compute label count histograms
(2) Map: evaluate the impurity score for all feature thresholds
(3) Reduce: which feature/threshold pair has the lowest impurity score?
(4) Map: mark whether each sample goes left or right at the split
(5) Shuffle: keep the samples/labels on both sides of the split contiguous
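To make the five steps concrete, here is a minimal single-feature NumPy sketch of the same scan/map/reduce/shuffle pipeline, run on the CPU for readability. The names (best_split, goes_left, &c) are illustrative, not CudaTree's actual kernels, and Gini impurity is assumed as the score.

import numpy as np

def best_split(x, y, n_classes):
    """Find the best threshold for one feature column x with integer labels y."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)

    # (1) Prefix scan: label-count histograms left of every candidate split.
    one_hot = np.eye(n_classes)[ys]                 # shape (n, c)
    left = np.cumsum(one_hot, axis=0)               # counts among first i+1 samples
    right = left[-1] - left                         # remaining counts

    # (2) Map: weighted Gini impurity of every candidate threshold.
    n_left = np.arange(1, n + 1, dtype=np.float64)
    n_right = n - n_left
    gini_left = 1.0 - ((left / n_left[:, None]) ** 2).sum(axis=1)
    gini_right = 1.0 - ((right / np.maximum(n_right, 1)[:, None]) ** 2).sum(axis=1)
    score = (n_left * gini_left + n_right * gini_right) / n

    # (3) Reduce: lowest-impurity split (skip the last position, which
    # would leave the right side empty).
    i = int(np.argmin(score[:-1]))
    thresh = 0.5 * (xs[i] + xs[i + 1])

    # (4) Map: decide, per (unsorted) sample, which side of the split it takes.
    goes_left = x <= thresh

    # (5) Shuffle: a permutation that keeps each side contiguous.
    perm = np.concatenate([np.where(goes_left)[0], np.where(~goes_left)[0]])
    return thresh, float(score[i]), perm

On the GPU, each of these steps is one data-parallel pass over the node's samples, repeated for each feature in the random subset.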
Breadth-First Training
Thread block = tree node
Breadth-First Training: Algorithm Sketch
Uses the same sequence of data-parallel operations (scan label counts, map impurity evaluation, &c) as Depth-First training, but within each thread block.
Fast at the bottom of the tree (lots of small tree nodes); insufficient parallelism at the top.
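A hedged Python sketch of one breadth-first pass over a level, reusing the best_split sketch above; the plain loop stands in for "one thread block per node", Node is an illustrative container rather than CudaTree's real data structure, and it splits on feature 0 only, for brevity.

from collections import namedtuple
import numpy as np

Node = namedtuple("Node", ["x", "y"])    # samples/labels owned by one tree node

def train_level(level, n_classes, min_samples=2):
    """Split every node on the current level; return the next level."""
    next_level = []
    for node in level:                   # GPU: one thread block per tree node
        if len(node.y) < min_samples or len(np.unique(node.y)) == 1:
            continue                     # leaf: too small or already pure
        # Same scan/map/reduce/shuffle steps as depth-first, per block.
        thresh, _, perm = best_split(node.x[:, 0], node.y, n_classes)
        n_left = int((node.x[:, 0] <= thresh).sum())
        left, right = perm[:n_left], perm[n_left:]
        next_level.append(Node(node.x[left], node.y[left]))
        next_level.append(Node(node.x[right], node.y[right]))
    return next_level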
Hybrid Training
Add a tree node to the Breadth-First queue when it contains fewer samples than:
3702 + 1.58c + 0.0577n + 21.84f
★ c = number of classes
★ n = total number of samples
★ f = number of features considered at each split
(coefficients from a machine-specific regression; needs to be generalized)
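Read as code, the switch-over rule is a simple predicate. The coefficients below are the machine-specific ones from the slide; the function names are illustrative, not CudaTree's real API.

def breadth_first_cutoff(c, n, f):
    # c = number of classes, n = total number of samples,
    # f = number of features considered at each split
    # (coefficients from the machine-specific regression above)
    return 3702 + 1.58 * c + 0.0577 * n + 21.84 * f

def use_breadth_first(node_n_samples, c, n, f):
    # True  -> defer this node to the Breadth-First queue;
    # False -> keep splitting it Depth-First with the whole GPU.
    return node_n_samples < breadth_first_cutoff(c, n, f)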
GPU Algorithms vs. Number of Features
• Randomly generated synthetic data
• Performance relative to sklearn 0.14
Benchmark Data
Dataset     Samples   Features   Classes
ImageNet    10k       4k         10
CIFAR100    50k       3k         100
covertype   581k      57         7
poker       1M        11         10
PAMAP2      2.87M     52         10
intrusion   5M        41         24
Benchmark Results
Dataset     wiseRF    sklearn 0.15   CudaTree (Titan)   CudaTree + wiseRF
ImageNet    23s       13s            27s                25s
CIFAR100    160s      180s           197s               94s
covertype   107s      73s            67s                52s
poker       117s      98s            59s                58s
PAMAP2      1,066s    241s           934s               757s
intrusion   667s      1,682s         199s               153s
6-core Xeon E5-2630, 24GB RAM, GTX Titan; n_trees = 100, features_per_split = sqrt(n)
Thanks!
• Installing: pip install cudatree
• Source: https://github.com/EasonLiao/CudaTree
• Credit: Yisheng Liao did most of the hard work; Russell Power & Jinyang Li were the sanity check brigade.
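A minimal usage sketch in the scikit-learn style the project follows; the constructor and method names are from my reading of the README, so verify the exact signatures against the repo above.

import numpy as np
from cudatree import RandomForestClassifier

# Toy data: dense NumPy arrays with integer class labels.
x = np.random.rand(10000, 57).astype(np.float32)
y = np.random.randint(0, 7, size=10000)

forest = RandomForestClassifier(n_estimators=100)  # n_trees = 100, as in the benchmarks
forest.fit(x, y)
print((forest.predict(x) == y).mean())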
