0
Online Random Forest in
10 Minutes
Traditional Supervised Learning
Algorithms
●
●
●
●
●

Regression
Random Forest
Support Vector Machines
Classification and ...
Inputs
● Data Matrix (Regression)
Predictand

Predictor 1

Predictor 2

Predictor 3

Predictor 4

.56

Red

.456

Male

.5...
Inputs
● Data Matrix (Binary Classification)
Predictand

Predictor 1

Predictor 2

Predictor 3

Predictor 4

Yes

Red

.45...
Inputs To Streaming Classification
● Observations now have an explicit arrival
order.
Predictand

Predictor 1

Predictor 2...
Inputs To Streaming Classification
● New Observations can arrive at any time
Predictand

Predictor 1

Predictor 2

Predict...
Problems
● Do the important predictors change over
time and when does this change occur?
● How far back is data relevant t...
Enter Online Random Forest
● Input is a single new observation
● Trees learn incrementally on this new data
● Trees are dr...
Visualization of a single tree
Accuracy on test cases: 75%

5, 6

0, 70

Pure data stop
splitting
Visualization of a single tree
Accuracy on test cases: 55%

0, 70

2, 25

20,3

50 new observations have
come and we creat...
Tree gets pruned
Accuracy on test cases: 55% …
compare to Random variable and
incorporate the age of the tree.
Accuracy is...
New Tree
It’s a stump that hasn’t yet split
any data. If asked for a
classification request it will vote
the prior probabi...
Online Random Forest
● By dropping trees that predict poorly we can
adapt to change in important predictors
● If previous ...
Online Random Forest
● This process of incremental learning and
dropping is constantly occurring so we can
constantly adap...
Example Stream
Changing Feature Importance
Online random forests in 10 minutes
Online random forests in 10 minutes
Online random forests in 10 minutes
Online random forests in 10 minutes
Upcoming SlideShare
Loading in...5
×

Online random forests in 10 minutes

1,013

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,013
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Online random forests in 10 minutes"

  1. 1. Online Random Forest in 10 Minutes
  2. 2. Traditional Supervised Learning Algorithms ● ● ● ● ● Regression Random Forest Support Vector Machines Classification and Regression Tree (CART) etc
  3. 3. Inputs ● Data Matrix (Regression) Predictand Predictor 1 Predictor 2 Predictor 3 Predictor 4 .56 Red .456 Male .589 .78 Green .654 Female .6654 .987 Blue .678 Female .789 .123 Blue .999 Male .543
  4. 4. Inputs ● Data Matrix (Binary Classification) Predictand Predictor 1 Predictor 2 Predictor 3 Predictor 4 Yes Red .456 Male .589 No Green .654 Female .6654 Yes Blue .678 Female .789 No Blue .999 Male .543
  5. 5. Inputs To Streaming Classification ● Observations now have an explicit arrival order. Predictand Predictor 1 Predictor 2 Predictor 3 Predictor 4 Time Yes Red .456 Male .589 Jan 1st 2011 No Green .654 Female .6654 Feb 4th 2012 Yes Blue .678 Female .789 Feb 5th 2013 No Blue .999 Male .543 July 4th
  6. 6. Inputs To Streaming Classification ● New Observations can arrive at any time Predictand Predictor 1 Predictor 2 Predictor 3 Predictor 4 Time Yes Red .456 Male .589 Jan 1st 2011 No Green .654 Female .6654 Feb 4th 2012 Yes Blue .678 Female .789 Feb 5th 2013 No Blue .999 Male .543 July 4th 2013 Yes Red .456 Male .456 NOW
  7. 7. Problems ● Do the important predictors change over time and when does this change occur? ● How far back is data relevant to today’s problem? ● What happens when our predictors change again in the future? ● What if this is all happening rapidly… will it scale?
  8. 8. Enter Online Random Forest ● Input is a single new observation ● Trees learn incrementally on this new data ● Trees are dropped from the forest based on performance and replaced a new “ungrown” tree
  9. 9. Visualization of a single tree Accuracy on test cases: 75% 5, 6 0, 70 Pure data stop splitting
  10. 10. Visualization of a single tree Accuracy on test cases: 55% 0, 70 2, 25 20,3 50 new observations have come and we create another split off the parent node’s left branch
  11. 11. Tree gets pruned Accuracy on test cases: 55% … compare to Random variable and incorporate the age of the tree. Accuracy is TOO BAD. Prune the tree 0, 70 2, 25 20,3
  12. 12. New Tree It’s a stump that hasn’t yet split any data. If asked for a classification request it will vote the prior probability calculated from the last 100 observations that the old pruned tree saw
  13. 13. Online Random Forest ● By dropping trees that predict poorly we can adapt to change in important predictors ● If previous data is relevant to today’s problem, tree’s learned from it in the past. If it no longer becomes relevant it will be reflected in the accuracy and the tree will get prune
  14. 14. Online Random Forest ● This process of incremental learning and dropping is constantly occurring so we can constantly adapt to a changing signal ● We built our Online Random Forest with scala’s actor framework ● We distribute our tree’s computations (and physical location) therefore we can handle high input data streams
  15. 15. Example Stream
  16. 16. Changing Feature Importance
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×