Auro Tripathy
auro@shatterline.com

*Random Forests are registered trademarks of Leo Breiman and Adele Cutler
   Attributions, code and dataset location (1
    minute)
   Overview of the scheme (2 minutes)
   Refresher on Random Forest and R
    Support (2 minutes)
   Results and continuing work (1 minute)
   Q&A (1 minute and later)
ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5651638
   R code available here; my contribution
       http://www.shatterline.com/SkinDetection.html
   Data set available here
     http://www.feeval.org/Data-sets/Skin_Colors.html
     Permission to use may be required
    All training sets are organized as a two-movie
     sequence
    1. A movie sequence of frames in color
    2. A corresponding sequence of frames in binary
       black-and-white, the ground truth
    Extract individual frames in JPEG format
     using ffmpeg, a transcoding tool

ffmpeg -i 14.avi -f image2 -ss 1.000 -vframes 1 14_500offset10s.jpeg

ffmpeg -i 14_gt_500frames.avi -f image2 -ss 1.000 -vframes 1 14_gt_500frames_offset10s.jpeg
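
The two commands above differ only in their input and output names, so they can be generated from R. This is a minimal sketch, assuming ffmpeg is installed and on the PATH; the helper name ffmpeg_frame_args is hypothetical, and the flags are exactly those shown above.

```r
# Hypothetical helper: build the ffmpeg arguments used above to grab
# one frame at a given offset (in seconds) from a movie.
ffmpeg_frame_args <- function(input, output, offset_sec = 1) {
  c("-i", input,
    "-f", "image2",
    "-ss", sprintf("%.3f", offset_sec),
    "-vframes", "1",
    output)
}

# Arguments for the color frame and for its ground-truth frame
color_args <- ffmpeg_frame_args("14.avi", "14_500offset10s.jpeg")
truth_args <- ffmpeg_frame_args("14_gt_500frames.avi",
                                "14_gt_500frames_offset10s.jpeg")

# To actually run the extraction (requires ffmpeg):
# system2("ffmpeg", color_args)
# system2("ffmpeg", truth_args)
```

Looping the same helper over several offsets would give several image pairs from one movie pair.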
Figure: image (left) and its ground truth (right)

The original authors used 8991 such image pairs, each an image together
with its manually annotated pixel-level ground truth.
   Skin-color classification/segmentation
   Uses Improved Hue, Luminance, Saturation
    (IHLS) color-space
   RGB values transformed to IHLS
   IHLS values used as feature-vectors
   Original authors also experimented with
       Bayesian network,
       Multilayer Perceptron,
       SVM,
       AdaBoost (Adaptive Boosting),
       Naive Bayes,
       RBF network

“Random Forest shows the best performance in terms of accuracy,
precision and recall”
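
The three measures named in the quote can be computed from per-pixel predictions and the ground truth. This is a minimal sketch in base R; the function name skin_metrics and the toy labels are made up for illustration and are not results from the paper or from this talk.

```r
# Accuracy, precision and recall for a binary skin / non-skin labelling.
# 'truth' and 'pred' are logical vectors (TRUE = skin), one entry per pixel.
skin_metrics <- function(truth, pred) {
  tp <- sum(pred & truth)     # skin pixels labelled skin
  fp <- sum(pred & !truth)    # non-skin pixels labelled skin
  fn <- sum(!pred & truth)    # skin pixels that were missed
  tn <- sum(!pred & !truth)   # non-skin pixels labelled non-skin
  c(accuracy  = (tp + tn) / length(truth),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

# Toy example with made-up labels for eight pixels
truth <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE)
pred  <- c(TRUE, TRUE, FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE)
m <- skin_metrics(truth, pred)
```

Here two skin pixels are found, one is missed and one non-skin pixel is mislabelled, so accuracy is 6/8 while precision and recall are both 2/3.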
The most important property of this [IHLS] space is a “well-behaved”
saturation coordinate which, in contrast to commonly used ones, always
has a small numerical value for near-achromatic colours, and is
completely independent of the brightness function
             A 3D-polar Coordinate Colour Representation Suitable for
             Image Analysis, Allan Hanbury and Jean Serra

MATLAB routines implementing the RGB-to-IHLS and IHLS-to-RGB are
available at http://www.prip.tuwien.ac.at/~hanbury.

R routines implementing the RGB-to-IHLS and IHLS-to-RGB are
available at http://www.shatterline.com/SkinDetection.html
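
To make the saturation property concrete, here is a rough sketch of an RGB-to-IHLS conversion in base R. It is not the code from either site above. It assumes the formulation as I understand it from the Hanbury and Serra paper: saturation is max minus min of the channels, hue comes from an arccosine, and luminance is a weighted sum. The Rec. 709 luminance weights used here are an assumption and should be checked against the paper; the function name rgb_to_ihls is hypothetical.

```r
# Sketch of an RGB-to-IHLS conversion; r, g, b are numeric vectors in [0, 1].
rgb_to_ihls <- function(r, g, b) {
  # ASSUMPTION: Rec. 709 luminance weights
  lum <- 0.2126 * r + 0.7152 * g + 0.0722 * b

  # Saturation: exactly 0 for every grey level, whatever its brightness
  sat <- pmax(r, g, b) - pmin(r, g, b)

  # Equal to sqrt(r^2 + g^2 + b^2 - r*g - r*b - g*b), written so that
  # it is exactly 0 when r == g == b
  chroma <- sqrt(((r - g)^2 + (r - b)^2 + (g - b)^2) / 2)
  grey <- chroma == 0

  cos_h <- (r - g / 2 - b / 2) / ifelse(grey, 1, chroma)
  cos_h <- pmin(pmax(cos_h, -1), 1)       # guard against rounding
  hue <- acos(cos_h) * 180 / pi           # degrees
  hue <- ifelse(b > g, 360 - hue, hue)
  hue[grey] <- 0                          # hue is undefined for greys; 0 by convention here

  cbind(H = hue, L = lum, S = sat)
}

# Pure red, green, blue, then a dark grey and a light grey
px <- rgb_to_ihls(r = c(1, 0, 0, 0.2, 0.8),
                  g = c(0, 1, 0, 0.2, 0.8),
                  b = c(0, 0, 1, 0.2, 0.8))
```

Both greys get saturation 0 even though their luminance differs, which is the behaviour the quote describes for achromatic colours.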
   Package ‘ReadImages’
       This package provides functions for reading
        JPEG and PNG files
   Package ‘randomForest’
       Breiman and Cutler’s Classification and
        regression based on a forest of trees using
        random inputs.
   Package ‘foreach’
     Support for the foreach looping construct
     Stretch goal to use %dopar%
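library(foreach)
library(randomForest)

set.seed(371)
# Grow a small forest per training frame; randomForest's combine()
# merges the per-frame forests into one
skin.rf <- foreach(i = seq_len(nrow(training.frames.list)),
                   .combine = combine, .packages = 'randomForest') %do%
{
    # Read the image
    # Transform from RGB to IHLS
    # Read the corresponding ground-truth image
    # Data is ready (table.data, table.truth); now apply random forest,
    # not using the formula interface
    randomForest(table.data, y = table.truth, mtry = 2, importance = FALSE,
                 proximity = FALSE, ntree = 10, do.trace = 100)
}

# Classify the pixels of the test data with the combined forest
table.pred.truth <- predict(skin.rf, test.table.data)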
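
The commented steps in the loop end with a per-pixel table of features and a matching vector of labels. This is a minimal sketch of that flattening step in base R. It assumes the image is a height x width x 3 numeric array and the ground truth is a height x width matrix that is non-zero on skin; the function name pixels_to_table and the column names are hypothetical, and the layout returned by the image-reading package is not shown in the slides.

```r
# Flatten an image into one row per pixel.
# 'img'  : height x width x 3 numeric array (channels in the third dimension)
# 'mask' : height x width matrix, non-zero where the ground truth marks skin
pixels_to_table <- function(img, mask) {
  n <- dim(img)[1] * dim(img)[2]
  # Arrays are stored channel after channel, so each channel becomes a column
  features <- matrix(img, nrow = n, ncol = 3,
                     dimnames = list(NULL, c("c1", "c2", "c3")))
  truth <- factor(as.vector(mask) > 0, levels = c(FALSE, TRUE),
                  labels = c("nonskin", "skin"))
  list(data = as.data.frame(features), truth = truth)
}

# Toy 2 x 2 image with made-up values
img  <- array(seq(0, 1, length.out = 2 * 2 * 3), dim = c(2, 2, 3))
mask <- matrix(c(0, 255, 0, 255), nrow = 2)
tab  <- pixels_to_table(img, mask)
```

In the loop above, tab$data and tab$truth would play the roles of table.data and table.truth.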
   Have lots of decision-tree learners
   Each learner’s training set is sampled
    independently – with replacement
   Add more randomness – at each node of
    the tree, the splitting attribute is selected
    from a randomly chosen sample of
    attributes
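
The two sources of randomness listed above can be sketched in a few lines of base R. The sizes and the seed are arbitrary illustration values; mtry = 2 matches the setting used in the training code, and the feature names assume the three IHLS coordinates.

```r
set.seed(1)   # arbitrary seed, only to make the sketch repeatable

n_rows   <- 10                  # training pixels (toy size)
features <- c("H", "L", "S")    # the three IHLS coordinates
mtry     <- 2                   # attributes considered at each node

# 1. Each tree gets its own bootstrap sample: n_rows draws WITH
#    replacement, so some rows repeat and others are left out.
boot_rows <- sample(n_rows, size = n_rows, replace = TRUE)

# 2. At each node, only a random subset of attributes competes
#    for the split (drawn without replacement).
node_features <- sample(features, size = mtry)
```

The randomForest package does both steps internally; the sketch only shows what is being sampled.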
Each decision tree votes for a classification

Forest chooses the classification with the most votes
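
The voting step can be sketched as a majority count in base R. The votes below are made up for illustration, and the function name majority_vote is hypothetical.

```r
# Each row is one pixel, each column the class predicted by one tree
# (made-up votes for illustration).
votes <- rbind(
  c("skin",    "skin",    "nonskin"),
  c("nonskin", "nonskin", "nonskin"),
  c("skin",    "nonskin", "skin")
)

# The forest's answer for a pixel is the class with the most votes.
majority_vote <- function(v) names(which.max(table(v)))
forest_pred <- apply(votes, 1, majority_vote)
```

With two classes, an odd number of trees avoids ties; on a tie, which.max would simply return the first class in alphabetical order.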
   Quick training phase
   Trees can grow in parallel
   Trees have attractive computing
    properties
   For example…
       Computation cost of building a binary tree is
        low: O(N log N)
       Cost of using a tree is even lower: O(log N)
       N is the number of data points
       These bounds apply to balanced binary trees;
        decision trees are often not balanced
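
As a rough worked example of the lookup cost, the snippet below compares the depth of a balanced binary tree with that of a fully unbalanced one. The value of N is arbitrary, and the sketch assumes one data point per leaf.

```r
n <- 1e6                             # number of data points (example value)

# Balanced tree: about log2(N) comparisons to reach a leaf
balanced_depth <- ceiling(log2(n))

# Fully unbalanced tree (a chain): up to N - 1 comparisons
worst_depth <- n - 1
```

For a million points that is 20 comparisons in the balanced case against 999999 in the worst case, which is why the caveat about unbalanced trees matters.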
ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5651638
My results? OK, but incomplete due to a very small training set.
Need a parallel computing cluster.

A Random Forest Approach To Skin Detection With R
