
A Random Forest using a Multi-valued Decision Diagram on an FPGA

Presentation slides from ISMVL (Int'l Symp. on Multiple-Valued Logic), May 22nd, 2017, Novi Sad, Serbia. The talk presents a machine learning realization that achieves high performance and low power.



  1. 1. A Random Forest using a Multi-valued Decision Diagram on an FPGA 1Hiroki Nakahara, 1Akira Jinguji, 1Shimpei Sato, 2Tsutomu Sasao (1Tokyo Institute of Technology, JP; 2Meiji University, JP) May 22nd, 2017 @ISMVL2017
  2. 2. Outline • Background • Random forest (RF) • Multi-valued decision diagram (MDD) • RF using MDDs • Experimental results • Conclusion 2
  3. 3. Machine Learning • Requires much computation power and big data • (Left): “Single-Threaded Integer Performance,” 2016 (Right): Nakahara, “Trend of Search Engine on modern Internet,” 2014 3
  4. 4. Machine Learning Algorithms M. Warrick, “How to get started with machine learning,” PyCon2014 4
  5. 5. Introduction • Random Forest (RF) • Ensemble learning method • Consists of multiple decision trees (DTs) • Applications: Segmentation, human pose detection • It is based on binary DTs (BDTs) • A node is evaluated by an if-then-else statement • The same variable may appear several times • Multiple-valued decision diagram (MDD) • Each variable appears only once on a path 5
  6. 6. Introduction (Contʼd) • Target platform • CPU: Too slow • GPU: Not suitable for the RF → slow, and consumes much power • FPGA: Faster, low power, long TAT • High-level synthesis (HLS) for the RF using MDDs on an FPGA • Low power, high performance, short design time 6
  7. 7. Random Forest 7
  8. 8. Classification by a Binary Decision Tree (BDT) • Partition of the feature map [Figure: a 2-D feature map over X1 and X2, partitioned at X1 ∈ {0.09, 0.63, 0.71} and X2 ∈ {0.29, 0.53}, and the corresponding BDT whose nodes test X2<0.53, X2<0.29, X1<0.09, X1<0.63, X1<0.71 and whose leaves are the classes C1/C2] 8
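To make the node evaluation concrete, below is a minimal Python sketch of classifying a 2-D point with a small BDT built from the thresholds on this slide; the exact branch layout is an illustrative assumption, since the tree shape is only partly recoverable from the figure.

```python
def classify_bdt(x1, x2):
    """Walk a small binary decision tree: every non-terminal node is a single
    if-then-else test on one variable (thresholds taken from the slide;
    the branch layout is illustrative)."""
    if x2 < 0.53:
        if x2 < 0.29:              # the same variable X2 is tested again on this path
            return "C1"
        return "C2" if x1 < 0.63 else "C1"
    if x1 < 0.09:
        return "C2"
    return "C1" if x1 < 0.71 else "C2"

print(classify_bdt(0.40, 0.20))    # -> C1
```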
  9. 9. Training of a BDT • It is built from randomized samples • Recursively partition the dataset to maximize the information gain (entropy reduction) → The same variable may appear several times on a path [Figure: the same feature-map partition and BDT as on the previous slide] 9
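Since the tool flow later in the talk trains the forest with scikit-learn, a single tree can be trained the same way. This is only a sketch of the idea (a bootstrap sample plus an entropy-based splitting criterion), not the authors' exact training code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Randomized (bootstrap) sample of the training set, as used to build each tree.
rng = np.random.default_rng(0)
idx = rng.integers(0, len(X), size=len(X))

# Recursive partitioning driven by information gain (criterion="entropy").
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
tree.fit(X[idx], y[idx])
print(tree.get_depth(), tree.tree_.node_count)
```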
  10. 10. Random Forest (RF) • Ensemble learning • Classification and regression • Consists of multiple BDTs [Figure: a Random Forest feeding the input to Tree 1 ... Tree n, whose class outputs (e.g., C1, C2, C1) go to a voter that emits the final class C1; Tree 1 is shown as a BDT with tests such as X1<0.53, X3<0.71, X2<0.63, X3<0.72] 10
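For the forest itself, the same library provides the ensemble and the majority vote. A minimal sketch, assuming the Iris dataset (also one of the benchmarks later in the talk):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators trees are trained on bootstrap samples; predict() takes a majority vote.
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```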
  11. 11. Applications • Key point matching [Lepetit et al., 2006] • Object detector [Shotton et al., 2008][Gall et al., 2011] • Hand written character recognition [Amit&Geman, 1997] • Visual word clustering [Moosmann et al.,2006] • Pose recognition [Yamashita et al., 2010] • Human detector [Mitsui et al., 2011] [Dahang et al., 2012] • Human pose estimation [Shotton 2011] 11
  12. 12. Known Problem • Build BDTs from randomized samples • The same variable may appear several times on a path • Tends to be slow, even on GPUs [Figure: a BDT in which X2 is tested three times on one path (X2<0.53, X2<0.29, X2<0.09), with per-node pseudocode: if X2 < 0.09 then output C1; else goto Child_node] 12
  13. 13. Multi-valued Decision Diagram 13
  14. 14. Binary Decision Diagram (BDD) • Recursively apply the Shannon expansion to a given logic function • Non-terminal node: if-then-else statement • Terminal node: sets the functional value [Figure: a BDD over x1–x6 with non-terminal nodes and the terminal nodes 0/1] 14
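For reference, the Shannon expansion applied at a node labeled x_i is the standard identity below (written out here for clarity; it does not appear explicitly on the slide):

```latex
% Shannon expansion: a non-terminal node on x_i is an if-then-else that
% selects between the two cofactors of f.
f(x_1,\dots,x_n) = \overline{x_i}\, f|_{x_i=0} + x_i\, f|_{x_i=1}
```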
  15. 15. Measurements of a BDD • Memory size: (# of nodes) × (size of a node) • Worst-case performance: LPL (Longest Path Length) → dedicated fully pipelined hardware [Figure: the same BDD over x1–x6] 15
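A small Python sketch of the two measures, assuming a toy BDD encoded as a successor map (the encoding and the 8-byte node size are illustrative assumptions):

```python
from functools import lru_cache

# Toy BDD: each non-terminal maps to its (low, high) successors; 0/1 are terminals.
succ = {"x1": ("x2", "x3"), "x2": ("x3", 1), "x3": (0, 1)}

@lru_cache(maxsize=None)
def lpl(node):
    """Longest Path Length from `node` to a terminal = worst-case # of node evaluations."""
    if node not in succ:                  # terminal node
        return 0
    return 1 + max(lpl(child) for child in succ[node])

memory_size = len(succ) * 8               # (# of non-terminal nodes) x (assumed bytes per node)
print(lpl("x1"), memory_size)             # -> 3 24
```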
  16. 16. Multi-Valued Decision Diagram (MDD) • MDD(k): 2^k outgoing edges • Evaluates k variables at a time [Figure: a BDD over x1–x6 next to the corresponding MDD(2) over X1 = {x1,x2}, X2 = {x3,x4}, X3 = {x5,x6}] 16
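What "evaluates k variables at a time" means operationally, as a minimal Python sketch: each node packs k binary inputs into one 2^k-valued index into its edge table. The node/edge encoding below is an assumption for illustration, not the paper's data structure.

```python
K = 2  # MDD(2): every non-terminal node has 2**K = 4 outgoing edges

# node id -> list of 2**K successors; anything not in `nodes` (here "C1"/"C2") is a terminal.
nodes = {
    "X1": ["X2", "X2", "C1", "X2"],   # consumes the pair (x1, x2)
    "X2": ["C1", "C2", "C2", "C1"],   # consumes the pair (x3, x4)
}

def evaluate_mdd(bits, root="X1"):
    node, i = root, 0
    while node in nodes:                             # stop at a terminal
        group = bits[i:i + K]                        # k variables read together
        index = int("".join(map(str, group)), 2)     # k bits -> one 2**k-valued value
        node = nodes[node][index]
        i += K
    return node

print(evaluate_mdd([0, 1, 1, 0]))   # each variable group is visited at most once -> C2
```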
  17. 17. Comparison of the BDT with the MDD [Figure: the BDT with tests X2<0.53, X2<0.29, X1<0.09, X1<0.63, X1<0.71 next to the equivalent MDD, whose nodes on X2 and X1 branch on the intervals <0.29, <0.53, <0.63, <0.71, <1.00 and lead to the terminals C1/C2] 17
  18. 18. # of Nodes [Figure: the same feature-map partition over X1 and X2 drawn twice, once for the BDT and once for the MDD, to compare node counts] 18
  19. 19. Complexities of the BDT and the MDD
              # Nodes       LPL
     BDT      O(Σ|Xi|)      O(Σ|Xi|)
     MDD      O(|Xi|^k)     O(n)
     The RF prefers shallow decision trees to avoid overfitting 19
  20. 20. Random Forest using MDDs on an FPGA 20
  21. 21. FPGA (Field Programmable Gate Array) • Reconfigurable architecture • Look-up Table (LUT) • Configurable channel • Advantages • Faster than CPU • Dissipates less power than GPU • Shorter design time than ASIC 21
  22. 22. Fully Pipelined Circuit [Figure: the input X is fed to Tree 1, Tree 2, ..., Tree b in parallel; their class outputs (e.g., C1, C2, C1) enter a voter that outputs the final class C1] 22
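In software terms, the circuit corresponds to evaluating all trees on the same input and taking a majority vote. A minimal sketch (the trees here are placeholder callables, not the generated hardware):

```python
from collections import Counter

def forest_predict(x, trees):
    """Evaluate every tree on the same input (done in parallel pipelines in hardware)
    and return the majority-voted class."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Placeholder trees standing in for the pipelined tree circuits.
trees = [lambda x: "C1", lambda x: "C2", lambda x: "C1"]
print(forest_predict([0.4, 0.2], trees))   # -> C1
```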
  23. 23. MUX-based Realization 23
  24. 24. System Design Tool 1. Behavior design + pragmas 2. Profile analysis 3. IP core generation by HLS 4. Bitstream generation by the FPGA CAD tool 5. Middleware generation ↓ Automatically done 24
  25. 25. Proposed Tool Flow [Flow: Training Dataset → scikit-learn (hyperparameters by grid search) → Random Forest → RF2AOC → Host Code and Kernel Code; the Kernel Code is compiled by aoc (Intel SDK for OpenCL) into an aocx binary for the FPGA Board, and the Host Code is compiled by gcc for the Host PC] 25
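The "hyperparameter by grid search" step of the flow can be sketched with scikit-learn as below; the parameter grid is an illustrative assumption, and the hand-off to RF2AOC (the authors' converter to OpenCL host/kernel code) is only indicated in a comment because its interface is not described in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Grid search over RF hyperparameters (the grid itself is an illustrative choice).
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [3, 5, 7]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
best_rf = search.best_estimator_

# The trained forest would then be handed to RF2AOC, which emits the OpenCL
# host code (compiled with gcc) and kernel code (compiled with aoc into an
# .aocx binary for the FPGA board).
print(search.best_params_)
```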
  26. 26. Experimental Results 26
  27. 27. Comparison of the MDD-based with the BDT-based realization
     Name | BDT Path len. (Perform.) | BDT #Nodes (Mem.) | Max. Path | MDD Path len. (Perform.) | MDD #Nodes (Mem.)
     Dermatology | 720 | 676 | 15 | 322 | 118336
     Contraceptive Method | 600 | 1055 | 9 | 198 | 7360
     Glass Identification | 952 | 1260 | 10 | 268 | 17204
     Hayes-Roth | 480 | 577 | 5 | 73 | 448
     Hepatitis | 720 | 1040 | 15 | 357 | 145664
     Ionosphere | 1196 | 1077 | 20 | 381 | 671744
     Iris | 1056 | 777 | 4 | 199 | 517
     Dataset: UCI Machine Learning Repository http://archive.ics.uci.edu/ml/datasets.html 27
  28. 28. Comparison of Platforms • Implemented the RF on the following devices • CPU: Intel Core i7 650 • GPU: NVIDIA GeForce GTX Titan • FPGA: Terasic DE5-NET • Measured dynamic power including the host PC • Test bench: 10,000 random vectors • Execution time includes the communication time between the host PC and the devices [Photos: the GPU and FPGA boards] 28
  29. 29. Comparison of Platforms
     Name | GPU@86W (GeForce Titan) LPS | LPS/W | CPU@13W (Xeon(R) E5607) LPS | LPS/W | FPGA@15W (Stratix V A7) LPS | LPS/W
     Dermatology | 336.2 | 3.9 | 211.6 | 16.3 | 3221.2 | 214.7
     Contraceptive Method | 521.9 | 6.1 | 286.4 | 22.0 | 10924.3 | 728.3
     Glass Identification | 726.7 | 8.5 | 587.5 | 45.2 | 6442.3 | 429.5
     Hayes-Roth | 1512.9 | 17.6 | 1165.5 | 89.7 | 12884.6 | 859.0
     Hepatitis | 739.1 | 8.6 | 662.7 | 51.0 | 8209.9 | 547.3
     Ionosphere | 821.0 | 9.5 | 595.9 | 45.8 | 9663.5 | 644.2
     Iris | 446.6 | 5.2 | 436.7 | 33.6 | 4831.7 | 322.1
     LPS: #Looks Per Second; LPS/W: LPS per watt 29
  30. 30. Conclusion • Proposed the RF using MDDs • Reduced the path length • Increased the column multiplicity • # of nodes: O(|X|^k) • A shallow decision diagram is recommended to avoid overfitting • Developed a high-level synthesis design flow toward the FPGA realization • 10.7x faster than the GPU • 14.0x faster than the CPU 30
