HARDWARE ACCELERATION OF SVM TRAINING
FOR REAL-TIME EMBEDDED SYSTEMS: AN
OVERVIEW
Ilham Amezzane
Ibn Tofail University
March 26th, 2018
Outline
• Background
• Accelerating SVM Training with:
  • GPU
  • FPGA
• GPU vs FPGA Performance Comparison
• Conclusion
Real-time Embedded Applications
• Smartphone-based applications:
  • Healthcare
  • Smart homes
  • WSNs
• Challenges:
  • Large datasets
  • Need to accelerate processing speed
  • Limited resources
Support Vector Machines (SVM)
• Instance-based:
  • Optimal hyperplane for linearly separable patterns.
• Strengths:
  • Can apply linear classification techniques to non-linear data using the kernel trick.
  • High accuracy
• Weaknesses:
  • Memory-intensive
  • Hard to interpret
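As a minimal sketch of the kernel trick mentioned above: the decision function f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b stays linear in the kernel values even when the data are not linearly separable in the input space. The support vectors, labels, and multipliers below are hypothetical, hand-picked values for an XOR-like pattern and do not come from a real training run; the function names are illustrative only.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, labels, alphas, bias=0.0, gamma=0.5):
    """Kernelized decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return bias + sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )

def predict(x, support_vectors, labels, alphas, bias=0.0, gamma=0.5):
    """The predicted class is the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, bias, gamma)
    return 1 if value >= 0 else -1

# Hypothetical, hand-picked model (assumption, not a trained result).
# The XOR pattern below cannot be separated by a straight line in 2-D.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
LABELS = [+1, +1, -1, -1]
ALPHAS = [1.0, 1.0, 1.0, 1.0]

if __name__ == "__main__":
    for point in SUPPORT_VECTORS:
        print(point, predict(point, SUPPORT_VECTORS, LABELS, ALPHAS))
```

Every prediction needs one kernel evaluation per stored support vector, which is one way to read the "memory-intensive" weakness listed above.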
SVM Training Algorithm: Limitations
• Quadratic Programming (QP):
  • Problem size grows with the number of training samples N: O(N²) complexity.
• Several decomposition methods:
  • e.g. Sequential Minimal Optimization (SMO)
• Standard CPU version (LIBSVM):
  • SMO-based
• For real-time applications, training can be:
  • very time-consuming
  • computationally intensive
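To make the O(N²) growth concrete, the sketch below estimates the memory a dense N × N kernel matrix would occupy. It assumes 8-byte double-precision entries and no caching or sparsity; the function name is illustrative.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory needed to hold a dense N x N kernel matrix (assumes 8-byte doubles)."""
    return n_samples * n_samples * bytes_per_entry

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        gib = kernel_matrix_bytes(n) / 2**30
        print(f"N = {n:>7}: {gib:10.2f} GiB")
```

A tenfold increase in N multiplies the storage by one hundred, which is why decomposition methods such as SMO work on small subsets of the problem instead of the full matrix.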
Graphic Processing Unit (GPU)
• Compute-intensive
• Highly parallel computation
• More transistors devoted to data processing than to caching and flow control
https://www.carestream.com/blog/wp-content/uploads/2015/09/CSH_CPU-GPU_Illustration.png
GPU Programming Frameworks
• CUDA:
  • NVIDIA
• OpenCL:
  • AMD (CPUs, GPUs)
  • Intel (CPUs, GPUs)
  • NVIDIA (GPUs)
  • Qualcomm (embedded/mobile CPUs)
  • Altera (FPGAs)
OpenCL allows heterogeneous computation in one system.
Research Works with GPU
(2008, 2010) Works based on a modified SMO algorithm from the standard LIBSVM:
• Dataset-dependent speedups
(2011) Works based on pre-calculating the kernel matrix elements:
• Combining the CPU and the GPU
• GPU speed has the higher impact on the total training time.
(2011) New package GPUSVM:
• A cross-validation (CV) tool, a fast training tool and a prediction tool.
• 2.27 to 77 times faster
(2013) A novel implementation to accelerate the CV procedure:
• Running multiple training tasks simultaneously
• 10 to 100 times faster
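The kernel pre-calculation idea from the 2011 works can be sketched as follows. This is an illustrative CPU-only version in plain Python, not the cited implementations: it assumes an RBF kernel and hypothetical function names. Every matrix entry depends on only two samples, so the entries are independent of each other, which is the property that makes this step suitable for a GPU.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def precompute_kernel_matrix(samples, gamma=0.5):
    """Build the full symmetric N x N kernel matrix once, before training."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each entry is independent of all others, so this double loop
            # is the part that could be mapped onto parallel GPU threads.
            K[i][j] = K[j][i] = rbf_kernel(samples[i], samples[j], gamma)
    return K

def decision_values(K, labels, alphas, bias=0.0):
    """Evaluate f(x_i) for every training sample using only lookups in K."""
    n = len(labels)
    return [
        bias + sum(alphas[j] * labels[j] * K[j][i] for j in range(n))
        for i in range(n)
    ]
```

With the matrix in memory, the solver's inner loop replaces repeated kernel evaluations with table lookups, at the cost of the quadratic storage noted earlier.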
Research Works with GPU (continued)
(2015) Heterogeneous computing system:
• OpenCL framework
• 9 to 22 times faster
(2016) Converting a gradient-ascent-based algorithm to a GPU implementation:
• Fastest for high-dimensional feature vectors
(2016) Accelerating the CV process:
• OpenCL framework
• Applied on a mobile device
• 1.5 times faster
Limitations
• Dense matrix format
  • For storing datasets
• RBF kernel
  • The kernel in use cannot easily be changed
• Binary classification
  • In most cases
Field-programmable Gate Array (FPGA)
• Parallelism and pipelining
• High performance
• Reconfigurability
[Figure: Generic FPGA Architecture]
FPGA
• Typical approaches to speed up the SVM computations:
  • Increasing the level of parallelism
    • exploiting the inherent parallelism of the SVM algorithm.
  • Reducing the bit width of the data representation
    • reducing the resource usage.
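The bit-width trade-off can be illustrated in software. The sketch below quantizes values to signed fixed-point integers and compares a fixed-point dot product against the floating-point result. The feature and weight values are made up for illustration, and the helper names are hypothetical; a real FPGA design would also have to handle overflow and accumulator width, which this sketch ignores.

```python
def to_fixed(x, frac_bits):
    """Quantize a real number to a signed integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def fixed_dot(xs, ws, frac_bits):
    """Dot product on fixed-point integers, rescaled back to a float at the end."""
    acc = sum(to_fixed(x, frac_bits) * to_fixed(w, frac_bits) for x, w in zip(xs, ws))
    # Each product carries 2 * frac_bits fractional bits.
    return acc / (1 << (2 * frac_bits))

# Made-up example values (assumption, for illustration only).
FEATURES = [0.12, -0.57, 0.93, 0.31]
WEIGHTS = [0.80, 0.25, -0.44, 0.66]

if __name__ == "__main__":
    exact = sum(x * w for x, w in zip(FEATURES, WEIGHTS))
    for bits in (4, 8, 12, 16):
        error = abs(fixed_dot(FEATURES, WEIGHTS, bits) - exact)
        print(f"{bits:2d} fractional bits: absolute error = {error:.6f}")
```

Fewer fractional bits mean narrower multipliers and less logic, at the price of a larger quantization error, so the bit width is typically chosen as the smallest one that keeps accuracy acceptable.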
Research Works with FPGA
(2008) A scalable FPGA architecture based on Gilbert's algorithm:
• Partitioned into floating-point and fixed-point domains
• 3 orders of magnitude faster than the software implementation
(2011) A novel architecture for the SMO process:
• With a memory block and a cache block
• Using the cache decreases processing time
(2011) Improved modular design:
• 90% reduction in training time
(2014) A novel reconfigurable chip design for accelerating SMO:
• Reconfigurable architectures
• Dynamic scheduling for efficient reconfiguration
• Power consumption (17 times)
• Training speed (16 times)
Research Works with FPGA (continued)
(2015) First floating-point-based, multi-use reconfigurable hardware: R2SVM
• Modification of the number of classes and features
• Modification of kernel selection and parameters at run time
• Extensive pipelining and parallelism
• Evaluated in a wireless human-computer interface
• Operates at a very low power level
(2016) A novel optimised dataflow architecture for incremental SVM training:
• Up to 40.97 times faster
GPU vs FPGA Performance Comparison
Feature | Analysis | Winner
Floating-point processing | Total FLOPS of GPUs exceed those of the best FPGAs | GPU
Timing latency | Deterministic timing in FPGAs, with latencies lower than GPUs | FPGA
Processing per watt | FPGAs are 3-4 times better in terms of GFLOPS per watt | FPGA
Backward compatibility | FPGA HDL can be moved to newer platforms, but with some reworking | GPU
Flexibility | FPGAs lack the flexibility to modify the hardware implementation of the synthesized code | GPU
Size | Lower power consumption of FPGAs (smaller dimensions) | FPGA
http://www.bertendsp.com/pdf/whitepaper/BWP001_GPU_vs_FPGA_Performance_Comparison_v1.0.pdf
Conclusion
• GPUs and FPGAs can offer significant improvements to the SVM training time without sacrificing recognition accuracy.
• Power management techniques are extremely important to ensure longevity and reliability of GPUs in embedded systems.
• No single platform can be considered the most energy-efficient for all possible applications.
References
[1]. Catanzaro, B., Sundaram, N., Keutzer, K.: Fast support vector machine training and classification on graphics processors. In: Proceedings of the
25th international conference on Machine learning. pp. 104–111. ICML ’08, ACM, New York, NY, USA (2008)
[2]. Herrero-Lopez, S., Williams, J.R., Sanchez, A.: Parallel multiclass classification using SVMs on GPUs. In: Proceedings of the 3rd Workshop on
General-Purpose Computation on Graphics Processing Units. pp. 2–11. GPGPU ’10, ACM, New York, NY, USA (2010)
[3]. Cotter, A., Srebro, N., Keshet, J.: A GPU-tailored approach for training kernelized SVMs. In: Proceedings of the 17th ACM SIGKDD conference. pp.
805–813. KDD ’11 (2011), http://doi.acm.org/10.1145/2020408.2020548
[4]. Athanasopoulos, A., Dimou, A., Mezaris, V. and Kompatsiaris, I., 2011, April. GPU acceleration for support vector machines. In Procs. 12th Inter.
Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2011), Delft, Netherlands.
[5]. Li, Q., Salman, R., Test, E. et al. centr.eur.j.comp.sci. (2011) 1: 387. https://doi.org/10.2478/s13537-011-0028-7
[6]. Li, Q., Salman, R., Test, E., Strack, R. and Kecman, V., 2013. Parallel multitask cross validation for support vector machine using GPU. Journal of
Parallel and Distributed Computing, 73(3), pp.293-302.
[7]. Codreanu, V., Dröge, B., Williams, D., Yasar, B., Yang, P., Liu, B., Dong, F., Surinta, O., Schomaker, L.R., Roerdink, J.B. and Wiering, M.A., 2016.
Evaluating automatically parallelized versions of the support vector machine. Concurrency and Computation: Practice and Experience, 28(7),
pp.2274-2294.
[8]. Peters, E., 2015. High Performance Implementation of Support Vector Machines Using OpenCL. Rochester Institute of Technology.
[9]. Cagnin, H.E., Winck, A.T. and Barros, R.C., 2015, November. A Portable OpenCL-Based Approach for SVMs in GPU. In Intelligent Systems
(BRACIS), 2015 Brazilian Conference on (pp. 198-203). IEEE.
[10]. Nan, Y.Y., Li, Q.Z., Piao, J.C. and Kim, S.D., GPU-Accelerated SVM Training Algorithm Based on PC and Mobile Device.
[11]. Vanek, J., Michálek, J. and Psutka, J., 2017. A Comparison of Support Vector Machines Training GPU-Accelerated Open Source
Implementations. arXiv preprint arXiv:1707.06470.
[12]. Kuan, T. W., Wang, J. F., Wang, J. C., Lin, P. C., & Gu, G. H. (2012). VLSI design of an SVM learning core on sequential minimal
optimization algorithm. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(4), 673-683.
[13]. Wang. JF, P. Jr-Shiang, W. Jia-Ching, L. Po-Chuan, and K. Ta-Wen, "Hardware/Software Co-design for Fast trainable Speaker Identification
System Based on SMO," in 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2011, pp. 1621-1625.
[14]. C. H. Peng, B. W. Chen, T. W. Kuan, P. C. Lin, J. F. Wang, and N. S. Shih, "REC-STA: Reconfigurable and Efficient Chip Design With SMO-
based Training Accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, pp. 1791-1802, 2014.
[15]. S. Shao, O. Mencer, and W. Luk, "Dataflow design for optimal incremental SVM training," in FPT, 2016.
[16]. Papadonikolakis, M. and Bouganis, C.S., 2008, December. A scalable FPGA architecture for non-linear SVM training. In ICECE Technology,
2008. FPT 2008. International Conference on (pp. 337-340). IEEE.
[17]. Papadonikolakis, M., Bouganis, C.S. and Constantinides, G., 2009, December. Performance comparison of GPU and FPGA architectures
for the SVM training problem. In Field-Programmable Technology, 2009. FPT 2009. International Conference on (pp. 388-391). IEEE.
[18]. Kane, J., Hernandez, R. and Yang, Q., 2015, May. A Reconfigurable Multiclass Support Vector Machine Architecture for Real-Time
Embedded Systems Classification. In Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International
Symposium on (pp. 244-251). IEEE.
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 

Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Overview

  • 1. Hardware Acceleration of SVM Training for Real-time Embedded Systems: An Overview. Ilham Amezzane, Ibn Tofail University, March 26th, 2018
  • 2. Outline
      - Background
      - Accelerating SVM Training with: GPU, FPGA
      - GPU vs FPGA Performance Comparison
      - Conclusion
  • 3. Real-time Embedded Applications
      - Smartphone-based applications: healthcare, smart homes, WSNs
      - Challenges: large datasets, the need to accelerate processing, limited resources
  • 4. Support Vector Machines (SVM)
      - Instance-based: finds the optimal hyperplane for linearly separable patterns.
      - Strengths: can apply linear classification techniques to non-linear data using the kernel trick; high accuracy.
      - Weaknesses: memory-intensive; hard to interpret.
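To illustrate the kernel trick and the "instance-based" and "memory-intensive" points on this slide, here is a minimal sketch of a kernel SVM decision function. It assumes an RBF kernel; the function names, the `gamma` value, and the two toy support vectors with their coefficients are illustrative assumptions, not values from the presentation.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, labels, alphas, bias, gamma=0.5):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + bias.

    The classifier stays linear in the kernel-induced feature space, but
    every support vector must be kept in memory to evaluate it.
    """
    total = sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return total + bias

# Toy model (made-up values): one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [+1, -1]
alphas = [1.0, 1.0]
bias = 0.0

for point in [(0.1, 0.0), (1.9, 2.1)]:
    value = decision_value(point, support_vectors, labels, alphas, bias)
    print(point, "->", "+1" if value >= 0 else "-1")
```

The sign of `decision_value` gives the predicted class; prediction cost and memory grow with the number of stored support vectors.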
  • 5. SVM Training Algorithm: Limitations
      - Quadratic Programming (QP): the problem size grows with the number of training samples N, with O(N^2) complexity.
      - Several decomposition methods exist, e.g. Sequential Minimal Optimization (SMO).
      - The standard CPU implementation (LIBSVM) is SMO-based.
      - For real-time applications, training can be very time-consuming and computationally intensive.
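The O(N^2) growth can be made concrete with a short calculation. The sketch below assumes a dense N x N kernel matrix stored as 4-byte floats; both the storage format and the sample counts are assumptions chosen for illustration.

```python
def kernel_matrix_bytes(n_samples: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold a dense n_samples x n_samples kernel matrix."""
    return n_samples * n_samples * bytes_per_value

# Each tenfold increase in N multiplies the footprint by one hundred.
for n in (1_000, 10_000, 100_000):
    mib = kernel_matrix_bytes(n) / 2**20
    print(f"N = {n:>7}: {mib:,.1f} MiB")
```

This quadratic footprint is one reason decomposition methods such as SMO work on small subsets of the problem rather than on the full matrix at once.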
  • 7. Graphics Processing Unit (GPU)
      - Compute-intensive
      - Highly parallel computation
      - Devotes more resources to data processing than to caching and flow control
      - Image source: https://www.carestream.com/blog/wp-content/uploads/2015/09/CSH_CPU-GPU_Illustration.png
  • 8. GPU Programming Frameworks
      - CUDA: NVIDIA
      - OpenCL: AMD (CPUs, GPUs), Intel (CPUs, GPUs), NVIDIA (GPUs), Qualcomm (embedded/mobile CPUs), Altera (FPGAs)
      - OpenCL allows heterogeneous computation in one system.
  • 9. Research Works with GPU
      - (2008, 2010) Works based on a modified SMO algorithm of the standard LIBSVM: dataset-dependent speedups.
      - (2011) Works based on pre-calculating the kernel matrix elements: combine the CPU and the GPU; GPU speed has the higher impact on the total training time.
      - (2011) New package GPUSVM: a cross-validation (CV) tool, a fast training tool and a prediction tool; 2.27 to 77 times faster.
      - (2013) A novel implementation to accelerate the CV procedure: runs multiple training tasks simultaneously; 10 to 100 times faster.
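As a rough picture of what "pre-calculating the kernel matrix elements" involves, the following sketch fills a dense kernel matrix once, before any optimization step. It runs on the CPU in plain Python and assumes an RBF kernel; the function names and `gamma` are hypothetical, and the cited works perform this step on the GPU rather than in a loop like this.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def precompute_kernel_matrix(samples, gamma=0.5):
    """Return the dense N x N matrix K with K[i][j] = k(x_i, x_j).

    Every entry is independent of the others, which is what makes this
    step a natural fit for a highly parallel device. Symmetry halves
    the number of kernel evaluations.
    """
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = rbf_kernel(samples[i], samples[j], gamma)
            K[i][j] = value
            K[j][i] = value
    return K

samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = precompute_kernel_matrix(samples)
for row in K:
    print(["%.4f" % v for v in row])
```

Once the matrix is available, the solver can look up kernel values instead of recomputing them, at the cost of the quadratic memory footprint.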
  • 10. Research Works with GPU (continued)
      - (2015) Heterogeneous computing system: OpenCL framework; 9 to 22 times faster.
      - (2016) Converting a gradient-ascent-based algorithm to a GPU implementation: fastest for high-dimensional feature vectors.
      - (2016) Accelerating the CV process: OpenCL framework; applied on a mobile device; 1.5 times faster.
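Several of the works above speed up cross-validation (CV) by running the per-fold training tasks at the same time. The sketch below shows only that task structure, using CPU threads from the Python standard library. The `evaluate_fold` function is a hypothetical stand-in (a majority-label predictor) for a real SVM train-and-validate step, and all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def evaluate_fold(labels, test_idx):
    """Stand-in for one train/validate task (NOT an SVM): predict the
    majority training label and return accuracy on the held-out fold."""
    held_out = set(test_idx)
    train_labels = [y for i, y in enumerate(labels) if i not in held_out]
    majority = max(set(train_labels), key=train_labels.count)
    hits = sum(1 for i in test_idx if labels[i] == majority)
    return hits / len(test_idx)

def cross_validate(labels, k):
    """Run the k fold evaluations concurrently and average their scores."""
    folds = k_fold_indices(len(labels), k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        scores = list(pool.map(lambda idx: evaluate_fold(labels, idx), folds))
    return sum(scores) / k

labels = [1] * 8 + [-1] * 2
print("mean CV accuracy:", cross_validate(labels, k=5))
```

The folds are independent of one another, so the same decomposition maps onto separate GPU streams or compute units in an accelerated implementation.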
  • 11. Limitations
      - Dense matrix format for storing datasets
      - RBF kernel, without the possibility of easily changing the kernel used
      - Binary classification, in most cases
  • 13. Field-programmable Gate Array (FPGA)
      - Parallelism and pipelining
      - High performance
      - Reconfigurability
      - Figure: Generic FPGA Architecture
  • 14. FPGA
      - Typical approaches to speeding up SVM computations:
      - Increasing the level of parallelism, which exploits the inherent parallelism of the SVM algorithm.
      - Reducing the bit width of the data representation, which reduces resource usage.
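Reducing the bit width usually means replacing floating-point values with fixed-point integers. The following sketch emulates a signed fixed-point dot product in software. The 16-bit word with 8 fractional bits is an assumed format chosen for illustration, and the helper names are hypothetical; none of this comes from the cited designs.

```python
def to_fixed(value: float, frac_bits: int, total_bits: int) -> int:
    """Quantize to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))

def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_dot(xs, ws, frac_bits=8, total_bits=16):
    """Dot product on quantized operands.

    Each product carries 2 * frac_bits fractional bits, so the
    accumulator is rescaled by that amount at the end.
    """
    acc = sum(
        to_fixed(x, frac_bits, total_bits) * to_fixed(w, frac_bits, total_bits)
        for x, w in zip(xs, ws)
    )
    return acc / (1 << (2 * frac_bits))

xs = [0.5, 0.25, -1.3]
ws = [0.5, 1.0, 0.7]
exact = sum(x * w for x, w in zip(xs, ws))
approx = fixed_dot(xs, ws)
print(f"float: {exact:.6f}  fixed-point: {approx:.6f}  error: {abs(exact - approx):.6f}")
```

Narrower words need fewer logic and memory resources per operation, at the price of a quantization error that has to be checked against the required classification accuracy.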
  • 15. Research Works with FPGA
      - (2008) A scalable FPGA architecture based on Gilbert's algorithm: partitioned into floating-point and fixed-point domains; 3 orders of magnitude faster than the software implementation.
      - (2011) A novel architecture for the SMO process: includes a memory block and a cache block; using the cache decreases processing time.
      - (2011) Improved modular design: 90% reduction in training time.
      - (2014) A novel reconfigurable chip design for accelerating SMO: reconfigurable architectures; dynamic scheduling for efficient reconfiguration; power consumption (17 times); training speed (16 times).
  • 16. Research Works with FPGA (continued)
      - (2015) R2SVM, the first floating-point-based, multi-use reconfigurable hardware: the number of classes and features can be modified; kernel selection and parameters can be modified at run time; extensive pipelining and parallelism; examined in a human-computer wireless interface; operates at a very low power level.
      - (2016) A novel optimised dataflow architecture for incremental SVM training: up to 40.97 times faster.
  • 18. GPU vs FPGA Performance Comparison
      Feature | Analysis | Winner
      Floating-point processing | Total FLOPS of GPUs exceed those of the best FPGAs. | GPU
      Timing latency | FPGAs have deterministic timing, with latencies lower than GPUs. | FPGA
      Processing per watt | FPGAs are 3-4 times better in terms of GFLOPS per watt. | FPGA
      Backward compatibility | FPGA HDL can be moved to newer platforms, but with some reworking. | GPU
      Flexibility | FPGAs lack the flexibility to modify the hardware implementation of the synthesized code. | GPU
      Size | The FPGA's lower power consumption (smaller dimensions). | FPGA
      Source: http://www.bertendsp.com/pdf/whitepaper/BWP001_GPU_vs_FPGA_Performance_Comparison_v1.0.pdf
  • 20. Conclusion
      - GPUs and FPGAs can offer significant improvements in SVM training time without sacrificing recognition accuracy.
      - Power management techniques are extremely important to ensure the longevity and reliability of GPUs in embedded systems.
      - No single platform can be considered the most energy-efficient for all possible applications.
  • 21. References
      [1] Catanzaro, B., Sundaram, N., Keutzer, K.: Fast support vector machine training and classification on graphics processors. In: Proceedings of the 25th International Conference on Machine Learning, pp. 104–111. ICML '08, ACM, New York, NY, USA (2008)
      [2] Herrero-Lopez, S., Williams, J.R., Sanchez, A.: Parallel multiclass classification using SVMs on GPUs. In: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pp. 2–11. GPGPU '10, ACM, New York, NY, USA (2010)
      [3] Cotter, A., Srebro, N., Keshet, J.: A GPU-tailored approach for training kernelized SVMs. In: Proceedings of the 17th ACM SIGKDD Conference, pp. 805–813. KDD '11 (2011), http://doi.acm.org/10.1145/2020408.2020548
      [4] Athanasopoulos, A., Dimou, A., Mezaris, V. and Kompatsiaris, I., 2011, April. GPU acceleration for support vector machines. In Procs. 12th Inter. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2011), Delft, Netherlands.
      [5] Li, Q., Salman, R., Test, E. et al. Cent. Eur. J. Comp. Sci. (2011) 1: 387. https://doi.org/10.2478/s13537-011-0028-7
      [6] Li, Q., Salman, R., Test, E., Strack, R. and Kecman, V., 2013. Parallel multitask cross validation for support vector machine using GPU. Journal of Parallel and Distributed Computing, 73(3), pp. 293-302.
      [7] Codreanu, V., Dröge, B., Williams, D., Yasar, B., Yang, P., Liu, B., Dong, F., Surinta, O., Schomaker, L.R., Roerdink, J.B. and Wiering, M.A., 2016. Evaluating automatically parallelized versions of the support vector machine. Concurrency and Computation: Practice and Experience, 28(7), pp. 2274-2294.
      [8] Peters, E., 2015. High Performance Implementation of Support Vector Machines Using OpenCL. Rochester Institute of Technology.
      [9] Cagnin, H.E., Winck, A.T. and Barros, R.C., 2015, November. A Portable OpenCL-Based Approach for SVMs in GPU. In Intelligent Systems (BRACIS), 2015 Brazilian Conference on (pp. 198-203). IEEE.
      [10] Nan, Y.Y., Li, Q.Z., Piao, J.C. and Kim, S.D., GPU-Accelerated SVM Training Algorithm Based on PC and Mobile Device.
      [11] Vanek, J., Michálek, J. and Psutka, J., 2017. A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations. arXiv preprint arXiv:1707.06470.
  • 22. References (continued)
      [12] Kuan, T. W., Wang, J. F., Wang, J. C., Lin, P. C., & Gu, G. H. (2012). VLSI design of an SVM learning core on sequential minimal optimization algorithm. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(4), 673-683.
      [13] Wang. JF, P. Jr-Shiang, W. Jia-Ching, L. Po-Chuan, and K. Ta-Wen, "Hardware/Software Co-design for Fast trainable Speaker Identification System Based on SMO," in 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2011, pp. 1621-1625.
      [14] C. H. Peng, B. W. Chen, T. W. Kuan, P. C. Lin, J. F. Wang, and N. S. Shih, "REC-STA: Reconfigurable and Efficient Chip Design With SMO-based Training Accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, pp. 1791-1802, 2014.
      [15] S. Shao, O. Mencer, and W. Luk, "Dataflow design for optimal incremental SVM training," in FPT, 2016.
      [16] Papadonikolakis, M. and Bouganis, C.S., 2008, December. A scalable FPGA architecture for non-linear SVM training. In ICECE Technology, 2008. FPT 2008. International Conference on (pp. 337-340). IEEE.
      [17] Papadonikolakis, M., Bouganis, C.S. and Constantinides, G., 2009, December. Performance comparison of GPU and FPGA architectures for the SVM training problem. In Field-Programmable Technology, 2009. FPT 2009. International Conference on (pp. 388-391). IEEE.
      [18] Kane, J., Hernandez, R. and Yang, Q., 2015, May. A Reconfigurable Multiclass Support Vector Machine Architecture for Real-Time Embedded Systems Classification. In Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International Symposium on (pp. 244-251). IEEE.