MapReduce based SVM

Training SVM in Cloud Computing Systems with MapReduce

  1. CloudSVM: Training an SVM Classifier in Cloud Computing Systems
     • Authors: F. Ozgur CATAK (1), M. Erdal BALABAN (2)
     • (1) TUBITAK - National Research Institute of Electronics and Cryptology (UEKAE)
     • (2) Istanbul University, Faculty of Business Administration, Department of Quantitative Methods
     • ICPCA / SWS 2012, 28 Nov 2012
     • Outline: 1. Introduction; 2. Support Vector Machine; 3. MapReduce; 4. Development of System Model; 5. Simulation Results; 6. Conclusion
  2. Motivation: Our Research Focus
     • Overcome the high space and time complexity of the support vector machine algorithm
     • Train SVMs in cloud systems with MapReduce, using the HDFS file system
     • Find a global classifier function
  3. Contents
     • 1. Introduction: 1.1 Support Vector Machine; 1.2 SVM Solutions
     • 2. Support Vector Machine: 2.1 Definition; 2.3 Optimization Problem; 2.4 Lagrange Multiplier
     • 3. MapReduce: 3.1 MapReduce - Cloud Computing Algorithm; 3.2 Schematic View of MapReduce
     • 4. Development of System Model: 4.1 Overview; 4.2 CloudSVM Architecture Schematic View; 4.3 CloudSVM Algorithm MapReduce Function
     • 5. Simulation Results: 5.1 Method; 5.2 UCI Dataset Results; 5.3 Convergence of CloudSVM
     • 6. Conclusion: Conclusion & Recommendation; References
  5. Introduction: Support Vector Machine (SVM)
     • Developed from statistical learning theory (Vapnik & Chervonenkis)
     • Supervised learning method in statistics and computer science
     • Analyzes data and recognizes patterns; used for classification and regression analysis
     • Aims for maximum generalization accuracy while avoiding overfitting
     • Issue: computationally expensive to train
     • The quadratic optimization problem has $O(m^3)$ time and $O(m^2)$ space complexity, where $m$ is the training set size
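To give a feel for the $O(m^2)$ space term, the sketch below estimates the memory for one dense $m \times m$ kernel matrix. It assumes 8-byte floating-point entries and a solver that keeps the whole matrix in memory; the function name `kernel_matrix_bytes` is hypothetical and not part of the presentation.

```python
def kernel_matrix_bytes(m: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold a dense m x m kernel matrix (assumed storage model)."""
    return m * m * bytes_per_entry


if __name__ == "__main__":
    # Memory grows quadratically: 10x more samples needs 100x more memory.
    for m in (1_000, 10_000, 100_000):
        gib = kernel_matrix_bytes(m) / 2**30
        print(f"m = {m:>7,}: {gib:10.3f} GiB")
```

Under these assumptions a training set of 100,000 samples already needs about 74.5 GiB for the matrix alone, which is the kind of pressure that motivates splitting the data across nodes.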
  6. Introduction: SVM Solutions
     • Solution 1, feature reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Independent Component Analysis (ICA), Correlation Based Feature Selection (CFS)
     • Solution 2, distributed computing: conventional distributed machine learning methods are complicated, need pre-configured intranet/internet environments, and are costly
  8. Support Vector Machine
     • In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis.
     • An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.
  9. Support Vector Machine: Definition
     • The data set $D$ is a set of $n$ points of the form $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m,\ y_i \in \{-1, 1\}\}_{i=1}^{n}$.
     • For each $x_i$ in $D$: $w \cdot x_i - b \ge 1$ if $y_i = 1$ (1), and $w \cdot x_i - b \le -1$ if $y_i = -1$ (2).
     • Or equivalently: $y_i (w \cdot x_i - b) \ge 1,\ \forall (x_i, y_i) \in D$ (3).
     • With $F(x) = w \cdot x - b$, the distance from a point $x_i$ to the separating hyperplane is $|F(x_i)| / \|w\|$, which equals $1 / \|w\|$ on either margin hyperplane, so the two margin hyperplanes are $2 / \|w\|$ apart.
     • Maximizing the distance between these two hyperplanes is equivalent to: minimize $P(w, b) = \frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i - b) \ge 1$ (4).
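The margin constraints can be checked numerically. The following minimal sketch uses a made-up four-point data set and a hand-picked $(w, b)$; the helper names `decision`, `is_feasible`, and `margin_width` are illustrative assumptions, and the sign convention $F(x) = w \cdot x - b$ follows constraints (1)-(3).

```python
import math


def decision(w, b, x):
    """F(x) = w . x - b, the signed score of point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def is_feasible(w, b, data):
    """True if every (x, y) pair satisfies y * (w . x - b) >= 1."""
    return all(y * decision(w, b, x) >= 1 for x, y in data)


def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Toy data (hypothetical): two positive and two negative points in the plane.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
w, b = (0.5, 0.5), 1.0

if __name__ == "__main__":
    print("feasible:", is_feasible(w, b, data))
    print("margin width:", margin_width(w))
```

A smaller $\|w\|$ gives a wider margin, but only as long as every constraint still holds, which is why the problem is posed as minimizing $\frac{1}{2}\|w\|^2$ under those constraints.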
  10. Support Vector Machine: Optimization Problem
     • Optimization problem: minimize $P(w, b) = \frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i - b) \ge 1$ (5).
     • By introducing Lagrange multipliers $\alpha_i \ge 0$, this linearly constrained problem can be expressed through the Lagrange function $J(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w \cdot x_i - b) - 1 \right)$ (6).
  11. Support Vector Machine: Lagrange Multiplier Solution
     • The Lagrange function $J(w, b, \alpha)$ is minimized with respect to $w$ and $b$; the solution is a saddle point.
     • Condition 1: $\partial J(w, b, \alpha) / \partial w = 0$.
     • Condition 2: $\partial J(w, b, \alpha) / \partial b = 0$.
     • Solving conditions 1 and 2 gives $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$ (7).
     • New optimization problem: maximize $Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)$ subject to $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $\alpha_i \ge 0$ (8).
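The step from the Lagrange function to the dual problem can be written out term by term. The worked derivation below assumes the standard form of the Lagrangian, with the multiplier term subtracted, and the sign convention $w \cdot x_i - b$ of constraints (1)-(3).

```latex
\begin{align*}
J(w, b, \alpha)
  &= \tfrac{1}{2}\|w\|^2
     - \sum_{i=1}^{n} \alpha_i \bigl( y_i (w \cdot x_i - b) - 1 \bigr) \\
  &= \tfrac{1}{2}\|w\|^2
     - \sum_{i=1}^{n} \alpha_i y_i \,(w \cdot x_i)
     + b \sum_{i=1}^{n} \alpha_i y_i
     + \sum_{i=1}^{n} \alpha_i .
\intertext{Setting $\partial J / \partial w = 0$ gives $w = \sum_{j} \alpha_j y_j x_j$,
and $\partial J / \partial b = 0$ gives $\sum_{i} \alpha_i y_i = 0$. Substituting both:}
J
  &= \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
     - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
     + 0
     + \sum_{i=1}^{n} \alpha_i \\
  &= \sum_{i=1}^{n} \alpha_i
     - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
   = Q(\alpha).
\end{align*}
```

The dual depends on the data only through the inner products $x_i \cdot x_j$, and only points with $\alpha_i > 0$ (the support vectors) contribute to $w$.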
  13. MapReduce - Cloud Computing Algorithm: Overview
     • Breaks a large problem into smaller parts, solves them in parallel, and combines the results.
     • The programmer specifies the map and reduce functions.
     • Transparent scaling: the same code runs on megabytes locally or on terabytes across thousands of machines.
  14. MapReduce - Cloud Computing Algorithm: Overview
     • Most popular cloud computing model
     • Elastic framework that lets software developers build parallel and distributed applications
     • Input and output files reside on a distributed file system.
     • $map(key_1, value_1) \Rightarrow list(key_2, value_2)$
     • $reduce(key_2, list(value_2)) \Rightarrow list(value_3)$
     • [Figure: Overview of MapReduce]
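The map and reduce signatures above can be illustrated with a tiny in-memory simulation. This is only a sketch of the programming model in plain Python, not Hadoop or mrjob code; `run_mapreduce`, `count_labels_map`, and `count_labels_reduce` are hypothetical names, and the records are made-up lines in a LibSVM-like "label features" layout.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to each (key1, value1), group by key2, then reduce each group."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)  # shuffle step: collect values per key2
    return {key2: reduce_fn(key2, values) for key2, values in sorted(groups.items())}


def count_labels_map(_line_no, line):
    """map(key1, value1) => list(key2, value2): emit (class label, 1) per line."""
    label = line.split()[0]
    yield label, 1


def count_labels_reduce(_label, counts):
    """reduce(key2, list(value2)) => list(value3): total count per class label."""
    return [sum(counts)]


records = [(0, "+1 0.2 0.7"), (1, "-1 0.9 0.1"), (2, "+1 0.4 0.6")]
result = run_mapreduce(records, count_labels_map, count_labels_reduce)

if __name__ == "__main__":
    print(result)
```

In a real cluster the grouping step is performed by the framework across machines; the programmer still only writes the two functions.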
  16. Development of System Model: CloudSVM
     • CloudSVM is a new technique for training SVMs in the cloud with MapReduce.
     • The training data set is uploaded to HDFS.
     • With this approach we found classifier functions for data sets stored in HDFS.
     • What's new in CloudSVM: SVM training has $O(m^3)$ time complexity and $O(m^2)$ space complexity, where $m$ is the data set size, so a distributed approach is especially important for large-scale data sets and big data.
  17. Development of System Model
     • [Figure: CloudSVM architecture schematic view]
  18. Development of System Model: CloudSVM Algorithm, Map Function
        SV_Global = ∅                          {empty global support vector set}
        while h^t ≠ h^{t-1} do
          for l ∈ L do                         {for each subset}
            D_l^t ← D_l^t ∪ SV_Global^t
          end for
        end while
  19. Development of System Model: CloudSVM Algorithm, Reduce Function
        while h^t ≠ h^{t-1} do
          for l ∈ L do
            SV_l, h^t ← svm(D_l)               {train on the merged data set to obtain support vectors and hypothesis}
          end for
          for l ∈ L do
            SV_Global ← SV_Global ∪ SV_l
          end for
        end while
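The map and reduce loops can be sketched together in a few lines of Python. This is a single-process illustration under loud assumptions, not the authors' implementation: `train_1d_svm` is a hypothetical stand-in for LibSVM that solves the hard-margin problem exactly for separable one-dimensional data, and the loop stops when the list of per-subset hypotheses no longer changes, which is one possible reading of the $h^t \ne h^{t-1}$ test.

```python
def train_1d_svm(data):
    """Exact hard-margin SVM for separable 1-D data (stand-in for LibSVM).

    data: iterable of (x, y) with y in {-1, +1}, all negatives below all positives.
    Returns (support_vectors, (w, b)) for the classifier F(x) = w * x - b.
    """
    x_neg = max(x for x, y in data if y == -1)
    x_pos = min(x for x, y in data if y == +1)
    w = 2.0 / (x_pos - x_neg)
    b = w * (x_pos + x_neg) / 2.0
    return {(x_neg, -1), (x_pos, +1)}, (w, b)


def cloud_svm(partitions, train=train_1d_svm, max_iter=20):
    """Iterate merge-and-retrain until the per-subset hypotheses stop changing."""
    sv_global = set()  # empty global support vector set
    previous = None
    for t in range(1, max_iter + 1):
        # Map step: merge the global support vectors into every subset.
        merged = [set(part) | sv_global for part in partitions]
        # Reduce step: train on each merged subset, then pool the support vectors.
        results = [train(subset) for subset in merged]
        hypotheses = [h for _, h in results]
        for svs, _ in results:
            sv_global |= svs
        if hypotheses == previous:  # assumed convergence test
            break
        previous = hypotheses
    return hypotheses, sv_global, t


# Made-up 1-D data split into two subsets, each containing both classes.
partitions = [
    [(-5.0, -1), (-1.0, -1), (2.0, 1), (6.0, 1)],
    [(-4.0, -1), (-2.0, -1), (1.0, 1), (7.0, 1)],
]

if __name__ == "__main__":
    hypotheses, sv_global, iterations = cloud_svm(partitions)
    print(hypotheses, sorted(sv_global), iterations)
```

On this toy input the two subsets disagree after the first pass, then agree on the same separator once each has seen the other's support vectors, which mirrors the idea of exchanging only support vectors rather than full data sets.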
  21. Simulation Results: Method
     • We used 10-fold cross-validation, dividing the set of samples at random into 10 approximately equal-size parts.
     • We used the hinge loss to test models trained with the CloudSVM algorithm: $l(f(x), y) = \max\{0,\ 1 - y \, f(x)\}$ (9).
     • The empirical risk approximates the expected risk by the average loss over the $n$ samples: $R_{emp}(h) = \frac{1}{n} \sum_{i=1}^{n} l(h(x_i), y_i)$ (10).
     • According to the empirical risk minimization principle, the learning algorithm should choose a hypothesis $\hat{h}$ that minimizes the empirical risk: $\hat{h} = \arg\min_{h \in H} R_{emp}(h)$ (11).
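Equations (9)-(11) translate directly into code. The sketch below uses made-up one-dimensional samples and two hypothetical candidate hypotheses, `h_a` and `h_b`, purely to show how the hinge loss, the empirical risk, and the arg-min selection fit together.

```python
def hinge_loss(score, y):
    """Equation (9): l(f(x), y) = max(0, 1 - y * f(x))."""
    return max(0.0, 1.0 - y * score)


def empirical_risk(h, samples):
    """Equation (10): mean hinge loss of hypothesis h over the samples."""
    return sum(hinge_loss(h(x), y) for x, y in samples) / len(samples)


def h_a(x):
    """Candidate hypothesis F(x) = x (illustrative)."""
    return 1.0 * x


def h_b(x):
    """Candidate hypothesis F(x) = -x (illustrative)."""
    return -1.0 * x


samples = [(2.0, +1), (0.5, +1), (-3.0, -1), (0.25, -1)]

# Equation (11): pick the hypothesis with the smallest empirical risk.
best = min((h_a, h_b), key=lambda h: empirical_risk(h, samples))

if __name__ == "__main__":
    print(empirical_risk(h_a, samples), empirical_risk(h_b, samples), best.__name__)
```

Note that a correctly classified point can still incur loss when it lies inside the margin, as the sample `(0.5, +1)` does under `h_a`.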
  22. Simulation Results: Software and Development Environment
     • Hadoop 0.23
     • Python 2.7
     • SciPy, NumPy (scientific and numeric Python libraries)
     • Python(x,y) (scientific-oriented Python distribution based on Qt and Spyder)
     • mrjob 0.3.5 (Hadoop Streaming)
     • LibSVM
     • CentOS 6.2, 64-bit
  23. Simulation Results: UCI Dataset Results
      Table: Various UCI datasets

      Dataset      Rows  Features  γ    C  Iterations  SVs   Accuracy  Kernel type
      German       1000  24        100  1  5           606   0.7728    Linear
      Heart        270   13        100  1  3           137   0.8259    Linear
      Ionosphere   351   34        108  1  3           160   0.8423    Linear
      Satellite    4435  36        100  1  2           1384  0.9064    Linear
  24. Simulation Results: Convergence of CloudSVM
     • [Table: Data set prediction accuracy with iterations, German and Heart data sets]
     • The loss values and the number of support vectors converge smoothly.
  25. Simulation Results: Convergence of CloudSVM
     • [Table: Data set prediction accuracy with iterations, Ionosphere and Satellite data sets]
     • The loss values and the number of support vectors converge smoothly.
  27. Conclusion & Recommendation
     • Conclusion: we presented the simulation results; CloudSVM shows a stable and high generalization property; the approach is independent of network and computer infrastructure (cloud computing based).
     • Recommendations: multiclass classification; application to real data sets; determining into how many parts the data set can be divided.
  28. References
     • Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, NY (1995)
     • Chang, E.Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., Cui, H.: PSVM: Parallelizing Support Vector Machines on Distributed Computers. Advances in Neural Information Processing Systems 20 (2007)
     • Lu, Y., Roychowdhury, V., Vandenberghe, L.: Distributed parallel support vector machines in strongly connected networks. IEEE Trans. Neural Networks 19, 1167-1178 (2008)
     • Graf, H.P., Cosatto, E., Bottou, L., Durdanovic, I., Vapnik, V.: Parallel support vector machines: The cascade SVM. In: Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), pp. 521-528. MIT Press, Vancouver (2004)
     • Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI), pp. 10-10. USENIX Association, Berkeley (2004)
