Shogun 2.0 @ PyData NYC 2012

Talk about Shogun at PyData NYC 2012.

  1. The SHOGUN Machine Learning Toolbox 2.0 (and its Python interface)
     Sören Sonnenburg, Gunnar Rätsch, Sebastian Henschel, Christian Widmer, Jonas Behr, Alexander Zien, Fabio De Bona, Alexander Binder, Christian Gehl, and Vojtech Franc
     GSoC students: Sergey Lisitsyn, Heiko Strathmann, and many more
     Outline: Introduction; Machine Learning; Dry is all theory: Live Demo; SVMs and Kernels; Beyond Binary Classification; Python integration
  2. What is Shogun?
     Machine Learning Toolkit
       • Broad range of ML algorithms (600 classes)
       • Large-scale algorithms (up to 50 million examples)
       • Core written in C++ (> 190,000 lines of code)
       • SWIG bindings (support for 8 target languages)
     Used in many projects
       • Gene starts: ARTS [7]
       • Splice sites: mSplicer [5]
       • Sensor fusion (private sector)
       • ...many more (see Google Scholar)!
  3. Architecture
       • SWIG: Simplified Wrapper and Interface Generator
       • Bindings to a growing number of languages!
       • Typemaps!!
  4. Shogun’s history
       • Project started in 1999
       • Early focus on large-scale SVMs and kernels
       • GSoC significantly pushed the project forward
  5. Machine Learning - Learning from Data
     What is Machine Learning and what can it do for you?
     What is ML?
       • AIM: Learning from empirical data!
     Applications
       • speech and handwriting recognition
       • medical diagnosis, bioinformatics
       • computer vision, object recognition
       • stock market analysis
       • network security, intrusion detection
       • ...
  7. Support Vector Machines (SVMs)
     SVM primal:
       $$\min_{w} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \max\left(1 - y_i\, w^\top x_i,\; 0\right)$$
       • regularizer (first term) = robustness
       • loss (second term) = error on train data
     Training: Solve optimization problem
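The primal objective on this slide can be evaluated directly for a given weight vector. The following is a minimal sketch in plain Python, not Shogun code; the function names and the toy data are made up for illustration, and it only computes the objective value rather than solving the optimization problem.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def svm_primal_objective(w, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(1 - y_i * <w, x_i>, 0)."""
    regularizer = 0.5 * dot(w, w)
    hinge_loss = sum(max(1.0 - yi * dot(w, xi), 0.0) for xi, yi in zip(X, y))
    return regularizer + C * hinge_loss


# Toy data (hypothetical): the last point lies on the wrong side of the margin.
X = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
w = [1.0, 1.0]

objective = svm_primal_objective(w, X, y, C=10.0)
print(objective)
```

Only the fourth example contributes to the loss term here; increasing `C` makes such margin violations more expensive relative to the regularizer.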
  9. Support Vector Machines: SVM with Kernels
     SVM dual:
       $$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \overbrace{x_i^\top x_j}^{k(x_i, x_j)} \quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; \forall i \in \{1, \dots, n\}$$
       • Kernel: Similarity measure; generalization of dot product
       • Corresponds to dot product in higher dimensional space
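To make the kernel substitution concrete, here is a small sketch in plain Python (not Shogun code) that builds a Gaussian kernel matrix and evaluates the standard soft-margin dual objective for a fixed set of multipliers. The parameterization exp(-||x - z||^2 / width) is an assumption; libraries differ in how they scale the width. All function names and data are illustrative.

```python
import math


def gaussian_kernel(x, z, width):
    """Gaussian kernel, assuming the form exp(-||x - z||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / width)


def kernel_matrix(X, width):
    """Matrix of all pairwise kernel values k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, width) for xj in X] for xi in X]


def dual_objective(alpha, y, K):
    """sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad


# Two toy points at distance 1 with opposite labels (hypothetical data).
X = [[0.0, 0.0], [1.0, 0.0]]
y = [1, -1]
K = kernel_matrix(X, width=2.0)
value = dual_objective([0.5, 0.5], y, K)
print(K, value)
```

The data enter the dual only through `K`, which is why replacing the dot product by any valid kernel leaves the rest of the training procedure unchanged.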
  10. Demo: Support Vector Classification
      Task: separate 2 clouds of points in 2D
      Simple code example: SVM Training

          # Import paths assume the Shogun 2.0 modular Python interface.
          from shogun.Features import RealFeatures, BinaryLabels
          from shogun.Kernel import GaussianKernel
          from shogun.Classifier import LibSVM

          # labels, features, test_features are NumPy arrays; width is the kernel width.
          lab = BinaryLabels(labels)
          train_xt = RealFeatures(features)
          gk = GaussianKernel(train_xt, train_xt, width)
          svm = LibSVM(10.0, gk, lab)
          svm.train()

          test_examples = RealFeatures(test_features)
          out = svm.apply(test_examples)
  11. SVMs and Kernels
      Provides generic interface to 11 SVM solvers
        • Established implementations for solving SVMs with kernels
        • More recent developments: Fast linear SVM solvers
      Kernels for Real-valued Data (in demo)
        • Linear Kernel, Polynomial Kernel, Gaussian Kernel
      String Kernels
        • Applications in Bioinformatics [4, 8, 10]
        • Intrusion Detection
      Heterogeneous Data Sources
        • Combined kernel: $K(x, z) = \sum_{i=1}^{M} \beta_i \cdot K_i(x, z)$
        • $\beta_i$ can be learned using Multiple Kernel Learning [6, 2]
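The combined kernel is simply a weighted sum of base kernels. The sketch below, in plain Python rather than Shogun's own combined-kernel classes, uses fixed weights chosen by hand; in Multiple Kernel Learning the weights would be learned. The base kernels, weights, and inputs are illustrative assumptions.

```python
def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2):
    """Inhomogeneous polynomial kernel (<x, z> + 1)^degree."""
    return (linear_kernel(x, z) + 1.0) ** degree


def combined_kernel(x, z, kernels, betas):
    """K(x, z) = sum_i beta_i * K_i(x, z) for a list of base kernels."""
    return sum(beta * k(x, z) for beta, k in zip(betas, kernels))


x = [1.0, 2.0]
z = [3.0, 0.5]
value = combined_kernel(x, z, [linear_kernel, polynomial_kernel], [0.7, 0.3])
print(value)
```

With non-negative weights, a sum of valid kernels is again a valid kernel, which is what makes this a convenient way to fuse heterogeneous data sources.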
  12. Beyond Classification
      Figure panels: (a) GP regression, (b) Structured Output, (c) Multitask Learning
        • Regression: Labels are real values (think least squares)
        • Structured Output Learning: Predict complex structures
        • Multitask Learning: Solve several related problems simultaneously
  13. Multitask Learning
        • Example: Learn users' movie preferences
        • Multitask Learning: Jointly learn models for different countries
        • Couple related models more strongly
  17. Regularization-based MTL
      Multitask Learning is often implemented using regularization:
      Graph-regularizer: $\sum_{s=1}^{T} \sum_{t=1}^{T} A_{s,t} \lVert w_s - w_t \rVert^2$
        • Keeps model parameters similar
        • Based on given similarity matrix $A$
      $L_{2,1}$-regularizer: $\lVert W \rVert_{2,1} = \sum_{i=1}^{n} \lVert w_i \rVert$
        • Selects common sub-space
        • Allows any $w_t$ in that sub-space
      Clustered MTL:
        • Unknown task relationship
        • Identifies similar tasks
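Both regularizers are short to compute once the per-task weight vectors are stacked. This is a sketch in plain Python, not the solver Shogun uses. It assumes `W[t]` holds the weight vector of task `t`, and that the L2,1 norm sums, over features, the Euclidean norm of each feature's weights across tasks (the usual definition; the slide does not spell it out). Names and data are illustrative.

```python
import math


def graph_regularizer(W, A):
    """sum_s sum_t A[s][t] * ||w_s - w_t||^2, with W[t] the weights of task t."""
    num_tasks = len(W)
    total = 0.0
    for s in range(num_tasks):
        for t in range(num_tasks):
            sq_dist = sum((a - b) ** 2 for a, b in zip(W[s], W[t]))
            total += A[s][t] * sq_dist
    return total


def l21_norm(W):
    """Sum over features of the Euclidean norm of that feature across tasks."""
    num_features = len(W[0])
    return sum(
        math.sqrt(sum(w_t[i] ** 2 for w_t in W)) for i in range(num_features)
    )


# Two tasks, two features; both tasks use only the first feature.
W = [[3.0, 0.0], [4.0, 0.0]]
A = [[0.0, 1.0], [1.0, 0.0]]  # the two tasks are declared similar

graph_value = graph_regularizer(W, A)
l21_value = l21_norm(W)
print(graph_value, l21_value)
```

The graph term is small when tasks marked as similar in `A` have close weight vectors, while the L2,1 term stays small when whole features are unused by every task, which is how it selects a shared subspace.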
  18. Multitask Learning: MTL Training

          feat, labels = ...  # Shogun Data objects
          task_one = Task(0, 10)
          task_two = Task(10, 20)
          group = TaskGroup()
          group.append_task(task_one)
          group.append_task(task_two)
          mtlr = MultitaskL12(0.1, 0.1, feat, labels, group)
          mtlr.train()

        • Efficient LibLinear-style solver
        • Graph-reg SVM [9]
        • 10 other MTL methods (based on SLEP [3] / MALSAR [1])
  19. Structured Output Learning
        • Complex outputs
        • Similar framework, different loss function
        • Bundle-methods: state of the art solvers!
  20. Other methods
      Figure panels: (d) Sparse/L1 methods, (e) Gaussian processes, (f) Dimensionality reduction
      ...and much more I can’t talk about!
  21. Python integration
        • Serialization
        • Matrix integration
        • No-copy data wrapping
        • Rapid prototyping with directors
  22. Python integration
      Pythonic interaction with Shogun objects

          from numpy import array, float64

          m_real = array(in_data, dtype=float64, order='F')
          f_real = RealFeatures(m_real)

          # slicing
          print(f_real[0:3, 1])

          # operators
          f_real += f_real
          f_real *= f_real
          f_real -= f_real

          # no copy
          a = RealFeatures()
          a.frombuffer(feats, False)
  23. Python integration: Directors
      Simple code example: SVM Training

          import numpy

          class ExampleLinearKernel(DirectorKernel):
              def __init__(self):
                  DirectorKernel.__init__(self, True)

              def kernel_function(self, idx_a, idx_b):
                  # Dot product of the two feature vectors, computed in Python
                  seq1 = self.get_lhs().get_feature_vector(idx_a)
                  seq2 = self.get_rhs().get_feature_vector(idx_b)
                  return numpy.dot(seq1, seq2)

          k = ExampleLinearKernel()
          svm = SVMLight()
          svm.set_kernel(k)
          svm.train(train_data)
  24. How to get started: Dive into Shogun
        • Visit our website
        • Source on GitHub (fork me!)
        • Documentation available
        • Many Python examples (> 200)
        • Debian package, MacPorts
        • Active mailing list
  25. When is SHOGUN for you?
        • You want to work with SVMs (11 solvers to choose from)
        • You want to work with Kernels (35 different kernels)
          ⇒ Esp.: String Kernels / combinations of Kernels
        • You’re interested in recent ML developments (MTL, Structured Output)
        • You have large-scale computations to do (up to 50 million examples)
        • You use one of the following languages: Python, Octave/MATLAB, R, Java, C#, Ruby, Lua, C++
  26. Contributors
      Original authors: Gunnar Raetsch, Soeren Sonnenburg, Christian Widmer, Alexander Binder, Alexander Zien, Marius Kloft, Sebastian Henschel, Christian Gehl, Jonas Behr.
      Integrated code: Alex Smola (pr_loqo), Antoine Bordes (LaRank), Thorsten Joachims (SVMLight), Chih-Chung Chang and Chih-Jen Lin (LIBSVM), Chih-Jen Lin (LibLinear), Vojtech Franc (LibOCAS), Leon Bottou (SGD SVM), Vikas Sindhwani (SVMLin), Jieping Ye and Jun Liu (SLEP), Jiayu Zhou and Jieping Ye (MALSAR)
      GSoC alumni: Heiko Strathmann (both 2011 and 2012), Sergey Lisitsyn (both 2011 and 2012), Chiyuan Zhang (2012), Fernando Iglesias (2012), Viktor Gal (2012), Michal Uricar (2012), Jacob Walker (2012), Evgeniy Andreev (2012), Baozeng Ding (2011), Alesis Novik (2011), Shashwat Lal Das (2011)
  27. Thank you for your attention!!
      For more information, visit:
        • Implementation: http://www.shogun-toolbox.org
        • More machine learning software: http://mloss.org
        • Machine Learning Data: http://mldata.org
  28. References I
      [1] Jiayu Zhou, Jianhui Chen, and Jieping Ye. User Manual MALSAR: Multi-tAsk Learning via Structural Regularization. Technical report, Arizona State University, 2012.
      [2] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.R. Müller, and A. Zien. Efficient and accurate lp-norm multiple kernel learning. Advances in Neural Information Processing Systems, 22(22):997–1005, 2009.
      [3] Jun Liu, Shuiwang Ji, and Jieping Ye. SLEP: Sparse Learning with Efficient Projections. 2011.
  29. References II
      [4] G. Schweikert, A. Zien, G. Zeller, J. Behr, C. Dieterich, C.S. Ong, P. Philips, F. De Bona, L. Hartmann, A. Bohlen, et al. mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 19(11):2133, 2009.
      [5] Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 19(11):2133–43, November 2009.
      [6] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. The Journal of Machine Learning Research, 7:1565, 2006.
  30. References III
      [7] S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: accurate recognition of transcription starts in human. Bioinformatics, 2006.
      [8] S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: accurate recognition of transcription starts in human. Bioinformatics, 22(14):e472, 2006.
      [9] C. Widmer, M. Kloft, N. Görnitz, and G. Rätsch. Efficient Training of Graph-Regularized Multitask SVMs. In ECML 2012, 2012.
      [10] C. Widmer, J. Leiva, Y. Altun, and G. Raetsch. Leveraging Sequence Classification by Taxonomy-based Multitask Learning. In Research in Computational Molecular Biology, pages 522–534. Springer, 2010.
