intro to scikits.learn

package presentation: scikits.learn

  1. ML made easy. jss, 2011-05-19 (Thursday, May 19, 2011)
  2. Google Prediction API
     • The announced subject of this session
     • RESTful machine learning service
     • Limits: no access to the models (or any internals), max. 100 MB of training data, max. 40k predictions/day (100 in the free tier)
     • No fun for serious use
     • Might work well for people without a background in ML
  3. Still, a simple, unified API to a range of ML algorithms, plus evaluation measures and infrastructure for parameter search, would be a good thing to have. Enter:
  4. scikits.learn
     • Python module for machine learning, built on SciPy & NumPy
     • Started in 2007 as a Google Summer of Code (GSoC) project; main contributions by INRIA
  5. Features
     • Solid: supervised learning: Support Vector Machines, Generalized Linear Models
     • Work in progress: unsupervised learning (clustering, Gaussian mixture models, manifold learning, ICA) and Gaussian Processes
     • Planned: Gaussian graphical models, matrix factorization
  6. Back End
     • Own NumPy/SciPy implementations
     • Wrapped C/C++ libraries (libsvm & liblinear)
     • Cython (for linear models not covered by liblinear)
     • Multiprocessing
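To illustrate that the wrapped libraries sit behind one interface, here is a minimal sketch. It assumes a current install, where the package is imported as `sklearn` (scikits.learn was later renamed), and uses a synthetic dataset from `make_classification` chosen only for illustration. `SVC` is backed by libsvm and `LinearSVC` by liblinear.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification problem (illustrative data, not from the slides).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# SVC wraps libsvm, LinearSVC wraps liblinear; both expose fit/predict/score.
models = {
    "libsvm (SVC)": SVC(kernel="linear", C=1.0),
    "liblinear (LinearSVC)": LinearSVC(C=1.0),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # mean accuracy on the training data
    print(f"{name}: training accuracy = {scores[name]:.3f}")
```

Training accuracy is used here only to keep the sketch short; a held-out split is the better measure in practice.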
  7. Docs
     • In-depth reStructuredText (RST) documentation
     • Interfaces, narrative, method background, practical tips
     • Lots of examples
     • Active community & mailing list
     • Developer docs: optimization, conventions, etc.
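  8. API
     clf = Classifier(kernel='rbf')   # "Classifier" stands for any estimator, e.g. SVC
     clf.fit(X, y)                    # learn from training features X and labels y
     clf.predict(X2)                  # predict labels for new samples X2
     • clf is a (picklable) model object
     • Same API for all ML techniques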
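The uniform fit/predict convention and the picklable model object can be sketched as follows. This assumes a current install imported as `sklearn` (the later name of scikits.learn), the bundled iris dataset, and an arbitrary split that holds out every fifth sample; the three estimators are just examples.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Arbitrary split for illustration: every fifth sample is held out as new data X2.
is_test = np.arange(len(y)) % 5 == 0
X_train, y_train = X[~is_test], y[~is_test]
X2, y2 = X[is_test], y[is_test]

predictions = {}
for clf in (SVC(kernel="rbf"), LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    clf.fit(X_train, y_train)  # identical call, whatever the algorithm
    predictions[type(clf).__name__] = clf.predict(X2)

# A fitted model is an ordinary Python object, so it survives a pickle round trip.
restored = pickle.loads(pickle.dumps(clf))
assert (restored.predict(X2) == clf.predict(X2)).all()

for name, pred in predictions.items():
    print(name, "accuracy:", (pred == y2).mean())
```

Because every estimator shares the same two calls, swapping algorithms only changes the constructor line.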
  9. Full Example
     from scikits.learn.svm import SVC                 # package later renamed to sklearn
     from scikits.learn.metrics import classification_report
     from numpy import array

     X = array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1], ..])
     y = array([0, 1, 1, 0, ..])
     N = 4                                 # first N samples for training, the rest for testing

     clf = SVC(kernel='rbf', gamma=1e-4, C=1000)
     clf.fit(X[:N], y[:N])
     pred = clf.predict(X[N:])
     print(classification_report(y[N:], pred))
  10. Grid Param Search
      Classification report for the best estimator:
      SVC(kernel='rbf', C=10, probability=False, degree=3, coef0=0.0, tol=0.001,
          cache_size=100.0, shrinking=True, gamma=0.001)
      Tuned for precision with optimal value: 1.000

                   precision    recall  f1-score   support
                0       1.00      1.00      1.00      1000
                1       1.00      1.00      1.00      1000
      avg / total       1.00      1.00      1.00      2000

      Grid scores:
      [({'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}, 0.66544212631169153),
       ({'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}, 0.66544212631169153),
       ({'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}, 1.0),
       ({'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}, 0.66544212631169153),
       ({'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}, 1.0),
       ({'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}, 1.0),
       ({'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}, 1.0),
       ({'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}, 1.0),
       ({'C': 1, 'kernel': 'linear'}, 1.0),
       ({'C': 10, 'kernel': 'linear'}, 1.0),
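A search like the one reported above can be sketched with `GridSearchCV`. Assumptions: a current install imported as `sklearn`, the bundled digits dataset (the slide's binary dataset is not given, so the numbers will differ), 5-fold cross-validation, and linear-kernel `C` values beyond 10, which the slide cuts off and are guessed here.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# RBF grid as on the slide; linear C values 100 and 1000 are an assumption.
param_grid = [
    {"kernel": ["rbf"], "C": [1, 10, 100, 1000], "gamma": [1e-3, 1e-4]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# Tune for (macro-averaged) precision, mirroring "Tuned for precision".
search = GridSearchCV(SVC(), param_grid, scoring="precision_macro", cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
# One mean cross-validated score per parameter combination.
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(params, round(score, 3))

# Evaluate the refitted best estimator on the held-out half.
print(classification_report(y_test, search.predict(X_test)))
```

Keeping a test split outside the search avoids reporting a score that was itself used to pick the parameters.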
  11. ... and many more examples, e.g. Gaussian mixture models (GMM)
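As a small taste of the GMM examples, here is a minimal sketch using the modern `sklearn.mixture.GaussianMixture` class. It assumes a current install imported as `sklearn` and two synthetic 2-D Gaussian blobs centred at (0, 0) and (5, 5), made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two synthetic blobs: 200 points around (0, 0) and 200 around (5, 5).
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 5])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)               # EM estimate of mixture weights, means and covariances
labels = gmm.predict(X)  # hard assignment to the most likely component

print("weights:", gmm.weights_.round(2))
print("means:\n", gmm.means_.round(2))
```

The recovered means should land close to the two blob centres, with weights near 0.5 each; component order is arbitrary.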
