Pycon 2012 Scikit-Learn

Presentation for PyCon India 2012

Published in: Technology, Education

  1. Learning from the Past: with Scikit-Learn
     ANOOP THOMAS MATHEW, Profoundis Labs Pvt. Ltd.
  2. Agenda
     ● Basics of machine learning
     ● Introduction to some common techniques
     ● Let you know scikit-learn exists
     ● Some inspiration for using machine learning in daily-life scenarios and live projects
  3. How to draw a snake?
  4. How to draw a snake? This is WEIRD!
  5. Introduction
     A lot of data! What to do?
  6. Introduction
     What is Machine Learning (Data Mining)? (in plain English)
  7. Machine Learning
     "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Tom M. Mitchell
  8. Machine Learning
     ● Supervised Learning - model.fit(X, y)
     ● Unsupervised Learning - model.fit(X)
  9. Supervised Learning
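  10. For example ...
      from sklearn.linear_model import Ridge as RidgeRegression
      from sklearn import datasets
      from matplotlib import pyplot as plt
      # Note: load_boston was removed in scikit-learn 1.2, so this runs only on older releases
      boston = datasets.load_boston()
      X = boston.data    # features
      y = boston.target  # target values to predict
      clf = RidgeRegression()
      clf.fit(X, y)      # supervised: fit on features and target
      clf.predict(X)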
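
The supervised example on slide 10 can be sketched so that it runs on current scikit-learn releases. This is only an illustration: `load_diabetes` is an assumed stand-in for the Boston data used in the talk, and `alpha=1.0` is simply the library default, not a value taken from the slides.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Assumption: the bundled diabetes dataset replaces the Boston dataset from the slide.
X, y = load_diabetes(return_X_y=True)

model = Ridge(alpha=1.0)        # ridge regression with the default regularization strength
model.fit(X, y)                 # supervised learning: features AND target are passed
predictions = model.predict(X)  # one predicted value per input row
r2_train = model.score(X, y)    # R^2 measured on the training data itself

print(predictions.shape, round(r2_train, 3))
```

Scoring on the same data used for fitting, as done here and on the slide, says little about how the model behaves on new data.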
  11. Unsupervised Learning
  12. For example ...
      from sklearn.cluster import KMeans
      from numpy.random import RandomState
      rng = RandomState(42)
      # X is the feature matrix from slide 10; no target is passed
      k_means = KMeans(n_clusters=3, random_state=rng)
      k_means.fit(X)
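
A self-contained variant of the clustering example on slide 12 might look like the following. The iris dataset and the explicit `n_init=10` are assumptions made for this sketch; the slide reuses `X` from the earlier regression example instead.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: iris features serve as the data; the known labels are discarded on purpose.
X, _ = load_iris(return_X_y=True)

k_means = KMeans(n_clusters=3, n_init=10, random_state=42)
k_means.fit(X)                        # unsupervised learning: only X is passed

labels = k_means.labels_              # cluster index assigned to each training row
centers = k_means.cluster_centers_    # one centroid per cluster
new_labels = k_means.predict(X[:5])   # assign rows to the nearest learned centroid

print(centers.shape, new_labels)
```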
  13. What can Scikit-learn do?
      ● Clustering
      ● Classification
      ● Regression
  14. Terminology
      • Model: the collection of parameters you are trying to fit
      • Data: what you are using to fit the model
      • Target: the value you are trying to predict with your model
      • Features: attributes of your data that will be used in prediction
      • Methods: algorithms that will use your data to fit a model
  15. Steps for Analysis
      ● Understand the task. See how to measure the performance.
      ● Choose the source of training experience.
      ● Decide what will be input and output.
      ● Choose a set of models to approximate the output function.
      ● Choose a learning algorithm.
  16. Steps for Analysis
      ● Understand the task. See how to measure the performance. Find the right question to ask.
      ● Choose the source of training experience.
        ● Keep training and testing datasets separate. Beware of overfitting!
      ● Decide what will be input and expected output.
      ● Choose a set of models to approximate the output function. (Use dimensionality reduction.)
      ● Choose a learning algorithm. Try different ones ;)
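
The advice on slide 16 to keep training and testing data separate can be illustrated with a short sketch. The dataset, the 25% test split and the unrestricted decision tree are all assumptions chosen to make overfitting visible; none of them come from the talk.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Hold out a quarter of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A tree with no depth limit can memorize the training rows.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)  # performance on data the model has seen
test_r2 = tree.score(X_test, y_test)     # performance on held-out data

print("train R^2:", round(train_r2, 3), "test R^2:", round(test_r2, 3))
```

A large gap between the two scores is the usual symptom of overfitting, which only shows up because the test rows were kept apart.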
  17. Some Common Algorithms
      ● Principal Component Analysis
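
As a rough sketch of how Principal Component Analysis is used in scikit-learn, the following projects four-dimensional data onto two components. The iris dataset and the choice of two components are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 rows, 4 features

pca = PCA(n_components=2)           # keep the two directions of largest variance
X_2d = pca.fit_transform(X)         # fit the projection and apply it in one step

# Fraction of the total variance captured by each kept component.
explained = pca.explained_variance_ratio_

print(X_2d.shape, explained)
```

The reduced array can be plotted directly or passed to another estimator, which is the dimensionality reduction step mentioned on slide 16.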
  18. Some Common Algorithms
      ● Support Vector Machine
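
A minimal sketch of a Support Vector Machine classifier in scikit-learn follows. The dataset, the RBF kernel, `C=1.0` and the 30% test split are assumptions made for this example, not settings from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Stratify so each class appears in the same proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = SVC(kernel="rbf", C=1.0)            # C controls how strongly errors are penalized
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)      # fraction of held-out rows classified correctly
n_support = len(clf.support_)             # number of training rows kept as support vectors

print("accuracy:", round(accuracy, 3), "support vectors:", n_support)
```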
  19. Some Common Algorithms
      ● Nearest Neighbour Classifier
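
A nearest neighbour classifier labels a new point by looking at the closest training points. The sketch below assumes the iris dataset, five neighbours, and a hypothetical flower measurement invented for the example.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)  # vote among the 5 closest training rows
knn.fit(X, y)

# Hypothetical measurement: sepal length, sepal width, petal length, petal width (cm).
sample = [[5.0, 3.4, 1.5, 0.2]]

predicted = knn.predict(sample)               # majority class among the neighbours
distances, indices = knn.kneighbors(sample)   # the neighbours that cast the votes

print(predicted, y[indices[0]])
```

Printing the labels of the neighbours next to the prediction shows where the answer comes from, which makes this one of the easier models to explain.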
  20. Some Common Algorithms
      ● Decision Tree Learning
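
Decision tree learning produces rules that can be read directly. The following sketch assumes the iris dataset and caps the depth at 2 so the printed tree stays small; both choices are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

tree = DecisionTreeClassifier(max_depth=2, random_state=0)  # shallow tree for readability
tree.fit(iris.data, iris.target)

# Render the learned splits as indented if/else style text.
rules = export_text(tree, feature_names=list(iris.feature_names))
train_accuracy = tree.score(iris.data, iris.target)

print(rules)
print("training accuracy:", round(train_accuracy, 3))
```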
  21. Some Common Algorithms
      ● k-means clustering
  22. Some Common Algorithms
      ● DBSCAN clustering
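
Unlike k-means, DBSCAN does not need the number of clusters up front; it groups points that sit in dense regions and marks the rest as noise with the label -1. The sketch below uses synthetic two-moons data, and the `eps` and `min_samples` values are assumptions tuned for that toy data only.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Synthetic data: two interleaving half circles with a little noise.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the neighbourhood radius; min_samples is the density threshold for a core point.
db = DBSCAN(eps=0.2, min_samples=5)
db.fit(X)

labels = db.labels_
n_clusters = len(set(labels.tolist()) - {-1})   # ignore the noise label when counting
n_noise = int((labels == -1).sum())

print("clusters found:", n_clusters, "noise points:", n_noise)
```

On real data both parameters usually need some experimentation, since they depend on the scale and density of the features.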
  23. Some Example Use Cases
      ● Log file analysis
      ● Outlier detection
      ● Fraud detection
      ● Forecasting
      ● User patterns
  24. A few comments
      ● nltk is a good (better) choice for text processing
      ● scikit-learn is for medium-size problems
      ● for humongous projects, think of Mahout
      ● matplotlib can be used for visualization
      ● visualize it in the browser using d3.js
      ● have a look at pandas for numerical analysis
  25. Conclusion
      ● This is just the tip of the iceberg.
      ● Scikit-learn is really cool to hack with.
      ● A lot of examples (http://scikit-learn.org/stable/auto_examples/index.html)
  26. Final words
      pip install scikit-learn
      It's all on the internet. Happy hacking!
