Successfully reported this slideshow.
Upcoming SlideShare
×

Pycon 2012 Scikit-Learn

2,732 views

Published on

Presentation for PyCon India 2012

Published in: Technology, Education
• Full Name
Comment goes here.

Are you sure you want to Yes No
• will be updated soon! :)

Are you sure you want to  Yes  No
• under construction???

Are you sure you want to  Yes  No

Pycon 2012 Scikit-Learn

1. 1. Learning from the Past: with Scikit-Learn ANOOP THOMAS MATHEW Profoundis Labs Pvt. Ltd.
2. 2. Agenda● Basics of Machine Learning● Introduction some common techniques● Let you know scikit-learn exists● Some inspiration on using machine learning in daily life scenarios and live projects.
3. 3. How to draw a snake?How to draw a snake?
4. 4. How to draw a snake?How to draw a snake? IR D! W E hi s is T
5. 5. IntroductionA lot of Data!What to do???
6. 6. IntroductionWhat is Machine Learning (Data Mining)? (in plain english)
7. 7. Machine Learning "A computer program is said to learn fromexperience E with respect to some class of tasksT and performance measure P, if its performanceat tasks in T, as measured by P, improves withexperience E" Tom M. Mitchell
8. 8. Machine Learning● Supervised Learning - model.fit(X, y)● Unsupervised Learning - model.fit(X)
9. 9. Supervised Learning
10. 10. For example ...from sklearn.linear_model import Ridge as RidgeRegressionfrom sklearn import datasetsfrom matplotlib import pyplot as pltboston = datasets.load_boston()X = boston.datay = boston.targetclf = RidgeRegression()clf.fit(X, y)clf.predict(X)
11. 11. Unsupervised Learning
12. 12. For example ...from sklearn.cluster import KMeansfrom numpy.random import RandomStaterng = RandomState(42)k_means = KMeans(3, random_state=rng)k_means.fit(X)
13. 13. What can Scikit-learn do?ClusteringClassificationRegression
14. 14. Terminology• Model the collection of parameters you are trying to fit• Data what you are using to fit the model• Target the value you are trying to predict with your model• Features attributes of your data that will be used in prediction• Methods algorithms that will use your data to fit a model
15. 15. Steps for Analysis● Understand the task. See how to measure the performance.● Choose the source of training experience.● Decide what will be input and output.● Choose a set of models to the output function.● Choose a learning algorithm.
16. 16. Steps for Analysis● Understand the task. See how to measure the performance. Find the right question to ask.● Choose the source of training experience. ● Keep training and testing dataset separate. Beware of overfitting !● Decide what will be input and expected output.● Choose a set of models to approximate the output function. (use dimensinality reduction)● Choose a learning algorithm. Try different ones ;)
17. 17. Some Common Algorithms Some Common AlgorithmsPrincipal Component Analysis
18. 18. Some Common Algorithms Some Common Algorithms● Support Vector Machine
19. 19. Some Common Algorithms Some Common Algorithms● Nearest Neighbour Classifier
20. 20. Some Common Algorithms Some Common Algorithms● Decision Tree Learning
21. 21. Some Common Algorithms Some Common Algorithmsk-means clustering
22. 22. Some Common Algorithms Some Common AlgorithmsDB SCAN Clustering
23. 23. Some Example Usecases Some Example Usecases● Log file analysis● Outlier dectection● Fraud Dectection● Forcasting● User patterns
24. 24. A few comments A few comments● nltk is a good(better) for text processing● scikit-learn is for medium size problems● for humongous projects, think of mahout● matplotlib can be used for visualization● visualize it in browser using d3.js● have a look at pandas for numerical analysis
25. 25. Conclusion Conclusion● This is just the tip of an iceberg.● Scikit-learn is really cool to hack with.● A lot of examples(http://scikit-learn.org/stable/auto_examples/index.html)
26. 26. Final words Final wordspip install scikit-learnIts all in the internet. Happy Hacking!