Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Intro to scikit-learn

In this talk by AWeber's Michael Becker, you will get a brief overview of Machine Learning and scikit-learn. This is a scaled down version of this talk from Pycon 2013: http://github.com/jakevdp/sklearn_pycon2013

  • Be the first to comment

Intro to scikit-learn

  1. 1. Intro to scikit-learn Michael Becker PyData Boston 2013
  2. 2. Who is this guy? Software Engineer @ AWeber Founder of the DataPhilly Meetup group @beckerfuffle beckerfuffle.com These slides and more @ github.com/mdbecker
  3. 3. On the shoulders of giants • Machine Learning 101 tutorial from scikit-learn.
  4. 4. On the shoulders of giants • Machine Learning 101 tutorial from scikit-learn. • IPython notebooks from pycon 2013.
  5. 5. What is Machine Learning?
  6. 6. What is Machine Learning?
  7. 7. Data in scikit-learn • Stored as a 2d-array • [n_samples, n_features] • n_samples: items to process • n_features: distinct traits
  8. 8. The Iris Dataset
  9. 9. The Iris Dataset
  10. 10. The Iris Dataset
  11. 11. The Iris Dataset: Loading
  12. 12. The Iris Dataset: Loading
  13. 13. The Iris Dataset: Loading
  14. 14. The Iris Dataset: Loading
  15. 15. The Iris Dataset: Loading
  16. 16. The Iris Dataset: Loading
  17. 17. The Iris Dataset: Loading
  18. 18. Machine Learning: Supervised
  19. 19. Machine Learning: Unsupervised
  20. 20. Scikit-learn's interface
  21. 21. Scikit-learn's interface
  22. 22. Feature Extraction Often data is unstructured & non-numerical: •Text documents
  23. 23. Feature Extraction Often data is unstructured & non-numerical: •Text documents •Images
  24. 24. Feature Extraction Often data is unstructured & non-numerical: •Text documents •Images •Sounds
  25. 25. Supervised Learning: Classification
  26. 26. Supervised Learning: Classification
  27. 27. Supervised Learning: Classification
  28. 28. Supervised Learning: Classification
  29. 29. Supervised Learning: Classification
  30. 30. Supervised Learning: Classification
  31. 31. Supervised Learning: Classification
  32. 32. Supervised Learning: Classification
  33. 33. Supervised Learning: Classification
  34. 34. Supervised Learning: Classification • Email classification • Language identification • New article categorization • Sentiment analysis • Facial recognition • ...
  35. 35. Unsupervised Learning
  36. 36. Dimensionality Reduction
  37. 37. Principal Component Analysis
  38. 38. Unsupervised Learning: PCA
  39. 39. Unsupervised Learning: PCA
  40. 40. Unsupervised Learning: PCA
  41. 41. Unsupervised Learning: PCA
  42. 42. Validation & Testing
  43. 43. Validation & Testing
  44. 44. Validation & Testing
  45. 45. Overfitting
  46. 46. Cross-Validation
  47. 47. Cross-Validation
  48. 48. Cross-Validation
  49. 49. Additional Resources • Machine Learning 101 tutorial from scikit-learn.
  50. 50. Additional Resources • IPython notebooks from pycon 2013.
  51. 51. My info Tweet me @beckerfuffle Find me at beckerfuffle.com These slides and more @ github.com/mdbecker

×