Introduction to Machine Learning with Python and scikit-learn

A PyATL talk that introduces machine learning and shows how to apply it with Python and scikit-learn. Includes simple examples with code and results.

Published in: Education, Technology

Introduction to Machine Learning with Python and scikit-learn

  1. Introduction to Machine Learning with Python and scikit-learn. Python Atlanta, Nov. 14th, 2013. Matt Hagy, matt@liveramp.com
  2. Machine Learning (ML):
       • Finding patterns in data
       • Modeling patterns
       • Using models to make predictions
  3. ML can be easy*
       • You already have ML applications!
       • You can start applying ML methods now with Python & scikit-learn
       • Theoretical knowledge of ML not needed (initially)*
     *Gaining more background, theory, and experience will help
  4. Simple Example
  5. Simple Model
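  6. import numpy as np
     from sklearn.linear_model import LinearRegression

     # np.load on an .npz file returns a dict-like archive, so index it by key.
     # Assumption: the arrays are stored under the keys 'x' and 'y'.
     with np.load('data.npz') as data:
         x, y = data['x'], data['y']
     x_test = np.linspace(0, 200)

     # scikit-learn expects 2-D inputs of shape (n_samples, n_features)
     model = LinearRegression()
     model.fit(x[:, np.newaxis], y)
     y_test = model.predict(x_test[:, np.newaxis])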
  7. (No text on this slide.)
  8. Variance/Bias Trade-Off
       • Need models that can adapt to relationships in our data
       • Highly adaptable models can over-fit and will not generalize
       • Regularization: a common strategy to address the variance/bias trade-off
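The trade-off on this slide can be sketched with a small experiment. This is a minimal illustration, not code from the talk: the synthetic data, the degree-9 polynomial, and the two `alpha` values are all assumptions chosen to make the effect visible. It fits the same flexible model twice with scikit-learn's `Ridge`, once with almost no regularization and once with a stronger penalty.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): a smooth curve plus noise.
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
X = x[:, np.newaxis]  # shape (n_samples, 1)


def fit_poly_ridge(alpha):
    """Fit a degree-9 polynomial regression with L2 penalty strength alpha."""
    model = make_pipeline(
        PolynomialFeatures(degree=9, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    return model.fit(X, y)


weak = fit_poly_ridge(alpha=1e-6)   # nearly unregularized: free to chase noise
strong = fit_poly_ridge(alpha=1.0)  # regularized: coefficients are shrunk

# The coefficient norm is a rough proxy for how flexible the fitted curve is.
weak_norm = np.linalg.norm(weak.named_steps["ridge"].coef_)
strong_norm = np.linalg.norm(strong.named_steps["ridge"].coef_)

# Training error alone favors the flexible model, which is why it misleads.
weak_mse = mean_squared_error(y, weak.predict(X))
strong_mse = mean_squared_error(y, strong.predict(X))

print("coefficient norm:", weak_norm, strong_norm)
print("training MSE:    ", weak_mse, strong_mse)
```

The weakly regularized model reaches a lower training error but does so with much larger coefficients, which is the over-fitting behavior the slide warns about. Judging which setting generalizes better would need held-out data, which this sketch leaves out.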
  9. (No text on this slide.)
  10. import numpy as np
      from sklearn.svm import SVR
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler

      # Assumption: the arrays are stored under the keys 'x' and 'y'.
      with np.load('data.npz') as data:
          x, y = data['x'], data['y']
      x_test = np.linspace(0, 200)

      model = Pipeline([
          ('standardize', StandardScaler()),
          # C is the regularization term (smaller C means stronger regularization)
          ('svr', SVR(kernel='rbf', verbose=0, C=5e6, epsilon=20)),
      ])
      model.fit(x[:, np.newaxis], y)
      y_test = model.predict(x_test[:, np.newaxis])
  11. Supervised Learning: modeling the relationship between inputs and outputs. Each row is one sample.

       Input, X   Output, Y
           0          1
           3          6
           1          3
           3          7
           4          9
           2          3
           9         17
           3          6
           4          7
  12. Multiple Inputs. Each row is one sample; the input X now has several columns.

       X1   X2   X3   …   Xn   Output, Y
        0    2    1   …    4       1
        3    3    0   …    7       6
        1    1    3   …    0       3
        3    6    1   …    2       7
        4    8    2   …    9       9
        2    9    7   …    1       3
        9    1    5   …    3      17
        3    2    4   …    2       6
        4    3    2   …    1       7
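In scikit-learn, multiple inputs are passed as a 2-D array with one row per sample and one column per input. The sketch below uses made-up numbers (not the values from the slide) and a known linear rule for the outputs, so the fitted weights can be compared against the rule that generated them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 6 samples (rows) with 3 input features each (columns).
X = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 1.0, 0.0],
    [4.0, 2.0, 5.0],
    [5.0, 3.0, 1.0],
])

# Outputs built from a known linear rule (assumed weights and offset).
true_weights = np.array([2.0, -1.0, 0.5])
y = X @ true_weights + 1.0

model = LinearRegression()
model.fit(X, y)  # X already has shape (n_samples, n_features)

# Predict for one new sample; the input must still be 2-D.
new_sample = np.array([[1.0, 1.0, 1.0]])
prediction = model.predict(new_sample)

print("weights:", model.coef_)
print("offset: ", model.intercept_)
print("prediction for new sample:", prediction)
```

Because the outputs here contain no noise, the fitted weights match the generating rule; with real data they would only approximate the underlying relationship.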
  13. Example: Image Classification
       • Classify handwritten digits with ML models
       • Each input is an entire image
       • Output is the digit in the image
  14. Input, X: an image of a handwritten digit. Output, Y: the digit shown (the two examples on the slide are labeled 9 and 2).
  15. import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      with np.load('train.npz') as data:
          pixels_train = data['pixels']
          labels_train = data['labels']
      with np.load('test.npz') as data:
          pixels_test = data['pixels']

      # Flatten each image into a 1-D feature vector
      X_train = pixels_train.reshape(pixels_train.shape[0], -1)
      X_test = pixels_test.reshape(pixels_test.shape[0], -1)

      model = RandomForestClassifier(n_estimators=50)
      model.fit(X_train, labels_train)
      labels_test = model.predict(X_test)
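The `train.npz` and `test.npz` files from the slide are not part of this transcript. As a stand-in, the sketch below runs the same flatten-then-fit steps on the small 8x8 digits dataset bundled with scikit-learn. The dataset, the 75/25 split, and the fixed `random_state` values are assumptions made so the example can run on its own; they are not the talk's data or settings.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data: 1,797 grayscale images of handwritten digits, 8x8 pixels each.
digits = load_digits()
pixels = digits.images   # shape (n_samples, 8, 8)
labels = digits.target   # the digit (0-9) shown in each image

# Flatten each image into a 1-D feature vector, as on the slide.
X = pixels.reshape(pixels.shape[0], -1)

# Hold out a quarter of the labeled images to measure accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
print("held-out accuracy:", accuracy)
```

Unlike the slide, this version keeps the labels for the test images, which makes it possible to check how often the predicted digit is correct.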
  16. Predicting the tags of Stack Overflow questions with machine learning (Kaggle Data Science Competition)
       • Given 6 million training questions labeled with tags
       • Predict the tags for 2 million unlabeled test questions
      www.users.globalnet.co.uk/~slocks/instructions.html
      stackoverflow.com/questions/895371/bubble-sort-homework
  17. Text Classification Overview: Raw Posts → Feature Extraction & Selection → Vector Space → Model Selection & Training → Machine Learning Model
  18. Term Frequency Feature Extraction
      Characterize text by the frequency of specific words in each text entry.
      Example Title: “Why is processing a sorted array faster than processing an array that is not sorted?”
      Term Frequencies: why 1, processing 2, sorted 2, array 2, faster 1
      Ignore common words (i.e., stop words)
  19. Frequency of key terms is anticipated to be correlated with the tags of the question.

                why  processing  sorted  array  faster  need  help  java  homework
       Title 1   1       2         2       2      1      0     0     0      0
       Title 2   0       0         0       0      0      1     1     1      1
       Title 3   0       0         1       1      0      0     1     0      1
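Term-frequency extraction of this kind can be sketched with scikit-learn's `CountVectorizer`. Two things here are assumptions for illustration: the wording of the second and third titles (the slide gives only their counts), and the use of a fixed `vocabulary` of the nine key terms, which is a simple way to ignore every other word, stop words included.

```python
from sklearn.feature_extraction.text import CountVectorizer

# The nine key terms, in the column order used on the slide.
key_terms = [
    "why", "processing", "sorted", "array", "faster",
    "need", "help", "java", "homework",
]

titles = [
    # Title 1 is the example from the talk.
    "Why is processing a sorted array faster than processing "
    "an array that is not sorted?",
    # Titles 2 and 3 are hypothetical wordings chosen to match the counts.
    "Need help with Java homework",
    "Help with sorted array homework",
]

# With a fixed vocabulary, words outside key_terms are simply not counted.
vectorizer = CountVectorizer(vocabulary=key_terms)
counts = vectorizer.transform(titles).toarray()

# One row per title, one column per key term.
for title_number, row in enumerate(counts, start=1):
    print("Title", title_number, row.tolist())
```

Each row is the vector-space representation of one title, which is the form a model needs as input. Letting `CountVectorizer` learn the vocabulary and passing `stop_words="english"` is the more usual setup for a large corpus, but the built-in stop list may drop or keep different words than the slide does.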
  20. Example Model Coefficients
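The transcript keeps only this slide's title, so the model behind the coefficients is not known. As one possible illustration, a linear classifier trained on term counts learns one coefficient per term, and those coefficients can be read directly. The titles, the labels, and the choice of `LogisticRegression` below are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical titles and labels (1 = tagged "java", 0 = not tagged "java").
titles = [
    "java homework help",
    "java array sorted",
    "need help with java",
    "python array sorted faster",
    "python homework help",
    "why is python faster",
]
is_java = [1, 1, 1, 0, 0, 0]

# Turn each title into a vector of term counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(titles)

# A linear model learns one coefficient per term.
model = LogisticRegression()
model.fit(X, is_java)

coefficients = model.coef_[0]
vocab = vectorizer.vocabulary_  # maps each term to its column index
java_weight = coefficients[vocab["java"]]
python_weight = coefficients[vocab["python"]]

# Print terms from most negative to most positive coefficient.
for term, column in sorted(vocab.items(), key=lambda item: coefficients[item[1]]):
    print(f"{term:10s} {coefficients[column]:+.3f}")
```

A positive coefficient means the term pushes the prediction toward the tag and a negative one pushes away from it, which makes linear models easy to inspect for text problems.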
  21. ML can be easy*
       • You already have ML problems!
       • You can start applying ML methods now with Python & scikit-learn
       • Theoretical knowledge of ML not needed (initially)*
      scikit-learn.org
      github.com/scikit-learn
  22. Helping companies use their marketing data to delight customers
      Opportunities:
       • Backend Engineers
       • Data Scientists
       • Full-Stack Engineers
      Tools:
       • Java
       • Hadoop (Map/Reduce)
       • Ruby
      Build and work with large distributed systems that process massive data sets.
      Check out: liveramp.com/careers
