Machine Learning Fundamentals

SigOpt
Apr. 4, 2018

  1. Machine Learning Fundamentals
     Alexandra Johnson - Software Engineer, SigOpt
     alexandra@sigopt.com, Twitter: @alexandraj777
     © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  6. What Is Machine Learning? (Don't say "machine" or "learning")
     A solution to a problem that improves with data
     Data: emails, articles, images, list of homes
     Problem: label an email as spam (classification), predict a home's price (regression), and others
  7. Example: Classify Spam Emails
     ● Problem: quickly identify whether an email is spam or not spam
     ● Data: a list of emails and a matching list of "labels" (spam or not spam)
     ● Goal: a function that will correctly label never-before-seen emails as spam or not spam
  8. Build - Train - Tune - Deploy
     ● Pick a model: xgboost, random forest, mxnet CNN, etc.
     ● Transform your data to be readable by the model
     ● Feature engineering: explore your data to pick out the information you think is important
  9. Build - Train - Tune - Deploy: Example
     ● Model: random forest
     ● Features: percentage of misspelled words, number of words from a blacklist, domain name of email sender

         def extract_features(email):
             return [
                 email.misspelled_words,
                 email.words_on_blacklist,
                 email.sender.domain,
             ]
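As a minimal sketch of how these three features could be computed from raw text, assume an email is a plain dictionary with hypothetical `sender` and `body` keys, and that `KNOWN_WORDS` and `BLACKLIST` are tiny stand-in word lists (a real system would need a full dictionary and a maintained blacklist). None of these names come from the slides, and the first feature is returned as a fraction between 0 and 1 rather than a percentage.

```python
import re

# Hypothetical stand-in word lists; assumptions for illustration only.
KNOWN_WORDS = {
    "hello", "please", "see", "the", "attached", "report",
    "click", "here", "to", "win", "free", "money", "now",
}
BLACKLIST = {"free", "money", "win", "click"}


def extract_features(email):
    """Return [fraction of misspelled words, blacklist word count, sender domain]."""
    words = re.findall(r"[a-z']+", email["body"].lower())
    # Treat any word missing from the stand-in dictionary as misspelled.
    misspelled = sum(1 for w in words if w not in KNOWN_WORDS)
    fraction_misspelled = misspelled / len(words) if words else 0.0
    blacklist_count = sum(1 for w in words if w in BLACKLIST)
    # Everything after the last "@" is taken as the sender's domain.
    domain = email["sender"].rsplit("@", 1)[-1].lower()
    return [fraction_misspelled, blacklist_count, domain]


example = {
    "sender": "promo@Example.com",
    "body": "Click here to win free monny now",
}
features = extract_features(example)
```

Here one of the seven words ("monny") is missing from the stand-in dictionary and three words are on the blacklist, so the result has the same shape as the rows used in the training example.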
  10. Build - Train - Tune - Deploy
     ● Expose the model to your data so it can better solve your problem
     ● Think of a model as a class whose training method has already been implemented
     ● Compute-intensive; best done on a server
  11. Build - Train - Tune - Deploy: Example
     ● Model: random forest
     ● Features: percentage of misspelled words, number of words from a blacklist, domain name of email sender

         email_features = [
             [0.1, 1, 'hotmail.com'],
             [0.7, 20, 'gmail.com'],
             [0.3, 92, 'yahoo.com'],
         ]
         labels = [0, 1, 1]
         model = RandomForest()
         model.train(email_features, labels)
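The rows above mix numbers with a string (the sender domain). Many model implementations accept only numeric inputs, so one common transformation is one-hot encoding, where each distinct domain becomes its own 0/1 column. The sketch below is one possible approach, not necessarily what the speaker used; `one_hot_encode_domains` is a hypothetical helper.

```python
def one_hot_encode_domains(rows, domain_index=2):
    """Replace the string column at domain_index with 0/1 indicator columns."""
    # Sort so the column order is stable across runs.
    domains = sorted({row[domain_index] for row in rows})
    encoded = []
    for row in rows:
        numeric = [value for i, value in enumerate(row) if i != domain_index]
        indicators = [1 if row[domain_index] == d else 0 for d in domains]
        encoded.append(numeric + indicators)
    return encoded, domains


email_features = [
    [0.1, 1, 'hotmail.com'],
    [0.7, 20, 'gmail.com'],
    [0.3, 92, 'yahoo.com'],
]
encoded_features, domain_columns = one_hot_encode_domains(email_features)
# encoded_features[0] is [0.1, 1, 0, 1, 0]: the two numeric features followed
# by indicators for gmail.com, hotmail.com, yahoo.com (alphabetical order).
```

A domain that never appeared in the training data has no column of its own, so a deployed system would need a policy for it, such as encoding it as all zeros.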
  12. Build - Train - Tune - Deploy
     ● Models have tunable knobs, aka "hyperparameters"
     ● Different hyperparameters = different performance
     ● Use a training data set for training and a separate validation data set for measuring performance
     ● Overfitting: your model is really good on your old data, but really bad on never-before-seen data
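To make the overfitting bullet concrete, the sketch below uses a deliberately simple stand-in model, a 1-nearest-neighbor classifier that memorizes its training data, on synthetic data with noisy labels. This is not the random forest from the slides, and all names and data are invented for illustration. Because each training point is its own nearest neighbor, accuracy on the training set is perfect, while accuracy on the held-out validation set is typically lower.

```python
import random


def make_noisy_data(n, rng, noise=0.2):
    """Points in the unit square; label is 1 when x + y > 1, flipped with probability `noise`."""
    data, labels = [], []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 1 if x + y > 1 else 0
        if rng.random() < noise:
            label = 1 - label
        data.append((x, y))
        labels.append(label)
    return data, labels


def predict_1nn(train_data, train_labels, point):
    """Return the label of the closest training point (squared Euclidean distance)."""
    best = min(
        range(len(train_data)),
        key=lambda i: (train_data[i][0] - point[0]) ** 2
        + (train_data[i][1] - point[1]) ** 2,
    )
    return train_labels[best]


def accuracy(train_data, train_labels, data, labels):
    """Fraction of `data` points whose predicted label matches `labels`."""
    correct = sum(
        predict_1nn(train_data, train_labels, point) == label
        for point, label in zip(data, labels)
    )
    return correct / len(data)


rng = random.Random(0)
data, labels = make_noisy_data(300, rng)
train_data, train_labels = data[:200], labels[:200]
validation_data, validation_labels = data[200:], labels[200:]

train_accuracy = accuracy(train_data, train_labels, train_data, train_labels)
validation_accuracy = accuracy(
    train_data, train_labels, validation_data, validation_labels
)
print(f"train accuracy:      {train_accuracy:.2f}")
print(f"validation accuracy: {validation_accuracy:.2f}")
```

The gap between the two printed numbers is the signal to watch: a score measured only on the training data would make this model look flawless.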
  13. Build - Train - Tune - Deploy: Example

         def evaluate(num_leaves, max_depth):
             train_data, train_labels, validation_data, validation_labels = split(email_features, labels)
             model = RandomForest(num_leaves=num_leaves, max_depth=max_depth)
             model.train(train_data, train_labels)
             validation_score = model.score(validation_data, validation_labels)
             return validation_score
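The `evaluate` function above relies on a `split` helper that the slide does not define, and it leaves open how the hyperparameter values are chosen. The sketch below shows one way such a helper and a simple tuning loop might look. `split`, `grid_search`, and the toy scoring function are all hypothetical, and exhaustive grid search is only one of several tuning strategies.

```python
import itertools
import random


def split(features, labels, validation_fraction=0.25, seed=0):
    """Shuffle the rows, then hold out a fraction of them for validation."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_validation = max(1, int(len(indices) * validation_fraction))
    validation_idx, train_idx = indices[:n_validation], indices[n_validation:]
    # Same return order as the slide: train data, train labels,
    # validation data, validation labels.
    return (
        [features[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [features[i] for i in validation_idx],
        [labels[i] for i in validation_idx],
    )


def grid_search(evaluate, grid):
    """Call evaluate(**params) for every combination in grid; return the best."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(num_leaves, max_depth):
    """Stand-in for the slide's evaluate(); peaks at num_leaves=8, max_depth=4."""
    return -((num_leaves - 8) ** 2) - (max_depth - 4) ** 2


best_params, best_score = grid_search(
    toy_evaluate,
    {"num_leaves": [2, 8, 32], "max_depth": [2, 4, 8]},
)
```

In practice `toy_evaluate` would be replaced by a function that trains a real model and returns its validation score, which makes every call far more expensive than it is here.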
  14. Build - Train - Tune - Deploy
     ● We train our model to solve our problem on old data, but what we really want is to solve our problem on new data
     ● Create a REST endpoint for accessing the model
     ● A/B test different versions
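One common way to run the A/B test mentioned above is to assign each user to a model version deterministically, so the same user always sees the same version. The sketch below is an assumption about how that might be done, not something the slides specify; `assign_variant` and the version names are hypothetical.

```python
import hashlib


def assign_variant(user_id, variants=("model_a", "model_b")):
    """Map a user ID to one variant; the same ID always maps to the same variant."""
    # Hash the ID so assignment does not depend on arrival order or server state.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


assignments = {uid: assign_variant(uid) for uid in ("alice", "bob", "carol")}
```

Hashing avoids having to store the assignment anywhere, at the cost of not being able to rebalance traffic without changing the mapping.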
  15. Build - Train - Tune - Deploy: Example

         # Assumes best_hyperparameters is a dict of keyword arguments
         model = RandomForest(**best_hyperparameters)
         # Train on extracted features rather than on the raw emails
         model.train([extract_features(email) for email in emails], labels)

         def is_spam(email):
             email_features = extract_features(email)
             return model.predict(email_features)
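As a rough sketch of the REST endpoint idea, the code below wraps a spam check in a small HTTP handler using only Python's standard library. The `/is_spam` path, the JSON request shape, and the keyword-based `is_spam` stand-in (used here in place of the trained model) are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

BLACKLIST = {"free", "money", "winner"}  # hypothetical stand-in for a trained model


def is_spam(email_body):
    """Stand-in for model.predict: flags emails with two or more blacklist words."""
    words = email_body.lower().split()
    return sum(word in BLACKLIST for word in words) >= 2


def handle_request(path, raw_body):
    """Map a request path and raw body to (status code, JSON response string)."""
    if path != "/is_spam":
        return 404, json.dumps({"error": "not found"})
    try:
        body = json.loads(raw_body)["body"]
    except (ValueError, KeyError, TypeError):
        body = None
    if not isinstance(body, str):
        return 400, json.dumps({"error": 'expected JSON like {"body": "..."}'})
    return 200, json.dumps({"is_spam": is_spam(body)})


class SpamHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.path, self.rfile.read(length))
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), SpamHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping the request logic in the plain function `handle_request` means it can be checked without opening a socket, and the stand-in `is_spam` can later be swapped for a call into a trained model.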
  16. Thanks! Questions?
     alexandra@sigopt.com, Twitter: @alexandraj777