
Simpler Machine Learning with SKLL

1. Simpler Machine Learning with SKLL. Dan Blanchard, Educational Testing Service, dblanchard@ets.org. PyData NYC 2013
2-6. Survived / Perished. Survived: first class, female, 1 sibling, 35 years old. Perished: third class, female, 2 siblings, 18 years old; second class, male, 0 siblings, 50 years old. Can we predict survival from data?
7-10. SciKit-Learn Laboratory (SKLL). It's where the learning happens.
11-12. Learning to Predict Survival. Step 1: Split up the given training set into train (80%) and dev (20%):

    $ ./make_titanic_example_data.py

    Creating titanic/train directory
    Creating titanic/dev directory
    Creating titanic/test directory
    Loading train.csv............done
    Loading test.csv........done
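A minimal sketch of the same 80/20 split, assuming pandas and scikit-learn (the real work is done by make_titanic_example_data.py in the SKLL examples directory, which also splits the columns into the four feature files used below; the file names and seed here are placeholders):

    import pandas as pd
    from sklearn.model_selection import train_test_split  # sklearn.cross_validation in 2013-era releases

    # Kaggle's labeled Titanic data; column names follow that dataset.
    full_train = pd.read_csv('train.csv')

    # Hold out 20% of the labeled rows as a development set.
    train_df, dev_df = train_test_split(full_train, test_size=0.2, random_state=0)

    train_df.to_csv('titanic/train/all.csv', index=False)
    dev_df.to_csv('titanic/dev/all.csv', index=False)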
13. Learning to Predict Survival. Step 2: Pick classifiers to try: (1) Random Forest, (2) Support Vector Machine (SVM), (3) Naive Bayes.
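The learner names used in the SKLL configs below are scikit-learn class names, so the same three models with their defaults look like this in plain scikit-learn (illustration only; SKLL may override a few scikit-learn defaults):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC

    # The three candidate classifiers from the slide, untuned.
    learners = [RandomForestClassifier(), SVC(), MultinomialNB()]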
14-26. Learning to Predict Survival. Step 3: Create a configuration file for SKLL:

    [General]
    experiment_name = Titanic_Evaluate
    task = evaluate

    [Input]
    train_location = train
    test_location = dev
    featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
    learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
    label_col = Survived

    [Output]
    results = output
    models = output

The slide callouts explain the settings: train_location is the directory with feature files for training the learner; test_location is the directory with feature files for evaluating performance; family.csv holds the number of siblings, spouses, children, and parents; misc.csv holds the departure port; socioeconomic.csv holds fare and passenger class; vitals.csv holds sex and age; results is the directory to store evaluation results; models is the directory to store trained models.
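For a single learner, the evaluate task boils down to the 2013-era SKLL API shown in the bonus slides; the merged file paths below are hypothetical, since run_experiment normally combines the four feature files for you:

    from skll import Learner, load_examples

    # Hypothetical single-file versions of the train and dev feature sets.
    train_examples = load_examples('train/combined.csv')
    dev_examples = load_examples('dev/combined.csv')

    learner = Learner('RandomForestClassifier')
    learner.train(train_examples)

    # evaluate() returns the confusion matrix, accuracy, per-class P/R/F,
    # model parameters, and the objective score.
    (conf_matrix, accuracy, prf_dict,
     model_params, obj_score) = learner.evaluate(dev_examples)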
27. Learning to Predict Survival. Step 4: Run the configuration file with run_experiment:

    $ run_experiment evaluate.cfg

    Loading train/family.csv...........done
    Loading train/misc.csv...........done
    Loading train/socioeconomic.csv...........done
    Loading train/vitals.csv...........done
    Loading dev/family.csv.....done
    Loading dev/misc.csv.....done
    Loading dev/socioeconomic.csv.....done
    Loading dev/vitals.csv.....done
    Loading train/family.csv...........done
    Loading train/misc.csv...........done
    Loading train/socioeconomic.csv...........done
    Loading train/vitals.csv...........done
    Loading dev/family.csv.....done
    ...
28. Learning to Predict Survival. Step 5: Examine results:

    Experiment Name: Titanic_Evaluate
    Training Set: train
    Test Set: dev
    Feature Set: ["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]
    Learner: RandomForestClassifier
    Task: evaluate

    +-------+------+------+-----------+--------+-----------+
    |       | 0.0  | 1.0  | Precision | Recall | F-measure |
    +-------+------+------+-----------+--------+-----------+
    | 0.000 | [97] | 18   | 0.874     | 0.843  | 0.858     |
    +-------+------+------+-----------+--------+-----------+
    | 1.000 | 14   | [50] | 0.735     | 0.781  | 0.758     |
    +-------+------+------+-----------+--------+-----------+
    (row = reference; column = predicted)
    Accuracy = 0.8212290502793296
29. Aggregate Evaluation Results:

    Learner                   Dev. Accuracy
    RandomForestClassifier    0.821
    SVC                       0.771
    MultinomialNB             0.709
30-31. Tuning the learners. Can we do better than the default hyperparameters? Add a [Tuning] section to the config:

    [General]
    experiment_name = Titanic_Evaluate
    task = evaluate

    [Input]
    train_location = train
    test_location = dev
    featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
    learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
    label_col = Survived

    [Tuning]
    grid_search = true
    objective = accuracy

    [Output]
    results = output
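Conceptually, grid_search = true with objective = accuracy does something like the following in plain scikit-learn for each learner; the parameter grid below is a toy illustration, not SKLL's actual default grid:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Illustrative grid only; SKLL ships its own default grids per learner.
    param_grid = {'C': [0.01, 0.1, 1.0, 10.0, 100.0]}
    search = GridSearchCV(SVC(), param_grid, scoring='accuracy', cv=3)
    # search.fit(X_train, y_train) would pick the C value that maximizes
    # cross-validated accuracy before the final evaluation on dev.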
32-33. Tuned Evaluation Results:

    Learner                   Untuned Accuracy    Tuned Accuracy
    RandomForestClassifier    0.821               0.849
    SVC                       0.771               0.737
    MultinomialNB             0.709               0.709
34-36. Using All Available Data. Use training and dev together to generate predictions on test:

    [General]
    experiment_name = Titanic_Predict
    task = predict

    [Input]
    train_location = train+dev
    test_location = test
    featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
    learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
    label_col = Survived

    [Tuning]
    grid_search = true
    objective = accuracy

    [Output]
    results = output
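In API terms (again the 2013-era API from the bonus slides), the predict task trains on the combined train+dev data and calls predict() on the unlabeled test set; the merged file paths are hypothetical:

    from skll import Learner, load_examples

    all_train = load_examples('train+dev/combined.csv')   # hypothetical merged file
    test_examples = load_examples('test/combined.csv')    # hypothetical merged file

    learner = Learner('RandomForestClassifier')
    learner.train(all_train)
    predictions = learner.predict(test_examples)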
37. Test Set Performance:

    Learner                   Untuned Acc.     Tuned Acc.       Untuned Acc.     Tuned Acc.
                              (Train only)     (Train only)     (Train + Dev)    (Train + Dev)
    RandomForestClassifier    0.732            0.746            0.746            0.756
    SVC                       0.608            0.617            0.612            0.641
    MultinomialNB             0.627            0.623            0.622            0.622
38-46. Advanced SKLL Features:
• Read/write .arff, .csv, .jsonlines, .megam, .ndj, and .tsv data (format sketch below)
• Parameter grids for all supported classifiers/regressors
• Parallelize experiments on DRMAA clusters
• Ablation experiments
• Collapse/rename classes from config file
• Rescale predictions to be closer to observed data
• Feature scaling
• Python API
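As a sketch of one of those formats: a .jsonlines/.ndj file is one JSON object per line, and as far as I recall each object carries an "id", a "y" label, and an "x" feature dictionary (check the SKLL docs for the exact keys). The feature names and values below are made up for illustration:

    import json

    rows = [
        {"id": "EX1", "y": 1, "x": {"Pclass": 1.0, "SibSp": 1.0, "Age": 35.0}},
        {"id": "EX2", "y": 0, "x": {"Pclass": 3.0, "SibSp": 2.0, "Age": 18.0}},
    ]
    with open('example.jsonlines', 'w') as out:
        for row in rows:
            out.write(json.dumps(row) + '\n')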
47. Currently Supported Learners. Classifiers: Linear Support Vector Machine, Logistic Regression, Multinomial Naive Bayes. Regressors: Elastic Net, Lasso, Linear. Available as both: Decision Tree, Gradient Boosting, Random Forest, Support Vector Machine.
48. Coming Soon (classifiers and regressors): AdaBoost, K-Nearest Neighbors, Stochastic Gradient Descent.
49. Acknowledgements: Mike Heilman, Nitin Madnani, Aoife Cahill.
50. References:
• Dataset: kaggle.com/c/titanic-gettingStarted
• SKLL GitHub: github.com/EducationalTestingService/skll
• SKLL Docs: skll.readthedocs.org
• Titanic configs and data-splitting script are in the examples dir on GitHub
Contact: @Dan_S_Blanchard, dan-blanchard
51. Bonus Slides
52. Cross-validation:

    [General]
    experiment_name = Titanic_CV
    task = cross_validate

    [Input]
    train_location = train+dev
    featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
    learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
    label_col = Survived

    [Tuning]
    grid_search = true
    objective = accuracy

    [Output]
    results = output
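The cross_validate task corresponds conceptually to k-fold cross-validation in scikit-learn (the API slide below uses 10 folds); a minimal sketch with placeholder data:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholder feature matrix and 0/1 survival labels, just to make the sketch runnable.
    X = np.random.rand(100, 4)
    y = np.random.randint(0, 2, size=100)

    scores = cross_val_score(RandomForestClassifier(), X, y, scoring='accuracy', cv=10)
    print(scores.mean())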
53. Cross-validation Results:

    Learner                   Avg. CV Accuracy
    RandomForestClassifier    0.815
    SVC                       0.717
    MultinomialNB             0.681
54-68. SKLL API. The slides build the following example up line by line; the callout annotations are folded in as comments:

    from skll import Learner, load_examples

    # Load training examples
    train_examples = load_examples('myexamples.megam')

    # Train a linear SVM
    learner = Learner('LinearSVC')
    learner.train(train_examples)

    # Load test examples and evaluate. evaluate() returns the confusion matrix,
    # accuracy, precision/recall/f-score for each class, the tuned model
    # parameters, and the objective function score on the test set.
    test_examples = load_examples('test.tsv')
    (conf_matrix, accuracy, prf_dict,
     model_params, obj_score) = learner.evaluate(test_examples)

    # Generate predictions from trained model
    predictions = learner.predict(test_examples)

    # Perform 10-fold cross-validation with a radial SVM.
    # cross_validate() returns per-fold evaluation results and
    # per-fold training set objective scores.
    learner = Learner('SVC')
    (fold_result_list, grid_search_scores) = learner.cross_validate(train_examples)
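Not on the slides, but a natural next step after predict() is writing the predictions out yourself with the standard library (the predict task writes similar prediction files for you); the IDs and predictions here are placeholders:

    import csv

    ids = ['EX1', 'EX2', 'EX3']     # placeholder example IDs
    predictions = [1, 0, 1]         # placeholder output from learner.predict()

    with open('predictions.csv', 'w') as out:
        writer = csv.writer(out)
        writer.writerow(['id', 'prediction'])
        writer.writerows(zip(ids, predictions))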
69. SKLL API (writing feature files):

    import numpy as np
    import os
    from skll import write_feature_file

    # Create some training examples
    classes = []
    ids = []
    features = []
    for i in range(num_train_examples):
        y = "dog" if i % 2 == 0 else "cat"
        ex_id = "{}{}".format(y, i)
        x = {"f1": np.random.randint(1, 4),
             "f2": np.random.randint(1, 4),
             "f3": np.random.randint(1, 4)}
        classes.append(y)
        ids.append(ex_id)
        features.append(x)

    # Write them to a file
    train_path = os.path.join(_my_dir, 'train', 'test_summary.jsonlines')
    write_feature_file(train_path, ids, classes, features)
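As written on the slide, num_train_examples and _my_dir are defined elsewhere; a self-contained variant might simply define them up front (the values below are assumptions made only so the snippet runs on its own, and write_feature_file is the older SKLL call from the slide):

    import os
    import numpy as np
    from skll import write_feature_file

    # Assumed values, purely to make the slide snippet self-contained.
    num_train_examples = 100
    _my_dir = os.getcwd()
    os.makedirs(os.path.join(_my_dir, 'train'), exist_ok=True)

    classes, ids, features = [], [], []
    for i in range(num_train_examples):
        y = "dog" if i % 2 == 0 else "cat"
        classes.append(y)
        ids.append("{}{}".format(y, i))
        features.append({"f1": np.random.randint(1, 4),
                         "f2": np.random.randint(1, 4),
                         "f3": np.random.randint(1, 4)})

    train_path = os.path.join(_my_dir, 'train', 'test_summary.jsonlines')
    write_feature_file(train_path, ids, classes, features)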
