Hierarchical Classification by Jurgen Van Gael


  1. Hierarchical Classification. Jurgen Van Gael.
  2. About
     • Computer scientist with a background in machine learning.
     • London Machine Learning Meetup.
     • Founder of the Math.NET numerical library.
     • Previously at Microsoft Research.
     • Data science team lead at Rangespan.
  3. Taxonomy Classification
     • Input: raw product data.
     • Output: classification models, classified product data.
     • Example taxonomy: ROOT → Electronics (Audio → Audio Cables, Amps, …; Computers, …), Clothing (Pants, T-Shirts, …), Toys (Model Rockets, …), …
  4. Pipeline: Data Collection → Feature Extraction → Training → Testing → Labelling.
  5. Feature Extraction
  6. Feature extraction example:
     • Name: INK-M50 Black Ink Cartridge (600 pages)
     • Manufacturer: Samsung
     • Description: null
     • Label: toner-inkjet-cartridges
     becomes
     "category": "toner-inkjet-cartridges", "features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages"]
     Feature extraction:
     • Text cleaning (stopword removal, lexicalisation)
     • Unigram + bigram features
     • LDA topic features
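A minimal sketch of the unigram + bigram step on the example above. The `extract_features` helper and the stopword subset are illustrative assumptions, not Rangespan's actual code, and the LDA topic features from the next slide are omitted here.

```python
import re

# Illustrative stopword subset only; the real pipeline's list is not in the deck.
STOPWORDS = {"the", "a", "an", "of", "and"}

def extract_features(name, manufacturer, description):
    """Tokenise the raw product fields into unigram + bigram features."""
    text = " ".join(field for field in (name, manufacturer, description) if field)
    tokens = [t for t in re.findall(r"[a-z0-9\-]+", text.lower())
              if t not in STOPWORDS]
    bigrams = ["_".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

print(extract_features("INK-M50 Black Ink Cartridge (600 pages)", "Samsung", None))
# ['ink-m50', 'black', 'ink', 'cartridge', '600', 'pages', 'samsung',
#  'ink-m50_black', 'black_ink', ...]
```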
  7. http://radimrehurek.com/gensim
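Slide 7 points to gensim; a small sketch of how its `LdaModel` could supply the LDA topic features, with a toy corpus and an arbitrary `num_topics` chosen purely for illustration.

```python
from gensim import corpora, models

# Token lists produced by the feature-extraction step (toy corpus for illustration).
docs = [["cartridge", "samsung", "black", "ink"],
        ["printer", "laser", "samsung"],
        ["t-shirt", "cotton", "black"]]

dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(doc) for doc in docs]
lda = models.LdaModel(bows, num_topics=2, id2word=dictionary, passes=10)

# Topic proportions for a new product can be appended as extra features.
new_bow = dictionary.doc2bow(["samsung", "ink", "cartridge"])
print(lda[new_bow])   # e.g. [(0, 0.9), (1, 0.1)]
```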
  8. Training, Testing & Labelling
  9. Hierarchical Classification: the flat approach collapses the tree over nodes D, A, C, B, E into a single 4- (or 5-) way multiclass classification.
  10. Hierarchical Classification: the hierarchical approach keeps the tree and trains one classifier per internal node, e.g. a 2-way followed by a 3-way multiclass classification (see the sketch below).
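A schematic of that top-down decomposition. The toy `TREE` layout and the `predict_node` stand-in are mine, not from the talk; a real per-node model would be a trained multiclass classifier.

```python
# One small multiclass classifier per internal node, applied top-down.
TREE = {
    "ROOT": ["Electronics", "Clothing"],        # decision at the root
    "Electronics": ["Audio", "Computers"],      # decision one level down
}

def predict_node(node, features):
    # Stand-in for a trained per-node model: pick the first child whose name
    # appears among the features, else fall back to the first child.
    for child in TREE[node]:
        if child.lower() in features:
            return child
    return TREE[node][0]

def classify(features):
    """Descend from ROOT, asking one local classifier at each internal node."""
    node = "ROOT"
    while node in TREE:
        node = predict_node(node, features)
    return node

print(classify(["audio", "cable", "gold-plated"]))   # -> "Audio"
```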
  11. Naïve Bayes, Neural Network, Logistic Regression, Support Vector Machines, … ?
  12. Logistic Regression: Model
      Per-class feature weights:
      word        printer-ink   printer-hardware
      cartridge       4.0            0.3
      the             0.0            0.0
      samsung         0.5            0.5
      black           0.5            0.3
      printer        -1.0            2.0
      ink             5.0           -1.7
      …               …              …
      For each class, for each active feature, add the weight, then exponentiate and normalise:
      Σ = 10.0 vs. Σ = -0.6  →  Pr = 0.99997 vs. 0.0003
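A runnable sketch that reproduces the slide's arithmetic with those weights. The `predict` helper is hypothetical; bias terms and the rest of the vocabulary are omitted.

```python
import math

# Per-class feature weights from the slide (printer-ink vs. printer-hardware).
WEIGHTS = {
    "printer-ink":      {"cartridge": 4.0, "the": 0.0, "samsung": 0.5,
                         "black": 0.5, "printer": -1.0, "ink": 5.0},
    "printer-hardware": {"cartridge": 0.3, "the": 0.0, "samsung": 0.5,
                         "black": 0.3, "printer": 2.0, "ink": -1.7},
}

def predict(features):
    """Sum the weights of the active features per class, then softmax-normalise."""
    scores = {c: sum(w.get(f, 0.0) for f in features) for c, w in WEIGHTS.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

features = ["cartridge", "samsung", "black", "ink", "ink-m50", "pages"]
print(predict(features))   # printer-ink ≈ 0.99998, printer-hardware ≈ 0.00002
```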
  13. Logistic Regression: Inference
      • Optimise using Wapiti.
      • Hyperparameter optimisation using grid search.
      • Using a development set to stop training?
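A sketch of grid search with development-set selection, since that is the combination the slide mentions. The `train` and `dev_error` stubs, the grid values, and the toy error surface are placeholder assumptions; the real pipeline trains each candidate with Wapiti.

```python
from itertools import product

def train(l1, l2, train_data):
    # Stand-in for training one model (the talk uses Wapiti); returns a token.
    return ("model", l1, l2)

def dev_error(model, dev_data):
    # Stand-in for measuring error on the held-out development set.
    _, l1, l2 = model
    return abs(l1 - 0.1) + abs(l2 - 1.0)      # toy error surface for illustration

def grid_search(train_data, dev_data,
                l1_grid=(0.0, 0.1, 1.0, 10.0),
                l2_grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Keep the (l1, l2) pair whose model scores lowest on the development set."""
    best = None
    for l1, l2 in product(l1_grid, l2_grid):   # 4 x 5 = 20 points on the grid
        model = train(l1, l2, train_data)
        err = dev_error(model, dev_data)
        if best is None or err < best[1]:
            best = (model, err, (l1, l2))
    return best

print(grid_search(train_data=None, dev_data=None))
# -> (('model', 0.1, 1.0), 0.0, (0.1, 1.0))
```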
  14. http://wapiti.limsi.fr/
  15. Root-level classifier: ROOT → Electronics, Clothing.
  16. Cross Validation & Calibration
      Cross validation:
      • Estimate classifier errors.
      • DO NOT: test on training data, or simply leave data aside.
      • 5-fold CV on the training data: fold errors of 1.2%, 1.1%, 1.2%, 1.2%, 1.3% average to an estimate of 1.2%.
      Calibration:
      • Are my probability estimates correct?
      • Computation: take the data points with p(·|x) = 0.9 and check that about 90% of their labels were correct (see the sketch below).
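One way that calibration check could be computed, as a hedged sketch: bin the predictions by confidence and compare each bin's average confidence with its empirical accuracy. The `calibration_table` helper and the ten-bin choice are assumptions of mine, not from the deck.

```python
from collections import defaultdict

def calibration_table(predictions, n_bins=10):
    """Group (predicted probability, was the label correct) pairs into confidence
    bins and compare each bin's mean confidence with its empirical accuracy."""
    bins = defaultdict(list)
    for prob, correct in predictions:
        bins[min(int(prob * n_bins), n_bins - 1)].append((prob, correct))
    table = []
    for b in sorted(bins):
        items = bins[b]
        mean_conf = sum(p for p, _ in items) / len(items)
        accuracy = sum(1 for _, c in items if c) / len(items)
        table.append((mean_conf, accuracy, len(items)))
    return table   # well calibrated when mean_conf ≈ accuracy in every bin

# e.g. points predicted with p ≈ 0.9 should be right about 90% of the time:
preds = [(0.9, True)] * 9 + [(0.9, False)]
print(calibration_table(preds))   # ≈ [(0.9, 0.9, 10)]
```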
  17. ROOT → Electronics, Clothing. Using Bayes' rule to chain the per-node classifiers:
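The formula itself is an image in the deck and does not survive in the transcript. A plausible reading, stated here as an assumption, is that a leaf's probability is the product of the conditional probabilities along its path, e.g. p(audio-cables | text) = p(Electronics | ROOT, text) · p(Audio | Electronics, text) · p(audio-cables | Audio, text). A toy sketch with made-up node names and probabilities:

```python
# Chaining per-node classifier outputs into a leaf probability.
# The node names and probability values are illustrative, not from the talk.
path_probs = {
    ("ROOT", "Electronics"): 0.92,        # p(Electronics | ROOT, text)
    ("Electronics", "Audio"): 0.80,       # p(Audio | Electronics, text)
    ("Audio", "audio-cables"): 0.95,      # p(audio-cables | Audio, text)
}

def leaf_probability(path):
    """Multiply the conditional probabilities along a root-to-leaf path."""
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        prob *= path_probs[(parent, child)]
    return prob

print(leaf_probability(["ROOT", "Electronics", "Audio", "audio-cables"]))  # ≈ 0.70
```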
  18. Active Learning
  19. ROOT → Electronics, Clothing; example prediction: p(electronics | {text}) = 0.1.
  20. Acting on the predictions (see the sketch below):
      • High-probability data points: upload to production.
      • Low-probability data points: subsample and acquire more labels, e.g. via Mechanical Turk.
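A small sketch of that routing step. The 0.5 threshold, the sample size, and the `route_for_labelling` name are illustrative assumptions rather than Rangespan's actual values.

```python
import random

def route_for_labelling(classified, threshold=0.5, sample_size=100, seed=0):
    """Split classified products by confidence: confident ones go straight to
    production, a random subsample of the uncertain ones is sent for labelling."""
    confident = [item for item, prob in classified if prob >= threshold]
    uncertain = [item for item, prob in classified if prob < threshold]
    random.Random(seed).shuffle(uncertain)
    return confident, uncertain[:sample_size]

classified = [("ink-m50 cartridge", 0.97), ("mystery gadget", 0.10)]
to_production, to_labelling = route_for_labelling(classified)
print(to_production, to_labelling)   # ['ink-m50 cartridge'] ['mystery gadget']
```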
  21. Implementation
  22. Implementation: data moves between MongoDB and S3 buckets (raw, training data, models) through four steps: 1. JSON export, 2. Feature extraction, 3. Training, 4. Classification.
  23. Training MapReduce
      • Dumbo on Hadoop.
      • 2,000 classifiers × 5-fold CV (+ a full training run) × 20 hyperparameter settings on the grid = 200,000 training runs.
  24. Labelling
      • 128 chunks (Chunk 1, Chunk 2, Chunk 3, … Chunk N).
      • Run the full classifier cascade over the tree (D, A, C, B, E) on each chunk.
  25. Thoughts
      • Extras:
        o Partial labelling: stop when the probability becomes low.
        o Data ensemble learning.
      • Most time was spent on feature engineering.
      • Tie the parameters of the classifiers? ("Frustratingly Easy Domain Adaptation", Hal Daumé III)
      • Partially flattening the hierarchy for training?
