Hierarchical Classification by Jurgen Van Gael
 

    Presentation Transcript

    • Hierarchical Classification. Jurgen Van Gael.
    • About
      • Computer scientist with a background in machine learning.
      • London Machine Learning Meetup.
      • Founder of the Math.NET numerical library.
      • Previously at Microsoft Research.
      • Data science team lead at Rangespan.
    • Taxonomy Classification
      • Input: raw product data
      • Output: classification models, classified product data
      [Diagram: taxonomy tree. ROOT → Electronics (Audio → Audio Cables, Amps, …; Computers, …), Clothing (Pants, T-Shirts, …), Toys (Model Rockets, …), …]
    • Pipeline: Data Collection → Feature Extraction → Training → Testing → Labelling
    • Feature Extraction
    • Example record:
      Name: INK-M50 Black Ink Cartridge (600 pages)
      Manufacturer: Samsung
      Description: null
      Label: toner-inkjet-cartridges
      Extracted: "category": "toner-inkjet-cartridges", "features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages"]
      Feature extraction:
      • Text cleaning (stopword removal, lexicalisation)
      • Unigram + bigram features
      • LDA topic features
    • http://radimrehurek.com/gensim
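    A minimal sketch of this feature-extraction step, using gensim for the LDA topic features; the stopword list, tokeniser, and example documents are illustrative assumptions, not Rangespan's actual pipeline:

    ```python
    from gensim import corpora, models

    STOPWORDS = {"the", "a", "of", "with"}  # assumption: placeholder stopword list

    def extract_tokens(text):
        # Text cleaning (lowercase, stopwords) plus unigram + bigram features.
        words = [w for w in text.lower().split() if w not in STOPWORDS]
        bigrams = ["_".join(pair) for pair in zip(words, words[1:])]
        return words + bigrams

    docs = ["INK-M50 Black Ink Cartridge Samsung", "Blue Cotton T-Shirt"]
    texts = [extract_tokens(d) for d in docs]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # LDA topic features: a sparse vector of (topic id, weight) pairs per
    # document, appended to the unigram/bigram features downstream.
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
    print(lda[corpus[0]])
    ```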
    • Training, Testing & Labelling
    • Hierarchical Classification (flat approach): [Diagram: tree over classes A, B, C, D, E collapsed into a single set of leaves] one 4 (5)-way multiclass classification problem.
    • Hierarchical Classification (hierarchical approach): [Diagram: the same tree kept intact] a 2-way multiclass classification at the root followed by a 3-way classification within the chosen subtree.
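    One way to realise the 2 + 3-way decomposition is a small classifier per internal node that routes each example to a child; a minimal sketch, with scikit-learn's LogisticRegression standing in for the per-node model (the talk trains with Wapiti):

    ```python
    from sklearn.linear_model import LogisticRegression

    class TreeNode:
        """An internal node: a small classifier that routes to a subtree or leaf."""

        def __init__(self, children):
            self.children = children          # child name -> TreeNode or leaf label
            self.model = LogisticRegression()

        def fit(self, X, child_names):
            # Train only on examples that reach this node, labelled with the
            # child each example should be routed to.
            self.model.fit(X, child_names)
            return self

        def predict(self, x):
            child = self.children[self.model.predict([x])[0]]
            return child.predict(x) if isinstance(child, TreeNode) else child
    ```

    Prediction descends the tree: the root's 2-way model picks a subtree, then that subtree's 3-way model picks a leaf.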
    • Which model? Naïve Bayes, Neural Network, Logistic Regression, Support Vector Machines, …?
    • Logistic Regression: Model

      word        printer-ink   printer-hardware
      cartridge       4.0             0.3
      the             0.0             0.0
      samsung         0.5             0.5
      black           0.5             0.3
      printer        -1.0             2.0
      ink             5.0            -1.7
      …               …               …

      For each class: for each feature present, add its weight, then exponentiate and normalise.
      Σ = 10.0 vs Σ = -0.6  →  Pr = 0.99997 vs 0.0003
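    A worked version of the slide's computation, reusing the weights from the table above (the feature set for the INK-M50 example is assumed):

    ```python
    import math

    weights = {
        "printer-ink":      {"cartridge": 4.0, "the": 0.0, "samsung": 0.5, "black": 0.5, "ink": 5.0},
        "printer-hardware": {"cartridge": 0.3, "the": 0.0, "samsung": 0.5, "black": 0.3, "ink": -1.7},
    }
    features = ["cartridge", "samsung", "black", "ink"]

    # For each class, add the weight of every feature present...
    scores = {c: sum(w.get(f, 0.0) for f in features) for c, w in weights.items()}
    # ...then exponentiate and normalise to get probabilities.
    z = sum(math.exp(s) for s in scores.values())
    probs = {c: math.exp(s) / z for c, s in scores.items()}

    print(scores)  # {'printer-ink': 10.0, 'printer-hardware': -0.6}
    print(probs)   # printer-ink ≈ 0.99997, printer-hardware ≈ 0.0003
    ```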
    • Logistic Regression: Inference
      • Optimise using Wapiti.
      • Hyperparameter optimisation using grid search.
      • Use a development set to stop training?
    • http://wapiti.limsi.fr/
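    A minimal sketch of the grid search, with scikit-learn's LogisticRegression and a synthetic dataset standing in for Wapiti and the product data; the grid values and 5-fold setting are assumptions:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # Synthetic stand-in for the extracted product features and labels.
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # inverse regularisation strength
    search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))
    ```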
    • [Diagram: the root-level classifier separating Electronics from Clothing]
    • Cross Validation & Calibration
      • Estimate classifier errors.
      • DO NOT test on training data; leave data aside instead.
      • Calibration: are my probability estimates correct?
      • Computation:
        o Take the data points with p(·|x) = 0.9,
        o check that about 90% of their labels were correct.
      [Diagram: 5-fold cross validation on the training data; fold errors 1.2%, 1.1%, 1.2%, 1.2%, 1.3% average to an estimated error of 1.2%.]
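    A minimal sketch of the calibration check described on the slide, on made-up predictions:

    ```python
    import numpy as np

    probs   = np.array([0.91, 0.89, 0.92, 0.88, 0.90, 0.93, 0.89, 0.91, 0.90, 0.92])
    correct = np.array([1,    1,    1,    0,    1,    1,    1,    1,    1,    1])

    # Take the data points predicted with roughly 0.9 confidence...
    bucket = (probs >= 0.85) & (probs < 0.95)
    # ...and check that about 90% of their labels were correct.
    print("mean predicted prob:", probs[bucket].mean())
    print("observed accuracy:  ", correct[bucket].mean())
    ```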
    • Using Bayes rule to chain classifiers: [Diagram: ROOT → Electronics / Clothing, with the formula for combining per-node probabilities]
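    Chaining the per-node classifiers multiplies each node's conditional probability along the path from the root; a minimal numeric sketch (the probabilities are invented):

    ```python
    # p(Audio | text) = p(Audio | Electronics, text) * p(Electronics | text)
    p_electronics      = 0.9   # root classifier: p(Electronics | text)
    p_audio_given_elec = 0.8   # Electronics classifier: p(Audio | Electronics, text)

    p_audio = p_audio_given_elec * p_electronics
    print(p_audio)  # 0.72
    ```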
    • Active Learning
    • [Diagram: the root classifier outputs p(electronics|{text}) = 0.1, a low-confidence prediction]
    • High-probability datapoints:
        o Upload to production.
      Low-probability datapoints:
        o Subsample.
        o Acquire more labels (e.g. via Mechanical Turk).
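    A minimal sketch of that routing rule; the confidence threshold and subsampling rate are assumptions:

    ```python
    import random

    THRESHOLD = 0.9     # assumption: confidence needed to go straight to production
    SAMPLE_RATE = 0.25  # assumption: fraction of uncertain items sent for labelling

    def route(prob):
        if prob >= THRESHOLD:
            return "production"
        # Subsample the low-probability items and acquire more labels for
        # them (e.g. via Mechanical Turk).
        return "label-queue" if random.random() < SAMPLE_RATE else "skip"

    print(route(0.1))
    ```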
    • Implementation
    • Implementation
      MongoDB → S3 raw → S3 training data → S3 models
      1. JSON export
      2. Feature extraction
      3. Training
      4. Classification
    • Training MapReduce
      • Dumbo on Hadoop
      • 2000 classifiers
      • 5-fold CV (+ full)
      • 20 hyperparameter settings on the grid
      = 2000 × 5 × 20 ≈ 200,000 training runs
    • Labelling
      • 128 chunks
      • Full cascade on each chunk
      [Diagram: the classifier cascade (nodes D, A, C, B, E) applied to Chunk 1, Chunk 2, Chunk 3, … Chunk N]
    • Thoughts
      • Extras:
        o Partial labelling: stop when the probability becomes low.
        o Data ensemble learning.
      • Most time was spent on feature engineering.
      • Tie the parameters of the classifiers?
        o "Frustratingly Easy Domain Adaptation", Hal Daumé III.
      • Partially flatten the hierarchy for training?