
Shop vertical classification - Meetup Presentation

Presentation at the "Machine Learning" meetup in Toronto, March 1, 2016

Published in: Data & Analytics

  1. 1. Shop Vertical Classification @ Arthur Prévot Meetup Machine Learning – Toronto – March 1st 2016
  2. 2. Background • Large ecommerce platform • 240K+ current customers • Many more shops created (churned or didn’t make it to customer status)
  3. 3. Problem
        ● No information about a shop's industry in most cases
        1st solution
        ● Ask them
        2nd solution: classifier
        ● We have HTML product descriptions for each shop
        ● We have labelled data (Mechanical Turk)
  4. 4. Context • Started during a Shopify Hack Day • Pursued as a side project at work • Used sk-learn and • Moved to Spark MLlib for full scale testing and production • Now in production
  5. 5. Product Description
  6. 6. Getting Label Data • Asked Amazon Mechanical Turkers to assess 80K stores • Having to choose among 15 verticals • Involved hundreds of turkers
  7. 7. Input: 80K shops

        Shop   Aggregated product data
        1      “Nice octopolo shirt !…”
        2      “Nice hat and nice shirt …”
        3      “Set of <b> tires </b> …”
        4      “Beef and more beef…”
        5      “Tire set for bikes”
        ...    ...
  8. 8. Cleaning: 80K shops

        Shop   Text
        1      “nice octopolo shirt…”
        2      “nice hat and nice shirt…”
        3      “set tire…”
        4      “beef beef…”
        5      “tire set bike”
        ...    ...

        • HTML code removed • Stop words removed • Words stemmed
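The three cleaning steps on this slide can be sketched in a few lines of Python. This is only an illustration using the standard library: the stop-word list and the `crude_stem` helper are hypothetical stand-ins, not the stop-word list or stemmer used in the talk.

```python
import re
from html import unescape

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "more", "of", "the"}


def strip_html(text):
    """Replace HTML tags with spaces, then decode entities such as &amp;."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def crude_stem(word):
    """Very rough stand-in for a real stemmer: drop a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean(text):
    """HTML removed, lowercased, stop words removed, words stemmed."""
    tokens = re.findall(r"[a-z]+", strip_html(text).lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(clean("Set of <b> tires </b> …"))
print(clean("Tire set for bikes"))
```

In practice a library stemmer and a full stop-word list would replace the two stand-ins; the order of the steps is the part that carries over.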
  9. 9. Term Frequency (80K shops, 10K words, 8 in the example), joined with the Mechanical Turk labels

        Shop  nice  octopolo  shirt  hat  set  tires  beef  bike  ...  label
        1     1     1         1      0    0    0      0     0     ...  Apparel
        2     2     0         1      1    0    0      0     0     ...  Apparel
        3     0     0         0      0    1    1      0     0     ...  Auto
        4     0     0         0      0    0    0      2     0     ...  Food
        5     0     0         0      0    1    1      0     1     ...  Auto
        ...   ...   ...       ...    ...  ...  ...    ...   ...   ...  ...
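As a rough sketch, the term-frequency rows for the five example shops can be built with `collections.Counter`. The `shops` dictionary and the `term_frequency_row` helper are illustrative names; the tokens are keyed by the column names used on this slide.

```python
from collections import Counter

# Cleaned tokens for the five example shops (column names as on the slide).
shops = {
    1: ["nice", "octopolo", "shirt"],
    2: ["nice", "hat", "nice", "shirt"],
    3: ["set", "tires"],
    4: ["beef", "beef"],
    5: ["tires", "set", "bike"],
}
vocabulary = ["nice", "octopolo", "shirt", "hat", "set", "tires", "beef", "bike"]


def term_frequency_row(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one shop's tokens."""
    counts = Counter(tokens)  # missing words count as 0
    return [counts[word] for word in vocabulary]


tf = {shop: term_frequency_row(tokens, vocabulary) for shop, tokens in shops.items()}

for shop, row in tf.items():
    print(shop, row)
```

At 80K shops and 10K words a sparse representation would be the natural choice; the dense lists here are only for readability.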
  10. 10. Model • A few quick tests using sklearn, then settled on Naïve Bayes
  11. 11. Naïve Bayes Model

        The term-frequency table (slide 9, 80K shops) is reduced to one row of parameters per label (15 labels):

        Shops  Label    Prior       Per-word parameters
        1, 2   Apparel  P(apparel)  P(nice | apparel), P(octopolo | apparel), P(shirt | apparel), P(hat | apparel), P(set | apparel), P(tires | apparel), P(beef | apparel), P(bike | apparel)
        3, 5   Auto     P(auto)     P(nice | auto), P(octopolo | auto), P(shirt | auto), P(hat | auto), P(set | auto), P(tires | auto), P(beef | auto), P(bike | auto)
        4      Food     P(food)     P(nice | food), P(octopolo | food), P(shirt | food), P(hat | food), P(set | food), P(tires | food), P(beef | food), P(bike | food)
  12. 12. What and why (parameter table as on slide 11)
        • These are the model parameters
        • Needed as input to the prediction formula

        $$\text{predicted class} = \arg\max_{cl} P(cl \mid doc)$$
  13. 13. What and why (parameter table as on slide 11)

        $$\text{predicted class} = \arg\max_{cl} P(cl \mid doc)$$

        $$P(cl \mid doc) = \frac{P(cl)\, P(doc \mid cl)}{P(doc)} \propto P(cl)\, P(doc \mid cl) = P(cl)\, P(wd_1 \mid cl)\, P(wd_2 \mid cl) \cdots P(wd_n \mid cl)$$

        • First equality: Bayes' theorem
        • Denominator not important when comparing likelihoods across classes
        • Last equality: conditional independence assumption (actually violated)
  14. 14. Numerical Limitation (parameter table as on slide 11)
        • Multiplying many values close to 0 -> float underflow

        $$P(cl \mid doc) \propto P(cl)\, P(wd_1 \mid cl)\, P(wd_2 \mid cl) \cdots P(wd_n \mid cl)$$
  15. 15. Numerical limitation (parameter table as on slide 11, with every cell and prior replaced by its logarithm, Log(P(..)))

        From before:

        $$P(cl \mid doc) \propto P(cl)\, P(wd_1 \mid cl)\, P(wd_2 \mid cl) \cdots P(wd_n \mid cl)$$

        so:

        $$\log P(cl \mid doc) = \log P(cl) + \log P(wd_1 \mid cl) + \log P(wd_2 \mid cl) + \dots + \log P(wd_n \mid cl) + \text{const}$$

        where the constant is $-\log P(doc)$, the same for every class.
        • Way around: take the log -> leads to summation instead of multiplication
        • No impact on comparisons across classes
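A small Python demonstration of the underflow and the log workaround. The probabilities and the document length of 100 words are made-up values chosen only to trigger the effect.

```python
import math


def product_score(probs):
    """Multiply the probabilities directly (prone to underflow)."""
    score = 1.0
    for p in probs:
        score *= p
    return score


def log_score(probs):
    """Sum the log-probabilities instead of multiplying."""
    return sum(math.log(p) for p in probs)


# Two hypothetical classes, each assigning a small probability to 100 words.
class_a = [1e-5] * 100
class_b = [2e-5] * 100

# Both products underflow to 0.0, so the classes can no longer be compared.
print(product_score(class_a), product_score(class_b))

# The log scores stay finite and still rank class_b above class_a.
print(log_score(class_a), log_score(class_b))
```

Because the logarithm is monotonic, the class with the largest log score is also the class with the largest product, which is why the comparison across classes is unaffected.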
  16. 16. Getting cell probabilities (parameter table as on slide 11)

        $$P(cl) = \frac{N_{cl}}{N}$$

        $$P(wd_n \mid cl) = \frac{N_{cl,wd_n}}{\sum_{w \in words} N_{cl,w}} \approx \frac{N_{cl,wd_n} + 1}{\sum_{w \in words} (N_{cl,w} + 1)} = \frac{N_{cl,wd_n} + 1}{\sum_{w \in words} N_{cl,w} + |vocab|}$$

        Here $N_{cl}$ is the number of shops labelled $cl$, $N$ the total number of shops, and $N_{cl,w}$ the count of word $w$ across the shops labelled $cl$.
        The "+1" deals with P(wd|cl)=0, which makes P(cl|doc)=0 regardless of the other words.
  17. 17. Worked example: term-frequency table as on slide 9 (80K shops), parameters filled in for the first label (15 labels)

        Label    Shops  Prior  nice         octopolo     shirt        hat          set          tires        beef         bike
        Apparel  1, 2   2/5    (3+1)/(7+8)  (1+1)/(7+8)  (2+1)/(7+8)  (1+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)
        Auto     3, 5   (not yet filled in)
        Food     4      (not yet filled in)
  18. 18. Worked example: term-frequency table as on slide 9 (80K shops), parameters filled in for the first two labels (15 labels)

        Label    Shops  Prior  nice         octopolo     shirt        hat          set          tires        beef         bike
        Apparel  1, 2   2/5    (3+1)/(7+8)  (1+1)/(7+8)  (2+1)/(7+8)  (1+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)
        Auto     3, 5   2/5    (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (2+1)/(5+8)  (2+1)/(5+8)  (0+1)/(5+8)  (1+1)/(5+8)
        Food     4      (not yet filled in)
  19. 19. Worked example: term-frequency table as on slide 9 (80K shops), parameters filled in for all three example labels (15 labels)

        Label    Shops  Prior  nice         octopolo     shirt        hat          set          tires        beef         bike
        Apparel  1, 2   2/5    (3+1)/(7+8)  (1+1)/(7+8)  (2+1)/(7+8)  (1+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)  (0+1)/(7+8)
        Auto     3, 5   2/5    (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (0+1)/(5+8)  (2+1)/(5+8)  (2+1)/(5+8)  (0+1)/(5+8)  (1+1)/(5+8)
        Food     4      1/5    (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (0+1)/(2+8)  (2+1)/(2+8)  (0+1)/(2+8)
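The worked example can be reproduced with a minimal multinomial Naïve Bayes in plain Python. This is a sketch on the five toy shops only, not the sklearn or Spark MLlib code used in the talk; `train` and `predict` are illustrative names, and ignoring out-of-vocabulary words at prediction time is an assumption of this sketch.

```python
import math
from collections import Counter
from fractions import Fraction

VOCAB = ["nice", "octopolo", "shirt", "hat", "set", "tires", "beef", "bike"]
TRAINING = [
    (["nice", "octopolo", "shirt"], "Apparel"),
    (["nice", "nice", "shirt", "hat"], "Apparel"),
    (["set", "tires"], "Auto"),
    (["beef", "beef"], "Food"),
    (["tires", "set", "bike"], "Auto"),
]


def train(examples, vocab):
    """Return priors and add-one smoothed word probabilities per label."""
    shops_per_label = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in shops_per_label}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    priors = {l: Fraction(n, len(examples)) for l, n in shops_per_label.items()}
    likelihoods = {}
    for label, counts in word_counts.items():
        denominator = sum(counts.values()) + len(vocab)  # e.g. 7 + 8 for Apparel
        likelihoods[label] = {w: Fraction(counts[w] + 1, denominator) for w in vocab}
    return priors, likelihoods


def predict(tokens, priors, likelihoods):
    """Pick the label with the highest sum of log-probabilities."""
    def log_score(label):
        score = math.log(float(priors[label]))
        for word in tokens:
            if word in likelihoods[label]:  # assumption: skip unknown words
                score += math.log(float(likelihoods[label][word]))
        return score
    return max(priors, key=log_score)


priors, likelihoods = train(TRAINING, VOCAB)
print(priors["Apparel"], likelihoods["Apparel"]["nice"])  # 2/5 and (3+1)/(7+8)
print(predict(["nice", "shirt"], priors, likelihoods))
```

`Fraction` keeps the cells exact so they can be compared with the table by eye; a real implementation would store log-probabilities as floats.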
  20. 20. Code

        class LabeledDataFilter(): ...
        class Featurizer(): ...
        class Trainer(): ...
        class Evaluator(): ...
        class Predictor(): ...
        class verticalPredictor():
            use Featurizer()
            use Predictor()
            ...

        Training job (every 7 days): product_data -> model, accuracy
        Prediction job (every day): product_data, model -> shop+industry
  21. 21. Change in Training Set • Start of home card • Allowed asking for Industry in a voluntary way • Quickly grew to 50K shops • Advantage: growing over time • Issue: training set is not fully random
  22. 22. In the Data Warehouse: Shop Dimension (updated daily) • Shop Name • Shop URL • Shop Address • Shop City • … • Shop Predicted Industry • …
  23. 23. Results

        Shops    top category  turker 1  turker 2  turker 3
        Chive    Apparel       Apparel   Apparel   Art
        Lackers  Sports        Sports    Apparel   Sports
        Tesla    Auto          Auto      Auto      Sports
        ...      ...           ...       ...       ...

        Figure shown on the slide: 60-80%
  24. 24. Results

        Shops    top category  turker 1  turker 2  turker 3  algo top1  algo top2  algo top3
        Chive    Apparel       Apparel   Apparel   Art       Apparel    Sport      Art
        Lackers  Sports        Sports    Apparel   Sports    Sports     Apparel    Food
        Tesla    Auto          Auto      Auto      Sports    Fashion    Auto       Electro
        ...      ...           ...       ...       ...

        Figures shown on the slide: 60-80%, ~65%
  25. 25. Results

        Shops    top category  turker 1  turker 2  turker 3  algo top1  algo top2  algo top3
        Chive    Apparel       Apparel   Apparel   Art       Apparel    Sport      Art
        Lackers  Sports        Sports    Apparel   Sports    Sports     Apparel    Food
        Tesla    Auto          Auto      Auto      Sports    unknown    Auto       Electro
        ...      ...           ...       ...       ...

        Figures shown on the slide: 90%, ~75%
  26. 26. Business Use
        Management or product teams:
        • What are the biggest industries per shop count, per sales made?
        • How does that evolve over time?
        Theme team:
        • We want to develop new themes for a given vertical; can we see the top stores in this vertical to understand trends?
        Event team:
        • We want to be part of an event in the music business; can we get interesting shops in this field?
  27. 27. Could be improved
        ● More metrics: add multiclass precision/recall
           ○ Now available in MLlib
        ● Better performance: rerun for combinations of parameters
           ○ Also added recently to MLlib, but missing some components
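For reference, per-class (multiclass) precision and recall can be computed as below. This is a plain-Python sketch of the metric itself, not the MLlib API, and the labels in the example are made up for illustration.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical labels, for illustration only.
y_true = ["Apparel", "Sports", "Auto", "Auto"]
y_pred = ["Apparel", "Sports", "Apparel", "Auto"]
report = precision_recall(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(label, precision, recall)
```

Unlike a single accuracy figure, this shows which verticals are over-predicted (low precision) and which are missed (low recall).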
  28. 28. DEMO
  29. 29. THE END
