Power of Visualizing Embeddings

Text or image classification with deep neural networks gives us a unique way to identify each trained image or word through what is known as an ‘embedding’. An embedding is a fixed-size vector learned during the training of a neural network, but it is very difficult to make sense of these seemingly random values.

  1. The Power of Visualizing Embeddings
     Pramod Singh
  2. About Me
     ▪ Team Lead – Data Science, Bain and Company
     ▪ Speaker: O’Reilly Strata conference, GIDS
     ▪ Published Author
       • Machine Learning using PySpark
       • Learn PySpark
       • Learn TensorFlow 2.0: The easy way
       • Machine Learning in Production (WIP)
     ▪ https://www.linkedin.com/in/pramodchahar/
  3. Agenda
     ▪ Inspiration for this session
     ▪ Conventional Approach
     ▪ Learning Embeddings
     ▪ Custom Embeddings
     ▪ Visualizing Embeddings
     ▪ FAQs
  4. Interactions: Finance, Exteriors, Interiors, Maintenance, Car Dealer, Features
  5. User Journey – Core Elements: Different Pages, Categories, Time Spent, Sequence of Events
  6. User Representation – I
     User ID | Total Visits | Total Time Spent | Total Pages | Total Sessions | Converted
     121A | 10 | 25 | 110 | 4 | 0
  7. User Representation – II
     User ID | Total Pages | Finance | Specification | Dealer | … | Finance(sec) | Specification(sec) | Dealer(sec) | … | Converted
     121A | 10 | 3 | 4 | 4 | … | 3 | 5 | 5 | … | 0
  8. All Users
     User ID | Total Pages | Finance | Specification | Dealer | … | Finance(sec) | Specification(sec) | Dealer(sec) | … | Converted
     121A | 10 | 3 | 4 | 4 | … | 3 | 5 | 5 | … | 0
     19X2 | 50 | 0 | 21 | 0 | … | 0 | 350 | 0 | … | 0
     GG52 | 33 | 8 | 4 | 9 | … | 45 | 50 | 78 | … | 1
  9. Applicable to other domains: Finance & Insurance, E-Commerce/Retail, Real Estate
  10. Key Questions
      ▪ Which sets of customer journeys are similar to each other?
      ▪ Which sets of customer journeys indicate a broken vs. a seamless experience?
      ▪ Which are the 4-5 major routes that customers take in order to convert?
  11. Category Representation: Frequency Based, Prediction Based, One-Hot Encoding
  12. Challenges: High-Cardinality Variables, Semantic Signal w/o Supervision
  13. Challenges
      Number of columns = Number of unique categories
      Category | Specifications | Price | Features | Reviews | .. | …
      Specifications | 1 | 0 | 0 | 0 | 0 | 0
      Price | 0 | 1 | 0 | 0 | 0 | 0
      Features | 0 | 0 | 1 | 0 | 0 | 0
      Similarity between Specifications and Price = 0
      Similarity between Price and Features = 0
      Similarity between Features and Specifications = 0
  14. Gaps
      • Sequence of events is ignored
      Can we represent each of these page categories with a vector that captures the underlying semantics? Using this vector, can we represent each user journey?
  15. Embeddings
  16. Embeddings
      “An embedding is a mapping of a discrete (categorical) variable to a vector of continuous numbers such that the vectors of similar entities are closer to one another in vector space.”
      king - man + woman = queen
  17. Category Similarity using Embeddings
      Price | 0.43 0.75 0.98 … 0.55 0.87
      Specification | 0.23 0.10 0.33 … 0.45 0.20
      Features | 0.22 0.09 0.30 … 0.44 0.18
      Similarity between Specifications and Price = -0.75
      Similarity between Price and Features = -0.83
      Similarity between Features and Specifications = 0.91
  18. Immediate Advantages: Fixed-Size Representation, Similar Categories
  19. Embeddings: Image, Text, Music, User
  20. Learning Embeddings: Without Label, With Label, Pre-Trained
  21. “You shall know a word by the company it keeps.” John R. Firth (a dominant figure in 20th-century linguistics)
  22. Without Label
  23. Sequence Based Embedding
      The earth is round and moves around the sun
      “All we need is a sequence of categories”
  24. Sequence Based Embedding*
      The earth is round and moves around the sun
      • Context and Target Words
      • Given a word, which are the neighboring words?
      • Given the neighboring words, what's the target word?
      *Window size
  25. CBOW Model: context words (The, Earth, round, and) → Neural Network → target word (is)
  26. Skip-Gram Model: target word (is) → context words (The, Earth, round, and)
  27. Word2Vec: Homepage, Offers, Finance, Offers, Specification, …, …, Test Drive
  28. With Label
  29. Embedding Layer in DNN (diagram labels): Homepage, Offers, Finance, Specifications, …; Embedding Layer; Predicted target; Observed target; CrossEntropy Loss
  30. Custom Embeddings
  31. Category Embeddings
      Page-Category Embedding (column length: embedding size, 100)
      Page categories: Brochure, Reviews, Finance, Test Drive, Specification
      0.13 0.45 .. 0.21 0.67
      0.25 0.23 .. 0.53 0.98
      0.98 0.12 .. 0.34 0.76
      0.21 0.53 .. 0.23 0.87
      0.87 0.24 .. 0.63 0.25
  32. Embedding Visualization
      • Categories related to services, warranty, and reviews are closer
      • Categories related to test drive activities are closer
      • Categories related to vehicle information are closer
  33. User Journey Mapping
      Visitor 1: Page ‘A’, Page ‘B’, Page ‘D’, Page ‘C’ | 0.43 0.75 0.98 0.55 0.87
      Visitor 2: Page ‘E’, Page ‘B’, Page ‘D’, Page ‘E’ | 0.54 0.23 0.56 0.35 0.76
  34. Customize Embeddings
      Page-Category Embedding (column length: embedding size, 100)
      Page categories: Brochure, Reviews, Finance, Test Drive, Specification
      0.13 0.45 .. 0.21 0.67
      0.25 0.23 .. 0.53 0.98
      0.98 0.12 .. 0.34 0.76
      0.21 0.53 .. 0.23 0.87
      0.87 0.24 .. 0.63 0.25
      User Journey: Brochure, Specification, Finance, Reviews, Test Drive | 0.43 0.75 0.98 0.55 0.87
      Page | Embedding | Time Spent
      Brochure | 0.13 0.45 .. 0.21 0.67 | 0.43
      Specification | 0.25 0.23 .. 0.53 0.98 | 0.75
      … | … | …
      Test Drive | 0.87 0.24 .. 0.63 0.25 | 0.87
      User Journey Embedding: 0.53 0.76 0.35 0.65 0.89
  35. Customer Journey Visualization* (*Dummy Data)
  36. TensorFlow Projector
  37. Advantages of Embeddings
      • Finding nearest neighbours in the low-dimensional space
      • Input features for machine learning prediction
      • Understanding relations between categories
  38. Additional Resources
      • https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526
      • http://jalammar.github.io/illustrated-transformer/
      • https://www.youtube.com/results?search_query=sequence+embeddings+pramod
  39. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.
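
Slides 13 and 17 contrast one-hot encoding, where every pair of categories has similarity 0, with embeddings, where related categories score higher. The sketch below illustrates that contrast with cosine similarity. The 3-dimensional embedding values are hypothetical, chosen only for illustration, and are not the numbers from the slides; cosine similarity is also an assumption, since the slides do not name the similarity measure.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


categories = ["Specifications", "Price", "Features", "Reviews"]

# One-hot encoding: one column per unique category, a single 1 per row.
identity = np.eye(len(categories))
one_hot = {name: identity[i] for i, name in enumerate(categories)}

# Hypothetical dense embeddings (illustrative values, not from the slides).
embeddings = {
    "Specifications": [0.9, 0.1, 0.3],
    "Features": [0.8, 0.2, 0.4],
    "Price": [-0.2, 0.9, -0.5],
}

# Any two distinct one-hot vectors are orthogonal, so no similarity signal survives.
one_hot_sim = cosine_similarity(one_hot["Specifications"], one_hot["Price"])

# Dense vectors can place related categories close together and unrelated ones apart.
related_sim = cosine_similarity(embeddings["Specifications"], embeddings["Features"])
unrelated_sim = cosine_similarity(embeddings["Specifications"], embeddings["Price"])

print(f"one-hot  Specifications vs Price:    {one_hot_sim:.2f}")
print(f"embedded Specifications vs Features: {related_sim:.2f}")
print(f"embedded Specifications vs Price:    {unrelated_sim:.2f}")
```

With one-hot vectors the similarity is zero for every pair, which is why the slides move on to learned vectors.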
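
Slides 24 to 26 describe context and target words within a window. A minimal sketch of how such training pairs could be generated is shown below; it covers only the pair-generation step, not the neural network that learns the embeddings. The function names and the page journey at the end are hypothetical (slide 27 elides part of its sequence, so the journey here is invented for illustration).

```python
def context_target_pairs(sequence, window=2):
    """CBOW-style pairs: (list of neighbouring items, target item)."""
    pairs = []
    for i, target in enumerate(sequence):
        start = max(0, i - window)
        # Up to `window` items on each side of position i, excluding the target itself.
        context = sequence[start:i] + sequence[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skip_gram_pairs(sequence, window=2):
    """Skip-gram-style pairs: (target item, one neighbouring item)."""
    return [
        (target, neighbour)
        for context, target in context_target_pairs(sequence, window)
        for neighbour in context
    ]


sentence = "The earth is round and moves around the sun".split()
cbow = context_target_pairs(sentence, window=2)
skip = skip_gram_pairs(sentence, window=2)

# The pair centred on "is" matches the CBOW and skip-gram slides.
print(cbow[2])
print([pair for pair in skip if pair[0] == "is"])

# The same functions work on any sequence of categories, e.g. a hypothetical page journey.
journey = ["Homepage", "Offers", "Finance", "Offers", "Specification", "Test Drive"]
page_pairs = skip_gram_pairs(journey, window=1)
```

Because the functions only need an ordered list, page categories can be treated exactly like words, which is the point of the "All we need is a sequence of categories" slide.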
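
Slides 33, 34 and 37 combine page-category embeddings with time spent to obtain one vector per user journey, then look for nearest neighbours. The slides do not state how the two are combined, so the sketch below assumes a time-weighted average; that choice, the function names, and the visitor journeys are all hypothetical. The 4-dimensional vectors reuse only the visible numbers from slide 34, whereas the embeddings in the talk have 100 dimensions.

```python
import numpy as np

# Toy page-category embeddings (visible values from slide 34, truncated to 4 dimensions).
page_embeddings = {
    "Brochure": np.array([0.13, 0.45, 0.21, 0.67]),
    "Specification": np.array([0.25, 0.23, 0.53, 0.98]),
    "Test Drive": np.array([0.87, 0.24, 0.63, 0.25]),
}


def journey_embedding(pages, time_spent, embeddings):
    """Assumed combination rule: average of page vectors weighted by time spent."""
    weights = np.asarray(time_spent, dtype=float)
    weights = weights / weights.sum()
    vectors = np.stack([embeddings[page] for page in pages])
    return weights @ vectors


def nearest_neighbours(query, candidates, k=1):
    """Return the k candidate names most similar to `query` by cosine similarity."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical visitors: (pages visited, time spent on each page).
journeys = {
    "visitor_1": (["Brochure", "Specification", "Test Drive"], [0.43, 0.75, 0.87]),
    "visitor_2": (["Brochure", "Specification"], [0.5, 0.5]),
    "visitor_3": (["Test Drive"], [1.0]),
}
journey_vectors = {
    name: journey_embedding(pages, times, page_embeddings)
    for name, (pages, times) in journeys.items()
}

others = {name: vec for name, vec in journey_vectors.items() if name != "visitor_1"}
closest = nearest_neighbours(journey_vectors["visitor_1"], others, k=1)
print(closest)
```

Every journey ends up as a vector of the same size regardless of how many pages were visited, so the journey vectors can be compared directly or passed to a visualization tool.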
