Data Summer Conf 2018, “From the math to the business value: machine learning in the real world (ENG)” — Javier Rodriguez Zaurin, Data Scientist at Simply Business

Recommendation algorithms in fashion
Rehan Ali, Y.J. Kim and Nik Anestev

I am Javier
DAWN: Deep Learning to Analyze Webb-detected Nascent Galaxies
Machine Learning in Retail (jaggu.com)
B2B NLP services (growthintel.com)
Machine Learning in Insurance

RecoTour
◉ Similarity-based recommendations
◉ KNN CF
◉ GBM-based recommendations
◉ LibLinear, FM, FFM
◉ NMF
◉ Deep Learning
https://github.com/jrzaurin/RecoTour

Recommendation algorithms in Fashion
◉ Computer Vision
◉ NLP
◉ User Behaviour
◉ Production Pipeline

1 Computer Vision
Computer Vision is a field that includes methods for acquiring, processing, analysing and
understanding images to produce numerical information that can be interpreted by a
computer.

Item similarity: shape, color, pattern

Shape: Shape Context
1. For each point in the contour:
1.1. Find, within a given radius, the number of other points at distance d and angle θ.
1.2. Fill the corresponding bucket in a “Shape Context” matrix.
2. Repeat for the remaining points.
[Figure: example “Shape Context” matrix, with the distance d along one axis]
0 0 2 6  5 12 11  0  0 23 0 0
0 0 0 0 34  1  4  9 10 17 0 0
0 1 5 14 0  0  6 45  1 23 0 1
4 8 0 9 21  9  0  0  6 12 9 0
https://github.com/jrzaurin/Shoe-Shape-Classifier
cv2.createShapeContextDistanceExtractor()
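The counting step described above can be sketched in plain Python. This is a minimal illustration, not the code from the linked repository: the function name `shape_context`, the 4 x 12 bin layout and the linear distance bins are assumptions made here (the original Shape Context descriptor uses log-polar bins).

```python
import math

def shape_context(points, index, n_r=4, n_theta=12, radius=None):
    """Histogram of the other contour points around points[index].

    Rows are distance bins, columns are angle bins. Linear distance
    bins are an assumption made to keep the sketch short.
    """
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    r_max = max(dists) if radius is None else radius
    hist = [[0] * n_theta for _ in range(n_r)]
    for (dx, dy), d in zip(offsets, dists):
        if d == 0 or d > r_max:
            continue  # skip duplicates and points outside the radius
        r_bin = min(int(d / r_max * n_r), n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin][t_bin] += 1
    return hist

# Hypothetical toy contour; one matrix is computed per contour point.
contour = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 0)]
matrices = [shape_context(contour, i) for i in range(len(contour))]
```

Two shapes can then be compared by matching the per-point matrices of one contour against those of the other.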

Color: color histograms
In RGB space each channel ranges from 0 to 255. Simply
divide each channel into N bins and count the number
of pixels within each color bin.
cv2.calcHist()
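A minimal sketch of the binning idea without OpenCV, assuming the image is already available as a list of (R, G, B) tuples. The function name `color_histogram` and the toy pixels are illustrative only.

```python
from collections import Counter

def color_histogram(pixels, n_bins=4):
    """Count pixels per (R, G, B) bin, with n_bins bins per channel."""
    bin_width = 256 / n_bins
    hist = Counter()
    for r, g, b in pixels:
        key = (int(r // bin_width), int(g // bin_width), int(b // bin_width))
        hist[key] += 1
    return hist

# Hypothetical pixels: two reddish ones and one blue one.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
hist = color_histogram(pixels, n_bins=4)
```

With 4 bins per channel the two reddish pixels fall into the same bin, which is what makes histograms of similar-looking items comparable.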

Color: color naming
We “all” see different colors.

Color: color naming algorithm
1. Prepare the color naming table: download the table and transform its RGB values into the LAB color space.
2. Remove the image background.
3. Compute a LAB color histogram of the image without background and extract the bins that have nonzero counts.
4. Compute color distances: calculate the color distance between each bin and the LAB values in the table from Step 1. The color distance should be computed using an adequate metric, e.g. deltaE functions.
5. Count and assign colors: let's say we have 100 nonzero bins. Based on the distances computed in Step 4 we know the color name closest to the LAB values of each bin:
{bin1: blue, bin2: blue, bin3: blue, ..., bin45: red, bin46: red, ..., bin100: green}
the pixel count of each bin:
{bin1: 10 pixels, bin2: 24, bin3: 2, ..., bin45: 20, bin46: 15, ..., bin100: 26}
and therefore the total pixel count per color:
{blue: 1293, red: 67, green: 325}
The content of the image is "blue" or "blue and green".
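Steps 4 and 5 can be sketched as follows. The three-entry naming table, the CIE76 deltaE (plain Euclidean distance in LAB) and the 15% share threshold are assumptions chosen for illustration; a real table has many more named colors and a more refined deltaE may be preferable.

```python
import math
from collections import Counter

# Tiny illustrative naming table: color name -> approximate LAB value.
NAMING_TABLE = {
    "blue": (32.3, 79.2, -107.9),
    "red": (53.2, 80.1, 67.2),
    "green": (87.7, -86.2, 83.2),
}

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in LAB space."""
    return math.dist(lab1, lab2)

def name_bin(lab):
    """Step 4: name of the table color closest to a histogram bin."""
    return min(NAMING_TABLE, key=lambda name: delta_e_cie76(lab, NAMING_TABLE[name]))

def count_colors(bins):
    """Step 5: sum pixel counts per color name. bins = [(lab, pixel_count), ...]."""
    totals = Counter()
    for lab, count in bins:
        totals[name_bin(lab)] += count
    return totals

def dominant_colors(totals, min_share=0.15):
    """Keep colors covering at least min_share of the pixels (assumed threshold)."""
    n = sum(totals.values())
    return [name for name, c in totals.most_common() if c / n >= min_share]

# Hypothetical nonzero bins: (LAB centre, pixel count).
totals = count_colors([((30, 70, -100), 10), ((35, 75, -95), 24), ((55, 78, 60), 20)])
labels = dominant_colors(Counter({"blue": 1293, "red": 67, "green": 325}))
```

Applied to the totals from the slide, the assumed threshold keeps blue and green and drops red, matching the "blue and green" reading.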

Pattern: Gabor Filters
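cv2.getGaborKernel(ksize, σ, θ, λ, γ, ψ, ktype)
Where
◉ ksize: size of the kernel
◉ σ: standard deviation of the Gaussian envelope
◉ θ: orientation of the normal to the parallel stripes
◉ λ: wavelength of the sinusoidal factor
◉ γ: spatial aspect ratio
◉ ψ: phase offset
◉ ktype: type of the kernel coefficients (CV_32F or CV_64F)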
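To make the role of these parameters concrete, here is a pure-Python sketch of the standard real-valued Gabor function, a Gaussian envelope multiplied by a cosine wave. It assumes an odd ksize and is not a reimplementation of OpenCV, whose axis and sign conventions may differ.

```python
import math

def gabor_kernel(ksize, sigma, theta, lambd, gamma, psi):
    """Real Gabor kernel of shape ksize x ksize (ksize assumed odd)."""
    half = ksize // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# A small bank of orientations, as typically used for pattern features.
bank = [gabor_kernel(9, 2.0, k * math.pi / 4, 4.0, 0.5, 0.0) for k in range(4)]
```

Convolving an image with each kernel in the bank and summarising the responses (for example mean and variance per kernel) gives one possible pattern vector.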

Real Scenes
fashion scene
Automatically detect
and segment clothing

Real Scenes: clothing detection and segmentation algorithm
The algorithm splits an image I into n regions L = {l_1, l_2, ..., l_n}. It then uses a pose estimator and a
skin detector to determine which of the regions are clothing (C) or not clothing (¬C)
◉ Pose estimator generates a probability map: p_body
◉ Skin detector generates a map: p_skin
◉ Edge detector generates the regions L = {l_1, l_2, ..., l_n}
◉ We then classify each pixel I(i,j) in each region
◉ We then classify each region l_i with pixel count t_i
◉ Non-Maximal Suppression to retrieve the largest clothing region

Dollár, Piotr, and C. Lawrence Zitnick. "Structured forests for fast edge detection." Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013.

[Figure panels: Background, Torso, Similar Regions, Skin, Clothing]

2 NLP
Natural Language Processing is a field of computer science, artificial intelligence, and
computational linguistics concerned with the interactions between computers and
human (natural) languages.

Item similarity: text mining + tf-idf
[…] It's crafted from coated canvas and decked out in the designer's famous
geometric pattern in tones of black, white and brown. There's plenty of room
inside to hold daily or overnight essentials. Carry it by the black leather handles
or over the shoulder with the detachablestrap. Black, white and brown
cube-print, coated-canvas Black leather top handless and trim detail Front and
back slip-pockets Gold-tone hook-fastening and zip top-closure Detachabel
and adjustabel […]

Text mining: spell corrector
norvig.com/spell-correct.html
For a given misspelled word w, we are trying to find the correction c, out of all possible
candidate corrections, that maximizes the probability that c is the intended correction,
P(c|w) ∝ P(c) P(w|c):
◉ Language Model: P(c)
Probability that c appears as a word of English text
◉ Error Model: P(w|c)
Probability that w would be typed in a text when the author meant c
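A compact sketch in the spirit of the corrector described at norvig.com/spell-correct.html. The toy corpus is an assumption for illustration, and the error model is simplified to "any known word one edit away is a candidate", so the choice among candidates relies on the language model alone.

```python
from collections import Counter

# Toy corpus (assumption); a real language model needs far more text.
CORPUS = "black leather handles detachable strap adjustable strap black white brown canvas"
WORDS = Counter(CORPUS.split())
TOTAL = sum(WORDS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def prob(word):
    """Language model P(c): relative frequency of the word in the corpus."""
    return WORDS[word] / TOTAL

def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return {w for w in words if w in WORDS}

def correct(word):
    """Most probable known candidate; the word itself if none is found."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=prob)

fixed = [correct(w) for w in ["detachabel", "adjustabel", "handless"]]
```

Words that need a split rather than a single edit, such as "detachablestrap", are returned unchanged by this sketch.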

tf-idf
Term Frequency tf(t,d)
Can simply be the raw count of a term t in a document d, or a modified version of it
Inverse Document Frequency idf(t,D)
Is a measure of how much information a word provides across the set of documents D
tf-idf searcher:
Similarity Score = w1 * sim_product_title + w2 * sim_product_description
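The weighted score can be sketched with a small hand-rolled tf-idf and cosine similarity. Whitespace tokenisation, the "+ 1" added to the idf and the weights w1 = 0.6 and w2 = 0.4 are assumptions for illustration, not values from the talk.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document (raw tf times idf)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_score(sim_title, sim_description, w1=0.6, w2=0.4):
    """Weighted combination of title and description similarity."""
    return w1 * sim_title + w2 * sim_description

# Hypothetical product titles.
titles = tfidf_vectors(["black leather bag", "black leather tote", "red silk dress"])
sim_bag_tote = cosine(titles[0], titles[1])
sim_bag_dress = cosine(titles[0], titles[2])
```

The same vectorisation would be run separately on the descriptions, and the two cosine similarities combined with `similarity_score`.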

Content Based Recommender
◉ CV: Shape Vector, Color Vector, Pattern Vector
◉ NLP: Title Vector, Description Vector

User behaviour
      u1  u2  u3  u4  u5  u6  u7  u8
it1    0   0   1   0   2   0   0   0
it2    0   1   0   3   0   0   0   0
it3    0   0   0   0   1   0   0   1
it4    4   1   0   0   0   0   0   0
it5    0   0   2   0   0   1   5   0
KNN item-based collaborative filtering
Using the interaction matrix we recommend items that
are similar based on how users interact with them
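A minimal sketch of item-based KNN on the interaction matrix from the slide (rows are items it1..it5, columns are users u1..u8). Cosine similarity is assumed here as the similarity measure; other measures are possible.

```python
import math

# Interaction matrix from the slide: item -> interactions by users u1..u8.
INTERACTIONS = {
    "it1": [0, 0, 1, 0, 2, 0, 0, 0],
    "it2": [0, 1, 0, 3, 0, 0, 0, 0],
    "it3": [0, 0, 0, 0, 1, 0, 0, 1],
    "it4": [4, 1, 0, 0, 0, 0, 0, 0],
    "it5": [0, 0, 2, 0, 0, 1, 5, 0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(item, k=2):
    """The k items whose user-interaction vectors are closest to item's."""
    scores = [
        (other, cosine(INTERACTIONS[item], vec))
        for other, vec in INTERACTIONS.items()
        if other != item
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

neighbours = most_similar("it1")
```

For it1 the nearest neighbour is it3, because user u5 interacted with both, followed by it5 through user u3.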
Matrix Factorization
Our goal is to find a representation of our users and
items based on interactions rather than feature-based
definitions:
R ≈ I · Uᵀ
Where R is the MxN interaction matrix (M items and N
users) and I and U are MxK and NxK matrices. K is the
number of latent factors; the rows of I and U are the
latent representations of our items and users.
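A minimal sketch of one way to obtain I and U, assuming the factorization takes the form R ≈ I · Uᵀ (consistent with the stated shapes) and fitting it by stochastic gradient descent on the squared error. The hyperparameters, and the choice to treat zeros as observed values, are simplifying assumptions and not necessarily what RecoTour does.

```python
import random

def predict(I, U, m, n):
    return sum(i * u for i, u in zip(I[m], U[n]))

def squared_error(R, I, U):
    return sum(
        (R[m][n] - predict(I, U, m, n)) ** 2
        for m in range(len(R))
        for n in range(len(R[0]))
    )

def factorize(R, k=2, steps=1000, lr=0.01, reg=0.02, seed=0):
    """Return I (M x K) and U (N x K) such that R is approximately I . U^T."""
    rng = random.Random(seed)
    M, N = len(R), len(R[0])
    I = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(M)]
    U = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(N)]
    for _ in range(steps):
        for m in range(M):
            for n in range(N):
                err = R[m][n] - predict(I, U, m, n)
                for f in range(k):
                    i_mf, u_nf = I[m][f], U[n][f]
                    # Gradient step with L2 regularisation on both factors.
                    I[m][f] += lr * (err * u_nf - reg * i_mf)
                    U[n][f] += lr * (err * i_mf - reg * u_nf)
    return I, U

# Interaction matrix from the slide (5 items x 8 users).
R = [
    [0, 0, 1, 0, 2, 0, 0, 0],
    [0, 1, 0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [4, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 1, 5, 0],
]
I, U = factorize(R)
```

Once fitted, `predict(I, U, m, n)` scores item m for user n, including pairs with no recorded interaction.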
Interaction matrix based on users' behaviour on the site
https://github.com/jrzaurin/RecoTour

ML in production
[Pipeline diagram labels:]
◉ User Interface
◉ Click, scroll, tap, add to basket, open, etc.
◉ HTTP requests and queue messages
◉ Data Collection and recommendation service
◉ Redshift and CloudSearch in sync
◉ EC2
◉ Data processing and ML
◉ Recommendations

Any questions?
You can find me at
◉ jrzaurin@gmail.com
◉ javier.rodriguez@simplybusiness.co.uk
Thanks!
