Word embeddings for
social goods
Kyiv Deep Learning Study Group #1
Sergii Gavrylov
Overview
● Problem description
● Data preprocessing
● Bag-of-words
● Continuous bag-of-words
● Weighted continuous bag-of-words
● Convolutional neural network
www.drivendata.org
Box-Plots for Education
Dataset features
Dataset labels
Function
Object_Type
Operating_Status
Position_Type
Pre_K
Reporting
Sharing
Student_Type
Use
Loss function
Multi-multi-class log loss
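The metric averages the standard multiclass log loss over the label columns. A minimal sketch of that computation (the function name and input layout are assumptions, not the competition's reference implementation):

```python
import numpy as np

def multi_multi_class_log_loss(y_true_cols, y_pred_cols, eps=1e-15):
    """Average the per-column multiclass log loss over all label columns.

    y_true_cols: list of 1-D int arrays (true class index per row).
    y_pred_cols: list of 2-D arrays of predicted class probabilities.
    """
    losses = []
    for y_true, y_pred in zip(y_true_cols, y_pred_cols):
        p = np.clip(y_pred, eps, 1 - eps)
        p = p / p.sum(axis=1, keepdims=True)  # renormalize each row
        losses.append(-np.mean(np.log(p[np.arange(len(y_true)), y_true])))
    return float(np.mean(losses))
```

Lower is better: a perfect prediction gives a loss near 0, a uniform guess over two classes gives ln 2 ≈ 0.693.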
Preprocessing
● Tokenize all text columns with OpenNLP, lowercase the tokens,
and filter out stop words ["‐", "-", "‒", "–", "—", "―", "+", "/", "*",
".", ",", "'", "(", ")", "\"", "&", ":", "to", "of", "and", "or", "for", "the", "a"]
● Perform softmax normalization for all float columns
● Replace all NaNs in float columns with 0
or
● Keep floats intact and replace NaNs with a specified value
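The text and float preprocessing above can be sketched as follows (OpenNLP is replaced by plain whitespace splitting, and the stop-word list is abbreviated, for illustration):

```python
import numpy as np

# Abbreviated stop-word list for illustration only.
STOP_WORDS = {"-", "+", "/", "*", ".", ",", "'", "(", ")", '"', "&", ":",
              "to", "of", "and", "or", "for", "the", "a"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def softmax_normalize(col):
    """Replace NaNs with 0, then softmax-normalize a float column."""
    col = np.asarray(col, dtype=float)
    col = np.where(np.isnan(col), 0.0, col)   # NaN -> 0
    e = np.exp(col - col.max())               # numerically stable softmax
    return e / e.sum()
```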
Word representation
One-hot encoding
social [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
public [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
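One-hot encoding can be sketched as below (the three-word vocabulary is a toy stand-in for the real 13-word index on the slide):

```python
import numpy as np

def one_hot(word, vocab):
    """Map a word to a one-hot vector over a fixed vocabulary."""
    v = np.zeros(len(vocab), dtype=int)
    v[vocab.index(word)] = 1
    return v

vocab = ["public", "social", "works"]  # toy vocabulary
```

Note that one-hot vectors of different words are orthogonal, so "social" and "public" have zero dot product and look maximally dissimilar.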
Bag-of-words
Text column features
Sub_Object_Description: employees, wages, salaries, services, personal
Each word maps to its one-hot vector:
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
∑
Sub_Object_Description_bow [0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
Bag-of-words
● Concatenated BOW features and floats comprise the final feature vector
● Replace FTE field NaNs with -1, Total field NaNs with -20000
● Train sklearn.ensemble.RandomForestClassifier
● Score: 0.8671
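The BOW pipeline above can be sketched as follows (the vocabulary, rows, and labels are toy stand-ins; the real model is trained on all text columns plus the float features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bow_vector(tokens, vocab):
    """Binary bag-of-words: element-wise OR of the words' one-hot vectors."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab.index(t)] = 1
    return v

# Toy rows for illustration only.
vocab = ["personal", "services", "employees", "salaries", "wages", "supplies"]
rows = [(["personal", "services", "salaries"], "Salaries"),
        (["supplies"], "Supplies")]
X = np.array([bow_vector(tokens, vocab) for tokens, _ in rows])
y = [label for _, label in rows]

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
```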
Bag-of-words
(+) Pros
● Simplicity
(-) Cons
● Notion of word similarity is undefined with one-hot encoding
social [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
public [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
● Impossible to generalize to unseen words
● One-hot encoding can be memory inefficient
Word representation
Distributed representation
social [-0.56, 8.65, 5.32, -3.14]
public [-0.42, 9.84, 4.51, -2.71]
Word representation
Cosine similarity
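With distributed representations, similarity between words becomes measurable. A minimal cosine-similarity sketch, reusing the "social"/"public" vectors from the previous slide:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (||u|| ||v||); 1 means the same direction."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

social = [-0.56, 8.65, 5.32, -3.14]
public = [-0.42, 9.84, 4.51, -2.71]
```

Here `cosine_similarity(social, public)` is close to 1, while any two one-hot vectors score exactly 0.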
Stanford GloVe
Trained on the Common Crawl (840B tokens)
Vector dimensionality is 300
nlp.stanford.edu/projects/glove
Continuous bag-of-words
[Diagram: for each text column (Sub_Object_Description: personal, employees,
wages, salaries, services; Function_Description: instructional, staff, training,
services; …), every word is mapped to its vector, the vectors are summed (∑)
per column, and the per-column sums are concatenated into the CBOW features.]
Continuous bag-of-words
● Concatenated CBOW features and floats comprise the final feature vector
● Replace FTE field NaNs with -1, Total field NaNs with -20000
● Train sklearn.ensemble.RandomForestClassifier
● Score: 0.6616
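A CBOW column feature is just the sum of the pretrained vectors of the column's words. A sketch with a toy 4-d embedding table (the real model uses the 300-d GloVe vectors from nlp.stanford.edu/projects/glove):

```python
import numpy as np

# Toy 4-d embedding table for illustration only.
glove = {"personal": np.array([0.1, 0.2, 0.0, 0.4]),
         "services": np.array([0.3, 0.0, 0.1, 0.1])}

def cbow_features(tokens, emb, dim=4):
    """Sum the embeddings of known tokens; unknown tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)
```

Skipping unknown tokens (rather than failing) is what lets the model degrade gracefully on unseen words.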
Continuous bag-of-words
(+) Pros
● Simplicity
● Possible to generalize to unseen words
(-) Cons
● All words are equal, but some words are more equal than others
Weighted CBOW
Text column features
[Diagram: each word vector in Sub_Object_Description (personal, employees,
wages, salaries, services) is multiplied (×) by its learned weight, and the
weighted vectors are summed (∑) into Sub_Object_Description_wcbow.]
Weighted CBOW
● Concatenated WCBOW features and floats comprise the final feature vector
● Use softmax normalization for float columns
● Replace all float NaNs with 0
● Train the softmax classifier jointly with the θc parameters
● Score: 0.9159
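One way to read the slides: θc is a per-column relevance direction, each word gets weight softmax(θc · wᵢ), and the column feature is the weighted sum of word vectors. A sketch of that forward pass only (joint training of θc with the classifier needs autodiff and is omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def wcbow_features(word_vecs, theta_c):
    """Weighted CBOW: weight each word vector by softmax(theta_c . w_i)."""
    W = np.asarray(word_vecs, float)   # (n_words, dim)
    weights = softmax(W @ theta_c)     # one scalar weight per word
    return weights @ W                 # weighted sum, shape (dim,)
```

With θc = 0 all words get equal weight and the feature reduces to the plain CBOW mean.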
Weighted CBOW
Why did it perform so poorly?
H1: Softmax is not as powerful as a Random Forest
H2: The model assumes that one direction per column in the word space
is enough to describe the relevant words
How many directions should a good model have?
Convolutional NN
[Animation: a convolution filter slides over the concatenated word vectors of
"personal services wages salaries employees"; each window position produces
one feature-map value, and mean- and max-pooling summarize the feature map.]
[Diagram: the filter is multiplied (×) with a window of stacked word vectors
(employees, wages, salaries, services, personal) to produce (=) one
feature-map entry. Stride size = word dimensionality.]
Convolutional NN
● Concatenated mean and max values of the feature maps, plus the floats,
form the final feature vector
● Use softmax normalization for float columns
● Replace all float NaNs with 0
● Train the softmax classifier jointly with the Wf filter parameters
● Score: 0.6932
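A stride equal to the word dimensionality means the filter sees whole-word windows. A numpy sketch of one filter with a window of 2 words, followed by the mean- and max-pooling from the slides (filter values here are placeholders, not learned Wf):

```python
import numpy as np

def conv_features(word_vecs, filt, window=2):
    """Slide a (window*dim,) filter over consecutive word-vector windows,
    then pool the resulting feature map with mean and max."""
    W = np.asarray(word_vecs, float)  # (n_words, dim)
    fmap = np.array([np.concatenate(W[i:i + window]) @ filt
                     for i in range(len(W) - window + 1)])
    return fmap.mean(), fmap.max()
```

The pooled pair (mean, max) per filter is what gets concatenated with the float features.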
Convolutional NN
Why is it not as good as CBOW + RF?
● It has fewer parameters
● Still, its performance is comparable to CBOW + RF, so using a CNN
is a sensible idea
● We could probably gain more from this type of feature learner
by going deeper
Final model
● Train RF on the concatenated CBOW features and NN logits
● Train 2 CBOW classifiers, 2 NN classifiers, 2 meta-classifiers
and blend them
● Score: 0.5228
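The stacking and blending steps above can be sketched as follows (the helper names are made up for illustration; the actual feature construction is as on the previous slides):

```python
import numpy as np

def stack_features(cbow_feats, nn_logits):
    """Meta-features: CBOW features concatenated row-wise with NN logits."""
    return np.hstack([cbow_feats, nn_logits])

def blend(prob_list):
    """Blend classifiers by averaging their predicted probability matrices."""
    return np.mean(prob_list, axis=0)
```

The meta-classifier (an RF) is trained on `stack_features(...)`; the final submission averages the probability outputs of the individual and meta classifiers.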
Results
Conclusion
● Explore your data before doing any analysis
● Keep trying
● Ensembles are powerful
● Participating in competitions provides a great
learning opportunity

Editor's Notes

  • #4 DrivenData is a for-profit social enterprise that hosts online competitions, with the goal of engaging a global community of data scientists in solving social problems.
  • #5 ERS helps school districts use their resources more strategically by providing them with a way to compare their spending to other school districts. Before partnering with DrivenData, that process involved assigning every line item to certain categories in a comprehensive financial spending framework — a task that required an average of 400 man-hours per project and limited the nonprofit's ability to give school districts the analysis they need to improve.