Word embeddings for
social goods
Kyiv Deep Learning Study Group #1
Sergii Gavrylov
Overview
● Problem description
● Data preprocessing
● Bag-of-words
● Continuous bag-of-words
● Weighted continuous bag-of-words
● Convolutional neural network
www.drivendata.org
Box-Plots for Education
Dataset features
Dataset labels
Function
Object_Type
Operating_Status
Position_Type
Pre_K
Reporting
Sharing
Student_Type
Use
Loss function
Multi-multi-class log loss
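The metric averages the standard multiclass log loss over the label columns. A minimal sketch of that computation (the function name and input layout are assumptions, not the competition's reference implementation):

```python
import numpy as np

def multi_multi_class_log_loss(y_true_cols, y_pred_cols, eps=1e-15):
    """Average the per-column multiclass log loss over all label columns.

    y_true_cols: list of 1-D int arrays (true class index per row).
    y_pred_cols: list of 2-D arrays of predicted class probabilities.
    """
    losses = []
    for y_true, y_pred in zip(y_true_cols, y_pred_cols):
        p = np.clip(y_pred, eps, 1 - eps)
        p = p / p.sum(axis=1, keepdims=True)  # renormalize each row
        losses.append(-np.mean(np.log(p[np.arange(len(y_true)), y_true])))
    return float(np.mean(losses))
```

Lower is better: a perfect prediction gives a loss near 0, a uniform guess over two classes gives ln 2 ≈ 0.693.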
Preprocessing
● Tokenize all text columns with OpenNLP, lowercase the tokens,
and filter out stop words ["‐", "-", "‒", "–", "—", "―", "+", "/", "*",
".", ",", "'", "(", ")", "\"", "&", ":", "to", "of", "and", "or", "for", "the", "a"]
● Perform softmax normalization for all float columns
● Replace all NaNs in float columns with 0
or
● Keep floats intact and replace NaNs with a specified value
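The text and float preprocessing above can be sketched as follows (OpenNLP is replaced by plain whitespace splitting, and the stop-word list is abbreviated, for illustration):

```python
import numpy as np

# Abbreviated stop-word list for illustration only.
STOP_WORDS = {"-", "+", "/", "*", ".", ",", "'", "(", ")", '"', "&", ":",
              "to", "of", "and", "or", "for", "the", "a"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def softmax_normalize(col):
    """Replace NaNs with 0, then softmax-normalize a float column."""
    col = np.asarray(col, dtype=float)
    col = np.where(np.isnan(col), 0.0, col)   # NaN -> 0
    e = np.exp(col - col.max())               # numerically stable softmax
    return e / e.sum()
```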
Word representation
One-hot encoding
social [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
public [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
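One-hot encoding can be sketched as below (the three-word vocabulary is a toy stand-in for the real 13-word index on the slide):

```python
import numpy as np

def one_hot(word, vocab):
    """Map a word to a one-hot vector over a fixed vocabulary."""
    v = np.zeros(len(vocab), dtype=int)
    v[vocab.index(word)] = 1
    return v

vocab = ["public", "social", "works"]  # toy vocabulary
```

Note that one-hot vectors of different words are orthogonal, so "social" and "public" have zero dot product and look maximally dissimilar.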
Bag-of-words
Text column features
Sub_Object_Description: employees, wages, salaries, services, personal
Each word maps to its one-hot vector:
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
∑
Sub_Object_Description_bow [0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
Bag-of-words
● Concatenated BOW features and floats comprise the final feature vector
● Replace FTE field NaNs with -1, Total field NaNs with -20000
● Train sklearn.ensemble.RandomForestClassifier
● Score: 0.8671
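The BOW pipeline above can be sketched as follows (the vocabulary, rows, and labels are toy stand-ins; the real model is trained on all text columns plus the float features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bow_vector(tokens, vocab):
    """Binary bag-of-words: element-wise OR of the words' one-hot vectors."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab.index(t)] = 1
    return v

# Toy rows for illustration only.
vocab = ["personal", "services", "employees", "salaries", "wages", "supplies"]
rows = [(["personal", "services", "salaries"], "Salaries"),
        (["supplies"], "Supplies")]
X = np.array([bow_vector(tokens, vocab) for tokens, _ in rows])
y = [label for _, label in rows]

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
```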
Bag-of-words
(+) Pros
● Simplicity
(-) Cons
● Notion of word similarity is undefined with one-hot encoding
social [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
public [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
● Impossible to generalize to unseen words
● One-hot encoding can be memory inefficient
Word representation
Distributed representation
social [-0.56, 8.65, 5.32, -3.14]
public [-0.42, 9.84, 4.51, -2.71]
Word representation
Cosine similarity
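With distributed representations, similarity between words becomes measurable. A minimal cosine-similarity sketch, reusing the "social"/"public" vectors from the previous slide:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (||u|| ||v||); 1 means the same direction."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

social = [-0.56, 8.65, 5.32, -3.14]
public = [-0.42, 9.84, 4.51, -2.71]
```

Here `cosine_similarity(social, public)` is close to 1, while any two one-hot vectors score exactly 0.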
Stanford GloVe
Trained on the Common Crawl (840B tokens)
Vector dimensionality is 300
nlp.stanford.edu/projects/glove
Continuous bag-of-words
[Diagram: for each text column (Sub_Object_Description: personal, employees,
wages, salaries, services; Function_Description: instructional, staff, training,
services; …), every word is mapped to its vector, the vectors are summed (∑)
per column, and the per-column sums are concatenated into the CBOW features.]
Continuous bag-of-words
● Concatenated CBOW features and floats comprise the final feature vector
● Replace FTE field NaNs with -1, Total field NaNs with -20000
● Train sklearn.ensemble.RandomForestClassifier
● Score: 0.6616
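A CBOW column feature is just the sum of the pretrained vectors of the column's words. A sketch with a toy 4-d embedding table (the real model uses the 300-d GloVe vectors from nlp.stanford.edu/projects/glove):

```python
import numpy as np

# Toy 4-d embedding table for illustration only.
glove = {"personal": np.array([0.1, 0.2, 0.0, 0.4]),
         "services": np.array([0.3, 0.0, 0.1, 0.1])}

def cbow_features(tokens, emb, dim=4):
    """Sum the embeddings of known tokens; unknown tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)
```

Skipping unknown tokens (rather than failing) is what lets the model degrade gracefully on unseen words.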
Continuous bag-of-words
(+) Pros
● Simplicity
● Possible to generalize to unseen words
(-) Cons
● All words are equal, but some words are more equal than others
Weighted CBOW
Text column features
[Diagram: each word vector in Sub_Object_Description (personal, employees,
wages, salaries, services) is multiplied (×) by its learned weight, and the
weighted vectors are summed (∑) into Sub_Object_Description_wcbow.]
Weighted CBOW
● Concatenated WCBOW features and floats comprise the final feature vector
● Use softmax normalization for float columns
● Replace all float NaNs with 0
● Train the softmax classifier jointly with the θc parameters
● Score: 0.9159
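One way to read the slides: θc is a per-column relevance direction, each word gets weight softmax(θc · wᵢ), and the column feature is the weighted sum of word vectors. A sketch of that forward pass only (joint training of θc with the classifier needs autodiff and is omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def wcbow_features(word_vecs, theta_c):
    """Weighted CBOW: weight each word vector by softmax(theta_c . w_i)."""
    W = np.asarray(word_vecs, float)   # (n_words, dim)
    weights = softmax(W @ theta_c)     # one scalar weight per word
    return weights @ W                 # weighted sum, shape (dim,)
```

With θc = 0 all words get equal weight and the feature reduces to the plain CBOW mean.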
Weighted CBOW
Why did it perform so poorly?
H1: Softmax is not as powerful as a Random Forest
H2: The model assumes that one direction per column in the word space
is enough to describe the relevant words
How many directions should a good model have?
Convolutional NN
[Animation: a convolution filter slides over the concatenated word vectors of
"personal services wages salaries employees"; each window position produces
one feature-map value, and mean- and max-pooling summarize the feature map.]
[Diagram: the filter is multiplied (×) with a window of stacked word vectors
(employees, wages, salaries, services, personal) to produce (=) one
feature-map entry. Stride size = word dimensionality.]
Convolutional NN
● Concatenated mean and max values of the feature maps, plus the floats,
form the final feature vector
● Use softmax normalization for float columns
● Replace all float NaNs with 0
● Train the softmax classifier jointly with the Wf filter parameters
● Score: 0.6932
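A stride equal to the word dimensionality means the filter sees whole-word windows. A numpy sketch of one filter with a window of 2 words, followed by the mean- and max-pooling from the slides (filter values here are placeholders, not learned Wf):

```python
import numpy as np

def conv_features(word_vecs, filt, window=2):
    """Slide a (window*dim,) filter over consecutive word-vector windows,
    then pool the resulting feature map with mean and max."""
    W = np.asarray(word_vecs, float)  # (n_words, dim)
    fmap = np.array([np.concatenate(W[i:i + window]) @ filt
                     for i in range(len(W) - window + 1)])
    return fmap.mean(), fmap.max()
```

The pooled pair (mean, max) per filter is what gets concatenated with the float features.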
Convolutional NN
Why is it not as good as CBOW + RF?
● It has fewer parameters
● Still, its performance is comparable to CBOW + RF, so using a CNN
is a sensible idea
● We could probably gain more from this type of feature learner
by going deeper
Final model
● Train RF on the concatenated CBOW features and NN logits
● Train 2 CBOW classifiers, 2 NN classifiers, 2 meta-classifiers
and blend them
● Score: 0.5228
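The stacking and blending steps above can be sketched as follows (the helper names are made up for illustration; the actual feature construction is as on the previous slides):

```python
import numpy as np

def stack_features(cbow_feats, nn_logits):
    """Meta-features: CBOW features concatenated row-wise with NN logits."""
    return np.hstack([cbow_feats, nn_logits])

def blend(prob_list):
    """Blend classifiers by averaging their predicted probability matrices."""
    return np.mean(prob_list, axis=0)
```

The meta-classifier (an RF) is trained on `stack_features(...)`; the final submission averages the probability outputs of the individual and meta classifiers.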
Results
Conclusion
● Explore your data before doing any analysis
● Keep trying
● Ensembles are powerful
● Participating in competitions provides a great
learning opportunity

Editor's Notes

  • #4 DrivenData is a for-profit social enterprise that hosts online competitions, with the goal of engaging a global community of data scientists in solving social problems.
  • #5 ERS helps school districts use their resources more strategically by providing them with a way to compare their spending to other school districts. Before partnering with DrivenData, that process involved assigning every line item to certain categories in a comprehensive financial spending framework — a task that required an average of 400 man-hours per project and limited the nonprofit's ability to give school districts the analysis they need to improve.