1. Selective Waves:
Transfer Learning for
Sentiment Classification
Ammar Rashed# and Ahmet Bulut#,*
ammarrashed@std.sehir.edu.tr, ahmetbulut@sehir.edu.tr
#Media Lab, *Data Science Lab
Istanbul Sehir University
34865 Dragos, Istanbul
April 10th, 2018 Budapest, Hungary
8. Sentiment Classification
• Given a piece of text (a review, complaint, tweet, response,
email, status update, text message, etc.), can we predict
its author’s tone of voice, i.e., its sentiment?
- A customer complaining about your company’s product.
- A user tweeting negatively on a newly released movie.
- A press conference bolstering your company’s public image.
14. Supervised Learning
• When you have a large labeled dataset, learning a predictor is
pretty straightforward.
• We can learn a multi-class classifier using multinomial logistic
regression, neural networks, decision trees, random forests,
etc.
• What if you do not have a large enough labeled dataset?
• Use crowdsourcing, e.g., Mechanical Turk.
• Hire annotators and pay them by the hour to label for you.
15. Supervised Learning
• What if you do not have a large enough labeled dataset?
• This is especially common for languages other than English.
• If we are going to build a sentiment analyser for Turkish but
we only have a small dataset for it, then we should make
use of the sentiment data available in English.
• We refer to this approach as Transfer Learning.
19. Technical Details
• Facebook’s fastText is used for efficient word representations.
For any word in a given language, it outputs a word
embedding vector of the desired dimension (e.g., d = 300).
• If a review contains two words with word vectors [1,2,3],
[4,5,6], then the BOW average would be computed as:
[(1+4)/2, (2+5)/2, (3+6)/2] = [2.5, 3.5, 4.5]
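The averaging above can be sketched as follows (an illustrative example with toy 3-dimensional vectors standing in for d = 300 fastText embeddings; the words and vectors are made up):

```python
import numpy as np

# Toy word vectors; in practice these would come from fastText.
word_vectors = {
    "great": np.array([1.0, 2.0, 3.0]),
    "movie": np.array([4.0, 5.0, 6.0]),
}

def bow_average(tokens, vectors):
    """Bag-of-words representation: element-wise mean of the word vectors."""
    return np.mean([vectors[t] for t in tokens], axis=0)

review_vec = bow_average(["great", "movie"], word_vectors)
print(review_vec)  # [2.5 3.5 4.5]
```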
• Since our sentiment classification is a multi-class classification
problem, we use softmax at the last layer.
• softmax returns a probability for each class, and the target
class should receive the highest probability.
• The sum of all the probabilities equals 1.
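A minimal, numerically stable softmax (a standard formulation, not taken from the presented model's code):

```python
import numpy as np

def softmax(z):
    """Map raw scores (logits) to class probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
# probs sums to 1; the largest logit gets the highest probability.
```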
21. Technical Details
• During model training, we used the cross-entropy loss function
to compute the deltas used to update the weight vectors
through backpropagation.
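For a softmax output layer with cross-entropy loss, the delta at the output simplifies to (probabilities − one-hot target), a standard identity; the sketch below is illustrative, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(probs, target):
    """Cross-entropy loss for one example with an integer target class."""
    return -np.log(probs[target])

logits = np.array([2.0, 1.0, 0.1])
target = 0
probs = softmax(logits)

one_hot = np.zeros_like(probs)
one_hot[target] = 1.0
# Delta (gradient of the loss w.r.t. the logits), backpropagated
# to update the weight vectors of the last layer.
delta = probs - one_hot
```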
23. Technical Details
• In order to quantify the accuracy of predicted scores, we used
the Mean Absolute Error (MAE) as the error metric.
• We used k-fold cross validation (k = 10) for reporting test
results.
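Both the error metric and the evaluation protocol are simple to state in code. A sketch of MAE and plain (unshuffled) k-fold splitting, as one possible implementation of what the slide describes:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE: mean of the absolute differences between predictions and labels."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

print(mean_absolute_error([1, 2, 3], [2, 2, 5]))  # 1.0
```

Each of the k = 10 test folds is held out exactly once, so every example contributes to the reported test error.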
25. Thank you!
Ammar Rashed# and Ahmet Bulut#,*
ammarrashed@std.sehir.edu.tr, ahmetbulut@sehir.edu.tr
#Media Lab, *Data Science Lab
Istanbul Sehir University
34865 Dragos, Istanbul