This document discusses using semantic analysis of social media posts to automatically compute personality traits based on the Five Factor Model. It presents background on using language to predict personality traits and describes word embeddings, which represent words as vectors. An experiment is described that uses a dataset of social media posts with known personality scores to train models such as SVM and LASSO to predict the Big Five personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. The models are tested on datasets from MyPersonality and Twitter, achieving mean squared errors between 0.3 and 0.7. Future work proposes expanding the approach to larger datasets and additional features.
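The pipeline summarized above (post-level features, a regression model per trait, evaluation by mean squared error) can be sketched in a few lines of Python. This is a minimal illustrative stand-in, not the experiment's actual code: the feature vectors and openness scores below are synthetic, and an ordinary least-squares model trained by gradient descent substitutes for the SVM and LASSO models mentioned in the summary.

```python
# Minimal sketch of the experiment loop: fit a regression model on
# post-level feature vectors and score it with mean squared error (MSE).
# The features and trait scores below are synthetic; plain least squares
# trained by gradient descent stands in for SVM/LASSO.

def mse(y_true, y_pred):
    """Mean squared error between target and predicted trait scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_linear(X, y, lr=0.1, epochs=500):
    """Least-squares fit by per-sample gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical 2-dimensional features (e.g., two embedding dimensions per
# post) and an openness score in [0, 1] for each training post.
X = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.5], [0.9, 0.1]]
y = [0.80, 0.30, 0.55, 0.25]

w, b = fit_linear(X, y)
preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
print(f"training MSE: {mse(y, preds):.4f}")
```

In the reported experiment, one such model would be trained per trait and evaluated on held-out MyPersonality and Twitter data rather than on the training posts themselves.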
Semantic Analysis to Compute Personality Traits from Social Media Posts
1. Semantic Analysis to Compute
Personality Traits from Social
Media Posts
Master Degree in Computer Engineering
DAUIN
Supervisor
Prof. Maurizio Morisio
Internship Tutor
Dott. Ing. Giuseppe Rizzo
Candidate
Giulio Carducci s225395
2. Personality
Is it possible to automatically compute the personality of an
individual from the language he/she uses in social networks?
3. Background – Lexical Hypothesis
Lexical Hypothesis
• Personality characteristics that are important to
a group of people will eventually become a part
of that group's language.
• Main personality characteristics of an individual
are more likely to be encoded into language as a
single word.
Sir Francis Galton
Galton, F. Measurement of Character. Fortnightly Review, 1884, 36:179-185.
4. Five Factor Model (FFM)
• Openness
inventive/curious vs. consistent/cautious
• Conscientiousness
efficient/organized vs. easy-going/careless
• Extraversion
outgoing/energetic vs. solitary/reserved
• Agreeableness
friendly/compassionate vs. challenging/detached
• Neuroticism
sensitive/nervous vs. secure/confident
Background – Personality
5. Social networks are rich sources of
information
Personality prediction from social
network data
• Page likes
• Number of followers/following
• Choice of profile picture
• Personal profile information
• ...
Background – Personality and Social Networks
6. myPersonality
• Up to 95% prediction accuracy
• Average accuracy of 77%
Background – Personality and Social Networks
7. Word Embedding denotes a set of NLP techniques where words are mapped
to vectors of real numbers.
‘cat’ → (x_1, x_2, x_3, …, x_{n−1}, x_n),   n = 300
Word embeddings can boost the performances of many NLP applications,
and have two main advantages over traditional word vectorization
techniques:
• Dimensionality reduction
Vector space of dimension 𝑛 instead of the number of distinct words
• Contextual similarity
Similar words are mapped to vectors that are close in the vector space
Background – Semantic Analysis
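The contextual-similarity property can be illustrated with cosine similarity. A minimal sketch with invented toy 4-dimensional vectors (real fastText-style embeddings use n = 300; the words and values here are made up purely for illustration):

```python
import math

# Toy embeddings; in the thesis, vectors come from a pre-trained
# 300-dimensional word-embedding model. These values are invented.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: closer to 1 means closer in space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Contextually similar words map to nearby vectors:
assert cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"]) > \
       cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
```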
10. • Big: 16,000,000 status updates of 115,000 users
• Small: 10,000 status updates of 250 users

Statistic                                       MIN   AVG   MAX
Status updates per user                           1    39   223
Words per status update                           1    14   113
Words per status update (after preprocessing)     0     7    57

Total words: 146,128 (72,896 after preprocessing)
Distinct words: 15,470 (15,185 after preprocessing)
Experimental Setup – Gold Standard
MyPersonality Dataset
11. • 1 million word vectors
• 𝑛 = 300
• Trained on Wikipedia 2017, UMBC webbase corpus and statmt.org news
dataset
• Trained with the continuous-bag-of-words (cbow) model from word2vec
• Ordered by descending frequency
• 95.08% word coverage on myPersonality
Experimental Setup – Word Embeddings
13. • Conversion to lowercase
“Today is a #sunny day!” → “today is a #sunny day!”.
• Stop-words removal
“today is a #sunny day!” → “today #sunny day!”.
• Punctuation removal
“today #sunny day!” → “today sunny day”.
• Tokenization
“today sunny day” → [today] [sunny] [day].
• Short posts removal
All posts with fewer than 3 tokens are removed.
Removes noise and less-informative data
Experimental Setup – Text Preprocessing
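The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the stop-word list is a tiny assumed sample (a full list, e.g. NLTK's, would be used in practice).

```python
import string

# Illustrative stop-word sample; the real pipeline presumably uses a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in"}

def preprocess(post, min_tokens=3):
    """Lowercase, remove stop words and punctuation, tokenize,
    and drop posts with fewer than `min_tokens` tokens (returns None)."""
    tokens = post.lower().split()                          # lowercasing + tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    table = str.maketrans("", "", string.punctuation)
    tokens = [t.translate(table) for t in tokens]          # punctuation removal
    tokens = [t for t in tokens if t]                      # drop emptied tokens
    if len(tokens) < min_tokens:                           # short-post removal
        return None
    return tokens

print(preprocess("Today is a #sunny day!"))  # → ['today', 'sunny', 'day']
```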
15. • Feed training data to the algorithm to compute a predictive model
• Training samples: (vec(status update), BIG5 score)
• Supervised Learning: for each training sample, we specify the ground truth label
Linear Regression
y = Xβ + ε
y_i = β_0 + β_1 x_i1 + β_2 x_i2 + … + β_900 x_i900 + ε_i,   i = 1, 2, …, N

Least Absolute Shrinkage and Selection Operator (LASSO)
min_β  1/(2 · n_samples) · ‖Xβ − y‖₂² + α · ‖β‖₁

Support Vector Machines (SVM)
y = Xβ + ε
J(β) = (1/2) ‖β‖² + C Σ_{i=1..N} (ξ_i + ξ_i*),   minimize J(β)
Experimental Setup – Model Training
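The per-post transformation described in the accompanying notes (per-component min, max, and avg over all word vectors of a post, concatenated into a vector of size 3n, i.e. 900 for n = 300) can be sketched with toy n = 3 vectors:

```python
def transform(word_vectors):
    """Concatenate per-component min, max, and avg of a post's word
    vectors into one feature vector of size 3n."""
    n = len(word_vectors[0])
    mins = [min(v[i] for v in word_vectors) for i in range(n)]
    maxs = [max(v[i] for v in word_vectors) for i in range(n)]
    avgs = [sum(v[i] for v in word_vectors) / len(word_vectors)
            for i in range(n)]
    return mins + maxs + avgs   # concatenation → size 3n

# Two toy 3-dimensional "word vectors" standing in for real embeddings:
vectors = [[0.0, 1.0, 2.0],
           [1.0, 3.0, 0.0]]
print(transform(vectors))  # → [0.0, 1.0, 0.0, 1.0, 3.0, 2.0, 0.5, 2.0, 1.0]
```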
16. • Also called tuning of the hyperparameters
• Loss function: Mean Squared Error (MSE), the mean squared difference
between actual and predicted values, averaged over 10-fold cross-validation

MSE(P, A) = (1/n) Σ_{i=1..n} (p_i − a_i)²
P = (p_1, p_2, …, p_n)  predicted values
A = (a_1, a_2, …, a_n)  actual values

Algorithm  Parameter  Values
SVM        Kernel     linear, rbf, poly
           C          1, 10, 100
           Gamma      0.01, 0.1, 1, 10
           Degree     2, 3
LASSO      Alpha      1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10   (α ∈ ℝ⁺ = [0, +∞))
Experimental Setup – Parameters Optimization
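The MSE loss above is straightforward to implement; a minimal sketch (the trait scores in the example are hypothetical values on the [0, 5] scale):

```python
def mse(predicted, actual):
    """Mean Squared Error between predicted scores P and actual scores A."""
    assert len(predicted) == len(actual)
    n = len(predicted)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

# Hypothetical predicted vs. questionnaire scores for one trait:
print(mse([3.5, 2.0, 4.0], [3.0, 2.5, 4.5]))  # → 0.25
```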
17. • Further cleaning steps applied before preprocessing:
∙ Pure retweets removal (retweets with no added comment)
∙ URLs removal
∙ Mentions removal
• Preprocessing and transformation performed the same way as for status updates

Pipeline: Clean → Preprocess → Transform
Each tweet yields a 900-dimensional feature vector [x_1, x_2, x_3, …, x_899, x_900];
predicted trait scores fall in the range [0, 5].
Experimental Setup – Personality Prediction
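The extra tweet-cleaning steps can be sketched with regular expressions. The pure-retweet heuristic (an `RT @` prefix) and the exact URL/mention patterns are assumptions for illustration, not the thesis implementation:

```python
import re

def clean_tweet(tweet):
    """Apply the slide's cleaning steps before preprocessing.
    Returns None for pure retweets (no added comment)."""
    if tweet.startswith("RT @"):                   # pure-retweet removal (assumed heuristic)
        return None
    tweet = re.sub(r"https?://\S+", "", tweet)     # URL removal
    tweet = re.sub(r"@\w+", "", tweet)             # mention removal
    return " ".join(tweet.split())                 # normalize whitespace

print(clean_tweet("Great talk by @giulio! https://example.com slides up"))
# → 'Great talk by ! slides up'
```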
18. Best SVM configuration per trait:

Trait              Kernel  C   Gamma  MSE
Openness           rbf     1   1      0.3316
Conscientiousness  rbf     10  1      0.5300
Extraversion       rbf     10  1      0.7084
Agreeableness      rbf     10  1      0.4477
Neuroticism        rbf     10  10     0.5572

• Margin over Linear Regression: 8%
• Margin over LASSO: 1%

Transformation methods:

Method         MSE mean  MSE std
Sum            0.6942    0.4862
Maximum        0.5350    0.0228
Minimum        0.5342    0.0230
Average        0.5366    0.0246
Concatenation  0.5364    0.0188

• Low mean MSE
• Lowest MSE std
Concatenation is more stable than the other methods.
Experimental Results – Algorithm and Transformation
19. MyPersonality big: 16,000,000 status updates of 116,000 users
• Same approach as myPersonality small
• Training samples: (vec(status update), BIG5 score)
• Issues: training time, overfitting → downsampled to 5,000 / 10,000 / 15,000 / 20,000 samples

Mean Squared Error:

Dataset       OPE     CON     EXT     AGR     NEU
MP small      0.3316  0.5300  0.7084  0.4477  0.5572
MP big (10k)  0.4184  0.5101  0.6971  0.4799  0.6459
MP big (20k)  0.4181  0.5066  0.6816  0.4773  0.6444
Experimental Results – MyPersonality Big
20. Statistic     Value
Total users       24
Total tweets      18,473

Statistic                     MIN  AVG    MAX
Tweets per user               9    769.7  2,252
Avg words per tweet per user  5    6.8    8.8

• 26 participants
• 2 removed – not enough tweets
• Big Five Inventory (BFI, 44 items)
Experimental Results – Twitter Sample
21. Mean Squared Error:

Dataset                    OPE     CON     EXT     AGR     NEU
Twitter Sample (MP small)  0.3812  0.3129  0.3002  0.1319  0.2673
Twitter Sample (MP big)    0.3178  0.3236  0.4110  0.1362  0.2803
Literature*                0.4761  0.5776  0.7744  0.6241  0.7225

statuses/user_timeline
GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=username&count=200
• Returns: up to 200 tweets of username
• Format: JSON
• Rate limit: 1,500 requests / 15 minutes
*Quercia, D., Kosinski, M., Stillwell, D., Crowcroft, J. Our Twitter Profiles, Our Selves: Predicting Personality with Twitter. 180-185. doi:10.1109/PASSAT/SocialCom.2011.26.
Experimental Results – Twitter Sample
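As the notes explain, a user's trait score is the average of the model's predictions over all their cleaned, preprocessed tweets. A minimal sketch, where `predict` is a hypothetical placeholder standing in for the trained SVM model:

```python
def user_trait_score(tweet_vectors, predict):
    """Average the per-tweet predictions into one user-level trait score."""
    scores = [predict(v) for v in tweet_vectors]
    return sum(scores) / len(scores)

# Dummy model that "predicts" the first feature, for illustration only:
dummy_predict = lambda vec: vec[0]
print(user_trait_score([[3.0], [4.0], [2.0]], dummy_predict))  # → 3.0
```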
22. • Train word embeddings on textual data from social media
• Use a CNN for text transformation and prediction
• Expand the feature vector with additional semantic features
• Train multilingual word embeddings
• Test the approach on a bigger dataset
• Expand the Twitter sample
Future Work
In this thesis we analyze whether it is possible to automatically compute the personality of an individual by relying only on the language he uses in social networks.
It is the most widely accepted model of personality, and it defines five traits.
Low and high scores for the same trait indicate opposite tendencies
Users share a great amount of digital content, and personality has been successfully predicted from many different kinds of input data, as the recent Cambridge Analytica case showed.
The size of the vectors is usually set to 300, and there is a spatial relationship between them.
Word embeddings also present geometrical properties.
This is possible thanks to the similarity of status updates and tweets, which is further increased by additional tweet-processing steps.
Each status update is labeled with the personality scores of the user who wrote it.
We first test our approach on myPersonality small and then extend the analysis to myPersonality big.
For this reason we expect to lose some predictive power.
Before transforming status updates we preprocess them to remove noise and less-informative data.
Preprocessing segments a status update into a list of words.
We then compute, for each of the 300 vector components, the max, min, and avg among all the word vectors of the status update, and concatenate them into a feature vector of size 900 that we use to train the models.
This is a supervised learning approach because for each input vector we also specify the output value, that is, the personality trait score.
We carry out the optimization phase by training different models on the training set of myPersonality and estimating their performance with MSE.
We implement 10-fold cross-validation on the training set to test the models on the whole dataset.
We test 19 different combinations of the values reported in the table and observe that... So we use SVM to train the five predictive models.
To test the model on Twitter, we crawl the Twitter API to download all the tweets of a given user and clean them.
We compute the personality score of a user by averaging all the scores of his tweets.
We report the SVM configurations that performed the best in the optimization phase.
Mean error and mean standard deviation over the five traits.
We then extend the analysis to the whole myPersonality dataset of 16 million status updates by using the same algorithms and configurations used for myPersonality small.
We compare the results of the two datasets on the same task.
To test the models on Twitter, we devise an experiment involving 26 participants who answered a personality questionnaire and agreed to take part in it.
They share the same social and working environment.
We use Twitter data to compute the personality of the participants and compare it with the questionnaire results.
We compare our results with those obtained by a study in the literature.
This is probably because the Twitter user sample has very similar personality characteristics and is not diverse.