Prediction of Reaction towards Textual Posts in Social Networks
Mohamed Mahmoud (elgeish@stanford.edu)
Abstract
Posting on social networks could be a
gratifying or a terrifying experience de-
pending on the reaction the post and its
author —by association— receive from the
readers. To better understand what makes
a post popular, this project inquires into
the factors that determine the number of
likes, comments, and shares a textual post
gets on LinkedIn; and finds a predictor
function that can estimate those quanti-
tative social gestures.
Keywords: Linear Regression; LinkedIn;
Machine Learning; Popularity; Social Net-
works
1 Introduction
Social media have been playing a prominent role
in our daily lives in the past decade; they connect
us with our families, friends, and colleagues in un-
precedented ways. Sharing on social networks be-
came a primary means of communication —a basic
human need— that has the potential of reaching
a large group of recipients around the globe. Your
posts on social networks are a reflection of who you
think you are (self-perception), or what you want
others to see; how readers react to them is
a reflection of how they perceive you. The lack of
immediacy in such social interactions allows the au-
thor to ruminate on what to write, when sharing a
textual post, in hopes of maximizing the fulfillment
of a desideratum (akin to an essayist or a reporter
seeking a certain goal). On the other hand, readers
reciprocate by showing appreciation of the effort
exerted in the form of virtual social gestures (akin
to fan letters), which may help the post travel fur-
ther, and propagate through many social networks.
Let’s take LinkedIn as an example of social net-
works where members are allowed to like, comment
on, and re-share posts; the more likes a LinkedIn
post gets, the more fulfilled the author feels about
it; such reaction vouches for the author’s ethos, and
supports the author’s claim to fame. Conversely,
the author’s reputation and self-esteem might suf-
fer when the post is uncelebrated. In order to alle-
viate the social pressure associated with sharing on
social media, this work proposes a system that pre-
dicts quantitative reaction towards textual posts in
social networks over a specific time window. This
time series analysis can help authors gauge the pop-
ularity (as a proxy for quality) of an update before
posting it; they can refine the post if the scores are
unsatisfactory, and examine how various versions
of the same post score. In addition, the system can
be used by social networks to predict which posts
are more appealing to readers for ranking purposes
[1]. This work will focus on LinkedIn as the social
network of choice.
2 Input-Output Behavior
The input to the proposed system is a textual post
shared by a member of a social network, and the
output is a prediction of the quantitative reaction
(number of likes, comments, and shares) the post
shall receive within a certain time window; we’ll
use a fixed window of one day. Take, for example,
the post in the figure below:
Figure 1: an example of a textual post
The output corresponding to this input post is
a vector of (predicted number of likes, predicted
number of comments, and predicted number of
shares) — one for each day. Here’s another exam-
ple that should score much higher than the former:
Figure 2: an example of a popular textual post
3 Model
Disclaimer: The analysis, exploration, and prepa-
ration of data; feature engineering, and extrac-
tion; use, and fine-tuning of machine learning al-
gorithms, along with code developed for train-
ing, validating, and testing the model; and prac-
tices adopted for this project are driven by my
own personal experience, and not connected to
any LinkedIn product. The data discussed here
were anonymized —by removing personally identi-
fiable information— in accordance with LinkedIn’s
strict policies regarding data privacy. In order
to protect LinkedIn’s intellectual property, some
of the features mentioned below are redacted.
The goal of the proposed system is to obtain
a predictor function f that maps new input x to
output $y \in \mathbb{R}^3$ corresponding to (predicted number
of likes, predicted number of comments, and predicted
number of shares) for the input post and age
in days (time window).
This is a regression problem, which is solved
by using the following framework (using supervised
learning, which is given the training data to pro-
duce the predictor function):
Figure 3: diagram of a learning framework
The training data Dtrain is a set of examples,
which are basically input-output pairs: the inputs
are the posts (including age in days), and outputs
are the corresponding social gestures.
3.1 Data Preparation
The data came from original textual posts on
LinkedIn; an original post is authored by its poster,
and not a re-share of another post. In order to learn
a predictor, three datasets were gathered for training,
validation, and test; the datasets came from
historical LinkedIn data using ETL (Extract,
Transform, Load). One of the challenges faced during
this project was performing ETL at the scale of
LinkedIn: joining multiple datasets that contain
billions of records; each dataset contains certain
aspects of the record that serves as an example for
training. Member IDs, along with personally iden-
tifiable information, were removed once the meta-
data were generated. The combined tally of exam-
ples in the training and validation sets is around
3.8 million — after removing outliers. In order to
maintain good data hygiene, the test set was gath-
ered after training was completed from a date range
that doesn’t overlap with that of the training and
validation datasets, and it amounts roughly to 0.25
million examples (out of 4.05 million examples in
total). The data are split to 75% for training, 19%
for validation, and 6% for testing.
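The percentages above can be checked with a few lines of arithmetic. This is only a sketch: the counts are the approximate figures quoted in the text, and the 80/20 division of the training and validation examples is an assumption taken from the file split described for the SGD runs.

```python
# Approximate counts quoted in the text; the 80/20 division of the
# 3.8 million training + validation examples is an assumption.
train_plus_validation = 3_800_000
test = 250_000

train = train_plus_validation * 80 // 100
validation = train_plus_validation - train
total = train + validation + test

# Percentage of the full dataset in each split, rounded to whole percents.
shares = {
    name: round(100 * count / total)
    for name, count in [("train", train), ("validation", validation), ("test", test)]
}
print(total, shares)
```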
3.1.1 Outliers
Outliers were determined by plotting [2] the dis-
tribution of the label values; here’s an example of
the distribution of label values for likes using a log
scale:
Figure 4: distribution of label values for likes (x-axis: log(label value); y-axis: count)
After exploring the distribution of data at dif-
ferent bucket widths, a cutoff line was chosen to
remove outliers that are unlikely to occur (based
on the gathered data). This process was repeated
for comments and shares as well.
3.1.2 Missing Data
Some of the fields in the collected records were
missing at random; for example, fields that were
left blank by the members of the social network.
Whenever possible, a replacement value was cal-
culated for the missing field. For example, for a
missing timezone field, an approximation was cal-
culated using the member’s country. A more in-
teresting example for a missing field is one that’s
real-valued, which was replaced by the mean value
of that field in the observed examples, which is an
acceptable solution to replace data missing at ran-
dom (MAR).
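A minimal sketch of the mean replacement described above, assuming missing values are represented as None. The function name and the sample field are hypothetical and not part of the original pipeline.

```python
from statistics import fmean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a field with no observed values")
    mean = fmean(observed)
    return [mean if v is None else v for v in values]

# Hypothetical real-valued field with two missing entries.
connections = [120.0, None, 300.0, 180.0, None]
filled = impute_mean(connections)
```

Here the two missing entries are replaced by the mean of the three observed values.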
3.1.3 Raw Data
A labeled example is basically a tuple of the textual
post along with its metadata (input); and its
label is the number of likes, comments, and shares
generated for the post (output). A typical record
in a dataset looks like the following:
((Text & Metadata), (Likes, Comments, Shares))
Metadata include data about the post like age, vis-
ibility (e.g. public), author metadata (e.g. network
size), etc.; such metadata is essential to better rep-
resent the inputs (in correlation to the output) in
the context of the problem at hand; in a social net-
work, the text of a post per se is insufficient to
predict its popularity; we need to consider other
factors like metadata about the post and its au-
thor; for example, the size of the author’s network
is expected to play a major role in predicting pop-
ularity.
After the raw data were extracted, the records
were serialized and stored into binary files to save
space. Those binary files served as input files for
the feature extractor.
3.2 Scoring
Each input is going to be distilled into a feature
vector $\phi(x) = [\phi_1(x), \ldots, \phi_d(x)] \in \mathbb{R}^d$,
which represents the input and will be computed
using a feature extractor. Correspondingly, a
weight vector $w_i = [w_{i1}, \ldots, w_{id}] \in \mathbb{R}^d$, with
$i \in \{\text{likes}, \text{comments}, \text{shares}\}$, specifies the weight of
each feature to each component of the prediction
vector $y \in \mathbb{R}^3$.
Given a feature vector $\phi(x)$ and a weight vector
$w_i$, the respective prediction score component
$y_i \in \mathbb{R}$ is their inner product:
$$y_i = w_i \cdot \phi(x)$$
The score vector represents a snapshot of the
post’s popularity at the given age in days; to get
the time series, the age is incremented by the desired
time unit.
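The scoring step can be sketched as follows, assuming sparse feature and weight vectors stored as dictionaries. The feature names and weight values are hypothetical and chosen only for illustration; the actual fitted weights were not published.

```python
def dot(weights, features):
    """Inner product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def predict(weight_vectors, features):
    """Return the score vector (likes, comments, shares) for one feature vector."""
    return tuple(dot(weight_vectors[g], features) for g in ("likes", "comments", "shares"))

def predict_series(weight_vectors, base_features, max_age_days):
    """Score the same post at ages 1..max_age_days to obtain a time series."""
    series = []
    for age in range(1, max_age_days + 1):
        features = dict(base_features, post_age_in_days=float(age))
        series.append(predict(weight_vectors, features))
    return series

# Hypothetical weights and features, for illustration only.
weight_vectors = {
    "likes": {"post_age_in_days": 0.5, "contains_question": 1.0},
    "comments": {"post_age_in_days": 0.25, "contains_question": 2.0},
    "shares": {"post_age_in_days": 0.125},
}
base_features = {"contains_question": 1.0}
series = predict_series(weight_vectors, base_features, 3)
```

Each element of `series` is one snapshot of (likes, comments, shares) at a given age; only the age feature changes between snapshots.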
3.3 Feature Extraction
Based on domain knowledge, a feature vector φ(x)
is picked to represent an input x and contribute to
the prediction vector y.
The examples were loaded from the binary files
where they were stored, then transformed into tu-
ples of (feature vector, label vector). The feature
vector is the union of raw features and features de-
rived from the raw data. The feature extractor
stored the tuples into binary files using Python’s
cPickle module. Caching the feature vectors and
their respective labels speeds up the training pro-
cess instead of re-extracting the features from the
examples every time predicted scores are calcu-
lated.
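A sketch of the caching step, assuming Python 3, where the standard pickle module replaces Python 2's cPickle and uses its C implementation automatically. The function names, file name, and example tuple are hypothetical.

```python
import os
import pickle
import tempfile

def save_examples(path, examples):
    """Serialize a list of (feature_vector, label_vector) tuples to a binary file."""
    with open(path, "wb") as handle:
        pickle.dump(examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_examples(path):
    """Load the cached tuples back, skipping feature re-extraction."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Hypothetical example: sparse feature dict plus (likes, comments, shares).
examples = [({"word_count": 12.0, "contains_url": 1.0}, (3, 1, 0))]
with tempfile.TemporaryDirectory() as directory:
    cache_path = os.path.join(directory, "features.pkl")
    save_examples(cache_path, examples)
    restored = load_examples(cache_path)
```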
The features that contribute to the popularity
of textual posts can be divided into the following:
• Textual features (pertaining to the post’s
text)
• Post metadata features (pertaining to the ac-
tion of posting)
• Author features (pertaining to the author)
In each category, there are real-valued and indi-
cator features that are listed below as feature tem-
plates (to be filled out by the training data).
3.3.1 Textual Features
• log(post length) ∈ [0.0, 3.2): boolean feature
• log(post length) ∈ [3.2, 6.4): boolean feature
• log(post length) ∈ [6.4, ∞): boolean feature
• contains a URL: boolean feature
• contains a question: boolean feature
• contains an e-mail address: boolean feature
• contains a hashtag: boolean feature
• contains a smiley emoticon: boolean feature
• contains a frowny emoticon: boolean feature
• post length: real-valued feature
• log(post length): real-valued feature
• ratio of non-alphanumeric characters: real-
valued feature
• word count: real-valued feature
• stemming: real-valued features, one for each
stem in the text and its count
• unigrams: real-valued features, one for each
word in the text and its count
• bigrams: real-valued features, one for each bi-
gram (two adjacent words) in the text and its
count
• trigrams: real-valued features, one for each
trigram (three adjacent words) in the text
and its count
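A few of the templates above can be sketched as a feature extractor that returns a sparse dictionary. The tokenizer, the regular expressions, and the rule that a question mark signals a question are all assumptions made for illustration; the original extractor is not shown in the text.

```python
import math
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n adjacent tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def textual_features(text):
    """Extract a sparse feature dict from the text of a post."""
    tokens = re.findall(r"\w+", text.lower())
    features = {
        "post_length": float(len(text)),
        "log_post_length": math.log(len(text)) if text else 0.0,
        "word_count": float(len(tokens)),
        "contains_url": float(bool(re.search(r"https?://\S+", text))),
        "contains_question": float("?" in text),  # assumed heuristic
        "contains_hashtag": float(bool(re.search(r"#\w+", text))),
    }
    # One real-valued feature per distinct n-gram, valued by its count.
    for n, prefix in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        for gram, count in Counter(ngrams(tokens, n)).items():
            features[f"{prefix}={gram}"] = float(count)
    return features

features = textual_features("Hiring? We are hiring engineers #jobs")
```

Because the n-gram templates create one feature per distinct n-gram seen in training, the size of the feature vector grows with the vocabulary of the dataset.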
3.3.1.1 Bucketization of Post Length
In order to figure out the relationships between
the proposed features and their respective weights,
interactive exploration of the data was performed
as a part of the feature engineering process; plot-
ting various features vs. the number of social ges-
tures uncovered some insights. For example, the
log(post length) can be bucketized into three buck-
ets:
Figure 5: bubble chart of log(post length) (x-axis) and number of likes excluding outliers (y-axis)
The bucket boundaries are 0, 3.2, and 6.4; un-
surprisingly, counts of comments and shares fol-
lowed suit due to the correlation between the three
social gestures:
Figure 6: bubble chart of log(post length) (x-axis) and number of comments excluding outliers (y-axis)
Figure 7: bubble chart of log(post length) (x-axis) and number of shares excluding outliers (y-axis)
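The bucketization can be sketched as three mutually exclusive boolean features. The boundaries come from the text; the use of the natural logarithm, the feature names, and the requirement that the post is non-empty are assumptions.

```python
import math

# Bucket boundaries from the text: [0.0, 3.2), [3.2, 6.4), [6.4, inf).
BOUNDARIES = (0.0, 3.2, 6.4)

def length_bucket_features(post_length):
    """Return one boolean (0/1) feature per log(post length) bucket."""
    log_length = math.log(post_length)  # natural log; the base is an assumption
    upper_bounds = BOUNDARIES[1:] + (math.inf,)
    features = {}
    for lower, upper in zip(BOUNDARIES, upper_bounds):
        name = f"log_post_length_in_[{lower},{upper})"
        features[name] = float(lower <= log_length < upper)
    return features

buckets = length_bucket_features(140)  # log(140) is roughly 4.94
```

Exactly one bucket feature is set for any post of at least one character, which lets the model learn a separate weight for short, medium, and long posts.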
3.3.1.2 Stemming
The Snowball stemmer from nltk [3], which is
language-specific, was used whenever the post’s language
was supported; otherwise, the Porter stemmer
was used. The Snowball stemmer has a much
better understanding of the language model
(including stopwords exclusion), and it supports
the following languages: Danish, Dutch, English,
Finnish, French, German, Hungarian, Italian, Norwegian,
Portuguese, Romanian, Russian, Spanish,
and Swedish. Many more languages were found in
the examples.
3.3.2 Post Metadata Features
• language of post: indicator feature
• day of month: indicator feature
• day of week: indicator feature
• hour of day: indicator feature
• day of month and hour: indicator feature
• day of week and hour: indicator feature
• sharing visibility: indicator feature
• post is in member interface locale: boolean
feature
• post is in member default locale: boolean fea-
ture
• post is in member locale: boolean feature
• post age in days: real-valued feature
• log(post age in days): real-valued feature
• post age in minutes: real-valued feature
• log(post age in minutes): real-valued feature
• mentions count: real-valued feature
3.3.2.1 Language Identification
Language identification was performed using a
language identifier software library developed by
LinkedIn.
3.3.2.2 Locality of Timestamps
Timestamps were adjusted to represent local
time according to the post’s timezone; the predictor
should calculate the same score for two posts shared
at the same local time — all other factors being
equal. For example, if a member who lives in Cali-
fornia shared a post at 10 AM PST, it should have
the same score as a post shared at 10 AM EST by
an identical member who lives in New York; time-
based features have to be trained with respect to
groups of values that contribute to the score in the
same fashion. This is important because of the role
the time of day when a post was published plays in
predicting its popularity; we here assume that the
majority of the post’s target readership lives in the
same timezone as the poster [4], [5].
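The adjustment can be sketched with fixed UTC offsets, which is a simplification: a real pipeline would need proper timezone rules, including daylight saving time. The function and feature names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def time_features(utc_timestamp, utc_offset_hours):
    """Shift a UTC timestamp to the poster's local time and emit indicator names."""
    local = utc_timestamp.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return {
        f"hour_of_day={local.hour}": 1.0,
        f"day_of_week={local.weekday()}": 1.0,
        f"day_of_week_and_hour={local.weekday()}_{local.hour}": 1.0,
    }

# 10 AM PST (UTC-8) and 10 AM EST (UTC-5) on the same calendar day.
california = time_features(datetime(2015, 12, 1, 18, 0, tzinfo=timezone.utc), -8)
new_york = time_features(datetime(2015, 12, 1, 15, 0, tzinfo=timezone.utc), -5)
```

Both posts produce identical time-based indicator features, so the predictor assigns them the same time-of-day contribution.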
3.3.3 Author Features
• default locale: indicator feature
• interface locale: indicator feature
• country: indicator feature
• industry: indicator feature
• timezone: indicator feature
• connections visibility: indicator feature
• feed visibility: indicator feature
• picture visibility: indicator feature
• is a LinkedIn influencer: boolean feature
• interface locale is default: boolean feature
• connections count: real-valued feature
• log(connections count): real-valued feature
• followers count: real-valued feature
• log(followers count): real-valued feature
• average likes count: real-valued feature
• average comments count: real-valued feature
• average shares count: real-valued feature
• a set of proprietary features (redacted)
Figure 8: chart of average likes per member (ex-
cluding outliers) and number of likes (excluding
outliers); unsurprisingly, there is a correlation be-
tween the two dimensions
4 Approach
4.1 Baseline
A rule-based system was chosen to predict the num-
ber of likes, comments, and shares for a given input
by searching for certain keywords in the text, and
factoring in the size of the author’s network in a
formula for each prediction, for example:
$$\text{likes} = \alpha \cdot (\text{network size}) + \sum_{\text{word} \in \text{text}} \text{weight}(\text{word})$$
The coefficients and weights in such a system can
be guessed based on heuristics or domain expertise.
For example, a baseline
with α = 0.011 and weight(’I’) = 1 yielded a large
test error; see the results section for more details.
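The baseline for likes can be sketched directly from the formula. The values of α and weight('I') come from the text; whitespace tokenization, a weight of zero for every other word, and the sample post are assumptions.

```python
ALPHA = 0.011                 # coefficient from the text
WORD_WEIGHTS = {"I": 1.0}     # weight('I') = 1 from the text; other words weigh 0

def baseline_likes(text, network_size):
    """Rule-based estimate: alpha * network size + sum of keyword weights."""
    keyword_score = sum(WORD_WEIGHTS.get(word, 0.0) for word in text.split())
    return ALPHA * network_size + keyword_score

# Hypothetical post by an author with 500 connections; the estimate is about 7.5.
estimate = baseline_likes("I think I can help", network_size=500)
```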
4.2 Oracle
An oracle can see the future, and tell us exactly
how many likes, comments, and shares a post has
at the end of a future time window. So, in our case,
it’s basically a time machine.
4.3 Linear Regression
Linear regression can be used to predict the number
of social gestures by learning the weight vectors
that contribute to the scores. The objective
is to minimize the average loss determined by the
squared loss function:
$$\text{Loss}_{\text{squared}}(x, y_i, w_i) = (w_i \cdot \phi(x) - y_i)^2$$
4.4 Stochastic Gradient Descent
Stochastic gradient descent (SGD) was an obvious
choice for fitting the linear regression because the datasets
used in this project are large, a fact that influenced
the tuning of hyperparameters and the algorithm
as well.
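A minimal sketch of SGD on the squared loss for one weight vector follows. It uses a constant learning rate and toy data for simplicity, which is an assumption and not the configuration used in this project.

```python
import random

def sgd_linear_regression(examples, learning_rate, iterations, seed=0):
    """Fit sparse weights by SGD on the squared loss (w . phi(x) - y)^2."""
    rng = random.Random(seed)
    weights = {}
    examples = list(examples)
    for _ in range(iterations):
        rng.shuffle(examples)  # visit examples in a fresh random order
        for features, label in examples:
            prediction = sum(weights.get(k, 0.0) * v for k, v in features.items())
            residual = prediction - label
            # Gradient of the squared loss w.r.t. w is 2 * residual * phi(x).
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * 2 * residual * value
    return weights

# Toy data generated from y = 2 * x + 1 (with a constant bias feature).
data = [({"x": x / 10, "bias": 1.0}, 2 * (x / 10) + 1) for x in range(11)]
weights = sgd_linear_regression(data, learning_rate=0.05, iterations=2000)
```

Each update touches only the features present in the current example, which keeps the cost per example proportional to the number of non-zero features.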
4.4.1 Hyperparameters
Because of the large number of examples, the following
formula was used for the learning rate:
η = min(0.001, number of updates)
It starts with a value that’s relatively large (yet
small enough to keep the weights from overflowing),
then gets smaller as the number of updates increases
(as the convergence rate increases).
Termination of the algorithm was determined
by either reaching diminishing improvements ($\epsilon = 0.0001$)
of the combined training and validation
errors ($|TE_{t+1} - TE_t| + |VE_{t+1} - VE_t| < \epsilon$), or
exhausting the maximum number of iterations allowed;
however, the latter can be incremented by 1
if a significant improvement in the validation error
has been observed ($\varepsilon = 0.1$). The same condition
was used to save a snapshot of the training program
in case it got aborted. When the program
was done, the weights vector was pickled (serialized)
and saved to disk for the test program to load
and evaluate. The examples used for training were
equally divided into 100 files; the training dataset had
80 files, and the validation dataset had 20 files. The
order of the training files to process at the start of
each iteration was picked at random, and the content
of each file was then shuffled as well in hopes
of increasing the convergence rate.
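The two stopping rules can be sketched as small helper functions. The thresholds come from the text; reading "significant improvement" as an absolute drop in validation error larger than 0.1 is an assumption, and the function names are hypothetical.

```python
def converged(train_errors, validation_errors, epsilon=0.0001):
    """True when |TE_{t+1} - TE_t| + |VE_{t+1} - VE_t| < epsilon."""
    if len(train_errors) < 2 or len(validation_errors) < 2:
        return False
    delta_train = abs(train_errors[-1] - train_errors[-2])
    delta_validation = abs(validation_errors[-1] - validation_errors[-2])
    return delta_train + delta_validation < epsilon

def extend_budget(max_iterations, validation_errors, threshold=0.1):
    """Grant one extra iteration after a significant validation improvement."""
    if len(validation_errors) >= 2 and validation_errors[-2] - validation_errors[-1] > threshold:
        return max_iterations + 1
    return max_iterations
```

A training loop would call `converged` after each iteration with the error histories so far, and call `extend_budget` before checking the iteration limit.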
4.4.2 Feature Normalization
One of the issues found while training was overflow
of weights; the feature values were highly variant,
and had to be normalized; the following formula
was used to rescale the values:
$$\phi(x) = \frac{\phi(x) - \min(\phi(x))}{\max(\phi(x)) - \min(\phi(x))}$$
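A sketch of the rescaling, assuming the minimum and maximum are taken per feature across the examples and that constant features map to zero. Both are assumptions, since the text does not spell out either detail.

```python
def min_max_normalize(columns):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    normalized = {}
    for name, values in columns.items():
        low, high = min(values), max(values)
        span = high - low
        normalized[name] = [(v - low) / span if span else 0.0 for v in values]
    return normalized

# Hypothetical feature columns with very different scales.
columns = {"connections_count": [50.0, 500.0, 5000.0], "word_count": [10.0, 20.0, 30.0]}
scaled = min_max_normalize(columns)
```

After rescaling, every feature lies in [0, 1], so a single learning rate produces updates of comparable size across features.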
4.5 Gradient Descent Using VW
Vowpal Wabbit [6] makes use of parallel threads,
feature hashing, and cache files to speed up the gra-
dient descent algorithm. Running VW with a single
pass, and with 100 passes generated similar results;
when run with multiple passes, VW held out 10%
of the examples for validation and reported the dev
loss instead of the training loss. VW was run using
the squared loss function, and feature normaliza-
tion.
4.5.1 Hyperparameters
VW was also run with the adaptive option, which
sets an individual learning rate for each feature,
which improves learning when the feature vector is
large [7]. The initial learning rate was set to the
default value (η0 = 0.5).
5 Results
Please note that the fitted coefficients were
redacted to protect LinkedIn’s intellectual prop-
erty.
                   Baseline   SGD       VW (1 Pass)   VW (100 Passes)
Likes
  Iterations       N/A        101       1             100
  Features Count   2          3016665   590650105     53157724900
  Training RMSE    N/A        3.930     3.683         N/A
  Dev RMSE         N/A        4.222     N/A           3.384
  Test RMSE        8.352      2.440     N/A           N/A
Comments
  Iterations       N/A        25        1             100
  Features Count   2          3016878   590650105     5315772490
  Training RMSE    N/A        2.374     2.144         N/A
  Dev RMSE         N/A        2.497     N/A           2.120
  Test RMSE        2.576      1.068     N/A           N/A
Shares
  Iterations       N/A        2         1             100
  Features Count   2          3017243   590650105     53157724900
  Training RMSE    N/A        0.446     0.474         N/A
  Dev RMSE         N/A        0.484     N/A           0.447
  Test RMSE        63.878     0.424     N/A           N/A
Table 1: comparison of various approaches
RMSE is the root-mean-square error (since the
predictors used the squared loss function); in the
table above, it’s rounded to the third decimal place.
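For reference, RMSE can be computed as in the sketch below. The sample predictions and labels are made up for illustration and are unrelated to the values in the table.

```python
import math

def rmse(predictions, labels):
    """Root-mean-square error between predicted and observed counts."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Rounded to the third decimal place, as in the table.
error = round(rmse([2.0, 0.0, 5.0], [1.0, 0.0, 7.0]), 3)
```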
VW produced a model file of hashed features
and their respective weights; parsing that exces-
sively large vector into a format that the test har-
ness understands —to calculate Test RMSE— was
prohibitive; Dev RMSE is a good proxy for it (in
the scope of this project).
6 Literature Review
Another project that was set to predict Facebook
likes [5] compared two approaches: linear regres-
sion, and nearest neighbor; and found out the for-
mer is more effective in predicting likes than the
latter. It’s also worth mentioning that the number
of examples used in that project, 49216, is two or-
ders of magnitude smaller than the one used here,
which indicates that the test data may not have
enough statistical coverage for a social network of
more than one billion daily unique users [8].
7 Potential Improvements
There are a few more improvements that I would
have liked to explore; for example, adding more
textual features like part-of-speech tagging, and
lemmatization.
Another possible addition to the feature vector
is metadata features about who liked, commented
on, and shared a post. The observed labels represent
a time series of underlying sequences of
values that aren’t IID (Independent and Identically
Distributed); for example, the probability that a
post gets n more likes at time $t_j$ depends on who
liked it at all times before $t_j$ because likes propagate
through the news feed (when member x likes a
public post y, it shows up as news for x’s network,
and they can like it, comment on it, and/or share
it from their news feeds). This is known as serial
coupling [9]. Augmenting the feature vector with
metadata features about members who reacted to
the post can improve the accuracy of predicting its
popularity [10], but it will make the cardinality of
the feature vector orders of magnitude larger than
what is currently used. Exploring other loss functions,
ensemble learning, more non-linear features,
and feature interactions (e.g., the cross product of
metadata features) might yield more accurate predictions.
8 Acknowledgments
I’d like to thank Percy Liang and the CS221 TAs for
the guidance they gave me throughout the quarter.
I’d also like to thank LinkedIn (special thanks to
Guy Lebanon and Bee-Chung Chen) for providing
the data that made this project possible.
References
[1] D. Agarwal, B.-C. Chen, Q. He, Z. Hua, G. Lebanon, Y. Ma, P. Shivaswamy, H.-P. Tseng, J. Yang,
and L. Zhang, “Personalizing linkedin feed”, in Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, ser. KDD ’15, Sydney, NSW, Australia: ACM,
2015, pp. 1651–1660, isbn: 978-1-4503-3664-2. doi: 10.1145/2783258.2788614. [Online]. Avail-
able: http://doi.acm.org/10.1145/2783258.2788614.
[2] H. Wickham, Ggplot2: Elegant graphics for data analysis. Springer New York, 2009, isbn: 978-0-387-
98140-6. [Online]. Available: http://had.co.nz/ggplot2/book.
[3] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, 1st. O’Reilly Media, Inc.,
2009, isbn: 0596516495, 9780596516499.
[4] Z. Ellison and S. Hildick-Smith, “Blowing up the twittersphere: Predicting the optimal time to tweet”,
Stanford, Stanford, CA, Tech. Rep., 2014. [Online]. Available:
http://cs229.stanford.edu/proj2014/Seth%20Hildick-Smith,%20Zach%20Ellison,%20Blowing%20Up%20The%20Twittersphere-%20Predicting%20the%20Optimal%20Time%20to%20Tweet.pdf.
[5] K. Chen, B. Huang, and B. Lee, “Facebook like predictor within your friends”, Northwestern Uni-
versity, Evanston, IL, Tech. Rep., 2015. [Online]. Available: http://kbbz.github.io/files/
Final%20report.pdf.
[6] J. Langford, L. Li, and A. Strehl, Vowpal Wabbit, 2007.
[7] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic
optimization”, J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011, issn: 1532-4435. [Online].
Available: http://dl.acm.org/citation.cfm?id=1953048.2021068.
[8] M. Zuckerberg, 2015.
[9] L. Cao, “Non-iidness learning in behavioral and social data”, The Computer Journal, 2013. doi:
10.1093/comjnl/bxt084. eprint: http://comjnl.oxfordjournals.org/content/early/2013/08/22/comjnl.bxt084.full.pdf+html.
[Online]. Available: http://comjnl.oxfordjournals.org/content/early/2013/08/22/comjnl.bxt084.abstract.
[10] M. Dundar, B. Krishnapuram, J. Bi, and R. B. Rao, “Learning classifiers when the training data is not
iid”, in Proceedings of the 20th International Joint Conference on Artifical Intelligence, ser. IJCAI’07,
Hyderabad, India: Morgan Kaufmann Publishers Inc., 2007, pp. 756–761. [Online]. Available: http:
//dl.acm.org/citation.cfm?id=1625275.1625397.
Appendix A Learning Rate Plots
Figure A9: line chart of learning rate using VW (single pass); x-axis: log2(examples count); y-axis: RMSE; series: comments, likes, shares
Figure A10: line chart of learning rate using VW (100 passes); x-axis: log2(examples count); y-axis: RMSE; series: comments, likes, shares

More Related Content

What's hot

Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)es712
 
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisFuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisIJERA Editor
 
Tags as tools for social classification
Tags as tools for social classificationTags as tools for social classification
Tags as tools for social classificationIsabella Peters
 
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET Journal
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
 
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingDetecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingIOSR Journals
 
Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?Colin Lieu
 
Trust influence and social media
Trust influence and social mediaTrust influence and social media
Trust influence and social mediaDawn Dawson
 
Mining social data
Mining social dataMining social data
Mining social dataMalk Zameth
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3Dave King
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Analyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-TweetsRESHAN FARAZ
 
Data mining for social media
Data mining for social mediaData mining for social media
Data mining for social mediarangesharp
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering ShowcaseTucker Truesdale
 
Text mining on Twitter information based on R platform
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platformFayan TAO
 
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEEFINALYEARSTUDENTPROJECTS
 
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ...
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ...What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ...
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ...IIIT Hyderabad
 
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...Università degli Studi di Milano-Bicocca
 

What's hot (20)

Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)Social media recommendation based on people and tags (final)
Social media recommendation based on people and tags (final)
 
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisFuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis
 
Tags as tools for social classification
Tags as tools for social classificationTags as tools for social classification
Tags as tools for social classification
 
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
 
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust ModellingDetecting Spam Tags Against Collaborative Unfair Through Trust Modelling
Detecting Spam Tags Against Collaborative Unfair Through Trust Modelling
 
Can you trust everything?
Can you trust everything?Can you trust everything?
Can you trust everything?
 
Trust influence and social media
Prediction of Reaction towards Textual Posts in Social Networks

Mohamed Mahmoud (elgeish@stanford.edu)

Abstract

Posting on social networks can be a gratifying or a terrifying experience, depending on the reaction the post and, by association, its author receive from readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn, and finds a predictor function that can estimate those quantitative social gestures.

Keywords: Linear Regression; LinkedIn; Machine Learning; Popularity; Social Networks

1 Introduction

Social media have played a prominent role in our daily lives over the past decade; they connect us with our families, friends, and colleagues in unprecedented ways. Sharing on social networks has become a primary means of communication (a basic human need) that has the potential of reaching a large group of recipients around the globe. Your posts on social networks are a reflection of who you think you are (self-perception), or what you want others to see; and how readers react to them is a reflection of how they perceive you. The lack of immediacy in such social interactions allows the author to ruminate on what to write, when sharing a textual post, in hopes of maximizing the fulfillment of a desideratum (akin to an essayist or a reporter seeking a certain goal). Readers, in turn, reciprocate by showing appreciation of the effort exerted in the form of virtual social gestures (akin to fan letters), which may help the post travel further and propagate through many social networks.

Let's take LinkedIn as an example of a social network where members are allowed to like, comment on, and re-share posts: the more likes a LinkedIn post gets, the more fulfilled the author feels about it; such a reaction vouches for the author's ethos and supports the author's claim to fame.
Conversely, the author's reputation and self-esteem might suffer when the post is uncelebrated. In order to alleviate the social pressure associated with sharing on social media, this work proposes a system that predicts the quantitative reaction towards textual posts in social networks over a specific time window. This time series analysis can help authors gauge the popularity (as a proxy for quality) of an update before posting it; they can refine the post if the scores are unsatisfactory, and examine how various versions of the same post score. In addition, the system can be used by social networks to predict which posts are more appealing to readers for ranking purposes [1]. This work focuses on LinkedIn as the social network of choice.

2 Input-Output Behavior

The input to the proposed system is a textual post shared by a member of a social network, and the output is a prediction of the quantitative reaction (number of likes, comments, and shares) the post will receive within a certain time window; we'll use a fixed window of one day. Take, for example, the post in the figure below:

Figure 1: an example of a textual post

The output corresponding to this input post is a vector of (predicted number of likes, predicted number of comments, predicted number of shares), one for each day. Here's another example that should score much higher than the former:

Figure 2: an example of a popular textual post
3 Model

Disclaimer: The analysis, exploration, and preparation of data; feature engineering and extraction; use and fine-tuning of machine learning algorithms, along with code developed for training, validating, and testing the model; and practices adopted for this project are driven by my own personal experience, and are not connected to any LinkedIn product. The data discussed here were anonymized, by removing personally identifiable information, in accordance with LinkedIn's strict policies regarding data privacy. In order to protect LinkedIn's intellectual property, some of the features mentioned below are redacted.

The goal of the proposed system is to obtain a predictor function $f$ that maps a new input $x$ to an output $y \in \mathbb{R}^3$ corresponding to (predicted number of likes, predicted number of comments, predicted number of shares) for the input post and age in days (time window).

This is a regression problem, which is solved using the following framework (supervised learning, which is given the training data to produce the predictor function):

Figure 3: diagram of a learning framework

The training data $D_{\text{train}}$ is a set of examples, which are input-output pairs: the inputs are the posts (including age in days), and the outputs are the corresponding social gestures.

3.1 Data Preparation

The data came from original textual posts on LinkedIn; an original post is authored by its poster, and is not a re-share of another post. In order to learn a predictor, three datasets were gathered for training, validation, and testing; the datasets were extracted from historical LinkedIn data using ETL (Extract, Transform, Load). One of the challenges faced during this project was performing ETL at the scale of LinkedIn: joining multiple datasets that contain billions of records, where each dataset contains certain aspects of the record that serves as an example for training.
Member IDs, along with personally identifiable information, were removed once the metadata were generated. The combined tally of examples in the training and validation sets is around 3.8 million after removing outliers. In order to maintain good data hygiene, the test set was gathered after training was completed, from a date range that doesn't overlap with that of the training and validation datasets; it amounts to roughly 0.25 million examples (out of 4.05 million examples in total). The data are split into 75% for training, 19% for validation, and 6% for testing.

3.1.1 Outliers

Outliers were determined by plotting [2] the distribution of the label values; here's an example of the distribution of label values for likes using a log scale:

Figure 4: distribution of label values for likes (x-axis: log(label value); y-axis: count)

After exploring the distribution of the data at different bucket widths, a cutoff line was chosen to remove outliers that are unlikely to occur (based on the gathered data). This process was repeated for comments and shares as well.

3.1.2 Missing Data

Some of the fields in the collected records were missing at random; for example, fields that were left blank by the members of the social network.
Whenever possible, a replacement value was calculated for the missing field. For example, for a missing timezone field, an approximation was calculated using the member's country. A more interesting example is a missing real-valued field, which was replaced by the mean value of that field in the observed examples; this is an acceptable way to replace data missing at random (MAR).

3.1.3 Raw Data

A labeled example is a tuple of the textual post along with its metadata (input); its label is the number of likes, comments, and shares generated for the post (output). A typical record in a dataset looks like the following:

((Text & Metadata), (Likes, Comments, Shares))

Metadata include data about the post such as age, visibility (e.g. public), author metadata (e.g. network size), etc. Such metadata are essential to better represent the inputs (in correlation to the output) in the context of the problem at hand: in a social network, the text of a post per se is insufficient to predict its popularity; we need to consider other factors like metadata about the post and its author. For example, the size of the author's network is expected to play a major role in predicting popularity.

After the raw data were extracted, the records were serialized and stored in binary files to save space. Those binary files served as input files for the feature extractor.

3.2 Scoring

Each input is distilled into a feature vector $\phi(x) = [\phi_1(x), \dots, \phi_d(x)] \in \mathbb{R}^d$, which represents the input and is computed using a feature extractor. Correspondingly, a weight vector $w_i = [w_{i1}, \dots, w_{id}] \in \mathbb{R}^d$, $i \in \{\text{likes}, \text{comments}, \text{shares}\}$, specifies the weight of each feature for each component of the prediction vector $y \in \mathbb{R}^3$.
Given a feature vector $\phi(x)$ and a weight vector $w_i$, the respective prediction score component $y_i \in \mathbb{R}$ is their inner product:

$$y_i = w_i \cdot \phi(x)$$

The score vector represents a snapshot of the post's popularity at the given age in days; to get the time series, the age is incremented by the desired time unit.

3.3 Feature Extraction

Based on domain knowledge, a feature vector $\phi(x)$ is picked to represent an input $x$ and contribute to the prediction vector $y$.

The examples were loaded from the binary files where they were stored, then transformed into tuples of (feature vector, label vector). The feature vector is the union of raw features and features derived from the raw data. The feature extractor stored the tuples in binary files using Python's cPickle module. Caching the feature vectors and their respective labels speeds up the training process, as it avoids re-extracting the features from the examples every time predicted scores are calculated.

The features that contribute to the popularity of textual posts can be divided into the following:

• Textual features (pertaining to the post's text)
• Post metadata features (pertaining to the action of posting)
• Author features (pertaining to the author)

In each category, there are real-valued and indicator features, which are listed below as feature templates (to be filled out by the training data).
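Before turning to the feature templates, the scoring rule of Section 3.2 can be sketched in a few lines. This is a minimal illustration and not the project's code: it assumes feature vectors are stored sparsely as name-to-value dictionaries, and the names `dot`, `predict`, `predict_series`, and the `extract` callback are hypothetical.

```python
from typing import Callable, Dict, List

# Sparse feature vector phi(x): feature name -> value (names are illustrative).
FeatureVector = Dict[str, float]
GESTURES = ("likes", "comments", "shares")


def dot(weights: Dict[str, float], features: FeatureVector) -> float:
    """Inner product w_i . phi(x); features without a learned weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def predict(weights_by_gesture: Dict[str, Dict[str, float]],
            features: FeatureVector) -> Dict[str, float]:
    """One score per social gesture, i.e. the prediction vector y."""
    return {g: dot(weights_by_gesture[g], features) for g in GESTURES}


def predict_series(weights_by_gesture: Dict[str, Dict[str, float]],
                   extract: Callable[[str, int], FeatureVector],
                   post: str, days: int) -> List[Dict[str, float]]:
    """Re-extract features at age 1, 2, ..., days and score each snapshot."""
    return [predict(weights_by_gesture, extract(post, age))
            for age in range(1, days + 1)]
```

Because the post's age is itself a feature, the time series is obtained by calling the same predictor once per day rather than by training a separate model per day.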
3.3.1 Textual Features

• log(post length) ∈ [0.0, 3.2): boolean feature
• log(post length) ∈ [3.2, 6.4): boolean feature
• log(post length) ∈ [6.4, ∞): boolean feature
• contains a URL: boolean feature
• contains a question: boolean feature
• contains an e-mail address: boolean feature
• contains a hashtag: boolean feature
• contains a smiley emoticon: boolean feature
• contains a frowny emoticon: boolean feature
• post length: real-valued feature
• log(post length): real-valued feature
• ratio of non-alphanumeric characters: real-valued feature
• word count: real-valued feature
• stemming: real-valued features, one for each stem in the text and its count
• unigrams: real-valued features, one for each word in the text and its count
• bigrams: real-valued features, one for each bigram (two adjacent words) in the text and its count
• trigrams: real-valued features, one for each trigram (three adjacent words) in the text and its count

3.3.1.1 Bucketization of Post Length

In order to figure out the relationships between the proposed features and their respective weights, interactive exploration of the data was performed as part of the feature engineering process; plotting various features vs. the number of social gestures uncovered some insights. For example, log(post length) can be bucketized into three buckets:

Figure 5: bubble chart of log(post length) and number of likes excluding outliers

The bucket boundaries are 0, 3.2, and 6.4; unsurprisingly, counts of comments and shares followed suit due to the correlation between the three social gestures:

Figure 6: bubble chart of log(post length) and number of comments excluding outliers

Figure 7: bubble chart of log(post length) and number of shares excluding outliers

3.3.1.2 Stemming

Stemming was performed using the Snowball stemmer from nltk [3], which is language-specific, whenever the post's language was supported; otherwise, the Porter stemmer was used. The Snowball stemmer has a much better understanding of the language model (including stopword exclusion), and it supports the following languages: Danish, Dutch, English,
Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, and Swedish; many more languages were found in the examples.

3.3.2 Post Metadata Features

• language of post: indicator feature
• day of month: indicator feature
• day of week: indicator feature
• hour of day: indicator feature
• day of month and hour: indicator feature
• day of week and hour: indicator feature
• sharing visibility: indicator feature
• post is in member interface locale: boolean feature
• post is in member default locale: boolean feature
• post is in member locale: boolean feature
• post age in days: real-valued feature
• log(post age in days): real-valued feature
• post age in minutes: real-valued feature
• log(post age in minutes): real-valued feature
• mentions count: real-valued feature

3.3.2.1 Language Identification

Language identification was performed using a language identification library developed by LinkedIn.

3.3.2.2 Locality of Timestamps

Timestamps were adjusted to represent local time according to the post's timezone; the predictor should calculate the same score for two posts shared at the same local time, all other factors being equal. For example, if a member who lives in California shared a post at 10 AM PST, it should have the same score as a post shared at 10 AM EST by an identical member who lives in New York; time-based features have to be trained with respect to groups of values that contribute to the score in the same fashion. This is important because of the role that the time of day at which a post was published plays in predicting its popularity; we assume here that the majority of the post's target readership lives in the same timezone as the poster [4], [5].
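The timestamp adjustment described above can be illustrated with a short sketch. It assumes the raw timestamp is stored in UTC and that the poster's timezone is available as a fixed UTC offset; the function name `local_time_features` and the feature-name strings are hypothetical, not taken from the project.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def local_time_features(posted_at_utc: datetime,
                        utc_offset_hours: float) -> Dict[str, float]:
    """Indicator features computed from the poster's local clock.

    posted_at_utc must be timezone-aware; utc_offset_hours is the poster's
    offset from UTC (for example -8 for PST and -5 for EST).
    """
    local = posted_at_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    hour, weekday, day = local.hour, local.weekday(), local.day  # Monday is weekday 0
    names = [
        f"hour_of_day={hour}",
        f"day_of_week={weekday}",
        f"day_of_month={day}",
        f"day_of_week_and_hour={weekday}:{hour}",
        f"day_of_month_and_hour={day}:{hour}",
    ]
    # Each indicator feature is present with value 1.0 and absent otherwise.
    return {name: 1.0 for name in names}
```

With this adjustment, a post shared at 18:00 UTC from California and one shared at 15:00 UTC from New York on the same day map to the same "10 AM" indicator features, which is the behavior the paragraph above asks for.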
3.3.3 Author Features

• default locale: indicator feature
• interface locale: indicator feature
• country: indicator feature
• industry: indicator feature
• timezone: indicator feature
• connections visibility: indicator feature
• feed visibility: indicator feature
• picture visibility: indicator feature
• is a LinkedIn influencer: boolean feature
• interface locale is default: boolean feature
• connections count: real-valued feature
• log(connections count): real-valued feature
• followers count: real-valued feature
• log(followers count): real-valued feature
• average likes count: real-valued feature
• average comments count: real-valued feature
• average shares count: real-valued feature
• a set of proprietary features (redacted)

Figure 8: chart of average likes per member (excluding outliers) and number of likes (excluding outliers); unsurprisingly, there is a correlation between the two dimensions
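The three kinds of features listed in this section (indicator, boolean, and real-valued) can be represented as a sparse mapping from feature names to values. The sketch below is only an illustration of that encoding: the function name `author_features`, the dictionary keys, and the use of `log1p` (so that a count of zero stays finite) are assumptions, not details taken from the project.

```python
import math


def author_features(author):
    """Map a dict of raw author attributes to a sparse feature vector."""
    features = {}

    # Indicator features: one feature per (name, value) pair, set to 1.
    for name in ("country", "industry", "timezone"):
        features[f"{name}={author[name]}"] = 1.0

    # Boolean feature: 1 when the condition holds, 0 otherwise.
    features["is_influencer"] = 1.0 if author["is_influencer"] else 0.0

    # Real-valued features, each paired with its logarithm.
    for name in ("connections_count", "followers_count"):
        value = float(author[name])
        features[name] = value
        # Assumption: log1p instead of log, so a zero count does not fail.
        features[f"log_{name}"] = math.log1p(value)

    return features


example = author_features({
    "country": "US",
    "industry": "Software",
    "timezone": "America/Los_Angeles",
    "is_influencer": False,
    "connections_count": 500,
    "followers_count": 0,
})
```

Only the features that are present for a given author appear in the mapping, which keeps the representation compact even when the full feature space has millions of dimensions.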
4 Approach

4.1 Baseline

A rule-based system was chosen to predict the number of likes, comments, and shares for a given input by searching for certain keywords in the text and factoring in the size of the author's network in a formula for each prediction, for example:

\[ \text{likes} = \alpha \cdot (\text{network size}) + \sum_{\text{word} \in \text{text}} \text{weight}(\text{word}) \]

The coefficients and weights in such a system can be guessed based on heuristics or domain expertise. A baseline that was chosen with α = 0.011 and weight('I') = 1 yielded a large test error; see the results section for more details.

4.2 Oracle

An oracle can see the future and tell us exactly how many likes, comments, and shares a post has at the end of a future time window. So, in our case, it's basically a time machine.

4.3 Linear Regression

Linear regression can be used to predict the number of social gestures by learning the weight vectors that contribute to the scores. The objective is to minimize the average loss determined by the squared loss function:

\[ \text{Loss}_{\text{squared}}(x, y_i, w_i) = (w_i \cdot \phi(x) - y_i)^2 \]

4.4 Stochastic Gradient Descent

Stochastic gradient descent (SGD) was an obvious choice for fitting the linear regression because the datasets used in this project are large, a fact that also influenced the tuning of hyperparameters and the algorithm.

4.4.1 Hyperparameters

Because of the large number of examples, the following formula was used for the learning rate:

\[ \eta = \min(0.001, \text{number of updates}) \]

It starts with a value that's relatively large, yet small enough to keep the weights from overflowing, then gets smaller as the number of updates increases (as the convergence rate increases).
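A minimal sketch of SGD for linear regression with the squared loss is given below. It is not the project's implementation: the sparse-dictionary feature representation, all function names, and the toy data are assumptions. In particular, the step-size schedule `min(0.001, 1/sqrt(t))` is an assumed stand-in that matches the description of a rate that starts capped and later decays; it is not necessarily the schedule used in the project.

```python
import math
import random


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def learning_rate(num_updates):
    # Assumed schedule: capped at 0.001, then decaying as 1/sqrt(t).
    return min(0.001, 1.0 / math.sqrt(num_updates))


def sgd_linear_regression(examples, iterations, seed=0):
    """Fit weights that minimize the average squared loss over (features, target) pairs."""
    rng = random.Random(seed)
    examples = list(examples)
    weights = {}
    num_updates = 0
    for _ in range(iterations):
        rng.shuffle(examples)  # visit the examples in a new random order each pass
        for features, target in examples:
            num_updates += 1
            eta = learning_rate(num_updates)
            residual = dot(weights, features) - target
            # The gradient of (w . phi(x) - y)^2 with respect to w is 2 (w . phi(x) - y) phi(x).
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - eta * 2.0 * residual * value
    return weights


# Toy, noise-free data generated from y = 2 x + 1 (illustrative only).
toy_examples = [({"x": i / 10.0, "bias": 1.0}, 2.0 * (i / 10.0) + 1.0) for i in range(11)]
weights = sgd_linear_regression(toy_examples, iterations=3000)
```

One weight vector of this kind would be fitted per social gesture (likes, comments, shares), each against its own target values.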
Termination of the algorithm was determined by either reaching diminishing improvements (ϵ = 0.0001) of the combined training and validation errors (|TE_{t+1} − TE_t| + |VE_{t+1} − VE_t| < ϵ), or exhausting the maximum number of iterations allowed; however, the latter can be incremented by 1 if a significant improvement in the validation error has been observed (ε = 0.1). The same condition was used to save a snapshot of the training program in case it got aborted. When the program was done, the weights vector was pickled (serialized) and saved to disk for the test program to load and evaluate. The examples used for training were divided equally into 100 files; the training dataset had 80 files, and the validation dataset had 20 files. The order of the training files to process at the start of each iteration was picked at random, and the content of each file was then shuffled as well in hopes of increasing the convergence rate.

4.4.2 Feature Normalization

One of the issues found while training was overflow of weights; the feature values varied widely and had to be normalized. The following formula was used to rescale the values:

\[ \phi(x) = \frac{\phi(x) - \min(\phi(x))}{\max(\phi(x)) - \min(\phi(x))} \]

4.5 Gradient Descent Using VW

Vowpal Wabbit [6] makes use of parallel threads, feature hashing, and cache files to speed up the gradient descent algorithm. Running VW with a single pass and with 100 passes generated similar results; when run with multiple passes, VW held out 10% of the examples for validation and reported the dev loss instead of the training loss. VW was run using the squared loss function and feature normalization.

4.5.1 Hyperparameters

VW was also run with the adaptive option, which sets an individual learning rate for each feature and improves learning when the feature vector is large [7]. The initial learning rate was set to the default value (η0 = 0.5).

5 Results

Please note that the fitted coefficients were redacted to protect LinkedIn's intellectual property.
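The results in this section are reported as root-mean-square error (RMSE). For reference, a minimal sketch of that metric follows; it assumes plain Python lists of predicted and observed counts, and the function name `rmse` is illustrative rather than taken from the project's test harness.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between predicted and observed values."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because RMSE is the square root of the average squared loss, a model trained to minimize the squared loss is also being trained to minimize this metric.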
                    Baseline   SGD        VW (1 Pass)   VW (100 Passes)
Likes
  Iterations        N/A        101        1             100
  Features Count    2          3016665    590650105     53157724900
  Training RMSE     N/A        3.930      3.683         N/A
  Dev RMSE          N/A        4.222      N/A           3.384
  Test RMSE         8.352      2.440      N/A           N/A
Comments
  Iterations        N/A        25         1             100
  Features Count    2          3016878    590650105     5315772490
  Training RMSE     N/A        2.374      2.144         N/A
  Dev RMSE          N/A        2.497      N/A           2.120
  Test RMSE         2.576      1.068      N/A           N/A
Shares
  Iterations        N/A        2          1             100
  Features Count    2          3017243    590650105     53157724900
  Training RMSE     N/A        0.446      0.474         N/A
  Dev RMSE          N/A        0.484      N/A           0.447
  Test RMSE         63.878     0.424      N/A           N/A

Table 1: comparison of various approaches

RMSE is the root-mean-square error (since the predictors used the squared loss function); in the table above, it's rounded to the third decimal place. VW produced a model file of hashed features and their respective weights; parsing that excessively large vector into a format that the test harness understands, in order to calculate Test RMSE, was prohibitive; Dev RMSE is a good proxy for it (in the scope of this project).

6 Literature Review

Another project that set out to predict Facebook likes [5] compared two approaches, linear regression and nearest neighbor, and found that the former is more effective in predicting likes than the latter. It's also worth mentioning that the number of examples used in that project, 49216, is two orders of magnitude smaller than the one used here, which indicates that its test data may not have enough statistical coverage for a social network of more than one billion daily unique users [8].

7 Potential Improvements

There are a few more improvements that I would have liked to explore; for example, adding more textual features like part-of-speech tagging and lemmatization.
Another possible addition to the feature vector is metadata features about who liked, commented on, and shared a post. The observed labels represent a time series of underlying sequences of values that aren't IID (Independent and Identically Distributed); for example, the probability that a post gets n more likes at time t_j depends on who liked it at all times before t_j because likes propagate through the news feed (when member x likes a public post y, it shows up as news for x's network, and they can like it, comment on it, and/or share it from their news feeds); this is known as serial coupling [9]. Augmenting the feature vector with metadata features about members who reacted to the post can improve the accuracy of predicting its popularity [10], but it will make the cardinality of the feature vector orders of magnitude larger than what is currently used. Exploring other loss functions, ensemble learning, more non-linear features, and feature interactions (e.g., the cross product of metadata features) might yield more accurate predictions.

8 Acknowledgments

I'd like to thank Percy Liang and the CS221 TAs for the guidance they gave me throughout the quarter. I'd also like to thank LinkedIn (special thanks to Guy Lebanon and Bee-Chung Chen) for providing the data that made this project possible.
References

[1] D. Agarwal, B.-C. Chen, Q. He, Z. Hua, G. Lebanon, Y. Ma, P. Shivaswamy, H.-P. Tseng, J. Yang, and L. Zhang, "Personalizing LinkedIn feed", in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '15, Sydney, NSW, Australia: ACM, 2015, pp. 1651–1660, isbn: 978-1-4503-3664-2. doi: 10.1145/2783258.2788614. [Online]. Available: http://doi.acm.org/10.1145/2783258.2788614.

[2] H. Wickham, Ggplot2: Elegant graphics for data analysis. Springer New York, 2009, isbn: 978-0-387-98140-6. [Online]. Available: http://had.co.nz/ggplot2/book.

[3] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, 1st. O'Reilly Media, Inc., 2009, isbn: 0596516495, 9780596516499.

[4] Z. Ellison and S. Hildick-Smith, "Blowing up the twittersphere: Predicting the optimal time to tweet", Stanford, Stanford, CA, Tech. Rep., 2014. [Online]. Available: http://cs229.stanford.edu/proj2014/Seth%20Hildick-Smith,%20Zach%20Ellison,%20Blowing%20Up%20The%20Twittersphere-%20Predicting%20the%20Optimal%20Time%20to%20Tweet.pdf.

[5] K. Chen, B. Huang, and B. Lee, "Facebook like predictor within your friends", Northwestern University, Evanston, IL, Tech. Rep., 2015. [Online]. Available: http://kbbz.github.io/files/Final%20report.pdf.

[6] J. Langford, L. Li, and A. Strehl, Vowpal Wabbit, 2007.

[7] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization", J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011, issn: 1532-4435. [Online]. Available: http://dl.acm.org/citation.cfm?id=1953048.2021068.

[8] M. Zuckerberg, 2015.

[9] L. Cao, "Non-iidness learning in behavioral and social data", The Computer Journal, 2013. doi: 10.1093/comjnl/bxt084. eprint: http://comjnl.oxfordjournals.org/content/early/2013/08/22/comjnl.bxt084.full.pdf+html. [Online]. Available: http://comjnl.oxfordjournals.org/content/early/2013/08/22/comjnl.bxt084.abstract.

[10] M. Dundar, B. Krishnapuram, J. Bi, and R. B. Rao, "Learning classifiers when the training data is not iid", in Proceedings of the 20th International Joint Conference on Artificial Intelligence, ser. IJCAI'07, Hyderabad, India: Morgan Kaufmann Publishers Inc., 2007, pp. 756–761. [Online]. Available: http://dl.acm.org/citation.cfm?id=1625275.1625397.
Appendix A Learning Rate Plots

Figure A9: line chart of learning rate using VW (single pass); x-axis: log2(examples count), y-axis: RMSE; series: comments, likes, shares
Figure A10: line chart of learning rate using VW (100 passes); x-axis: log2(examples count), y-axis: RMSE; series: comments, likes, shares