American International University - Bangladesh
Faculty of Science and Information Technology
Department of Computer Science
User Behavior Modeling & Recommendation
System Based On Social Networks
A thesis submitted for the degree of
Bachelor of Science in Computer Science and Engineering
By:
Alam Shah
10-17685-3
Hossain, MD. Shakawat
11-18494-1
Taher, Najeeb Ahmad
11-18198-1
Supervisor:
Md. Saddam Hossain
Assistant Professor, Department of Computer Science, American
International University-Bangladesh
Summer 2014
Declaration
This is to certify that this project is our original work. No part of this has been
submitted elsewhere partially or fully for the award of any other degree. Any
material reproduced in this project has been properly acknowledged.
Alam Shah
ID: 10-17685-3
Department: CSE
Hossain, MD. Shakawat
ID: 11-18494-1
Department: CSE
Taher, Najeeb Ahmad
ID: 11-18198-1
Department: CSE
Approval
The thesis titled “User Behavior Modeling & Recommendation System Based
On Social Networks” has been submitted to the following respected members of
the Board of Examiners of the Faculty of Science and Information Technology
in partial fulfillment of the requirements for the degree of Bachelor of Science in
Computer Science and Engineering and has been accepted as satisfactory.
Md. Saddam Hossain
Assistant Professor
Faculty of Computer Science
American International University-Bangladesh
Dr. Dip Nandi
Assistant Professor & Head
Faculty of Computer Science
American International University-Bangladesh
Professor Dr. Tafazzal Hossain
Dean
Faculty of Computer Science
American International University-Bangladesh
Dr. Carmen Z. Lamagna
Vice Chancellor
American International University-Bangladesh
Acknowledgements
Special thanks to our honorable teacher and supervisor Md. Saddam
Hossain, Assistant Professor, Department of Computer Science,
American International University-Bangladesh. We are very grateful
to him for giving us the opportunity to work with him. Without his
continuous support, it would have been very difficult for us to complete
this work. We would also like to thank all the faculty members for their
guidance in preparing proper documentation for our project.
Abstract
Social networks now play an important role in expressing people’s
sentiments and their interest in particular fields. Extracting a
user’s public social network data (what the user shares with friends
and relatives and how the user reacts to others’ thoughts) amounts to
extracting the user’s behavior. If, under a set of defined hypotheses,
a machine can be made to understand human sentiment and interest, it
becomes possible to make recommendations that match a user’s personal
interests on the basis of the sentiment the machine has analyzed. Our
main approach is to make suggestions to a user about specific interests
that are anticipated by analyzing the user’s public data. This can be
extended to business analysis, suggesting the products or services of
different companies according to each consumer’s personal choice. Such
automation can also help to choose suitable candidates for a
questionnaire, and can help people learn about themselves and about how
their behavior may influence others. It is possible to identify different
types of people, such as dependable people, people with leadership
skills, people with a supportive mentality and people with a negative
mentality.
Table of Contents
List of Figures
1. Introduction
2. Previous Works
2.1 Location Based Social Network
2.2 Collaborative Recommendation Based Social Network
2.3 Sentiment Intensity Analysis of Informal Texts
2.4 Big Five [1] Modeling
3. Research Questions
4. Proposed Research Methodology
4.1 Data Collection
4.2 Data Analysis
4.3 Results
4.4 Recommendation Analysis
5. Conclusion
List of Figures
4.1 Modeling User Behavior
4.2 Pie Chart of LIWC Results
4.3 Personality Based Recommendation System
List of Tables
2.1 Comparison of different location based social networks
4.1 Relationship between LIWC categories and Big Five factors
4.2 Products under Big Five factors
4.3 Products under Big Five factors
4.4 Products under Big Five factors
4.5 Products under Big Five factors
4.6 Products under Big Five factors
4.7 Products under Big Five factors
Chapter 1
Introduction
With millions of users, social networking services like Facebook [2] and Twitter [3]
have become some of the most popular internet applications. These applications
are sources of knowledge and information. The rich knowledge that has
accumulated in these social networking sites enables a variety of recommendation
systems for new users and media [4]. To exploit this opportunity, it is possible to
build an automated system that categorizes social network users according to the
Big Five [1] personality factors. To categorize users in such a system, their
data must be collected without interfering with their daily activities.
The system can thus help people learn about other people. For example, suppose
an employee needs a vacation and his boss is listed as a friend on an OSN (Online
Social Network). The employee can then make the request in a way that suits
the boss’s behavior as determined by the system (Neuroticism [1] indicates a
higher chance of refusal, whereas Agreeableness [1] indicates a higher chance of
approval). OSN deal with big data; after analyzing such data, the
system will be able to predict who is suitable for leadership and who may
oppose the leadership. Many challenges to recommendation systems have been
tackled by new approaches that use different data sources and methodologies
to generate different kinds of recommendations. In this thesis we provide a
description of such systems.
From the very beginning, consumer interests have had a great influence on business
policy. Offering the right products or services to the right customers is the main
objective of every successful business policy. Many business organizations can
benefit from the data collected from OSN. At present the popularity of
social networks is increasing very rapidly. From a sociologist’s point of view, OSN
can be characterized as “collective goods produced through computer mediated
collective action” [5]. Users spend a large part of their daily lives
on OSN and share a lot of information about themselves, their friends
and their families. This is a great opportunity to learn about the sentiments and
interests of people. Understanding the behavior of OSN users
has become a crucial factor for advertising policies and better product
design.
In particular, given the success of item recommendation systems on commercial
websites, such as Amazon [6] and Netflix [7], it is considered worthwhile to revisit
the recommendation problem through the perspective of social networking. In
general, recommendation systems aim to provide personalized recommendations
of items to users based on their previous behavior as well as on other information
gathered by item descriptions and user profiles.
Our experiment is based on Twitter [3] and Facebook [2], the most popular OSN
websites, which also carry a large amount of advertising. These websites have a very
large number of users, who feel comfortable using them
because of user-friendly features such as micro-blogging, status
updates, photo and video sharing, commenting on posts, joining and creating
groups, liking and subscribing to pages and profiles, creating events, playing games
and so on.
We aim to analyze user behavior through the following steps: collecting the user’s
past activities in OSN, mapping them onto the Big Five factors [1], identifying a set of
the user’s particular fields of interest, and making recommendations to him or her
through informative services.
Chapter 2
Previous work
OSN is the practice of expanding the number of business and social contacts of
a person by making connections through individuals [8]. In this era of the internet,
OSN is extremely popular. According to Nielsen Online’s report,
two thirds of the world population spent 10% of their time on the internet in OSN [9].
Because OSN give users the opportunity to say what they want to their
friends, relatives and others connected through their OSN accounts, there is
considerable scope to identify and characterize a user’s behavior type implicitly,
without interfering in his or her personal life [4].
2.1 Location Based Social Network [10]
A social network is a social structure made up of individuals connected by one or
more specific types of interdependency, such as friendship, common interests, and
shared knowledge. Generally, a social networking service builds on and reflects
the real-life social networks among people through online platforms such as a
website, providing ways for users to share ideas, activities, events, and interests
over the Internet. The increasing availability of location-acquisition technology
(for example GPS and Wi-Fi) empowers people to add a location dimension to
existing online social networks in a variety of ways. For example, users can upload
location-tagged photos to a social networking service such as Flickr [11], comment
on an event at the exact place where the event is happening (for instance, in Twit-
ter [3]), share their present location on a website (such as Foursquare [12]) for
organizing a group activity in the real world, record travel routes with GPS tra-
jectories to share travel experiences in an online community. Here, a location can
be represented in absolute (latitude-longitude coordinates), relative (100 meters
north of the Space Needle), and symbolic (home, office, or shopping mall) form.
Also, the location embedded into a social network can be a stand-alone instant
location of an individual, like in a bar at 9pm, or a location history accumulated
over a certain period, such as a GPS trajectory: a cinema → a restaurant → a park → a
bar.
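The three location forms and the distinction between an instant location and a location history can be made concrete with a small data model. The following is a minimal sketch and is not taken from [10]; the names `Location`, `Visit` and `location_history`, as well as the sample data, are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Location:
    """One place, described in any of the three forms mentioned in the text."""
    absolute: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    relative: Optional[str] = None  # e.g. "100 meters north of the Space Needle"
    symbolic: Optional[str] = None  # e.g. "home", "office", "bar"


@dataclass(frozen=True)
class Visit:
    """An instant location: where a user was at one timestamp."""
    user: str
    location: Location
    timestamp: datetime


def location_history(visits: List[Visit], user: str) -> List[Visit]:
    """Return one user's visits ordered by time (a location history)."""
    return sorted((v for v in visits if v.user == user), key=lambda v: v.timestamp)


# Hypothetical sample data, for illustration only.
visits = [
    Visit("alice", Location(symbolic="bar"), datetime(2014, 6, 1, 21, 0)),
    Visit("alice", Location(symbolic="cinema"), datetime(2014, 6, 1, 15, 0)),
    Visit("bob", Location(absolute=(47.6205, -122.3493)), datetime(2014, 6, 1, 16, 0)),
    Visit("alice", Location(symbolic="restaurant"), datetime(2014, 6, 1, 18, 0)),
]

history = [v.location.symbolic for v in location_history(visits, "alice")]
print(" -> ".join(history))
```

Keeping all three forms optional on one record is only one possible design; it lets the same history mix check-ins that carry coordinates with ones that only carry a symbolic label.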
The dimension of location brings social networks back to reality, bridging the
gap between the physical world and online social networking services. For exam-
ple, a user with a mobile phone can leave his/her comments with respect to a
restaurant in an online social site (after finishing dinner) so that the people from
his/her social structure can reference his/her comments when they later visit the
restaurant. In this example, users create their own location-related stories in the
physical world and browse other people’s information as well. An online social site
becomes a platform for facilitating the sharing of people’s experiences.
Furthermore, people in an existing social network can expand their social structure with
the new interdependency derived from their locations. As location is one of the
most important components of user context, extensive knowledge about an
individual’s interests and behavior can be learned from her locations. For instance,
people who enjoy the same restaurant can connect with each other. Individuals
constantly hiking the same mountain can be put in contact with each other to
share their travel experiences. Sometimes, two individuals who do not share the
same absolute location can still be linked as long as their locations are indicative
of a similar interest, such as beaches or lakes.
These kinds of location-embedded and location-driven social structures are known
as location-based social networks, formally defined as follows:
“A location-based social network (LBSN) [10] does not only mean adding a loca-
tion to an existing social network so that people in the social structure can share
location embedded information, but also consists of the new social structure made
up of individuals connected by the interdependency derived from their locations in
the physical world as well as their location-tagged media content, such as photos,
video, and texts. Here, the physical location consists of the instant location of an
individual at a given timestamp and the location history that an individual has
accumulated in a certain period. Further, the interdependency includes not only
that two persons co-occur in the same physical location or share similar location
histories but also the knowledge, e.g., common interests, behavior, and activities,
inferred from an individual’s location (history) and location-tagged data.”
In a location-based social network, people can not only track and share the
location-related information of an individual via either mobile devices or desktop
computers, but also leverage collaborative social knowledge learned from
user-generated and location-related content, such as GPS trajectories and
geo-tagged photos. One example is determining this summer’s most popular
restaurant by mining people’s geo-tagged comments. Another example could be
identifying the most popular travel routes in a city based on a large number of
users’ geo-tagged photos. Consequently, LBSNs enable many novel applications that
change the way we live, such as physical location (or activity) recommendation
systems [13] [14] and travel planning, while offering many new research
opportunities for social network analysis (like user modeling in the physical world
and connection strength analysis) [15] [16], spatio-temporal data mining [17],
ubiquitous computing [18], and spatio-temporal databases [17] [19]. Existing
applications providing location-based social networking services can be broadly
categorized into three groups: geo-tagged-media-based, point-location-driven and
trajectory-centric.
• Geo-tagged-media-based. [10] Quite a few geo-tagging services enable users
to add a location label to media content such as text, photos, and videos
generated in the physical world. The tagging can occur instantly when
the medium is generated, or after a user has returned home. In this way,
people can browse their content at the exact location where it was created
(on a digital map or in the physical world using a mobile phone). Users can
also comment on the media and expand their social structures using the
interdependency derived from the geo-tagged content (for example, in favor
of the same photo taken at a location). Representative websites of such
location-based social networking services include Flickr, Panoramio, and
Geo-twitter. Though a location dimension has been added to these social
networks, the focus of such services is still on the media content. That is,
location is used only as a feature to organize and enrich media content while
the major interdependency between users is based on the media itself.
• Point-location-driven. [10] Applications like Foursquare and Google Lati-
tude encourage people to share their current locations, such as a restaurant
or a museum. In Foursquare, points and badges are awarded for checking
in at venues. The individual with the most check-ins at a venue
is crowned Mayor. With the real-time location of users, an individual can
discover friends (from her social network) around her physical location so as
to enable certain social activities in the physical world, e.g., inviting people
to have dinner or go shopping. Meanwhile, users can add tips to venues
that other users can read, which serve as suggestions for things to do, see,
or eat at the location. With this kind of service, a venue (point location) is
the main element determining the interdependency connecting users, while
user-generated content such as tips and badges feature a point location.
• Trajectory-centric. [10] In a trajectory-centric social networking service,
such as Bikely, SportsDo, and Microsoft GeoLife, users pay attention to
both point locations (passed by a trajectory) and the detailed route con-
necting these point locations. These services do not only tell users basic
information, such as distance, duration, and velocity, about a particular
trajectory, but also show a user’s experiences represented by tags, tips, and
photos for the trajectory. In short, these services provide how and what
information in addition to where and when. In this way, other people can
reference a user’s travel/sports experience by browsing or replaying the
trajectory on a digital map, and follow the trajectory in the real world with a
GPS-phone.
Table 2.1 provides a brief comparison among these three services. The major
differences between the point-location-driven and the trajectory-centric LBSN lie
in two aspects. One is that a trajectory offers richer information than a point
location, such as how to reach a location, how long a user
stayed in a location, the time taken to travel between two locations, and the
physical/traffic conditions of a route. As a result, we are more likely to accurately
understand an individual’s behavior and interests in a trajectory-centric LBSN.
The other is that in a point-location-driven LBSN users usually share their
real-time location, while a trajectory-centric LBSN more likely delivers historical
locations, as users typically prefer to upload a trajectory after a trip has finished (though
it can be operated in a continuously uploading manner). This property could
compromise some scenarios based on the real-time location of a user; however, it
reduces to some extent the privacy issues in a location-based social network. In
other words, when people see a user’s trajectory, the user is no longer there.
Table 2.1. Comparison of different location based social networks
LBSN Services             Focus             Real-time          Information
Geo-tagged-media-based    Media             Normal             Poor
Point-location-driven     Point location    Instant            Normal
Trajectory-centric        Trajectory        Relatively Slow    Rich
Actually, the location data generated in the first two LBSN services can be
converted into the form of a trajectory which might be used by the third category
of LBSN service. For example, if we sequentially connect the point locations of
the geo-tagged photos taken by a user over several days, a sparse trajectory can be
formulated. Likewise, the check-in records of an individual ordered by time can
be regarded as a low-sampling-rate trajectory. However, due to the sparseness,
i.e., the distance and time interval between two consecutive points in a trajectory
could be very big, the uncertainty existing in a single trajectory from the first
two services is increased. Aiming to put these trajectories into trajectory-centric
LBSN services, we need to use them in a collective and collaborative way.
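The conversion described above can be sketched as follows: check-in records are ordered by time and joined into a trajectory, and the gaps between consecutive points show how sparse the result is. This is a minimal sketch with hypothetical check-in data; the function names and the use of the haversine great-circle distance are assumptions made for illustration, not part of [10].

```python
import math
from datetime import datetime
from typing import List, Tuple

# A check-in is (timestamp, latitude, longitude).
CheckIn = Tuple[datetime, float, float]


def to_trajectory(checkins: List[CheckIn]) -> List[CheckIn]:
    """Order check-ins by time to form a low-sampling-rate trajectory."""
    return sorted(checkins, key=lambda c: c[0])


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def gaps(trajectory: List[CheckIn]) -> List[Tuple[float, float]]:
    """Return (hours, kilometres) between each pair of consecutive points."""
    result = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(trajectory, trajectory[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        result.append((hours, haversine_km(la1, lo1, la2, lo2)))
    return result


# Hypothetical, unordered check-ins of one user.
checkins = [
    (datetime(2014, 6, 2, 9, 0), 23.8103, 90.4125),
    (datetime(2014, 6, 1, 12, 0), 23.7806, 90.4070),
    (datetime(2014, 6, 1, 20, 30), 23.7940, 90.4043),
]

trajectory = to_trajectory(checkins)
for hours, km in gaps(trajectory):
    print(f"{hours:.1f} h, {km:.2f} km")
```

Large time or distance gaps in the output correspond to the uncertainty the text mentions: nothing is known about where the user was between two widely separated points.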
Trajectory data is the most complex data structure to be found in the three
LBSN services, and provides the richest information. If it is handled well, other
data sources become easier to deal with. Moreover, as mentioned above, loca-
tion data can be converted into a trajectory on many occasions. Consequently,
some methodologies designed for trajectory data can be employed by the first two
LBSN services.
2.2 Collaborative Recommendation Based Social Network [20]
With the recent advances in technology, there is an emerging presence of social
media and social networking systems. In the case of multimedia enriched social
network systems, such as last.fm, the collective goods are musical tracks and the
collective action is the process of crafting individual profiles of musical preference
and linking them either explicitly, via bonds of friendship, or implicitly, through
collaborative annotation.
This collective action leads to the creation of an implicit social networking struc-
ture, which we aim to further explore. In particular given the success of item
recommendation systems in commercial websites, such as Amazon.com and Net-
flix, it is considered worthwhile to revisit the recommendation problem through
the novel perspective of social networking. In general, recommendation systems
aim to provide personalized recommendations of items to users based on their
previous behavior as well as on other information gathered by item descriptions
and user profiles.
However, no emphasis has yet been placed on personalization based explicitly on
social networks. The reason is that, despite the increasing interest in the
exploration of social networks, there is no concrete dataset that
includes both explicit bonds of friendship among users and free-form collaborative
annotation of items. This is because most social media systems do not allow
free access to all user profiles or lists of friends.
Given the incentive of the widespread adoption of social networks and the
lack of a previous study that directly addresses the problem of efficiently
integrating the added-value knowledge provided by those networks into the field of
collaborative recommendation, Kontas et al. [20] propose a new methodology that
tackles the aforementioned issues. Within this context they make the following
contributions:
• Kontas et al. [20] introduce a dataset based on data from the last.fm so-
cial network that describes a social graph among users, tracks and tags,
effectively including bonds of friendship and collaborative annotation.
• Kontas et al. [20] evaluate a Random Walk with Restarts (RWR) model
on this dataset and show that the incorporation of friendship and social
tagging can improve the performance of an item recommendation system.
• Kontas et al. [20] show that the RWR method outperforms the standard
Collaborative Filtering (CF) method, which they also evaluate against the
same dataset.
• Kontas et al. [20] show that their RWR-based method requires
no training and successfully manages to capture
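To illustrate the Random Walk with Restarts idea named in the list above, the following is a minimal sketch on a tiny graph of users, tracks and tags. It is not the implementation evaluated in [20]; the graph, the node names, the restart probability of 0.15 and the fixed iteration count are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def random_walk_with_restarts(
    edges: List[Tuple[str, str]],
    start: str,
    restart_prob: float = 0.15,  # assumed value, not taken from [20]
    iterations: int = 100,
) -> Dict[str, float]:
    """Approximate how often a walker visits each node when it follows a random
    edge with probability 1 - restart_prob and otherwise jumps back to `start`."""
    # Build an undirected adjacency list.
    neighbors: Dict[str, List[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    if start not in neighbors:
        raise ValueError(f"unknown start node: {start}")

    prob = {node: 0.0 for node in neighbors}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in neighbors}
        for node, p in prob.items():
            # Spread the walking share evenly over the node's neighbors.
            share = (1.0 - restart_prob) * p / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        nxt[start] += restart_prob  # restart mass always returns to the start node
        prob = nxt
    return prob


def recommend_tracks(edges: List[Tuple[str, str]], user: str, top_n: int = 2) -> List[str]:
    """Rank tracks the user is not yet linked to by their visiting probability."""
    scores = random_walk_with_restarts(edges, user)
    linked = {b for a, b in edges if a == user} | {a for a, b in edges if b == user}
    candidates = [n for n in scores if n.startswith("track:") and n not in linked]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:top_n]


# Hypothetical graph: friendship, listening and tagging edges.
edges = [
    ("user:ann", "user:bob"),   # friendship
    ("user:ann", "track:t1"),
    ("user:bob", "track:t2"),
    ("user:cat", "track:t1"),
    ("user:cat", "track:t3"),
    ("track:t1", "tag:rock"),
    ("track:t2", "tag:rock"),
    ("track:t3", "tag:jazz"),
]

print(recommend_tracks(edges, "user:ann"))
```

In this toy graph, friendship and shared tags both contribute paths from the query user to a candidate track, which is the intuition behind combining the two sources in one graph.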
Kontas et al. [20] distinguish two broad categories of recommendation
systems, namely content-based and collaborative filtering. A content-based
system selects items based on the correlation between the content of the
items (e.g. keywords describing the items, such as album genre, artists, etc., for
music tracks) and the users’ preferences [5]. However, it is limited to
dictionary-bound relations between the keywords used by users and the descriptions
of items and therefore does not explore implicit associations between users.
Collaborative filtering systems are divided into two categories, i.e. memory-based
and model-based. In memory-based systems [21], the
similarity between all users is calculated from their ratings of items using some
heuristic measure such as the cosine similarity or the Pearson correlation score. A
missing rating is then predicted by aggregating the ratings of the k nearest neighbors of
the user we want to recommend to. The problem with memory-based systems is
that we have to decide on a rather arbitrary basis over parameters such as the
number of neighbors. What is more, in the case of social networks there is no
straightforward way to introduce similarities between users based on friendships
and social tagging, other than some way of ad hoc interpolation of similarity
weights from those different sources.
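The memory-based procedure described above can be sketched as follows. This is a minimal illustration with hypothetical ratings, not the method of [21]: it assumes cosine similarity with unrated items treated as zero and a similarity-weighted average as the aggregation step, and the names `cosine_similarity` and `predict_rating` are illustrative.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two rating vectors; unrated items count as zero."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average of the ratings given to `item` by the k users
    most similar to `user`; None when no neighbor has rated the item."""
    scored = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    nearest = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    total_sim = sum(sim for sim, _ in nearest)
    if total_sim == 0:
        return None
    return sum(sim * rating for sim, rating in nearest) / total_sim


# Hypothetical ratings on a 1 to 5 scale.
ratings: Ratings = {
    "ann": {"t1": 5, "t2": 1},
    "bob": {"t1": 5, "t2": 1, "t3": 5},
    "cat": {"t1": 1, "t2": 5, "t3": 1},
}

print(predict_rating(ratings, "ann", "t3", k=1))
print(predict_rating(ratings, "ann", "t3", k=2))
```

Running the prediction with k=1 and k=2 gives different values for the same missing rating, which is the sensitivity to the number of neighbors that the text points out.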
Model-based filtering systems assume that users form clusters based
on similar behavior in rating items. A model is learned from patterns
recognized in the rating behavior of users using clustering, Bayesian networks
and other machine learning techniques [22] [23]. The problem with model-based
methods is that several parameters of the model must be fine-tuned, and
the models produced might not generalize well to radically
different contexts. What is more, as in the case of memory-based systems, extra
effort and training are needed in order to introduce knowledge from social
networks.
Many recent research publications revolve around the area of social
media. In particular, several studies focus on dataset collection and analysis
from social networks. Das et al. [24] proposed sample-based algorithms that
capture information in the neighborhood of a user in dynamic social networks
utilizing random walks. Halpin et al. [25] studied the distribution of tags in
the social bookmarking site del.icio.us and proposed a generative model of
collaborative tagging in order to evaluate the dynamics that lie beneath the act of
collaborative recommendation. Their findings show that the dataset collected
follows a power-law distribution. Even though both studies examine social networks
that are based on social tagging, they do not explore the dynamics of friendships
among users. Taking into account the power of free-form tagging of items by users
other than their authors/owners, researchers also focus on tag recommendation.
Subramanya and Liu [26] propose a system that automatically recommends tags
for blogs, using similarity ranking in a manner similar to collaborative filtering
techniques. Strohmaier [27] studies a novel idea in tag recommendation, which
bridges the gap between the keywords issued by a user in a query and the tags
actually used by a social system. He argues that the tags used by a user when
performing a query exhibit his or her intent, whereas the annotations of items
describe content semantics. As a result, he proposes a new form of purpose tags,
which extract the intent of the user and facilitate goal-oriented search in a social
network. Both studies underline the importance and discriminative power of
social tagging, which is also validated by the work of Kontas et al. [20].
Several studies exist in the field of applying Random Walks on bipartite
graphs. Craswell and Szummer [28] study a clickthrough data graph in order
to perform item recommendation. Nevertheless, no social content is available
between users. Yildirim and Krishnamoorthy [23] propose a novel recommendation
algorithm which performs Random Walks on a graph that denotes similarity
measures between items. They evaluate their system using data from MovieLens.
Although the Random Walk model performs well in the context of
recommendation, their use of an Item-Item similarity matrix raises some issues
as to the ability of the system to extend when other similarities are introduced
based on social tagging. Recent work has also been done in the field of applying
Random Walks over a social graph instead of bipartite graphs, similar to what
Kontas et al. [20] propose. Clements et al. [29] propose a single term query system
performing Random Walks on graphs including users, items and tags. They use
data from LibraryThing, an online book catalogue where users rate and tag books
they have read. Due to lack of ground truth, they assume that the tags assigned
to an item by each user are the same as they would use as query terms to retrieve
the annotated item. Kontas et al. [20] argue that this assumption is rather strong
and that a user experiment would be more appropriate in order to properly establish
the ground truth.
Hotho et al. evaluate a variation of adapted PageRank on a dataset from del.icio.us,
exploring folksonomies of bookmarks based also on collaborative annotation [30].
However, since they evaluate their proposed algorithm empirically, any
comparison with their results becomes cumbersome. Although both studies are
close to the approach of Kontas et al. [20], the latter use a different model, namely
RWR, in which they explicitly include friendships in their dataset and perform
collaborative recommendations instead of queries on the graph.
2.3 Sentiment Intensity Analysis of Informal Texts [31]
The proliferation of social networks such as blogs, forums and other online means
of expression and communication has resulted in a landscape where people are
able to freely discuss online through a variety of means and applications.
Probably one of the most novel and interesting ways of communication in
cyberspace is through 3D virtual environments. In such environments, people,
represented by their avatars, socialize and interact with each other and with virtual
humans operated by machines, i.e., computer systems.
Despite the fact that the graphics of those environments remain relatively poor,
futuristic movies such as Avatar [32] provide an example of sophisticated land-
scapes and renderings that will be attainable by such environments in the fore-
seeable future. However, regardless of how attractive and realistic such artificial
3D worlds become, they will always remain heavily dependent on the quality of
human communication that takes place within them. As shown in [33] [34] [35],
communication in environments that are not limited to one, textual modality,
consists of not just semantic data transfer, but also of dense non-verbal commu-
nication where sentiment plays an important role. Moreover, without emotion
no consistent and coherent (virtual) body language is possible. Such primordial
movements include facial expressions, eye looks, arm-language coordination, etc.
Sentiment detection from textual utterances can play an important role in the
development of realistic and interactive dialog systems. Such systems serve var-
ious educational, business or entertainment oriented functions and also include
systems that are deployed in 3D virtual environments. With the aid of “dialog
coherence” modules, conversational systems aim at a realistic interaction flow at
the emotional level, e.g., Affect Listeners [36], and can greatly benefit from the
correct identification of the emotional state of their participants. Taking into
consideration that the majority of input to practical conversational systems
consists of short, informal, textual exchanges, it is essential that the sentiment
analysis component integrated in the dialog system is able to cope with this
informal, often incomplete or ill-formed type of communication.
Sentiment analysis, the process of automatically detecting if a text segment con-
tains emotional or opinionated content and extracting its polarity or valence, is
a field of research that has received significant attention in recent years, both in
academia and in industry. The aforementioned increase of user-generated con-
tent on the web has resulted in a wealth of information that is potentially of vital
importance to institutions and companies, providing them with data to research
their consumers, manage their reputations and identify new opportunities. As
a result, most of the research in the field has been limited to product reviews,
where the aim is to predict whether the reviewer recommends a product or not,
based on the textual content of the review.
The focus of this paper is different. Instead of focusing our attention on product
reviews, we explore the more ubiquitous field of informal, social interactions in
cyberspace. The unprecedented popularity of social platforms such as Facebook,
Twitter and MySpace, as well as 3D virtual worlds, has resulted in an unparalleled
increase of textual exchanges that remains relatively unexplored, especially in terms
of its emotional content.
Specifically, Paltoglou et al. [31] aim to answer the following question: can lexicon-
based approaches perform more effectively than machine-learning approaches in
this domain? This question is particularly important, because previous research
in sentiment analysis using product reviews has shown that machine-learning
approaches typically outperform lexicon-based ones, but no exploration of whether
the same holds for informal, social interactions has been carried out in the past. The
differences between the two domains are numerous. Firstly, reviews tend to be
longer and more verbose than typical social interactions, which may only be a
few words long and often contain significant spelling errors [37]. Secondly, no
clear “gold standard” exists in the domain of informal communications with
which to train a machine-learning classifier, in contrast to the “thumbs up” or
“thumbs down” feature of reviews. Lastly, social exchanges on the web tend to
be much more diverse in terms of their topics, with issues ranging from politics
and recent news to religion, while product reviews by definition have
a specific subject, i.e. the product under discussion. The study of emotional
and social interactions in virtual worlds implies the study of virtual human (VH)
behaviors. Two types of VH exist: avatars (i.e. the projection of a real human in
the 3D environment) and agents (i.e. the projection of an autonomous machine
simulating a human in the virtual world). These VH types result in three possible
types of communications: avatar to avatar, agent to agent and avatar to agent.
Each one of those has the following interesting aspects respectively:
- A non-verbal body language based on VH emotional states and mind profile.
- A potential visualization of the interaction from a third VH that should be
represented by an avatar.
- A non-verbal communication for the human representation and an action of
agent strongly influenced by interpreted emotions from the avatar.
It seems only logical that artificial intelligence and conversation systems would
strongly benefit these aspects in order to make the communication more
realistic. The structure of this paper is as follows. The next section provides
a brief overview of relevant work in sentiment analysis. Section 3 presents
the lexicon based classifier and section 4 presents the two machine-learning
classifiers that will be used in this study. Section 5 describes the data sets
that were used and explains the experimental setup while section 6 presents
and analyzes the results.
Finally, Paltoglou et al. [31] conclude and present some potential future directions
of research. Sentiment analysis, also known as opinion mining, has attracted
considerable interest recently. Most research has focused on analyzing the content
of either movie or general product reviews (e.g. [38]). Attempts to expand the
application of sentiment analysis to other domains, such as debates [39], news and
blogs [40] are also prominent. The seminal book of Pang and Lee [41] presents a
thorough analysis of the work in the field. In this section we will focus on the more
prominent work which is relevant to our approach. Pang et al. [46] were amongst
the first to explore the sentiment analysis of reviews, focusing on machine-learning
approaches. These approaches generally function as follows: initially, a
general inductive process learns the characteristics of a class during a training
phase, by observing the properties of a number of pre-classified documents (i.e.
a reference corpus), and then applies the acquired knowledge to determine the best
category for new, unseen documents during testing. Pang et al. [46] experimented
with three different algorithms: Support Vector Machines (SVMs), Naive Bayes
and Maximum Entropy classifiers, using a variety of features, such as unigrams
and bigrams, part-of-speech tags, binary and term frequency feature weights and
others. Their best accuracy on a dataset consisting of movie reviews was
attained using an SVM classifier with binary features, although all three classifiers
gave very comparable performance. Other approaches (e.g. [42] [43]) have focused
on extending the feature set with semantically or linguistically-driven features
in order to improve classification accuracy. Dictionary/lexicon-based sentiment
analysis is typically based on lists of words with some sort of pre-determined
emotional weight. Examples of such dictionaries include the General Inquirer
(GI) dictionary [44] and the “Linguistic Inquiry and Word Count” (LIWC)
software [45], which are also used in the present study. Both lexicons are built with
the aid of experts that classify certain tokens in terms of their affective content
(e.g. positive or negative). The “Affective Norms for English Words” (ANEW)
lexicon [46] contains ratings of terms on a nine-point scale in regard to three
individual dimensions: valence, arousal and dominance. The ratings were pro-
duced manually by psychology class students. Ways to produce such emotional
dictionaries in an automatic or semi-automatic fashion have also been introduced
in research [47]. Emotional dictionaries have mostly been utilized in psychology
or sociology oriented research [48].
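The dictionary-based approach described above can be illustrated with a minimal sketch. The word list, its weights and the one-token negation rule below are hypothetical and chosen only for illustration; they are not taken from GI, LIWC or ANEW, and they are not the classifier of Paltoglou et al. [31].

```python
import re

# Hypothetical toy lexicon: word -> pre-determined emotional weight.
# Real lexicons (GI, LIWC, ANEW) are far larger and expert-curated.
TOY_LEXICON = {
    "good": 2, "great": 3, "love": 3, "happy": 2,
    "bad": -2, "awful": -3, "hate": -3, "sad": -2,
}

# Assumption: a negator flips the sign of the token that directly follows it.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the weights of all lexicon words found in the text."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        weight = lexicon.get(token, 0)
        score += -weight if negate else weight
        negate = False  # the negation only applies to the next token
    return score


def polarity(text):
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Unlike a machine-learning classifier, such a scorer needs no labeled training data, which is why it is attractive for informal text where no clear reference labels exist.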
The idea of emotional conversationalists is relatively old. First attempts to create
such a system can be traced back to Parry [49], a chatterbot intended for studying
the nature of paranoia and able to express fears, anxieties or beliefs. More recent
work includes research on the development of synthetic characters and chatterbots
with personalities [50] and studies on emotional responses and their influence on
the creation of believable agents or interactive virtual personalities [51]. In [52]
the authors focused on the role of emotions for gaining rapport in spoken dialog
systems by rendering responses that contain suitable emotion, both lexically and
auditorily. Studies on the role of facial expressions in building rapport in virtual
human-user interactions were conducted in [53]. A chatterbot system that
generates emotional responses by selecting and displaying expressive images of the
character emulated by the chatterbot was presented in [54]. Emotional
communication for virtual worlds has been a challenging research field for almost
two decades. One of the pioneering papers was proposed by Cassel et al. [55].
In the proposed system, conversations between multiple human-like agents were
automatically generated and animated with appropriate and synchronized speech,
intonation, facial expressions, and hand gestures. Later work proposed numerous
ways to design personality and emotion models for virtual humans. More recently,
specific personality and emotional states were predicted from hierarchical fuzzy
rules to facilitate personality and emotion control, and in 2009, Pelachaud et al. [56]
developed a model of behavior expressivity using a set of six parameters that act as
modulation of behavior animation. Finally, [35] introduced a graphical
representation of human emotion extracted from text sentences. The main con-
tributions of that approach included an original pipeline that extracts, processes,
and renders emotion of 3D VH. Additionally, the paper presented methods to
optimize the computational pipeline so that real time virtual reality rendering
can be achieved on common PCs. Lastly, it was demonstrated how the Poisson
distribution can be utilized to transfer database extracted lexical and language
parameters into coherent intensities of valence and arousal (i.e. parameters of
Russell’s circumplex model of emotion).
2.4 Big Five modeling [1]
At present, many researchers believe that there are five core personality traits
and the evidence of this theory has been growing over the past 50 years [1]. From
the point of view of a sociologist, social media can be characterized as collective
goods produced through computer-mediated collective action [57]. People
of each category have different attitudes toward sites, different tastes in products
and different skills for accomplishing work. The five factors are Extraversion,
Agreeableness, Conscientiousness, Neuroticism and Openness [58]. People of different
categories have different ways of expressing their thoughts, and OSN users differ
in how significantly they express their thoughts or behavior [1] [4]. The users
of an OSN can be categorized according to the Big Five factors. The behavior of an OSN
user varies from location to location, but people from the same or nearby
locations tend to behave similarly [59]. Behavior also varies with age.
The personality traits used in the 5 factor model are Extraversion, Agreeableness,
Conscientiousness, Neuroticism and Openness to experience [58]. It is important
to ignore the positive or negative associations that these words have in everyday
language. For example, Agreeableness is obviously advantageous for achieving
and maintaining popularity. Agreeable people are better liked than disagreeable
people. On the other hand, agreeableness is not useful in situations that require
tough or totally objective decisions. Disagreeable people can make excellent
scientists, critics, or soldiers. Remember, none of the five traits is in itself
positive or negative; they are simply characteristics that individuals exhibit to a
greater or lesser extent.
Each of these 5 personality traits describes, relative to other people, the frequency
or intensity of a person’s feelings, thoughts, or behaviors. Everyone possesses all
5 of these traits to a greater or lesser degree. For example, two individuals could
be described as agreeable (agreeable people value getting along with others). But
there could be significant variation in the degree to which they are both agree-
able. In other words, all 5 personality traits exist on a continuum rather than as
attributes that a person does or does not have.
Each of the Big Five personality traits is made up of 6 facets or sub traits. These
can be assessed independently of the trait that they belong to.
• Extraversion
Extraversion is marked by pronounced engagement with the external world.
Extraverts enjoy being with people, are full of energy, and often experience
positive emotions. They tend to be enthusiastic, action-oriented, individu-
als who are likely to say “Yes!” or “Let’s go!” to opportunities for excite-
ment. In groups they like to talk, assert themselves, and draw attention to
themselves. Introverts lack the exuberance, energy, and activity levels of
extraverts. They tend to be quiet, low-key, deliberate, and disengaged from
the social world. Their lack of social involvement should not be interpreted
as shyness or depression; the introvert simply needs less stimulation than
an extravert and prefers to be alone. The independence and reserve of the
introvert is sometimes mistaken as unfriendliness or arrogance. In reality,
an introvert who scores high on the agreeableness dimension will not seek
others out but will be quite pleasant when approached.
Extraversion Facets:
– Friendliness. Friendly people genuinely like other people and openly
demonstrate positive feelings toward others. They make friends quickly
and it is easy for them to form close, intimate relationships. Low scor-
ers on Friendliness are not necessarily cold and hostile, but they do
not reach out to others and are perceived as distant and reserved.
– Gregariousness. Gregarious people find the company of others pleas-
antly stimulating and rewarding. They enjoy the excitement of crowds.
Low scorers tend to feel overwhelmed by, and therefore actively avoid,
large crowds. They do not necessarily dislike being with people some-
times, but their need for privacy and time to themselves is much greater
than for individuals who score high on this scale.
– Assertiveness. High scorers on Assertiveness like to speak out, take charge,
and direct the activities of others. They tend to be leaders in groups.
Low scorers tend not to talk much and let others control the activities
of groups.
– Activity Level. Active individuals lead fast-paced, busy lives. They
move about quickly, energetically, and vigorously, and they are in-
volved in many activities. People who score low on this scale follow a
slower and more leisurely, relaxed pace.
– Excitement-Seeking. High scorers on this scale are easily bored with-
out high levels of stimulation. They love bright lights and hustle and
bustle. They are likely to take risks and seek thrills. Low scorers are
overwhelmed by noise and commotion and are averse to thrill-seeking.
– Cheerfulness. This scale measures positive mood and feelings, not neg-
ative emotions (which are a part of the Neuroticism domain). Persons
who score high on this scale typically experience a range of positive
feelings, including happiness, enthusiasm, optimism, and joy. Low
scorers are not as prone to such energetic, high spirits.
• Agreeableness
Agreeableness reflects individual differences in concern with cooperation
and social harmony. Agreeable individuals value getting along with others.
They are therefore considerate, friendly, generous, helpful, and willing to
compromise their interests with others’. Agreeable people also have an op-
timistic view of human nature. They believe people are basically honest,
decent, and trustworthy. Disagreeable individuals place self-interest above
getting along with others. They are generally unconcerned with others’
well-being, and therefore are unlikely to extend themselves for other peo-
ple. Sometimes their skepticism about others’ motives causes them to be
suspicious, unfriendly, and uncooperative. Agreeableness is obviously ad-
vantageous for attaining and maintaining popularity. Agreeable people are
better liked than disagreeable people. On the other hand, agreeableness is
not useful in situations that require tough or absolutely objective decisions.
Disagreeable people can make excellent scientists, critics, or soldiers.
Agreeableness Facets:
– Trust. A person with high trust assumes that most people are fair,
honest, and have good intentions. Persons low in trust may see others
as selfish, devious, and potentially dangerous.
– Morality. High scorers on this scale see no need for pretence or ma-
nipulation when dealing with others and are therefore candid, frank,
and sincere. Low scorers believe that a certain amount of deception in
social relationships is necessary. People find it relatively easy to relate
to the straightforward high-scorers on this scale. They generally find
it more difficult to relate to the low-scorers on this scale. It should be
made clear that low scorers are not unprincipled or immoral; they are
simply more guarded and less willing to openly reveal the whole truth.
– Altruism. Altruistic people find helping other people genuinely re-
warding. Consequently, they are generally willing to assist those who
are in need. Altruistic people find that doing things for others is a
form of self-fulfillment rather than self-sacrifice. Low scorers on this
scale do not particularly like helping those in need. Requests for help
feel like an imposition rather than an opportunity for self-fulfillment.
– Cooperation. Individuals who score high on this scale dislike con-
frontations. They are perfectly willing to compromise or to deny their
own needs in order to get along with others. Those who score low on
this scale are more likely to intimidate others to get their way.
– Modesty. High scorers on this scale do not like to claim that they are
better than other people. In some cases this attitude may derive from
low self-confidence or self-esteem. Nonetheless, some people with high
self-esteem find immodesty unseemly. Those who are willing to de-
scribe themselves as superior tend to be seen as disagreeably arrogant
by other people.
– Sympathy. People who score high on this scale are tender-hearted and
compassionate. They feel the pain of others vicariously and are easily
moved to pity. Low scorers are not affected strongly by human suf-
fering. They pride themselves on making objective judgments based
on reason. They are more concerned with truth and impartial justice
than with mercy.
• Conscientiousness
Conscientiousness concerns the way in which we control, regulate, and direct
our impulses. Impulses are not inherently bad; occasionally time constraints
require a snap decision, and acting on our first impulse can be an effective
response. Also, in times of play rather than work, acting spontaneously
and impulsively can be fun. Impulsive individuals can be seen by others as
colorful and fun-to-be-with.
Nonetheless, acting on impulse can lead to trouble in a number of ways.
Some impulses are antisocial. Uncontrolled antisocial acts not only harm
other members of society, but also can result in retribution toward the
perpetrator of such impulsive acts. Another problem with impulsive acts is
that they often produce immediate rewards but undesirable, long-term con-
sequences. Examples include excessive socializing that leads to being fired
from one’s job, hurling an insult that causes the breakup of an important
relationship, or using pleasure-inducing drugs that eventually destroy one’s
health.
Impulsive behavior, even when not seriously destructive, diminishes a per-
son’s effectiveness in significant ways. Acting impulsively disallows con-
templating alternative courses of action, some of which would have been
wiser than the impulsive choice. Impulsivity also sidetracks people during
projects that require organized sequences of steps or stages. Accomplish-
ments of an impulsive person are therefore small, scattered, and inconsis-
tent.
A hallmark of intelligence, what potentially separates human beings from
earlier life forms, is the ability to think about future consequences before
acting on an impulse. Intelligent activity involves contemplation of long-
range goals, organizing and planning routes to these goals, and persisting
toward one’s goals in the face of short-lived impulses to the contrary. The
idea that intelligence involves impulse control is nicely captured by the term
prudence, an alternative label for the Conscientiousness domain. Prudent
means both wise and cautious. Persons who score high on the Conscien-
tiousness scale are, in fact, perceived by others as intelligent.
The benefits of high conscientiousness are obvious. Conscientious individ-
uals avoid trouble and achieve high levels of success through purposeful
planning and persistence. They are also positively regarded by others as
intelligent and reliable. On the negative side, they can be compulsive perfec-
tionists and workaholics. Furthermore, extremely conscientious individuals
might be regarded as stuffy and boring. Unconscientious people may be
criticized for their unreliability, lack of ambition, and failure to stay within
the lines, but they will experience many short-lived pleasures and they will
never be called stuffy.
Conscientiousness Facets:
– Self-Efficacy. Self-Efficacy describes confidence in one’s ability to ac-
complish things. High scorers believe they have the intelligence (com-
mon sense), drive, and self-control necessary for achieving success. Low
scorers do not feel effective, and may have a sense that they are not in
control of their lives.
– Orderliness. Persons with high scores on orderliness are well-organized.
They like to live according to routines and schedules. They keep lists
and make plans. Low scorers tend to be disorganized and scattered.
– Dutifulness. This scale reflects the strength of a person’s sense of duty
and obligation. Those who score high on this scale have a strong sense
of moral obligation. Low scorers find contracts, rules, and regulations
overly confining. They are likely to be seen as unreliable or even
irresponsible.
– Achievement-Striving. Individuals who score high on this scale strive
hard to achieve excellence. Their drive to be recognized as successful
keeps them on track toward their lofty goals. They often have a strong
sense of direction in life, but extremely high scores may be too single-
minded and obsessed with their work. Low scorers are content to get
by with a minimal amount of work, and might be seen by others as
lazy.
– Self-Discipline. What many people call will-power refers to the ability
to persist at difficult or unpleasant tasks until they are completed.
People who possess high self-discipline are able to overcome reluctance
to begin tasks and stay on track despite distractions. Those with low
self-discipline procrastinate and show poor follow-through, often failing
to complete tasks, even tasks they want very much to complete.
– Cautiousness. Cautiousness describes the disposition to think through
possibilities before acting. High scorers on the Cautiousness scale take
their time when making decisions. Low scorers often say or do the first
thing that comes to mind without deliberating alternatives and the
probable consequences of those alternatives.
• Neuroticism
The term neurosis is used to describe a condition marked by mental distress,
emotional suffering, and an inability to cope effectively with the normal de-
mands of life. It is suggested that everyone shows some signs of neurosis,
but that we differ in our degree of suffering and our specific symptoms of
distress. Today neuroticism refers to the tendency to experience negative
feelings. Those who score high on Neuroticism may experience primarily
one specific negative feeling such as anxiety, anger, or depression, but are
likely to experience several of these emotions. People high in neuroticism
are emotionally reactive. They respond emotionally to events that would
not affect most people, and their reactions tend to be more intense than
normal. They are more likely to interpret ordinary situations as threaten-
ing, and minor frustrations as hopelessly difficult. Their negative emotional
reactions tend to persist for unusually long periods of time, which means
they are often in a bad mood. These problems in emotional regulation can
diminish a neurotic’s ability to think clearly, make decisions, and cope ef-
fectively with stress.
At the other end of the scale, individuals who score low in neuroticism are
less easily upset and are less emotionally reactive. They tend to be calm,
emotionally stable, and free from persistent negative feelings. Freedom from
negative feelings does not mean that low scorers experience a lot of positive
feelings; frequency of positive emotions is a component of the Extraversion
domain.
Neuroticism Facets:
– Anxiety. The “fight-or-flight” system of the brain of anxious individuals
is too easily and too often engaged. Therefore, people who are
high in anxiety often feel like something dangerous is about to happen.
They may be afraid of specific situations or be just generally fearful.
They feel tense, jittery, and nervous.
– Anger. Persons who score high in Anger feel enraged when things do
not go their way. They are sensitive about being treated fairly and
feel resentful and bitter when they feel they are being cheated. This
scale measures the tendency to feel angry; whether or not the person
expresses annoyance and hostility depends on the individual’s level on
Agreeableness. Low scorers do not get angry often or easily.
– Depression. This scale measures the tendency to feel sad, dejected,
and discouraged. High scorers lack energy and have difficulty initiating
activities. Low scorers tend to be free from these depressive feelings.
– Self-Consciousness. Self-conscious individuals are sensitive about what
others think of them. Their concern about rejection and ridicule causes
them to feel shy and uncomfortable around others. They are easily
embarrassed and often feel ashamed. Their fears that others will
criticize or make fun of them are exaggerated and unrealistic, but
their awkwardness and discomfort may make these fears a self-fulfilling
prophecy. Low scorers, in contrast, do not suffer from the mistaken
impression that everyone is watching and judging them. They do not
feel nervous in social situations.
– Immoderation. Immoderate individuals feel strong cravings and urges
that they have difficulty resisting. They tend to be oriented toward
short-term pleasures and rewards rather than long-term consequences.
Low scorers do not experience strong, irresistible cravings and conse-
quently do not find themselves tempted to overindulge.
– Vulnerability. High scorers on Vulnerability experience panic, confusion,
and helplessness when under pressure or stress. Low scorers feel
more poised, confident, and clear-thinking when stressed.
• Openness to Experience
Openness to Experience describes a dimension of cognitive style that dis-
tinguishes imaginative, creative people from down-to-earth, conventional
people. Open people are intellectually curious, appreciative of art, and
sensitive to beauty. They tend to be, compared to closed people, more
aware of their feelings. They tend to think and act in individualistic and
nonconforming ways. Intellectuals typically score high on Openness to Ex-
perience; consequently, this factor has also been called Culture or Intellect.
Nonetheless, Intellect is probably best regarded as one aspect of openness
to experience. Scores on Openness to Experience are only modestly related
to years of education and scores on standard intelligence tests.
Another characteristic of the open cognitive style is a facility for thinking in
symbols and abstractions far removed from concrete experience. Depend-
ing on the individual’s specific intellectual abilities, this symbolic cognition
may take the form of mathematical, logical, or geometric thinking, artistic
and metaphorical use of language, music composition or performance, or
one of the many visual or performing arts. People with low scores on open-
ness to experience tend to have narrow, common interests. They prefer the
plain, straightforward, and obvious over the complex, ambiguous, and sub-
tle. They may regard the arts and sciences with suspicion, regarding these
endeavors as abstruse or of no practical use. Closed people prefer familiar-
ity over novelty; they are conservative and resistant to change. Openness
is often presented as healthier or more mature by psychologists, who are
often themselves open to experience. However, open and closed styles of
thinking are useful in different environments. The intellectual style of the
open person may serve a professor well, but research has shown that closed
thinking is related to superior job performance in police work, sales, and a
number of service occupations.
Openness to Experience Facets:
– Imagination. To imaginative individuals, the real world is often too
plain and ordinary. High scorers on this scale use fantasy as a way of
creating a richer, more interesting world. Low scorers on this scale
are more oriented to facts than fantasy.
– Artistic Interests. High scorers on this scale love beauty, both in art
and in nature. They become easily involved and absorbed in artistic
and natural events. They are not necessarily artistically trained or
talented, although many will be. The defining features of this scale
are interest in, and appreciation of natural and artificial beauty. Low
scorers lack aesthetic sensitivity and interest in the arts.
– Emotionality. Persons high on Emotionality have good access to and
awareness of their own feelings. Low scorers are less aware of their
feelings and tend not to express their emotions openly.
– Adventurousness. High scorers on adventurousness are eager to try
new activities, travel to foreign lands, and experience different things.
They find familiarity and routine boring, and will take a new route
home just because it is different. Low scorers tend to feel uncomfort-
able with change and prefer familiar routines.
– Intellect. Intellect and artistic interests are the two most important,
central aspects of openness to experience. High scorers on Intellect love
to play with ideas. They are open-minded to new and unusual ideas,
and like to debate intellectual issues. They enjoy riddles, puzzles, and
brain teasers. Low scorers on Intellect prefer dealing with people or
things rather than ideas. They regard intellectual exercises as a waste
of time. Intellect should not be equated with intelligence. Intellect
is an intellectual style, not an intellectual ability, although high scor-
ers on Intellect score slightly higher than low-Intellect individuals on
standardized intelligence tests.
– Liberalism. Psychological liberalism refers to a readiness to challenge
authority, convention, and traditional values. In its most extreme
form, psychological liberalism can even represent outright hostility to-
ward rules, sympathy for law-breakers, and love of ambiguity, chaos,
and disorder. Psychological conservatives prefer the security and stability
brought by conformity to tradition. Psychological liberalism and
conservatism are not identical to political affiliation, but certainly in-
cline individuals toward certain political parties.
It is possible, although unusual, to score high in one or more facets of a per-
sonality trait and low in other facets of the same trait. For example, you could
score highly in Imagination, Artistic Interests, Emotionality and Adventurous-
ness, but score low in Intellect and Liberalism.
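The trait and facet hierarchy described in this section can be written down as a small data structure. The sketch below lists the facets exactly as named above; the 0 to 1 score scale and the use of a plain average to combine facet scores into a trait score are assumptions made for illustration only, not a scoring rule taken from [1] or [58].

```python
from statistics import mean

# The five traits and their six facets, as listed in this section.
BIG_FIVE_FACETS = {
    "Extraversion": [
        "Friendliness", "Gregariousness", "Assertiveness",
        "Activity Level", "Excitement-Seeking", "Cheerfulness",
    ],
    "Agreeableness": [
        "Trust", "Morality", "Altruism",
        "Cooperation", "Modesty", "Sympathy",
    ],
    "Conscientiousness": [
        "Self-Efficacy", "Orderliness", "Dutifulness",
        "Achievement-Striving", "Self-Discipline", "Cautiousness",
    ],
    "Neuroticism": [
        "Anxiety", "Anger", "Depression",
        "Self-Consciousness", "Immoderation", "Vulnerability",
    ],
    "Openness to Experience": [
        "Imagination", "Artistic Interests", "Emotionality",
        "Adventurousness", "Intellect", "Liberalism",
    ],
}


def trait_scores(facet_scores):
    """Average the available facet scores of each trait.

    facet_scores maps facet name -> score on a hypothetical 0..1 scale.
    Averaging is an assumed aggregation, used here only as an example.
    """
    result = {}
    for trait, facets in BIG_FIVE_FACETS.items():
        values = [facet_scores[f] for f in facets if f in facet_scores]
        if values:
            result[trait] = mean(values)
    return result


# The example from the text: high on four Openness facets, low on two.
example = {
    "Imagination": 0.9, "Artistic Interests": 0.9,
    "Emotionality": 0.9, "Adventurousness": 0.9,
    "Intellect": 0.2, "Liberalism": 0.2,
}
openness = trait_scores(example)["Openness to Experience"]
```

Keeping the facet scores alongside the aggregated trait score preserves the kind of within-trait variation described above, which a single number per trait would hide.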
Chapter 3
Research Questions
The main objective of this paper is to build a user’s virtual behavior model by
analyzing his/her OSN presence and to recommend products to the user on the basis
of that behavior model. To reach our main goal, we need to consider a few
sub-objectives: collecting the user’s social network activities, analyzing the
user’s activity over a few days, categorizing the user’s activity according to the
Big Five factors, and recommending services or products to the user on the basis
of the user’s behavior model.
In order to fulfill our objectives, some research questions arise. The main
research question of this paper is: How can users of an OSN be categorized according
to the Big Five factors from their behaviors in the OSN? The sub research questions are:
1. How does an OSN (Online Social Network) represent a user?
2. How can we analyze user behavior?
3. How can user behavior be categorized into the Big Five factors?
Chapter 4
Proposed Research Methodology
In this paper our aim is to relate a text corpus from a social
network to the psychological theory of personality. We will also try to implement
a recommendation system based on behavior analysis. Correlational and
exploratory methodologies are therefore used in this paper, where our concept is the
behavior indicator in Big Five modeling and the variables are Extraversion, Neuroticism,
Agreeableness, Openness and Conscientiousness.
• 4.1 Data Collection: In this research, big data is collected in order to
categorize a user’s behavior. The data is collected from an OSN (Twitter). The data is
stored in the OSN through the user’s activities, such as posts by the user, posts by the
user’s friends, liked pages, etc. The collected data is public data, so there
is no barrier to using it. At any one time, a user’s data from the previous 30 days
will be collected. Data will be collected directly by the system from the OSN
with full user authorization. After collection, the data will be stored securely
in the system database.
Twitter, a social network site, is suitable for sentiment analysis because its
users create a very large number of short messages [60], so we used Twitter to
collect users' data. Using the Twitter REST API 1.1, we collected public tweets
and retweets. Our Twitter app requires users to authorize it before it extracts
data from their profiles, and it does not collect data if a user does not allow
it to run. We made sure that all data we extract from Twitter is public. By
calling the GET statuses/user_timeline and GET statuses/retweets_of_me methods
we can collect the user's tweets and retweets. The system can also collect
public data from the profiles that the user currently follows by using the
GET friends/ids method. The collected data is in JSON format, and our Twitter
app can write it to text files. Because separate files are easier to work with,
we keep one data file per user, named by the user's unique identifier (user ID
or username).
Figure 4.1. Modeling User Behavior (diagram labels: User, OSN (Twitter), Twitter
API, LIWC, Mapping, Represents)
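The per-user storage step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the system built for this thesis: it assumes the tweets have already been fetched (for example with GET statuses/user_timeline) as a list of JSON objects that carry the v1.1 `created_at` and `text` fields. The function names `recent_tweets` and `save_user_corpus`, the file naming, and the sample data are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Timestamp layout of the `created_at` field in Twitter REST API v1.1 objects,
# e.g. "Mon Jun 02 10:00:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def recent_tweets(tweets, now, days=30):
    """Keep only the tweets created within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    kept = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        if cutoff <= created <= now:
            kept.append(tweet)
    return kept


def save_user_corpus(user_id, tweets, out_dir):
    """Write one plain-text file per user (one tweet per line); return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{user_id}.txt"
    lines = [tweet["text"].replace("\n", " ") for tweet in tweets]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


# Hypothetical sample standing in for an API response.
SAMPLE = json.loads("""[
  {"created_at": "Mon Jun 02 10:00:00 +0000 2014", "text": "Great game with friends"},
  {"created_at": "Tue Apr 01 09:30:00 +0000 2014", "text": "Old tweet outside the window"}
]""")
NOW = datetime(2014, 6, 15, tzinfo=timezone.utc)
```

Calling `recent_tweets(SAMPLE, NOW)` keeps only the June tweet, and `save_user_corpus("12345", kept, "corpus")` would then write it to `corpus/12345.txt`, ready to be passed to the text analysis step.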
• 4.2 Data Analysis: The text file containing a single user's past data is
analyzed with LIWC (Linguistic Inquiry and Word Count), a text analysis program
designed by James W. Pennebaker, Roger J. Booth and Martha E. Francis. Each text
file analyzed by LIWC2007 can be treated as a whole or broken into segments.
LIWC counts the words that match each category of its dictionary and writes the
result for each category to a specified output file. These categories indicate
different aspects of the Big Five factors, and the modeling is built on these
results. The table below shows which category belongs to which factor.
Table 4.1. Relationship between LIWC categories and Big Five factors
Big Five factors         | LIWC Categories
Extraversion             | Social process, Family, Friends, Humans, Affective, Biological process, Sexual, Achievement
Openness to Experience   | Leisure, Insight, Body, Ingestion
Neuroticism              | Swear words, Negation, Negative emotion, Anger, Sadness, Sexual
Conscientiousness        | Relativity, Motion, Space, Time, Religion, Death, Money, Certainty
Agreeableness            | Positive Emotion, Feel, Discrepancy (would), Tentative (maybe), Hear
LIWC processes the collected data sentence by sentence. Each word is then
marked, according to its meaning and use, with a percentage under the
corresponding Big Five factor. The percentages are summed per factor, and the
factor with the highest total is taken as the user's behavior.
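The dictionary-based counting described in this section can be illustrated with a small sketch. It is not LIWC itself: the LIWC dictionary is not reproduced here, so `TOY_DICTIONARY` is a tiny hypothetical word list, and `CATEGORY_TO_FACTOR` copies only a few rows of Table 4.1. All function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; the real LIWC dictionary is far larger.
TOY_DICTIONARY = {
    "friends": {"friend", "friends", "buddy"},
    "family": {"mom", "dad", "sister", "brother"},
    "anger": {"hate", "angry", "furious"},
    "positive_emotion": {"happy", "love", "great"},
}

# Subset of Table 4.1: the Big Five factor each category contributes to.
CATEGORY_TO_FACTOR = {
    "friends": "Extraversion",
    "family": "Extraversion",
    "anger": "Neuroticism",
    "positive_emotion": "Agreeableness",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_counts(text):
    """Return (total word count, number of matched words per category)."""
    words = tokenize(text)
    counts = Counter()
    for word in words:
        for category, vocabulary in TOY_DICTIONARY.items():
            if word in vocabulary:
                counts[category] += 1
    return len(words), counts


def factor_counts(counts):
    """Sum the category counts that belong to the same Big Five factor."""
    totals = Counter()
    for category, n in counts.items():
        totals[CATEGORY_TO_FACTOR[category]] += n
    return totals
```

For the sentence "I love my friends and my mom. I hate Mondays" this sketch finds 10 words, one match each in the four toy categories, and therefore two matches under Extraversion.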
• 4.3 Results: LIWC reports the word count of each category as a percentage of
the total word count:
result = (TC * 100) / WC
where WC = total words in the text file and TC = total words in the category.
The inverse gives the exact number of words in a category:
TC = (result * WC) / 100
The values of the categories that belong to the same Big Five factor are then
summed using a linear formula with unit weights:
f(X) = X1 + X2 + X3 + ... + Xn
We used the percentage value of each factor, computed with the percentage
formula:
part / whole = % / 100
These results are used to draw the pie chart in Excel.
Example:
Figure 4.2. Pie Chart of LIWC Results
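The arithmetic of this section can be checked with a short sketch. It starts from result = (TC * 100) / WC; rearranging that definition gives TC = (result * WC) / 100. The function names and the sample category values below are hypothetical, and the normalization in `pie_shares` is one plausible reading of how the per-factor percentages for the pie chart are obtained.

```python
def category_percentage(tc, wc):
    """LIWC-style score: share of all words that fall in a category, in percent."""
    return tc * 100 / wc


def category_word_count(result, wc):
    """Inverse of category_percentage: recover the number of matched words."""
    return result * wc / 100


def factor_scores(category_results, category_to_factor):
    """Sum the category percentages belonging to each Big Five factor."""
    scores = {}
    for category, result in category_results.items():
        factor = category_to_factor[category]
        scores[factor] = scores.get(factor, 0.0) + result
    return scores


def pie_shares(scores):
    """Express each factor score as a percentage of the total (part / whole)."""
    total = sum(scores.values())
    return {factor: value * 100 / total for factor, value in scores.items()}


# Hypothetical category percentages for one user.
RESULTS = {"friends": 4.0, "family": 2.0, "anger": 1.5, "positive_emotion": 2.5}
MAPPING = {
    "friends": "Extraversion",
    "family": "Extraversion",
    "anger": "Neuroticism",
    "positive_emotion": "Agreeableness",
}
```

With these sample values, `pie_shares(factor_scores(RESULTS, MAPPING))` gives Extraversion 60, Neuroticism 15 and Agreeableness 25 percent, which are the slices a pie chart would show.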
Figure 4.3. Personality Based Recommendation System
• 4.4 Recommendation Analysis: Depending on the behavior analysis, brands of
products are recommended to users. The dominant share of a user's behavior can
incline the user toward a particular type of product. The tables below give
examples in which the majority of people with a particular behavior are
interested in a particular brand, product, or service.
For example, suppose users A, B and C follow the Age of Empires game page on
Twitter. After analyzing their tweets and retweets, the system maps their
behavior and finds that the major part of it is extroverted. If, after analyzing
the tweets and retweets of user X, the system finds that most of his behavior is
also influenced by extraversion, then we can recommend games like Age of Empires
to him.
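The follower-based reasoning in this example can be sketched in a few lines. This is an illustrative sketch only: the factor shares, the second page, and the function names are hypothetical, and ties are broken by whichever factor or trait is encountered first.

```python
from collections import Counter


def dominant_trait(shares):
    """Return the Big Five factor with the largest share for one user."""
    return max(shares, key=shares.get)


def page_trait(follower_profiles):
    """Most common dominant trait among the analyzed followers of a page."""
    traits = Counter(dominant_trait(profile) for profile in follower_profiles)
    return traits.most_common(1)[0][0]


def recommend_pages(user_shares, pages):
    """Pages whose followers' majority trait matches the user's dominant trait."""
    trait = dominant_trait(user_shares)
    return [name for name, followers in pages.items() if page_trait(followers) == trait]


# Hypothetical factor shares (percent) for the followers of two pages.
PAGES = {
    "Age of Empires": [
        {"Extraversion": 55, "Neuroticism": 20, "Agreeableness": 25},  # user A
        {"Extraversion": 48, "Neuroticism": 30, "Agreeableness": 22},  # user B
        {"Extraversion": 40, "Neuroticism": 35, "Agreeableness": 25},  # user C
    ],
    "Sudoku": [
        {"Conscientiousness": 60, "Extraversion": 40},
        {"Conscientiousness": 52, "Extraversion": 48},
    ],
}
USER_X = {"Extraversion": 50, "Neuroticism": 10, "Agreeableness": 40}
```

Here `recommend_pages(USER_X, PAGES)` returns `['Age of Empires']`, because user X and the majority of that page's followers share extraversion as their dominant factor.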
Table 4.2. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Video Games)
Extraversion             | Strategy (Age of Empires, Commandos)
Openness to Experience   | Racing (Need for Speed)
Neuroticism              | Shooting (Call of Duty, Counter Strike)
Conscientiousness        | Chess, Sudoku
Agreeableness            | Sports (FIFA)

Table 4.3. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Movies)
Extraversion             | Political, Fantasy, Family
Openness to Experience   | Comedy, Sports, Drama
Neuroticism              | Crime scene, Action, Horror
Conscientiousness        | Political, Historical, Conspiracy
Agreeableness            | Romantic, Drama

Table 4.4. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Music)
Extraversion             | Rock
Openness to Experience   | Classical, Vocal, Country wood
Neuroticism              | Pop, Heavy Metal
Conscientiousness        | New Released, Historic
Agreeableness            | Romantic, Country
Table 4.5. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Food)
Extraversion             | Bead, Meat
Openness to Experience   | Multicultural Food, Pizza
Neuroticism              | Fast Food
Conscientiousness        | Salad, Vegetable
Agreeableness            | Bread, Cheese

Table 4.6. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Beverage)
Extraversion             | Coffee, Tea
Openness to Experience   | Milkshake, Green Tea
Neuroticism              | Soft Drinks
Conscientiousness        | Green Tea, Black Coffee
Agreeableness            | Coffee, Tea, Soft Drinks

Table 4.7. Products under Big Five factors
Big Five Factors         | Product Categories/Brands (Sports)
Extraversion             | Football, Athletics
Openness to Experience   | Cricket, Swimming
Neuroticism              | Boxing, Rugby, Martial arts
Conscientiousness        | Athletics, Martial arts
Agreeableness            | Gymnastics
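A table-driven recommendation step based on the tables above might look like the following sketch. The `VIDEO_GAMES` mapping copies the rows of Table 4.2; the function name `recommend`, the `top_n` parameter, and the sample factor shares are hypothetical and only illustrate the lookup.

```python
# Rows of Table 4.2 (video games); the other tables could be encoded the same way.
VIDEO_GAMES = {
    "Extraversion": ["Age of Empires", "Commandos"],
    "Openness to Experience": ["Need for Speed"],
    "Neuroticism": ["Call of Duty", "Counter Strike"],
    "Conscientiousness": ["Chess", "Sudoku"],
    "Agreeableness": ["FIFA"],
}


def recommend(shares, catalog, top_n=1):
    """Return catalog entries for the user's `top_n` strongest Big Five factors."""
    # Rank factors from the largest share to the smallest.
    ranked = sorted(shares, key=shares.get, reverse=True)
    picks = []
    for factor in ranked[:top_n]:
        picks.extend(catalog.get(factor, []))
    return picks


# Hypothetical factor shares (percent) for one user.
SHARES = {"Extraversion": 60.0, "Neuroticism": 15.0, "Agreeableness": 25.0}
```

With these shares, `recommend(SHARES, VIDEO_GAMES)` returns the strategy titles listed for Extraversion, and `top_n=2` additionally includes the entry for Agreeableness, the second strongest factor.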
Chapter 5
Conclusions
In this thesis we showed that personality recognition can be automated by
analyzing language cues. Little work has been done in this field, and to the
best of our knowledge our research is among the first to examine personality
recognition together with a recommendation system based on sentiment analysis
results. During our research we found that feature selection is one of the most
important tasks, as some of the best models contain only a small subset of the
full feature set.
LIWC features are beneficial for all traits. For all recognition tasks we
analyzed the influence of the most relevant individual features in specific
models. We also used the Stanford NLP (natural language processing) application
to analyze and split the texts. Later we used only LIWC, because it generated
more accurate results than Stanford NLP in our data analysis.
At this moment our system can use only text information; in the future it should
also be able to analyze data from shared links or videos. Our system cannot
identify quotations (which users use to share others' speech). It also lacks the
ability to understand double negatives in a sentence, for example: “The service
of Samsung Galaxy S3 is not very bad”.
There is considerable scope for analyzing exclamatory sentences and smileys
(sentimental expressions). Our system cannot understand sarcasm at this moment.
The accuracy of brand recommendations depends on the percentages of the Big Five
factors, so a greater depth of measurement and a finer marking scale would make
the recommendations more effective.
Bibliography
[1] K. Cherry, “The big five personality dimensions,” 2012. Accessed: 2010-09-
30.
[2] “Facebook.com.” Accessed: 2014-06-01.
[3] “Twitter.com.” Accessed: 2014-06-01.
[4] J. Bao, Y. Zheng, and M. F. Mokbel, “Location-based and preference-aware
recommendation using sparse geo-social networking data,” in Proceedings of
the 20th International Conference on Advances in Geographic Information
Systems, pp. 199–208, ACM, 2012.
[5] A. M. Ferman, J. H. Errico, P. v. Beek, and M. I. Sezan, “Content-based
filtering and personalization using structured metadata,” in Proceedings of
the 2nd ACM/IEEE-CS joint conference on Digital libraries, pp. 393–393,
ACM, 2002.
[6] “Amazon.com.” Accessed: 2014-04-01.
[7] “Netflix.com.” Accessed: 2014-04-01.
[8] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user
behavior in online social networks,” in Proceedings of the 9th ACM SIG-
COMM conference on Internet measurement conference, pp. 49–62, ACM,
2009.
[9] N. O. Report, “Social networks and blogs now 4th most popular online ac-
tivity.”
[10] Y. Zheng, “Location-based social networks: Users,” in Computing with Spa-
tial Trajectories, pp. 243–276, Springer, 2011.
[11] “Flickr.com.” Accessed: 2014-04-01.
[12] “Foursquare.com.” Accessed: 2014-01-01.
[13] X. Cao, G. Cong, and C. S. Jensen, “Mining significant semantic loca-
tions from gps data,” Proceedings of the VLDB Endowment, vol. 3, no. 1-2,
pp. 1009–1020, 2010.
[14] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations
and travel sequences from gps trajectories,” in Proceedings of the 18th inter-
national conference on World wide web, pp. 791–800, ACM, 2009.
[15] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user sim-
ilarity based on location history,” in Proceedings of the 16th ACM SIGSPA-
TIAL international conference on Advances in geographic information sys-
tems, p. 34, ACM, 2008.
[16] X. Xiao, Y. Zheng, Q. Luo, and X. Xie, “Finding similar users using category-
based location history,” in Proceedings of the 18th SIGSPATIAL Interna-
tional Conference on Advances in Geographic Information Systems, pp. 442–
445, ACM, 2010.
[17] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and X. Xing, “Discovering spatio-
temporal causal interactions in traffic data streams,” in Proceedings of the
17th ACM SIGKDD international conference on Knowledge discovery and
data mining, pp. 1010–1018, ACM, 2011.
[18] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma, “Understanding mobility
based on gps data,” in Proceedings of the 10th international conference on
Ubiquitous computing, pp. 312–321, ACM, 2008.
[19] L. Wang, Y. Zheng, X. Xie, and W.-Y. Ma, “A flexible spatio-temporal
indexing scheme for large-scale gps track retrieval,” in Mobile Data Man-
agement, 2008. MDM’08. 9th International Conference on, pp. 1–8, IEEE,
2008.
[20] I. Konstas, V. Stathopoulos, and J. M. Jose, “On social networks and col-
laborative recommendation,” in Proceedings of the 32nd international ACM
SIGIR conference on Research and development in information retrieval,
pp. 195–202, ACM, 2009.
[21] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, “An algorithmic
framework for performing collaborative filtering,” in Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development
in information retrieval, pp. 230–237, ACM, 1999.
[22] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recom-
mender systems: A survey of the state-of-the-art and possible extensions,”
Knowledge and Data Engineering, IEEE Transactions on, vol. 17, no. 6,
pp. 734–749, 2005.
[23] H. Yildirim and M. S. Krishnamoorthy, “A random walk method for allevi-
ating the sparsity problem in collaborative filtering,” in Proceedings of the
2008 ACM conference on Recommender systems, pp. 131–138, ACM, 2008.
[24] G. Das, N. Koudas, M. Papagelis, and S. Puttaswamy, “Efficient sampling of
information in social networks,” in Proceedings of the 2008 ACM workshop
on Search in social media, pp. 67–74, ACM, 2008.
[25] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collabora-
tive tagging,” in Proceedings of the 16th international conference on World
Wide Web, pp. 211–220, ACM, 2007.
[26] S. B. Subramanya and H. Liu, “Socialtagger-collaborative tagging for blogs
in the long tail,” in Proceedings of the 2008 ACM workshop on Search in
social media, pp. 19–26, ACM, 2008.
[27] M. Strohmaier, “Purpose tagging: capturing user intent to assist goal-
oriented social search,” in Proceedings of the 2008 ACM workshop on Search
in social media, pp. 35–42, ACM, 2008.
[28] N. Craswell and M. Szummer, “Random walks on the click graph,” in Pro-
ceedings of the 30th annual international ACM SIGIR conference on Re-
search and development in information retrieval, pp. 239–246, ACM, 2007.
[29] M. Clements, A. P. de Vries, and M. J. Reinders, “Optimizing single term
queries using a personalized markov random walk over the social graph,”
in Workshop on Exploiting Semantic Annotations in Information Retrieval
(ESAIR), 2008.
[30] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme, Information retrieval in
folksonomies: Search and ranking. Springer, 2006.
[31] G. Paltoglou, S. Gobron, M. Skowron, M. Thelwall, and D. Thalmann, “Sen-
timent analysis of informal textual communication in cyberspace,” Proc. En-
gage, pp. 13–25, 2010.
[32] “Avatarmovie.com.” Accessed: 2014-04-01.
[33] A. Kappas, U. Hess, and K. R. Scherer, “6. voice and emotion,” Fundamen-
tals of nonverbal behavior, p. 200, 1991.
[34] P. Becheiraz and D. Thalmann, “A model of nonverbal communication
and interpersonal relationship between virtual actors,” in Computer Ani-
mation’96. Proceedings, pp. 58–67, IEEE, 1996.
[35] S. Gobron, J. Ahn, G. Paltoglou, M. Thelwall, and D. Thalmann, “From sen-
tence to emotion: a real-time three-dimensional graphics metaphor of emo-
tions extracted from text,” The Visual Computer, vol. 26, no. 6-8, pp. 505–
519, 2010.
[36] M. Skowron, “Affect listeners: Acquisition of affective states by means of
conversational systems,” in Development of Multimodal Interfaces: Active
Listening and Synchrony, pp. 169–181, Springer, 2010.
[37] M. Thelwall and D. Wilkinson, “Public dialogs in social network sites: What
is their purpose?,” Journal of the American Society for Information Science
and Technology, vol. 61, no. 2, pp. 392–404, 2010.
[38] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classi-
fication using machine learning techniques,” in Proceedings of the ACL-02
conference on Empirical methods in natural language processing-Volume 10,
pp. 79–86, Association for Computational Linguistics, 2002.
[39] M. Thomas, B. Pang, and L. Lee, “Get out the vote: Determining support
or opposition from congressional floor-debate transcripts,” in Proceedings of
the 2006 conference on empirical methods in natural language processing,
pp. 327–335, Association for Computational Linguistics, 2006.
[40] I. Ounis, C. Macdonald, and I. Soboroff, “Overview of the trec-2008 blog
track,” tech. rep., DTIC Document, 2008.
[41] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations
and trends in information retrieval, vol. 2, no. 1-2, pp. 1–135, 2008.
[42] T. Mullen and N. Collier, “Sentiment analysis using support vector machines
with diverse information sources.,” in EMNLP, vol. 4, pp. 412–418, 2004.
[43] C. Whitelaw, N. Garg, and S. Argamon, “Using appraisal groups for senti-
ment analysis,” in Proceedings of the 14th ACM international conference on
Information and knowledge management, pp. 625–631, ACM, 2005.
[44] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in
phrase-level sentiment analysis,” in Proceedings of the conference on human
language technology and empirical methods in natural language processing,
pp. 347–354, Association for Computational Linguistics, 2005.
[45] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and
word count: LIWC 2001,” Mahwah: Lawrence Erlbaum Associates, vol. 71,
p. 2001, 2001.
[46] M. Bradley and P. Lang, “Affective norms for english words (anew): Techni-
cal manual and affective ratings,” Gainesville, FL: The Center for Research
in Psychophysiology, University of Florida, 1999.
[47] J. Brooke, M. Tofiloski, and M. Taboada, “Cross-linguistic sentiment analy-
sis: From english to spanish.,” in RANLP, pp. 50–54, 2009.
[48] R. B. Slatcher, C. K. Chung, J. W. Pennebaker, and L. D. Stone, “Winning
words: Individual differences in linguistic style among us presidential and
vice presidential candidates,” Journal of Research in Personality, vol. 41,
no. 1, pp. 63–75, 2007.
[49] K. M. Colby, S. Weber, and F. D. Hilf, “Artificial paranoia,” Artificial In-
telligence, vol. 2, no. 1, pp. 1–25, 1971.
[50] F. Barthelemy, B. Dosquet, S. Gries, and X. Magnant, “Believable synthetic
characters in a virtual emarket,” in Artificial Intelligence and Applications:
IASTED International Conference Proceedings, as part of the 22nd IASTED
International Multi-Conference on Applied Informatics, 2004.
[51] J. Bates et al., “The role of emotion in believable agents,” Communications
of the ACM, vol. 37, no. 7, pp. 122–125, 1994.
[52] J. C. Acosta, “Using emotion to gain rapport in a spoken dialog system,”
in Proceedings of Human Language Technologies: The 2009 Annual Confer-
ence of the North American Chapter of the Association for Computational
Linguistics, Companion Volume: Student Research Workshop and Doctoral
Consortium, pp. 49–54, Association for Computational Linguistics, 2009.
[53] J. Gratch, N. Wang, J. Gerten, E. Fast, and R. Duffy, “Creating rapport
with virtual agents,” in Intelligent Virtual Agents, pp. 125–138, Springer,
2007.
[54] P. Turney and M. L. Littman, “Unsupervised learning of semantic orientation
from a hundred-billion-word corpus,” 2002.
[55] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, T. Becket,
B. Douville, S. Prevost, and M. Stone, “Animated conversation: rule-based
generation of facial expression, gesture & spoken intonation for multiple con-
versational agents,” in Proceedings of the 21st annual conference on Com-
puter graphics and interactive techniques, pp. 413–420, ACM, 1994.
[56] C. Pelachaud, “Studies on gesture expressivity for a virtual agent,” Speech
Communication, vol. 51, no. 7, pp. 630–639, 2009.
[57] J. C. Ward and A. L. Ostrom, “The internet as information minefield:
an analysis of the source and content of brand information yielded by net
searches,” Journal of Business research, vol. 56, no. 11, pp. 907–914, 2003.
[58] S. Bai, T. Zhu, and L. Cheng, “Big-five personality prediction based on user
behaviors at social network sites,” arXiv preprint arXiv:1204.4809, 2012.
[59] M. Smith, V. Barash, L. Getoor, and H. W. Lauw, “Leveraging social context
for searching social media,” in Proceedings of the 2008 ACM workshop on
Search in social media, pp. 91–94, ACM, 2008.
[60] A. Pak and P. Paroubek, “Twitter as a corpus for sentiment analysis and
opinion mining.,” in LREC, 2010.
44

More Related Content

What's hot

IRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine LearningIRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine LearningIRJET Journal
 
Designing a recommender system based on social networks and location based se...
Designing a recommender system based on social networks and location based se...Designing a recommender system based on social networks and location based se...
Designing a recommender system based on social networks and location based se...IJMIT JOURNAL
 
TSRC Discussion Paper E: btr11
TSRC Discussion Paper E:  btr11TSRC Discussion Paper E:  btr11
TSRC Discussion Paper E: btr11Roxanne Persaud
 
Implementation of Privacy Policy Specification System for User Uploaded Image...
Implementation of Privacy Policy Specification System for User Uploaded Image...Implementation of Privacy Policy Specification System for User Uploaded Image...
Implementation of Privacy Policy Specification System for User Uploaded Image...rahulmonikasharma
 
The use of social media in the recruitment process
The use of social media in the recruitment processThe use of social media in the recruitment process
The use of social media in the recruitment processBhagyashree Zope
 
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...ijwscjournal
 
Measuring information credibility in social media using combination of user p...
Measuring information credibility in social media using combination of user p...Measuring information credibility in social media using combination of user p...
Measuring information credibility in social media using combination of user p...IJECEIAES
 
IRJET- An Analysis of Personal Data Shared to Third Parties by Web Services
IRJET- An Analysis of Personal Data Shared to Third Parties by Web ServicesIRJET- An Analysis of Personal Data Shared to Third Parties by Web Services
IRJET- An Analysis of Personal Data Shared to Third Parties by Web ServicesIRJET Journal
 
An empirical study on the usage of social media in german b2 c online stores
An empirical study on the usage of social media in german b2 c online storesAn empirical study on the usage of social media in german b2 c online stores
An empirical study on the usage of social media in german b2 c online storesijait
 
Myo pyae phoo_pwint_000844592_comp1645
Myo pyae phoo_pwint_000844592_comp1645Myo pyae phoo_pwint_000844592_comp1645
Myo pyae phoo_pwint_000844592_comp1645Sofia Nolasco
 
Recommendation System Using Social Networking
Recommendation System Using Social Networking Recommendation System Using Social Networking
Recommendation System Using Social Networking ijcseit
 
Saura, palos sanchez & velicia, 2020 frontiers
Saura, palos sanchez & velicia, 2020 frontiersSaura, palos sanchez & velicia, 2020 frontiers
Saura, palos sanchez & velicia, 2020 frontiersppalos68
 
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...I Z
 
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKIAEME Publication
 

What's hot (20)

Book
BookBook
Book
 
IRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine LearningIRJET- Sentiment Analysis using Machine Learning
IRJET- Sentiment Analysis using Machine Learning
 
13socm04 buregio
13socm04 buregio13socm04 buregio
13socm04 buregio
 
Designing a recommender system based on social networks and location based se...
Designing a recommender system based on social networks and location based se...Designing a recommender system based on social networks and location based se...
Designing a recommender system based on social networks and location based se...
 
TSRC Discussion Paper E: btr11
TSRC Discussion Paper E:  btr11TSRC Discussion Paper E:  btr11
TSRC Discussion Paper E: btr11
 
Implementation of Privacy Policy Specification System for User Uploaded Image...
Implementation of Privacy Policy Specification System for User Uploaded Image...Implementation of Privacy Policy Specification System for User Uploaded Image...
Implementation of Privacy Policy Specification System for User Uploaded Image...
 
The use of social media in the recruitment process
The use of social media in the recruitment processThe use of social media in the recruitment process
The use of social media in the recruitment process
 
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
AN INTEGRATED RANKING ALGORITHM FOR EFFICIENT INFORMATION COMPUTING IN SOCIAL...
 
Measuring information credibility in social media using combination of user p...
Measuring information credibility in social media using combination of user p...Measuring information credibility in social media using combination of user p...
Measuring information credibility in social media using combination of user p...
 
re
rere
re
 
IRJET- An Analysis of Personal Data Shared to Third Parties by Web Services
IRJET- An Analysis of Personal Data Shared to Third Parties by Web ServicesIRJET- An Analysis of Personal Data Shared to Third Parties by Web Services
IRJET- An Analysis of Personal Data Shared to Third Parties by Web Services
 
Citizen 2.0
Citizen 2.0Citizen 2.0
Citizen 2.0
 
An empirical study on the usage of social media in german b2 c online stores
An empirical study on the usage of social media in german b2 c online storesAn empirical study on the usage of social media in german b2 c online stores
An empirical study on the usage of social media in german b2 c online stores
 
Smart communities-evaluation-family-net-centers
Smart communities-evaluation-family-net-centersSmart communities-evaluation-family-net-centers
Smart communities-evaluation-family-net-centers
 
Myo pyae phoo_pwint_000844592_comp1645
Myo pyae phoo_pwint_000844592_comp1645Myo pyae phoo_pwint_000844592_comp1645
Myo pyae phoo_pwint_000844592_comp1645
 
Recommendation System Using Social Networking
Recommendation System Using Social Networking Recommendation System Using Social Networking
Recommendation System Using Social Networking
 
social networking site
social networking sitesocial networking site
social networking site
 
Saura, palos sanchez & velicia, 2020 frontiers
Saura, palos sanchez & velicia, 2020 frontiersSaura, palos sanchez & velicia, 2020 frontiers
Saura, palos sanchez & velicia, 2020 frontiers
 
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...
Gamification and Crowdsourcing as Engagement Techniques for Human Rights Orga...
 
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
 

Viewers also liked

the near future of tourism services based on digital traces
the near future of tourism services based on digital tracesthe near future of tourism services based on digital traces
the near future of tourism services based on digital tracesnicolas nova
 
Unmetric facebook analysis
Unmetric facebook analysisUnmetric facebook analysis
Unmetric facebook analysisBalvor LLC
 
Multidimensional Patterns of Disturbance in Digital Social Networks
Multidimensional Patterns of Disturbance in Digital Social NetworksMultidimensional Patterns of Disturbance in Digital Social Networks
Multidimensional Patterns of Disturbance in Digital Social NetworksDimitar Denev
 
Designing The Social In
Designing The Social InDesigning The Social In
Designing The Social InErin Malone
 
R5 What Is The Impact Of Urban Activities
R5 What Is The Impact Of Urban ActivitiesR5 What Is The Impact Of Urban Activities
R5 What Is The Impact Of Urban ActivitiesSHS Geog
 
Urban Agents and Citizen Apps
Urban Agents and Citizen AppsUrban Agents and Citizen Apps
Urban Agents and Citizen AppsFabio Carrera
 
CSCW 2011 Talk on "Activity Analysis"
CSCW 2011 Talk on "Activity Analysis"CSCW 2011 Talk on "Activity Analysis"
CSCW 2011 Talk on "Activity Analysis"Jakob Bardram
 
User Behaviour Pattern Recognition On Twitter Social Network
User Behaviour Pattern Recognition On Twitter Social NetworkUser Behaviour Pattern Recognition On Twitter Social Network
User Behaviour Pattern Recognition On Twitter Social NetworkGeorge Konstantakopoulos
 
Introduction to power laws
Introduction to power lawsIntroduction to power laws
Introduction to power lawsColin Gillespie
 
Points of distribution
Points of distributionPoints of distribution
Points of distributionChatham EMA
 
Impact of Urban Logistics of Commercial Vehicles
Impact of Urban Logistics of Commercial Vehicles  Impact of Urban Logistics of Commercial Vehicles
Impact of Urban Logistics of Commercial Vehicles Sandeep Kar
 
Distinguish between chinese urban planning and american urban planning
Distinguish between chinese urban planning and american urban planningDistinguish between chinese urban planning and american urban planning
Distinguish between chinese urban planning and american urban planningyang239500
 
Intro To Power Laws (March 2008)
Intro To Power Laws (March 2008)Intro To Power Laws (March 2008)
Intro To Power Laws (March 2008)Socialphysicist
 
U-Tool: A Urban-Toolkit for enhancing city maps through citizens’ activity
U-Tool: A Urban-Toolkit for enhancing city maps through citizens’ activityU-Tool: A Urban-Toolkit for enhancing city maps through citizens’ activity
U-Tool: A Urban-Toolkit for enhancing city maps through citizens’ activityMiguel Rebollo
 
People Pattern: "The Science of Sharing"
People Pattern: "The Science of Sharing"People Pattern: "The Science of Sharing"
People Pattern: "The Science of Sharing"People Pattern
 

Viewers also liked (20)

Media Wave Platform
Media Wave PlatformMedia Wave Platform
Media Wave Platform
 
the near future of tourism services based on digital traces
the near future of tourism services based on digital tracesthe near future of tourism services based on digital traces
the near future of tourism services based on digital traces
 
User behavior model & recommendation on basis of social networks

  • 1. American International University - Bangladesh Faculty of Science and Information Technology Department of Computer Science User Behavior Modeling & Recommendation System Based On Social Networks A thesis submitted for the degree of Bachelor of Science in Computer Science and Engineering By: Alam Shah 10-17685-3 Hossain, MD. Shakawat 11-18494-1 Taher, Najeeb Ahmad 11-18198-1 Supervisor: Md. Saddam Hossain Assistant Professor, Department of Computer Science, American International University-Bangladesh Summer 2014
  • 2. Declaration This is to certify that this project is our original work. No part of it has been submitted elsewhere, partially or fully, for the award of any other degree. Any material reproduced in this project has been properly acknowledged. Alam Shah (ID: 10-17685-3, Department: CSE); Hossain, MD. Shakawat (ID: 11-18494-1, Department: CSE); Taher, Najeeb Ahmad (ID: 11-18198-1, Department: CSE)
  • 3. Approval The thesis titled “User Behavior Modeling & Recommendation System Based On Social Networks” has been submitted to the following respected members of the Board of Examiners of the Faculty of Science and Information Technology in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering and has been accepted as satisfactory. Md. Saddam Hossain, Assistant Professor, Faculty of Computer Science, American International University-Bangladesh. Dr. Dip Nandi, Assistant Professor & Head, Faculty of Computer Science, American International University-Bangladesh.
  • 4. Professor Dr. Tafazzal Hossain, Dean, Faculty of Computer Science, American International University-Bangladesh. Dr. Carmen Z. Lamagna, Vice Chancellor, American International University-Bangladesh.
  • 5. Acknowledgements Special thanks to our honorable teacher and supervisor Md. Saddam Hossain, Assistant Professor, Department of Computer Science, American International University-Bangladesh. We are very grateful to him for giving us the opportunity to work with him. Without his continuous support, it would have been very difficult for us to complete this work. We would also like to thank all the faculty members for their guidance in preparing proper documentation for our project.
  • 6. Abstract Social networks now play an important role in expressing people's sentiments and their interest in particular fields. Extracting a user's public social network data (what the user shares with friends and relatives, and how the user reacts to others' thoughts) amounts to extracting the user's behavior. If, under a set of defined hypotheses, a machine can be made to understand human sentiment and interest, it becomes possible to recommend items of personal interest to a user on the basis of the sentiment the machine has analyzed. Our main approach is to make suggestions to a user about specific interests that are anticipated by analyzing the user's public data. This can be extended to business analysis, suggesting the products or services of different companies according to the consumer's personal choice. Such automation will also help in choosing the right candidates for a questionnaire. The system will also help people learn about themselves and about how their behavior may influence others. It is possible to identify different types of people, such as dependable people, people with leadership skills, people with a supportive mentality, and people with a negative mentality.
  • 7. Table of Contents
    List of Figures
    1. Introduction
    2. Previous Works
       2.1 Location Based Social Network
       2.2 Collaborative Recommendation Based Social Network
       2.3 Sentimental Intensity Analysis of Informal Texts
       2.4 Big Five [1] Modeling
    3. Research Questions
    4. Proposed Research Methodology
       4.1 Data Collection
       4.2 Data Analysis
       4.3 Results
       4.4 Recommendation Analysis
    5. Conclusion
  • 8. List of Figures
    4.1 Modeling User Behavior
    4.2 Pie Chart of LIWC Results
    4.3 Personality Based Recommendation System
  • 9. List of Tables
    2.1 Comparison of different location based social networks
    4.1 Relationship between LIWC categories and Big Five factors
    4.2 Products under Big Five factors
    4.3 Products under Big Five factors
    4.4 Products under Big Five factors
    4.5 Products under Big Five factors
    4.6 Products under Big Five factors
    4.7 Products under Big Five factors
  • 10. Chapter 1 Introduction With millions of users, social networking services such as Facebook [2] and Twitter [3] have become some of the most popular internet applications. These applications are sources of knowledge and information. The rich knowledge accumulated on these social networking sites enables a variety of recommendation systems for new users and media [4]. To use this opportunity, it is possible to create an automated system that categorizes social network users according to the Big Five [1] personality factors. To categorize users in such a system, their data need to be collected without interfering with their daily activities. The system will thus help people learn about other people. For example, suppose an employee needs a vacation and his boss is listed as a friend on an OSN (Online Social Network). The employee can then tailor his request to the boss's behavior as determined by the system: Neuroticism [1] indicates a higher chance of refusal, whereas Agreeableness [1] indicates a higher chance of approval. OSNs deal with big data; after analyzing such data, the system will be able to predict who is suitable for leadership and who may oppose that leadership. Many challenges facing recommendation systems have been tackled by new approaches that use different data sources and methodologies to generate different kinds of recommendations. In this article we provide a description of such systems. From the very beginning, consumer interests have had a great influence on business policy. Offering the right products or services to the right customers is the main objective of every successful business policy.
  • 11. Many business organizations can benefit from the data collected from OSN. At present, the popularity of social networks is increasing very rapidly. From a sociologist's point of view, OSN can be characterized as “collective goods produced through computer mediated collective action” [5]. Users spend a large part of their daily lives on OSN and share a great deal of information about themselves, their friends, and their families. This is therefore a great opportunity to learn about people's sentiments and interests. Understanding the behavior of OSN users is possible, and it has become a crucial factor in advertising policy and better product design. In particular, given the success of the item recommendation systems of commercial websites such as Amazon [6] and Netflix [7], it is worthwhile to revisit the recommendation problem from the perspective of social networking. In general, recommendation systems aim to provide users with personalized recommendations of items based on their previous behavior as well as on other information gathered from item descriptions and user profiles. Our experiment is based on Twitter [3] and Facebook [2], the most popular OSN websites, both of which provide a large space for advertisements. These websites have a very large number of users, who feel comfortable using them because of user-friendly features such as micro-blogging, status updates, photo and video sharing, commenting on posts, joining and creating groups, liking and subscribing to pages and profiles, creating events, playing games, and so on. We aim to analyze user behavior through the following steps: collecting the user's past activities in OSN, mapping them onto the Big Five factors [1], finding a set of the user's particular fields of interest, and making recommendations to him or her by providing informative services.
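The steps listed at the end of the introduction (collect past activity, map it onto the Big Five factors, derive interests, recommend) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the keyword lists, the `RECOMMENDATIONS` table, and the function names `score_traits` and `recommend` are hypothetical placeholders, not the LIWC categories or product tables used later in this work.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a real lexicon.
# The thesis relies on LIWC categories, which are not reproduced here.
TRAIT_KEYWORDS = {
    "openness": {"idea", "art", "imagine"},
    "conscientiousness": {"plan", "work", "finish"},
    "extraversion": {"party", "friends", "fun"},
    "agreeableness": {"thanks", "help", "agree"},
    "neuroticism": {"worried", "angry", "afraid"},
}

# Hypothetical trait-to-interest table, for illustration only.
RECOMMENDATIONS = {
    "openness": ["art exhibitions", "travel guides"],
    "conscientiousness": ["planners", "online courses"],
    "extraversion": ["concert tickets", "group events"],
    "agreeableness": ["volunteering pages", "community groups"],
    "neuroticism": ["relaxation apps", "wellness content"],
}


def score_traits(posts):
    """Return the share of matched keywords per trait across all posts."""
    counts = Counter()
    for post in posts:
        for word in post.lower().split():
            token = word.strip(".,!?;:\"'()")
            for trait, keywords in TRAIT_KEYWORDS.items():
                if token in keywords:
                    counts[trait] += 1
    total = sum(counts.values())
    return {
        trait: (counts[trait] / total if total else 0.0)
        for trait in TRAIT_KEYWORDS
    }


def recommend(posts):
    """Pick the highest-scoring trait and look up its placeholder interests."""
    scores = score_traits(posts)
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, []  # no keyword matched, so nothing to recommend
    return best, RECOMMENDATIONS[best]


if __name__ == "__main__":
    sample_posts = ["Thanks for the help!", "I agree with you."]
    print(recommend(sample_posts))
```

Plain keyword counting is used here only to keep the sketch short; any real system would need a validated lexicon and a calibrated mapping from word categories to personality factors.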
  • 12. Chapter 2 Previous work OSN is the practice of expanding the number of a person's business and social contacts by making connections through individuals [8]. In this era of the internet, OSN is extremely popular. According to Nielsen Online's report, two thirds of the world's population spend 10% of their internet time on OSN [9]. Because OSN gives users the opportunity to express what they want to say to friends, relatives, and others connected through their accounts, there are many opportunities to identify or characterize a person's behavior type implicitly, without interfering in his or her personal life [4]. 2.1 Location Based Social Network [10] A social network is a social structure made up of individuals connected by one or more specific types of interdependency, such as friendship, common interests, and shared knowledge. Generally, a social networking service builds on and reflects the real-life social networks among people through online platforms such as a website, providing ways for users to share ideas, activities, events, and interests over the Internet. The increasing availability of location-acquisition technology (for example GPS and Wi-Fi) empowers people to add a location dimension to existing online social networks in a variety of ways. For example, users can upload location-tagged photos to a social networking service such as Flickr [11], comment on an event at the exact place where the event is happening (for instance, in Twitter [3]), share their present location on a website (such as Foursquare [12]) for organizing a group activity in the real world, or record travel routes with GPS trajectories to share travel experiences in an online community.
  • 13. Here, a location can be represented in absolute (latitude-longitude coordinates), relative (100 meters north of the Space Needle), and symbolic (home, office, or shopping mall) form. Also, the location embedded into a social network can be a stand-alone instant location of an individual, like in a bar at 9pm, or a location history accumulated over a certain period, such as a GPS trajectory: a cinema → a restaurant → a park → a bar. The dimension of location brings social networks back to reality, bridging the gap between the physical world and online social networking services. For example, a user with a mobile phone can leave comments about a restaurant on an online social site (after finishing dinner) so that the people in his or her social structure can refer to those comments when they later visit the restaurant. In this example, users create their own location-related stories in the physical world and browse other people's information as well. An online social site becomes a platform for facilitating the sharing of people's experiences. Furthermore, people in an existing social network can expand their social structure with the new interdependency derived from their locations. As location is one of the most important components of user context, extensive knowledge about an individual's interests and behavior can be learned from his or her locations. For instance, people who enjoy the same restaurant can connect with each other. Individuals who regularly hike the same mountain can be put in contact with each other to share their travel experiences.
Sometimes, two individuals who do not share the same absolute location can still be linked as long as their locations are indicative of a similar interest, such as beaches or lakes. These kinds of location-embedded and location-driven social structures are known as location-based social networks, formally defined as follows: “A location-based social network (LBSN) [10] does not only mean adding a location to an existing social network so that people in the social structure can share location embedded information, but also consists of the new social structure made up of individuals connected by the interdependency derived from their locations in the physical world as well as their location-tagged media content, such as photos, video, and texts. Here, the physical location consists of the instant location of an individual at a given timestamp and the location history that an individual has accumulated in a certain period. Further, the interdependency includes not only that two persons co-occur in the same physical location or share similar location histories but also the knowledge, e.g., common interests, behavior, and activities, inferred from an individual’s location (history) and location-tagged data.”
  • 14. In a location-based social network, people can not only track and share the location-related information of an individual via either mobile devices or desktop computers, but also leverage collaborative social knowledge learned from user-generated and location-related content, such as GPS trajectories and geo-tagged photos. One example is determining this summer's most popular restaurant by mining people's geo-tagged comments. Another example could be identifying the most popular travel routes in a city based on a large number of users' geo-tagged photos. Consequently, LBSNs enable many novel applications that change the way we live, such as physical location (or activity) recommendation systems [13] [14] and travel planning, while offering many new research opportunities for social network analysis (like user modeling in the physical world and connection strength analysis) [15] [16], spatio-temporal data mining [17], ubiquitous computing [18], and spatio-temporal databases [17] [19]. Existing applications providing location-based social networking services can be broadly categorized into three groups: geo-tagged-media-based, point-location-driven, and trajectory-centric.
    • Geo-tagged-media-based. [10] Quite a few geo-tagging services enable users to add a location label to media content such as text, photos, and videos generated in the physical world.
The tagging can occur instantly when the medium is generated, or after a user has returned home. In this way, people can browse their content at the exact location where it was created (on a digital map or in the physical world using a mobile phone). Users can also comment on the media and expand their social structures using the interdependency derived from the geo-tagged content (for example, favoring the same photo taken at a location). Representative websites of such location-based social networking services include Flickr, Panoramio, and
Geo-twitter. Though a location dimension has been added to these social networks, the focus of such services is still on the media content. That is, location is used only as a feature to organize and enrich media content while the major interdependency between users is based on the media itself.

• Point-location-driven. [10] Applications like Foursquare and Google Latitude encourage people to share their current locations, such as a restaurant or a museum. In Foursquare, points and badges are awarded for checking in at venues. The individual with the most check-ins at a venue is crowned Mayor. With the real-time location of users, an individual can discover friends (from her social network) around her physical location so as to enable certain social activities in the physical world, e.g., inviting people to have dinner or go shopping. Meanwhile, users can add tips to venues that other users can read, which serve as suggestions for things to do, see, or eat at the location. With this kind of service, a venue (point location) is the main element determining the interdependency connecting users, while user-generated content such as tips and badges serves as a feature of a point location.

• Trajectory-centric. [10] In a trajectory-centric social networking service, such as Bikely, SportsDo, and Microsoft GeoLife, users pay attention to both point locations (passed by a trajectory) and the detailed route connecting these point locations. These services not only tell users basic information, such as distance, duration, and velocity, about a particular trajectory, but also show a user’s experiences represented by tags, tips, and photos for the trajectory. In short, these services provide “how and what” information in addition to “where and when”.
In this way, other people can reference a user’s travel/sports experience by browsing or replaying the trajectory on a digital map, and follow the trajectory in the real world with a GPS phone.
Table 2.1 provides a brief comparison among these three services. The major differences between the point-location-driven and the trajectory-centric LBSN lie in two aspects. One is that a trajectory offers richer information than a point location, such as how to reach a location, the temporal duration that a user stayed in a location, the time length for travelling between two locations, and the physical/traffic conditions of a route. As a result, we are more likely to accurately understand an individual’s behavior and interests in a trajectory-centric LBSN. The other is that in a point-location-driven LBSN users usually share their real-time location, while a trajectory-centric LBSN more likely delivers historical locations, as users typically prefer to upload a trajectory after a trip has finished (though it can be operated in a continuously uploading manner). This property could compromise some scenarios based on the real-time location of a user; however, it reduces to some extent the privacy issues in a location-based social network. In other words, when people see a user’s trajectory the user is no longer there.

Table 2.1. Comparison of different location based social networks

LBSN Services            Focus            Real-time         Information
Geo-tagged-media-based   Media            Normal            Poor
Point-location-driven    Point location   Instant           Normal
Trajectory-centric       Trajectory       Relatively Slow   Rich

Actually, the location data generated in the first two LBSN services can be converted into the form of a trajectory which might be used by the third category of LBSN service. For example, if we sequentially connect the point locations of the geo-tagged photos taken by a user over several days, a sparse trajectory can be formed. Likewise, the check-in records of an individual ordered by time can be regarded as a low-sampling-rate trajectory.
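The conversion just described can be illustrated with a minimal sketch. It assumes a hypothetical `CheckIn` record (timestamp, latitude, longitude) and a spherical-Earth haversine distance; neither the record layout nor the sample coordinates come from [10]. Ordering the records by time yields the low-sampling-rate trajectory, and the gaps between consecutive points make its sparseness visible.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class CheckIn:
    """One check-in or geo-tagged photo (hypothetical record layout)."""
    timestamp: float  # seconds since an arbitrary epoch
    lat: float        # degrees
    lon: float        # degrees


def haversine_km(a: CheckIn, b: CheckIn) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def to_trajectory(records):
    """Order point locations by time to form a (sparse) trajectory."""
    return sorted(records, key=lambda r: r.timestamp)


def gaps(trajectory):
    """(distance_km, seconds) between each pair of consecutive points."""
    return [
        (haversine_km(p, q), q.timestamp - p.timestamp)
        for p, q in zip(trajectory, trajectory[1:])
    ]


# Illustrative, made-up check-ins given in arbitrary order.
checkins = [
    CheckIn(7200.0, 23.8103, 90.4125),
    CheckIn(0.0, 23.7500, 90.3900),
    CheckIn(90000.0, 22.3569, 91.7832),
]
trajectory = to_trajectory(checkins)
for dist_km, dt in gaps(trajectory):
    print(f"{dist_km:.1f} km apart, {dt / 3600:.1f} h apart")
```

Large values in either component of a gap indicate that little is known about what the user did between the two points.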
However, because such trajectories are sparse (the distance and time interval between two consecutive points can be very large), the uncertainty in a single trajectory derived from the first two services is increased. To use these trajectories in trajectory-centric LBSN services, we need to treat them in a collective and collaborative way. Trajectory data is the most complex data structure to be found in the three
LBSN services, and provides the richest information. If it is handled well, other data sources become easier to deal with. Moreover, as mentioned above, location data can be converted into a trajectory on many occasions. Consequently, some methodologies designed for trajectory data can be employed by the first two LBSN services.

2.2 Collaborative Recommendation Based Social Network [20]

With the recent advances in technology, there is an emerging presence of social media and social networking systems. In the case of multimedia enriched social network systems, such as last.fm, the collective goods are musical tracks and the collective action is the process of crafting individual profiles of musical preference and linking them either explicitly, via bonds of friendship, or implicitly, through collaborative annotation.

This collective action leads to the creation of an implicit social networking structure, which we aim to further explore. In particular, given the success of item recommendation systems in commercial websites, such as Amazon.com and Netflix, it is considered worthwhile to revisit the recommendation problem through the novel perspective of social networking. In general, recommendation systems aim to provide personalized recommendations of items to users based on their previous behavior as well as on other information gathered from item descriptions and user profiles. However, no emphasis has been placed yet on personalization based explicitly on social networks. The reason is that, despite an increasing interest in the exploration of social networks, there does not exist a concrete dataset that includes both explicit bonds of friendship among users and free-form collaborative annotation of items. This is because most social media systems do not allow free access to all user profiles or lists of friends.
Given the incentives of the widespread adoption of social networks and of the
lack of some previous study that directly addresses the problem of efficiently integrating the added value knowledge provided by those networks in the field of collaborative recommendation, Kontas et al. [20] propose a new methodology that tackles the aforementioned issues. Within this context they make the following contributions:

• Kontas et al. [20] introduce a dataset based on data from the last.fm social network that describes a social graph among users, tracks and tags, effectively including bonds of friendship and collaborative annotation.
• Kontas et al. [20] evaluate a Random Walk with Restarts (RWR) model on this dataset and show that the incorporation of friendship and social tagging can improve the performance of an item recommendation system.
• Kontas et al. [20] show that the RWR method outperforms the standard Collaborative Filtering (CF) method, which they also evaluate against the same dataset.
• Kontas et al. [20] show that their RWR-based method requires no training and successfully manages to capture

Kontas et al. [20] distinguish two broad categories of collaborative recommendation systems, namely content-based and collaborative filtering. A content-based system selects items based on the correlation between the content of the items (e.g. keywords describing the items, such as album genre, artists, etc., for music tracks) and the users’ preferences [5]. However, it is limited to dictionary-bound relations between the keywords used by users and the descriptions of items and therefore does not explore implicit associations between users.

Collaborative filtering systems are divided into two categories, i.e. memory-based and model-based. In memory-based systems [21], the similarity between all users is calculated from their ratings of items, using some heuristic measure such as the cosine similarity or the Pearson correlation score.
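As a rough illustration of the memory-based scheme, the sketch below computes cosine and Pearson similarity over co-rated items and then predicts a missing rating from the k most similar users, which is the aggregation step the text turns to next. The toy `ratings` dictionary, the function names, the default k, and the choice of a similarity-weighted average restricted to positively correlated neighbors are all assumptions made for this example, not the setup evaluated in [20] or [21].

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {item: rating}
ratings = {
    "ann": {"t1": 5, "t2": 3, "t3": 4},
    "bob": {"t1": 4, "t2": 2, "t3": 5, "t4": 4},
    "cat": {"t1": 1, "t2": 5, "t4": 2},
}


def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = set(u) & set(v)
    n = len(common)
    if n < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / n
    mean_v = sum(v[i] for i in common) / n
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = sqrt(sum((u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item, k=2, sim=pearson):
    """Similarity-weighted average over the k most similar users who rated `item`.

    Neighbors with non-positive similarity are ignored (an assumption of this sketch).
    Returns None when no usable neighbor exists.
    """
    neighbors = sorted(
        (
            (sim(ratings[user], other), other[item])
            for name, other in ratings.items()
            if name != user and item in other
        ),
        reverse=True,
    )[:k]
    usable = [(s, r) for s, r in neighbors if s > 0]
    if not usable:
        return None
    return sum(s * r for s, r in usable) / sum(s for s, _ in usable)


print(predict("ann", "t4"))
```

Note that both `k` and the similarity measure are free parameters here, which is exactly the kind of arbitrary choice the text criticizes in memory-based systems.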
Then we predict a missing rating by aggregating the ratings of the k nearest neighbors of
the user we want to recommend to. The problem with memory-based systems is that parameters such as the number of neighbors have to be decided on a rather arbitrary basis. What is more, in the case of social networks there is no straightforward way to introduce similarities between users based on friendships and social tagging, other than some ad hoc interpolation of similarity weights from those different sources.

Model-based filtering systems assume that users form clusters based on their similar behavior in rating items. A model is learned from patterns recognized in the rating behavior of users, using clustering, Bayesian networks and other machine learning techniques [22] [23]. The problem with model-based methods is that several parameters of the model need to be fine-tuned, and the models produced might not generalize well to radically different contexts. What is more, as in the case of memory-based systems, extra effort and training are needed in order to introduce knowledge from social networks.

Many research publications have lately revolved around the area of social media. In particular, several studies focus on dataset collection and analysis from social networks. Das et al. [24] proposed sample-based algorithms that capture information in the neighborhood of a user in dynamic social networks utilizing random walks. Halpin et al. [25] studied the distribution of tags in the social bookmarking site del.icio.us and proposed a generative model of collaborative tagging in order to evaluate the dynamics that lie beneath the act of collaborative recommendation. Their findings indicate that the dataset collected follows a power-law distribution. Even though both studies examine social networks that are based on social tagging, they do not explore the dynamics of friendships among users.
Taking into account the power of free-form tagging of items by users other than their authors/owners, researchers also focus on tag recommendation. Subramanya and Liu [26] propose a system that automatically recommends tags for blogs, using similarity ranking in a manner similar to collaborative filtering techniques. Stromhaier [27] studies a novel idea in tag recommendation, which bridges the gap between the keywords issued by a user in a query and the tags actually used by a social system. He argues that the tags used by a user when
performing a query exhibit his or her intent, whereas the annotations of items describe content semantics. As a result, he proposes a new form of purpose tags, which extract the intent of the user and facilitate goal-oriented search in a social network. Both studies underline the importance and discriminative power of social tagging, which is also validated by our work.

Several studies exist in the field of applying Random Walks on bipartite graphs. Craswell and Szummer [28] study a clickthrough data graph in order to perform item recommendation. Nevertheless, no social content is available between users. Yildirim and Krishnamoorthy [23] propose a novel recommendation algorithm which performs Random Walks on a graph that denotes similarity measures between items. They evaluate their system using data from MovieLens. Although the Random Walk model performs well in the context of recommendation, their use of an Item-Item similarity matrix raises some issues as to the ability of the system to extend when other similarities are introduced based on social tagging.

Recent work has also been done in the field of applying Random Walks over a social graph instead of bipartite graphs, similar to what we propose in this paper. Clements et al. [29] propose a single term query system performing Random Walks on graphs including users, items and tags. They use data from LibraryThing, an online book catalogue where users rate and tag books they have read. Due to lack of ground truth, they assume that the tags assigned to an item by each user are the same as they would use as query terms to retrieve the annotated item. We argue that this assumption is rather strong and that a user experiment would be more appropriate in order to properly establish the ground truth. Hotho et al.
evaluate a variation of adapted PageRank on a dataset from del.icio.us, exploring folksonomies of bookmarks based also on collaborative annotation [30]. However, since they evaluate their proposed algorithm empirically, any attempt to compare against their results becomes cumbersome. Although both studies are close to our approach, we use a different model, namely RWR, in which we explicitly include friendships in our dataset and perform collaborative recommendations instead of queries on the graph.
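Since RWR is the model this section keeps returning to, a minimal sketch of the iteration may help. It uses the common formulation in which, at every step, the walker restarts at the query node with probability c and otherwise follows an outgoing edge chosen in proportion to its weight. The toy graph of users, tracks and one tag, the node naming scheme, the restart probability of 0.15 and the fixed iteration count are all assumptions made for illustration and are not taken from [20].

```python
def undirected(edges):
    """Build a symmetric adjacency map from (node, node, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def random_walk_with_restarts(graph, start, restart=0.15, iterations=100):
    """Iterate p <- (1 - c) * W p + c * e_start, with W built from edge weights."""
    p = {n: 1.0 if n == start else 0.0 for n in graph}
    for _ in range(iterations):
        nxt = {n: (restart if n == start else 0.0) for n in graph}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            for other, weight in neighbors.items():
                # Spread the non-restarting mass over the outgoing edges.
                nxt[other] += (1 - restart) * p[node] * weight / total
        p = nxt
    return p


# Hypothetical social graph: friendship, listening and tagging edges.
graph = undirected([
    ("user:u1", "user:u2", 1.0),   # u1 and u2 are friends
    ("user:u1", "track:a", 1.0),   # u1 listened to track a
    ("user:u2", "track:b", 1.0),   # u2 listened to track b
    ("track:a", "tag:rock", 1.0),  # track a is tagged "rock"
    ("track:c", "tag:rock", 1.0),  # track c is tagged "rock"
])

scores = random_walk_with_restarts(graph, "user:u1")
recommended = sorted(
    (n for n in scores if n.startswith("track:") and n not in graph["user:u1"]),
    key=scores.get,
    reverse=True,
)
print(recommended)
```

In this toy graph the friend's track is reached over a shorter path than the track that merely shares a tag, so it receives the higher score; with different edge weights the ranking could change.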
2.3 Sentiment Intensity Analysis of Informal Texts [31]

The proliferation of social networks such as blogs, forums and other online means of expression and communication has resulted in a landscape where people are able to freely discuss online through a variety of means and applications.

Probably one of the most novel and interesting ways of communication in cyberspace is through 3D virtual environments. In such environments, people, represented by their avatars, socialize and interact with each other and with virtual humans operated by machines, i.e., computer systems.

Despite the fact that the graphics of those environments remain relatively poor, futuristic movies such as Avatar [32] provide an example of sophisticated landscapes and renderings that will be attainable by such environments in the foreseeable future. However, regardless of how attractive and realistic such artificial 3D worlds become, they will always remain heavily dependent on the quality of human communication that takes place within them. As shown in [33] [34] [35], communication in environments that are not limited to one, textual modality consists of not just semantic data transfer, but also of dense non-verbal communication where sentiment plays an important role. Moreover, without emotion no consistent and coherent (virtual) body language is possible. Such primordial movements include facial expressions, eye looks, arm-language coordination, etc.

Sentiment detection from textual utterances can play an important role in the development of realistic and interactive dialog systems. Such systems serve various educational, business or entertainment oriented functions and also include systems that are deployed in 3D virtual environments.
With the aid of “dialog coherence” modules, conversational systems aim at a realistic interaction flow at the emotional level (e.g., Affect Listeners [36]) and can greatly benefit from the correct identification of the emotional state of their participants. Taking into consideration that the majority of input to practical conversational systems consists of short, informal, textual exchanges, it is essential that the sentiment analysis component integrated in the dialog system is able to cope with this type of informal, often incomplete or ill-formed communication.

Sentiment analysis, the process of automatically detecting if a text segment contains emotional or opinionated content and extracting its polarity or valence, is a field of research that has received significant attention in recent years, both in academia and in industry. The aforementioned increase of user-generated content on the web has resulted in a wealth of information that is potentially of vital importance to institutions and companies, providing them with data to research their consumers, manage their reputations and identify new opportunities. As a result, most of the research in the field has been limited to product reviews, where the aim is to predict whether the reviewer recommends a product or not, based on the textual content of the review.

The focus of this paper is different. Instead of focusing our attention on product reviews, we explore a more ubiquitous field of informal, social interactions in cyberspace. The unprecedented popularity of social platforms such as Facebook, Twitter, MySpace as well as 3D virtual worlds has resulted in an unparalleled increase of textual exchanges that remains relatively unexplored, especially in terms of its emotional content.

Specifically, Paltoglou et al. [31] aim to answer the following question: can lexicon-based approaches perform more effectively than machine-learning approaches in this domain? This question is particularly important, because previous research in sentiment analysis using product reviews has shown that machine-learning approaches typically outperform lexicon-based ones, but no exploration of whether the same holds for informal, social interactions has been carried out in the past. The differences between the two domains are numerous. Firstly, reviews tend to be longer and more verbose than typical social interactions, which may only be a few words long and often contain significant spelling errors [37].
Secondly, no clear “golden standard” exists in the domain of informal communications with which to train a machine-learning classifier, in contrast to the “thumbs up” or “thumbs down” feature of reviews. Lastly, social exchanges on the web tend to be much more diverse in terms of their topics, with issues ranging from politics and recent news to religion, while in contrast product reviews by definition have a specific subject, i.e. the product under discussion.

The study of emotional and social interactions in virtual worlds implies the study of virtual human (VH) behaviors. Two types of VH exist: avatars (i.e. the projection of a real human in the 3D environment) and agents (i.e. the projection of an autonomous machine
simulating a human in the virtual world). These VH types result in three possible types of communication: avatar to avatar, agent to agent and avatar to agent. Each one of those has the following interesting aspects respectively:
- A non-verbal body language based on VH emotional states and mind profile.
- A potential visualization of the interaction from a third VH that should be represented by an avatar.
- A non-verbal communication for the human representation and an action of the agent strongly influenced by interpreted emotions from the avatar.
It seems only logical that artificial intelligence and conversation systems would strongly benefit these aspects in order to make the communication more realistic.

The structure of this paper is as follows. The next section provides a brief overview of relevant work in sentiment analysis. Section 3 presents the lexicon-based classifier and section 4 presents the two machine-learning classifiers that will be used in this study. Section 5 describes the data sets that were used and explains the experimental setup, while section 6 presents and analyzes the results. Finally, Paltoglou et al. [31] conclude and present some potential future directions of research.

Sentiment analysis, also known as opinion mining, has attracted considerable interest recently. Most research has focused on analyzing the content of either movie or general product reviews (e.g. [38]). Attempts to expand the application of sentiment analysis to other domains, such as debates [39], news and blogs [40], are also prominent. The seminal book of Pang and Lee [41] presents a thorough analysis of the work in the field. In this section we will focus on the more prominent work which is relevant to our approach. Pang et al. [46] were among the first to explore the sentiment analysis of reviews, focusing on machine-learning approaches.
These approaches generally function as follows: initially, a general inductive process learns the characteristics of a class during a training phase, by observing the properties of a number of pre-classified documents (i.e. a reference corpus), and then applies the acquired knowledge to determine the best category for new, unseen documents during testing. Pang et al. [46] experimented
with three different algorithms: Support Vector Machines (SVMs), Naive Bayes and Maximum Entropy classifiers, using a variety of features, such as unigrams and bigrams, part-of-speech tags, binary and term frequency feature weights and others. Their best accuracy on a dataset consisting of movie reviews was attained using an SVM classifier with binary features, although all three classifiers gave very comparable performance. Other approaches (e.g. [42] [43]) have focused on extending the feature set with semantically or linguistically driven features in order to improve classification accuracy.

Dictionary/lexicon-based sentiment analysis is typically based on lists of words with some sort of pre-determined emotional weight. Examples of such dictionaries include the General Inquirer (GI) dictionary [44] and the “Linguistic Inquiry and Word Count” (LIWC) software [45], which are also used in the present study. Both lexicons are built with the aid of experts who classify certain tokens in terms of their affective content (e.g. positive or negative). The “Affective Norms for English Words” (ANEW) lexicon [46] contains ratings of terms on a nine-point scale with regard to three individual dimensions: valence, arousal and dominance. The ratings were produced manually by psychology class students. Ways to produce such emotional dictionaries in an automatic or semi-automatic fashion have also been introduced in research [47]. Emotional dictionaries have mostly been utilized in psychology or sociology oriented research [48].

The idea of emotional conversationalists is relatively old. First attempts to create such a system can be traced back to Parry [49], a chatterbot intended for studying the nature of paranoia and able to express fears, anxieties or beliefs.
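To make the dictionary-based idea described earlier in this section concrete, the following sketch scores a short informal message by summing the weights of the words it contains. The tiny `LEXICON`, its weights and the three-way labelling rule are invented for illustration; they are not entries from GI, LIWC or ANEW, and this is not the classifier of [31].

```python
import re

# Tiny hand-made lexicon (hypothetical weights, NOT taken from GI, LIWC or ANEW).
LEXICON = {
    "love": 3, "great": 2, "good": 1,
    "bad": -1, "awful": -2, "hate": -3,
}


def score(text):
    """Sum the pre-assigned emotional weights of the known words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def polarity(text):
    """Map the summed score to a coarse label (the thresholds are assumptions)."""
    total = score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


for message in ["luv u, great gig!!", "awful day, hate mondays", "see you at 5"]:
    print(message, "->", score(message), polarity(message))
```

The first message also shows a weakness noted in the text: a misspelling such as "luv" is simply missed by a fixed word list, so only "great" contributes to the score.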
More recent work includes research on the development of synthetic characters and chatterbots with personalities [50] and studies on emotional responses and their influence on the creation of believable agents or interactive virtual personalities [51]. In [52] the authors focused on the role of emotions for gaining rapport in spoken dialog systems by rendering responses that contain suitable emotion, both lexically and auditorily. Studies on the role of facial expressions in building rapport in virtual human-user interactions were conducted in [53]. A chatterbot system that generates emotional responses by selecting and displaying expressive images of the character emulated by the chatterbot was presented in [54]. For almost two decades, emotional communication for virtual worlds has been a challenging
research field. One of the pioneering papers was published by Cassel et al. [55]. In the proposed system, conversations between multiple human-like agents were automatically generated and animated with appropriate and synchronized speech, intonation, facial expressions, and hand gestures. Numerous ways to design personality and emotion models for virtual humans have also been proposed. More recently, specific personality and emotional states were predicted from hierarchical fuzzy rules to facilitate personality and emotion control, and in 2009, Pelachaud et al. [56] developed a model of behavior expressivity using a set of six parameters that act as modulation of behavior animation. Finally, this year, [35] introduced a graphical representation of human emotion extracted from text sentences. The main contributions of that approach included an original pipeline that extracts, processes, and renders emotion of 3D VH. Additionally, the paper presented methods to optimize the computational pipeline so that real time virtual reality rendering can be achieved on common PCs. Lastly, it was demonstrated how the Poisson distribution can be utilized to transfer database extracted lexical and language parameters into coherent intensities of valence and arousal (i.e. parameters of Russell’s circumplex model of emotion).

2.4 Big Five modeling [1]

At present, many researchers believe that there are five core personality traits, and the evidence for this theory has been growing over the past 50 years [1]. From the point of view of a sociologist, social media can be characterized as collective goods produced through computer-mediated collective action [57]. People of each category have different attitudes toward the corresponding sites, different tastes in products, and different skills for accomplishing work. The five factors are Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness [58].
People of different categories have different ways of expressing their thoughts, and OSN users attach different levels of significance to expressing their thoughts or behavior [1] [4]. The users of an OSN can be categorized according to the Big Five factors. The behavior of an OSN user varies from location to location, but there is a similarity in behavior among people from the same or nearby locations [59]. Behavior also varies among people of different ages.

The personality traits used in the 5 factor model are Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness to experience [58]. It is important to ignore the positive or negative associations that these words have in everyday language. For example, Agreeableness is obviously advantageous for achieving and maintaining popularity. Agreeable people are better liked than disagreeable people. On the other hand, agreeableness is not useful in situations that require tough or totally objective decisions. Disagreeable people can make excellent scientists, critics, or soldiers. Remember, none of the five traits is in itself positive or negative; they are simply characteristics that individuals exhibit to a greater or lesser extent.

Each of these 5 personality traits describes, relative to other people, the frequency or intensity of a person’s feelings, thoughts, or behaviors. Everyone possesses all 5 of these traits to a greater or lesser degree. For example, two individuals could be described as agreeable (agreeable people value getting along with others). But there could be significant variation in the degree to which they are both agreeable. In other words, all 5 personality traits exist on a continuum rather than as attributes that a person does or does not have. Each of the Big Five personality traits is made up of 6 facets or sub traits. These can be assessed independently of the trait that they belong to.

• Extraversion
Extraversion is marked by pronounced engagement with the external world. Extraverts enjoy being with people, are full of energy, and often experience positive emotions. They tend to be enthusiastic, action-oriented individuals who are likely to say “Yes!” or “Let’s go!” to opportunities for excitement. In groups they like to talk, assert themselves, and draw attention to themselves. Introverts lack the exuberance, energy, and activity levels of extraverts.
They tend to be quiet, low-key, deliberate, and disengaged from the social world. Their lack of social involvement should not be interpreted as shyness or depression; the introvert simply needs less stimulation than an extravert and prefers to be alone. The independence and reserve of the introvert is sometimes mistaken as unfriendliness or arrogance. In reality,
an introvert who scores high on the agreeableness dimension will not seek others out but will be quite pleasant when approached.

Extraversion Facets:
– Friendliness. Friendly people genuinely like other people and openly demonstrate positive feelings toward others. They make friends quickly and it is easy for them to form close, intimate relationships. Low scorers on Friendliness are not necessarily cold and hostile, but they do not reach out to others and are perceived as distant and reserved.
– Gregariousness. Gregarious people find the company of others pleasantly stimulating and rewarding. They enjoy the excitement of crowds. Low scorers tend to feel overwhelmed by, and therefore actively avoid, large crowds. They do not necessarily dislike being with people sometimes, but their need for privacy and time to themselves is much greater than for individuals who score high on this scale.
– Assertiveness. High scorers on Assertiveness like to speak out, take charge, and direct the activities of others. They tend to be leaders in groups. Low scorers tend not to talk much and let others control the activities of groups.
– Activity Level. Active individuals lead fast-paced, busy lives. They move about quickly, energetically, and vigorously, and they are involved in many activities. People who score low on this scale follow a slower and more leisurely, relaxed pace.
– Excitement-Seeking. High scorers on this scale are easily bored without high levels of stimulation. They love bright lights and hustle and bustle. They are likely to take risks and seek thrills. Low scorers are overwhelmed by noise and commotion and are averse to thrill-seeking.
– Cheerfulness. This scale measures positive mood and feelings, not negative emotions (which are a part of the Neuroticism domain). Persons who score high on this scale typically experience a range of positive
feelings, including happiness, enthusiasm, optimism, and joy. Low scorers are not as prone to such energetic, high spirits.

• Agreeableness
Agreeableness reflects individual differences in concern with cooperation and social harmony. Agreeable individuals value getting along with others. They are therefore considerate, friendly, generous, helpful, and willing to compromise their interests with others’. Agreeable people also have an optimistic view of human nature. They believe people are basically honest, decent, and trustworthy. Disagreeable individuals place self-interest above getting along with others. They are generally unconcerned with others’ well-being, and therefore are unlikely to extend themselves for other people. Sometimes their skepticism about others’ motives causes them to be suspicious, unfriendly, and uncooperative. Agreeableness is obviously advantageous for attaining and maintaining popularity. Agreeable people are better liked than disagreeable people. On the other hand, agreeableness is not useful in situations that require tough or absolute objective decisions. Disagreeable people can make excellent scientists, critics, or soldiers.

Agreeableness Facets:
– Trust. A person with high trust assumes that most people are fair, honest, and have good intentions. Persons low in trust may see others as selfish, devious, and potentially dangerous.
– Morality. High scorers on this scale see no need for pretence or manipulation when dealing with others and are therefore candid, frank, and sincere. Low scorers believe that a certain amount of deception in social relationships is necessary. People find it relatively easy to relate to the straightforward high-scorers on this scale. They generally find it more difficult to relate to the low-scorers on this scale.
It should be made clear that low scorers are not unprincipled or immoral; they are simply more guarded and less willing to openly reveal the whole truth.
– Altruism. Altruistic people find helping other people genuinely rewarding. Consequently, they are generally willing to assist those who are in need. Altruistic people find that doing things for others is a form of self-fulfillment rather than self-sacrifice. Low scorers on this scale do not particularly like helping those in need. Requests for help feel like an imposition rather than an opportunity for self-fulfillment.
– Cooperation. Individuals who score high on this scale dislike confrontations. They are perfectly willing to compromise or to deny their own needs in order to get along with others. Those who score low on this scale are more likely to intimidate others to get their way.
– Modesty. High scorers on this scale do not like to claim that they are better than other people. In some cases this attitude may derive from low self-confidence or self-esteem. Nonetheless, some people with high self-esteem find immodesty unseemly. Those who are willing to describe themselves as superior tend to be seen as disagreeably arrogant by other people.
– Sympathy. People who score high on this scale are tender-hearted and compassionate. They feel the pain of others vicariously and are easily moved to pity. Low scorers are not affected strongly by human suffering. They pride themselves on making objective judgments based on reason. They are more concerned with truth and impartial justice than with mercy.
• Conscientiousness
Conscientiousness concerns the way in which we control, regulate, and direct our impulses. Impulses are not inherently bad; occasionally time constraints require a snap decision, and acting on our first impulse can be an effective response. Also, in times of play rather than work, acting spontaneously and impulsively can be fun. Impulsive individuals can be seen by others as colorful and fun-to-be-with.
Nonetheless, acting on impulse can lead to trouble in a number of ways. Some impulses are antisocial.
Uncontrolled antisocial acts not only harm
other members of society, but also can result in retribution toward the perpetrator of such impulsive acts. Another problem with impulsive acts is that they often produce immediate rewards but undesirable, long-term consequences. Examples include excessive socializing that leads to being fired from one’s job, hurling an insult that causes the breakup of an important relationship, or using pleasure-inducing drugs that eventually destroy one’s health.
Impulsive behavior, even when not seriously destructive, diminishes a person’s effectiveness in significant ways. Acting impulsively disallows contemplating alternative courses of action, some of which would have been wiser than the impulsive choice. Impulsivity also sidetracks people during projects that require organized sequences of steps or stages. Accomplishments of an impulsive person are therefore small, scattered, and inconsistent.
A hallmark of intelligence, what potentially separates human beings from earlier life forms, is the ability to think about future consequences before acting on an impulse. Intelligent activity involves contemplation of long-range goals, organizing and planning routes to these goals, and persisting toward one’s goals in the face of short-lived impulses to the contrary. The idea that intelligence involves impulse control is nicely captured by the term prudence, an alternative label for the Conscientiousness domain. Prudent means both wise and cautious. Persons who score high on the Conscientiousness scale are, in fact, perceived by others as intelligent.
The benefits of high conscientiousness are obvious. Conscientious individuals avoid trouble and achieve high levels of success through purposeful planning and persistence. They are also positively regarded by others as intelligent and reliable. On the negative side, they can be compulsive perfectionists and workaholics.
Furthermore, extremely conscientious individuals might be regarded as stuffy and boring. Unconscientious people may be criticized for their unreliability, lack of ambition, and failure to stay within the lines, but they will experience many short-lived pleasures and they will never be called stuffy.
Conscientiousness Facets:
– Self-Efficacy. Self-Efficacy describes confidence in one’s ability to accomplish things. High scorers believe they have the intelligence (common sense), drive, and self-control necessary for achieving success. Low scorers do not feel effective, and may have a sense that they are not in control of their lives.
– Orderliness. Persons with high scores on orderliness are well-organized. They like to live according to routines and schedules. They keep lists and make plans. Low scorers tend to be disorganized and scattered.
– Dutifulness. This scale reflects the strength of a person’s sense of duty and obligation. Those who score high on this scale have a strong sense of moral obligation. Low scorers find contracts, rules, and regulations overly confining. They are likely to be seen as unreliable or even irresponsible.
– Achievement-Striving. Individuals who score high on this scale strive hard to achieve excellence. Their drive to be recognized as successful keeps them on track toward their lofty goals. They often have a strong sense of direction in life, but extremely high scorers may be too single-minded and obsessed with their work. Low scorers are content to get by with a minimal amount of work, and might be seen by others as lazy.
– Self-Discipline. What many people call will-power refers to the ability to persist at difficult or unpleasant tasks until they are completed. People who possess high self-discipline are able to overcome reluctance to begin tasks and stay on track despite distractions. Those with low self-discipline procrastinate and show poor follow-through, often failing to complete tasks, even tasks they want very much to complete.
– Cautiousness. Cautiousness describes the disposition to think through possibilities before acting. High scorers on the Cautiousness scale take their time when making decisions. Low scorers often say or do the first
thing that comes to mind without deliberating alternatives and the probable consequences of those alternatives.
• Neuroticism
The term neurosis is used to describe a condition marked by mental distress, emotional suffering, and an inability to cope effectively with the normal demands of life. It is suggested that everyone shows some signs of neurosis, but that we differ in our degree of suffering and our specific symptoms of distress. Today neuroticism refers to the tendency to experience negative feelings. Those who score high on Neuroticism may experience primarily one specific negative feeling such as anxiety, anger, or depression, but are likely to experience several of these emotions. People high in neuroticism are emotionally reactive. They respond emotionally to events that would not affect most people, and their reactions tend to be more intense than normal. They are more likely to interpret ordinary situations as threatening, and minor frustrations as hopelessly difficult. Their negative emotional reactions tend to persist for unusually long periods of time, which means they are often in a bad mood. These problems in emotional regulation can diminish a neurotic’s ability to think clearly, make decisions, and cope effectively with stress.
At the other end of the scale, individuals who score low in neuroticism are less easily upset and are less emotionally reactive. They tend to be calm, emotionally stable, and free from persistent negative feelings. Freedom from negative feelings does not mean that low scorers experience a lot of positive feelings; frequency of positive emotions is a component of the Extraversion domain.
Neuroticism Facets:
– Anxiety. The “fight-or-flight” system of the brain of anxious individuals is too easily and too often engaged. Therefore, people who are high in anxiety often feel like something dangerous is about to happen.
They may be afraid of specific situations or be just generally fearful. They feel tense, jittery, and nervous.
– Anger. Persons who score high in Anger feel enraged when things do not go their way. They are sensitive about being treated fairly and feel resentful and bitter when they feel they are being cheated. This scale measures the tendency to feel angry; whether or not the person expresses annoyance and hostility depends on the individual’s level on Agreeableness. Low scorers do not get angry often or easily.
– Depression. This scale measures the tendency to feel sad, dejected, and discouraged. High scorers lack energy and have difficulty initiating activities. Low scorers tend to be free from these depressive feelings.
– Self-Consciousness. Self-conscious individuals are sensitive about what others think of them. Their concern about rejection and ridicule causes them to feel shy and uncomfortable around others. They are easily embarrassed and often feel ashamed. Their fears that others will criticize or make fun of them are exaggerated and unrealistic, but their awkwardness and discomfort may make these fears a self-fulfilling prophecy. Low scorers, in contrast, do not suffer from the mistaken impression that everyone is watching and judging them. They do not feel nervous in social situations.
– Immoderation. Immoderate individuals feel strong cravings and urges that they have difficulty resisting. They tend to be oriented toward short-term pleasures and rewards rather than long-term consequences. Low scorers do not experience strong, irresistible cravings and consequently do not find themselves tempted to overindulge.
– Vulnerability. High scorers on Vulnerability experience panic, confusion, and helplessness when under pressure or stress. Low scorers feel more poised, confident, and clear-thinking when stressed.
• Openness to Experience
Openness to Experience describes a dimension of cognitive style that distinguishes imaginative, creative people from down-to-earth, conventional people. Open people are intellectually curious, appreciative of art, and sensitive to beauty. They tend to be, compared to closed people, more aware of their feelings. They tend to think and act in individualistic and nonconforming ways. Intellectuals typically score high on Openness to Experience; consequently, this factor has also been called Culture or Intellect. Nonetheless, Intellect is probably best regarded as one aspect of openness to experience. Scores on Openness to Experience are only modestly related to years of education and scores on standard intelligence tests.
Another characteristic of the open cognitive style is a facility for thinking in symbols and abstractions far removed from concrete experience. Depending on the individual’s specific intellectual abilities, this symbolic cognition may take the form of mathematical, logical, or geometric thinking, artistic and metaphorical use of language, music composition or performance, or one of the many visual or performing arts. People with low scores on openness to experience tend to have narrow, common interests. They prefer the plain, straightforward, and obvious over the complex, ambiguous, and subtle. They may regard the arts and sciences with suspicion, regarding these endeavors as abstruse or of no practical use. Closed people prefer familiarity over novelty; they are conservative and resistant to change.
Openness is often presented as healthier or more mature by psychologists, who are often themselves open to experience. However, open and closed styles of thinking are useful in different environments.
The intellectual style of the open person may serve a professor well, but research has shown that closed thinking is related to superior job performance in police work, sales, and a number of service occupations.
Openness to Experience Facets:
– Imagination. To imaginative individuals, the real world is often too
plain and ordinary. High scorers on this scale use fantasy as a way of creating a richer, more interesting world. Low scorers on this scale are more oriented to facts than fantasy.
– Artistic Interests. High scorers on this scale love beauty, both in art and in nature. They become easily involved and absorbed in artistic and natural events. They are not necessarily artistically trained or talented, although many will be. The defining features of this scale are interest in, and appreciation of, natural and artificial beauty. Low scorers lack aesthetic sensitivity and interest in the arts.
– Emotionality. Persons high on Emotionality have good access to and awareness of their own feelings. Low scorers are less aware of their feelings and tend not to express their emotions openly.
– Adventurousness. High scorers on adventurousness are eager to try new activities, travel to foreign lands, and experience different things. They find familiarity and routine boring, and will take a new route home just because it is different. Low scorers tend to feel uncomfortable with change and prefer familiar routines.
– Intellect. Intellect and artistic interests are the two most important, central aspects of openness to experience. High scorers on Intellect love to play with ideas. They are open-minded to new and unusual ideas, and like to debate intellectual issues. They enjoy riddles, puzzles, and brain teasers. Low scorers on Intellect prefer dealing with people or things rather than ideas. They regard intellectual exercises as a waste of time. Intellect should not be equated with intelligence. Intellect is an intellectual style, not an intellectual ability, although high scorers on Intellect score slightly higher than low-Intellect individuals on standardized intelligence tests.
– Liberalism. Psychological liberalism refers to a readiness to challenge authority, convention, and traditional values.
In its most extreme form, psychological liberalism can even represent outright hostility toward rules, sympathy for law-breakers, and love of ambiguity, chaos, and disorder. Psychological conservatives prefer the security and stability brought by conformity to tradition. Psychological liberalism and conservatism are not identical to political affiliation, but certainly incline individuals toward certain political parties.
It is possible, although unusual, to score high in one or more facets of a personality trait and low in other facets of the same trait. For example, you could score highly in Imagination, Artistic Interests, Emotionality and Adventurousness, but score low in Intellect and Liberalism.
Chapter 3
Research Questions
The main objective of this paper is to draw a user’s virtual behavior model by analyzing his/her OSN (Online Social Network) presence and to recommend products to the user on the basis of that behavior model. To reach this goal, we need to consider a few sub-objectives: collecting the user’s social network activities, analyzing the user’s activity over a few days, categorizing the user’s activity according to the Big Five factors, and recommending services or products to the user on the basis of the user’s behavior model. Fulfilling these objectives raises several research questions.
The main research question of this paper is: How can users of OSNs be categorized according to the Big Five factors from their behavior in OSNs?
The sub research questions are:
1. How do OSNs (Online Social Networks) represent a user?
2. How can we analyze user behavior?
3. How can user behavior be categorized according to the Big Five factors?
Chapter 4
Proposed Research Methodology
In this paper our aim is to relate a text corpus drawn from a social network to the psychological theory of personality. We also try to implement a recommendation system based on behavior analysis. Correlational and exploratory methodologies are therefore used in this paper, where our concept is the behavior indicator in Big Five modeling and the variables are Extraversion, Neuroticism, Agreeableness, Openness and Conscientiousness.
• 4.1 Data Collection: To categorize a user’s behavior, data is collected from an OSN (Twitter). The OSN stores data generated by the user’s activities, such as posts by the user, posts by the user’s friends, liked pages, etc. The collected data is public data, so there is no barrier to using it. For each user, the previous 30 days of data are collected at a time. The system collects the data directly from the OSN with full user authorization and then stores it securely in the system database.
Figure 4.1. Modeling User Behavior (diagram labels: User, Represents, OSN (Twitter), Twitter API, LIWC, Mapping)
Twitter, a social network site, can be used for sentiment analysis as it has a very large number of short messages created by its users [60], so we used Twitter to collect users’ data. Using the Twitter REST API 1.1, we collected public tweets and retweets. Our Twitter app requires users to authorize the app before it extracts data from their profiles, and it does not collect data if users do not allow it to run. We made sure that all data we extract from Twitter is public data. By calling the GET statuses/user_timeline and GET statuses/retweets_of_me methods we can collect the user’s tweets and retweets. The system can also collect public data from the profiles that the user is currently following by using the GET friends/ids method. The collected data is in JSON format, and our Twitter app can write it to text files. As separate files are easier to use, we store each user’s data in its own file, named by the user’s unique identifier (user ID or username).
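The step of splitting the collected JSON into one text file per user can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the tweets have already been fetched and are shaped like Twitter REST API v1.1 tweet objects (only the "text" and "user" → "id_str" fields are read), and the helper names and the `<user_id>.txt` file naming are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_tweet_text_by_user(tweets):
    """Group tweet texts by the author's user id.

    `tweets` is a list of dicts shaped like Twitter REST API v1.1 tweet
    objects; only "text" and "user" -> "id_str" are read here.
    """
    texts = defaultdict(list)
    for tweet in tweets:
        texts[tweet["user"]["id_str"]].append(tweet["text"])
    return dict(texts)


def write_user_files(tweets, out_dir):
    """Write one plain-text file per user (hypothetical name: <user_id>.txt)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for user_id, texts in sorted(group_tweet_text_by_user(tweets).items()):
        target = out_path / f"{user_id}.txt"
        target.write_text("\n".join(texts) + "\n", encoding="utf-8")
        written.append(target)
    return written


# Illustrative input only; real input would be the JSON saved by the app.
sample_json = """
[
  {"text": "Playing Age of Empires tonight", "user": {"id_str": "101"}},
  {"text": "Great match today", "user": {"id_str": "202"}},
  {"text": "Meeting friends later", "user": {"id_str": "101"}}
]
"""
sample_tweets = json.loads(sample_json)
```

Calling `write_user_files(sample_tweets, "user_data")` would produce one file per user id, each of which could then be passed to the text analysis step.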
• 4.2 Data Analysis: Each text file, which contains the past data of a single user, is analyzed with LIWC (Linguistic Inquiry and Word Count), a text analysis software program designed by James W. Pennebaker, Roger J. Booth and Martha E. Francis. Each text file analyzed by LIWC2007 can be treated as a whole or broken into segments. LIWC counts the words according to its dictionary. When processing finishes, it saves the results to a specified file, with each result written under its corresponding category. These categories indicate different aspects of the Big Five factors, and the modeling is implemented on the basis of these results. The table below shows which category lies in which factor.
Table 4.1. Relationship between LIWC categories and Big Five factors
Big Five factors | LIWC Categories
Extraversion | Social process, Family, Friends, Humans, Affective, Biological process, Sexual, Achievement
Openness to Experience | Leisure, Insight, Body, Ingestion
Neuroticism | Swear words, Negation, Negative emotion, Anger, Sadness, Sexual
Conscientiousness | Relativity, Motion, Space, Time, Religion, Death, Money, Certainty
Agreeableness | Positive Emotion, Feel, Discrepancy (would), Tentative (maybe), Hear
The collected data is analyzed by LIWC, which splits every sentence into words. Each LIWC category then receives a percentage mark according to the meaning and use of the words, and each category is assigned to its Big Five factor. The percentages are summed per factor, and the factor with the highest total is taken as the user’s behavior.
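The mapping in Table 4.1 and the "highest total wins" rule can be sketched as below. This is an illustrative sketch under stated assumptions: the category labels are copied from Table 4.1 rather than from the column names of an actual LIWC2007 output file, and the function names and sample percentages are hypothetical.

```python
# LIWC category -> Big Five factor mapping, copied from Table 4.1.
# Note that "Sexual" is listed under two factors in the table, so it
# contributes to both totals here.
BIG_FIVE_CATEGORIES = {
    "Extraversion": ["Social process", "Family", "Friends", "Humans",
                     "Affective", "Biological process", "Sexual",
                     "Achievement"],
    "Openness to Experience": ["Leisure", "Insight", "Body", "Ingestion"],
    "Neuroticism": ["Swear words", "Negation", "Negative emotion", "Anger",
                    "Sadness", "Sexual"],
    "Conscientiousness": ["Relativity", "Motion", "Space", "Time",
                          "Religion", "Death", "Money", "Certainty"],
    "Agreeableness": ["Positive Emotion", "Feel", "Discrepancy",
                      "Tentative", "Hear"],
}


def factor_totals(liwc_percentages):
    """Sum the LIWC category percentages that belong to each factor.

    Categories missing from `liwc_percentages` count as 0.
    """
    return {
        factor: sum(liwc_percentages.get(name, 0.0) for name in categories)
        for factor, categories in BIG_FIVE_CATEGORIES.items()
    }


def dominant_factor(liwc_percentages):
    """Return the factor with the highest summed percentage."""
    totals = factor_totals(liwc_percentages)
    return max(totals, key=totals.get)


# Made-up percentages for one user's text file (illustration only).
sample = {"Social process": 9.0, "Friends": 2.0, "Sexual": 0.5,
          "Anger": 1.5, "Time": 4.0, "Positive Emotion": 3.0}
totals = factor_totals(sample)
winner = dominant_factor(sample)
```

With these made-up numbers the Extraversion total is the largest, so that factor would be taken as the user's behavior.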
• 4.3 Results
LIWC reports the number of words counted in each category as a percentage of the total words:
result = (TC * 100) / WC
where WC is the total number of words in the text file and TC is the total number of words in the category. The formula is inverted to recover the exact number of words in a category:
TC = (result * WC) / 100
The values of the categories that lie in the same Big Five factor are then summed using a linear formula with unit weights:
f(X) = X1 + X2 + X3 + ... + Xi
We used the percentage value of each factor, obtained from the percentage formula:
part / whole = % / 100
These results are used to draw a pie chart in Excel. Example:
Figure 4.2. Pie Chart of LIWC Results
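The arithmetic of this section can be checked with a short sketch. It assumes only the definitions given here (TC, WC and the percentage result); the word count is recovered by solving the percentage formula for TC, and the function names and sample numbers are hypothetical.

```python
def percentage_from_count(tc, wc):
    """LIWC-style percentage: result = (TC * 100) / WC."""
    return tc * 100 / wc


def count_from_percentage(result, wc):
    """Solve result = (TC * 100) / WC for TC: TC = (result * WC) / 100."""
    return result * wc / 100


def factor_shares(factor_totals):
    """Express each factor total as a percentage of the sum of all totals.

    This is the part / whole = % / 100 step used for the pie chart.
    """
    whole = sum(factor_totals.values())
    if whole == 0:
        return {factor: 0.0 for factor in factor_totals}
    return {factor: part * 100 / whole
            for factor, part in factor_totals.items()}


# Made-up example: 12 category words in a 200-word file.
result = percentage_from_count(12, 200)       # 6.0 percent
recovered = count_from_percentage(result, 200)  # back to 12 words

# Made-up factor totals turned into pie-chart shares.
shares = factor_shares({"Extraversion": 10, "Neuroticism": 5,
                        "Agreeableness": 5})
```

The shares add up to 100, which is what a pie chart of the five factors requires.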
Figure 4.3. Personality Based Recommendation System (diagram label: User)
• 4.4 Recommendation Analysis: Depending on the behavior analysis, some brands of products are suggested or recommended to users. The dominant percentage in a user’s behavior can incline that user to like a particular type of product. The tables below give examples showing that the majority of people with a particular behavior are interested in a particular brand, product or service.
For example, users A, B and C are followers of the Age of Empires game page on Twitter. After analyzing their tweets and retweets, the machine maps their behavior and finds that the major part of their behavior is extroverted. If, after analyzing the tweets and retweets of user X, the machine finds that the majority of his behavior is also influenced by extraversion, then we can recommend games like Age of Empires to him.
Table 4.2. Products under Big Five factors (Video Games)
Big Five Factors | Product Categories/Brands
Extraversion | Strategy (Age of Empires, Commandos)
Openness to Experience | Racing (Need for Speed)
Neuroticism | Shooting (Call of Duty, Counter Strike)
Conscientiousness | Chess, Sudoku
Agreeableness | Sports (FIFA)
Table 4.3. Products under Big Five factors (Movies)
Big Five Factors | Product Categories/Brands
Extraversion | Political, Fantasy, Family
Openness to Experience | Comedy, Sports, Drama
Neuroticism | Crime scene, Action, Horror
Conscientiousness | Political, Historical, Conspiracy
Agreeableness | Romantic, Drama
Table 4.4. Products under Big Five factors (Music)
Big Five Factors | Product Categories/Brands
Extraversion | Rock
Openness to Experience | Classical, Vocal, Country wood
Neuroticism | Pop, Heavy Metal
Conscientiousness | New Released, Historic
Agreeableness | Romantic, Country
Table 4.5. Products under Big Five factors (Food)
Big Five Factors | Product Categories/Brands
Extraversion | Bead, Meat
Openness to Experience | Multicultural Food, Pizza
Neuroticism | Fast Food
Conscientiousness | Salad, Vegetable
Agreeableness | Bread, Cheese
Table 4.6. Products under Big Five factors (Beverage)
Big Five Factors | Product Categories/Brands
Extraversion | Coffee, Tea
Openness to Experience | Milkshake, Green Tea
Neuroticism | Soft Drinks
Conscientiousness | Green Tea, Black Coffee
Agreeableness | Coffee, Tea, Soft Drinks
Table 4.7. Products under Big Five factors (Sports)
Big Five Factors | Product Categories/Brands
Extraversion | Football, Athletics
Openness to Experience | Cricket, Swim
Neuroticism | Boxing, Rugby, Martial arts
Conscientiousness | Athletics, Martial arts
Agreeableness | Gymnastics
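The recommendation step described in this section amounts to a table lookup keyed by the user's dominant factor. The sketch below is one possible reading, not the thesis implementation: it encodes only Tables 4.2 and 4.3, and the function name and the sample factor shares are hypothetical.

```python
# Product examples copied from Table 4.2 (Video Games) and Table 4.3 (Movies).
RECOMMENDATIONS = {
    "Video Games": {
        "Extraversion": ["Strategy (Age of Empires, Commandos)"],
        "Openness to Experience": ["Racing (Need for Speed)"],
        "Neuroticism": ["Shooting (Call of Duty, Counter Strike)"],
        "Conscientiousness": ["Chess", "Sudoku"],
        "Agreeableness": ["Sports (FIFA)"],
    },
    "Movies": {
        "Extraversion": ["Political", "Fantasy", "Family"],
        "Openness to Experience": ["Comedy", "Sports", "Drama"],
        "Neuroticism": ["Crime scene", "Action", "Horror"],
        "Conscientiousness": ["Political", "Historical", "Conspiracy"],
        "Agreeableness": ["Romantic", "Drama"],
    },
}


def recommend(factor_shares, product_type):
    """Pick the factor with the largest share and look up its products.

    `factor_shares` maps Big Five factor names to percentages; the
    return value is a (dominant factor, list of products) pair.
    """
    dominant = max(factor_shares, key=factor_shares.get)
    return dominant, RECOMMENDATIONS[product_type][dominant]


# Made-up behavior model for "user X" from the example in this section.
user_x = {"Extraversion": 50.0, "Neuroticism": 25.0, "Agreeableness": 25.0}
factor, games = recommend(user_x, "Video Games")
```

For this made-up user the dominant factor is Extraversion, so the lookup returns the strategy games from Table 4.2, matching the Age of Empires example in the text.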
Chapter 5
Conclusions
In our thesis we showed that personality recognition can be automated by analyzing language cues. Little work has been done in this field and, to the best of our knowledge, our research is one of the first to examine the recognition of personality and to introduce a recommendation system based on sentiment analysis results.
During our research we realized that feature selection is one of the most important tasks, as some of the best models contain only a small subset of the full feature set. LIWC features are beneficial for all traits. For all recognition tasks we analyzed the influence of the most relevant individual features in specific models. We also used the Stanford NLP (natural language processing) application to analyze and split the texts. Later we used only LIWC because it generated more accurate results than Stanford NLP in our data analysis.
At this moment our system can only use text information; in future it should be able to analyze data from shared links or videos. Our system cannot identify quotations (which users use to share others’ speech). The system lacks the ability to understand double negatives in a sentence, for example: “The service of Samsung Galaxy S3 is not very bad”. There is considerable scope for analyzing exclamatory sentences and smileys (sentimental expressions). Our system cannot understand sarcasm at this moment. The accuracy of the brand recommendation system depends on the percentages of the Big Five factors, so the depth of measurement and the marking scale should be made more efficient in future work.
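The negation limitation can be illustrated with a toy dictionary-based word counter. This is only a sketch of why plain word counting misreads the example sentence: the three-category lexicon below is hypothetical and far smaller than the real LIWC dictionary, and the function name is an assumption.

```python
import re

# Hypothetical toy lexicon; the real LIWC dictionary is far larger.
TOY_LEXICON = {
    "Negative emotion": {"bad", "terrible", "awful"},
    "Negation": {"not", "no", "never"},
    "Positive Emotion": {"good", "great", "nice"},
}


def count_categories(text):
    """Count dictionary hits per category, ignoring word order and context."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = {category: 0 for category in TOY_LEXICON}
    for word in words:
        for category, vocabulary in TOY_LEXICON.items():
            if word in vocabulary:
                counts[category] += 1
    return len(words), counts


sentence = "The service of Samsung Galaxy S3 is not very bad"
word_count, counts = count_categories(sentence)
# "not" and "bad" are counted separately, so the mildly positive phrase
# "not very bad" registers as one Negation hit plus one Negative emotion
# hit and no Positive Emotion hit.
```

Because Table 4.1 places both Negation and Negative emotion under Neuroticism, a counter of this kind would add this sentence to the Neuroticism total even though the speaker is mildly satisfied.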
Bibliography
[1] K. Cherry, “The big five personality dimensions,” 2012. Accessed: 2010-09-30.
[2] “Facebook.com.” Accessed: 2014-06-01.
[3] “Twitter.com.” Accessed: 2014-06-01.
[4] J. Bao, Y. Zheng, and M. F. Mokbel, “Location-based and preference-aware recommendation using sparse geo-social networking data,” in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pp. 199–208, ACM, 2012.
[5] A. M. Ferman, J. H. Errico, P. v. Beek, and M. I. Sezan, “Content-based filtering and personalization using structured metadata,” in Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, pp. 393–393, ACM, 2002.
[6] “Amazon.com.” Accessed: 2014-04-01.
[7] “Netflix.com.” Accessed: 2014-04-01.
[8] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user behavior in online social networks,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, pp. 49–62, ACM, 2009.
[9] N. O. Report, “Social networks and blogs now 4th most popular online activity.”
[10] Y. Zheng, “Location-based social networks: Users,” in Computing with Spatial Trajectories, pp. 243–276, Springer, 2011.
[11] “Flickr.com.” Accessed: 2014-04-01.
[12] “Foursquare.com.” Accessed: 2014-01-01.
[13] X. Cao, G. Cong, and C. S. Jensen, “Mining significant semantic locations from GPS data,” Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 1009–1020, 2010.
[14] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma, “Mining interesting locations and travel sequences from GPS trajectories,” in Proceedings of the 18th international conference on World wide web, pp. 791–800, ACM, 2009.
[15] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W.-Y. Ma, “Mining user similarity based on location history,” in Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, p. 34, ACM, 2008.
[16] X. Xiao, Y. Zheng, Q. Luo, and X. Xie, “Finding similar users using category-based location history,” in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 442–445, ACM, 2010.
[17] W. Liu, Y. Zheng, S. Chawla, J. Yuan, and X. Xing, “Discovering spatio-temporal causal interactions in traffic data streams,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1010–1018, ACM, 2011.
[18] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W.-Y. Ma, “Understanding mobility based on GPS data,” in Proceedings of the 10th international conference on Ubiquitous computing, pp. 312–321, ACM, 2008.
[19] L. Wang, Y. Zheng, X. Xie, and W.-Y. Ma, “A flexible spatio-temporal indexing scheme for large-scale GPS track retrieval,” in Mobile Data Management, 2008. MDM’08. 9th International Conference on, pp. 1–8, IEEE, 2008.
[20] I. Konstas, V. Stathopoulos, and J. M. Jose, “On social networks and collaborative recommendation,” in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 195–202, ACM, 2009.
[21] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, “An algorithmic framework for performing collaborative filtering,” in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 230–237, ACM, 1999.
[22] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” Knowledge and Data Engineering, IEEE Transactions on, vol. 17, no. 6, pp. 734–749, 2005.
[23] H. Yildirim and M. S. Krishnamoorthy, “A random walk method for alleviating the sparsity problem in collaborative filtering,” in Proceedings of the 2008 ACM conference on Recommender systems, pp. 131–138, ACM, 2008.
[24] G. Das, N. Koudas, M. Papagelis, and S. Puttaswamy, “Efficient sampling of information in social networks,” in Proceedings of the 2008 ACM workshop on Search in social media, pp. 67–74, ACM, 2008.
[25] H. Halpin, V. Robu, and H. Shepherd, “The complex dynamics of collaborative tagging,” in Proceedings of the 16th international conference on World Wide Web, pp. 211–220, ACM, 2007.
[26] S. B. Subramanya and H. Liu, “Socialtagger-collaborative tagging for blogs in the long tail,” in Proceedings of the 2008 ACM workshop on Search in social media, pp. 19–26, ACM, 2008.
[27] M. Strohmaier, “Purpose tagging: capturing user intent to assist goal-oriented social search,” in Proceedings of the 2008 ACM workshop on Search in social media, pp. 35–42, ACM, 2008.
[28] N. Craswell and M. Szummer, “Random walks on the click graph,” in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 239–246, ACM, 2007.
[29] M. Clements, A. P. de Vries, and M. J. Reinders, “Optimizing single term queries using a personalized Markov random walk over the social graph,” in Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR), 2008.
[30] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme, Information retrieval in folksonomies: Search and ranking. Springer, 2006.
[31] G. Paltoglou, S. Gobron, M. Skowron, M. Thelwall, and D. Thalmann, “Sentiment analysis of informal textual communication in cyberspace,” Proc. Engage, pp. 13–25, 2010.
[32] “Avatarmovie.com.” Accessed: 2014-04-01.
[33] A. Kappas, U. Hess, and K. R. Scherer, “6. voice and emotion,” Fundamentals of nonverbal behavior, p. 200, 1991.
[34] P. Becheiraz and D. Thalmann, “A model of nonverbal communication and interpersonal relationship between virtual actors,” in Computer Animation’96. Proceedings, pp. 58–67, IEEE, 1996.
[35] S. Gobron, J. Ahn, G. Paltoglou, M. Thelwall, and D. Thalmann, “From sentence to emotion: a real-time three-dimensional graphics metaphor of emotions extracted from text,” The Visual Computer, vol. 26, no. 6-8, pp. 505–519, 2010.
[36] M. Skowron, “Affect listeners: Acquisition of affective states by means of conversational systems,” in Development of Multimodal Interfaces: Active Listening and Synchrony, pp. 169–181, Springer, 2010.
[37] M. Thelwall and D. Wilkinson, “Public dialogs in social network sites: What is their purpose?,” Journal of the American Society for Information Science and Technology, vol. 61, no. 2, pp. 392–404, 2010.
[38] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pp. 79–86, Association for Computational Linguistics, 2002.
[39] M. Thomas, B. Pang, and L. Lee, “Get out the vote: Determining support or opposition from congressional floor-debate transcripts,” in Proceedings of the 2006 conference on empirical methods in natural language processing, pp. 327–335, Association for Computational Linguistics, 2006.
[40] I. Ounis, C. Macdonald, and I. Soboroff, “Overview of the TREC-2008 blog track,” tech. rep., DTIC Document, 2008.
[41] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and trends in information retrieval, vol. 2, no. 1-2, pp. 1–135, 2008.
[42] T. Mullen and N. Collier, “Sentiment analysis using support vector machines with diverse information sources,” in EMNLP, vol. 4, pp. 412–418, 2004.
[43] C. Whitelaw, N. Garg, and S. Argamon, “Using appraisal groups for sentiment analysis,” in Proceedings of the 14th ACM international conference on Information and knowledge management, pp. 625–631, ACM, 2005.
[44] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of the conference on human language technology and empirical methods in natural language processing, pp. 347–354, Association for Computational Linguistics, 2005.
[45] J. W. Pennebaker, M. E. Francis, and R. J. Booth, “Linguistic inquiry and word count: LIWC 2001,” Mahwah: Lawrence Erlbaum Associates, vol. 71, p. 2001, 2001.
[46] M. Bradley and P. Lang, “Affective norms for English words (ANEW): Technical manual and affective ratings,” Gainesville, FL: The Center for Research in Psychophysiology, University of Florida, 1999.
[47] J. Brooke, M. Tofiloski, and M. Taboada, “Cross-linguistic sentiment analysis: From English to Spanish,” in RANLP, pp. 50–54, 2009.
[48] R. B. Slatcher, C. K. Chung, J. W. Pennebaker, and L. D. Stone, “Winning words: Individual differences in linguistic style among US presidential and vice presidential candidates,” Journal of Research in Personality, vol. 41, no. 1, pp. 63–75, 2007.
[49] K. M. Colby, S. Weber, and F. D. Hilf, “Artificial paranoia,” Artificial Intelligence, vol. 2, no. 1, pp. 1–25, 1971.
[50] F. Barthelemy, B. Dosquet, S. Gries, and X. Magnant, “Believable synthetic characters in a virtual emarket,” in Artificial Intelligence and Applications: IASTED International Conference Proceedings, as part of the 22nd IASTED International Multi-Conference on Applied Informatics, 2004.
[51] J. Bates et al., “The role of emotion in believable agents,” Communications of the ACM, vol. 37, no. 7, pp. 122–125, 1994.
[52] J. C. Acosta, “Using emotion to gain rapport in a spoken dialog system,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium, pp. 49–54, Association for Computational Linguistics, 2009.
[53] J. Gratch, N. Wang, J. Gerten, E. Fast, and R. Duffy, “Creating rapport with virtual agents,” in Intelligent Virtual Agents, pp. 125–138, Springer, 2007.
[54] P. Turney and M. L. Littman, “Unsupervised learning of semantic orientation from a hundred-billion-word corpus,” 2002.
[55] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone, “Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents,” in Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pp. 413–420, ACM, 1994.
[56] C. Pelachaud, “Studies on gesture expressivity for a virtual agent,” Speech Communication, vol. 51, no. 7, pp. 630–639, 2009.
[57] J. C. Ward and A. L. Ostrom, “The internet as information minefield: an analysis of the source and content of brand information yielded by net searches,” Journal of Business Research, vol. 56, no. 11, pp. 907–914, 2003.
[58] S. Bai, T. Zhu, and L. Cheng, “Big-five personality prediction based on user behaviors at social network sites,” arXiv preprint arXiv:1204.4809, 2012.
[59] M. Smith, V. Barash, L. Getoor, and H. W. Lauw, “Leveraging social context for searching social media,” in Proceedings of the 2008 ACM workshop on Search in social media, pp. 91–94, ACM, 2008.
[60] A. Pak and P. Paroubek, “Twitter as a corpus for sentiment analysis and opinion mining,” in LREC, 2010.