Social Data Mining
Mahesh J. Meniya
Akash M. Rangani
Data, Information, Knowledge(1)
Facts and statistics collected together for reference or analysis.
The quantities, characters, or symbols on which operations
are performed by a computer, being stored and transmitted.
The patterns, associations, or relationships among all this data
can provide information. For example, analysis of retail point
of sale transaction data can yield information on which
products are selling and when.
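As a minimal sketch of the retail example, the snippet below turns raw point-of-sale rows into information about which products sell and when. The transaction records and their (product, hour, quantity) layout are invented for illustration only.

```python
from collections import Counter

# Hypothetical point-of-sale records: (product, hour of day, quantity sold).
transactions = [
    ("milk", 9, 2),
    ("bread", 9, 1),
    ("milk", 18, 3),
    ("chips", 18, 4),
    ("bread", 18, 2),
]

def units_by_product(rows):
    """Total units sold per product (which products are selling)."""
    totals = Counter()
    for product, _hour, qty in rows:
        totals[product] += qty
    return totals

def units_by_hour(rows):
    """Total units sold per hour of day (when products are selling)."""
    totals = Counter()
    for _product, hour, qty in rows:
        totals[hour] += qty
    return totals

print(units_by_product(transactions).most_common())
print(sorted(units_by_hour(transactions).items()))
```

The raw rows are data; the per-product and per-hour totals are the information derived from them.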
Data, Information, Knowledge(2)
Information can be converted into knowledge about historical
patterns and future trends. For example, summary
information on retail supermarket sales can be analyzed in
light of promotional efforts to provide knowledge of
consumer buying behavior. Thus, a manufacturer or retailer
could determine which items are most susceptible to
What is Data Mining ?
From the large dataset find the :
The overall goal of the data mining process is to extract
information from a data set and transform it into an
understandable structure for further use.
The process of collecting, searching through, and analyzing a
large amount of data in a database, so as to discover patterns or
What is Social Data Mining ?
Social media is defined as a group of Internet-based
applications that build on the ideological and technological
foundations of Web 2.0 and that allow the creation and
exchange of user-generated content.
Vast amounts of user-generated content are created on social
media sites every day, e.g. Facebook, Twitter, Google+.
Social data mining is the systematic analysis of the valuable
information in social media.
Social media data are largely user-generated content, which is
vast, noisy, distributed, unstructured, and dynamic.
Social Media Platform
Community-based Question Answer (CQA)
Emails and Chat
Opinion, reviews, and ratings
Why Important ?
The WWW is vast
People share more data
Advertising and marketing
Products are more customized
More devices produce more data
Product development and design
Structures in Social Media
Social structures represent social relationships between
community members. Accordingly, social applications are often
designed to systemically support these properties.
Social structures represent social relationships between
community members. For example, in online forums, a useful
criterion provided by a social structure is whether or not a
member is an expert in a specific topic.
Types of Social Media Structure
Objects used in social data mining often possess a natural
hierarchical structure. For example, even a short document comprises a
number of sentences. Accordingly, hierarchical structures have been
frequently addressed in information representation.
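The hierarchy mentioned above (document, sentences, words) can be sketched as follows. This is only an illustration: the regular-expression sentence splitter is a simplifying assumption and will mis-split abbreviations such as "e.g.".

```python
import re

def to_hierarchy(document):
    """Represent a document as a list of sentences, each holding its words."""
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [{"sentence": s, "words": re.findall(r"\w+", s)} for s in sentences]

tree = to_hierarchy("Data is vast. Mining helps!")
for node in tree:
    print(node["sentence"], "->", node["words"])
```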
We can identify conversational structures explicitly or implicitly
in most social platforms involving interactions between users. For
example, in emails and forums, conversational structures are formed by
Data Mining Techniques for
Graphs (or networks) constitute a dominant data structure and
appear essentially in all forms of information. Examples include the
Web graph and social networks. Typically, the communities correspond to
groups of nodes, where nodes within the same community (or clusters)
tend to be highly similar sharing common features, while on the other
hand, nodes of different communities show low similarity.
Extracting useful knowledge (patterns, outliers, etc.) from
structured data that can be represented as a graph.
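As a rough illustration of grouping nodes into communities, the sketch below finds connected components with a breadth-first search. Connected components are used here as the simplest possible stand-in; real community detection methods also split densely connected graphs into groups, which this sketch does not do. The edge list is invented.

```python
from collections import deque

def connected_components(edges):
    """Group nodes of an undirected graph so that each group is mutually reachable."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups

# Hypothetical friendship edges forming two separate groups.
print(connected_components([("a", "b"), ("b", "c"), ("x", "y")]))
```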
Graph Mining usage
Google uses page rank as one of many predictors for the relevance of
a web page. The link structure in the world-wide-web network
provides valuable contextual information about which pages are
deemed most relevant by the web page creators—this contextual link
structure is then used to predict relevance for a user’s query.
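The idea of ranking pages by link structure can be sketched with the textbook power-iteration form of PageRank. This is not Google's production system: the damping factor of 0.85, the iteration count, and the three-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A and B both link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C ends up ranked highest because two pages link to it, which is the sense in which links act as votes from page creators.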
Useful for understanding relationships as well as content (text, images),
A social media host looks at certain online groups and tries to predict
whether each group will flourish or disband.
Graph Mining usage cont.
A phone provider looks at cell phone call records to determine
whether an account is a result of identity theft.
Facebook Graph Search
Searching people: “friends of friends who are single female in Rajkot”
Searching interests: “movies my friends like”, “TV shows my friends
like”, “Videos by TV shows liked by my friends”.
Searching places: “Restaurant in Rajkot liked by friends”
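Queries such as "friends of friends" or "movies my friends like" are traversals of a social graph. The sketch below shows the idea on a tiny in-memory graph; it says nothing about how Facebook implements Graph Search, and all names and data are invented.

```python
# Hypothetical friendship graph and movie likes.
friends = {
    "me": {"asha", "ravi"},
    "asha": {"me", "kiran"},
    "ravi": {"me", "kiran", "meera"},
    "kiran": {"asha", "ravi"},
    "meera": {"ravi"},
}
likes = {
    "asha": {"Lagaan"},
    "ravi": {"Lagaan", "Sholay"},
    "kiran": {"Swades"},
}

def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}

def liked_by_friends(graph, likes_by_user, person):
    """Union of the items liked by a person's direct friends."""
    result = set()
    for friend in graph.get(person, set()):
        result |= likes_by_user.get(friend, set())
    return result

print(sorted(friends_of_friends(friends, "me")))
print(sorted(liked_by_friends(friends, likes, "me")))
```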
Text mining is an emerging technology that attempts to extract
meaningful information from unstructured textual data. Text mining
is an extension of data mining to textual data. Social networks contain
a lot of text in the nodes in various forms. For example, social
networks may contain links to posts, blogs or other news articles.
Usage of text mining (1)
Automatic processing of messages, emails
A common application for text mining is to aid in the automatic
classification of texts. For example, it is possible to "filter" out
automatically most undesirable "junk email" based on certain
terms or words that are not likely to appear in legitimate messages.
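A minimal sketch of term-based junk filtering follows. The list of spam terms and the 0.2 threshold are arbitrary assumptions; practical filters learn such terms and weights from labeled messages rather than hard-coding them.

```python
import re

# Hypothetical terms assumed to be rare in legitimate messages.
SPAM_TERMS = {"lottery", "winner", "viagra", "prize"}

def spam_score(message, spam_terms=SPAM_TERMS):
    """Fraction of the words in the message that are known spam terms."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in spam_terms)
    return hits / len(words)

def is_junk(message, threshold=0.2):
    """Classify a message as junk when its spam score reaches the threshold."""
    return spam_score(message) >= threshold

print(is_junk("You are a lottery winner claim your prize"))
print(is_junk("Meeting moved to Monday at noon"))
```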
Investigating competitors by crawling their web sites
Another type of potentially very useful application is to
automatically process the contents of Web pages in a particular
domain. For example, you could go to a Web page, and begin
"crawling" the links you find there to process all Web pages that
Usage of text mining (2)
Mining medical records to improve patient care
Many text mining software packages are marketed for security
applications, especially monitoring and analysis of online plain
text sources such as Internet news, blogs, etc. for national security
Generic Process of social data
Web 2.0 data source
• Cluster & community Detection
• static analysis
Text Mining Process stages (1)
The data collector module continuously downloads data from one or
more social platforms and stores the raw data in a Big Data store
or a conventional database. Depending on the application type,
parameters are specified with the API call.
Data modeling is a process used to define and analyze the data
requirements needed to support the application processes within the
scope of the corresponding application. In the data modeling stage, data
is modeled using various data models, depending on the nature of the application.
Text Mining Process stages (2)
Automatic or semi-automatic analysis of large quantities of data to
extract previously unknown interesting patterns such as groups of
data records known as cluster analysis.
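To make cluster analysis concrete, the sketch below groups one-dimensional values with k-means. The slides do not name a clustering algorithm, so k-means is an illustrative choice, and the evenly spaced initial centers and the sample values are assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly across the sorted data.
    if k == 1:
        centers = [ordered[0]]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for value in ordered:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical daily post counts for six users: two obvious groups.
centers, clusters = kmeans_1d([10, 1, 12, 2, 11, 3], k=2)
print(centers, clusters)
```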
It is the search for items or events which do not conform to an
Text Mining Process stages (3)
Analysis of historical business activities, stored as static data in data
warehouse databases, to reveal hidden patterns and trends.
Examples of what businesses use data mining for include
performing market analysis to finding the root cause of
Can be used to assist in discovering previously unknown strategic
To prevent customer attrition and acquire new customers
Cross-sell to existing customers
Manage customers with more accuracy.
OAuth is an open standard for authorization
It provides a process for end-users to authorize third-party access to
their server resources without sharing their credentials (typically, a
username and password pair), using user-agent redirections.
An open authorization protocol that enables applications to access
each other’s data.
Authorization flow steps(1)
First, the user accesses the client web application. This web app has a
button saying "Login via Facebook" (or some other system like
Google or Twitter).
Second, when the user clicks the login button, the user is redirected
to the authenticating application (e.g. Facebook). The user then logs
into the authenticating application, and is asked whether they want to
grant the client application access to their data in the
authenticating application. The user accepts.
Third, the authenticating application redirects the user to a redirect
URI, which the client app has provided to the authenticating app.
Providing this redirect URI is normally done by registering the client
application with the authenticating application.
Authorization flow steps(2)
Fourth, the user accesses the page located at the redirect URI in the
client application. In the background the client application contacts
the authenticating application and sends
Once the client application has obtained an access token, this access
token can be sent to Facebook, Google, Twitter, etc. to access
resources in these systems related to the user who logged in.
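The redirect-based flow above can be sketched from the client application's side. The parameter names (response_type, client_id, redirect_uri, state, grant_type, code, client_secret) follow the OAuth 2.0 authorization code grant in RFC 6749; the endpoint URLs, client identifiers, and scope value are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoints; a real provider documents its own URLs.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
TOKEN_URL = "https://auth.example.com/oauth/token"

def build_authorization_url(client_id, redirect_uri, state, scope="profile"):
    """Step 2: URL the user is redirected to in order to log in and grant access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)

def extract_code(callback_url, expected_state):
    """Steps 3-4: read the authorization code from the redirect URI request."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged redirect")
    return query["code"][0]

def build_token_request(code, client_id, client_secret, redirect_uri):
    """Step 4: form fields the client would POST to TOKEN_URL for an access token."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }

redirect = "https://client.example.com/callback"
print(build_authorization_url("my-client-id", redirect, state="xyz"))
code = extract_code(redirect + "?code=abc123&state=xyz", expected_state="xyz")
print(build_token_request(code, "my-client-id", "my-secret", redirect))
```

The state value is checked on return so that the client only accepts redirects that answer a request it actually started.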
Roles of users and applications
in OAuth 2.0 (1)
Roles of users and applications
in OAuth 2.0 (2)
The resource owner is the person or application that owns the data that is
to be shared. For instance, a user on Facebook or Google could be a
The resource server is the server hosting the resource owned by the
resource owner. For instance, Facebook or Google is a resource server.
The client application is the application requesting access to the resources
stored on the resource server, which are owned by the resource owner.
A client application could be a game requesting access to a
user's Facebook account.
Roles of users and applications
in OAuth 2.0 (3)
The authorization server is the server authorizing the client
application to access the resources of the resource owner.
The authorization server and the resource server can be the same
Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications. The
challenges include capture, storage, search, sharing, transfer, analysis
Some Examples :
Facebook has more than 1.15 billion active users generating social
More than 5 billion people are calling, texting, tweeting and
browsing websites on mobile phones
Scientific instruments generate large amounts of data
Applications of Big Data
Google Flu Trends uses search terms to predict the spread of the flu
MIT is using mobile phone data to establish how people's locations
and traffic patterns can be used for urban planning
Statistician Nate Silver predicted the outcome of the US election
down to each individual state in 2012.
Big Data can bring the intelligence of online shopping into the retail
Tools used in Big data (1)
NoSQL refers to non-relational (non-SQL) databases. There are
several database types that fit into this category, such as key-value
stores and document stores, which focus on the storage and retrieval
of large volumes of unstructured, semi-structured, or even structured
Map Reduce by Google
This is a programming paradigm that allows for massive job
execution scalability against thousands of servers or clusters of
The "Map" task, where an input dataset is converted into a different set of
key/value pairs, or tuples
The "Reduce" task, where several of the outputs of the "Map" task are
combined to form a reduced set of tuples
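The Map and Reduce tasks described above can be sketched in a single process with the classic word-count example. A real MapReduce system distributes these phases across many servers; the explicit shuffle step and the sample documents here are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Map: turn one input document into (word, 1) key/value pairs."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group all values by key, as the framework does between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return (key, sum(values))

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["big data", "big social data"]))
```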
Tools used in Big data (2)
Hadoop by Apache
Hadoop is by far the most popular implementation of MapReduce,
being an entirely open source platform for handling Big Data. It is
flexible enough to be able to work with multiple data sources, either
aggregating multiple sources of data in order to do large scale
Access Data from Twitter (1)
Twitter is an online social networking and microblogging service
that enables users to send and read "tweets", which are text messages
limited to 140 characters.
Twitter provides various APIs that allow developers to build upon
and extend their applications in new and creative ways.
Twitter for Websites
Twitter for Websites is a suite of products that enables websites to easily
integrate Twitter. It is ideal for site developers looking to quickly and easily
integrate very basic Twitter functions.
Access Data from Twitter (2)
The Search API is designed for products looking to allow a user to query
for Twitter content. This may include finding a set of tweets with specific
keywords, finding tweets referencing a specific user, or finding tweets
from a particular user.
The REST API enables developers to access some of the core primitives
of Twitter including timelines, status updates, and user information. If
you're building an application that leverages core Twitter objects, then this
is the API to use.
Access Data from Twitter (3)
Streaming APIs offered by Twitter give developers low latency access to
Twitter's global stream of Tweet data. This API is for those developers
with data-intensive needs. If you want to build a data mining product or are
interested in analytics research, the Streaming API is most suited for such
Access Data from Facebook
The Facebook platform provides various APIs and SDKs for developing
applications that access Facebook data. The Facebook SDK provides a fast,
native Facebook integration, using the exact same implementation,
regardless of which environment you're deploying to.
For mobile platforms, Facebook provides SDKs for two platforms
For Web development, SDKs are provided by both Facebook and the
Facebook APIs (1)
The Graph API is a simple HTTP-based API that gives access to the
Facebook social graph, uniformly representing objects in the graph and the
connections between them. Most other APIs at Facebook are based on the
Facebook Query Language, or FQL, enables you to use a SQL-style
interface to query the data exposed by the Graph API.
Facebook offers a number of dialogs for Facebook Login, posting to a
person's timeline or sending requests
Facebook APIs (2)
One can integrate Facebook Chat into Web-based, desktop, or mobile
instant messaging products.
The Ads API allows you to build your own app as a customized alternative
to the Facebook Ads.
Public Feed API
The Public Feed API lets you read the stream of public comments as they
are posted to Facebook.
Friend Locator - Facebook App
Facebook application that displays friends' current location
and home town on a Google map using jQuery, the Google Maps
API and the Facebook platform.
It uses OAuth and FQL for accessing the client data from
Example of Mining Social Media
The core principle in mining social sites is the attribute value, which is
gathered by applying various algorithms. Attributes for any social
networking site can be categorized into two parts:
Individual attributes describe personal information about the
person, such as gender, birth date, address, phone number, email
Community attributes, such as friend list, tagged pictures, followers.
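One possible way to represent the two attribute categories is sketched below with dataclasses. The field names mirror the examples listed above; the class names, the Profile wrapper, and the helper that lists disclosed fields are hypothetical and are not taken from any social network's API.

```python
from dataclasses import dataclass, field

@dataclass
class IndividualAttributes:
    """Personal information about one person."""
    gender: str = ""
    birth_date: str = ""
    address: str = ""
    phone_number: str = ""
    email: str = ""

@dataclass
class CommunityAttributes:
    """Information that links a person to other users."""
    friend_list: list = field(default_factory=list)
    tagged_pictures: list = field(default_factory=list)
    followers: list = field(default_factory=list)

@dataclass
class Profile:
    user_id: str
    individual: IndividualAttributes = field(default_factory=IndividualAttributes)
    community: CommunityAttributes = field(default_factory=CommunityAttributes)

def disclosed_individual_fields(profile):
    """Names of the individual attributes the user has actually filled in."""
    return sorted(name for name, value in vars(profile.individual).items() if value)

profile = Profile("u1", IndividualAttributes(gender="F", email="a@example.com"))
profile.community.friend_list.append("u2")
print(disclosed_individual_fields(profile))
```

Listing which individual attributes are filled in gives a simple view of how much personal information a profile exposes.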
Taking Facebook as an example, users can now control photo tagging
and the sharing of their friend list with the public, and can also
share their status with specific people or groups. However, a user
still cannot control friends sharing their own friend lists or
uploading photos of the user from their profiles to the
By collecting and assessing the vast amount of Facebook user data, one
can obtain the general behavior of a user. Facebook provides
sharing options for the phone number and personal information. If a
user discloses this sensitive information in their profile, the
user's vulnerability to becoming a victim increases.
Valuable information is hidden in vast amounts of social media
data, presenting ample opportunities for social media mining to discover
actionable knowledge that is otherwise difficult to find. Social media data
are vast, noisy, distributed, unstructured, and dynamic, which poses novel
challenges for data mining. In this paper, we offer a brief introduction to
mining social media, use illustrative examples to show that burgeoning
social media mining is spearheading the social media research, and
demonstrate its invaluable contributions to real-world applications.
• Pritam Gundecha, Huan Liu, "Mining Social Media: A Brief
Introduction", ISBN 978-0-9843378-3-5
• Brian Amento, Loren Terveen, Will Hill, "Experiments in Social
• Roosevelt C. Mosley Jr., FCAS, MAAA, "Social Media Analytics:
Data Mining Applied to Insurance Twitter Posts"
• Facebook Development - https://developers.facebook.com/
• Twitter Development - https://dev.twitter.com/
• Social Networking Statistics & Facts - http://visual.ly/100-socialnetworking-statistics-facts-2012