Social Data Mining

Published in: Social Media, Technology, Business

  1. 1. Social Data Mining. Mahesh J. Meniya, Akash M. Rangani
  2. 2. Data, Information, Knowledge (1). Data: Facts and statistics collected together for reference or analysis; the quantities, characters, or symbols on which a computer performs operations, and which it stores and transmits. Information: The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.
  3. 3. Data, Information, Knowledge (2). Knowledge: Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
  4. 4. What is Data Mining? Finding previously unknown, useful information in a large dataset. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. It is the process of collecting, searching through, and analyzing a large amount of data in a database so as to discover patterns or relationships.
  5. 5. What is Social Data Mining? Social media is defined as a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content. Vast amounts of user-generated content are created every day on social media sites such as Facebook, Twitter, and Google+. Social data mining is the systematic analysis of social media to extract valuable information. Social media data are largely user-generated content, which is vast, noisy, distributed, unstructured, and dynamic.
  6. 6. Social Media Platforms: Blogging; Microblogs; Community-based Question Answering (CQA); Emails and Chat; Hybrid Applications; Wikis; Social news; Social bookmarking; Media sharing; Opinion, reviews, and ratings.
  7. 7. Why Is It Important? The WWW is vast; people share more data; advertising and marketing; products are more customized; more devices produce more data; market research; customer experience; brand loyalty; product development and design; communication and marketing.
  8. 8. Structures in Social Media. Social structures represent social relationships between community members. Accordingly, social applications are often designed to systematically support these properties. For example, in online forums, a useful criterion provided by a social structure is whether or not a member is an expert in a specific topic.
  9. 9. Types of Social Media Structure. Hierarchical structure: Objects used in social data mining often possess a natural hierarchical structure. For example, even a short document comprises a number of sentences. Accordingly, hierarchical structures have frequently been addressed in information representation. Conversational structure: We can identify conversational structures, explicitly or implicitly, in most social platforms involving interactions between users. For example, in emails and forums, conversational structures are formed by replies.
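
As a minimal sketch of how a conversational structure can be recovered from replies, the snippet below builds a reply tree from (message, parent) pairs and measures the longest reply chain in each thread. The message ids and the `build_reply_tree` and `thread_depth` helpers are hypothetical names chosen for illustration; they are not taken from the slides or from any platform API.

```python
from collections import defaultdict

# Hypothetical forum messages as (message_id, parent_id);
# parent_id None marks the message that starts a thread.
messages = [
    ("m1", None),
    ("m2", "m1"),
    ("m3", "m1"),
    ("m4", "m2"),
    ("m5", None),
]

def build_reply_tree(msgs):
    """Map each message id to the list of ids that reply to it."""
    children = defaultdict(list)
    for msg_id, parent_id in msgs:
        if parent_id is not None:
            children[parent_id].append(msg_id)
    return children

def thread_depth(children, root):
    """Length of the longest reply chain starting at root (root counts as 1)."""
    replies = children.get(root, [])
    return 1 + max((thread_depth(children, r) for r in replies), default=0)

tree = build_reply_tree(messages)
roots = [msg_id for msg_id, parent_id in messages if parent_id is None]
for root in roots:
    print(root, "depth:", thread_depth(tree, root))
```

The same parent-pointer representation also covers the hierarchical case, since a document made of paragraphs and sentences can be stored as the same kind of tree.
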
  10. 10. Data Mining Techniques for Social Media: Graph Mining. Graphs (or networks) constitute a dominant data structure and appear in essentially all forms of information. Examples include the Web graph and social networks. Typically, communities correspond to groups of nodes, where nodes within the same community (or cluster) tend to be highly similar and share common features, while nodes of different communities show low similarity. Graph mining means extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented as a graph.
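
To make the idea of "similar nodes inside a community, dissimilar nodes across communities" concrete, here is a small sketch that scores node pairs by the Jaccard overlap of their neighborhoods. The toy graph and the choice of Jaccard similarity are assumptions made for illustration; the slides do not name a specific similarity measure or community detection algorithm.

```python
# Toy undirected friendship graph (hypothetical data): two tight groups,
# {a, b, c} and {d, e, f}, joined by a single bridge edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]

def neighbors(edge_list):
    """Build an adjacency map: node -> set of adjacent nodes."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Overlap of closed neighborhoods (each node counts as its own neighbor)."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

adj = neighbors(edges)
print("a-b (same group):      ", jaccard(adj, "a", "b"))
print("a-e (different groups):", jaccard(adj, "a", "e"))
print("c-d (bridge):          ", jaccard(adj, "c", "d"))
```

Pairs inside one group score high and pairs across groups score low, which is the signal that clustering methods exploit when they group nodes into communities.
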
  11. 11. Graph Mining usage. Google uses PageRank as one of many predictors for the relevance of a web page. The link structure of the World Wide Web provides valuable contextual information about which pages are deemed most relevant by web page creators; this contextual link structure is then used to predict relevance for a user’s query. Graph mining is useful for understanding relationships as well as content (text, images). A social media host may look at certain online groups and predict whether each group will flourish or disband.
  12. 12. Graph Mining usage (cont.). A phone provider looks at cell phone call records to determine whether an account is a result of identity theft. Facebook Graph Search query examples. Searching people: “friends of friends who are single female in Rajkot”. Searching interests: “movies my friends like”, “TV shows my friends like”, “Videos by TV shows liked by my friends”. Searching places: “Restaurant in Rajkot liked by friends”.
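
The link-based relevance idea can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google's production ranking: the four-page `web` graph is made up, and the damping factor of 0.85 and the even redistribution of rank from pages without outgoing links are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page site; no page links to "orphan".
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Pages that many other pages link to accumulate rank, while a page nobody links to keeps only the baseline share.
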
  13. 13. Sample query for Facebook Graph search
  14. 14. Result Facebook Graph search
  15. 15. Text Mining. Text mining is an emerging technology that attempts to extract meaningful information from unstructured textual data. Text mining is an extension of data mining to textual data. Social networks contain a lot of text in the nodes in various forms. For example, social networks may contain links to posts, blogs, or other news articles.
  16. 16. Usage of text mining (1). Automatic processing of messages and emails: A common application of text mining is to aid in the automatic classification of texts. For example, it is possible to automatically filter out most undesirable junk email based on certain terms or words that are not likely to appear in legitimate messages. Investigating competitors by crawling their web sites: Another potentially very useful application is to automatically process the contents of Web pages in a particular domain. For example, you could go to a Web page and begin crawling the links you find there to process all Web pages that are referenced.
  17. 17. Usage of text mining (2). Medical: Mining medical records to improve patient care. Security applications: Many text mining software packages are marketed for security applications, especially the monitoring and analysis of online plain-text sources such as Internet news, blogs, etc. for national security purposes.
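
A term-based junk filter of the kind described above can be sketched in a few lines. The term list, the scoring rule, and the 0.2 threshold are all hypothetical choices for illustration; a practical filter would learn its terms and weights from labeled mail.

```python
import re

# Hypothetical list of terms that rarely appear in legitimate messages.
JUNK_TERMS = {"lottery", "winner", "prize", "jackpot"}

def junk_score(message):
    """Fraction of the words in the message that appear in JUNK_TERMS."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    return sum(word in JUNK_TERMS for word in words) / len(words)

def is_junk(message, threshold=0.2):
    """Classify a message as junk when enough of its words are junk terms."""
    return junk_score(message) >= threshold

for text in ["You are a lottery winner claim your prize",
             "Meeting moved to Monday at ten"]:
    print(is_junk(text), "-", text)
```
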
  18. 18. Text Mining Process
  19. 19. Generic Process of Social Data Mining: Web 2.0 data source; Data Collection; Data Modeling; Used in application; Mining Methods: cluster and community detection, static analysis, classification.
  20. 20. Text Mining Process stages (1). Data Collection: The data collector module continuously downloads data from one or more social platforms and stores the raw data in a big data store or a conventional database. Depending on the application type, parameters are specified with the API call. Data Modeling: Data modeling is a process used to define and analyze the data requirements needed to support the application processes within the scope of the corresponding application. In the data modeling stage, data is modeled in one of various data models depending on the nature of the application.
  21. 21. Text Mining Process stages (2). Mining Methods. Cluster analysis: The automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records. Anomaly detection: The search for items or events which do not conform to an expected pattern.
  22. 22. Text Mining Process stages (3). Static analysis: Analysis of historical business activities, stored as static data in data warehouse databases, to reveal hidden patterns and trends. Businesses use data mining for tasks ranging from performing market analysis to finding the root cause of manufacturing problems. It can assist in discovering previously unknown strategic business information, preventing customer attrition and acquiring new customers, cross-selling to existing customers, and managing customers with more accuracy.
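
Anomaly detection can be illustrated with a simple z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The `daily_posts` data and the threshold of 2.0 are assumptions for this sketch; the slides do not prescribe a particular detection method.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the pattern.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical number of posts per day for one account.
daily_posts = [12, 15, 11, 14, 13, 12, 90, 14, 13, 12]
print(find_anomalies(daily_posts))
```
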
  23. 23. OAuth 2.0. OAuth is an open standard for authorization. It provides a process for end users to authorize third-party access to their server resources without sharing their credentials (typically, a username and password pair), using user-agent redirections. It is an open authorization protocol which enables applications to access each other’s data.
  24. 24. Authorization flow
  25. 25. Authorization flow steps (1). First, the user accesses the client web application. This web app has a button saying "Login via Facebook" (or some other system like Google or Twitter). Second, when the user clicks the login button, the user is redirected to the authenticating application (e.g. Facebook). The user then logs into the authenticating application and is asked whether to grant the client application access to their data in the authenticating application. The user accepts. Third, the authenticating application redirects the user to a redirect URI, which the client app has provided to the authenticating app. This redirect URI is normally provided by registering the client application with the authenticating application.
  26. 26. Authorization flow steps (2). Fourth, the user accesses the page located at the redirect URI in the client application. In the background, the client application contacts the authenticating application and exchanges the authorization code it received in the redirect, together with its own client credentials, for an access token. Once the client application has obtained an access token, this access token can be sent to Facebook, Google, Twitter, etc. to access resources in these systems related to the user who logged in.
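
The redirect steps above can be sketched without any network calls by building the URLs and form fields involved in the OAuth 2.0 authorization code grant. The endpoint, client id, and redirect URI below are hypothetical placeholders (real values come from registering the client application), and the function names are invented for this example. The parameter names (`response_type`, `client_id`, `redirect_uri`, `state`, `grant_type`, `code`) are those defined by the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values; a real client gets these when it registers
# with the authenticating application.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://client.example.com/callback"

def build_authorization_url(state):
    """Step 2: URL the user is redirected to in order to log in and grant access."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,  # random value used to match the redirect to this request
    }
    return AUTHORIZE_URL + "?" + urlencode(params)

def extract_code(callback_url, expected_state):
    """Steps 3-4: read the authorization code from the redirect back to the client."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: the redirect does not match our request")
    return query["code"][0]

def token_request_body(code):
    """Form fields the client POSTs to the token endpoint to obtain an access token.

    A confidential client also authenticates itself (for example with its
    client secret); that part is omitted from this sketch.
    """
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    }

state = secrets.token_urlsafe(16)
print(build_authorization_url(state))
```

The `state` check is a standard precaution so that the client only accepts redirects that answer a request it actually started.
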
  27. 27. Roles of users and applications in OAuth 2.0 (1)
  28. 28. Roles of users and applications in OAuth 2.0 (2). Resource Owner: The resource owner is the person or application that owns the data that is to be shared. For instance, a user on Facebook or Google could be a resource owner. Resource Server: The resource server is the server hosting the resources owned by the resource owner. For instance, Facebook or Google is a resource server. Client Application: The client application is the application requesting access to the resources stored on the resource server, which are owned by the resource owner. A client application could be a game requesting access to a user's Facebook account.
  29. 29. Roles of users and applications in OAuth 2.0 (3). Authorization Server: The authorization server is the server authorizing the client application to access the resources of the resource owner. The authorization server and the resource server can be the same server.
  30. 30. Big Data. Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, transfer, analysis, and visualization. Some examples: Facebook has more than 1.15 billion active users generating social interaction data. More than 5 billion people are calling, texting, tweeting, and browsing websites on mobile phones. Scientific instruments generate large amounts of data.
  31. 31. Characteristics of Big Data
  32. 32. Applications of Big Data. Google Flu Trends uses search terms to predict the spread of the flu virus. MIT is using mobile phone data to establish how people's locations and traffic patterns can be used for urban planning. Statistician Nate Silver predicted the outcome of the 2012 US election down to each individual state. Big data can bring the intelligence of online shopping into the retail environment.
  33. 33. Tools used in Big Data (1). NoSQL databases: NoSQL means a non-relational (non-SQL) database. Several database types fit into this category, such as key-value stores and document stores, which focus on the storage and retrieval of large volumes of unstructured, semi-structured, or even structured data. MapReduce by Google: This is a programming paradigm that allows job execution to scale massively across thousands of servers or clusters of servers. In the "Map" task, an input dataset is converted into a different set of key/value pairs, or tuples. In the "Reduce" task, several of the outputs of the "Map" task are combined to form a reduced set of tuples.
  34. 34. Tools used in Big Data (2). Hadoop by Apache: Hadoop is by far the most popular implementation of MapReduce and is an entirely open source platform for handling big data. It is flexible enough to work with multiple data sources, for example by aggregating multiple sources of data in order to do large-scale processing.
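
The Map and Reduce tasks can be illustrated with the classic word-count example, run here in a single process. This is only a sketch of the programming model: the `map_phase`, `shuffle`, and `reduce_phase` names are chosen for this example, and a real framework would distribute each phase across many servers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) key/value pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key into a single count."""
    return key, sum(values)

documents = ["big data is big", "data mining finds patterns in data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel, which is what lets the paradigm scale.
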
  35. 35. Access Data from Twitter (1). Twitter is an online social networking and microblogging service that enables users to send and read "tweets", which are text messages limited to 140 characters. Twitter provides various APIs that allow developers to build upon and extend their applications in new and creative ways. Twitter for Websites: Twitter for Websites is a suite of products that enables websites to easily integrate Twitter. It is ideal for site developers looking to quickly and easily integrate very basic Twitter functions.
  36. 36. Access Data from Twitter (2). Search API: The Search API is designed for products that let a user query for Twitter content. This may include finding a set of tweets with specific keywords, finding tweets referencing a specific user, or finding tweets from a particular user. REST API: The REST API enables developers to access some of the core primitives of Twitter, including timelines, status updates, and user information. If you are building an application that leverages core Twitter objects, this is the API to use.
  37. 37. Twitter REST API calls
  38. 38. Access Data from Twitter (3). Streaming API: The Streaming APIs offered by Twitter give developers low-latency access to Twitter's global stream of Tweet data. This API is for developers with data-intensive needs. If you want to build a data mining product or are interested in analytics research, the Streaming API is the most suitable choice.
  39. 39. Twitter Streaming API calls
  40. 40. Access Data from Facebook. The Facebook platform provides various APIs and SDKs for developing applications that access Facebook data. The Facebook SDK provides a fast, native Facebook integration, using the exact same implementation regardless of which environment you're deploying to. For mobile, Facebook provides SDKs for two platforms: iOS and Android. For web development, SDKs are provided by both Facebook and the community: PHP, JavaScript, Ruby, Node.js, C#.
  41. 41. Facebook APIs (1). Graph API: The Graph API is a simple HTTP-based API that gives access to the Facebook social graph, uniformly representing objects in the graph and the connections between them. Most other APIs at Facebook are based on the Graph API. FQL: Facebook Query Language, or FQL, enables you to use a SQL-style interface to query the data exposed by the Graph API. Dialogs: Facebook offers a number of dialogs for Facebook Login, posting to a person's timeline, or sending requests.
  42. 42. Facebook APIs (2). Chat: One can integrate Facebook Chat into web-based, desktop, or mobile instant messaging products. Ads API: The Ads API allows you to build your own app as a customized alternative to the Facebook Ads. Public Feed API: The Public Feed API lets you read the stream of public comments as they are posted to Facebook.
  43. 43. Friend Locator - Facebook App. A Facebook application that displays each friend’s current location and home town on a Google map, using jQuery, the Google Maps API, and the Facebook platform. It uses OAuth and FQL to access the client data from Facebook.
  44. 44. Request Permission for application
  45. 45. Friend Map on Google Map
  46. 46. List of Friends in selected city
  47. 47. Example of Mining Social Media. The core principle in mining social sites is the attribute-value pair, gathered by applying various algorithms. The attributes of any social networking site can be categorized into two parts: individual attributes and community attributes. Individual attributes describe personal information about the person, such as gender, birth date, address, phone number, and email address. Community attributes include the friend list, tagged pictures, and followers.
  48. 48. Consider Facebook as an example. Facebook users can now control photo tagging and the sharing of their friend list with the public, and can share a status with specific people or groups. However, users still cannot stop friends from sharing their own friend lists or from uploading photos of them to the public from their profiles. By collecting and assessing the vast amount of Facebook user data, one can infer the general behavior of a user. Facebook provides sharing options for the phone number and other personal information; if a user discloses this sensitive information in their profile, their vulnerability to becoming a victim increases.
  49. 49. Conclusion. Valuable information is hidden in vast amounts of social media data, presenting ample opportunities for social media mining to discover actionable knowledge that is otherwise difficult to find. Social media data are vast, noisy, distributed, unstructured, and dynamic, which poses novel challenges for data mining. In this paper, we offer a brief introduction to mining social media, use illustrative examples to show that burgeoning social media mining is spearheading social media research, and demonstrate its invaluable contributions to real-world applications.
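
One way to picture the two attribute categories is as a small data model. The class and field names below are hypothetical and simply mirror the examples listed above; they do not correspond to any real social network's API.

```python
from dataclasses import dataclass, field

@dataclass
class IndividualAttributes:
    """Personal information about one user (empty string means not disclosed)."""
    gender: str = ""
    birth_date: str = ""
    email: str = ""
    phone: str = ""

@dataclass
class CommunityAttributes:
    """Information that ties the user to other users."""
    friends: list = field(default_factory=list)
    followers: list = field(default_factory=list)
    tagged_pictures: list = field(default_factory=list)

@dataclass
class Profile:
    user_id: str
    individual: IndividualAttributes
    community: CommunityAttributes

def disclosed_fields(profile):
    """Names of the individual attributes the user has filled in."""
    return [name for name, value in vars(profile.individual).items() if value]

# Hypothetical profile with two disclosed individual attributes.
profile = Profile(
    user_id="u1",
    individual=IndividualAttributes(gender="F", phone="555-0100"),
    community=CommunityAttributes(friends=["u2", "u3"]),
)
print(disclosed_fields(profile))
```
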
  50. 50. References. [1] Pritam Gundecha, Huan Liu, “Mining Social Media: A Brief Introduction”, ISBN 978-0-9843378-3-5. [2] Brian Amento, Loren Terveen, Will Hill, “Experiments in Social Data Mining”. [3] Roosevelt C. Mosley Jr., FCAS, MAAA, “Social Media Analytics: Data Mining Applied to Insurance Twitter Posts”. [4] Facebook Development - https://developers.facebook.com/ [5] Twitter Development - https://dev.twitter.com/ [6] Social Networking Statistics & Facts - http://visual.ly/100-socialnetworking-statistics-facts-2012
