Data Mining for Social Media
VNG Corporation - R&D Team
4/23/2011

Content
  • Social Media Growth
  • Social Media Data
  • Data Mining for Social Media
  • Conclusion & Discussion

1. Social Media Growth
  • Top sites globally: Google, Facebook, YouTube, Yahoo, Live, Baidu, Wikipedia, Blogger, MSN, Tencent, Twitter
  • Top sites in Vietnam: Google, Vnexpress, Zing.vn, Yahoo, YouTube, Facebook, Dantri.com.vn, 24h.com.vn, Mediafire, Vatgia.com

1. Social Media Growth: Some Statistics
  • Facebook: largest social network site
      • 600,000,000 users, half of whom log in every day
      • 35,000,000,000 online friendships
      • 900,000,000 objects people interact with
      • 30,000,000,000 shared content items per month
  • YouTube: largest video sharing site
      • 2,000,000,000 views per day
      • 1,000,000 video hours uploaded per month
  • Twitter: largest microblogging site
      • 200,000,000 users per month
      • 65,000,000 tweets per day (750 per second)
      • 8,000,000 followers of the most popular user
  • ZingMe: largest Vietnamese social network
      • 35,000,000 users, 10,000,000 monthly active
      • 260,000,000 online friendships
      • Plenty of services: music, video, karaoke, games, news, chat, photo, blog …

2. Social Media Data
  • Social media data is everywhere
  • Social overload:
      • Information overload: blogs, microblogs, forums, wikis, news, bookmarked web pages, photos, videos, etc.
      • Interaction overload: friends, followers, followees, commenters, co-members, voters, “likers”, taggers, etc.
  • How to extract useful information from this chaos?

2. Social Media Data: Opportunities
  • Social media captures the pulse of humanity!
  • We can directly study the opinions and behaviors of millions of users to gain insights into:
      • Human behaviors
      • Marketing analytics, product sentiment
  • Applications & problems:
      • WWW: search, information retrieval (group web sites or documents)
      • Targeted marketing: identify groups of customers or products to make recommendations (targeted advertising, viral marketing)
      • Personalization (interfaces, services)
      • Epidemiology, fraud detection, security (counterterrorism) …

Quick Recap
  • Social Media Growth
  • Social Media Data
  • Data Mining for Social Media
      • Social Network as a Graph
      • Interesting Problems
          • Community Detection
          • Node Classification
          • Link Prediction & Tie Strength
          • Information Flow
  • Conclusion & Discussion

3. Data Mining for Social Media
  • Data mining in social networks:
      • Graph mining:
          • Friendship graph, contact lists.
          • Interactions between users.
      • Text mining:
          • Blogs, status updates, tweets …
          • Texts, messages sent between users.
  • Some interesting problems for data miners:
      • Model information flow (e.g. viral marketing)
      • Model evolution (e.g. link prediction)
      • Extract information for learning (e.g. node classification, community detection).

3.1 Social Network as a Graph
  • A social network is a graph, but:
      • nodes can have attributes
      • edges (links) may be weighted and/or directed, or neither
      • so the similarity (tie strength, affinity) between two nodes is a function of both: f(attributes; links)
      • the network's graph is not a simple random graph (it has special structural properties)
  • Large-scale graphs
      • Mining of large-scale graphs
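
The slide leaves f(attributes; links) unspecified. As a minimal sketch of the idea, the Python below stores an attributed, weighted, optionally directed graph and uses one hypothetical choice of f: an equal mix of the Jaccard overlap of two nodes' attributes and the Jaccard overlap of their neighbor sets. The class name `SocialGraph`, the function `similarity`, the mixing parameter `alpha`, and the sample users are all illustrative assumptions, not something taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Weighted, optionally directed graph whose nodes carry attribute sets."""
    attributes: dict = field(default_factory=dict)  # node -> set of attribute values
    out_edges: dict = field(default_factory=dict)   # node -> {neighbor: weight}

    def add_node(self, node, attrs=()):
        self.attributes[node] = set(attrs)
        self.out_edges.setdefault(node, {})

    def add_edge(self, src, dst, weight=1.0, directed=True):
        # Create missing endpoints with empty attribute sets.
        for node in (src, dst):
            if node not in self.attributes:
                self.add_node(node)
        self.out_edges[src][dst] = weight
        if not directed:
            self.out_edges[dst][src] = weight

    def neighbors(self, node):
        return set(self.out_edges[node])


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(graph, u, v, alpha=0.5):
    """Hypothetical f(attributes; links): mix of attribute and neighbor overlap."""
    attr_sim = jaccard(graph.attributes[u], graph.attributes[v])
    link_sim = jaccard(graph.neighbors(u), graph.neighbors(v))
    return alpha * attr_sim + (1 - alpha) * link_sim


# Toy data (assumed): two users who share one interest and one friend.
g = SocialGraph()
g.add_node("an", {"music", "games"})
g.add_node("binh", {"music", "photo"})
g.add_node("chi", {"news"})
g.add_edge("an", "chi", directed=False)
g.add_edge("binh", "chi", directed=False)
score = similarity(g, "an", "binh")
```

Edge weights are stored but not used by this particular f; a real tie-strength measure would likely fold in interaction counts or recency as weights.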

3.1 Social Graph Characteristics
  • Sparse networks: the number of links is proportional to the number of nodes.
  • Small world effect:
      • The shortest path between two random nodes is on average small.
      • This property is related to the distribution of the degrees of the nodes: scale-free network (Barabasi, 2000)
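
The three properties above (sparsity, short paths, degree distribution) can each be measured directly on an undirected graph. The sketch below is illustrative only: the function names and the small star-shaped toy graph are assumptions, and a real social graph would need sampling or a distributed computation rather than all-pairs breadth-first search.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def average_shortest_path(adj):
    """Mean hop distance over all ordered pairs of connected, distinct nodes."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def density(adj):
    """Fraction of possible undirected links that are present (low = sparse)."""
    n = len(adj)
    m = sum(len(nbs) for nbs in adj.values()) / 2
    return m / (n * (n - 1) / 2) if n > 1 else 0.0


def degree_distribution(adj):
    """How many nodes have each degree."""
    return Counter(len(nbs) for nbs in adj.values())


# Toy graph (assumed): one hub connected to five leaves.
adj = {0: set(range(1, 6))}
adj.update({leaf: {0} for leaf in range(1, 6)})

graph_density = density(adj)
mean_path = average_shortest_path(adj)
degrees = degree_distribution(adj)
```

Even this tiny hub-and-leaves graph shows the pattern the slide describes: few links relative to the number of possible pairs, short paths through the hub, and a very uneven degree distribution.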

3.2 Interesting Problems: Community Detection
  • Community detection in a social network:
      • Partition the graph into clusters
      • Find the (small) community around a given node
  • Why community detection?
      • Capture the network's dynamics
      • Allow local analysis of interactions.
      • Reveal properties of the network without releasing individuals' private information.
  • Methods
      • Clustering based on shortest-path betweenness
      • Clustering based on network modularity
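
Modularity-based clustering needs a score that says how good a given partition is. As a sketch, the function below computes the standard modularity Q of a partition of an undirected, unweighted graph: for each community, the fraction of links that fall inside it minus the fraction expected from its total degree. It only scores a partition; it does not search for the best one. The function name and the two-triangle toy graph are assumptions for illustration.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    adj: node -> set of neighbors (each link appears in both directions).
    communities: iterable of disjoint node sets covering the graph.
    """
    m = sum(len(nbs) for nbs in adj.values()) / 2  # number of links
    q = 0.0
    for community in communities:
        # Each internal link is seen from both endpoints, hence the / 2.
        internal = sum(1 for u in community for v in adj[u] if v in community) / 2
        degree_sum = sum(len(adj[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Toy graph (assumed): two triangles joined by the single bridge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(adj, [{0, 1, 2, 3, 4, 5}])
```

Splitting at the bridge scores higher than keeping everything in one community, which is the signal a modularity-maximizing algorithm follows. The bridge is also the link with the highest shortest-path betweenness, so the betweenness-based method would remove it first.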

3.2 Interesting Problems: Node Classification
  • Node classification for a social network: labeling nodes in the network to indicate demographic values, interests, beliefs or other characteristics.
  • Applications:
      • Used as input for recommendation
      • Suggest new connections, objects.
      • Personalized ads tailored to users' interests.
      • Find communities based on interests, affiliation.
      • Study how ideas are spread over time.
  • Methods
      • Methods based on traditional classifiers using graph information.
      • Graph-based methods
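
The slides do not name a specific graph-based method. One simple example of the family is neighbor voting: each unlabeled node takes the most common label among its already-labeled neighbors, and this repeats until nothing changes. The sketch below is a hedged illustration of that idea; the function name, the tie-breaking rule, and the toy labels are assumptions.

```python
from collections import Counter


def propagate_labels(adj, seed_labels, max_rounds=10):
    """Assign labels to unlabeled nodes by majority vote of labeled neighbors.

    Seed labels never change. Updates are applied synchronously per round so
    the result does not depend on iteration order.
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in seed_labels:
                continue  # known labels stay fixed
            votes = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if votes:
                # Most votes wins; ties are broken alphabetically (an assumption).
                best = min(votes, key=lambda lab: (-votes[lab], lab))
                if labels.get(node) != best:
                    updates[node] = best
        if not updates:
            break
        labels.update(updates)
    return labels


# Toy graph (assumed): two triangles joined by the bridge 2-3, one seed in each.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
predicted = propagate_labels(adj, {0: "music", 5: "games"})
```

The other method family on the slide would instead turn graph information (degree, neighbor label counts, and so on) into features for a conventional classifier.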

3.2 Interesting Problems: Link Prediction & Tie Strength
  • Link prediction: given a snapshot of a social network, infer which new interactions among its members are likely to occur in the near future.
  • Tie strength: a combination of the amount of TIME, the emotional INTENSITY, the INTIMACY (mutual confiding), and the reciprocal SERVICES.
  • Applications:
      • Predict future friends
      • Find influential users in the network.
      • Find possible links between users and objects (e.g. online items to be sold).
  • Methods:
      • Supervised learning: Decision Trees, Logistic Regression, Support Vector Machine …
      • Graph-based methods.
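
As an illustration of the graph-based family, the sketch below scores every pair of users who are not yet connected and ranks them, using two widely used neighborhood scores: the number of common neighbors and the Adamic-Adar index. The slides do not say which scores the team used, so these choices, the function names, and the toy friendship graph are assumptions.

```python
import math
from itertools import combinations


def common_neighbors(adj, u, v):
    """Number of friends that u and v share."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Shared friends weighted by 1 / log(degree), so rare friends count more."""
    return sum(
        1 / math.log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1  # guard against log(1) == 0
    )


def rank_candidate_links(adj, score):
    """All non-adjacent pairs, most likely future links first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Toy undirected friendship graph (assumed).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_candidate_links(adj, common_neighbors)
top_pair = ranked[0]
```

In the supervised setting listed on the slide, scores like these would typically become input features for a classifier trained on links that appeared between two snapshots.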

3.2 Interesting Problems: Information Flow
  • Information flow through social media
      • Analyzing the underlying mechanisms for the real-time spread of information through online networks
  • Motivating questions:
      • How do messages spread through social networks?
      • How to predict the spread of information?
      • How to identify the networks over which the messages spread?
  • Applications:
      • Indicate trends and attention
      • Predictive modeling of the spread of new ideas and behaviors
      • Search: real-time search, social search
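
The slides do not commit to a diffusion model. One commonly used model for "how do messages spread" is the independent cascade: every newly activated user gets a single chance to pass the message to each follower, succeeding with some probability. The sketch below simulates it and estimates the expected reach by averaging repeated runs. The function names, the uniform probability `prob`, and the toy follower graph are assumptions for illustration.

```python
import random


def independent_cascade(adj, seeds, prob, rng=None):
    """Simulate one cascade; return the set of users the message reached.

    adj: user -> list of users who see that user's posts (directed).
    prob: chance that an exposed user passes the message on (assumed uniform).
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in adj[node]:
                # Each newly active node gets exactly one attempt per neighbor.
                if nb not in active and rng.random() < prob:
                    active.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return active


def expected_spread(adj, seeds, prob, runs=1000, seed=0):
    """Monte Carlo estimate of the average number of users reached."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(adj, seeds, prob, rng)) for _ in range(runs))
    return total / runs


# Toy follower graph (assumed): "a" reaches "b" and "c", who both reach "d".
follows = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
everyone_shares = independent_cascade(follows, ["a"], prob=1.0)
nobody_shares = independent_cascade(follows, ["a"], prob=0.0)
average_reach = expected_spread(follows, ["a"], prob=0.5)
```

Comparing `expected_spread` for different seed users is one simple way to frame the viral-marketing question of whom to target first.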

4. Conclusion and Discussion
  • Social media: rich, big & open data:
      • Billions of users, billions of content items
      • Textual, multimedia (images, videos, etc.)
      • Billions of connections
      • Behaviors, preferences, trends ...
  • Challenges:
      • Large-scale problems
      • Noise in data
  • Recommender systems for users and enterprises:
      • Maintain users' interest and attract new users to the network
      • Targeted marketing: show appropriate ads and items personalized for users
      • Predict users' interests and trends: make effective plans.
      • …

Thank you for your attention!


Editor's Notes

  • #6 Firms are increasingly collecting data on the explicit social networks of consumers.