Data mining for social media


Introduction to Data Mining for Social Media

Published in: Technology, Business

  • Firms are increasingly collecting data on the explicit social networks of consumers

    1. Data Mining for Social Media
       VNG Corporation - R&D Team, 4/23/2011
    2. Content
         • Social Media Growth
         • Social Media Data
         • Data Mining for Social Media
         • Conclusion & Discussion
    3. 1. Social Media Growth
         • Top sites globally: Google, Facebook, YouTube, Yahoo, Live, Baidu, Wikipedia, Blogger, MSN, Tencent, Twitter
         • Top sites in Vietnam: Google, Vnexpress, …, Yahoo, YouTube, Facebook, …, …, Mediafire, …
    4. 1. Social Media Growth: Some Statistics
         • Facebook: largest social network site
             • 600,000,000 users, half of whom log in every day
             • 35,000,000,000 online friendships
             • 900,000,000 objects people interact with
             • 30,000,000,000 shared content items per month
         • YouTube: largest video sharing site
             • 2,000,000,000 views per day
             • 1,000,000 video hours uploaded per month
         • Twitter: largest microblogging site
             • 200,000,000 users per month
             • 65,000,000 tweets per day (about 750 per second)
             • 8,000,000 followers of the most popular user
         • ZingMe: largest Vietnamese social network
             • 35,000,000 users, 10,000,000 monthly active
             • 260,000,000 online friendships
             • Plenty of services: music, video, karaoke, games, news, chat, photo, blog …
    5. 2. Social Media Data
         • Social media data is everywhere
         • Social overload:
             • Information overload: blogs, microblogs, forums, wikis, news, bookmarked web pages, photos, videos, etc.
             • Interaction overload: friends, followers, followees, commenters, co-members, voters, “likers”, taggers, etc.
         • How to extract useful information from this chaos?
    6. 2. Social Media Data: Opportunities
         • Social media captures the pulse of humanity!
         • We can directly study the opinions and behaviors of millions of users to gain insights into:
             • Human behaviors
             • Marketing analytics, product sentiment
         • Applications & problems:
             • WWW: search, information retrieval (grouping web sites or documents)
             • Targeted marketing: identify groups of customers or products to make recommendations (targeted advertising, viral marketing)
             • Personalization (interfaces, services)
             • Epidemiology, fraud detection, security (counterterrorism)
             • …
    7. Quick Recap
         • Social Media Growth
         • Social Media Data
         • Data Mining for Social Media
             • Social Network as a Graph
             • Interesting Problems
                 • Community Detection
                 • Node Classification
                 • Link Classification & Tie Strength
                 • Information Flow
         • Conclusion & Discussion
    8. 3. Data Mining for Social Media
         • Data mining in social networks:
             • Graph mining:
                 • Friendship graphs, contact lists
                 • Interactions between users
             • Text mining:
                 • Blogs, status updates, tweets…
                 • Texts and messages sent between users
         • Some interesting problems for data miners:
             • Modeling information flow (e.g. viral marketing)
             • Modeling evolution (e.g. link prediction)
             • Extracting information for learning (e.g. node classification, community detection)
    9. 3.1 Social Network as a Graph
         • A social network is a graph, but:
             • nodes can have attributes
             • edges (links) may be weighted and/or directed, or neither
             • so the similarity (tie strength, affinity) between two nodes is a function of both attributes and links: similarity = f(attributes; links)
             • the network’s graph is not a simple random graph (it has special structural properties)
         • Large-scale graphs
             • Mining of large-scale graphs
    10. 3.1 Social Graph Characteristics
         • Sparse networks: the number of links is proportional to the number of nodes.
         • Small-world effect:
             • The shortest path between two random nodes is short on average.
             • This property is related to the distribution of the degrees of the nodes: scale-free network (Barabasi, 2000)
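The two characteristics on this slide can be checked on a small graph. The following is a minimal sketch, not taken from the slides: the `toy` graph and all function names are illustrative assumptions. It measures the average shortest-path length with breadth-first search and tabulates the degree distribution.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_shortest_path(adj):
    """Mean distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbors) for neighbors in adj.values())

# Hypothetical undirected toy graph (adjacency sets), for illustration only.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}

print(average_shortest_path(toy))
print(dict(degree_distribution(toy)))
```

Running all-pairs BFS costs time proportional to nodes times links, so on a large-scale graph one would typically sample source nodes rather than visit all of them.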
    11. 3.2 Interesting Problems: Community Detection
         • Community detection in social networks:
             • Partition the graph into clusters
             • Find the (small) community around a given node
         • Why community detection?
             • Capture the network’s dynamics
             • Allow local analysis of interactions
             • Reveal network properties without releasing individuals’ private information
         • Methods:
             • Clustering based on shortest-path betweenness
             • Clustering based on network modularity
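Modularity-based clustering needs a score for how good a given partition is. The sketch below computes the standard modularity value Q for an undirected, unweighted graph; it only scores a partition and does not search for the best one. The graph `two_triangles` and the function name are illustrative assumptions, not part of the slides.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    For each community: (fraction of edges inside it) minus
    (expected fraction if edges were placed at random with the same degrees).
    """
    m = sum(len(neighbors) for neighbors in adj.values()) / 2  # number of edges
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in adj[u] if v in community) / 2
        degree_sum = sum(len(adj[u]) for u in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Hypothetical toy graph: two triangles joined by the single edge 3-4.
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

print(modularity(two_triangles, [{1, 2, 3}, {4, 5, 6}]))    # natural split
print(modularity(two_triangles, [{1, 2, 3, 4, 5, 6}]))      # everything in one cluster
```

The natural split scores higher than the single-cluster partition, which is the signal a modularity-maximizing clustering method follows.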
    12. 3.2 Interesting Problems: Node Classification
         • Node classification for social networks:
             • Labeling nodes in the network to indicate demographic values, interests, beliefs, or other characteristics
         • Applications: used as input for recommendation
             • Suggest new connections and objects
             • Personalize ads to users’ interests
             • Find communities based on interests or affiliation
             • Study how ideas spread over time
         • Methods:
             • Methods based on traditional classifiers using graph information
             • Graph-based methods
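One simple graph-based approach is to let unlabeled nodes take the majority label of their already-labeled neighbors and repeat until nothing changes. The sketch below is an illustrative assumption of how that could look, not the method used in the slides; the graph, the seed labels, and the tie-breaking rule (alphabetically smallest label) are all hypothetical.

```python
from collections import Counter

def propagate_labels(adj, seed_labels, max_rounds=10):
    """Assign each unlabeled node the majority label among its labeled neighbors.

    Updates are applied synchronously, one round at a time, until no new
    node can be labeled or max_rounds is reached.
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in labels:
                continue
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if votes:
                best = max(votes.values())
                # Break ties deterministically by picking the smallest label.
                updates[node] = min(l for l, c in votes.items() if c == best)
        if not updates:
            break
        labels.update(updates)
    return labels

# Hypothetical friendship graph and known interests for four of six users.
friends = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
known = {1: "music", 2: "music", 5: "games", 6: "games"}

print(propagate_labels(friends, known))
```

A traditional classifier would instead turn graph information (for example, counts of neighbors per label) into features alongside node attributes.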
    13. 3.2 Interesting Problems: Link Prediction & Tie Strength
         • Link prediction: given a snapshot of a social network, infer which new interactions among its members are likely to occur in the near future.
         • Tie strength: a combination of the amount of TIME, emotional INTENSITY, INTIMACY (mutual confiding), and reciprocal SERVICES.
         • Applications:
             • Predict future friends
             • Find influential users in the network
             • Find possible links between users and objects (e.g. online items to be sold)
         • Methods:
             • Supervised learning: Decision Trees, Logistic Regression, Support Vector Machines …
             • Graph-based methods
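A common graph-based baseline for link prediction scores each pair of unconnected users by how much their friend lists overlap. The sketch below uses the Jaccard coefficient as that score; the choice of score, the graph `network`, and the function names are illustrative assumptions rather than anything stated in the slides.

```python
from itertools import combinations

def jaccard(adj, u, v):
    """Shared neighbors divided by total distinct neighbors of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def rank_candidate_links(adj):
    """Score every currently unconnected pair, highest score first."""
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores.append((jaccard(adj, u, v), u, v))
    # Sort by descending score, then by node names for a stable order.
    return sorted(scores, key=lambda item: (-item[0], item[1], item[2]))

# Hypothetical snapshot of an undirected friendship network.
network = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

for score, u, v in rank_candidate_links(network):
    print(f"{u}-{v}: {score:.2f}")
```

Scores like this one can also serve as input features for the supervised learners listed on the slide.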
    14. 3.2 Interesting Problems: Information Flow
         • Information flow through social media:
             • Analyzing the underlying mechanisms for the real-time spread of information through online networks
         • Motivating questions:
             • How do messages spread through social networks?
             • How can the spread of information be predicted?
             • How can the networks over which messages spread be identified?
         • Applications:
             • Identify trends and attention
             • Predictive modeling of the spread of new ideas and behaviors
             • Search: real-time search, social search
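One way to make "how do messages spread?" concrete is to simulate a diffusion model. The slides do not name a model; the sketch below assumes the independent cascade model, in which each newly informed user gets one chance to pass the message to each follower with probability `p`. The `followers` graph, the value of `p`, and the function names are hypothetical.

```python
import random

def independent_cascade(followers, seeds, p, rng):
    """Simulate one cascade; return the set of users who received the message."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in followers.get(u, ()):
                # Each newly active user tries once per inactive follower.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def expected_spread(followers, seeds, p, runs=200, seed=0):
    """Monte Carlo estimate of the average cascade size."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(followers, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Hypothetical directed graph: followers[u] lists the users who see u's posts.
followers = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

print(expected_spread(followers, ["a"], p=0.5))
```

Comparing the estimated spread for different seed users is one simple way to approach the viral-marketing question of whom to target first.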
    15. 4. Conclusion and Discussion
         • Social media: rich, big & open data:
             • Billions of users, billions of content items
             • Textual, multimedia (images, videos, etc.)
             • Billions of connections
             • Behaviors, preferences, trends...
         • Challenges:
             • Large-scale problems
             • Noise in data
         • Recommender systems for users and enterprises:
             • Maintain users’ interest and attract new users to the network
             • Targeted marketing: show appropriate ads and items personalized for users to
             • Predict users’ interests and trends: make effective plans
             • …
    16. Thank you for your attention!