Data Mining for Social Media
VNG Corporation – R&D Team
4/23/2011

Content
  • Social Media Growth
  • Social Media Data
  • Data Mining for Social Media
  • Conclusion & Discussion

1. Social Media Growth
  • Top sites globally: Google, Facebook, YouTube, Yahoo, Live, Baidu, Wikipedia, Blogger, MSN, Tencent, Twitter
  • Top sites in Vietnam: Google, Vnexpress, Zing.vn, Yahoo, YouTube, Facebook, Dantri.com.vn, 24h.com.vn, Mediafire, Vatgia.com

1. Social Media Growth: Some Statistics
  • Facebook – largest social network site
      • 600,000,000 users, half of whom log in every day
      • 35,000,000,000 online friendships
      • 900,000,000 objects people interact with
      • 30,000,000,000 shared content items per month
  • YouTube – largest video sharing site
      • 2,000,000,000 views per day
      • 1,000,000 video hours uploaded per month
  • Twitter – largest microblogging site
      • 200,000,000 users per month
      • 65,000,000 tweets per day (750 per second)
      • 8,000,000 followers of the most popular user
  • ZingMe – largest Vietnamese social network
      • 35,000,000 users, 10,000,000 monthly active
      • 260,000,000 online friendships
      • Plenty of services: music, video, karaoke, games, news, chat, photo, blog …

2. Social Media Data
Social media data is everywhere.
Social overload:
  • Information overload: blogs, microblogs, forums, wikis, news, bookmarked web pages, photos, videos, etc.
  • Interaction overload: friends, followers, followees, commenters, co-members, voters, "likers", taggers, etc.
How can useful information be extracted from this chaos?

2. Social Media Data: Opportunities
Social media captures the pulse of humanity! The opinions and behaviors of millions of users can be studied directly to gain insights into:
  • Human behaviors
  • Marketing analytics, product sentiment
Applications & problems:
  • WWW: search, information retrieval (grouping web sites or documents)
  • Targeted marketing: identify groups of customers or products to make recommendations (targeted advertising, viral marketing)
  • Personalization (interfaces, services)
  • Epidemiology, fraud detection, security (counterterrorism) …

Quick Recap
  • Social Media Growth
  • Social Media Data
  • Data Mining for Social Media
      • Social Network as a Graph
      • Interesting Problems
          • Community Detection
          • Node Classification
          • Link Prediction & Tie Strength
          • Information Flow
  • Conclusion & Discussion

3. Data Mining for Social Media
Data mining in social networks:
  • Graph mining
      • Friendship graph, contact lists.
      • Interactions between users.
  • Text mining
      • Blogs, status updates, tweets …
      • Texts and messages sent between users.
Some interesting problems for data miners:
  • Model information flow (e.g. viral marketing)
  • Model evolution (e.g. link prediction)
  • Extract information for learning (e.g. node classification, community detection)

3.1 Social Network as a Graph
A social network is a graph, but:
  • nodes can have attributes
  • edges (links) may be weighted, directed, both, or neither
  • so the similarity (tie strength, affinity) between two nodes is a function of both: similarity = f(attributes; links)
  • the network's graph is not a simple random graph (it has special structural properties)
Social graphs are large-scale, so mining them means mining large-scale graphs.
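To make "similarity = f(attributes; links)" concrete, here is a minimal sketch. The slide does not specify f, so the choice below (a weighted mix of Jaccard similarity over attribute sets and over neighbor sets) is an assumption for illustration only, as are the names `tie_strength`, `alpha`, and the toy users.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tie_strength(u, v, attributes, neighbors, alpha=0.5):
    """Hypothetical f(attributes; links): blend attribute and link similarity.

    alpha weights the attribute part, (1 - alpha) the link part.
    """
    attr_sim = jaccard(attributes[u], attributes[v])
    # Compare friend lists, ignoring the two nodes themselves.
    link_sim = jaccard(neighbors[u] - {v}, neighbors[v] - {u})
    return alpha * attr_sim + (1 - alpha) * link_sim


# Toy data (made up): interests per user and an undirected friendship graph.
attributes = {
    "an": {"music", "games"},
    "binh": {"music", "news"},
    "chi": {"karaoke"},
}
neighbors = {
    "an": {"binh", "chi"},
    "binh": {"an", "chi"},
    "chi": {"an", "binh"},
}

score = tie_strength("an", "binh", attributes, neighbors)
print(f"tie strength an-binh: {score:.3f}")
```

Any other combination (learned weights, interaction counts on weighted edges) would fit the same signature; the point is only that both node attributes and link structure feed the score.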

3.1 Social Graph Characteristics
  • Sparse networks: the number of links is proportional to the number of nodes.
  • Small-world effect: the shortest path between two random nodes is, on average, short.
  • This property is related to the distribution of the degrees of the nodes: scale-free network (Barabasi, 2000).
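The three characteristics can each be measured directly on a graph. The sketch below is an illustration on a made-up six-node graph, not data from the slides: it computes edges per node (sparsity), the average shortest-path length via breadth-first search (small-world effect), and the degree histogram (the quantity a scale-free analysis would examine).

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy undirected graph: hub 0 linked to nodes 1..5, plus one extra edge (1, 2).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

edges_per_node = len(edges) / len(adj)
avg_path = average_path_length(adj)
degree_histogram = Counter(len(nbs) for nbs in adj.values())

print("edges per node:", edges_per_node)
print("average shortest path:", avg_path)
print("degree histogram:", dict(degree_histogram))
```

All-pairs BFS costs O(nodes × edges), so on a graph with millions of users the average path length would normally be estimated from a sample of source nodes rather than computed exactly.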

3.2 Interesting Problems: Community Detection
Community detection in social networks:
  • Partition the graph into clusters
  • Find the (small) community around a given node
Why community detection?
  • Capture the network's dynamics
  • Allow local analysis of interactions
  • Reveal network properties without disclosing individuals' private information
Methods:
  • Clustering based on shortest-path betweenness
  • Clustering based on network modularity
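As an illustration of the modularity-based approach, the sketch below scores a given partition with the standard modularity measure for an undirected, unweighted graph: for each community, the fraction of edges that fall inside it minus the fraction expected from its total degree. The toy graph and function name are assumptions; a full method would also search over partitions, which is not shown here.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of degrees of the nodes in c.
    """
    m = len(edges)
    internal = {}    # community -> edges with both ends inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_big_group = {n: "A" for n in range(6)}

q_split = modularity(edges, two_triangles)
q_single = modularity(edges, one_big_group)
print(f"two communities: Q = {q_split:.4f}")
print(f"one community:   Q = {q_single:.4f}")
```

Splitting at the bridge scores higher than lumping everything together, which is the signal a modularity-maximizing clustering follows.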

3.2 Interesting Problems: Node Classification
Node classification for social networks: labeling nodes in the network to indicate demographic values, interests, beliefs, or other characteristics.
Applications:
  • Input for recommendation: suggest new connections and objects
  • Personalized ads tailored to users' interests
  • Find communities based on interests or affiliation
  • Study how ideas spread over time
Methods:
  • Methods based on traditional classifiers using graph information
  • Graph-based methods
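One simple graph-based method is to let unlabeled nodes take the majority label of their already-labeled neighbors and repeat until nothing changes. The slides do not name a specific algorithm, so the sketch below is one possible instance; the function name, the tie-breaking rule, and the toy graph are assumptions.

```python
from collections import Counter


def propagate_labels(adj, seed_labels, max_rounds=10):
    """Label unlabeled nodes by majority vote of their labeled neighbors.

    Updates are synchronous: votes in a round only see labels known at the
    start of that round. Ties are broken alphabetically for determinism.
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in adj:
            if node in labels:
                continue
            votes = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if votes:
                updates[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break
        labels.update(updates)
    return labels


# Toy friendship graph (undirected) with two users whose interest is known.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
seeds = {"a": "music", "f": "games"}

labels = propagate_labels(adj, seeds)
for node in sorted(labels):
    print(node, labels[node])
```

The classifier-based alternative on the slide would instead turn graph information (for example, the label counts among neighbors) into features for a conventional model.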

3.2 Interesting Problems: Link Prediction & Tie Strength
  • Link prediction: given a snapshot of a social network, infer which new interactions among its members are likely to occur in the near future.
  • Tie strength: a combination of the amount of TIME, the emotional INTENSITY, the INTIMACY (mutual confiding), and the reciprocal SERVICES.
Applications:
  • Predict future friends
  • Find influential users in the network
  • Find possible links between users and objects (e.g. online items to be sold)
Methods:
  • Supervised learning: decision trees, logistic regression, support vector machines …
  • Graph-based methods
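A common graph-based baseline scores every pair of users who are not yet connected by how many neighbors they share, giving more weight to shared neighbors who have few connections of their own (the Adamic-Adar index). The slides do not say which score is used, so this choice and the toy snapshot below are assumptions for illustration.

```python
import math
from itertools import combinations


def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum of 1 / log(degree) over common neighbors.

    A common neighbor is linked to both u and v, so its degree is at
    least 2 and the logarithm is always positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_candidate_links(adj):
    """Score all non-adjacent pairs, highest score first."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scored.append(((u, v), adamic_adar(adj, u, v)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy snapshot of an undirected friendship graph.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranked = rank_candidate_links(adj)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In a supervised setup, scores like this one typically become input features for the classifiers listed above, with links that appear in a later snapshot serving as positive examples.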

3.2 Interesting Problems: Information Flow
Information flow through social media: analyzing the underlying mechanisms for the real-time spread of information through online networks.
Motivating questions:
  • How do messages spread through social networks?
  • How can the spread of information be predicted?
  • How can the networks over which the messages spread be identified?
Applications:
  • Indicate trends and attention
  • Predictive modeling of the spread of new ideas and behaviors
  • Search: real-time search, social search
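One widely used way to model how a message spreads is the independent cascade model: each newly informed user gets a single chance to pass the message to each uninformed follower, succeeding with probability p. The slides do not name a diffusion model, so this is a swapped-in illustration; the follower graph, function names, and parameter values are assumptions.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """Simulate one cascade; return the set of users the message reached."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in adj.get(node, ()):
                # Each newly active node tries each inactive follower once.
                if nb not in active and rng.random() < p:
                    active.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return active


def expected_spread(adj, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the average cascade size."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(runs))
    return total / runs


# Toy directed graph: an edge u -> v means v sees what u posts.
adj = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

print("estimated reach from 'a' at p=0.5:", expected_spread(adj, ["a"], 0.5))
```

Comparing the estimated reach of different seed sets is one way to approach the viral marketing question of which users to target first.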

4. Conclusion and Discussion
Social media – rich, big & open data:
  • Billions of users, billions of content items
  • Textual and multimedia (images, videos, etc.)
  • Billions of connections
  • Behaviors, preferences, trends ...
Challenges:
  • Large-scale problems
  • Noise in data
  • Recommender systems for users and enterprises: maintain users' interest and attract new users to the network
  • Targeted marketing: show appropriate ads and items personalized for users
  • Predict users' interests and trends: make effective plans
  • …

Thank you for your attention!

Data mining for social media

  • 1. Data Mining for Social Media VNG Corporation – R&D Team 4/23/2011 1 VNG Corporation - R&D Team
  • 2. Content: Social Media Growth; Social Media Data; Data Mining for Social Media; Conclusion & Discussion.
  • 3. 1. Social Media Growth. Top sites globally: Google, Facebook, YouTube, Yahoo, Live, Baidu, Wikipedia, Blogger, MSN, Tencent, Twitter. Top sites in Vietnam: Google, Vnexpress, Zing.vn, Yahoo, YouTube, Facebook, Dantri.com.vn, 24h.com.vn, Mediafire, Vatgia.com.
  • 4. 1. Social Media Growth: Some Statistics. Facebook (largest social network site): 600,000,000 users, half of whom log in every day; 35,000,000,000 online friendships; 900,000,000 objects people interact with; 30,000,000,000 shared content items per month. YouTube (largest video sharing site): 2,000,000,000 views per day; 1,000,000 video hours uploaded per month. Twitter (largest microblogging site): 200,000,000 users per month; 65,000,000 tweets per day (750 per second); 8,000,000 followers of the most popular user. ZingMe (largest Vietnamese social network): 35,000,000 users, 10,000,000 monthly active; 260,000,000 online friendships; plenty of services: music, video, karaoke, games, news, chat, photo, blog …
  • 5. 2. Social Media Data. Social media data is everywhere. Social overload: information overload (blogs, microblogs, forums, wikis, news, bookmarked web pages, photos, videos, etc.) and interaction overload (friends, followers, followees, commenters, co-members, voters, “likers”, taggers, etc.). How can useful information be extracted from this chaos?
  • 6. 2. Social Media Data: Opportunities. Social media captures the pulse of humanity! We can directly study the opinions and behaviors of millions of users to gain insights into human behaviors, marketing analytics, and product sentiment. Applications & problems: WWW: search, information retrieval (grouping web sites or documents); targeted marketing: identifying groups of customers or products to make recommendations (targeted advertising, viral marketing); personalization (interfaces, services); epidemiology, fraud detection, security (counterterrorism) …
  • 7. Quick Recap. Social Media Growth; Social Media Data; Data Mining for Social Media (Social Network as a Graph; Interesting Problems: Community Detection, Node Classification, Link Classification & Tie Strength, Information Flow); Conclusion & Discussion.
  • 8. 3. Data Mining for Social Media. Data mining in social networks: graph mining (friendship graph, contact lists, interactions between users) and text mining (blogs, status updates, tweets…; texts and messages sent between users). Some interesting problems for data miners: modeling information flow (e.g. viral marketing); modeling evolution (e.g. link prediction); extracting information for learning (e.g. node classification, community detection).
  • 9. 3.1 Social Network as a Graph. A social network is a graph, but: nodes can have attributes; edges (links) may be weighted and/or directed, or not; so the similarity (tie strength, affinity) between two nodes is f(attributes; links); the network’s graph is not a simple random graph (it has special structural properties). Large-scale graphs: mining of large-scale graphs.
  • 10. 3.1 Social Graph Characteristics. Sparse networks: the number of links is proportional to the number of nodes. Small-world effect: the shortest path between two random nodes is small on average. This property is related to the distribution of the node degrees: scale-free network (Barabasi, 2000).
  • 11. 3.2 Interesting Problems: Community Detection. Community detection in social networks: partition the graph into clusters; find the (small) community around a given node. Why community detection? It captures the network’s dynamics, allows local analysis of interactions, and reveals properties without releasing individuals’ private information. Methods: clustering based on shortest-path betweenness; clustering based on network modularity.
  • 12. 3.2 Interesting Problems: Node Classification. Node classification for social networks: labeling nodes in the network to indicate demographic values, interests, beliefs, or other characteristics. Applications: used as input for recommendation: suggest new connections, objects; personalized ads tailored to users’ interests; finding communities based on interests or affiliation; studying how ideas spread over time. Methods: methods based on traditional classifiers using graph information; graph-based methods.
  • 13. 3.2 Interesting Problems: Link Prediction & Tie Strength. Link prediction: given a snapshot of a social network, infer which new interactions among its members are likely to occur in the near future. Tie strength: a combination of the amount of TIME, emotional INTENSITY, INTIMACY (mutual confiding), and reciprocal SERVICES. Applications: predict future friends; find influential users in the network; find possible links between users and objects (e.g. online items to be sold). Methods: supervised learning (decision trees, logistic regression, support vector machines …); graph-based methods.
  • 14. 3.2 Interesting Problems: Information Flow. Information flow through social media: analyzing the underlying mechanisms for the real-time spread of information through online networks. Motivating questions: How do messages spread through social networks? How can the spread of information be predicted? How can the networks over which messages spread be identified? Applications: indicating trends and attention; predictive modeling of the spread of new ideas and behaviors; search (real-time search, social search).
  • 15. 4. Conclusion and Discussion. Social media is rich, big, and open data: billions of users and billions of content items; textual and multimedia (images, videos, etc.); billions of connections; behaviors, preferences, trends... Challenges: large-scale problems; noise in data. Recommender systems for users and enterprises: maintain users’ interest and attract new users to the network. Targeted marketing: show appropriate ads and items personalized for users to. Predict users’ interests and trends: make effective plans. …
  • 16. Thank you for your attention!
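
Slide 13 lists "graph-based methods" for link prediction without naming one. As a minimal sketch of what such a method can look like, the example below ranks pairs of users who are not yet connected by the Jaccard coefficient of their neighbour sets, a common baseline that is swapped in here for illustration and is not necessarily what the authors used. The toy friendship graph, the user names, and the function names `jaccard` and `predict_links` are all hypothetical.

```python
from itertools import combinations

# Toy undirected friendship graph (hypothetical data, not from the slides).
# Each user maps to the set of users they are already connected to.
FRIENDS = {
    "an": {"binh", "chi", "dung"},
    "binh": {"an", "chi"},
    "chi": {"an", "binh", "dung"},
    "dung": {"an", "chi", "em"},
    "em": {"dung"},
}


def jaccard(graph, u, v):
    """Jaccard coefficient of two neighbour sets: shared friends / all friends."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0
    return len(graph[u] & graph[v]) / len(union)


def predict_links(graph, top_k=3):
    """Rank currently unconnected pairs by Jaccard score, highest first."""
    candidates = [
        (jaccard(graph, u, v), u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]  # skip pairs that are already friends
    ]
    # Sort by descending score, then by name so ties are ordered predictably.
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))
    return candidates[:top_k]


if __name__ == "__main__":
    for score, u, v in predict_links(FRIENDS):
        print(f"{u} - {v}: {score:.2f}")
```

On this toy graph the top suggestion is the pair "binh" and "dung", who share two of their three combined friends. A neighbourhood score like this uses only the link structure; the supervised methods named on the same slide could take such scores as input features alongside node attributes.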

Editor's Notes

  1. Firms are increasingly collecting data on the explicit social networks of consumers