Tutorial on Relationship Mining In Online Social Networks

Prepared as an assignment for CS410: Text Information Systems in Spring 2016

Published in: Technology

  1. Tutorial on Relationship Mining In Online Social Networks
     Peifeng Jing (NetID: pjing2), Department of Computer Science, University of Illinois Urbana-Champaign
     Prepared as an assignment for CS410: Text Information Systems in Spring 2016
  2. Agenda
     • Introduction
     • Basic Concepts
     • Text Mining in Social Network
     • Sub-fields of Online Social Networks
     • Relationship Mining Systems
     • Summary
  4. Social Networks Play An Important Role In Our Daily Lives
     • People communicate and share information through social relations with others, such as friends, family, colleagues, collaborators, and business partners.
     Image: http://socialnetworking.lovetoknow.com/image/37420~SocialNetworkingAnalysisSoftware.jpg
  5. Online Large-Scale Social Network Is Growing
     • For example, Facebook, Twitter, Google, etc.
     • Over 1.59 billion users on Facebook
     • Over 555 million users on Twitter
     Sources:
     http://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
     http://libeltyseo.com/wp-content/uploads/2013/03/social-networking.png
     http://www.bramblingdesign.com/being-visible-tips-on-how-to-best-use-social-network/
  6. There Are Many Cool Applications In Social Network and Relationship Mining
     • Business intelligence and market analysis
     • Discovery of community structures
     • Inferring social relationships
     • Personal profiling
     • E-mail filtering
     • Mining hidden advisor-advisee relationships in academic networks
     • Positive/negative relationship mining, e.g. trust networks and voting networks
     • Biological networks, e.g. predicting food webs, protein-protein interactions, metabolic networks, etc.
  7. But We Have Problems Finding Relations In Large-Scale Networks
     • We used to depend on users to label relations
     • Some personal information is hidden in the online social network
     • People generally use unstructured or semi-structured language for communication
  8. Can We Automatically Find Relations?
     • More digital data are available online
     • More people share personal information through social media
     Source: http://www.slideshare.net/julia594/social-networking-sites-selling-information-to-third-parties
  9. Agenda
     • Introduction
     • Basic Concepts
       – Problem Definition
       – Relationship Concepts in Sociology
     • Text Mining in Social Network
     • Sub-fields of Online Social Networks
     • Relationship Mining Systems
     • Summary
  10. What Is The Task of Relationship Mining?
      • Introduction
      • Basic Concepts
        – Problem Definition
        – Relationship Concepts in Sociology
      • Text Mining in Social Network
      • Sub-fields of Online Social Networks
      • Relationship Mining Systems
      • Summary
  11. What Is Relationship Mining?
      • User information and relationships are incomplete in online social networks
      • Relationship mining is the task of predicting or discovering the hidden relationships in social networks
  12. Formulated Definition of Relationship Mining
      • Given a network $G(V, E)$ and a list of attributes $A = \{A_1, \ldots, A_m\}$:
        – $V = \{v_0, \ldots, v_n\}$ represents the users.
        – $E = \{e_{ij}\}$ is the set of connections between users.
        – The attributes $A$ represent the profiles of users.
        – The labels $L = \{l_{ij}\}$ are the types of the edges $e_{ij}$.
      • For a user $v_i$, some information might be missing:
        – attributes $a_{ij}$ are unknown
        – edges $e_{ij}$ are missing
        – labels $l_{ij}$ are unknown
      • The task is to discover or predict the missing edges $e_{ij}$ and the unknown labels $l_{ij}$.
      [1] Wenbin Tang, Honglei Zhuang, and Jie Tang, Learning to Infer Social Ties in Large Networks, ECML/PKDD’11, 2011
      [2] Rui Li, Chi Wang, Kevin Chen-Chuan Chang, User Profiling in an Ego Network: Coprofiling Attributes and Relationships, WWW’14, April 7-11, 2014, Seoul, Korea, ACM 978-1-4503-2744-2/14/04
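The definition above can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class name `PartialNetwork`, its field names, and the use of `None` to mark unknown values are hypothetical choices for illustration, not part of the cited papers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class PartialNetwork:
    """Observed part of G(V, E) with attributes A and edge labels L (hypothetical layout)."""
    users: Set[str] = field(default_factory=set)    # V
    edges: Set[Edge] = field(default_factory=set)   # observed part of E
    # A: per-user profile; None marks an unknown attribute value
    attributes: Dict[str, Dict[str, Optional[str]]] = field(default_factory=dict)
    # L: per-edge relationship type; None (or absent) marks an unknown label
    labels: Dict[Edge, Optional[str]] = field(default_factory=dict)

    def unlabeled_edges(self) -> List[Edge]:
        """Observed edges whose label still has to be predicted."""
        return sorted(e for e in self.edges if self.labels.get(e) is None)

    def candidate_edges(self) -> List[Edge]:
        """Unordered user pairs with no observed edge: candidates for missing edges."""
        ordered = sorted(self.users)
        return [
            (u, v)
            for i, u in enumerate(ordered)
            for v in ordered[i + 1:]
            if (u, v) not in self.edges and (v, u) not in self.edges
        ]
```

In this layout, the two prediction targets of the task correspond to `candidate_edges()` (missing edges) and `unlabeled_edges()` (unknown labels).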
  13. How Do We Measure The Edges (Relations) In A Social Network?
      • Binary measurement
        – 0: absent
        – 1: existing
      • Strength of social tie
        – A concept from sociology
      [1] Mark Granovetter, The Strength of Weak Ties: A Network Theory Revisited, Sociological Theory, Vol. 1 (1983), pp. 201-233
      [2] Mark S. Granovetter, The Strength of Weak Ties, American Journal of Sociology, Vol. 78, No. 6 (1973), pp. 1360-1380
      [3] David Easley, Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press 2010, pp. 47-83
  14. What Does Sociology Tell Us?
      • Introduction
      • Basic Concepts
        – Problem Definition
        – Relationship Concepts in Sociology
      • Text Mining in Social Network
      • Sub-fields of Online Social Networks
      • Relationship Mining Systems
      • Summary
  15. Relationships Have Been Studied For A Long Time In Sociology (Since 1954)
      • Sociologists consider relationships as “social ties”
      • In sociology, a social tie is measured by its “strength”
      David Easley, Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press 2010, pp. 47-83
  16. What Is A Social Tie?
      • A social tie (also called an interpersonal tie) is an information-carrying connection between people
      • Types of ties
        – Strong tie: a connection to people you really trust, whose social circles tightly overlap with your own.
        – Weak tie: a connection to people who are merely acquaintances. Weak ties often provide access to novel information that does not circulate in the closely knit network of strong ties.
      • How do we distinguish strong and weak ties?
        – Strength of social ties
      Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI ’09 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 211-220
  17. Property of Social Tie: Strength
      • Strength of ties: the strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie
      • Dimensions of tie strength
        – Amount of time, intimacy, intensity, reciprocal services, network topology and informal social circles, emotional support, socioeconomic status, education level, political affiliation, race, gender, etc.
      [1] Mohammad Karim Sohrabi, Soodeh Akbar, A comprehensive study on the effects of using data mining techniques to predict tie strength, Computers in Human Behavior, 60 (2016), pp. 534-541
      [2] Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2009) pp. 211-220
  18. History of Tie Strength Research
      • The concept of tie strength was first introduced by Granovetter (1973), who split ties into “strong” and “weak”
      • The dimensions of tie strength were refined by Lin, Ensel, and Vaughn (1981) and by Wellman and Wortley (1990)
      • Marsden and Campbell (1984) conducted the first research on predicting tie strength
      • Krackhardt and Stern (1988) demonstrated that strong relationships between employees of different organizational sub-units can help an organization withstand a crisis
      • Strong partners were shown to create crisis and pressure for institutional change in the organization (Krackhardt 1992)
      • Granovetter (1995) demonstrated that weak ties are more beneficial for job seekers
  19. History of Tie Strength Research (Cont.)
      • Burt (2009) argued that structural factors, such as network topology and informal social circles, are effective in shaping tie strength.
      • Wilson et al., Viswanath et al., and Kahanda and Neville (2009) studied social graphs and different interaction patterns. They also used characteristic features, topological features, transactional characteristics and network-transactional features, and concluded that the most prominent features in predicting tie strength are network-transactional characteristics
      • Gilbert and Karahalios (2009) used 70 variables and achieved 85% accuracy in predicting tie strength
      • In 2011, a study was conducted using regression analysis, based on the principle that tie strength is a combination of variables such as friendship.
  20. History of Tie Strength Research (Cont.)
      • In 2013, Servia-Rodriguez, Diaz-Redondo, Fernandez-Vilas and Pazos-Arias extracted information through Facebook APIs (with users’ permission)
      • In 2014, performance was evaluated by means of BFF (Fogues, Such, Espinosa and Garcia-Fornes)
      • Lee, Lee and Hwang (2014) used perceived business ties to investigate trust transfer
      • Lin and Utz (2015) studied the role of tie strength in predicting the emotional outcomes of reading a post on Facebook
      • Chen, Liy, and Zou (2016) proposed a Social Tie Factor Graph (STFG) model to estimate the home locations of users in the Twitter network based on user-centric data and tie strength
  21. Another Approach To Categorize Relationships: Positive/Negative Ties
      • Positive and negative social ties
      • Also called “signed networks”
      • Positive relationship: links that indicate friendship, support or approval
      • Negative relationship: links that indicate disapproval, disagreement or distrust of opinions
      • It has cool applications in predicting voting and elections
      Jure Leskovec, Daniel Huttenlocher, Jon Kleinberg, Predicting Positive and Negative Links in Online Social Networks, WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA, ACM 978-1-60558-799-8/10/04
  22. Why Do We Care About Social Ties For Relationship Mining?
      • We can use the strength of a social tie to determine whether there is a relation between users
      • According to sociological theory, different types of social ties represent different properties of the social network
  23. How To Do Relationship Mining?
      • Introduction
      • Basic Concepts
      • Text Mining in Relationship Mining
      • Sub-fields of Online Social Networks
      • Relationship Mining Systems
      • Summary
  24. Text Mining Is An Important Technology In Relationship Mining
      • The most popular social networking websites are Facebook, LinkedIn, and MySpace, where text is the dominant way of communication
      • People in online social networks generally use unstructured or semi-structured language for communication
      Sources:
      https://dcurt.is/facebooks-predicament
      http://www.forbes.com/forbes/welcome/
  25. What Can Text Mining Do?
      • Pre-processing:
        – Feature Extraction
        – Feature Selection
        – Document Representation
      Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170
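As one possible illustration of the pre-processing steps listed above, the sketch below extracts word features from short posts and represents each post as a TF-IDF weighted vector. The whitespace tokenizer and the unsmoothed `log(N / df)` weighting are assumptions chosen for brevity; the survey cited on the slide covers many alternatives.

```python
import math
from collections import Counter


def tokenize(text):
    """Feature extraction: lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Document representation: one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document receive weight 0 under this weighting, so a simple form of feature selection would be to drop zero-weight terms before classification or clustering.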
  26. What Can Text Mining Do? (cont.)
      • Classification
        – Ontology Based
        – Machine Learning Based
      Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170
  27. What Can Text Mining Do? (cont.)
      • Clustering
        – Hierarchical Clustering
        – Partitional Clustering
        – Semantic-based Clustering
      Rizwana Irfan, et al., A Survey on Text Mining in Social Networks, The Knowledge Engineering Review, 30(2) (2015), pp. 157-170
  29. Three Sub-fields of Online Social Networks
      • Data
        – Data acquisition, storage and visualization
        – Scalability for large-scale network
      • Approaches on Relationship Mining
        – Strength of social ties
        – Positive/Negative Social Ties Prediction
        – Relationship classification
        – Relationship mining
      • Association between users’ attributes and relationships
  30. Data Acquisition, Storage and Visualization
      • Acquisition: HTML Web pages; FOAF profiles from the Semantic Web (using an RDF crawler); collections of emails (from a POP3 or IMAP store); bibliographic data; publication data and research profiles from the Web; telecommunication data; and so on
      • Storage: Sesame server (Flink); RNKB (researcher network knowledge base, ArnetMiner); Hadoop (BC-PDM)
      • Visualization: Model-View-Controller, Java Server Pages and Java Standard Tag Library (Flink)
      [1] Yutaka Matsuo, et al., POLYPHONET: An advanced social network extraction system from the Web, Web Semantics: Science, Services and Agents on the World Wide Web, 2007
      [2] Peter Mika, Flink: Semantic Web technology for the extraction and analysis of social networks, J. Web Semantics 3 (2) (2005) 211–223
      [3] Jie Tang, et al., ArnetMiner: Extraction and Mining of Academic Social Networks, KDD’08, August 24-27, 2008, Las Vegas, Nevada, USA.
      [4] Le Yu, et al. BC-PDM: Data Mining, Social Network Analysis and Text Mining System Based on Cloud Computing, KDD’12, Beijing China (2012) 1496-1499
  31. Scalability for Large-Scale Network
      • Filtering out pairs of persons that seem to have no relation (POLYPHONET, TPFG model)
      • Parallel and cloud computing: Sesame Server (Flink), Hadoop (BC-PDM)
      Sources:
      https://clinked.com/2016/02/23/cloud-computing-benefits-drawbacks/
      http://www.ahay.org/wiki/Parallel_Computing
  32. Strength Mining of Social Ties
      • Model strength as a linear combination of the predictive variables and network structure
      • Predictive variables: friendship, the intensity of feelings, intimacy, mutual trust and mutual services, and so on
      • Gilbert & Karahalios achieved an accuracy of about 85% with more than 70 predictive variables
      [1] Mohammad Karim Sohrabi, Soodeh Akbar, A comprehensive study on the effects of using data mining techniques to predict tie strength, Computers in Human Behavior, 60 (2016), pp. 534-541
      [2] Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2009) pp. 211-220
  33. Example (Gilbert & Karahalios 2009)
      • Predictive variables from online social media
        – Intensity: wall words exchanged, participant-initiated wall posts, friend-initiated wall posts, inbox messages exchanged, inbox thread depth, participant’s status updates, friend’s status updates
        – Intimacy: participant’s number of friends, friend’s number of friends, days since last communication, wall intimacy words, inbox intimacy words, appearances together in photo, participant’s appearance in photo, distance between hometowns (mi), friend’s relationship status
        – Duration: days since first communication
        – Reciprocal Services: links exchanged by wall post, applications in common
        – Structural: number of mutual friends, groups in common, Norm. TF-IDF of interests and about
        – Emotional Support: wall & inbox positive emotion words, wall & inbox negative emotion words
        – Social Distance: age difference (days), number of occupations difference, educational difference (degrees), overlapping words in religion, political difference (scale)
      • Model
        $$s_i = \alpha + \beta R_i + \gamma D_i + N(i) + \epsilon_i$$
        $$N(i) = \lambda_0 \mu_M + \lambda_1 \mathrm{med}_M + \sum_{k=2}^{4} \lambda_k \sum_{s \in M} (s - \mu_M)^k + \lambda_5 \min M + \lambda_6 \max M$$
        $$M = \{ s_j : j \text{ and } i \text{ are mutual friends} \}$$
      Eric Gilbert and Karrie Karahalios, Predicting Tie Strength With Social Media, CHI’09 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2009) pp. 211-220
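To make the network-structure term N(i) in the model above concrete, here is a minimal sketch that evaluates it for a given set M of mutual-friend tie strengths. The function names, the treatment of R_i and D_i as plain feature vectors, the choice to return 0 when M is empty, and all coefficient values are assumptions for illustration; the slide does not specify them, and the error term is omitted.

```python
from statistics import mean, median


def neighborhood_term(mutual_strengths, lam):
    """N(i) for M = tie strengths of mutual friends; lam holds lambda_0 .. lambda_6."""
    if not mutual_strengths:
        return 0.0  # assumption: no mutual friends contributes nothing
    mu = mean(mutual_strengths)
    total = lam[0] * mu + lam[1] * median(mutual_strengths)
    # Second to fourth central moment sums: sum over s in M of (s - mu)^k
    for k in (2, 3, 4):
        total += lam[k] * sum((s - mu) ** k for s in mutual_strengths)
    total += lam[5] * min(mutual_strengths) + lam[6] * max(mutual_strengths)
    return total


def tie_strength(alpha, beta, r_i, gamma, d_i, mutual_strengths, lam):
    """s_i = alpha + beta . R_i + gamma . D_i + N(i), with the error term left out."""
    def dot(weights, values):
        return sum(w * x for w, x in zip(weights, values))
    return alpha + dot(beta, r_i) + dot(gamma, d_i) + neighborhood_term(mutual_strengths, lam)
```

In practice the coefficients would be fitted by regression on labeled tie strengths rather than set by hand.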
  34. Positive/Negative Prediction
      • Edge sign prediction problem: given a social network with signs on all its edges, suppose the sign on the edge from node u to node v, denoted s(u, v), has been “hidden.” How reliably can we infer this sign s(u, v) using the information provided by the rest of the network?
      • Related work: Leskovec et al. (2010) proposed a method to predict the signs of links (positive or negative), yet the prediction of both the existence of a link and its sign has not been well studied. Recent developments in social balance theory may provide useful hints
  35. Example (Leskovec et al. 2010)
      • Features: 23 features in the machine learning classification
        – The first class of features is based on degrees. $d_{in}^{+}(v)$ and $d_{in}^{-}(v)$ denote the number of incoming positive and negative edges to $v$, respectively. Similarly, $d_{out}^{+}(u)$ and $d_{out}^{-}(u)$ denote the number of outgoing positive and negative edges from $u$, respectively. $C(u, v)$ denotes the total number of common neighbors of $u$ and $v$ in an undirected sense
        – The second class of features covers each triad involving the edge $(u, v)$, consisting of a node $w$ such that $w$ has an edge either to or from $u$ and also an edge either to or from $v$; with two directions and two signs for each of the two edges, this leads to 16 possibilities
      • Logistic regression model
        $$P(+ \mid x) = \frac{1}{1 + \exp\left[-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)\right]}$$
      Jure Leskovec, Daniel Huttenlocher, Jon Kleinberg, Predicting Positive and Negative Links in Online Social Networks, WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA, ACM 978-1-60558-799-8/10/04
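The degree-based features and the logistic model above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the dictionary representation of the signed graph, the function names, and the coefficient values are assumptions, only the five degree features named on the slide are computed, and the hidden edge (u, v) is assumed to be absent from the input.

```python
import math


def neighbors(signed_edges, x):
    """Nodes linked to x by an edge in either direction (self-loops ignored)."""
    return {b if a == x else a for (a, b) in signed_edges if x in (a, b) and a != b}


def degree_features(signed_edges, u, v):
    """Return [d_in+(v), d_in-(v), d_out+(u), d_out-(u), C(u, v)].

    signed_edges maps a directed edge (src, dst) to its sign, +1 or -1.
    """
    in_pos = sum(1 for (a, b), s in signed_edges.items() if b == v and s > 0)
    in_neg = sum(1 for (a, b), s in signed_edges.items() if b == v and s < 0)
    out_pos = sum(1 for (a, b), s in signed_edges.items() if a == u and s > 0)
    out_neg = sum(1 for (a, b), s in signed_edges.items() if a == u and s < 0)
    common = (neighbors(signed_edges, u) & neighbors(signed_edges, v)) - {u, v}
    return [in_pos, in_neg, out_pos, out_neg, len(common)]


def prob_positive(b0, b, x):
    """Logistic model: P(+ | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))
```

The coefficients b0 and b would normally be learned from edges with known signs; the 16 triad counts could be appended to the same feature vector.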
  36. Relationship Classification
      • Real-world domains are richly structured; entities of multiple types are related to each other
      • Many relationships (links) are hidden in online social networks
      • Classes of relationships: friends, colleagues, families, teammates, etc.
      • The relationship classification task: a model for effectively and efficiently mining relationship types
  37. Example (Tang, et al. 2011)
      • Features: user-specific information, link-specific information and global constraints
      • Relationship semantics: a triple $(e_{ij}, r_{ij}, p_{ij})$, where $e_{ij} \in E$ is a social relationship, $r_{ij} \in Y$ is a label associated with the relationship, and $p_{ij}$ is the probability (confidence) obtained by an algorithm for inferring the relationship type
      • Factor functions: attribute factor, correlation factor (correlation between the relationships), and constraint factor (constraints between relationships)
        $$p(Y \mid G) = \frac{1}{Z} \exp\left\{ \sum_{i} \theta^{T} s(y_i) \right\}$$
        where $\theta$ is the parameter configuration and $s$ denotes the factor functions
      Wenbin Tang, Honglei Zhuang, and Jie Tang, Learning to Infer Social Ties in Large Networks, ECML/PKDD’11, 2011
  38. Relationship Mining and Prediction
      • Many links are hidden in online networks. Generally, we do not know which links are missing, so we need to predict/mine hidden relationships
      • Two standard metrics are used to quantify the accuracy of prediction algorithms
        – Area under the receiver operating characteristic curve (AUC): given the ranking of all non-observed links, the AUC value can be interpreted as the probability that a randomly chosen missing link (i.e., a link in E) is given a higher score than a randomly chosen nonexistent link
          AUC = [(# of missing links that have a higher score) + 0.5 × (# of missing links that have the same score)] / (# of total comparisons)
        – Precision: given the ranking of the non-observed links, the precision is defined as the ratio of relevant items selected to the number of items selected
      Linyuan Lu, Tao Zhou, Link prediction in complex networks: A survey, Physica A: Statistical Mechanics and its Applications, 390 (6), 2011, pp. 1150-1170
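Both metrics above are easy to compute once every non-observed link has a score. The sketch below is one way to do it, under the assumption that every missing link is compared against every nonexistent link (the slide only says "total comparisons", so sampling a subset of pairs would be an equally valid reading). The function and parameter names are hypothetical.

```python
def auc(missing_scores, nonexistent_scores):
    """AUC over all (missing link, nonexistent link) score pairs."""
    total = len(missing_scores) * len(nonexistent_scores)
    higher = sum(1 for m in missing_scores for x in nonexistent_scores if m > x)
    equal = sum(1 for m in missing_scores for x in nonexistent_scores if m == x)
    return (higher + 0.5 * equal) / total


def precision_at(ranked_links, missing_links, top_l):
    """Fraction of the top_l ranked non-observed links that are truly missing links."""
    selected = ranked_links[:top_l]
    relevant = sum(1 for link in selected if link in missing_links)
    return relevant / len(selected)
```

An AUC of 0.5 corresponds to scores that carry no information, and values closer to 1 mean that missing links are consistently ranked above nonexistent ones.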
  39. Example (Wang et al. 2010)
      • Mining advisor-advisee relationships from research publication networks
      • Challenges
        – Latent relation
        – Time-dependent
        – Scalability
      • Joint probability
        $$P(\{y_i, st_i, ed_i\}_{a_i \in V_a}) = \frac{1}{Z} \prod_{a_i \in V_a} g(y_i, st_i, ed_i)$$
      • Rank score
        $$r_{ij} = \max P(y_1, \ldots, y_{n_a} \mid y_i = j)$$
      Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, Jingyi Guo, Mining Advisor-Advisee Relationships from Research Publication Networks, KDD’10 July 25-28, 2010, Washington, DC, USA
  40. Association of Users’ Attributes and Relationships
      • The attributes of users and their relationships are not independent of each other.
      • Two tasks at the same time
        – User attribute profiling
        – Relationship type profiling
      • Advantages
        – Can achieve higher accuracy (e.g. 70%-90% precision in Li et al, 2014)
  41. Example (Staiano et al, 2012)
      • Inferring personality traits from social networks
      • Features: centrality measures, small world and efficiency measures, transitivity measures, triadic measures
      • Personality trait scores are quantized into two classes (Low/High). Classification was performed by means of Random Forest
      Jacopo Staiano, et al. Friends don’t Lie – Inferring Personality Traits from Social Network Structure, UbiComp’12 Pittsburgh, USA (2012) 321-330
  42. Example (Li et al, 2014)
      • User profiling in an ego network
      • Social connections are discriminatively correlated with attributes via a hidden factor: relationship type
      • Features
        – The circle: a set of friends who have the same type of connections with the ego
        – The attribute-circle dependency: the friends in a circle share the same value with the ego user for certain attributes.
        – The circle-connection dependency: friends across circles are loosely connected
      • Cost function: the linear combination of the three features
        $$\mathrm{cost} = \lambda_1 \sum_{t=1}^{K} \left\{ \sum_{e_{ij} \in E',\, v_i, v_j \in C_i} \left(w_t \cdot (f_i - f_j)\right)^2 + \sum_{v_i \in C_i} \left(w_t \cdot (f_0 - f_i)\right)^2 \right\} + \lambda_2 \sum_{t=1}^{K} \sum_{v_i \in L \cap C_i} \left(w_t \cdot f_i - 1\right)^2 + \lambda_3 \sum_{e_{ij} \in E',\, x_i \neq x_j} 1 \qquad (1)$$
      Rui Li, Chi Wang, Kevin Chen-Chuan Chang, User Profiling in an Ego Network: Coprofiling Attributes and Relationships, WWW’14, April 7-11, 2014, Seoul, Korea, ACM 978-1-4503-2744-2/14/04
  44. There Are Many Well-Developed Relationship Mining Systems
      • Flink (Peter Mika, 2005)
      • POLYPHONET (Matsuo et al., 2007)
      • BC-PDM (Yu et al., 2012)
      • etc.
  45. Flink
      Semantic Web technology for the extraction and analysis of social networks
      • First layer: metadata acquisition
      • Second layer: storage and inference
      • Third layer: visualization
      Peter Mika, Flink: Semantic Web technology for the extraction and analysis of social networks, J. Web Semantics 3 (2) (2005) 211–223
  46. POLYPHONET
      • Social network extraction system that extracts relations of persons, detects groups of persons and obtains keywords for a person
      • Algorithms for social network extraction
        – Basic algorithm: co-occurrence, matching coefficient, Jaccard coefficient, overlap coefficient
        – Advanced algorithm: classifying relations
        – Scalability: GoogleCooc, GoogleCoocTop
      Figures: Overview of module dependency; Relate–identify process of Iterative Social Network Mining
      Yutaka Matsuo, et al., POLYPHONET: An advanced social network extraction system from the Web, Web Semantics: Science, Services and Agents on the World Wide Web, 2007
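The basic co-occurrence measures named above have standard set-based definitions, sketched here in terms of counts. The inputs are assumed to be hit counts: `n_x` and `n_y` are the numbers of documents mentioning each person, and `n_xy` is the number mentioning both. The function names and the choice to return 0.0 for empty counts are hypothetical; how POLYPHONET obtains and thresholds these counts is not shown on the slide.

```python
def matching_coefficient(n_x, n_y, n_xy):
    """Raw co-occurrence count |X and Y|."""
    return n_xy


def jaccard_coefficient(n_x, n_y, n_xy):
    """|X and Y| / |X or Y|, with |X or Y| = n_x + n_y - n_xy."""
    union = n_x + n_y - n_xy
    return n_xy / union if union else 0.0


def overlap_coefficient(n_x, n_y, n_xy):
    """|X and Y| / min(|X|, |Y|)."""
    smaller = min(n_x, n_y)
    return n_xy / smaller if smaller else 0.0
```

For example, with 100 documents for one person, 50 for another, and 25 shared, the Jaccard coefficient is 25 / 125 and the overlap coefficient is 25 / 50, which shows how the overlap coefficient is less sensitive to one person being much more widely mentioned than the other.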
  47. BC-PDM
      • Data mining, social network analysis and text mining system based on cloud computing
      Figures: The result of community detection; The architecture of BC-PDM
      Le Yu, et al. BC-PDM: Data Mining, Social Network Analysis and Text Mining System Based on Cloud Computing, KDD’12, Beijing China (2012) 1496-1499
  48. Summary
      • Online social networks play an increasingly important role in our lives, and digital data allow us to automatically find relationships in our social networks
      • Text mining plays an important role in social network applications
        – Pre-processing: feature extraction, feature selection, document representation
        – Classification: ontology based, machine learning based
        – Clustering: hierarchical clustering, partitional clustering, semantic-based clustering
  49. Summary (cont.)
      Research in Relationship Mining
      • Data
        – Data acquisition, storage and visualization
        – Scalability for large-scale network
      • Approaches on Relationship Mining
        – Strength of social ties
        – Positive/Negative Social Ties Prediction
        – Relationship classification
        – Relationship mining
      • Association between users’ attributes and relationships
      There Are Many Systems for Relationship Extraction and Mining
      • Flink, POLYPHONET, BC-PDM, etc.