Learning to Classify Users in Online Interaction Networks
Georgios Rizos, Symeon Papadopoulos, and Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
ICCSS 2015, June 10, 2015, Helsinki, Finland
User Classification
#2
Twitter Handle   Labels
@nytimes         usa, press, new york
@HuffPostBiz     finance
@BBCBreaking     press, journalist, tv
@StKonrath       journalist
Examples from SNOW 2014 dataset
User Classification in (and outside) OSNs
#3
[Diagram labels: OSN; online activities; log files; APIs; Behaviour Observation; Profiling/Classification]
Network-based User Classification
• People with similar interests tend to connect
(homophily)
• Knowing about one’s connections
could reveal information
about them
• Knowing about
the whole network
structure could reveal
even more…
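The homophily argument above can be illustrated with a minimal sketch: guess a user's label from the most common label among their labelled neighbors. The graph, user names, and labels below are hypothetical, and this majority vote is only an illustration of the intuition, not the method presented in these slides.

```python
from collections import Counter


def neighbor_vote(adjacency, known_labels, user):
    """Guess a label for `user` as the most common label among labelled neighbors."""
    votes = Counter(
        known_labels[n] for n in adjacency.get(user, ()) if n in known_labels
    )
    if not votes:
        return None  # no labelled neighbor: nothing to infer from
    return votes.most_common(1)[0][0]


# Toy undirected graph (hypothetical users and labels).
adjacency = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": {"alice"},
    "dave": {"alice", "erin"},
    "erin": {"dave"},
}
known_labels = {"bob": "press", "carol": "press", "dave": "finance"}

guess = neighbor_vote(adjacency, known_labels, "alice")
isolated_guess = neighbor_vote(adjacency, known_labels, "zoe")
```

A vote over direct neighbors uses only local information; the later slides argue that the wider network structure carries more.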
#4
Related Work: User Classification
Graph-based semi-supervised learning:
• Label propagation (Zhu and Ghahramani, 2002)
• Local and global consistency (Zhou et al., 2004)
• Empirical evaluation of many graph kernels (Fouss et al., 2012)
Other approaches to user classification:
• Hybrid feature engineering for inferring user behaviors
(Pennacchiotti and Popescu, 2011; Wagner et al., 2013)
• Crowdsourcing Twitter list keywords for popular users
(Ghosh et al., 2012)
• Content-based, graph-regularized NMF for spammer detection
(Hu et al., 2013)
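As a point of reference for the graph-based semi-supervised baselines, here is a minimal sketch of label propagation in the spirit of Zhu and Ghahramani (2002): each unlabelled user repeatedly takes the average of its neighbors' class scores while the labelled (seed) users stay clamped. The toy graph, the function name, and the fixed iteration count are assumptions made for illustration.

```python
def label_propagation(adjacency, seed_labels, classes, iterations=100):
    """Iteratively average neighbors' class scores, clamping the seed users."""

    def one_hot(label):
        return {c: 1.0 if c == label else 0.0 for c in classes}

    # Unlabelled users start with a uniform score over the classes.
    scores = {
        v: one_hot(seed_labels[v])
        if v in seed_labels
        else {c: 1.0 / len(classes) for c in classes}
        for v in adjacency
    }
    for _ in range(iterations):
        updated = {}
        for v, neighbors in adjacency.items():
            if v in seed_labels:
                updated[v] = one_hot(seed_labels[v])  # clamp known labels
            elif neighbors:
                updated[v] = {
                    c: sum(scores[n][c] for n in neighbors) / len(neighbors)
                    for c in classes
                }
            else:
                updated[v] = scores[v]  # isolated user keeps its prior
        scores = updated
    return {v: max(classes, key=lambda c: scores[v][c]) for v in adjacency}


# Toy path graph a - b - c - d - e - f with one labelled user at each end.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
predicted = label_propagation(
    adjacency, {"a": "press", "f": "finance"}, ["press", "finance"]
)
```

On this path the users closer to "a" end up with its label and the users closer to "f" with the other one.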
#5
Related Work: Graph Feature Extraction
First attempts at using community detection:
• EdgeCluster: Edge centric k-means (Tang and Liu, 2009)
• MROC: Binary tree community hierarchy (Wang et al., 2013)
Low-rank matrix representation methods:
• Laplacian Eigenmaps: k eigenvectors of the graph Laplacian
(Belkin and Niyogi, 2003; Tang and Liu, 2011)
• Random-Walk Modularity Maximization: Does not suffer from
the resolution limit of ModMax (Devooght et al., 2014)
• Deepwalk: Deep representation learning (Perozzi et al., 2014)
#6
Overview of Framework
#7
[Framework diagram. Labels: Online social interactions (retweets, mentions, etc.); Social interaction user graph; ARCTE; Partial/Sparse Annotation; Unsupervised graph feature representation; Supervised graph feature representation; Feature Weighting; User Label Learning; Classified Users]
Network Features using ARCTE
• Based on user-centric community detection.
• For each user, we extract two types of user-centric
communities.
• Base user-centric community: $c_v = N(v) \cup \{v\}$
• Extended user-centric community: Consider a vector $p_v$ that
contains similarity values between the seed user $v$ and all
other users.
– By truncating appropriately, we can keep a community of the most
similar users to the seed $v$.
– We keep the fewest possible users such that we still include the seed
user’s direct neighbors.
• Denote the set of detected communities by $C$. We form the
feature matrix $X$ as follows:
$$x_{vi} = \begin{cases} 1, & \text{if } v \in c_i \\ 0, & \text{otherwise} \end{cases} \qquad \forall c_i \in C$$
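The definitions on this slide can be sketched in a few lines of Python. This is an illustrative reading, not the ARCTE implementation: the function names, the toy graph, and the similarity values are hypothetical, and ties in the similarity ranking are broken by user name only to keep the result deterministic.

```python
def base_community(adjacency, v):
    """Base user-centric community: the seed's neighbors plus the seed itself."""
    return frozenset(adjacency[v]) | {v}


def extended_community(adjacency, v, similarity):
    """Shortest prefix of the similarity ranking that covers all neighbors of v."""
    ranked = sorted(
        (u for u in similarity if u != v),
        key=lambda u: (-similarity[u], u),  # most similar first, ties by name
    )
    missing = set(adjacency[v])  # direct neighbors not yet included
    community = {v}
    for u in ranked:
        if not missing:
            break  # every direct neighbor is in: stop truncating here
        community.add(u)
        missing.discard(u)
    # Neighbors absent from the similarity vector are added explicitly.
    return frozenset(community | missing)


def feature_matrix(vertices, communities):
    """Binary matrix X with X[v][i] = 1 if vertex v belongs to community c_i."""
    return [[1 if v in c else 0 for c in communities] for v in vertices]


# Toy graph and hypothetical similarity values for the seed user "a".
adjacency = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
similarity_to_a = {"b": 0.30, "d": 0.25, "c": 0.20}

base = base_community(adjacency, "a")
extended = extended_community(adjacency, "a", similarity_to_a)
X = feature_matrix(["a", "b", "c", "d"], [base, extended])
```

In this toy case the extended community also contains "d", because "d" ranks above the direct neighbor "c" and the ranking is cut only once all direct neighbors are covered.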
#8
ARCTE: Toy Example
#9
Fast Approximate User-centric PageRank
• Given a seed user $v$, we calculate the user-centric PageRank
vector (i.e. the stationary distribution of a random walk with restarts, where every restart returns to $v$).
• The vector is localized and sparse; i.e. we neither propagate nor store
trivial values.
• Instead of approximating the PageRank vector, we
approximate cumulative PageRank differences. This gives a better
approximation in fewer iterations.
• We alternate between two update rules:
– Cumulative PR diff: $p^{(t+1)} = p^{(t)} + (1 - \rho)\, r^{(t-1)} W_u$
(instead of PR: $p^{(t+1)} = p^{(t)} + r^{(t)} I_u$ (Andersen et al., 2006))
– Residual distribution: $r^{(t+1)} = r^{(t)} - r^{(t)} I_u + (1 - \rho)\, r^{(t)} W_u$
where $\rho$ is the restart probability, $W_u$ is the $u$-th row of $W = D^{-1} A$, and $I_u$ is the $u$-th row of $I$.
• Finally, we divide each element of $p$ by the degree of the corresponding vertex in order to
get approximate, user-centric, regularized commute-times.
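For orientation, here is a minimal sketch of a push-style approximate personalized PageRank in the spirit of Andersen et al. (2006), followed by the degree normalization from the last bullet. It is the plain push scheme (a non-lazy variant), not the cumulative-difference update proposed on this slide. The function names, the threshold `epsilon`, the default restart value, and the toy graph are assumptions.

```python
from collections import deque


def approximate_pagerank(adjacency, seed, restart=0.15, epsilon=1e-6):
    """Push-style approximate personalized PageRank, stored sparsely as dicts."""
    p = {}           # approximate PageRank mass per vertex
    r = {seed: 1.0}  # residual mass that has not been distributed yet
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        degree = len(adjacency[u])
        r_u = r.get(u, 0.0)
        if degree == 0 or r_u < epsilon * degree:
            continue  # residual too small to be worth propagating
        p[u] = p.get(u, 0.0) + restart * r_u      # keep a share at u
        r[u] = 0.0                                # remove u's residual
        share = (1.0 - restart) * r_u / degree    # spread the rest along row W_u
        for v in adjacency[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= epsilon * len(adjacency[v]):
                queue.append(v)
    return p, r


def degree_normalized(p, adjacency):
    """Divide each PageRank value by the degree of its vertex."""
    return {u: mass / len(adjacency[u]) for u, mass in p.items() if adjacency[u]}


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c - d.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
p, r = approximate_pagerank(adjacency, "a")
scores = degree_normalized(p, adjacency)
```

Only vertices that receive a non-trivial residual are ever touched, which is what keeps the vector localized and sparse on large graphs. The normalized scores could then serve as the similarity vector $p_v$ used to build the extended community.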
#10
Community Weighting
• We perform a supervised community weighting step to
boost the importance of highly predictive communities.
• For each community $c_i$ we calculate a weight:
$w_i = \chi^2_i \times \mathrm{ivf}(i)$
• The first factor is based on supervised chi-squared weighting
that quantifies the correlation among all feature-label pairs.
– PSNR aggregation across labels:
$$\chi^2_i = \frac{\max_l \chi^2_{i,l} - \min_l \chi^2_{i,l}}{\text{within-label variability}}$$
• The second factor is unsupervised inverse vertex frequency.
– Consider idf with vertices as terms and communities as documents.
• We multiply each column of $X$ by the corresponding weight.
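The weighting step can be sketched as follows, under several loudly stated assumptions. The chi-squared statistic is the standard one for a 2x2 contingency table. The slide does not define the within-label variability, so it is left as a caller-supplied value (`variability`, set to 1.0 here). The form $\mathrm{ivf}(i) = \log(|V| / |c_i|)$ is an assumed idf analogue, and every toy vertex is treated as labelled. None of the names below come from the authors' code.

```python
import math


def chi_squared(feature, label):
    """Chi-squared statistic of the 2x2 contingency table of two binary vectors."""
    n = len(feature)
    a = sum(1 for f, y in zip(feature, label) if f and y)
    b = sum(1 for f, y in zip(feature, label) if f and not y)
    c = sum(1 for f, y in zip(feature, label) if not f and y)
    d = n - a - b - c
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a constant feature or label carries no information
    return n * (a * d - b * c) ** 2 / denominator


def community_weights(X, Y, variability=1.0):
    """One weight per column of X: chi-squared spread across labels times ivf."""
    n_vertices = len(X)
    weights = []
    for i in range(len(X[0])):
        column = [row[i] for row in X]
        per_label = [
            chi_squared(column, [row[l] for row in Y]) for l in range(len(Y[0]))
        ]
        # Assumption: the denominator is supplied by the caller.
        spread = (max(per_label) - min(per_label)) / variability
        size = sum(column)
        # Assumption: ivf(i) = log(|V| / |c_i|).
        ivf = math.log(n_vertices / size) if size else 0.0
        weights.append(spread * ivf)
    return weights


def apply_weights(X, weights):
    """Multiply each column of X by its community weight."""
    return [[x * w for x, w in zip(row, weights)] for row in X]


# Toy data: 6 vertices, 2 communities (columns of X), 2 labels (columns of Y).
X = [[1, 1], [1, 1], [1, 1], [0, 1], [0, 1], [0, 0]]
Y = [[1, 1], [1, 0], [1, 0], [0, 1], [0, 0], [0, 0]]

weights = community_weights(X, Y)
X_weighted = apply_weights(X, weights)
```

In this toy case the first community coincides with the first label and is small, so it receives the larger weight; the second community covers almost every vertex and is weighted down by both factors.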
#11
Evaluation: Dataset Description
#12
Dataset                                      Labels  Vertices   Vertex Type      Edges      Edge Type
SNOW2014 Graph (Papadopoulos et al., 2014)   90      533,874    Twitter Account  949,661    Mentions + Retweets
IRMV-PoliticsUK (Greene & Cunningham, 2013)  5       419        Twitter Account  11,349     Mentions + Retweets
ASU-YouTube (Mislove et al., 2007)           47      1,134,890  YouTube Channel  2,987,624  Subscriptions
ASU-Flickr (Tang and Liu, 2009)              195     80,513     Flickr Account   5,899,882  Contacts
Ground truth generation:
• SNOW2014 Graph: Twitter list aggregation & post-processing
• IRMV-PoliticsUK: Manual annotation
• ASU-YouTube: User membership to group
• ASU-Flickr: User subscription to interest group
Evaluation: SNOW 2014 dataset
#13
SNOW2014 Graph (534K vertices, 950K edges): Twitter mentions + retweets;
ground truth based on Twitter list processing
Evaluation: Insight Politics UK
#14
Insight-Multiview-PoliticsUK (419 vertices, 11K edges): mentions + retweets;
ground truth based on manual annotation
Evaluation: ASU-YouTube
#15
ASU-YouTube (1.1M vertices, 3M edges): YouTube subscriptions;
ground truth based on membership to groups
Evaluation: ASU-Flickr
#16
ASU-Flickr (80K vertices, 5.9M edges): Flickr contacts;
ground truth based on membership to Flickr groups
Evaluation: Community Weighting
#17
Conclusion
• Key ideas:
– new user feature representation based on user-centric
communities
– community weighting based on sparse annotations
– consistently good performance both on interaction
(mention/retweet) and affiliation (follow/subscribe)
graphs
• Future Work:
– integration of additional signals (content)
– investigating feasibility on other classification problems,
e.g. spammer detection
#18
Thank you!
• Resources:
Slides: http://www.slideshare.net/sympapadopoulos/learning-to-classify-users-in-online-interaction-networks
Code: https://github.com/MKLab-ITI/reveal-user-classification
https://github.com/MKLab-ITI/reveal-user-annotation
• Get in touch:
@sympapadopoulos / papadop@iti.gr
@georgios_rizos / georgerizos@iti.gr
#19
References (1/3)
• Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction
and data representation. Neural computation, 15(6), 1373-1396.
• Tang, L., & Liu, H. (2011). Leveraging social media networks for classification. Data
Mining and Knowledge Discovery, 23(3), 447-478.
• Devooght, R., Mantrach, A., Kivimäki, I., Bersini, H., Jaimes, A., & Saerens, M.
(2014, April). Random walks based modularity: application to semi-supervised
learning. In Proceedings of the 23rd international conference on World wide web
(pp. 213-224). International World Wide Web Conferences Steering Committee.
• Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of
social representations. In Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 701-710). ACM.
• Tang, L., & Liu, H. (2009, November). Scalable learning of collective behavior based
on sparse social dimensions. In Proceedings of the 18th ACM conference on
Information and knowledge management (pp. 1107-1116). ACM.
• Wang, X., Tang, L., Liu, H., & Wang, L. (2013). Learning with multi-resolution
overlapping communities. Knowledge and information systems, 36(2), 517-535.
#20
References (2/3)
• Zhu, X., & Ghahramani, Z. (2002). Learning from labeled and unlabeled data with label
propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University.
• Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and
global consistency. Advances in neural information processing systems, 16(16), 321-328.
• Fouss, F., Francoisse, K., Yen, L., Pirotte, A., & Saerens, M. (2012). An experimental
investigation of kernels on graphs for collaborative recommendation and semisupervised
classification. Neural Networks, 31, 53-72.
• Pennacchiotti, M., & Popescu, A. M. (2011, August). Democrats, republicans and starbucks
afficionados: user classification in twitter. In Proceedings of the 17th ACM SIGKDD
international conference on Knowledge discovery and data mining (pp. 430-438). ACM.
• Ghosh, S., Sharma, N., Benevenuto, F., Ganguly, N., & Gummadi, K. (2012, August). Cognos:
crowdsourcing search for topic experts in microblogs. In Proceedings of the 35th
international ACM SIGIR conference on Research and development in information retrieval
(pp. 575-590). ACM.
• Hu, X., Tang, J., Zhang, Y., & Liu, H. (2013, August). Social spammer detection in
microblogging. In Proceedings of the Twenty-Third international joint conference on Artificial
Intelligence (pp. 2633-2639). AAAI Press.
• Wagner, C., Asur, S., & Hailpern, J. (2013, September). Religious politicians and creative
photographers: Automatic user categorization in twitter. In Social Computing (SocialCom),
2013 International Conference on (pp. 303-310). IEEE.
#21
References (3/3)
• Andersen, R., Chung, F., & Lang, K. (2006, October). Local graph
partitioning using pagerank vectors. In Foundations of Computer Science,
2006. FOCS'06. 47th Annual IEEE Symposium on (pp. 475-486). IEEE.
• Papadopoulos, S., Corney, D., & Aiello, L. M. (2014). SNOW 2014 Data
Challenge: Assessing the Performance of News Topic Detection Methods
in Social Media. In SNOW-DC@ WWW (pp. 1-8).
• Greene, D., & Cunningham, P. (2013, May). Producing a unified graph
representation from multiple social network views. In Proceedings of the
5th Annual ACM Web Science Conference (pp. 118-121). ACM.
• Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., & Bhattacharjee, B.
(2007, October). Measurement and analysis of online social networks. In
Proceedings of the 7th ACM SIGCOMM conference on Internet
measurement (pp. 29-42). ACM.
• Tang, L., & Liu, H. (2009, June). Relational learning via latent social
dimensions. In Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining (pp. 817-826). ACM.
#22
Auxiliary Slides
#23
Classifying Users using Network Structure
• Application of user-centric community detection to the problem
of graph-based user classification. We name our
approach ARCTE.
• Improved approximate, user-centric PageRank
calculation for better local graph exploration.
• Supervised community weighting step that boosts
the importance of highly predictive communities in
the feature representation.
• Extensive comparative study of numerous state-of-the-art
network feature extraction methods on
several social interaction datasets.
#24


Editor's Notes

  • #3 Topics: political/social attitudes, news stories, geographical area, user types/roles. Useful for news search/discovery. Potential privacy issues.
  • #4 Different kinds of user classification: topic-oriented (e.g., interest/expertise), role-based/behavioral (e.g., bot/spammer), geographical location. Useful for advertising, user recommendation, expert search, etc. For personal accounts, user classification raises privacy concerns. Challenges: multi-linguality, brevity, informal language.
  • #19 http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/