SDOW (ISWC2011)

http://sdow.semanticweb.org/2011/


  1. DIGITAL, Institute for Information and Communication Technologies. Pragmatic metadata matters: How data about the usage of data affects semantic user models. Claudia Wagner, Markus Strohmaier, Yulan He. Sunday, October 23, 2011
  2. Example: Semantic Metadata [Diagram: RDF graph with the types sioc:Post, sioc:UserAccount and foaf:Person, and the properties sioc:content, sioc:name, sioc:has_creator, sioc:account_of and rdf:type]
  3. Example: Pragmatic Metadata
  8. Aim: Can pragmatic metadata support the generation of semantic metadata, and if so, how? [Diagram: the RDF graph of slide 2 (sioc:Post, sioc:UserAccount, foaf:Person; sioc:content, sioc:name, sioc:has_creator, sioc:account_of, rdf:type), with sioc:topic and foaf:interest added and marked "?"]
  9. Experimental Setup
     • Methodology
       • Topic modeling algorithms to learn topics (probability distributions of words) and annotate users and posts with topics
       • Incorporated different types of pragmatic metadata into the topic models
       • Compared different models via their predictive performance
     • Dataset
       • Boards.ie
       • Forums, posts and users
       • Users' authoring and replying behavior
       • Training dataset: first and last week of February 2006
       • Test dataset: 3 future posts of each user
  10. Evaluation
     • Compare different models by testing their predictive performance on held-out posts
     • Measure: the log likelihood of each word of a user's future post given the learned model, summed over all words in that post
     • Assumption: a better user topic model is less perplexed by future posts authored by a user and needs fewer training samples
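The evaluation on slide 10 can be sketched as follows. The slide's formula itself is not recoverable, so this uses the standard held-out perplexity (the exponentiated negative mean log likelihood per word), which may differ in detail from what the authors computed. The function names, the probability floor and the toy two-topic model are all hypothetical.

```python
import math

def word_probability(word, user_topics, topic_words, floor=1e-12):
    """p(word | user) as a topic mixture: sum over t of theta_u[t] * phi_t[word]."""
    prob = sum(theta * phi.get(word, 0.0)
               for theta, phi in zip(user_topics, topic_words))
    return max(prob, floor)  # assumed floor, avoids log(0) for unseen words

def log_likelihood(post, user_topics, topic_words):
    """Sum of per-word log likelihoods over all words in one future post."""
    return sum(math.log(word_probability(w, user_topics, topic_words))
               for w in post)

def perplexity(post, user_topics, topic_words):
    """Standard perplexity: exp(-log likelihood / number of words)."""
    if not post:
        raise ValueError("post must contain at least one word")
    return math.exp(-log_likelihood(post, user_topics, topic_words) / len(post))

# Hypothetical toy model with two topics (word distributions).
topic_words = [
    {"mac": 0.5, "pc": 0.5},      # a "computers" topic
    {"goal": 0.5, "match": 0.5},  # a "football" topic
]
future_post = ["mac", "pc", "mac"]

# A user model that puts most weight on the right topic is less perplexed.
fits_well = perplexity(future_post, [0.9, 0.1], topic_words)
fits_badly = perplexity(future_post, [0.1, 0.9], topic_words)
```

Lower values mean the user model predicted the future post better, which is the sense in which the slide calls a model "less perplexed".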
  11. Methodology: LDA
     • How to learn topics and annotate users with topics?
     • Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
     • [Diagram: text mapped to topics T1, T2 and T3; example topic T1: mac: 0.3, iMac: 0.13, PC: 0.03, computer: 0.04, ....]
  12. Methodology: DMR
     • How to incorporate metadata into topic models?
     • Dirichlet Multinomial Regression (DMR) topic models (Mimno and McCallum, 2008)
     • Observe a feature vector x per document
     • Draw a "fresh" alpha for each document, which depends on the observed features x and the per-topic feature parameters λ_t: $\alpha_{dt} = \exp(x_d^\top \lambda_t)$
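The per-document prior on slide 12 can be illustrated with a small sketch. It only evaluates alpha_dt = exp(x_d · λ_t) for given weights; it does not learn the weights, which DMR does during training. The function name, the feature layout (an always-on default feature followed by two author indicators) and the weight values are hypothetical.

```python
import math

def dmr_alpha(features, topic_weights):
    """Return one Dirichlet parameter per topic: alpha_dt = exp(x_d . lambda_t)."""
    alphas = []
    for weights in topic_weights:
        if len(weights) != len(features):
            raise ValueError("each topic needs one weight per feature")
        score = sum(x * w for x, w in zip(features, weights))
        alphas.append(math.exp(score))
    return alphas

# Hypothetical document features: [default feature, author=A, author=B].
features = [1.0, 1.0, 0.0]

# Hypothetical per-topic weights lambda_t, one row per topic.
topic_weights = [
    [0.0, 1.0, -1.0],  # topic 0 is favoured when author A wrote the document
    [0.5, -2.0, 0.0],  # topic 1 is disfavoured for author A
]

alphas = dmr_alpha(features, topic_weights)
```

Because the exponential is always positive, every alpha is a valid Dirichlet parameter, and documents with different metadata receive different topic priors.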
  13. Methodology
     ID   Alg   Doc    Metadata
     M1   LDA   Post   -
     M2   LDA   User   -
     M3   DMR   Post   author
     M4   DMR   User   author
     M5   DMR   Post   reply-user
     M6   DMR   User   reply-user
     M7   DMR   Post   related-user
     M8   DMR   User   related-user
     [Diagram: timeline with past posts (Post 1 to Post 6) and a future post (Post 7), linked to User 1 and User 2 by "authored" and "replies to" relations]
  14. [Results plot; surviving labels:]
     • Post training scheme (M3, M5 and M7)
     • Different user activities performed on content
     • Baseline LDA (M1 and M2)
     • Models which take user replies into account (M6 and M8)
  15. Results [repeats the model table (M1 to M8) and the timeline diagram of slide 13]
  21. Results
     • The topics of users who reply to a user are also likely for this user
     • Therefore, if 2 users get replies from the same users, then they are more likely to talk about the same topics
     • Topic models which incorporate pragmatic metadata per user can indeed improve models
     • Topic models which incorporate pragmatic metadata per post often over-fit the data
     • Model assumptions are too strict!
     • Idea: incorporate behavioral user similarities
     • Intuition: users who are similar are more likely to talk about the same topics
     • How to measure behavioral similarity?
       • forum usage
       • communication behavior
  22. Methodology
     ID    Alg   Doc    Metadata
     M9    DMR   Post   top 10 forums
     M10   DMR   User   top 10 forums
     M11   DMR   Post   top 10 communication partner
     M12   DMR   User   top 10 communication partner
     [Diagram: timeline with past posts (Post 1 to Post 6) and a future post (Post 7) for User 1 and User 2, plus two top-10 forum lists, one per user: (f1, f2, f3, f4, f5, f6, f7, f8, f9, f10) and (f15, f20, f31, f12, f5, f6, f17, f18, f19, f10)]
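The "top 10 forums" metadata of slide 22 could be turned into model features roughly as sketched below. The slides do not say how the top-10 lists were encoded, so the binary indicator vector is an assumption, as are the function names. The two forum lists are the ones that survive from the slide's diagram; which list belongs to which user is not recoverable, so the assignment here is arbitrary.

```python
from collections import Counter

def top_k(items, k=10):
    """Return the k most frequent items; ties are broken alphabetically."""
    counts = Counter(items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:k]

def indicator_features(top_items, vocabulary):
    """Binary feature vector: 1.0 where a vocabulary entry is in the top list."""
    chosen = set(top_items)
    return [1.0 if item in chosen else 0.0 for item in vocabulary]

# Top-10 forum lists taken from the slide's diagram.
user1_forums = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
user2_forums = ["f15", "f20", "f31", "f12", "f5", "f6", "f17", "f18", "f19", "f10"]

# One feature per forum that appears in any user's top list.
vocabulary = sorted(set(user1_forums) | set(user2_forums))
x1 = indicator_features(user1_forums, vocabulary)
x2 = indicator_features(user2_forums, vocabulary)

# Forums both users share; shared features give both users a similar prior.
shared = sorted(set(user1_forums) & set(user2_forums))
```

In practice the top-10 list would come from counting the forums of a user's posts, for example `top_k(forum_of_each_post, k=10)`, where `forum_of_each_post` is a hypothetical list of forum IDs. The same encoding would apply to top-10 communication partners.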
  23. [Results plot; surviving labels:]
     • Post training scheme (M3, M9 and M11)
     • Baseline LDA (M1 and M2)
     • User training scheme (M4, M10 and M12)
     • Model M12 incorporates user similarities based on their communication behavior
  24. Results
     • Topic models seem to benefit from taking behavioral user similarities into account
     • Users who behave similarly (regarding their forum usage and communication behavior) are likely to talk about the same topics
     • Common communication partners seem to be more predictive of common topics than common forums
  25. Conclusions
     • Pragmatic metadata may help to learn better semantic user models
     • But pragmatic metadata observed on a post level often over-fits the data
     • Pragmatic metadata on a user level seems to improve the predictive performance of topic models
     • If posts of 2 users are “used” in a similar way, then they are more likely to talk about the same topics
     • If 2 users behave similarly (tend to post to the same forums or tend to talk to the same users), they are more likely to talk about the same topics
     • Common communication partners seem to be more predictive of common topics than common forums
  26. Limitations and Future Work
     • Perplexity and semantic interpretability of topics do not necessarily correlate (Chang et al., 2009)
     • Separate evaluation of the semantic coherence of topics
     • Analyze different types of behavior- and usage-related metadata and explore to what extent they may reveal information about the semantics of data
       • behavior on social streams such as Twitter
       • tagging behavior
       • navigation behavior
  27. References
     • Blei, D. M., Ng, A. and Jordan, M. Latent Dirichlet Allocation. JMLR 3 (2003), pp. 993-1022
     • Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C. and Blei, D. Reading Tea Leaves: How Humans Interpret Topic Models. In Neural Information Processing Systems, NIPS (2009)
     • Mimno, D. M. and McCallum, A. Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. In Proceedings of UAI (2008), pp. 411-418
