Pharos Social Map Based Recommendation For Content Centric Social Websites

Transcript

  • 1. IBM China Research Laboratory Social Map Based Recommendation for Content-Centric Social Websites IBM Research - China Presenter: Shiwan Zhao (zhaosw@cn.ibm.com) Pharos Team: 赵石顽 袁泉 张夏天 郑文涛 Advisor: Michelle Zhou, Rongyao Fu, Changyan Chi 1
  • 2. About me
    1993~1998 – B.S. Computer Science, Tsinghua University
    1998~2000 – M.S. Computer Science, Tsinghua University
    2000~now – IBM Research - China
    2007~now – Focus on recommendation technologies
  • 3. Agenda
    Part 1:
    – Problem & challenges
    – Pharos solution overview
    – Demo
    Part 2:
    – Some technology details
  • 4. Problem
    Content-centric social websites (e.g., forums, wikis, and blogs) have flourished with the exponential growth of user-generated information, which is
    – Overwhelming in amount
    – Evolving over time
    – Not well organized
    It is hard for users, especially new users, to grasp what's out there and then find information that interests them
  • 5. Example
    A blog website contains a huge amount of dynamically evolving content (blog entries) but does not provide effective ways to navigate it
    – Search
      • Useful only when users have well-defined goals
    – Recent entries
    – Top entries by
      • most comments
      • most ratings
      • most visits
    – Featured blog entries
    – Tag cloud
    – …
    Like looking for a needle in a haystack: without guidance, novice users cannot find anything interesting, so they leave BlogCentral quickly and do not come back (low stickiness)
  • 6. Existing solutions & challenges
    Researchers have developed recommender systems to solve this information overload problem
    – E.g., blog/news/webpage recommenders
    However, current recommenders face two challenges:
    – It is difficult to make effective recommendations for new users (the cold start problem) due to the lack of user information
    – It is difficult to explain recommendation rationales to end users, which would make the recommendations more trustworthy
  • 7. Pharos Solution
    Dynamically create a social map that helps users find out who's talking about what on an online site.
    Social map creation
    – Model and summarize time-sensitive user behaviors on content-centric online sites as a set of “latent communities”
    Social map based recommendations
    – Provide social landmarks to give new users a jump start
    – Provide a personalized social map for experienced users to navigate the community effectively
  • 8. Demo screenshot
    [Screenshot not reproduced; visible name labels: John, Steve, Michael, Alice, Tom]
  • 9. Agenda
    Part 1:
    – Problem & challenges
    – Pharos solution overview
    – Demo
    Part 2:
    – Some technology details
  • 10. Pharos Overview
    [Architecture diagram; recoverable labels:]
    – Multi-faceted recommendation: info item (page, fragment), people (reference to Bluepages, URL), community (latent, dynamic community)
    – Triggers: explicit, implicit
    – Visual recommendation explanations
    – Recommendation algorithms
    – Social map: time-sensitive social map as recommendation context for the target user
    – Content modeling and behavior mining
    – Input: user behavior on content over time
  • 11. Pharos Technical Focus
    [Architecture diagram with three numbered focus areas:]
    – 1. Latent community extraction (content modeling and behavior mining over user behavior on content)
    – 2. Community/item/people ranking (recommendation algorithms on the social map)
    – 3. Community summary (visual recommendation explanations)
  • 12. Latent community extraction
    Three approaches
    – Directly model user-content relationships by using co-clustering methods
    – Group people first, then find the associated content
    – Group content first, then find the associated people
  • 13. Approach 1: time-elastic co-clustering
    What time window size should we use to mine the communities? How long is right?
    [Figure: user activity over time, segmented by time-elastic ad hoc community detection, with a community map for April 2009]
    GraphScope: Parameter-free Mining of Large Time-evolving Graphs, Jimeng Sun, et al. KDD’07
  • 14. Input Data – Graph Stream
    User actions as a stream
    [Figure: user actions plotted along a time axis]
    Split the click stream into many small atomic time frames
    [Figure: the same stream divided into frames along the time axis]
    The click stream data of one frame can be represented by a user-item matrix (graph).
    – In the matrix, 1 means an interaction between a user and an item.
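The framing step on this slide can be sketched roughly as follows. This is a minimal illustration, not the Pharos implementation: the `(timestamp, user, item)` record layout, the function names, and the 60-second frame length are all assumptions made for the example. Frames with no clicks are simply skipped here.

```python
from collections import defaultdict

def split_into_frames(clicks, frame_seconds):
    """Group (timestamp, user, item) clicks into consecutive time frames."""
    frames = defaultdict(list)
    for ts, user, item in clicks:
        frames[int(ts // frame_seconds)].append((user, item))
    # Only frames that contain at least one click are returned.
    return [frames[k] for k in sorted(frames)]

def frame_to_matrix(frame, users, items):
    """Binary user-item matrix: 1 if the user interacted with the item in this frame."""
    user_index = {u: i for i, u in enumerate(users)}
    item_index = {it: j for j, it in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item in frame:
        matrix[user_index[user]][item_index[item]] = 1
    return matrix

# Hypothetical click stream (timestamps in seconds).
clicks = [(0, "alice", "post1"), (30, "bob", "post2"),
          (70, "alice", "post2"), (130, "tom", "post1")]
users = ["alice", "bob", "tom"]
items = ["post1", "post2"]

frames = split_into_frames(clicks, frame_seconds=60)
matrices = [frame_to_matrix(f, users, items) for f in frames]
```

Every matrix shares the same fixed set of rows and columns, which matches the fixed-graph-size limitation noted later in the deck.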
  • 15. Approach
    Two steps
    – Co-cluster the graphs
    – Decide whether a newly arrived graph should be merged into the current segment or start a new segment
    Based on the MDL (Minimum Description Length) of the graphs
    – The MDL is the limit to which the graphs can be compressed
    – Decide whether to merge or split a segment
      • If compressing the graphs together saves more encoding cost than compressing them separately, merge the new graph into the current segment.
      • Otherwise, start a new segment with the new graph
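The merge-or-split rule can be illustrated with a heavily simplified encoding cost. The sketch below is an assumption-laden toy, not the GraphScope cost function: it takes the row and column cluster labels as given (GraphScope searches for them per segment), charges a small number of bits for describing the partition and the block densities, and encodes each block with its binary entropy. All function names are hypothetical.

```python
import math

def entropy_bits(p):
    """Binary entropy in bits; 0 for an all-zero or all-one block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def segment_cost(graphs, row_labels, col_labels):
    """Approximate bits to encode all graphs of a segment under one shared partition."""
    k, l = max(row_labels) + 1, max(col_labels) + 1
    # Model cost: cluster assignment of every row and column.
    bits = len(row_labels) * math.log2(k) + len(col_labels) * math.log2(l)
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for g in graphs:
        for i, row in enumerate(g):
            for j, value in enumerate(row):
                ones[row_labels[i]][col_labels[j]] += value
                cells[row_labels[i]][col_labels[j]] += 1
    for a in range(k):
        for b in range(l):
            n = cells[a][b]
            if n:
                bits += math.log2(n + 1)                  # store the block's count of ones
                bits += n * entropy_bits(ones[a][b] / n)  # encode the block's cells
    return bits

def should_merge(segment, new_graph, row_labels, col_labels):
    """Merge when encoding together is no more expensive than encoding separately."""
    together = segment_cost(segment + [new_graph], row_labels, col_labels)
    separate = (segment_cost(segment, row_labels, col_labels)
                + segment_cost([new_graph], row_labels, col_labels))
    return together <= separate

rows = cols = [0, 0, 1, 1]
g1 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
g2 = [row[:] for row in g1]                                    # same community structure
g3 = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]  # structure has changed

merge_similar = should_merge([g1], g2, rows, cols)
merge_different = should_merge([g1], g3, rows, cols)
```

In this toy setup, a graph with the same block structure is cheaper to encode jointly because the model cost is shared, while a graph with a different structure makes the shared blocks noisy and expensive, so a new segment starts.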
  • 16. Pros and cons
    Pros
    – Clusters users and items at the same time
    – Parameter free
      • No need to specify the number of clusters
    – Automatically decides the size of the time window
    Cons
    – Fixed graph size
      • All graphs must have the same size (rows and columns)
      • Can’t handle new users and items
    – Can’t handle large-scale graphs
    – Can’t guarantee the optimal result
    – Results on very sparse graphs are not very good
      • The communities don’t make sense.
      • Our data is extremely sparse (< 0.1%)
  • 17. Approach 2: evolutionary spectral clustering for user clustering
    Discover communities within a time window
    – Get high-quality clustering in each time window
    Model community evolution over a sequence of time windows
    – Keep the evolution between time windows smooth
    [Figure: community maps in the BlogCentral domain for Jan 2009, Feb 2009, Mar 2009, and Apr 2009 along a time axis]
  • 18. Evolutionary framework
    Basic idea
    – Cost function: Cost = α·CS + β·CT
      • Snapshot cost (CS) measures the snapshot quality of the current clustering result with respect to the current data features
      • Temporal cost (CT) measures the temporal smoothness in terms of the goodness-of-fit of the current clustering result with respect to either historic data features or historic clustering results
    Two evolutionary frameworks
    – PCQ (preserving cluster quality): the current partition is applied to historic data, and the resulting cluster quality determines the temporal cost.
    – PCM (preserving cluster membership): the current partition is directly compared with the historic partition, and the resulting difference determines the temporal cost.
    – PCQ is the framework we currently implement
    Evolutionary Spectral Clustering by Incorporating Temporal Smoothness, Yun Chi, et al. KDD’07
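To make the cost function concrete, here is a small sketch of the PCQ idea. It is an illustration under stated assumptions rather than the spectral formulation used in the cited paper: cluster quality is measured with a k-means-style within-cluster sum of squares as a stand-in, the candidate partitions are supplied by hand, and the function names and data are hypothetical.

```python
def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total

def pcq_cost(labels, current, historic, alpha, beta):
    """Cost = alpha * CS + beta * CT, with CT computed the PCQ way."""
    cs = within_cluster_sse(current, labels)   # snapshot cost on current data
    ct = within_cluster_sse(historic, labels)  # same partition applied to historic data
    return alpha * cs + beta * ct

# Hypothetical 1-D user features in two consecutive time windows.
historic = [(0,), (1,), (9,), (10,)]
current = [(0,), (6,), (7,), (10,)]
candidates = [[0, 0, 1, 1], [0, 1, 1, 1]]

best_snapshot_only = min(candidates, key=lambda c: pcq_cost(c, current, historic, 1.0, 0.0))
best_smoothed = min(candidates, key=lambda c: pcq_cost(c, current, historic, 0.5, 0.5))
```

With β = 0 the partition that fits only the current window wins; giving the temporal cost equal weight favors the partition that also fits the previous window, which is the smoothing effect the slide describes.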
  • 19. Approach 3: LDA for content clustering
    Latent Dirichlet Allocation (LDA), a probabilistic latent semantic model for topic analysis
    $$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$$ [Blei et al. 03]
    LDA is a generative probabilistic model of a corpus. The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.
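The generative story behind this model can be sketched in a few lines. This is only a sampler for one document under assumed parameters, not an inference or training procedure; the vocabulary, the topic-word distributions `beta`, and the prior `alpha` are invented for illustration. The Dirichlet draw uses the standard construction of normalizing independent gamma samples.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, vocab, n_words, rng):
    """Sample one document: theta ~ Dir(alpha), then z_n ~ theta and w_n ~ beta[z_n]."""
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(alpha)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]  # latent topic of this word
        w = rng.choices(vocab, weights=beta[z])[0]    # word drawn from that topic
        words.append(w)
    return theta, words

# Hypothetical two-topic model over a tiny vocabulary.
vocab = ["cloud", "server", "java", "jvm"]
beta = [
    [0.5, 0.4, 0.05, 0.05],  # topic 0: infrastructure-like words
    [0.05, 0.05, 0.5, 0.4],  # topic 1: Java-like words
]
alpha = [0.5, 0.5]

rng = random.Random(7)
theta, words = generate_document(alpha, beta, vocab, n_words=10, rng=rng)
```

The integral in the formula marginalizes over exactly the `theta` and `z` values that this sampler draws explicitly.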
  • 20. Graphical Model of LDA
    [Figure not reproduced]
  • 21. Latent community extraction - comparison
    Co-clustering
    – Does not work well for extremely sparse data (< 0.1%)
    Spectral clustering for users
    – Most behaviors come from anonymous users, so it is difficult to distinguish users
    – Topics are not concentrated within each community
    * LDA for content clustering
    – Users are more likely to be interested in content
  • 22. Pharos Technical Focus
    [Architecture diagram with three numbered focus areas:]
    – 1. Latent community extraction (content modeling and behavior mining over user behavior on content)
    – 2. Item/people ranking (recommendation algorithms on the social map)
    – 3. Community summary (visual recommendation explanations)
  • 23. Item/People Ranking
    Authority-based ranking by context-sensitive PageRank, considering
    – Time factor
    – Context information, e.g., item attributes, report chain of people
    $$PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
    where $cv_i$ is the context vector entry for $p_i$ (e.g., item attributes) and, as in standard PageRank, $d$ is the damping factor, $M(p_i)$ is the set of nodes linking to $p_i$, and $L(p_j)$ is the number of outgoing links of $p_j$
    Influential people: active authors with high-quality entries
    Influential entry: written by influential authors, highly visited / commented
    [Figure: graph linking people (A to D) and blog entries (1 to 4), with four edge types:]
    – Authority from author to entry
    – Authority from entry to author
    – Authority from commenter/rater to entry
    – Authority from visitor to entry
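The ranking formula can be sketched as a plain power iteration. This is a minimal illustration, assuming a uniform context vector, a damping factor of 0.85, and a small invented people-and-entries graph; the treatment of nodes without outgoing links (their score is redistributed according to the context vector) is also an assumption, since the slide does not cover it, and the time factor is left out.

```python
def context_pagerank(out_links, context, d=0.85, iterations=50):
    """Iterate PR(p_i) = (1 - d) * cv_i + d * sum(PR(p_j) / L(p_j)) over in-links of p_i."""
    nodes = list(context)
    total = sum(context.values())
    cv = {n: context[n] / total for n in nodes}  # normalized context vector
    pr = dict(cv)
    for _ in range(iterations):
        new = {n: (1 - d) * cv[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, [])
            if targets:
                share = d * pr[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: a node without out-links spreads its score by context.
                for n in nodes:
                    new[n] += d * pr[src] * cv[n]
        pr = new
    return pr

# Hypothetical graph: authors point to their entries, entries point back to authors,
# and commenters or visitors point to the entries they touched.
out_links = {
    "alice": ["entry1"],          # alice wrote entry1
    "entry1": ["alice"],
    "bob": ["entry1", "entry2"],  # bob commented on entry1 and wrote entry2
    "entry2": ["bob"],
    "tom": ["entry1"],            # tom visited entry1
}
context = {n: 1.0 for n in ["alice", "bob", "tom", "entry1", "entry2"]}

scores = context_pagerank(out_links, context)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Replacing the uniform `context` with weights derived from item attributes or recency would bias the ranking toward those nodes, which is the role the context vector plays in the formula.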
  • 24. Pharos Technical Focus
    [Architecture diagram with three numbered focus areas:]
    – 1. Latent community extraction (content modeling and behavior mining over user behavior on content)
    – 2. Item/people ranking (recommendation algorithms on the social map)
    – 3. Community summary (visual recommendation explanations)
  • 25. Community Summary & visualization
    Community representative keywords extraction
    – Modified TF/IDF
    – Content topic modeling by LDA (Latent Dirichlet Allocation)
    Visualization
    – A bubble chart layout (used by ManyEyes) packs the top-N communities tightly on the social map
      • A bubble’s size is determined by the community’s ‘hotness’
    – Inside each community, a Wordle layout is used to pack the labels tightly
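Keyword extraction per community can be illustrated with ordinary TF-IDF, treating each community's pooled content as one document. The slide does not say how Pharos modifies TF/IDF, so this sketch uses the standard form; the community names, tokens, and function name are hypothetical.

```python
import math
from collections import Counter

def top_keywords(community_docs, top_n=3):
    """Return the top TF-IDF terms per community, pooling each community's documents."""
    term_counts = {
        name: Counter(token for doc in docs for token in doc)
        for name, docs in community_docs.items()
    }
    n_communities = len(term_counts)
    # Document frequency: in how many communities each term appears.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    result = {}
    for name, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_communities / df[term])
            for term, count in counts.items()
        }
        # Sort by score, breaking ties alphabetically for a stable order.
        result[name] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return result

# Hypothetical tokenized blog entries grouped by community.
community_docs = {
    "cloud": [["cloud", "server", "ibm"], ["cloud", "storage"]],
    "java": [["java", "jvm", "ibm"], ["java", "gc"]],
}
keywords = top_keywords(community_docs, top_n=3)
```

Terms that occur in every community, such as "ibm" here, get a zero IDF weight and drop out of the labels, which is the property that makes the keywords representative of one community rather than of the whole site.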
  • 26. Summary
    Model, detect, and use a social map that summarizes user behavior on online sites to make accurate and trustworthy recommendations
    Increase recommendation accuracy
    – Helps with the “cold start” problem by providing new users with “social landmarks” of a social site to jump-start their engagement
    – Provides users with overall social awareness to compensate for recommendation inaccuracy
    Enhance recommendation trustworthiness
    – Explains recommendation results in the context of a social map
    Interactive recommendation
    – Users can navigate through the social map to find what they need
  • 27. Thanks!