Pharos Social Map Based Recommendation For Content Centric Social Websites

Transcript of "Pharos Social Map Based Recommendation For Content Centric Social Websites"

  1. IBM China Research Laboratory
     Social Map Based Recommendation for Content-Centric Social Websites
     IBM Research - China
     Presenter: Shiwan Zhao (zhaosw@cn.ibm.com)
     Pharos Team: 赵石顽 (Shiwan Zhao), 袁泉 (Quan Yuan), 张夏天 (Xiatian Zhang), 郑文涛 (Wentao Zheng)
     Advisors: Michelle Zhou, Rongyao Fu, Changyan Chi
  2. About me
     1993~1998 – B.S. Computer Science, Tsinghua University
     1998~2000 – M.S. Computer Science, Tsinghua University
     2000~now – IBM Research - China
     2007~now – Focus on recommendation technologies
  3. Agenda
     Part 1:
     – Problem & challenges
     – Pharos solution overview
     – Demo
     Part 2:
     – Some technology details
  4. Problem
     Content-centric social websites (e.g., forums, wikis, and blogs) have flourished with the exponential growth of user-generated information
     – Overwhelming amount
     – Evolving over time
     – Not well organized
     It is hard for users, especially new users, to grasp what’s out there and then find information that interests them
  5. Example
     A blog website contains a huge amount of dynamically evolving content (blog entries) but does not provide effective ways to navigate it
     – Search
       • Useful when users have well-defined goals
     – Recent entries
     – Top entries by
       • most comments
       • most ratings
       • most visits
     – Featured blog entries
     – Tag cloud
     – …
     It is like looking for needles in a haystack: without guidance, novice users can NOT find anything interesting, leave BlogCentral quickly, and won’t come back again (low stickiness)
  6. Existing solutions & challenges
     Researchers have developed recommender systems to solve this information overload problem
     – E.g., blog/news/webpage recommenders
     However, current recommenders must address two challenges:
     – It is difficult to make effective recommendations for new users (the cold start problem) due to the lack of user information
     – It is difficult to explain recommendation rationales to end users in a way that makes the recommendations more trustworthy
  7. Pharos Solution
     Dynamically create a social map that helps users find out who's talking about what in an online site.
     Social map creation
     – Model and summarize the time-sensitive user behavior of content-centric online sites as a set of “latent communities”
     Social map based recommendations
     – Provide social landmarks that give new users a jump start
     – Provide a personalized social map that lets experienced users navigate the community effectively
  8. Demo screenshot
     Screenshot (not reproduced); user labels shown: John, Steve, Michael, Alice, Tom
  9. Agenda
     Part 1:
     – Problem & challenges
     – Pharos solution overview
     – Demo
     Part 2:
     – Some technology details
  10. Pharos Overview
      Architecture diagram (not reproduced); recoverable labels, top to bottom:
      – Multi-faceted recommendation: info item (page, fragment); people (reference to Bluepages, URL); community (latent, dynamic community)
      – Triggers: explicit, implicit
      – Visual Recommendation Explanations
      – Recommendation Algorithms
      – Social Map: time-sensitive social map as recommendation context for the target user, laid out along a time axis
      – Content Modeling and Behavior Mining
      – User behavior on content
  11. Pharos Technical Focus
      Architecture diagram (not reproduced) with three focus areas marked:
      – Visual Recommendation Explanations: 3. Community summary
      – Recommendation Algorithms: 2. Community/item/people ranking
      – Social Map (target user, time axis): 1. Latent community extraction
      – Content Modeling and Behavior Mining
      – User behavior on content
  12. Latent community extraction
      Three approaches
      – Directly model user-content relationships by using co-clustering methods
      – Group people first, then find the associated content
      – Group content first, then find the associated people
  13. Approach 1: time-elastic co-clustering
      How large a time window should we use to mine the communities? How long is right?
      Figure (not reproduced): user activity along a time axis feeds time-elastic ad hoc community detection, which produces a community map (April 2009).
      Reference: GraphScope: Parameter-free Mining of Large Time-evolving Graphs, Jimeng Sun, et al. KDD’07
  14. Input Data – Graph Stream
      – User actions arrive as a stream over time
      – Split the click stream into many small atomic time frames
      – The click stream data in one frame can be represented by a user-item matrix (graph)
        • In the matrix, 1 means one interaction between a user and an item
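The framing step on this slide can be sketched as follows. This is a minimal illustration, not the Pharos implementation: the `(timestamp, user, item)` tuple layout, the function names, and the 60-second frame length are all assumptions made for the example.

```python
from collections import defaultdict

def split_into_frames(clicks, frame_seconds):
    """Group (timestamp, user, item) clicks into fixed-length time frames.

    Frames that contain no clicks are simply skipped in this sketch.
    """
    frames = defaultdict(list)
    for timestamp, user, item in clicks:
        frames[int(timestamp // frame_seconds)].append((user, item))
    return [frames[key] for key in sorted(frames)]

def to_matrix(frame, users, items):
    """Build a binary user-item matrix: 1 marks an interaction in this frame."""
    row = {user: i for i, user in enumerate(users)}
    col = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item in frame:
        matrix[row[user]][col[item]] = 1
    return matrix

# Hypothetical click stream: timestamps in seconds.
clicks = [(5, "alice", "post1"), (20, "bob", "post2"), (70, "alice", "post2")]
users, items = ["alice", "bob"], ["post1", "post2"]
frames = split_into_frames(clicks, frame_seconds=60)
matrices = [to_matrix(frame, users, items) for frame in frames]
```

Each matrix has the same rows and columns, which mirrors the fixed graph size that the co-clustering approach requires.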
  15. Approach
      Two steps
      – Co-cluster the graphs
      – Decide whether a newly arrived graph should be merged into the current segment or should start a new segment
      Based on the MDL (Minimum Description Length) of the graphs
      – MDL is the limit to which the graphs can be compressed
      – Decide whether to merge or split a segment
        • If compressing the graphs together costs less to encode than compressing them separately, merge the new graph into the current segment
        • Otherwise, start a new segment with the new graph
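The merge-or-split test above can be illustrated with a toy encoding cost. This sketch is a deliberate simplification and an assumption throughout: it treats each group of graphs as one block with a single density, whereas GraphScope computes the cost over co-clustered blocks. The function names are hypothetical.

```python
import math

def entropy_bits(ones, cells):
    """Bits needed to entropy-code `cells` binary entries containing `ones` ones."""
    if cells == 0 or ones in (0, cells):
        return 0.0
    p = ones / cells
    return -cells * (p * math.log2(p) + (1 - p) * math.log2(1 - p))

def encoding_cost(graphs):
    """Toy two-part code: bits to state the count of ones, plus the coded cells."""
    cells = sum(len(g) * len(g[0]) for g in graphs)
    ones = sum(sum(row) for g in graphs for row in g)
    return math.log2(cells + 1) + entropy_bits(ones, cells)

def should_merge(segment, new_graph):
    """Merge when encoding everything together is cheaper than encoding separately."""
    together = encoding_cost(segment + [new_graph])
    separate = encoding_cost(segment) + encoding_cost([new_graph])
    return together < separate

sparse_a = [[1, 0], [0, 0]]
sparse_b = [[0, 1], [0, 0]]
dense = [[1, 1], [1, 1]]
merge_similar = should_merge([sparse_a], sparse_b)
merge_different = should_merge([sparse_a], dense)
```

In this toy model, a new graph that resembles the current segment shares its model cost and is merged, while a graph with a very different density is cheaper to encode on its own and starts a new segment.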
  16. Pros and cons
      Pros
      – Clusters users and items at the same time
      – Parameter free
        • No need to specify the number of clusters
      – Automatically decides the size of the time window
      Cons
      – Fixed graph size
        • All graphs must have the same size (rows and columns)
        • Can’t handle new users and items
      – Can’t handle large-scale graphs
      – Can’t guarantee the optimal result
      – Results on very sparse graphs are not very good
        • Communities don’t make sense
        • Our data is extremely sparse (< 0.1%)
  17. Approach 2: evolutionary spectral clustering for user clustering
      Discover communities within a time window
      – Get high-quality clustering in each time window
      Model community evolution for a sequence of time windows
      – Make the evolution between time windows smooth
      Figure (not reproduced): community maps along a time axis for Jan 2009, Feb 2009, Mar 2009, and Apr 2009, in the BlogCentral domain.
  18. Evolutionary framework
      Basic idea
      – Cost function: Cost = α*CS + β*CT
        • Snapshot cost (CS) measures the snapshot quality of the current clustering result with respect to the current data features
        • Temporal cost (CT) measures the temporal smoothness in terms of the goodness-of-fit of the current clustering result with respect to either historic data features or historic clustering results
      Two evolutionary frameworks
      – PCQ (preserving cluster quality): the current partition is applied to historic data, and the resulting cluster quality determines the temporal cost
      – PCM (preserving cluster membership): the current partition is directly compared with the historic partition, and the resulting difference determines the temporal cost
      – PCQ is our currently implemented framework
      Reference: Evolutionary Spectral Clustering by Incorporating Temporal Smoothness, Yun Chi, et al. KDD’07
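The PCQ cost above can be sketched by scoring one candidate partition against both the current and the historic similarity data. This is an illustrative assumption, not the Pharos code: normalized cut is used here as the cluster-quality measure, the weights 0.8 and 0.2 are arbitrary, and the function names and matrices are hypothetical.

```python
def normalized_cut(similarity, labels):
    """Sum over clusters of (weight leaving the cluster) / (total weight of the cluster)."""
    n = len(labels)
    total = 0.0
    for cluster in set(labels):
        members = [i for i in range(n) if labels[i] == cluster]
        volume = sum(similarity[i][j] for i in members for j in range(n))
        cut = sum(similarity[i][j] for i in members
                  for j in range(n) if labels[j] != cluster)
        if volume > 0:
            total += cut / volume
    return total

def pcq_cost(current, historic, labels, alpha=0.8, beta=0.2):
    """Cost = alpha * CS + beta * CT, with CT measured on the historic data (PCQ)."""
    snapshot_cost = normalized_cut(current, labels)   # CS: fit to the current window
    temporal_cost = normalized_cut(historic, labels)  # CT: same partition, previous window
    return alpha * snapshot_cost + beta * temporal_cost

# Hypothetical user-user similarity for two consecutive time windows.
historic = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
current = [[0, 1, 0, 0], [1, 0, 0.5, 0], [0, 0.5, 0, 1], [0, 0, 1, 0]]
smooth_cost = pcq_cost(current, historic, [0, 0, 1, 1])
rough_cost = pcq_cost(current, historic, [0, 1, 1, 0])
```

A partition that keeps strongly connected users together in both windows gets the lower cost, which is the temporal smoothness the slide describes.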
  19. Approach 3: LDA for content clustering
      Latent Dirichlet Allocation (LDA), a probabilistic latent semantic model for topic analysis [Blei et al. 03]
      $$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$$
      LDA is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words.
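The generative story behind the LDA slide can be sketched in a few lines of standard-library Python. The vocabulary, the topic-word table `beta`, and the prior `alpha` below are made-up values for illustration, and the code only samples a document; it does not perform the inference that a real topic model needs.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a topic mixture theta ~ Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, vocabulary, length, rng):
    """Sample one document: theta, then for each word a topic z_n and a word w_n."""
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(alpha)))
    words = []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=theta)[0]      # z_n ~ p(z | theta)
        w = rng.choices(vocabulary, weights=beta[z])[0]   # w_n ~ p(w | z_n, beta)
        words.append(w)
    return theta, words

vocabulary = ["java", "jvm", "recipe", "oven"]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # hypothetical "programming" topic
    [0.0, 0.0, 0.5, 0.5],  # hypothetical "cooking" topic
]
rng = random.Random(0)
theta, words = generate_document([0.5, 0.5], beta, vocabulary, 10, rng)
```

Each document mixes the topics in its own proportions `theta`, which is what allows content to be grouped by dominant topic.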
  20. Graphical Model of LDA
      Figure (not reproduced)
  21. Latent community extraction - comparison
      Co-clustering
      – Does not work well for extremely sparse data (<0.1%)
      Spectral clustering for users
      – Most behavior comes from anonymous users, so users are difficult to distinguish
      – Topics are not concentrated within each community
      * LDA for content clustering
      – Users are more likely to be interested in content
  22. Pharos Technical Focus
      Architecture diagram (not reproduced) with three focus areas marked:
      – Visual Recommendation Explanations: 3. Community summary
      – Recommendation Algorithms: 2. Item/people ranking
      – Social Map (target user, time axis): 1. Latent community extraction
      – Content Modeling and Behavior Mining
      – User behavior on content
  23. Item/People Ranking
      Authority-based ranking by context-sensitive PageRank, considering
      – Time factor
      – Context information, e.g., item attributes, report chain of people
      $$PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
      where cv_i is the context vector (e.g., item attributes)
      Figure (not reproduced): people (A, B, C, D) linked to blog entries (1, 2, 3, 4)
      – Influential people: active authors with high-quality entries
      – Influential entry: written by influential authors, highly visited / commented
      – Edge types: authority from author to entry; from entry to author; from commenter/rater to entry; from visitor to entry
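The ranking formula on this slide can be iterated directly. The sketch below is an assumption-laden illustration: it reads d as the usual PageRank damping factor, M(p_i) as the nodes linking to p_i, and L(p_j) as the out-degree of p_j; it normalizes the context weights to sum to 1, spreads the rank of nodes without out-links according to the context vector, and leaves out the time factor. The graph and all names are hypothetical.

```python
def context_pagerank(links, context, d=0.85, iterations=50):
    """Iterate PR(p_i) = (1 - d) * cv_i + d * sum(PR(p_j) / L(p_j))."""
    nodes = list(context)
    total = sum(context.values())
    cv = {n: context[n] / total for n in nodes}  # normalized context vector
    rank = dict(cv)
    for _ in range(iterations):
        # Rank held by nodes without out-links is redistributed by context weight.
        dangling = sum(rank[n] for n in nodes if not links.get(n))
        new_rank = {n: (1 - d) * cv[n] + d * dangling * cv[n] for n in nodes}
        for source in nodes:
            targets = links.get(source, [])
            for target in targets:
                new_rank[target] += d * rank[source] / len(targets)
        rank = new_rank
    return rank

# Authors A and B wrote entries 1 and 2; C and D commented on or visited entry 1.
links = {"A": ["1"], "1": ["A"], "B": ["2"], "2": ["B"], "C": ["1"], "D": ["1"]}
context = {node: 1.0 for node in ["A", "B", "C", "D", "1", "2"]}
rank = context_pagerank(links, context)
```

Entry 1 receives authority from its author and from two other users, so it outranks entry 2, and that authority flows back to author A. Raising a node's weight in `context` is one way item attributes could bias the ranking.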
  24. Pharos Technical Focus
      Architecture diagram (not reproduced) with three focus areas marked:
      – Visual Recommendation Explanations: 3. Community summary
      – Recommendation Algorithms: 2. Item/people ranking
      – Social Map (target user, time axis): 1. Latent community extraction
      – Content Modeling and Behavior Mining
      – User behavior on content
  25. Community Summary & visualization
      Community representative keywords extraction
      – Modified TF/IDF
      – Content topic modeling by LDA (Latent Dirichlet Allocation)
      Visualization
      – A bubble chart layout (used by ManyEyes) packs the top-N communities tightly on the social map
        • A bubble’s size is determined by the community’s ‘hotness’
      – Inside each community, a Wordle layout is used to pack the labels tightly
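Keyword extraction for community labels can be illustrated with plain TF-IDF. The slide does not say how the Pharos variant is modified, so this sketch uses the textbook formula as a stand-in; treating each community's pooled content as one document, and the function and community names, are assumptions.

```python
import math
from collections import Counter

def top_keywords(communities, top_n=3):
    """Rank each community's terms by TF-IDF, one pooled document per community."""
    term_counts = {name: Counter(tokens) for name, tokens in communities.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n = len(communities)
    result = {}
    for name, counts in term_counts.items():
        length = sum(counts.values())
        scores = {term: (count / length) * math.log(n / doc_freq[term])
                  for term, count in counts.items()}
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[name] = ranked[:top_n]
    return result

communities = {
    "programming": ["java", "jvm", "java", "tips"],
    "cooking": ["recipe", "oven", "recipe", "tips"],
}
keywords = top_keywords(communities, top_n=2)
```

Terms that appear in every community, such as "tips" here, score zero and drop out, leaving words that distinguish one community from the others.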
  26. Summary
      Model, detect, and use a social map that summarizes the user behavior of online sites to make accurate and trustworthy recommendations
      Increase recommendation accuracy
      – Helps with the “cold start” problem by providing new users with “social landmarks” of a social site to jump start their engagement
      – Provides users with overall social awareness to compensate for recommendation inaccuracy
      Enhance recommendation trustworthiness
      – Explain recommendation results in the context of a social map
      Interactive recommendation
      – Users can navigate through the social map to find what they need
  27. Thanks!