I Link Search And Routing In Social Networks


Published on

Published in: Technology, Education
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • I Link Search And Routing In Social Networks

    1. 1. KDD’07 Jeffrey Davitz, Jiye Yu, Sugato Basu, David Gutelius, and Alexandra Harris Presenter : Bo-Ren Lu Date : 2008 / 12 / 02
    2. 2. <ul><li>This paper focuses on the problem of modeling how social networks accomplish tasks through peer production style collaboration. </li></ul><ul><li>This paper propose a general interaction model for the underlying social networks and then a specific model (iLink) for social search and message routing. </li></ul><ul><li>A key contribution is the development of a general learning framework for making such online peer production systems work at scale. </li></ul><ul><li>FAQtory. </li></ul>
    3. 3. <ul><li>Introduction </li></ul><ul><li>ILINK </li></ul><ul><li>FAQTORY </li></ul><ul><ul><li>Routing Mechanism </li></ul></ul><ul><ul><li>Architecture </li></ul></ul><ul><li>Learning Framework </li></ul><ul><li>Case Study </li></ul><ul><li>Conclusion </li></ul>
    4. 4. <ul><li>The social web provides much more than an opportunity for people to interact and exchange general information. </li></ul><ul><li>A key trend is the growing importance of the social web as a production mechanism. </li></ul><ul><ul><li>The activity in social networks is beginning to exceed all other web-based activities including. </li></ul></ul>
    5. 5. <ul><li>whole notion of social network-based production has received enormous attention, no doubt motivated by hugely popular and successful open source efforts like Wikipedia and the LAMP family. </li></ul><ul><li>Many discussions of open-source production systems do not attempt to model the underlying dynamics. </li></ul><ul><ul><li>Rather they are described informally in terms of shared practices or collaboration platforms (like Wikis). </li></ul></ul><ul><li>The social networks underlying the open-source production systems work as loosely or semi-organized mobs is an inadequate model of the underlying social phenomena. </li></ul>
    6. 6. <ul><li>Understand how social networks create artifacts like Wikipedia. </li></ul><ul><li>Use this increased understanding to build new applications. </li></ul><ul><li>Required to do the following: </li></ul><ul><ul><li>Develop enriched node and link models for social networks. </li></ul></ul><ul><ul><ul><li>A social network is represented as a graph with nodes and links. </li></ul></ul></ul><ul><ul><ul><li>Need to allow the nodes to have associated property vectors. </li></ul></ul></ul><ul><ul><ul><li>The topology of the network is both probabilistic and dynamic. </li></ul></ul></ul><ul><ul><li>Develop an associated learning framework. </li></ul></ul><ul><ul><ul><li>Act within the network by learning patterns of interaction and properties of the social network. </li></ul></ul></ul>
    7. 7. <ul><li>Node </li></ul><ul><ul><li>The network has a set of n network nodes N = {Ni} n i=1 . </li></ul></ul><ul><ul><li>has an associated profile. </li></ul></ul><ul><ul><ul><li>(a) an expertise measure Ei = P(Tk|Ni) </li></ul></ul></ul><ul><ul><ul><ul><li>a global topic set T = {Tk} t k=1 </li></ul></ul></ul></ul><ul><ul><ul><li>(b) a referral rank Fi </li></ul></ul></ul><ul><ul><ul><ul><li>measures the general affinity of other nodes to route messages to the i th node. </li></ul></ul></ul></ul><ul><ul><ul><li>(c) a response score Ri </li></ul></ul></ul><ul><ul><ul><ul><li>a function of the response rate and response accuracy of a node to incoming messages. </li></ul></ul></ul></ul>
    8. 8. <ul><li>Supernode </li></ul><ul><ul><li>(a) a database D storing all the past message streams. </li></ul></ul><ul><ul><li>(b) profile parameters E,R, F for all the nodes in the network. </li></ul></ul><ul><ul><li>(c) the set of all possible topics T. </li></ul></ul>
    9. 9. <ul><li>Message </li></ul><ul><ul><li>A message m is routed between a node and the supernode. </li></ul></ul><ul><ul><li>A general message can have either content (e.g., words), meta-content (e.g., tags) or a forwarding address. </li></ul></ul><ul><ul><li>A new message generated by any node is represented as m 0 . </li></ul></ul><ul><ul><li>While being routed around in the network, the modifications made to the message are represented as m k . where m k is the k th modification made to the initial message m 0 . </li></ul></ul><ul><ul><li>two specific types: </li></ul></ul><ul><ul><ul><li>(a) a question q </li></ul></ul></ul><ul><ul><ul><li>(b) an answer a </li></ul></ul></ul>
    10. 10. <ul><li>A message stream will always begin with a question q (m 0 ) and end with an answer a (m k ). </li></ul><ul><li>All message flows between nodes are moderated by the supernode S. </li></ul>
    11. 11. <ul><li>On getting a message, a node has the choice to </li></ul><ul><ul><li>(i) answer </li></ul></ul><ul><ul><li>(ii) ignore </li></ul></ul><ul><ul><li>(iii) route the message </li></ul></ul><ul><ul><ul><li>possibly with modifications to the incoming message </li></ul></ul></ul><ul><ul><li>(iv) express interest in validating the answer to the query </li></ul></ul>
    12. 12. <ul><li>Two modes of possible routing scenarios </li></ul><ul><ul><li>(a) annotate a message with a forwarding address and send it to the supernode S, specifying that it be routed to another node. </li></ul></ul><ul><ul><li>(b) annotate a message with other content and forward it to S, in which case S makes the decision about to whom to next route the current message. </li></ul></ul>
    13. 13. <ul><li>The message flow and routing mechanism works in FAQtory </li></ul><ul><ul><li>For each incoming message stream {m 0 , . . . ,m k }, S computes its match with the existing message streams in the history database D. </li></ul></ul><ul><ul><li>If a good enough match is found, then the answer a of the matching message stream from D is extracted and recommended to the asking node as a possible answer for q. </li></ul></ul><ul><ul><li>If the asking node accepts the answer a, then the current message flow is terminated. Otherwise, it continues. </li></ul></ul>
    14. 14. <ul><ul><li>If m k in {m 0 , . . . ,m k } is marked as an answer a, then S routes it directly to the node that initially asked the question q = m 0 . </li></ul></ul><ul><ul><li>If the asking node confirms that a is indeed a correct answer to q, then the message flow in the system is terminated by S by notifying the nodes involved in the flow that a correct answer has been obtained. </li></ul></ul><ul><ul><li>If the asking node cannot confirm the correctness of a, S simultaneously sends the answer a for validation to those nodes that had previously expressed interest in evaluating the answer during query routing. </li></ul></ul><ul><ul><li>The answer is given the combined voted score of the evaluators. </li></ul></ul><ul><ul><li>If a is a correct response for q then the current message flow is terminated. Otherwise, it continues. </li></ul></ul>
    15. 15. <ul><ul><li>If m k specifies a direct forwarding address, then S routes the message to that address directly, else then for each node N i in the network, a ranking score is computed by S using a weighted linear combination (weights are currently an input parameter) of the node’s current rank F i , response score R i and the similarity between the topic distributions of the message and the node’s expertise measure E i ,calculated as Σ k P(m|T k )P(T k |N i ). </li></ul></ul><ul><ul><li>The nodes are ranked by S in decreasing order according to this ranking score, and the current message stream {m 0 , . . . ,m k } is next routed to the top r nodes in the network (r is an input parameter). </li></ul></ul>
    16. 16. <ul><ul><li>Each of these r nodes now has the choice to ignore the message stream, add a comment or address m k+1 to the message stream and send it to S, or add an answer a and send it to S. </li></ul></ul><ul><ul><li>Repeat all above until the message stream is terminated by a correct answer a to q. </li></ul></ul><ul><ul><li>On a message flow getting terminated by a correct response, the profile parameters E,R, F are updated globally at the supernode for all the nodes involved in the routing, and the set of topics T is also updated if necessary. </li></ul></ul><ul><li>The goal of the system is to effectively learn the E,R, F parameters of the node profiles and the global topic set T. </li></ul>
    17. 17. <ul><li>Example </li></ul><ul><li>A user node may have low response rate on average, but on a specific topic (e.g., music, mathematics) may be willing to go out of his way to correctly respond to or at least tag the question. </li></ul><ul><li>let P(R i |T k ) be the probability of response given a topic. Then, for any given message m, the topic-specific response rate can be calculated as P(R i |m) = Σ k P(R i , T k |m) = Σ k P(R i |T k )P(T k |m). </li></ul>
    18. 18. <ul><li>The current implementation of the FAQtory system is a stand-alone web application. </li></ul><ul><li>After FAQtory users query the system </li></ul><ul><ul><li>a list of question and answer pairs that are related to the query. </li></ul></ul><ul><ul><li>a list of experts on the topics found in the query. </li></ul></ul><ul><ul><ul><li>The query can be forwarded to the experts or to other nodes not listed. </li></ul></ul></ul><ul><ul><li>search results from the web or other data sources. </li></ul></ul><ul><li>The resulting answers are rated and reviewed by the asker and interested experts. </li></ul>
    19. 19. <ul><li>FAQtory system is implemented as a three-tiered client-server application. </li></ul><ul><ul><li>The client/front tier facilitates interactions between the nodes and the FAQtory server via APIs. </li></ul></ul><ul><ul><ul><li>web browser </li></ul></ul></ul><ul><ul><ul><li>email clients </li></ul></ul></ul><ul><ul><ul><li>instant messenger </li></ul></ul></ul>
    20. 20. <ul><ul><li>The middle tier is the core of the system. </li></ul></ul>
    21. 21. <ul><li>QA handler to fetch the relevant existing answers from either the KB repository (FAQ) or the third party source such as the web search engines. </li></ul><ul><li>Peer handler to fetch the relevant experts from the network/user community. </li></ul><ul><li>The server responses to Messenger to send back to the nodes. </li></ul>
    22. 22. <ul><li>The handlers get their answers from the Knowledge Managers (KM) that interact with the back-end knowledge bases, primarily through an indexing engine and a ranker. </li></ul><ul><li>The KMs also interact with a (topic) Learner, which is responsible for creating and updating the topic models for the nodes. </li></ul><ul><ul><li>The index engine is used to create inverted indices on both the existing answers (keyword & topics) and the node expertise. </li></ul></ul><ul><li>These three are based on the event-listener design pattern, which allows incremental and dynamic updates of the index and the topic models for online learning. </li></ul>
    23. 23. <ul><li>The analysis and processing of the free text is handled by the NLP processor that performs stemming, synonym lookup, stop word removals, and other pre-processing actions. </li></ul>
    24. 24. <ul><li>Heterogeneous topics </li></ul><ul><ul><li>The topic distribution for the different data types can be maintained separately if necessary. </li></ul></ul><ul><li>Cold-start problem </li></ul><ul><ul><li>Use KMeans-style scalable iterative algorithm to learn a multirelational topic model. </li></ul></ul><ul><ul><ul><li>Outputs </li></ul></ul></ul><ul><ul><ul><ul><li>(a) a set of local topics Li, where the j th local topic L ij is a distribution over a set of local keywords. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>(b) a probability distribution P(L ij |N i ) over the local topic set L i for each user node N i . </li></ul></ul></ul></ul>
    25. 25. <ul><li>Privacy issues </li></ul><ul><ul><li>The outputs of the local topic models are sent to the global supernode S, which treats the j th local topic L ij from node N i as a point in a manifold over identifiers and performs information-theoretic probabilistic clustering (EM using KL divergence) of these topics. P(T k |N i ) = Σ L ij P(Lij |N i )P(T k |L ij ,N i ) </li></ul></ul>
    26. 26. <ul><li>Prior knowledge </li></ul><ul><ul><li>The topics should be able to incorporate partial supervision like domain-specific, user-specified tags, provided as prior knowledge to the system for topic inference. </li></ul></ul><ul><ul><li>A set of keywords/tags input as non-hierarchical semi-supervision is incorporated into the topic model by using an empirical prior over the relevant keywords. Currently, the hierarchy structure in hierarchical supervision is not used in learning the topic models model – hierarchical priors like ontologies are represented by considering only the leaf nodes in the hierarchy as a set of keyword identifiers, which are then incorporated into the topic model. </li></ul></ul>
    27. 27. <ul><li>Message matching </li></ul><ul><ul><li>Find the best p topics for message segment m Calculate P(m|T k ) computes the cosine similarities between the current augmented message stream and the existing message streams. </li></ul></ul><ul><li>Scalability </li></ul><ul><ul><li>pseudo-linear algorithms are used for topic inference and inverted indices are used for efficient implementation. </li></ul></ul><ul><ul><ul><li>I1 for mapping keywords w to topics T, and I2 for mapping topics T to nodes N. The output of I2 is used to get a ranked list of nodes relevant to message m. </li></ul></ul></ul>
    28. 28. <ul><li>Incremental learning </li></ul><ul><ul><li>(a) whenever a message stream terminates (for FAQtory, this corresponds to a correct answer a being found for a query q) </li></ul></ul><ul><ul><ul><li>T k (t+1) =T k (t) +(1/(t+1))(m-T k (t) ) </li></ul></ul></ul><ul><ul><ul><li>Update F similar to incremental Pagerank update </li></ul></ul></ul><ul><ul><ul><li>Update R </li></ul></ul></ul><ul><ul><ul><li>P(T k |N i ,m) α P(m|T k ,N i ).P (T k |N i ) =P(m|T k ).P (T k |N i ). P(T k |N i ,m) constitute the updated expertise measure E i . </li></ul></ul></ul><ul><ul><li>(b) when a new node is added to the network. </li></ul></ul><ul><ul><ul><li>rerunning the clustering or assigning the new topics to the closest existing cluster centroids. </li></ul></ul></ul><ul><ul><li>Note: number of topics remains the same. </li></ul></ul>
    29. 29. <ul><li>Calo Test Pilot </li></ul><ul><ul><li>The FAQtory system of iLink was developed as part of CALO, an adaptive cognitive system funded under DARPA’s PAL (Perceptive Assistant that Learns) program. </li></ul></ul><ul><li>PlatoonLeader </li></ul><ul><ul><li>provided with two features that leverage iLink technology: “Suggested Discussions” and the “Moderator’s Assistant”. </li></ul></ul><ul><li>Other Message Applications </li></ul><ul><ul><li>Smart Rss Filter </li></ul></ul><ul><ul><li>Message Routing for Advertising </li></ul></ul>
    30. 30. <ul><li>This paper believe that their approach, which focuses on the underlying message passing dynamics, is a promising avenue for the important task of extending the peer production paradigm. </li></ul><ul><li>Clearly, many practical and theoretical issues remain. </li></ul><ul><li>The importance of making such models work for real, ongoing web-based collaboration efforts is a growing issue and a wonderful challenge for machine learning and data mining techniques. </li></ul>