Robust Expert Ranking in Online Communities - Fighting Sybil Attacks


This is our second presentation at IEEE CollaborateCom 2012 in Pittsburgh, USA, presented on Wednesday, October 17, 2012, in the Trustworthy Collaborative Systems session.

  • Fake multimedia and misbehaviour
  • e.g. Press Agencies
  • We discuss the notions of experts and expertise in the context of collaborative fake multimedia detection systems. Here we try to define the expert and we assume that …. Goal: improve media evaluation (by increasing the impact of experts).
  • SybilGuard and SybilLimit are decentralized; SumUp is centralized. SybilGuard is based on the “social network” among user identities, where an edge between two identities indicates a human-established trust relationship. Malicious users can create many identities but few trust relationships. Thus, there is a disproportionately small “cut” in the graph between the Sybil nodes and the honest nodes. SybilGuard exploits this property to bound the number of identities a malicious user can create. SybilLimit leverages the same insight as SybilGuard but is an improved version that reduces the number of Sybil nodes accepted per attack edge from O(√n log n) to O(log n) for n honest nodes. When all nodes vote, SumUp leads to a much lower attack capacity than SybilLimit despite the same asymptotic bound per attack edge, for two reasons. First, SumUp’s bound of 1 + log n in Theorem 5.1 is a loose upper bound on the actual average capacity. Second, since links pointing to lower-level nodes are not eligible for ticket distribution, many incoming links of an adversarial node have zero tickets and are thus assigned a capacity of one.
  • P@K computes, for a given ranked list of users, the fraction of relevant results among the top K results. The higher the precision, the better the performance. We use this metric to compare the results of the expert ranking algorithms that we developed with the ranking of experts obtained by counting the number of fair votes. Spearman’s rank correlation coefficient is a non-parametric measure of statistical dependence between two ranked lists. It is based on the rank order of the scores and not on the score data itself. In the formula, d is the difference in rank between paired items in the two series (lists).
  • For this step of the evaluation, I assume that all users in the network behave fairly and rate a random number of media files. So the only way a user can rate a media file wrongly is when the user has no competence in the specific topic. What differs between the two methods is that, besides the reinforcement between users voting fairly and authentic media files, the MHITS ranking also considers the local trust values the user has in the social network. Since average precision ignores the exact rank of a user, we use Spearman's rank correlation coefficient to get a better view of the efficiency. In Table 6.2, the correlation coefficients for n = 15 are presented. The result of the MHITS algorithm is more highly correlated with the ranking by number of fair ratings, as its value is closer to 1.
  • From the results, we can see that our proposed integration of SumUp with the MHITS algorithm outperforms both HITS and MHITS without SumUp, which confirms the effectiveness of our approach. MHITS in combination with SumUp performs better for K = 10, but for K = 20 its precision decreases even more rapidly than that of MHITS alone. We think this happens because some Sybil users already enter the ranking at K = 20 due to their high local trust values, and therefore the precision decreases.
  • When the number of Sybils, the number of attack edges, or even the number of Sybil votes (up to 50% of the number of fair votes) is increased, the ranking of the users does not change dramatically. Modified HITS with SumUp also performs only slightly better than Modified HITS alone. The reason is that the additional steps SumUp performs when run together with HITS (pruning of the trust network, assignment of capacity in the network, and elimination of links that possess a high negative history) do not affect the Sybils. The capacity assignment does not reach them, so votes from Sybils do not reach the source node. In this case, the edges connecting Sybils to fair nodes do not accumulate negative history and are therefore not eliminated. Modified HITS is then run again on the resulting network. The Sybils are kept and, due to the high local trust values they receive from the other Sybil nodes in their group, they get into the top ranks of experts.
  • Combination of expert ranking and Sybil-resistant algorithms to ensure robustness
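
The two evaluation metrics described in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `precision_at_k` and `spearman_rho` are hypothetical, and the Spearman version assumes both rankings contain exactly the same users with no ties, so that the closed form 1 - 6·Σd² / (n(n² - 1)) applies.

```python
def precision_at_k(ranked_users, relevant_users, k):
    """Fraction of the top-k ranked users that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_users)
    top_k = ranked_users[:k]
    return sum(1 for user in top_k if user in relevant) / k


def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Assumption: each list orders the same items from best to worst,
    and no item appears twice.
    """
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b) or len(ranking_b) != n:
        raise ValueError("rankings must contain the same distinct items")
    position_in_b = {item: rank for rank, item in enumerate(ranking_b)}
    # d_i is the difference between the rank of item i in the two lists.
    sum_d_squared = sum(
        (rank_a - position_in_b[item]) ** 2
        for rank_a, item in enumerate(ranking_a)
    )
    return 1 - (6 * sum_d_squared) / (n * (n * n - 1))


if __name__ == "__main__":
    algorithm_ranking = ["u1", "u2", "u3", "u4"]
    fair_vote_ranking = ["u1", "u3", "u2", "u4"]
    print(precision_at_k(algorithm_ranking, {"u1", "u3"}, 2))
    print(spearman_rho(algorithm_ranking, fair_vote_ranking))
```

In the setting of the talk, `ranking_a` would be the output of HITS or MHITS and `ranking_b` the reference ranking by number of fair votes, truncated to the same n users.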

    1. Robust Expert Ranking in Online Communities - Fighting Sybil Attacks
       8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2012), October 14–17, 2012, Pittsburgh, Pennsylvania, United States
       Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
       RWTH Aachen University, Advanced Community Information Systems (ACIS)
       {rashed|balsoiu|klamma}
       Lehrstuhl Informatik 5 (Information Systems), Prof. Dr. M. Jarke
       Deutschen Akademischen Austauschdienstes
       I5-DR-0312-1
    2. Advanced Community Information Systems (ACIS)
       [Diagram: Responsive Open Community Information Systems, surrounded by Web Engineering, Web Analytics, Community Visualization and Simulation, Community Analytics, Community Support, Requirements Engineering]
    3. Agenda
       • Introduction and motivation
       • Related work
       • Our approach
         – Expert ranking algorithm
         – Robustness of the expert ranking algorithm
       • Evaluation
       • Conclusions and outlook
    4. Introduction
       • Expert search and ranking refer to finding a group of authoritative users with special skills and knowledge for a specific category.
       • The task is very important in online collaborative systems
       • Problems: openness and misbehaviour
         – No attention has been paid to the trust and reputation of experts
       • Solution: leveraging trust
    5. Motivation Examples
       • Manipulating the truth for war propaganda
         – Published as: British soldiers abusing prisoners in Iraq
         – Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq
       • Tidal bores presented as Indian Ocean Tsunami
         – Published as: 2004 Indian Ocean Tsunami
         – Proved to show tidal bores at a four-day-long government-sponsored tourist festival in China
       • Expert knowledge, analysis and witnesses are needed to identify the fake!
    6. A Case Study: Collaborative Fake Multimedia Detection System
       • Collaborative activities (rating, tagging and commenting)
         – Provide new means of search, retrieval and media authenticity evaluation
         – Explicit ratings and tags are used for evaluating authenticity of multimedia items
         – Reliability: not all of the submitted ratings are reliable
         – No centralized control mechanism
         – Vulnerability to attacks
       • Three types of users
         – Honest users
         – Experts
         – Malicious users
    7. Research Questions and Goals
       • Research questions
         – How to measure users’ expertise in collaborative media sharing and evaluating systems, and how to rank the users?
         – What is the implication of trust?
         – Robustness: how to ensure robustness of the ranking algorithm?
       • Goals
         – Improve multimedia evaluation
         – Reduce the impact of malicious users
    8. Related Work
       • Probabilistic models, e.g. [Tu et al. 2010]
       • Voting models [Macdonald and Ounis 2006], [Macdonald et al. 2008]
       • Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations; SPEAR algorithm [Noll et al. 2009]; ExpertRank [Jiao et al. 2009]
       • TREC enterprise track: find the associations between candidates and documents, e.g. [Balog 2006, Balog 2007]
       • Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]
    9. Our Approach
       • Assumptions
         – Expert users tend to have many authenticity ratings
         – Correctly evaluated media are rated by users of high expertise
         – Following expert users provides more benefits
       • Expert definition
         – Rates a large number of media files correctly with respect to a topic and is highly trusted by the users directly connected to him
         – Should be trustworthy in evaluating multimedia
    10. Expert Ranking Methods
        • Domain knowledge driven method
          – Considers tags that users assign to media files
          – User profile: merging the tags a user submitted to the media files in the system
          – Similarity coefficient between the candidate profile and the tags assigned to a specific resource
          – Used to reorder users who voted on a media file according to the tag profile
        • Domain knowledge independent method
          – Uses the connections between users and resources to decide on the expertise of the users
          – A modified version of the HITS algorithm
          – Mutual reinforcement of users' expertise and media
    11. MHITS: Expert Ranking Algorithm
        • MHITS: expert ranking algorithm in online collaborative systems
          – Link-based approach, based on the HITS algorithm
        • HITS
          – Authorities: pages that are pointed to by good pages
          – Hubs: pages that point to good pages
          – Reinforcement between hubs and authorities
        • MHITS
          – Users act as hubs (correctly evaluated media are rated by them)
          – Media files act as authorities
          – Mutual reinforcement between users and media files
          – Local trust values between users are assigned
          – Considers the ratings of the users
    12. MHITS: Expert Ranking Algorithm
        $a(m) = \sum_{u \in U(m)} h(u) \, r(u)$
        $h(u) = \beta \sum_{m \in M(u)} a(m) \, r(u) + (1 - \beta) \, t(u)$
        Symbol descriptions:
        • a(m): authority score
        • U(m): set of users pointing to media file m
        • h(u): hubness score
        • r(u): rating of user u for media file m
        • t(u): average trust of the users directly connected to user u
        • M(u): set of media files to which user u points
        • β: coefficient that weights the influence of the two terms, in range [0, 1]
        Networks and value ranges:
        • One network for users and ratings
        • One network for users only (trust network)
        • Trust in range [0, 1]
        • Ratings: 0.5 for a fake vote, 1 for an authentic vote
    13. Robustness of the MHITS Algorithm
        • Compromising techniques
          – Sybil attack [Douc02], reputation theft, whitewashing attack, etc.
          – Compromising the input and the output of the algorithm
        • Sybil attack
          – Fundamental problem in online collaborative systems
          – A malicious user creates many fake accounts (Sybils) which all reference the user to boost his reputation (the attacker’s goal is to be higher up in the rankings)
        • Countermeasures against Sybil attack

          Property                          SybilGuard [YKGF06]   SybilLimit [YGKX08]   SumUp [TMLS09]
          Protocol type                     Decentralized         Decentralized         Centralized
          Accepted Sybils per attack edge   (values not preserved)
    14. SumUp
        • Centralized approach
          – Aims to aggregate votes in a Sybil-resilient manner
        • Key idea: adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object
        • New: we integrate SumUp with MHITS
          – Java implementation using our own data structure based on Java sparse arrays
        • SumUp steps
          (1) Assign the source node and number of votes per media file
          (2) Levels assignment
          (3) Pruning step
          (4) Capacity assignment
          (5) Max-flow computation: collect votes on each resource
          (6) Leverage user history to penalize adversarial nodes
    15. Integration of SumUp with MHITS (diagram)
    16. Evaluation
        • Experimental setup
          – Barabási–Albert model for generating the network
          – 300 users
          – 20 media files (10 known to be fake and 10 known to be authentic)
          – 800 ratings
          – 3000 trust edges
    17. Ratings Distribution (chart)
    18. Evaluation
        • Evaluation metrics:
          – Precision@K: the fraction of relevant users among the top K ranked users
          – Spearman’s rank correlation coefficient: $\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
        • $\rho_s$: Spearman’s coefficient of rank correlation, $-1 \le \rho_s \le 1$
        • $d_i$: the difference between the rank of $x_i$ and the rank of $y_i$
        • n: the number of data points in the sample (total number of observations)
        • Scale: +1 perfect positive correlation, 0 no correlation, -1 perfect negative correlation
        • $\rho_s = -1$ or $1$: high degree of correlation between x and y
        • $\rho_s = 0$: a lack of monotonic association between the two variables
    19. Experimental Results I
        • No Sybils
        • Results are compared with the ranking of the users according to the number of fair ratings each of them had in the system

          Spearman (n=15)   HITS   MHITS
                            0.87   0.93
    20. Experimental Results II
        • 10% Sybils
        • 4 attack edges

          Spearman (n=20)   HITS   MHITS   MHITS & SumUp
                            0.52   0.68    0.93
    21. Experimental Results III
        • Precision@K (charts)
          – 10% Sybils (one group) and 8 attack edges
          – 20% Sybils (one group) and 24 attack edges
    22. Further evaluation
        • 3% → 17%: number of Sybil votes increased with respect to the total number of fair votes
          – expertise ranking does not change
        • 9 → 14 → 24: number of attack edges increased, keeping the number of Sybil votes at 17% of the number of fair votes and a constant number of Sybils (50)
          – precision does not change
        • 17% → 50% → 100%: number of Sybil votes increased, keeping the number of attack edges (24) and the number of Sybils constant

          K    MHITS (20%)   MHITS & SumUp (20%)   MHITS (50%)   MHITS & SumUp (50%)   MHITS (100%)   MHITS & SumUp (100%)
          12   0.91          0.91                  0.27          0.33                  0.08           0.08
          15   0.93          0.93                  0.33          0.40                  0.06           0.06
    23. Conclusions and Future Work
        • Conclusions
          – Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems)
          – Leveraged trust and showed the trust implications
          – Combined expert ranking with Sybil-resistant algorithms
        • Future work
          – Applying the algorithm to real data and to different data sets
          – Temporal analysis (time series analysis)
          – Integrating the domain knowledge driven method
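
The mutual reinforcement described on slides 11 and 12 can be illustrated with a short Python sketch. It reads the update rules as: the authority score of a media file is the sum of hub score times rating over the users who rated it, and the hub score of a user is β times the sum of authority score times rating over the media files the user rated, plus (1 - β) times the user's average local trust. Several details are assumptions rather than part of the slides: the function name `mhits`, the dictionary-based inputs, the fixed iteration count, and the max-normalization after each step (used here only so that the reinforcement term stays on the same [0, 1] scale as the trust term). It is not the authors' Java implementation.

```python
def _normalize_by_max(scores):
    """Scale scores in place so the largest value is 1 (assumed normalization)."""
    largest = max(scores.values(), default=0.0)
    if largest > 0:
        for key in scores:
            scores[key] /= largest


def mhits(ratings, trust, beta=0.5, iterations=50):
    """Sketch of the MHITS iteration.

    ratings: dict mapping (user, media) -> rating
             (0.5 for a fake vote, 1.0 for an authentic vote, as on slide 12)
    trust:   dict mapping user -> average local trust t(u) in [0, 1]
    beta:    weight between the reinforcement term and the trust term
    Returns (hub, authority) score dictionaries.
    """
    users = {user for user, _ in ratings} | set(trust)
    media = {item for _, item in ratings}
    hub = {user: 1.0 for user in users}
    authority = {item: 1.0 for item in media}

    for _ in range(iterations):
        # a(m) = sum over users u who rated m of h(u) * r(u, m)
        authority = {item: 0.0 for item in media}
        for (user, item), rating in ratings.items():
            authority[item] += hub[user] * rating
        _normalize_by_max(authority)

        # Reinforcement part of h(u): sum over rated media m of a(m) * r(u, m)
        reinforcement = {user: 0.0 for user in users}
        for (user, item), rating in ratings.items():
            reinforcement[user] += authority[item] * rating
        _normalize_by_max(reinforcement)

        # h(u) = beta * reinforcement(u) + (1 - beta) * t(u)
        hub = {
            user: beta * reinforcement[user] + (1 - beta) * trust.get(user, 0.0)
            for user in users
        }
    return hub, authority


if __name__ == "__main__":
    example_ratings = {
        ("alice", "m1"): 1.0, ("alice", "m2"): 1.0, ("alice", "m3"): 1.0,
        ("bob", "m1"): 1.0,
        ("carol", "m2"): 0.5,
    }
    example_trust = {"alice": 0.9, "bob": 0.5, "carol": 0.1}
    hub_scores, _ = mhits(example_ratings, example_trust)
    print(sorted(hub_scores, key=hub_scores.get, reverse=True))
```

Setting `beta=1.0` removes the trust term, which gives a rating-weighted form of plain HITS; lowering `beta` makes the ranking depend more on local trust, which is also what allows mutually trusting Sybils to climb the ranking, as noted in the evaluation.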