downloading - IJS - Institut "Jozef Stefan"



EIGENREP: REPUTATION MANAGEMENT IN P2P NETWORKS

Because of the huge popularity P2P systems are gaining from day to day, mechanisms for secure content sharing are necessary. This need is dictated by the anonymous and open nature of P2P systems, where infiltration and spread of inauthentic files are easily accomplished if malicious peers try to subvert the system. Reputation mechanisms in general are about the generation, discovery and aggregation of rating information in electronic commerce and P2P systems [Bin Yu et al.], since it is far more effective to identify malicious peers as the sources of inauthentic or low-quality files than to track down those files, whose number can be virtually unlimited.

This reputation mechanism (EigenRep) is based on power iteration (an eigenvalue algorithm) for computing a global reputation value for every peer in the network. This global reputation value is later used for deciding whether a peer will interact with another peer, being aware of its reputation. It should represent an effective way of detecting malicious peers and, as shown later, malicious groups too. In other words, it should help isolate malicious peers from the P2P network, since peers who have not earned enough reputation, or have earned a bad one, are not allowed to participate in P2P interactions. Furthermore, this approach distributes the load of computing and storing global values among all peers in the network, leading to minimal network overhead.

The challenge for reputation systems in a distributed environment is how to aggregate the local reputation values without a centralized storage and management facility, while at the same time taking into account a large enough number of peers (in order to get an appropriate view of one's reputation) and without congesting the network with system messages (asking for each peer's local reputation).
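The local reputation values that get aggregated can be kept with simple per-transaction bookkeeping. A minimal sketch, following the sat-minus-unsat convention of the original EigenTrust paper (the class and names here are illustrative, not from the text):

```python
# Per-peer bookkeeping for local reputation values (illustrative names).
# s_ij is taken as satisfactory minus unsatisfactory transactions with j,
# the convention used in the EigenTrust paper this summary is based on.
from collections import defaultdict

class LocalTrust:
    def __init__(self):
        self.sat = defaultdict(int)    # satisfactory downloads from each peer
        self.unsat = defaultdict(int)  # unsatisfactory downloads from each peer

    def record(self, peer, satisfactory):
        if satisfactory:
            self.sat[peer] += 1
        else:
            self.unsat[peer] += 1

    def s(self, peer):
        # local value s_ij = sat(i, j) - unsat(i, j)
        return self.sat[peer] - self.unsat[peer]

lt = LocalTrust()
lt.record("peer_j", True)
lt.record("peer_j", True)
lt.record("peer_j", False)
print(lt.s("peer_j"))  # 2 satisfactory - 1 unsatisfactory = 1
```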
DESIGN

Five issues are considered while designing the system:
1) There should be no central authority that dictates or defines the shared ethic of peers;
2) A peer's reputation should be associated with an opaque ID, rather than with some external identity (e.g. an IP address);
3) No profit for newcomers: reputation should be hard to earn, and only by consistent good behavior, hence protecting the system from whitewashers;
4) The system should have minimal overhead in terms of computing power, infrastructure, storage and message complexity;
5) Robustness to malicious collectives.

The approach used is based on transitive trust: a peer will also trust the trusted peers of its trusted peers.

Advantage: high efficiency in decreasing the number of unsatisfactory downloads, even when malicious groups are present in a large percentage (40-50%).

Drawback: it does prevent congestion of the system and also reduces the message complexity, but if a malicious group operates successfully despite the reputation mechanism, the system will be subverted with much greater speed.

The global reputation value of each peer i is given by the local reputation values assigned to i by other peers, weighted by the global reputations of the assigning peers.
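The definition above — each peer's global value is the local values assigned to it, weighted by the raters' own global values — is exactly the fixed point that power iteration computes. A minimal, centralized sketch (the real system distributes this across peers; matrix values and names are illustrative):

```python
# Centralized sketch of the global reputation computation via power
# iteration. C[i][j] is the normalized local trust that peer i assigns
# to peer j (each row sums to 1); the fixed point is the principal
# eigenvector of C transposed.
def global_trust(C, iters=50):
    n = len(C)
    t = [1.0 / n] * n  # start from a uniform trust vector
    for _ in range(iters):
        # t_j <- sum_i c_ij * t_i : local opinions weighted by the
        # global reputation of the peers holding them
        t = [sum(C[i][j] * t[i] for i in range(n)) for j in range(n)]
    return t

C = [
    [0.0, 0.7, 0.3],  # peer 0's normalized opinions of peers 0..2
    [0.6, 0.0, 0.4],
    [0.5, 0.5, 0.0],
]
t = global_trust(C)
print([round(x, 3) for x in t])  # a trust vector that sums to 1
```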
In order to aggregate local reputation values, they must be normalized. This prevents the effects caused when a malicious peer assigns arbitrarily high reputation values to other malicious peers and arbitrarily low reputation to good peers:

    c_ij = max(s_ij, 0) / Σ_j max(s_ij, 0)

Drawbacks of normalization: the normalized vector makes no difference between a peer with whom peer i didn't interact at all and one with whom peer i has had poor experience. Also, c_ij is relative, meaning that if c_ij = c_ik, we know that peers j and k have the same reputation in the eyes of i, but we don't know whether they are both very reputable or not so reputable. Anyway, this approach is used because it leads to a good probabilistic model while still achieving substantially good results.

// Note: what are the limits and the constraints? What is the number of peers, i.e. the size of the network, for which these calculations stop being acceptable?

Three more practical issues are addressed:
1) A priori notions of trust (based on the observation that the first few peers that join the network are often trustworthy): a distribution p over the set of pre-trusted peers P is defined, such that p_i = 1/|P| for i Є P and p_i = 0 otherwise.
2) Inactive peers: peers that don't download from anybody else, or that assign a zero score as the local reputation value of every other peer. In this case p is used as the start vector, i.e. if peer i doesn't know or trust anybody, it will choose to trust the pre-trusted peers.
3) Malicious collectives: they are broken by having each peer put some trust in the pre-trusted peers (which are not part of the collective).

Security issues are handled by implementing two ideas:
1) The current trust value of a peer must not be computed by, or reside at, the peer itself;
2) The trust value of one peer will be computed by more than one peer, called mother peers.

The coordinate space is dynamically partitioned among the peers in the system so that every peer covers a particular region of that coordinate space.
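The normalization formula above, together with the pre-trust vector p, can be sketched as follows. The damped update t ← (1 − a)·Cᵀt + a·p mixes some trust toward the pre-trusted peers on every step, which is how the collectives are broken; the weight a = 0.2 and all names are illustrative choices, not values from the text:

```python
# Sketch of normalization plus the pre-trust start vector. Each row of
# raw local scores is clipped at zero and normalized; the iteration then
# mixes in the pre-trust distribution p on every step.
def normalize(s_row):
    # c_ij = max(s_ij, 0) / sum_j max(s_ij, 0)
    clipped = [max(s, 0) for s in s_row]
    total = sum(clipped)
    if total == 0:
        return None  # inactive peer: caller falls back to p
    return [c / total for c in clipped]

def damped_global_trust(C, p, a=0.2, iters=50):
    # t <- (1 - a) * C^T t + a * p, starting from the pre-trust vector p
    n = len(p)
    t = p[:]
    for _ in range(iters):
        t = [(1 - a) * sum(C[i][j] * t[i] for i in range(n)) + a * p[j]
             for j in range(n)]
    return t

p = [0.5, 0.5, 0.0]                      # peers 0 and 1 are pre-trusted
raw = [[0, 3, 1], [2, 0, 0], [0, 0, 0]]  # raw local scores s_ij
C = [normalize(row) or p for row in raw] # inactive row 2 falls back to p
t = damped_global_trust(C, p)
print([round(x, 3) for x in t])
```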
Peers store (key, value) pairs, a.k.a. coordinates. The assignment of mother peers {M} is done using a DHT, which uses a hash function to deterministically map keys (e.g. file names) into points in a logical coordinate space. Here, a mother peer is located by hashing a unique ID of the peer (e.g. its IP address and TCP port) into a point in the DHT. The peer that currently covers this point as part of its DHT region is appointed as the mother of that peer. In other words: the DHT coordinates pos_i of the mother peers (M) are determined by applying a set of one-way hash functions h0, h1, …, hM-1.

Since each peer also acts as a mother peer, it is assigned a set of daughters D_i. Thus, every peer stores and reports the opinion vector of its daughters. In addition to these parameters, two more values are stored and reported by each peer: B_di, the set of peers that a peer's daughter downloaded files from; and A_di, the set of peers who downloaded files from the peer's daughter. B_di is provided to peer i by its daughter d, when d submits its trust assessments of the peers that it has downloaded files from. Similarly, A_di is provided by all the peers that had an interaction with peer i's daughter d, in the process of submitting their trust assessment of d.

By introducing DHTs, the robustness of the system is increased (the data stored by the mother does not depend on the presence of the mother in the system, which matters especially in the case of the mother's failure). The implementation of a secure algorithm introduces very important characteristics for the system:
1) Because of the nature of one-way hash functions, there is no way for a peer to know which peer's ID it computes the trust value for; hence malicious peers are prevented from giving good trust assessments to other malicious peers; (ANONYMITY)
2) Peers can't choose their own coordinates in the hash space; hence it is not possible for a peer to locate itself in the DHT space (it cannot determine its position by computing the hash value of its ID) and thus give itself good trust values; (RANDOMIZATION)
3) Since there are several mothers for one peer, multiple multidimensional hash functions are used for creating several coordinate spaces, thus mapping the peer's unique ID into a different point in every multi-dimensional hash space.

// Note: Are peers aware of their own trust values? (i.e. is there a constraint for the peer to query all of its mothers to get the trust assessment of itself?)

After computing and storing a global value, it can be used in several ways:
1) One purpose of a global value is to isolate malicious peers from the network. This can be done by introducing a relationship between the probability of downloading a file from a particular peer and that peer's global trust value.
This will limit the number of unsatisfactory downloads on the network while at the same time allowing newcomers to build trust.
2) Another way to use a global trust value is to give peers an incentive to share "good" files, i.e. files that are authentic. This can be done by rewarding good peers with greater bandwidth, increased connectivity, etc. Another good side effect is that it may give a non-malicious peer an incentive to delete inauthentic files that it downloaded by mistake, thus cleaning the system.

SIMULATIONS

Simulations are based on a typical P2P network model: peers are queried and queries are propagated in the usual Gnutella way, by broadcasting them to other peers. Interactions between peers are computed based on a probabilistic content-delivery model. Namely, it is assumed that peers are interested in a subset of all the content available on the network, i.e. each peer picks a number of content categories and shares files only in these categories.
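The trust-based source selection described above — download probability tied to global trust, with zero-trust peers kept at a small fixed share (10%) so newcomers can still be chosen — can be sketched as follows. The function and parameter names are illustrative, not from the text:

```python
# Sketch of reputation-based download-source selection (probabilistic
# variant): responders are drawn with probability proportional to their
# global trust, while zero-trust peers keep a 10% share as a group so
# that newcomers get a chance to build reputation.
import random

def choose_source(responders, trust, newcomer_share=0.10, rng=random):
    zero = [p for p in responders if trust.get(p, 0.0) == 0.0]
    trusted = [p for p in responders if trust.get(p, 0.0) > 0.0]
    if trusted and zero and rng.random() < newcomer_share:
        return rng.choice(zero)  # give a zero-trust newcomer a chance
    if trusted:
        total = sum(trust[p] for p in trusted)
        weights = [trust[p] / total for p in trusted]
        return rng.choices(trusted, weights=weights, k=1)[0]
    return rng.choice(responders) if responders else None

trust = {"a": 0.7, "b": 0.3, "c": 0.0}
print(choose_source(["a", "b", "c"], trust, rng=random.Random(1)))
```

Over many queries this converges to roughly 63% of downloads from "a", 27% from "b", and 10% from the newcomer "c".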
When the simulator generates a query, it actually generates the category and the rank of the file that will satisfy that query. Each peer that receives the query checks whether it supports the category and whether it holds the file. Different threat models are considered.

The settings implemented in the simulator introduce a fairly pessimistic scenario: malicious peers connect to the most highly connected peer when joining the network; they have large bandwidth and respond to more than 20% of the queries issued by the rest of the peers in the network. The metric of interest is the number of inauthentic versus authentic file downloads. Non-trust-based and reputation-based scenarios are compared. Two different trust-based algorithms for selecting download sources are considered:
1) Deterministic: the peer with the highest trust is chosen as the download source;
2) Probabilistic: a peer is chosen with a certain probability, and peers that have trust value 0 are given a 10% probability of being chosen as the download source (striking a balance between newcomers and malicious peers with no trust).

THREAT MODELS

1) No presence of malicious and pre-trusted peers:
a. Vast load imbalance;
b. No chance for new peers to build up reputation;
c. Much better performance when using the probabilistic model than the deterministic one, but still only slightly better than the non-trust-based model.

A) Malicious peers always upload inauthentic files and assign good trust values to the malicious peers they have found along the way;
B) Malicious peers always upload inauthentic files and assign good trust values to the malicious peers they cooperate with in a colluding group;
C) Malicious peers upload inauthentic files in f% of all cases, in order to get some positive ratings from good peers and thus build some reputation (by uploading authentic files in the remaining cases);
// Note: according to the figures, f% is the percentage of authentic files served when C-model peers were chosen as a download source
D) One group of malicious peers provides only authentic files and uses the trust it has gained to boost the trust values of another group of malicious peers that provides only inauthentic files.

Simulation results for models A and B are very similar due to the presence of pre-trusted peers in model B; otherwise malicious group formation would heavily boost the trust values of malicious nodes. However, both models perform much better than the case where no trust model is implemented at all.

Simulation results for model C show that malicious peers have maximum impact on the network when providing 50% authentic files, which comes at a certain cost for them: not only do they participate in the process of sharing and spreading good files across the network, they also have to maintain
a repository of authentic files, which requires a certain maintenance overhead. Also, in the long run, these peers will lose their reputation anyway.

The simulation for model D considers a malicious group consisting of peers of both types B and D. Type D peers act as normal peers who try to gain good global trust and in turn assign it to the type B peers.

There are two other threat models described in the paper, named Model E and Model F. The first one represents the Sybil attack, where a peer can initiate a virtually infinite number of peers on the network. After such a peer is chosen for a transaction, it sends an inauthentic file, disconnects, and re-enters the system with a completely new identity, thus outnumbering the good peers and lowering their chance of being chosen for a transaction. The suggested countermeasure is to introduce some cost for every new ID.

The second model is Virus Disseminators, and it is not addressed by EigenTrust. The reason is the following: the model consists of a malicious peer sending one inauthentic file every 100 transactions; the rest of its files are authentic. Since EigenTrust doesn't completely eliminate corrupt files, in the case of shared executables this can cause great damage. The reason why this kind of problem is not addressed is that today's problems with P2P networks are considered to be mainly about flooding the network with inauthentic files (since the networks are used for sharing files and digital media), not about the distribution of malicious executables (which are rarely shared).