Deanonymize Tor Hidden Services

Tor Hidden Services enable server anonymity, which can facilitate illegal and criminal activities. To address this issue, this presentation surveys state-of-the-art methods from the literature for attacking hidden service anonymity.

  1. Deanonymize Tor Hidden Services. Master in Engineering in Computer Science, Web Security and Privacy, a.y. 2016-17, Prof. Alberto Marchetti Spaccamela
  2. About Us: Andrea Bissoli, Fabrizio Farinacci, Andrea Prosseda, Sara Veterini
  3. Section 1: What is Tor?
  4. Tor in a nutshell: the most popular volunteer-based anonymity network, consisting of over 3000 relays.
  5. How it works [diagram: Client, Onion Proxy, Server]
  6. Section 2: Hidden Services
  7. About HS ● Hidden services are websites located inside the Tor network, which receive inbound connections only through Tor. ● They provide server anonymity in addition to Tor's default client anonymity. ● They protect the location of the server hosting the service and provide encryption at every hop from a client to the hidden service.
  8. Set up ● The HS chooses some relays as Introduction Points (IPs), which will receive inbound connections from clients, and builds ordinary Tor circuits to them. [diagram: Server, DB, Client, Onion Proxy]
  9. Set up (cont'd) ● The HS creates a hidden service descriptor containing: ○ its public key ○ its Introduction Points, signed with its private key. ● It uploads the descriptor to the hidden service directories (HSDirs). ● Its onion address is xyz.onion, where xyz encodes the first 80 bits of the SHA-1 hash of its public key; the descriptor is stored on the HSDirs under an identifier derived from this address, so that clients can find it.
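The address derivation above can be sketched in a few lines. This is a minimal sketch of the legacy (v2) scheme the slide describes, assuming the input is the DER encoding of the service's RSA public key; the 80 bits are written out in base32, which gives exactly 16 characters. `v2_onion_address` is a hypothetical helper name and the key bytes in the usage line are a placeholder. Current (v3) addresses use a different, longer encoding and are not covered here.

```python
import base64
import hashlib


def v2_onion_address(der_public_key: bytes) -> str:
    """Derive a legacy (v2) onion address from a DER-encoded public key.

    The label is the base32 encoding of the first 80 bits (10 bytes)
    of the SHA-1 digest of the key, in lower case.
    """
    digest = hashlib.sha1(der_public_key).digest()
    permanent_id = digest[:10]  # first 80 bits of SHA-1(public key)
    # 80 bits / 5 bits per base32 character = 16 characters, no padding
    label = base64.b32encode(permanent_id).decode("ascii").lower()
    return label + ".onion"


if __name__ == "__main__":
    placeholder_key = b"not a real DER-encoded RSA key"
    print(v2_onion_address(placeholder_key))
```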
  10. Client connection ● The client queries the HSDir with the onion address and obtains the HS descriptor. ● It chooses a Rendezvous Point (RP), builds a circuit to it and gives it a one-time secret (the rendezvous cookie).
  11. Client connection (2) ● The client connects to one of the IPs and sends it an introduce message, encrypted with the HS public key, containing: ○ the RP address ○ the one-time secret
  12. Client connection (3) ● The HS decrypts the message, builds a circuit to the RP and provides the one-time secret.
  13. Client connection (4) ● The RP verifies the one-time secret and notifies the client that the connection succeeded. ● Now the client and the service can communicate through the RP.
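The four connection steps can be mimicked with a toy model that keeps only the bookkeeping: the RP stores the one-time cookie, the introduce message carries the RP name and cookie to the HS, and the RP joins the two sides when the HS presents a matching cookie. This is a sketch under heavy assumptions: there are no circuits and no cryptography, and all class and function names are hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class RendezvousPoint:
    """Toy RP: holds one-time cookies and joins the two circuits."""
    pending: dict = field(default_factory=dict)  # cookie -> waiting client

    def establish(self, cookie: bytes, client: str) -> None:
        self.pending[cookie] = client

    def join(self, cookie: bytes, service: str):
        # pop() makes the cookie single-use; unknown cookies are rejected.
        client = self.pending.pop(cookie, None)
        return None if client is None else (client, service)


@dataclass
class HiddenService:
    onion: str
    inbox: list = field(default_factory=list)  # introduce messages relayed by an IP

    def process_introductions(self, rps: dict) -> list:
        sessions = []
        for msg in self.inbox:
            # Real Tor: the message is first decrypted with the HS private key.
            sessions.append(rps[msg["rp"]].join(msg["cookie"], self.onion))
        self.inbox.clear()
        return sessions


def connect(client: str, hs: HiddenService, rps: dict, rp_name: str) -> list:
    cookie = secrets.token_bytes(20)                     # one-time rendezvous cookie
    rps[rp_name].establish(cookie, client)               # client -> RP
    hs.inbox.append({"rp": rp_name, "cookie": cookie})   # client -> IP -> HS
    return hs.process_introductions(rps)                 # HS -> RP, RP joins both sides


if __name__ == "__main__":
    rps = {"rp1": RendezvousPoint()}
    hs = HiddenService("xyz.onion")
    print(connect("alice", hs, rps, "rp1"))  # [('alice', 'xyz.onion')]
```

Making `join` pop the cookie mirrors the "one-time" property: replaying the same cookie a second time is rejected.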
  14. Section 3: Deanonymization Attacks
  15. Types and Goals of Attacks ● Active ○ Type: the adversary injects malicious nodes into the Tor network and eventually gains control of the HS entry guard, possibly by disabling benign relays. ○ Goal: deanonymize the hidden service's IP through attacker-controlled relays. ● Passive ○ Type: the adversary observes traffic, looking for temporal and structural patterns that reveal the relays involved in the communication. ○ Goal: deanonymize first the clients involved in HS communications and then the specific HSs they contact. ● Misconfiguration ○ Type: the HS administrator unintentionally leaks identifying information in the configuration files and/or the hidden service content. ○ Goal: deanonymize the owner's identity (identity leaks) or the IP address of the hidden server (location leaks).
  16. References ● Misconfiguration Attacks ○ “CARONTE: Detecting Location Leaks for Deanonymizing Tor Hidden Services”, Matic et al., 2015 ● Passive Attacks ○ “Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden Services”, Kwon et al., 2015 ○ “POSTER: Fingerprinting Tor Hidden Services”, Mitseva et al., 2016 ● Active Attacks ○ “Protocol-level Hidden Server Discovery”, Ling et al., 2013 ○ “The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network”, Jansen et al., 2014
  17. Section 3.1: Active Attacks
  18. Goal: “Deanonymize the hidden service's IP through attacker-controlled relays.”
  19. Protocol-Level Attack The attacker controls a client, a rendezvous point and some other relays of the Tor network. It also runs a central server where its nodes log relevant connection events. General idea: since only the server's entry nodes know its location (IP address), the attack consists of connecting to the HS over and over until the HS picks an entry guard controlled by the attacker. Desired scenario: [diagram] “Protocol-level Hidden Server Discovery”, Ling et al., 2013
  20. Attack phases ● Phase 1: the client keeps creating circuits to the HS until one of the attacker's entry nodes sees a particular combination of cells. ● Phase 2: the attacker tests whether that entry node is actually the HS entry guard, by manipulating a cell at the Rendezvous Point. ● Phase 3: he concludes the test by checking the temporal correlation of the events reported by his nodes. If the presumed entry router is indeed the one chosen by the hidden server, he can locate the server.
  21. Phase 1 ● The client keeps establishing new connections with the HS, and every cell seen by the attacker's nodes is recorded in the central server. ● This loop repeats until one of the attacker's entry nodes sees a characteristic combination of cells [figure]. However, this does not imply that our entry node was chosen by THAT particular HS, only by some HS.
  22. Phase 2 ● In this phase the attacker wants to confirm that its relay is the HS entry guard. ● When the client is about to start the conversation with the server, it automatically sends a begin cell. ● Without decrypting it, the RP flips 1 bit of the cell so that the server cannot understand its content. This works because the integrity check is performed ONLY at the HS, the endpoint of the circuit. ● The failed check makes the server send a destroy cell back towards the client, tearing down the whole circuit. ● Every attacker relay waits for this cell and, if it arrives, reports it to the central server together with its timestamp.
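Why a single flipped bit is enough can be illustrated with a toy model. The sketch below simplifies heavily: a SHA-256 counter keystream stands in for Tor's counter-mode encryption layer, and a keyed 4-byte SHA-1 tag stands in for the relay cell digest; all function names are hypothetical. As in the attack, a node in the middle that flips a ciphertext bit flips the same plaintext bit, and only the endpoint, which holds the keys, detects the damage.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy counter-mode keystream (stand-in for the real cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key: bytes, digest_key: bytes, payload: bytes) -> bytes:
    """Client side: prepend a keyed 4-byte tag, then encrypt."""
    tag = hashlib.sha1(digest_key + payload).digest()[:4]
    plain = tag + payload
    return xor(plain, keystream(enc_key, len(plain)))


def open_at_endpoint(enc_key: bytes, digest_key: bytes, cell: bytes):
    """HS side: decrypt and verify; None means the check failed (-> destroy)."""
    plain = xor(cell, keystream(enc_key, len(cell)))
    tag, payload = plain[:4], plain[4:]
    expected = hashlib.sha1(digest_key + payload).digest()[:4]
    return payload if tag == expected else None


def flip_bit(cell: bytes, index: int) -> bytes:
    """What the RP does: flip one bit without decrypting anything."""
    tampered = bytearray(cell)
    tampered[index // 8] ^= 1 << (index % 8)
    return bytes(tampered)


if __name__ == "__main__":
    ek, dk = b"e" * 16, b"d" * 16
    cell = seal(ek, dk, b"placeholder begin")
    print(open_at_endpoint(ek, dk, cell))
    print(open_at_endpoint(ek, dk, flip_bit(cell, 60)))
```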
  23. Phase 2 (cont'd) [diagram: Client, HS, Central Server]
  24. Phase 3 ● The central server checks that: ○ both the RP and the entry node report a Destroy event ○ their timing is consistent: with Tb the time of the begin cell, Te the time of the destroy cell at the RP and Td the time of the destroy cell at the entry node, the events are consistent if Tb < Td < Te. ● If both hold, the attacker controls an HS entry guard, so he is directly connected to the server and consequently knows its location.
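The central server's decision rule can be written as a small predicate. This is a sketch assuming the attacker's relays have synchronized clocks and report plain timestamps for a single trial; `TrialEvents` and `entry_is_hs_guard` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialEvents:
    """Timestamps (seconds) reported to the central server for one trial."""
    t_begin: float                    # Tb: begin cell tampered with at the RP
    t_destroy_rp: Optional[float]     # Te: destroy cell seen at the RP
    t_destroy_entry: Optional[float]  # Td: destroy cell seen at the suspected entry node


def entry_is_hs_guard(ev: TrialEvents) -> bool:
    """Both nodes must see a Destroy event, ordered Tb < Td < Te."""
    if ev.t_destroy_rp is None or ev.t_destroy_entry is None:
        return False
    # The destroy cell starts at the HS, so it passes the HS entry guard
    # before travelling back along the circuit to the RP.
    return ev.t_begin < ev.t_destroy_entry < ev.t_destroy_rp
```

An attacker could repeat the test over several trials to rule out a coincidental Destroy event; that aggregation is not shown.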
  25. Sniper Attack It is based on a DoS attack against the Tor relays that are critical for the HS. The attacker controls just a client and at least one relay (GA). General idea: the attacker wants GA to be chosen as HS entry guard in order to identify the server location (as in the previous attack). To do that, he first needs to disable ALL the HS entry guards until GA is chosen as one of them. So he keeps building normal Tor connections to the HS until GA is directly connected to an HS entry guard. At this point he disables that guard with a Sniper Attack. When GA finally becomes the HS entry guard, it knows the HS location. “The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network”, Jansen et al., 2014
  26. Phase 1: Identify guards ● The adversary keeps building Tor circuits to the HS until GA is directly connected to an HS entry guard. For these circuits, the adversary can directly observe the guards' identities. ● To recognize this situation, he performs a simple request/response with the server: the RP sends a pattern of 50 PADDING cells to the HS followed by a DESTROY cell. ● If GA observes 2 cells (used to build the circuit) on a rendezvous circuit FROM a hidden service and 52 cells on the same circuit TO the hidden service (50 + 2 to build the circuit), followed by a DESTROY cell shortly after one is sent by the RP, it concludes that GA is directly connected to a guard of the HS.
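The pattern GA looks for can be expressed as a simple check over what it sees on one rendezvous circuit. A sketch: the cell counts come from the slide, while `MAX_DELAY` (the meaning of "shortly after") is a hypothetical threshold, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

PADDING_CELLS = 50  # PADDING cells sent by the attacker's RP (from the slide)
BUILD_CELLS = 2     # cells used to build the circuit (from the slide)
MAX_DELAY = 1.0     # hypothetical bound for "shortly after", in seconds


@dataclass
class RendCircuitView:
    """What GA sees on one rendezvous circuit of a hidden service."""
    cells_from_hs: int
    cells_to_hs: int
    destroy_seen_at: Optional[float]  # when GA relayed a DESTROY
    destroy_sent_at: Optional[float]  # when the attacker's RP sent its DESTROY


def ga_next_to_hs_guard(view: RendCircuitView) -> bool:
    if view.destroy_seen_at is None or view.destroy_sent_at is None:
        return False
    delay = view.destroy_seen_at - view.destroy_sent_at
    return (view.cells_from_hs == BUILD_CELLS
            and view.cells_to_hs == PADDING_CELLS + BUILD_CELLS
            and 0.0 <= delay <= MAX_DELAY)
```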
  27. Phase 2: Disable guards ● Once the HS guards have been identified, the adversary builds custom circuits that use the targets as circuit entries and kills them with the Sniper Attack. ● The attack repeatedly sends SENDME cells while the attacker's client stops reading from the target, so that undelivered cells pile up in the target's memory until its Tor process is killed.
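The role of the SENDME cells can be seen in a toy model of circuit-level flow control. The constants are Tor's circuit-level defaults (a 1000-cell window, 100 cells re-opened per circuit SENDME, 512-byte cells); the round-based structure and the function name are simplifications assumed for illustration, not a model of the full attack.

```python
CIRC_WINDOW_START = 1000  # cells an exit may send before it needs a SENDME
SENDME_INCREMENT = 100    # cells re-opened by each circuit-level SENDME
CELL_SIZE = 512           # bytes per Tor cell


def bytes_queued_at_target(rounds: int, sendmes_per_round: int) -> int:
    """Bytes piling up at the target entry relay when the client never reads.

    Each round, the exit sends as many cells as its window allows; they get
    stuck in the target's queue because the attacker's client stops reading.
    Every SENDME the client sends anyway re-opens part of the window.
    """
    window = CIRC_WINDOW_START
    queued_cells = 0
    for _ in range(rounds):
        queued_cells += window  # exit drains its whole window into the target
        window = sendmes_per_round * SENDME_INCREMENT
    return queued_cells * CELL_SIZE


if __name__ == "__main__":
    print(bytes_queued_at_target(10, 0))  # 512000: honest flow control caps the queue
    print(bytes_queued_at_target(10, 5))  # 2816000: grows with every round
```

Without SENDMEs the queue stops at one window; with them it grows without bound, which is what exhausts the target's memory.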
  28. Phase 3: Test for guard selection ● By repeating the Sniper Attack many times, the attacker eventually makes the HS choose his relay GA as an entry guard. ● To determine whether GA was selected by the HS, he uses techniques very similar to those used to identify guards in Phase 1. ● Since the attacker now controls an HS entry guard, he is directly connected to the server and consequently knows its location.
  29. Section 3.2: Passive Attacks
  30. Goal: “Deanonymize the clients involved in HS communications and then the specific HSs they contact, exploiting circuit and traffic fingerprinting techniques.”
  31. Circuit Fingerprinting Attack General idea: since each type of circuit has distinctive structural and temporal characteristics, the attacker can observe Tor traffic and classify circuits by these characteristics, which reveals users' involvement with hidden services. Once client-HS circuits are identified, Website Fingerprinting techniques based on traffic characteristics identify the HS being contacted, which is thus deanonymized. “Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden Services”, Kwon et al., 2015
  32. Attack phases ● Phase 1: Circuit Fingerprinting Attack. A client-HS connection involves several circuits: HS-IP, Client-IP, HS-RP and Client-RP. This phase classifies them with fingerprinting techniques. ● Phase 2: Website Fingerprinting (WF) Attack. Using the information from Phase 1, the attacker performs WF attacks to deanonymize hidden service clients and servers.
  33. Phase 1: Circuit Fingerprinting Attack ● We can distinguish 4 circuits: ○ HS - Introduction Point ○ Client - Introduction Point ○ Client - Rendezvous Point ○ HS - Rendezvous Point
  34. Phase 1: observations ● Streams to different HSs from the same client are not multiplexed in the same circuit (i.e. a separate RP is used for each). ● General circuits have a different structure from HS circuits (they do not go through an RP or IP) and therefore different construction patterns, especially compared with client-RP circuits. ● HS-IP circuits are long-lived (they must stay up to accept incoming connections from clients), unlike client-IP circuits (short-lived) and general circuits (short on average). ● Incoming/outgoing cell patterns help identify: ○ Client-IP (3 out + >3 in) and HS-IP circuits (>3 out = >3 in) ○ HS-RP circuits (out >> in), because they serve content, unlike client-RP circuits (in >> out), which send small requests and receive content.
  35. Phase 1: features and algorithms ● From the previous observations, we derive the following features: ○ duration of activity: how long circuits stay up ○ the number of incoming and outgoing cells ○ circuit construction sequences toward the RP ● Tree-based and k-NN classifiers are used for circuit classification.
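A k-NN classifier over the first two feature groups can be sketched in plain Python. This is an illustration under stated assumptions: the training points are made up (not data from the paper), construction sequences are omitted, and the log scaling is an assumed preprocessing choice so that hour-long HS-IP circuits do not dominate the distance.

```python
import math
from collections import Counter

# Made-up training points, for illustration only:
# (duration in seconds, incoming cells, outgoing cells) -> circuit type
TRAIN = [
    ((3600, 5, 5), "HS-IP"),       # long-lived
    ((4000, 6, 4), "HS-IP"),
    ((2, 4, 3), "Client-IP"),      # short-lived, few cells
    ((1, 5, 3), "Client-IP"),
    ((30, 40, 400), "HS-RP"),      # out >> in: serves content
    ((25, 50, 500), "HS-RP"),
    ((30, 400, 40), "Client-RP"),  # in >> out: downloads content
    ((20, 500, 50), "Client-RP"),
]


def _scale(features):
    # log1p compresses durations and counts spanning several orders of magnitude
    return [math.log1p(x) for x in features]


def knn_classify(train, sample, k=3):
    """Majority label among the k training points closest to sample."""
    s = _scale(sample)
    nearest = sorted((math.dist(_scale(f), s), label) for f, label in train)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, `knn_classify(TRAIN, (5000, 5, 5))` returns `"HS-IP"`, because a long-lived circuit with few cells sits closest to the HS-IP points.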
  36. Phase 2: Website Fingerprinting Attack ● Hidden services are deanonymized through website fingerprinting, using as features: ○ general traffic features, such as transmission size and time and the number of incoming and outgoing cells ○ packet ordering, i.e. the position of each outgoing cell ○ bursts, i.e. the number of consecutive cells in the same direction, for both incoming and outgoing traffic ● WF is performed in both ○ the open-world setting (i.e. considering ALL possible HSs) and ○ the closed-world setting (i.e. restricting the list to plausible HSs). Conclusion: through website fingerprinting, the contacted HS is identified.
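The three feature groups can be extracted from a cell trace as follows. A sketch assuming a hypothetical trace format of `(timestamp, direction)` pairs, with +1 for outgoing and -1 for incoming cells; the resulting features would feed a classifier in either setting, which is not shown.

```python
from itertools import groupby

CELL_SIZE = 512  # bytes per Tor cell


def wf_features(trace):
    """trace: list of (timestamp, direction), +1 = outgoing cell, -1 = incoming cell."""
    dirs = [d for _, d in trace]
    bursts = [(d, len(list(run))) for d, run in groupby(dirs)]
    return {
        # general traffic features
        "n_out": dirs.count(+1),
        "n_in": dirs.count(-1),
        "size_bytes": len(dirs) * CELL_SIZE,
        "duration": trace[-1][0] - trace[0][0] if trace else 0.0,
        # packet ordering: position of each outgoing cell in the trace
        "out_positions": [i for i, d in enumerate(dirs) if d == +1],
        # bursts: lengths of runs of consecutive same-direction cells
        "out_bursts": [n for d, n in bursts if d == +1],
        "in_bursts": [n for d, n in bursts if d == -1],
    }
```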
  37. Circuit fingerprinting attack: problem ● The attack relies on the Phase 1 observations listed in slide 34.
  38. Circuit fingerprinting attack: problem (cont'd) ● The Phase 1 observations of slide 34: No longer true!!
  39. POSTER Fingerprinting ● Tries to detect HS communications with circuit fingerprints (FPs), exploiting the fact that an HS connection leaks the information that multiple entry nodes are used. ● FPs are computed from statistics on: ○ the number of entry nodes ○ the chronological sequence of incoming/outgoing cells. The more fingerprints are used, the better the classification. ● An SVM-based classifier, trained with 10-fold cross-validation, detects: ○ unknown HSs (open world), if all 8 FPs are used ○ known HSs (closed world), if just one FP is used, with high recall and precision (greater than 95%). “POSTER: Fingerprinting Tor Hidden Services”, Mitseva et al., 2016
  40. Section 3.3: User Misconfiguration Attacks
  41. Goal: “Deanonymize the owner's identity (identity leaks) or the IP address of the hidden server (location leaks).”
  42. Caronte Caronte is an automated tool for finding location leaks. Its input is the onion address(es) of the hidden service(s) of interest. General idea: leaks are discovered in the content or configuration of a hidden service by finding candidate identities (e.g., phone numbers embedded in a page) or candidate Internet endpoints (e.g., an IP address or DNS domain in an error page). The candidates are then validated by checking whether the IP address and the onion address lead to the same service. Location leaks: information in the content or configuration of a hidden service that gives away its location. Location leaks are introduced by hidden service administrators and cannot be fixed centrally by the Tor Project. “CARONTE: Detecting Location Leaks for Deanonymizing Tor Hidden Services”, Matic et al., 2015
  43. Caronte Overview [diagram]
  44. Phase 1: Exploration ● Caronte visits: ○ the root page of the HS ○ all HTML resources linked from the root page (/xyz) ○ a random resource, to trigger an error page that may leak information placed there by the administrator. ● Caronte fetches and stores each of these URLs: ○ over both HTTP and HTTPS (to get the certificate) ○ with two Host header values (the onion address and a random onion address). A hosting server can host one or more public sites besides the hidden one; requesting a random address may make the server return its default (public) site, leaking information.
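The request plan of this phase can be sketched as a small generator. This only builds the list of fetches and assumes a separate, unshown component performs them through Tor; the function names and the dictionary layout are hypothetical.

```python
import secrets

BASE32 = "abcdefghijklmnopqrstuvwxyz234567"


def random_onion() -> str:
    """A random 16-character onion address, almost certainly unused."""
    return "".join(secrets.choice(BASE32) for _ in range(16)) + ".onion"


def exploration_requests(onion: str, linked_resources: list) -> list:
    """Plan the fetches listed on the slide: root, linked resources and a random
    path (to provoke an error page), each over HTTP and HTTPS and with two
    Host header values."""
    random_path = "/" + secrets.token_hex(8)
    paths = ["/"] + list(linked_resources) + [random_path]
    decoy_host = random_onion()
    plan = []
    for path in paths:
        for scheme in ("http", "https"):
            for host_header in (onion, decoy_host):
                plan.append({"url": f"{scheme}://{onion}{path}", "host": host_header})
    return plan
```

Every request still goes to the onion address; only the Host header changes, which is what may make a shared server fall back to its default public site.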
  45. Phase 2: Candidate selection The next step is to extract a list of candidates for each onion URL: ● Internet endpoints: pages may contain URLs, email addresses and IP addresses. URLs with very popular DNS domains (according to a public list of popular domains) are discarded; the others are kept. ● Unique strings: ○ identifiers, e.g. Google Analytics and AdSense IDs, Bitcoin wallets ○ page titles, which are often distinctive. They are looked up in search engines to find Internet sites where they have been observed; the DNS domains of those sites become candidates if they are not popular.
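Extracting Internet endpoints from a page and dropping popular domains might look like the sketch below. The regular expressions are simplified assumptions (IPv4 only, no octet validation), the popular-domain set is supplied by the caller (for instance from a public top-sites list), and the unique-string search is not shown.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
EMAIL_DOMAIN_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_popular(host: str, popular_domains: set) -> bool:
    """True if host equals, or is a subdomain of, a popular domain."""
    return any(host == d or host.endswith("." + d) for d in popular_domains)


def internet_endpoints(page: str, popular_domains: set) -> set:
    """Candidate endpoints (hostnames, email domains, IPv4 addresses) in a page."""
    found = set()
    for url in URL_RE.findall(page):
        host = urlparse(url).hostname  # lower-cased, or None
        if host and not host.endswith(".onion"):
            found.add(host)
    found.update(d.lower() for d in EMAIL_DOMAIN_RE.findall(page))
    found.update(IPV4_RE.findall(page))
    return {h for h in found if not is_popular(h, popular_domains)}
```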
  46. Phase 2: Candidate selection (cont'd) ● HTTPS certificates. From each certificate Caronte extracts: ○ the Subject Common Name (SCN) and Subject Alternative Name (SAN), which contain IP addresses and/or DNS domains ○ the SHA-1 hash of the DER-encoded certificate, which it searches for in the Sonar database (a collection of certificates seen on the Internet) to retrieve the IPs that have used it ○ the public key, for which it searches Sonar for certificates containing the same key, then repeats the process above. Additionally, it searches Sonar for any certificate whose SCN or SAN contains an onion address. ● The output is a set of candidate pairs <onion address, endpoint>.
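The certificate fingerprint used for the Sonar lookup is the SHA-1 hash of the DER bytes, which the standard library can compute from a PEM certificate. In this sketch, `scan_index` is a hypothetical local dictionary built from certificate scan data; it is not an actual Sonar API, and both function names are illustrative.

```python
import hashlib
import ssl


def cert_sha1(pem_cert: str) -> str:
    """SHA-1 fingerprint (hex) of the DER encoding of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha1(der).hexdigest()


def ips_using_cert(pem_cert: str, scan_index: dict) -> set:
    """Look the fingerprint up in a local scan index.

    scan_index is a hypothetical dict {sha1_hex: set_of_ips} built from
    Internet-wide certificate scan data; it is not a real Sonar API.
    """
    return scan_index.get(cert_sha1(pem_cert), set())
```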
  47. Phase 3: Validation ● For every <onion address, endpoint> pair, Caronte checks the similarity between the candidate and the hidden service pages. ● If the similarity is high, the candidate is indeed a DNS domain or IP address of the hidden service. Default or recurrent error pages are excluded from this check. ● Validation is divided into two steps comprising 7 checks: ○ server similarity ○ body similarity
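A body-similarity check can be approximated with the standard library. Caronte's actual checks are more elaborate; this sketch swaps in `difflib.SequenceMatcher`'s ratio as the similarity measure, and the 0.9 threshold and function names are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off, not taken from the paper


def body_similarity(hs_body: str, candidate_body: str) -> float:
    """Similarity ratio in [0, 1] between two page bodies."""
    return SequenceMatcher(None, hs_body, candidate_body).ratio()


def validate_pair(hs_body: str, candidate_body: str, generic_pages=frozenset()) -> bool:
    """True if the endpoint seems to serve the same content as the HS.

    Pages in generic_pages (default or recurrent error pages) are never
    accepted, since unrelated servers can return them too.
    """
    if hs_body in generic_pages or candidate_body in generic_pages:
        return False
    return body_similarity(hs_body, candidate_body) >= SIMILARITY_THRESHOLD
```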
  48. Intentional Similarities Leaks can be intentional; for example, Facebook wants its hidden service to be publicly linked to it. Three checks detect intentional similarities: ● The onion address is compared with the endpoint: if their longest common substring is at least 4 characters long, the onion address was likely obtained by brute-forcing key generation until the first 80 bits of the SHA-1 hash encoded a chosen string. Example: www.facebook.com & facebookcorewwwi.onion ● Check whether the endpoint contains the onion address of the HS. ● Check whether the titles of HS pages embed the Internet endpoint.
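The three checks can be sketched as follows. The longest common substring uses a classic dynamic programme, and the 4-character threshold comes from the slide; the function names and the way the endpoint page and HS titles are passed in are assumptions.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic programme."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def intentional_similarity(onion: str, endpoint: str,
                           endpoint_page: str = "", hs_titles=()) -> bool:
    label = onion.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    endpoint = endpoint.lower()
    return (len(longest_common_substring(label, endpoint)) >= 4  # vanity address
            or onion.lower() in endpoint_page.lower()            # endpoint cites the onion
            or any(endpoint in t.lower() for t in hs_titles))     # HS title cites the endpoint
```

For the slide's example, `longest_common_substring("www.facebook.com", "facebookcorewwwi")` is `"facebook"`, well above the 4-character threshold.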
  49. Thanks! ANY QUESTIONS? You can find us on LinkedIn: Andrea Bissoli: https://www.linkedin.com/in/andrea-bissoli-537768116/ Fabrizio Farinacci: https://www.linkedin.com/in/fabrizio-farinacci-496679116/ Andrea Prosseda: https://www.linkedin.com/in/andrea-prosseda-2b8651116/ Sara Veterini: https://www.linkedin.com/in/sara-veterini-667684116/

×