Tor Hidden Services enable server anonymity, which can also be abused for illegal and criminal activities. To address this issue, this presentation covers state-of-the-art methods from the literature that attack hidden service anonymity.
7. About HS
● Hidden services (HS) are websites located inside
the Tor network that receive inbound
connections only through Tor.
● They provide server anonymity in addition to
Tor's default client anonymity.
● They protect the location of the server hosting
the service and provide encryption at every hop
from a client to the hidden service.
8. Set up
● The HS chooses some relays as Introduction Points (IPs), which
will receive inbound connections from clients, and builds
ordinary Tor circuits to them.
[Figure: network diagram with Server, DB, Client and Onion Proxy]
9. Set up (cont’d)
● The HS creates a hidden service descriptor containing its:
○ public key
○ Introduction Points
and signs it with its private key.
● It sends the descriptor to a hidden service directory (HSDir).
● An onion address xyz.onion, where xyz is the base32 encoding of the
first 80 bits of the SHA1 hash of the public key, is generated and sent to the HSDir.
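The address derivation described on this slide can be sketched in Python as follows. This is a minimal sketch, assuming the legacy scheme stated above (SHA1, first 80 bits, base32); the function name `onion_address` and the placeholder key bytes are illustrative assumptions, not part of the original slides.

```python
import base64
import hashlib

def onion_address(public_key_bytes: bytes) -> str:
    """Derive a legacy-style onion address from the encoded public key."""
    digest = hashlib.sha1(public_key_bytes).digest()   # 160-bit SHA1 hash
    first_80_bits = digest[:10]                        # 10 bytes = 80 bits
    label = base64.b32encode(first_80_bits).decode("ascii").lower()
    return label + ".onion"

# Placeholder bytes only (assumption); a real service hashes its actual key.
address = onion_address(b"placeholder public key bytes")
print(address)
```

Since 80 bits map to exactly 16 base32 characters, the label needs no padding.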
10. Client connection
● The client queries the HSDir with the onion address and obtains
the HS descriptor.
● It chooses a Rendezvous Point (RP), builds a circuit to it
and sends it a one-time secret (the rendezvous cookie).
11. Client connection (2)
● The client connects to one of the IPs and sends it an
introduce message, encrypted with the HS public key, containing:
○ RP address
○ One-time secret
12. Client connection (3)
● The HS decrypts the message and builds a circuit to the
RP, presenting the one-time secret.
13. Client connection (4)
● The RP verifies the one-time secret and notifies the client
that the connection succeeded.
● Now the client and the service can communicate through the RP.
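The role of the one-time secret at the RP can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `RendezvousPoint`, its method names, and the 20-byte secret length are hypothetical choices for illustration, and no circuits or encryption are modelled.

```python
import secrets

class RendezvousPoint:
    """Toy rendezvous point that pairs a client and a service by secret."""

    def __init__(self) -> None:
        self.waiting = {}  # one-time secret -> waiting client name

    def establish(self, client: str, cookie: bytes) -> None:
        # The client registers its one-time secret with the RP.
        self.waiting[cookie] = client

    def join(self, service: str, cookie: bytes) -> bool:
        # The service presents the secret; it is accepted at most once.
        client = self.waiting.pop(cookie, None)
        if client is None:
            return False
        print(f"{client} <-> {service} joined through the RP")
        return True

rp = RendezvousPoint()
cookie = secrets.token_bytes(20)   # one-time secret chosen by the client
rp.establish("client", cookie)
first_try = rp.join("hidden-service", cookie)   # matches the stored secret
replay = rp.join("hidden-service", cookie)      # secret already consumed
```

Removing the secret on first use is what makes it one-time in this model.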
15. Types and Goals of Attacks
● Active
○ Type: the adversary injects malicious nodes into the Tor network and eventually gains control of the HS entry guard, possibly by disabling benign relays.
○ Goal: deanonymize the hidden service's IP address through attacker-controlled relays.
● Passive
○ Type: the adversary observes traffic, looking for identifying temporal and structural patterns that reveal the relays involved in the communication.
○ Goal: first deanonymize the clients involved in HS communications, then the specific HSs they contact.
● Misconfiguration
○ Type: the HS administrator unintentionally leaves identifying information in the configuration files, in the hidden service content, or in both.
○ Goal: deanonymize the owner's identity (identity leaks) or the IP address of the hidden server (location leaks).
16. References
● Misconfiguration Attacks
○ “CARONTE: Detecting Location Leaks for Deanonymizing Tor Hidden Services”, Matic et al., 2015
● Passive Attacks
○ “Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden Services”, Kwon et al., 2015
○ “POSTER: Fingerprinting Tor Hidden Services”, Mitseva et al., 2016
● Active Attacks
○ “Protocol-level Hidden Server Discovery”, Ling et al., 2013
○ “The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network”, Jansen et al., 2014
19. Protocol Level Attack
The attacker controls a client, a rendezvous point and some other relays
of the Tor network. It also runs a central server where its
nodes log relevant connection events.
General idea:
Since only the entry nodes of the server know its location (IP address), the
attack consists in repeatedly attempting connections to the HS
until the HS chooses an entry guard controlled by the attacker.
Desired scenario: [figure not reproduced]
“Protocol-level Hidden Server Discovery” , Ling et al., 2013
20. Attack phases
● Phase 1
The client keeps creating circuits to the HS until one of the
attacker's entry nodes sees a particular combination of cells.
● Phase 2
The attacker tests whether that entry node is the actual
entry guard of the HS by manipulating a cell at the
Rendezvous Point.
● Phase 3
The attacker concludes the test by checking the temporal
correlation of the events reported by his nodes.
If the suspected entry router was indeed chosen by the
hidden server, he can locate the server.
21. Phase 1
● The client keeps establishing new
connections to the HS, recording
every type of cell in the central server.
● It repeats this loop until one of
its entry nodes sees the
following combination of cells: [figure not reproduced]
However…
This doesn't imply that our entry
node was chosen by THAT
particular HS, only by some HS.
22. Phase 2
● In this phase the attacker wants to confirm that its relay was chosen
as the HS entry guard.
● When the client is about to start the conversation with the
server, it automatically sends a begin cell.
● Without even decrypting it, the RP modifies 1 bit of the cell so
that the server cannot understand its content. Note that this
works because the integrity check is performed ONLY at the HS.
● This triggers a destroy cell, sent back to the client
to tear down the whole circuit.
● Every attacker relay waits for this cell and, if it arrives,
reports it to the central server (including the timestamp).
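Why a single flipped bit is enough can be illustrated with a toy integrity check. This is a simplified sketch, not Tor's actual cell format: it assumes a cell made of a payload prefixed by the first 4 bytes of its SHA1 digest, and it ignores the layered encryption and running digest that real relay cells use. The names `seal`, `verify` and `flip_bit` are hypothetical.

```python
import hashlib

DIGEST_LEN = 4  # truncated digest length used in this toy model (assumption)

def seal(payload: bytes) -> bytes:
    """Prefix the payload with a truncated SHA1 digest."""
    return hashlib.sha1(payload).digest()[:DIGEST_LEN] + payload

def verify(cell: bytes) -> bool:
    """Recompute the digest at the end point and compare it."""
    digest, payload = cell[:DIGEST_LEN], cell[DIGEST_LEN:]
    return hashlib.sha1(payload).digest()[:DIGEST_LEN] == digest

def flip_bit(cell: bytes, bit_index: int) -> bytes:
    """Return a copy of the cell with exactly one bit inverted."""
    data = bytearray(cell)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)

cell = seal(b"BEGIN placeholder-destination:80")
tampered = flip_bit(cell, 40)   # a relay in the middle flips one bit
print(verify(cell), verify(tampered))
```

Relays in the middle of the circuit never run `verify`, so in this model only the end point notices the tampering, which is the behaviour the attack relies on.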
24. Phase 3
● The central server checks the following:
○ Both the RP and the entry node report a destroy event
○ Their timing is consistent: let Tb be the time of
the begin cell, Te the time of the destroy cell at the RP
and Td the time of the destroy cell at the entry node. If
Tb < Td < Te
the timing of the events is consistent
● This implies the attacker controls an HS entry guard, so…
it is directly connected to the server
and consequently knows its location
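The check performed by the central server can be written down directly from the condition Tb < Td < Te. This is a minimal sketch; the event names `rp_begin`, `entry_destroy` and `rp_destroy` and the dictionary layout are assumptions made for illustration.

```python
def timing_is_consistent(t_begin: float, t_destroy_entry: float,
                         t_destroy_rp: float) -> bool:
    """Return True when Tb < Td < Te holds for the three timestamps."""
    return t_begin < t_destroy_entry < t_destroy_rp

def confirms_entry_guard(events: dict) -> bool:
    """Check that all events were reported and that their order matches."""
    required = ("rp_begin", "entry_destroy", "rp_destroy")
    if any(name not in events for name in required):
        return False   # the RP or the entry node never saw its event
    return timing_is_consistent(events["rp_begin"],
                                events["entry_destroy"],
                                events["rp_destroy"])

# Hypothetical timestamps (seconds) reported by the attacker's nodes.
report = {"rp_begin": 10.00, "entry_destroy": 10.35, "rp_destroy": 10.60}
print(confirms_entry_guard(report))
```

The destroy cell travels from the HS towards the client, so it should pass the entry node before it reaches the RP, which is why Td must fall between Tb and Te.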
25. Sniper Attack
It is based on a DoS attack against the Tor relays critical to the HS. The attacker controls just
a client and at least one relay (GA).
General idea:
The attacker wants GA to be chosen as an HS entry guard in order to identify the
server location (as in the previous attack). To do that, he first needs to disable
ALL the HS entry guards until GA is chosen as one of them.
So he keeps building normal Tor connections to the HS until GA is directly
connected to an HS entry guard. At this point the attacker disables that guard with a
Sniper Attack. When GA becomes an HS entry guard, it knows the HS location.
“The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network” , Jansen
et al., 2014
26. Phase 1: Identify guards
● The adversary keeps building Tor circuits to the HS until GA is directly
connected to an HS entry guard. For these circuits, the adversary can
directly observe the guards' identities.
● To recognize this situation, he performs a simple
request/response with the server. For this, the RP sends a pattern
of 50 PADDING cells to the HS, followed by a DESTROY cell.
● If GA observes 2 cells (used to build the circuit) on a
rendezvous circuit FROM the hidden service and 52 cells on the same
circuit TO the hidden service (50 + 2 to build the circuit), followed
by a DESTROY cell shortly after one is sent by the RP, the adversary
concludes that GA is directly connected to a guard of the HS.
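The cell-count signature above can be expressed as a simple predicate. This is a sketch under assumptions: the `CircuitLog` record, its field names and the `max_delay` threshold for "shortly after" are hypothetical; only the counts 2 and 52 come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CircuitLog:
    cells_from_hs: int                # cells seen travelling FROM the HS
    cells_to_hs: int                  # cells seen travelling TO the HS
    destroy_seen_at: Optional[float]  # time a DESTROY was seen, if any

def matches_signature(log: CircuitLog, rp_destroy_sent_at: float,
                      max_delay: float = 1.0) -> bool:
    """True when the circuit shows 2 cells from, 52 cells to, then DESTROY."""
    if log.cells_from_hs != 2 or log.cells_to_hs != 52:
        return False
    if log.destroy_seen_at is None:
        return False
    delay = log.destroy_seen_at - rp_destroy_sent_at
    return 0 <= delay <= max_delay    # DESTROY must follow the RP's shortly

observed = CircuitLog(cells_from_hs=2, cells_to_hs=52, destroy_seen_at=100.4)
print(matches_signature(observed, rp_destroy_sent_at=100.1))
```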
27. Phase 2: Disable guards
● Once the HS's guards have been identified, the adversary builds
custom circuits with the targets as circuit entries and uses the
Sniper Attack to kill them.
● This can be done by repeatedly sending SENDME cells while
blocking the reading of packets at node GA.
28. Phase 3: Test for Guard Selection
● By repeating the Sniper Attack many times, the attacker eventually
makes the HS choose his relay GA as an entry guard.
● To determine whether GA was selected by the HS, he uses
techniques very similar to those used to identify guards in Phase 1.
● Since the attacker now controls an HS entry guard…
he is directly connected to the server
and consequently knows its location
30. Goal
“Deanonymize the clients involved in HS
communications and then the specific HSs
they contact, exploiting circuit
and traffic fingerprinting techniques”.
31. Circuit Fingerprinting Attack
General idea:
Since each circuit has distinctive
structural and temporal
characteristics, the attacker can
observe Tor traffic and classify
the circuits based on
those characteristics.
Once client-HS circuits are
identified, website
fingerprinting techniques
based on traffic characteristics
are used to identify the destination
HS, which is thereby deanonymized.
“Circuit Fingerprinting Attacks: Passive Deanonymization of Tor Hidden Services” ,
Kwon, et al., 2015
The attacker uses traffic fingerprinting techniques to identify Tor circuits,
so he can determine the user's involvement with hidden services.
32. Attack phases
● Phase 1: Circuit Fingerprinting Attack
A client-HS connection uses several circuits: HS-IP,
Client-IP, HS-RP and Client-RP. The aim of this phase is to
classify these circuits with fingerprinting techniques.
● Phase 2: Website Fingerprinting (WF) Attack
The attacker can then perform WF attacks
to deanonymize hidden service clients and servers using
the information from Phase 1.
33. Phase 1: Circuit Fingerprinting Attack
● We can distinguish 4 circuits:
○ HS - Introduction Point
○ Client - Introduction Point
○ Client - Rendezvous Point
○ HS - Rendezvous Point
34. Phase 1: observations
● Streams to different HSs from the same client are not multiplexed in
the same circuit (i.e. a separate RP/entry point is used for each)
● General circuits have a different structure from HS circuits
(i.e. they do not use an RP or IP) and therefore different construction
patterns, especially compared with client-RP circuits
● HS-IP circuits are long-lived (they need to stay up to accept
incoming connections from clients), unlike client-IP circuits
(short-lived) and general circuits (short duration on average)
● Incoming/outgoing cell patterns, useful for identifying:
○ Client-IP (3 out + >3 in) and HS-IP circuits (>3 out = >3 in)
○ HS-RP circuits (out >> in), because they serve content, as opposed to
client-RP circuits (in >> out), which send small requests and receive content
35. Phase 1: features and algorithms
● From the previous observations, we can derive the following features:
○ Duration of activity: the time circuits are up
○ The number of incoming and outgoing cells
○ Circuit construction sequences toward the RP
● Tree-based and k-NN classifiers are used for circuit classification
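How a k-NN classifier could use these features is sketched below. This is an illustrative sketch only: the training samples are invented toy values chosen to mirror the observations on the previous slide, the log scaling and k=3 are assumptions, and the circuit construction sequence feature is left out for brevity.

```python
import math
from collections import Counter

# Toy samples (assumption): (label, (duration_s, incoming_cells, outgoing_cells))
TRAINING = [
    ("hs-ip", (3600, 4, 4)), ("hs-ip", (5400, 5, 5)),
    ("client-ip", (5, 4, 3)), ("client-ip", (8, 5, 3)),
    ("client-rp", (120, 500, 20)), ("client-rp", (90, 800, 30)),
    ("hs-rp", (120, 20, 500)), ("hs-rp", (100, 30, 700)),
]

def scale(features):
    """Compress the very different ranges of durations and cell counts."""
    return [math.log1p(value) for value in features]

def knn_classify(sample, training=TRAINING, k=3):
    """Label a circuit by majority vote among its k nearest neighbours."""
    scaled = scale(sample)
    neighbours = sorted(
        training, key=lambda item: math.dist(scaled, scale(item[1])))
    votes = Counter(label for label, _ in neighbours[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((4000, 4, 4)))      # long-lived, balanced cell counts
print(knn_classify((100, 600, 25)))    # many more incoming than outgoing
```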
36. Phase 2: Website Fingerprinting Attack
● Hidden service deanonymization through website fingerprinting,
using as features:
○ General traffic features, such as transmission size and time and
the number of incoming and outgoing cells in the transmission
○ Packet ordering, i.e. the position of each outgoing cell
○ Bursts, i.e. the number of consecutive cells of the same type,
for both incoming and outgoing traffic
● WF is performed in both:
○ open-world (i.e. looking at ALL the possible HSs) and
○ closed-world (i.e. restricting the list to plausible HSs) settings.
Conclusion:
through website fingerprinting, the contacted HS is identified.
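The features listed on this slide can be extracted from a recorded cell sequence as sketched below. This is a minimal sketch assuming a trace encoded as a list of +1 (outgoing cell) and -1 (incoming cell); that encoding and the name `extract_features` are assumptions, and timing and size features are omitted.

```python
from itertools import groupby

def extract_features(cells):
    """Compute count, ordering and burst features from a cell trace."""
    outgoing = sum(1 for c in cells if c == 1)
    incoming = sum(1 for c in cells if c == -1)
    # Packet ordering: index of every outgoing cell within the trace.
    outgoing_positions = [i for i, c in enumerate(cells) if c == 1]
    # Bursts: runs of consecutive cells travelling in the same direction.
    bursts = [(direction, len(list(run))) for direction, run in groupby(cells)]
    return {
        "outgoing": outgoing,
        "incoming": incoming,
        "outgoing_positions": outgoing_positions,
        "bursts": bursts,
    }

trace = [1, 1, -1, -1, -1, 1, -1]   # hypothetical trace
features = extract_features(trace)
print(features)
```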
37. Circuit fingerprinting attack: problem
● Streams to different HSs from the same client are not multiplexed in
the same circuit (i.e. a separate RP/entry point is used for each)
● General circuits have a different structure from HS circuits
(i.e. they do not use an RP or IP) and therefore different construction
patterns, especially compared with client-RP circuits
● HS-IP circuits are long-lived (they need to stay up to accept
incoming connections from clients), unlike client-IP circuits
(short-lived) and general circuits (short duration on average)
● Incoming/outgoing cell patterns, useful for identifying:
○ Client-IP (3 out + >3 in) and HS-IP circuits (>3 out = >3 in)
○ HS-RP circuits (out >> in), because they serve content, as opposed to
client-RP circuits (in >> out), which send small requests and receive content
No longer true!!
39. POSTER Fingerprinting
● Tries to detect HS communication with circuit fingerprints (FPs):
this exploits the fact that an HS connection leaks the information
that multiple entry nodes are used
● FPs are based on statistics computed on:
○ the number of entry nodes
○ the chronological sequence of incoming/outgoing cells.
The more fingerprints are used, the better the classification
● An SVM-based classifier is trained with a 10-fold cross-validation
scheme to detect:
○ Unknown HSs (open-world), if all 8 FPs are used
○ Known HSs (closed-world), if just one FP is used
with high recall and precision (greater than 95%).
“POSTER: Fingerprinting Tor Hidden Services” , Mitseva, et al., 2016
42. Caronte
Caronte is an automated tool that finds location leaks.
Its input is the onion address(es) of the hidden service(s) of interest.
General idea:
Leaks are discovered in the content or configuration of a hidden
service by finding candidate identities (e.g., phone numbers
embedded in a page) or candidate Internet endpoints (e.g., an IP
address or DNS domain in an error page).
Then, candidates are validated by checking whether the IP address and the onion
address lead to the same service.
Location leaks: information in the content or configuration of a
hidden service that gives away its location. Location leaks are
introduced by the hidden service administrators and cannot be
centrally fixed by the Tor project.
“CARONTE: Detecting Location Leaks for Deanonymizing Tor Hidden Services”,
Matic et al., 2015
44. Phase 1: Exploration
● Caronte visits:
○ the root page of the HS
○ all HTML resources in the root page (/xyz)
○ a random resource, to trigger an error page that may leak
information placed there by the administrator.
● Caronte visits each of these URLs and stores the response:
○ over both HTTP and HTTPS (to get the certificate)
○ with two Host header values (the onion address and a
random onion address).
A hosting server may run more than one public service besides
the hidden one. Requesting a random address may push the server
to return the default (public) site, leaking information.
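The set of requests implied by this exploration step can be enumerated as sketched below. This is only a sketch: it builds the list of (URL, Host header) pairs and performs no network access; the function names, the placeholder onion address and the resource paths are assumptions, not Caronte's actual code.

```python
import random
import string
from itertools import product

BASE32_ALPHABET = string.ascii_lowercase + "234567"

def random_onion(rng: random.Random) -> str:
    """Generate a random 16-character onion-style address."""
    return "".join(rng.choice(BASE32_ALPHABET) for _ in range(16)) + ".onion"

def exploration_plan(onion: str, resources, rng: random.Random):
    """List every (url, host_header) pair to visit for one hidden service."""
    random_path = "/" + "".join(
        rng.choice(string.ascii_lowercase) for _ in range(12))
    paths = ["/"] + list(resources) + [random_path]   # root, resources, error
    hosts = [onion, random_onion(rng)]                # real and random Host
    return [(f"{scheme}://{onion}{path}", host)
            for path, scheme, host in product(paths, ("http", "https"), hosts)]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
plan = exploration_plan("aaaaaaaaaaaaaaaa.onion", ["/about", "/shop"], rng)
print(len(plan))
```

Each path is visited four times: over HTTP and HTTPS, each with both Host header values.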
45. Phase 2: Candidate selection
The next step is to extract a list of candidates for each onion URL:
● Internet endpoints
Pages may contain URLs, email addresses and IP addresses. URLs containing
very popular DNS domains (checked against a public list of popular
domains) are discarded; the others are kept.
● Unique strings
These are:
○ Identifiers, e.g. Google Analytics and AdSense IDs, Bitcoin wallets
○ Page titles, which are often distinctive
They are looked up in search engines to find Internet sites
where they have been observed; the DNS domains of those sites
are used as candidates if they are not popular.
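The extraction of Internet endpoints can be sketched with regular expressions. This is a simplified sketch: the patterns are deliberately loose, the `POPULAR` set is a tiny hypothetical stand-in for the public list of popular domains, and the unique-string lookup through search engines is not modelled.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")

POPULAR = {"google.com", "facebook.com"}   # hypothetical popular-domain list

def is_popular(host: str, popular=POPULAR) -> bool:
    """True when the host is a popular domain or one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in popular)

def candidate_endpoints(page_text: str, popular=POPULAR) -> set:
    """Collect IP addresses and non-popular domains found in a page."""
    candidates = set(IPV4_RE.findall(page_text))
    hosts = [urlparse(url).hostname for url in URL_RE.findall(page_text)]
    hosts += EMAIL_RE.findall(page_text)    # domain part of email addresses
    for host in hosts:
        if host and not is_popular(host.lower(), popular):
            candidates.add(host.lower())
    return candidates

page = ("Contact admin@mail.example.net, see https://www.google.com/analytics "
        "and http://shop.example.org/cart or 192.0.2.10")
print(sorted(candidate_endpoints(page)))
```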
46. Phase 2: Candidate selection (cont’d)
● HTTPS certificates
Caronte extracts from certificates:
○ the Subject's Common Name (SCN) and Subject's Alternative
Name (SAN), which may contain IP addresses and/or DNS domains.
○ the SHA1 of the DER-encoded certificate, which it searches for in
the SONAR database (which stores certificates seen on the
Internet) to retrieve the IPs that have used it.
○ the public key, which it uses to search SONAR for certificates containing
the same key, repeating the process above.
Additionally, it searches SONAR for any certificate whose SCN
or SAN contains an onion address.
● The output is a set of candidate pairs <onion address, endpoint>.
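The certificate fingerprint lookup can be sketched as follows. This is a sketch under assumptions: `index` is a small in-memory dictionary standing in for the SONAR database (whose real query interface is not described here), and the certificate bytes are placeholders rather than a real DER-encoded certificate.

```python
import hashlib

def certificate_sha1(der_bytes: bytes) -> str:
    """SHA1 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha1(der_bytes).hexdigest()

def lookup_ips(der_bytes: bytes, index: dict) -> list:
    """Return the IPs recorded for this certificate in the given index."""
    return sorted(index.get(certificate_sha1(der_bytes), set()))

def candidate_pairs(onion: str, der_bytes: bytes, index: dict) -> list:
    """Build <onion address, endpoint> pairs from the matching IPs."""
    return [(onion, ip) for ip in lookup_ips(der_bytes, index)]

# Placeholder data (assumption): fake certificate bytes and a toy index.
fake_cert = b"placeholder DER certificate bytes"
index = {certificate_sha1(fake_cert): {"192.0.2.10", "192.0.2.7"}}
print(candidate_pairs("aaaaaaaaaaaaaaaa.onion", fake_cert, index))
```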
47. Phase 3: Validation
● For every pair <onion address,
endpoint>, Caronte checks the similarity
between the candidate's page and
one of the hidden service
pages.
● If the similarity is high, the
candidate is actually a DNS
domain or IP address of the
hidden service. Default error
pages and recurrent ones are
excluded from this check.
● Validation is divided into two
steps and 7 checks:
○ Server similarity
○ Body similarity
48. Intentional Similarities
Leaks can be intentional.
Example: Facebook wants its hidden service to be public.
How can we check for intentional similarities?
There are three methods:
● The onion address is compared with the endpoint. If their
longest common substring has length greater than or equal to 4, it means
that the onion address was obtained by brute-forcing the
first 80 bits of the SHA1 hash in the generation process.
Example: www.facebook.com & facebookcorewwwi.onion
● Check whether the endpoint contains the onion address of the HS
● Check whether the titles of the HS pages embed the Internet endpoint.
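The first method, the longest common substring test, can be sketched with a standard dynamic programming routine. This is a minimal sketch: the function names are hypothetical, the threshold of 4 comes from the slide, and stripping the ".onion" suffix before comparing is an assumption.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of characters that appears in both strings."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1      # extend the common run
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def looks_intentional(onion: str, endpoint: str, min_len: int = 4) -> bool:
    """True when the onion label shares a long substring with the endpoint."""
    suffix = ".onion"
    label = onion[:-len(suffix)] if onion.endswith(suffix) else onion
    return len(longest_common_substring(label, endpoint)) >= min_len

shared = longest_common_substring("facebookcorewwwi", "www.facebook.com")
print(shared, looks_intentional("facebookcorewwwi.onion", "www.facebook.com"))
```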
49. Thanks!
ANY QUESTIONS?
You can find us on LinkedIn:
Andrea Bissoli: https://www.linkedin.com/in/andrea-bissoli-537768116/
Fabrizio Farinacci: https://www.linkedin.com/in/fabrizio-farinacci-496679116/
Andrea Prosseda: https://www.linkedin.com/in/andrea-prosseda-2b8651116/
Sara Veterini: https://www.linkedin.com/in/sara-veterini-667684116/