
TOR Packet Analysis - Locating Identifying Markers



Network forensic analysis of TOR packets (using WireShark and eEye IRIS) to locate unique identifying markers, for information security management (law enforcement and ISP man-in-the-middle visibility).


TOR Packet Analysis MUIR 2010

ABSTRACT
This paper examines traffic analysis of “The Onion Router” (TOR) network in order to identify markers of TOR usage in network packets. A historical overview of anonymity systems on the internet is provided, followed by a detailed examination of the TOR system covering its development, features, limitations and weaknesses. The methodology used to locate TOR identifying markers is a comparison of otherwise identical TOR and non-TOR network packets. A high-level and a low-level traffic analysis are conducted, and several TOR markers are identified. These results are placed in a law enforcement context so that a forensic analysis of TOR network packets can take place. Recommendations are given on the use of TOR, to mitigate the behaviours through which users inadvertently violate their own anonymity.
TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
ANONYMITY ON THE INTERNET
TOR (THE ONION ROUTER)
METHODOLOGY
    TOR CLIENT
    PACKET COMPARISON
    PACKET CAPTURE & ANALYSIS
RESULTS
CONCLUSION/RECOMMENDATIONS
REFERENCE LIST
INTRODUCTION
Various technologies exist that help internet users maintain their anonymity online. One of the most common is “TOR” (The Onion Router). This paper will examine the extent of anonymity that TOR provides when its network traffic is subjected to traffic analysis techniques. Specifically, the hypothesis is that, through the analysis of network packets, TOR traffic will be distinguishable from regular internet traffic. It is further theorised that, through traffic analysis and social engineering, the originating IP address can still be learnt from the remnants of TOR network traffic.

Before the analysis of TOR traffic is discussed, anonymity on the internet will be explained, giving a brief background to the different techniques that have been used since the internet was invented. Secondly, TOR itself will be discussed, explaining the technology behind the onion router and how it provides anonymity. Next, the methodology used to test TOR’s ability to provide anonymity will be explained, including the traffic capture and analysis techniques. The results of the traffic analysis will then be detailed, providing insight into TOR’s use as an anonymity tool on the internet. Lastly, recommendations will be given regarding the use of TOR as an anonymity tool and the analysis of TOR traffic in a law enforcement context.
ANONYMITY ON THE INTERNET
The inter-networking that led to the “internet” was not designed for the mass usage it currently facilitates (Hafner & Lyon, 2000). Many features that users take for granted, such as encryption, were not specified and were subsequently “tacked on” to a system that could not initially support them, rather than being built in from the ground up (Hafner & Lyon, 2000). One feature that was never envisaged was anonymity for its users (Hafner & Lyon, 2000). Because anonymity is not provided by default, many systems have been designed to fill this gap, for example proxy servers and MIX networks. Yet the traffic these systems attempt to anonymise still relies on the same network infrastructure to send and receive packets (Danezis & Diaz, 2008), which has led to the misconception that these systems provide full anonymity on the internet.

Before discussing these anonymity systems further, it is important to define the term “anonymity”. Danezis and Diaz define anonymity as “the state of being not identifiable within a set of subjects, the anonymity set” (Danezis & Diaz, 2008). This definition implies that a user should be no more identifiable through their network traffic than any other internet user, that is, the traffic should have no identifying characteristics. This is not achievable for network traffic, because packets must carry identifying fields such as IP headers and port numbers so that the receiving computers can correctly interpret the data they contain. A more suitable term for these systems might be “unlinkability”, which is defined in the ISO 15408 standard as follows:

[Unlinkability] ensures that a user may make multiple uses of resources or services without others being able to link these uses together. [...] Unlinkability requires that users and/or subjects are unable to determine whether the same user caused certain specific operations in the system (in Danezis & Diaz, 2008).
The term “unlinkability” describes a scenario in which particular network traffic cannot be tied to a particular user (or computer), which more accurately describes the level of service that anonymity systems provide on the internet.

There are numerous types of anonymity systems (anonymisers) designed to function over the internet. These can be broken down into three main categories: proxy servers; anonymous email clients; and MIX or Crowd systems.

Proxy servers route network traffic through a proxy to hide the original IP address of the internet connection. Examples of these simple proxy servers are “Anonymizer” and “SafeWeb”:

The Anonymizer product acts as a web proxy through which all web requests and replies are relayed. The web servers accessed, should therefore not be able to extract any information about the address of the requesting user (Danezis & Diaz, 2008).

The purpose of these systems is to provide basic “unlinkability” and to give users access to IP location-specific content, such as videos hosted on […]. These systems are known as “one-hop” proxy servers because the network traffic is routed through only one proxy server at a time. This distinguishes them from MIX networks, which route traffic through multiple nodes (or “hops”) before it reaches its destination. Because of the one-hop design, back-tracing can be conducted on the traffic to determine the originating IP address of the packets.

Anonymous email was originally devised in the early 1980s by Chaum, who designed an email communication system that used public key cryptography (PKC) to hide not only the contents of an email message but also its sender and receiver (Chaum, 1981). The purpose of anonymous email clients is to provide a communication channel in which the author of an email message can communicate with another party without revealing their identity. An example of an anonymous email client is the “[…]” relay system. This system used pseudonyms, or fictitious names, to facilitate anonymous communication:

The technical principle behind the service was a table of correspondences between real email addresses and pseudonymous addresses, kept by the server. Email to a pseudonym would be forwarded to the real user. Email from a pseudonym was stripped of all identifying information and forwarded to the recipient (Danezis & Diaz, 2008).

These systems were designed with email in mind; however, email is only one form of communication over the internet, so they are not suitable for providing anonymity (or, for that matter, unlinkability) for the majority of internet traffic.

MIX or Crowd systems are similar in design to proxy servers; however, they relay network traffic through multiple MIXes (or nodes) rather than through a single proxy server. Crowds is described as follows:

Each user contacts a central server and receives the list of participants, the crowd. A user then relays her web requests by passing it to another randomly selected node in the crowd. Upon receiving a request each node tosses a biased coin and decides if it should relay it further through the crowd or send it to the final recipient (Danezis & Diaz, 2008).

A MIX, in turn, works as follows:

The principal idea is that messages to be anonymized are relayed through a node, called a mix. The mix has a well-known RSA public key, and messages are divided into blocks and encrypted using this key. The first few blocks are conceptually the “header” of the message, and contain the address of the next mix. Upon receiving a message, a mix decrypts all the blocks, strips out the first block that contains the address of the recipient, and appends a block of random bits (the junk) at the end of the message (Danezis & Diaz, 2008).
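The biased-coin forwarding rule in the Crowds description above can be illustrated with a short simulation. This is a minimal sketch, not code from the Crowds authors; the forwarding probability `p_forward`, the crowd size and the member names are assumptions chosen for illustration. It shows why path length, and hence latency, is random in Crowds: the number of relays is geometrically distributed with mean 1/(1 − p_forward).

```python
import random


def crowds_path(members, p_forward, rng):
    """Return the crowd members that relay one request, in order.

    The initiator hands its request to a randomly chosen member. Each member
    then tosses a biased coin: with probability p_forward it passes the
    request to another randomly chosen member (possibly itself), otherwise
    it sends the request to the final web server.
    """
    path = [rng.choice(members)]
    while rng.random() < p_forward:
        path.append(rng.choice(members))
    return path


if __name__ == "__main__":
    rng = random.Random(2010)                  # fixed seed for repeatable runs
    crowd = [f"member{i}" for i in range(10)]  # hypothetical 10-member crowd
    lengths = [len(crowds_path(crowd, 0.75, rng)) for _ in range(10_000)]
    # Expected mean path length is 1 / (1 - 0.75) = 4.
    print(sum(lengths) / len(lengths))
```

With `p_forward = 0.75` the printed average should be close to 4. Raising the forwarding probability lengthens paths, trading extra latency for more relaying between crowd members.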
Another purpose of MIX systems is to “actually mix together many messages, to make it difficult for an adversary to follow messages through it, on a first-in, first-out basis” (Danezis & Diaz, 2008). Like the previous systems, Crowds and MIXes are susceptible to various attacks that undermine the level of anonymity they provide. One such attack is outlined by Reiter and Rubin, who explain that these systems “can be undermined by executable web content that, if downloaded into the user's browser, can open network connections directly from the browser to web servers, thus bypassing Crowds altogether and exposing the user to the end server” (Reiter & Rubin, 1998). End-to-end traffic analysis techniques highlight other inadequacies of these systems:

Another attack tries to correlate events at the endpoints of the system: if a user makes an HTTP request, it is reasonable to assume that this request leaves the last MIX towards a web server shortly later. Similarly, the response sent from the web server to the last MIX will appear on the link between first MIX and user within some seconds (Rennhard & Plattner, 2002).

Onion routers, such as TOR, work similarly to Crowds in providing anonymity; in fact, their goal has been described as “to protect communication so that the recipients and the sender cannot be linked by an adversary analyzing the network traffic” (Gomułkiewicz, Klonowski, & Kutylowski, 2004).

Onion Routing is similar to Crowds in that an initial message forms a path of proxies through which the initiator sends its future messages. The protocol gets its name from its method of encrypting the initial packet and the address of the proxies at each hop on the path with the public key of the previous step. This scheme results in layers of encryption that are peeled off at each step in order to determine the next address to send to on the path. This requires the initiator to predetermine the entire path (Wright, Adler, Levine, & Shields, 2002).

One issue with onion routing is highlighted by Danezis and Diaz, who state that “onion routing aims at providing anonymous web browsing, and therefore would become too slow if proper mixing was to be implemented” (Danezis & Diaz, 2008). This means that, unlike MIX networks, onion routing does not mix messages together. Another weakness of onion routing is described by Wright et al.:

Onion Routing has generally been implemented with the onion routers being placed in the network outside of the control of the individual users. While it can be argued that this reduces the possibility of corruption of any particular onion router, it requires that the users trust the operators of the onion router to maintain their anonymity (Wright et al., 2002).

The level of trust required in the people or organisations that host these nodes is possibly the biggest weakness of onion routing: any corrupt node along the route can compromise the anonymity of the network packet.

…strong anonymity against traffic analysis requires cooperation by and implicit trust in many different parties. Any single entity, no matter how trustworthy it appears, can be subverted, whether by technical means, corrupt personnel, or so-called “subpoena attacks” (Androulaki, Raykova, Srivatsan, Stavrou, & Bellovin, 2008).
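The layering idea quoted from Wright et al. can be sketched in a few lines. This sketch swaps the public-key encryption described above for a toy symmetric XOR keystream (derived with SHAKE-256 from Python's `hashlib`) purely so it runs with the standard library; the hop keys are hypothetical and the cipher offers no real security. What it does show is the structure: the initiator wraps the message once per hop, and each relay removes only its own layer.

```python
import hashlib


def toy_layer(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from key (TOY cipher, not secure).

    Applying the same layer twice restores the input, which is all this
    sketch needs to model adding and removing one layer of the onion.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def build_onion(message: bytes, hop_keys: list[bytes]) -> bytes:
    """Wrap message once per hop; hop_keys[0] belongs to the first relay."""
    onion = message
    for key in reversed(hop_keys):  # innermost layer is the last hop's
        onion = toy_layer(key, onion)
    return onion


def peel(onion: bytes, key: bytes) -> bytes:
    """What one relay does: remove the single layer made with its key."""
    return toy_layer(key, onion)


if __name__ == "__main__":
    keys = [b"entry-key", b"middle-key", b"exit-key"]  # hypothetical per-hop keys
    onion = build_onion(b"GET / HTTP/1.1", keys)
    for key in keys:  # relays peel in path order
        onion = peel(onion, key)
    print(onion)  # b'GET / HTTP/1.1'
```

Because XOR layers commute, this toy would unwrap in any order. In the public-key onion described above, each layer also carries the next hop's address, so relays necessarily process it in path order.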
TOR (THE ONION ROUTER)
One of the most widely deployed onion-routing anonymising systems is TOR (The Onion Router). TOR is known as a “second-generation onion router” (Danezis & Diaz, 2008) and was originally funded by the United States Naval Research Centre and the United States Defense Advanced Research Projects Agency (DARPA) (The Tor Project Inc, 2009b). The history of the TOR project will be briefly outlined before the TOR design is examined in detail. After this, the design limitations and weaknesses of TOR will be scrutinised, providing an overview of the attacks that have been proposed against the TOR system.

HISTORY
The beginnings of the TOR project date back to the mid-1990s, when the US Office of Naval Research (ONR) began developing onion routing techniques (Naval Research Laboratory). Development of the second generation of onion routers did not begin until 2002, when funding was provided by DARPA and ONR (Naval Research Laboratory). In 2003 the TOR network was publicly deployed with nodes spanning two continents, and the following year “hidden services” went online (Naval Research Laboratory). Funding from DARPA and ONR ceased in 2004, and the Electronic Frontier Foundation stepped up to continue funding the TOR project (Naval Research Laboratory).

One of the main purposes of TOR has been stated as being to “defend against a form of network surveillance that threatens personal freedom and privacy” (The Tor Project Inc, 2009a). It is widely acknowledged that TOR is used by journalists and by people who wish to remain anonymous while browsing online, as well as by people who have restricted internet access, for example people living in China (The Tor Project Inc, 2009c). The US military also uses TOR to host hidden services for intelligence-gathering purposes (The Tor Project Inc, 2009c).

DESIGN
The TOR system follows on from traditional onion routing services; that is, it relays traffic through proxy servers so that the originating IP address remains unknown. TOR can be seen as a mix between onion routing and crowd systems. The TOR system “tunnels everything over TCP Port 80” (Danezis & Diaz, 2008) “over a network of relays, and is particularly well tuned to work for web traffic, with the help of the ‘Privoxy’ content sanitizer” (Danezis & Diaz, 2008). Privoxy is a web proxy that modifies “web page data and HTTP headers” (Privoxy Developers 2010) and is commonly used for “removing ads and other obnoxious Internet junk” (Danezis & Diaz, 2008). In the case of TOR, this filtering proxy helps remove web content that could reveal the true IP address of the user, such as JavaScript or Flash content.

Unlike traditional onion routing services, TOR does not send traffic in its original packet format; instead, TOR uses fixed-length “cells” to transfer data. Each cell consists of a header and a payload (see Diagram 1). As stated by Fraser et al., “TOR operates using fixed 512 byte cells (or packets) for stronger anonymity and the Transport Layer Security (TLS) protocol for authentication and privacy” (Fraser, Raines, & Baldwin, 2005). Coupled with this cell-based design, TOR uses “circuits” to choose the path that the data will take as well as which protocol layer to anonymise: “they may intercept IP packets directly, and relay them whole (stripping the source address) along the circuit” (Dingledine, Mathewson, & Syverson, 2005).
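The fixed-length cell format can be made concrete with Python's `struct` module. The layout below follows the 2004 Tor design paper cited for Diagram 1: a 2-byte CircID, a 1-byte command and a 509-byte payload, 512 bytes in total. It is an illustrative sketch, not TOR's implementation; the CircID and payload values are arbitrary, and command value 3 is the RELAY command in the Tor specification. Later link protocol versions (4 and above) widened the CircID to 4 bytes, giving 514-byte cells, so this layout describes the 2010-era protocol studied here.

```python
import struct

CELL_LEN = 512                            # fixed cell size (link protocols 1-3)
CIRCID_LEN = 2                            # 2-byte circuit identifier
PAYLOAD_LEN = CELL_LEN - CIRCID_LEN - 1   # 509 bytes after CircID + command


def pack_cell(circ_id: int, command: int, payload: bytes) -> bytes:
    """Build one fixed-length cell, zero-padding the payload to 509 bytes."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload does not fit in a single cell")
    header = struct.pack("!HB", circ_id, command)  # network byte order
    return header + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell: bytes) -> tuple[int, int, bytes]:
    """Split a cell into (CircID, command, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells are exactly 512 bytes")
    circ_id, command = struct.unpack("!HB", cell[:3])
    return circ_id, command, cell[3:]


if __name__ == "__main__":
    cell = pack_cell(0x1A2B, 3, b"relay data")  # arbitrary CircID, RELAY command
    print(len(cell), unpack_cell(cell)[:2])     # 512 (6699, 3)
```

Padding every payload to the same length is what makes all cells look alike in size, which is the anonymity benefit the Fraser et al. quotation refers to.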
Diagram 1 – TOR Cells (Packets) (Dingledine, Mathewson, & Syverson, 2004)

As summarised by Danezis and Diaz:

TOR uses a traditional network architecture: a list of volunteer servers is downloaded from a directory service. Then, clients can create paths by choosing three random nodes, over which their communication is relayed. Instead of an `onion' being sent to distribute the cryptographic material, Tor uses an iterative mechanism. The client connects to the first node, then it request this node to connect to the next one. The bi-directional channel is used at each stage to perform an authenticated Diffie-Hellman key exchange. This guarantees forward secrecy and compulsion resistance: only short term encryption keys are ever needed (Danezis & Diaz, 2008).

Diagram 2 – The TOR Network (Bauer, McCoy, Grunwald, Kohno, & Sicker, 2007)
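The iterative, hop-by-hop key agreement in the quotation above can be sketched with modular exponentiation. This is a minimal sketch under loud assumptions: it uses a toy 127-bit Mersenne prime and generator 3 rather than a real Diffie-Hellman group, it omits the authentication step, and it lets the client talk to each hypothetical `Relay` object directly, whereas in TOR the request to extend to hop n is relayed, encrypted, through hops 1 to n−1. It shows only that the client ends up sharing a separate, freshly generated key with each relay.

```python
import hashlib
import secrets

# TOY parameters, far too small for real use; chosen only to show the arithmetic.
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # generator (assumption for the sketch)


def derive_key(shared: int) -> bytes:
    """Hash the shared DH value into a 32-byte session key."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class Relay:
    """A hypothetical relay that answers one key-exchange request."""

    def __init__(self, name: str):
        self.name = name
        self.key = None  # session key shared with the client

    def handle_create(self, client_public: int) -> int:
        y = secrets.randbelow(P - 2) + 1       # relay's short-term secret
        self.key = derive_key(pow(client_public, y, P))
        return pow(G, y, P)                    # relay's public half


def build_circuit(relays: list[Relay]) -> list[bytes]:
    """Extend the circuit one hop at a time, agreeing a fresh key per hop."""
    client_keys = []
    for relay in relays:
        x = secrets.randbelow(P - 2) + 1       # client's short-term secret
        relay_public = relay.handle_create(pow(G, x, P))
        client_keys.append(derive_key(pow(relay_public, x, P)))
        # x goes out of scope here: nothing long-term protects this key
    return client_keys


if __name__ == "__main__":
    path = [Relay("entry"), Relay("middle"), Relay("exit")]
    for relay, key in zip(path, build_circuit(path)):
        print(relay.name, key == relay.key)  # each line ends with True
```

Discarding the short-term exponents after each exchange is what gives the forward secrecy mentioned in the quotation: a later compromise of a relay does not reveal these session keys.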
HIDDEN SERVICES
Another benefit of the TOR system over traditional onion routers is that it allows users to host content on the internet that can only be accessed via TOR; these are known as “hidden services”. Hidden services are denoted by the virtual top-level domain (TLD) “.onion”, which forms the address the user enters to connect to this type of service. When connecting to a hidden service, a user creates a new circuit to the hidden service’s rendezvous point, which adds an extra layer of protection (Dingledine et al., 2004) (see Diagram 3). As claimed by Dingledine et al., “this type of anonymity protects against Distributed-Denial-of-Service attacks: attackers are forced to attack the onion routing network because they do not know the host’s IP address” (Dingledine et al., 2004).

Diagram 3 – Normal use of hidden services and rendezvous servers (Øverlier & Syverson, 2006)
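Because a “.onion” name is derived from the hidden service's own key rather than registered in DNS, a client can check that it reached the intended service. The sketch below follows the version 2 naming scheme in use at the time of this study: the base32 encoding of the first 80 bits of the SHA-1 digest of the service's DER-encoded public key, giving 16 characters. The key bytes in the example are a hypothetical placeholder. Version 2 addresses have since been retired in favour of 56-character version 3 addresses based on Ed25519 keys.

```python
import base64
import hashlib


def onion_v2_address(public_key_der: bytes) -> str:
    """Derive a 16-character version 2 .onion name from a service key.

    The name is base32(first 10 bytes of SHA-1(key)), lower-cased.
    10 bytes = 80 bits, which base32 encodes as exactly 16 characters.
    """
    digest = hashlib.sha1(public_key_der).digest()[:10]
    return base64.b32encode(digest).decode("ascii").lower() + ".onion"


if __name__ == "__main__":
    # Placeholder bytes stand in for a real DER-encoded RSA key (hypothetical).
    print(onion_v2_address(b"example-public-key-bytes"))
```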
LIMITATIONS AND WEAKNESSES
The TOR system is not without its limitations. Danezis and Diaz raise the point that “one notable difference between TOR and previous attempts at anonymizing streams of traffic, is that it does not claim to offer security against even passive global observers” (Danezis & Diaz, 2008). Lemos states that “the problem is known to both the Tor Project, which advises everyone to use end-to-end encryption, and to security researchers” (Lemos, 2007). The limitation amounts to the following: “an adversary, who can observe a stream at two different points, can trivially realize it is the same traffic” (Danezis & Diaz, 2008). It leads to weaknesses that can be exploited to undermine the anonymity of the TOR system. As outlined in Table 1, there are two types of attacks against the TOR network: passive attacks and active attacks.

PASSIVE ATTACKS
– Packet and connection timing correlation
– Fingerprinting of traffic/usage patterns
– “Intersection Attacks” of multiple attributes of users
ACTIVE ATTACKS
– Lying about bandwidth to get more traffic
– Failing circuits to bias node selection
– Modifying application layer traffic at exit
Table 1 – Attacks Against TOR (Perry, 2007)

Passive attacks involve collecting network packets for later analysis and are often hard to detect (Fu, Graham, Bettati, & Zhao, 2003). Fu et al. state that “passive traffic analysis attacks may, at first sight, appear innocuous since those attacks do not actively alter the traffic (e.g., drop, insert, and modify packets during a communication session)” (Fu et al., 2003). Active attacks, by contrast, use probing methods to collect packet information and may alter the traffic on the network. The various types of attacks against TOR, and their positions in the TOR network, are detailed in Diagram 4. As stated by Sun et al.:

Even when multiple proxies are used, however, the first link between the user and the first proxy is the most vulnerable to attack, since the attacker (whether the first proxy itself, the user's ISP, or perhaps an eavesdropper (say, on a wireless link) can immediately determine the user's network address (Sun et al., 2002).

Diagram 4 – TOR Attack Points (Perry, 2007)

One common attack against the TOR system is known as a “timing correlation attack”. This type of attack uses timing analysis to measure the network latency of the TOR system. As observed by Murdoch:

…the load on the Tor node affects the latency of all connection streams that are routed through this node. A similar increase in latency is introduced at all layers. As expected, the higher the load on the node, the higher the latency (Murdoch & Danezis, 2005).
An attacker relays traffic over all routers, and measures their latency: this latency is affected by the other streams transported over the router. Long term correlations between known signals injected by a malicious server and the measurements are possible. This allows an adversary to trace a connection up to the first router used to anonymize it (Danezis & Diaz, 2008).

Diagram 5 – How Much Anonymity Does Network Latency Leak? (Hopper, Vasserman, & Chan-Tin, 2007)
(Measuring TOR circuit time without application-layer ACKs: the estimate for T_AX is t3 − t1. We abuse notation and write T_XY for the one-way delay from X to Y (Hopper et al., 2007).)

Website fingerprinting is another passive attack against the TOR system. In this type of attack, an adversary “fingerprints” commonly visited websites by recording the sizes of the files they serve; these sizes are then compared with the observed network packets to find matches. As stated by Hintz:

When a user visits a typical webpage, they download several files. A user downloads the HTML file for the webpage, images included in the page, and the referenced stylesheets. Each of these... files has a specific file size which is for the most part constant (Hintz, 2003).

Attacks against TOR hidden services have also been devised. Øverlier and Syverson discuss an attack that locates the address of a hidden service. To carry out this attack, a compromised TOR node and a malicious client machine are used to make repeated connections to the hidden service (see Diagram 6):

The main idea is to make many connections to the hidden server, so that it eventually builds a circuit to the rendezvous point using the malicious Tor node as an entry point. The malicious Tor node uses a simple timing analysis (packet counting) to discover when this has happened (in Hopper et al., 2007).

Diagram 6 – Vulnerable location of Attacker in communication channel to the Hidden Server (Øverlier & Syverson, 2006)

Although the design of the TOR system negates traditional DDoS attacks, Fraser et al. have proposed a variant DDoS attack on TOR based on TOR’s use of TLS.
DDoS attacks targeting an Onion Router’s CPU are possible due to TOR’s dependence on TLS. Such attacks force an Onion Router to execute so many public key decryptions that it can no longer route messages (Fraser et al., 2005).

Another weakness in the TOR system arises because any user may host a TOR server (node), so anyone wishing to run a compromised node can do so without any major hurdles. TOR designers have developed a formula for the probability of using a compromised node:

…the probability of choosing a compromised entrance node is m/N and the probability of choosing a compromised exit node is the same, thus, the combinatorial model is expressed as $(m/N)^2$, where m > 1 is the number of malicious nodes and N is the network size… (in Bauer et al., 2007)

Another passive attack can be achieved by hosting a compromised TOR node and collecting the unencrypted packets exiting that node. In this type of attack, high-level information about the network traffic can be learnt. Egerstad conducted an attack against TOR using this method and was able to intercept email messages “discussing military and national-security issues between embassies and sensitive corporate e-mail messages” (in Lemos, 2007). This highlights another limitation of TOR, or of any anonymity system: if users enter their real logins and email addresses while using TOR, their perceived anonymity is compromised. Because TOR lacks end-to-end encryption, it is not designed to be used under “real” identities; instead, it is recommended that people use anonymous email accounts and logins (The Tor Project Inc, 2009a).
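The quoted model can be evaluated directly. The sketch below assumes, as the quoted formula does, that entry and exit nodes are chosen uniformly and independently; the second function additionally assumes that circuits are built independently of each other, which is an illustrative simplification. The relay counts are hypothetical. In practice TOR weights node selection by advertised bandwidth, which is why the “lying about bandwidth” attack in Table 1 can raise an adversary's chance above m/N.

```python
def p_end_to_end(m: int, n: int) -> float:
    """Probability that both entry and exit of one circuit are malicious.

    Uniform, independent selection model from the quotation: (m/N)^2.
    """
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need N > 0 and 0 <= m <= N")
    return (m / n) ** 2


def p_at_least_once(m: int, n: int, circuits: int) -> float:
    """Chance that at least one of `circuits` circuits is compromised end to end.

    Assumes every circuit is an independent draw (a simplification).
    """
    return 1 - (1 - p_end_to_end(m, n)) ** circuits


if __name__ == "__main__":
    # Hypothetical figures: 50 malicious relays in a 1,000-relay network.
    print(round(p_end_to_end(50, 1000), 6))          # 0.0025
    print(round(p_at_least_once(50, 1000, 100), 4))  # roughly 0.22
```

A single circuit is rarely compromised under this model, but the risk accumulates as a user builds many circuits over time.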
METHODOLOGY
A gap exists in the research regarding how the weaknesses in TOR can be exploited from a law enforcement perspective. To establish what information can be gathered from the analysis of TOR packets, a packet comparison is necessary. This comparison will examine TOR packets alongside identical non-TOR (standard) internet packets. The methodology has three stages: setting up the TOR system; packet selection; and analysis. The setup of the TOR system will be discussed first, detailing how the packets will be intercepted. This will be followed by an explanation of the types of internet traffic examined. Finally, the analysis stage will be outlined, discussing the various tools used to examine the TOR packets.
TOR CLIENT
To ensure that the network traffic was generated from identical machines, two virtual machines (VMs) were used: one with TOR installed and the other without. Originally it had been planned to run a TOR exit node on a local server in order to capture the unencrypted traffic as it left the exit node; however, it was determined that propagating realistic network traffic locally would produce undesired results. Instead, a standard TOR client was installed on the TOR VM. The full specifications of the two VMs were as follows:

Component | TOR VM | Non-TOR VM
CPU | Intel Dual Core E6550 @ 2.33GHz | Intel Dual Core E6550 @ 2.33GHz
RAM | 1 GB | 1 GB
Operating System | Microsoft Windows XP SP3 | Microsoft Windows XP SP3
Web Browser | Mozilla Firefox version 3.5.6 | Mozilla Firefox version 3.5.6
TOR/Vidalia | TOR version […]; Vidalia version 0.2.6 | N/A
WireShark (traffic capture) | WireShark version 1.2.5 (SVN Rev 31296) | WireShark version 1.2.5 (SVN Rev 31296)
eEye IRIS (traffic analysis) | eEye IRIS version 5 | eEye IRIS version 5
Table 2 – VM Comparison

Running a fully functioning end-user TOR client allows packets to be generated on the fly over the internet, rather than propagating traffic to simulate the internet. Rather than capturing the TOR packets on the local machine, where they would be unencrypted, the TOR traffic was captured by observing the traffic entering the LAN (as depicted in Diagram 7). When installed, TOR provides a Mozilla Firefox plug-in that can be switched on and off; for this reason, Mozilla Firefox was used for the web browsing in this research. Windows XP was chosen as the operating system for the VMs because of the TOR system’s full compatibility with Windows XP.

Diagram 7 – TOR and Non-TOR Network Setup
PACKET COMPARISON
The types of internet traffic used for this analysis were based on the most-visited web parent companies of December 2009, as compiled by Nielsen (Nielsen, 2010) (see Table 3). These statistics show that the internet is used as a source of information (for example Google or News Corp), as a communication medium (for example Facebook or Yahoo) and as a source for shopping (for example eBay or Amazon). Using this knowledge, the following web browsing session was established:

1. Yahoo was selected as the user’s homepage. The user would log on to their Yahoo webmail account.
2. The user would read their emails as well as write an email.
3. Following on from reading their email, the user would click a link inside an email and read a few news articles, including one regarding the 2010 Winter Olympics.
4. The user would then visit […] and search for “winter Olympics”.
5. This search would result in a link which the user would click on.
6. From the original “winter Olympics” Wikipedia entry, the user would click on the 2010 Winter Olympics link.
7. The user would then enter […] into the web browser and search for “winter Olympics tickets”.
8. Following this search, the user would browse a few of the resulting links.
9. The user would then enter […] into the web browser and search for “winter Olympics tickets”.
10. The user would then search for “ice hockey” under the “movies” category and click on the first link.
11. The user would then enter […] into the web browser and log in to their account.
12. On the Facebook site, the user would search for “winter Olympics” under the “groups” category.
13. The user would then join a “winter Olympics” group and add a message to the group’s Facebook “wall”.
14. The user would then enter […] into the web browser and search for “what is my ip”.
15. Following the above search, the user would click on the link and recover their IP address (note that when TOR is installed, the original homepage is always a link that reports the user's IP address).

RANK | PARENT | UNIQUE AUDIENCE (000) | ACTIVE REACH (%) | TIME PER PERSON (HH:MM:SS)
1 | GOOGLE | 353,851 | 83.91 | 2:38:50
2 | MICROSOFT | 315,490 | 74.81 | 3:01:38
3 | YAHOO! | 228,711 | 54.23 | 2:12:36
4 | FACEBOOK | 206,878 | 49.06 | 5:57:17
5 | EBAY | 163,844 | 38.85 | 1:41:31
6 | WIKIMEDIA FOUNDATION | 141,239 | 33.49 | 0:16:01
7 | AMAZON | 137,364 | 32.57 | 0:32:11
8 | AOL LLC | 129,360 | 30.67 | 2:21:03
9 | NEWS CORP. ONLINE | 120,316 | 28.53 | 0:59:17
10 | INTERACTIVECORP | 115,131 | 27.30 | 0:11:36
Table 3 – Top 10 Global Web Parent Companies, Home & Work, December 2009 (Nielsen, 2010)
PACKET CAPTURE & ANALYSIS
The analysis phase of the methodology has two stages: packet capture and packet analysis. WireShark was selected to capture the network packets because it is a robust capture tool based on the “pcap” library. The first stage involves capturing identical network traffic from the two VMs with WireShark. WireShark can then be used to conduct the first form of traffic analysis, examining the low-level protocol information of the traffic. This initial analysis will focus on IP header information as well as connection types and port information.

For the next stage of analysis, eEye’s IRIS will be used to rebuild the HTML traffic. IRIS is a commercial network traffic monitoring and analysis tool that works on all IPv4 internet traffic. It can rebuild HTML traffic as well as provide statistical information about the network traffic. Using WireShark and IRIS together, it will be possible to drill down into the network packets and apply social engineering strategies to locate personally identifiable information. The social engineering analysis will attempt to discover personally identifiable information from various sources, including email and social networking sites. As stated by Cohen, “the forensic examiner is more interested in high level information obtained from the traffic rather than low level protocol information” (Cohen, 2008).

As the TOR traffic will be captured on the LAN, the most important question to answer is whether there is any way to tell from packet information that network packets are using TOR. That is, can traffic analysis be used to “fingerprint” the network packets in order to identify the use of TOR?
RESULTS

As the network traffic was already known, the purpose of the analysis was not to identify the websites visited by the user; instead, it was to determine which TOR-specific traffic fragments, if any, could be identified in order to violate the anonymity properties of the TOR system.

LOW-LEVEL ANALYSIS

The low-level packet properties were examined and compared by observing the network packets in WireShark. The analysis shows that TOR packets do not contain any property that can be used to "fingerprint" the packet header; that is, there is no recurring hex header in the network traffic that can be associated with TOR. This is because the first field of a TOR "cell" is the 2-byte CircID (Circuit ID), which is unlikely to repeat, as numerous circuits can be multiplexed over a single TLS connection; moreover, the cells travel inside that TLS connection, so even the CircID is encrypted on the wire. This is not to say that TOR traffic cannot be recognised on the fly, only that a hex header for packet fingerprinting is not possible.

One way to distinguish TOR traffic from standard internet traffic is the default port number that TOR uses, port 9001. By applying a TCP port filter in WireShark, the TOR traffic can be easily monitored (see Screenshot 1). Officially, port 9001 is reserved for traffic related to the "Microsoft Sharepoint Authoring Environment"; however, TOR is set up by default to use this port number as both a source port and a destination port. TOR can be reconfigured to use other TCP port numbers; the point is only that a default TOR installation uses port 9001.

Screenshot 1 – WireShark TOR Port Filter
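The port filter described above corresponds to the WireShark display filter `tcp.port == 9001`. As a rough illustration of what that filter matches, the following is a minimal Python sketch, assuming a classic libpcap capture file (not pcapng) with Ethernet framing and IPv4 only; the function names are hypothetical and the code is not how WireShark itself is implemented.

```python
import struct
from typing import Iterator, List, Tuple

TOR_DEFAULT_PORT = 9001  # default TOR port noted above; TOR can be reconfigured

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def iter_tcp_packets(data: bytes) -> Iterator[Flow]:
    """Yield (src_ip, src_port, dst_ip, dst_port) for every IPv4/TCP packet
    in a classic libpcap file held in memory (Ethernet link type assumed)."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Need 14-byte Ethernet header + 20-byte minimal IPv4 header,
        # and EtherType 0x0800 (IPv4).
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 6 or len(ip) < ihl + 4:
            continue  # not TCP, or too short to hold the TCP ports
        src_ip = ".".join(str(b) for b in ip[12:16])
        dst_ip = ".".join(str(b) for b in ip[16:20])
        src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
        yield src_ip, src_port, dst_ip, dst_port


def tor_port_packets(data: bytes, port: int = TOR_DEFAULT_PORT) -> List[Flow]:
    """Keep packets whose source or destination port equals `port`,
    i.e. the same match as the display filter `tcp.port == 9001`."""
    return [p for p in iter_tcp_packets(data) if port in (p[1], p[3])]
```

For example, `tor_port_packets(open("capture.pcap", "rb").read())` would list the port-9001 packets of a capture, where `capture.pcap` is a placeholder file name. As the text notes, a port match only catches default installations; a relay configured on another port would not be flagged.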
Screenshot 2 – TOR Traffic on Port 9001

Filtering for port 9001 on the LAN made the TOR traffic observable. Once the TOR traffic had been identified, it was important to note that the source and destination IP addresses could be learnt through analysis in WireShark. Screenshot 3 shows the destination address for this packet. Knowing this IP address allows for future analysis and capture of the unencrypted packets from the local machine hosting the TOR client. In this way, identifying the TOR packets can be used to determine which user has TOR installed on their machine.
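The step of tying port-9001 traffic back to a particular LAN machine can be sketched as grouping the matching connections by their private endpoint. The following is a minimal Python sketch, assuming connection tuples `(src_ip, src_port, dst_ip, dst_port)` have already been extracted from the capture and that the TOR client has a private (RFC 1918) address; `local_tor_hosts` is a hypothetical name, not part of WireShark or IRIS.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def local_tor_hosts(flows: Iterable[Flow], port: int = 9001) -> Counter:
    """Count packets on `port` per private (LAN-side) endpoint address."""
    hosts: Counter = Counter()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if port not in (src_port, dst_port):
            continue  # not on the TOR default port
        for addr in (src_ip, dst_ip):
            # The LAN machine is the endpoint with a private address;
            # the TOR relay on the other end has a public one.
            if ipaddress.ip_address(addr).is_private:
                hosts[addr] += 1
    return hosts


if __name__ == "__main__":
    flows = [("192.168.1.5", 50000, "1.2.3.4", 9001),
             ("1.2.3.4", 9001, "192.168.1.5", 50000),
             ("192.168.1.6", 50001, "5.6.7.8", 443)]
    print(local_tor_hosts(flows))  # Counter({'192.168.1.5': 2})
```

The addresses in the usage example are placeholders. Counting per host rather than returning a set makes it easy to separate a machine with sustained TOR usage from a single stray connection to port 9001.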
Screenshot 3 – IP Address Identification

HIGH-LEVEL ANALYSIS

When encrypted, the TOR packets do not allow the HTML data to be rebuilt. Although this may prevent a high-level traffic analysis of these TOR packets, there are workarounds that allow the SSL-encrypted TOR packets to be rebuilt. A WireShark plug-in called "TOR Dissector", when run on a local machine running a TOR client, captures the user's TOR SSL keys and decrypts the TOR packets on the fly (see Screenshot 4). This raises the question of whether someone could access a user's local machine and run WireShark with TOR Dissector without the user's knowledge. It does, however, point to an alternative method of decrypting the TOR traffic without the user suspecting anything: a Man-In-The-Middle (MITM) attack. A MITM attack positioned between the user's computer and the TOR server will allow an attacker to decrypt the user's TOR packets in real time, and either rebuild the HTML or filter for plain text.
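Why the encrypted packets cannot be rebuilt into HTML can be seen in the first bytes of each TCP payload: an SSL/TLS record starts with a 5-byte header (a content type such as 22 for handshake or 23 for application data, a version whose major byte is 3, and a length), and everything after it is ciphertext, whereas plaintext HTTP starts with a readable request or status line. The following Python sketch is a rough heuristic under those assumptions; `classify_payload` is a hypothetical name, and the logic is not taken from TOR Dissector or IRIS.

```python
import struct

# TLS record content types (RFC 5246)
TLS_CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                     22: "handshake", 23: "application_data"}
HTTP_PREFIXES = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"HTTP/")
MAX_TLS_RECORD = 2**14 + 2048  # largest legal TLSCiphertext length


def classify_payload(payload: bytes) -> str:
    """Heuristically label the start of a TCP payload as TLS, HTTP or unknown."""
    if len(payload) >= 5:
        # Record header: content type (1 byte), version major and minor
        # (1 byte each), record length (2 bytes, network byte order).
        ctype, major, minor, length = struct.unpack("!BBBH", payload[:5])
        if (ctype in TLS_CONTENT_TYPES and major == 3 and minor <= 3
                and length <= MAX_TLS_RECORD):
            return "tls:" + TLS_CONTENT_TYPES[ctype]
    if payload.startswith(HTTP_PREFIXES):
        # Plaintext HTTP: a reassembly tool could rebuild the HTML from this.
        return "http"
    return "unknown"
```

A payload labelled `tls:application_data` is exactly the kind of data a reassembly tool cannot turn back into HTML without the session keys, which is why the workarounds above focus on obtaining the keys or removing the encryption.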
Screenshot 4 – TOR Dissector

Private LAN/Corporations

By capturing the LAN's internet connection with a network capture tool such as WireShark and filtering the traffic, it is easy to recognise when TOR is used. By observing the TOR packets on the LAN, a corporation would be able to pinpoint the local host computer using the TOR network. In many organisations the use of TOR is itself likely to breach internet usage policies, and once the local host is determined, any future TOR packets could be captured and rebuilt using the WireShark plug-in "TOR Dissector" or by conducting a TOR MITM attack.

Government/ISP

If a corporation, or a Government, does not have access to the local machine running TOR, then the TOR MITM attack can still be performed to decrypt the TOR traffic. Similarly, it is possible to establish a compromised TOR exit node to capture unencrypted TOR traffic. It must be stated, however, that to conduct a targeted TOR MITM attack the adversary must already know that the user has used TOR and must know their IP address.

In 2007 Egerstad hosted a TOR exit node in an effort to capture unencrypted TOR packets and investigate the types of internet traffic people were accessing through the TOR service (as reported in Lemos, 2007). Among the captured packets were highly confidential emails regarding foreign military issues, sent by embassy staff members (as reported in Lemos, 2007). This highlights one of the biggest fallacies regarding TOR, or any anonymity service: many users assume that these services will provide complete anonymity even when they send emails from their own accounts. Similarly, in 2009 Vea conducted research into the anonymity-breaching properties of hosting a TOR service and stated:

"[No] matter how many anonymizing tools a user employs, or how well they are put into play, that same user lets the cat out of the bag when their web posts, emails or chats leave traces back to themselves..." (Vea, 2009)

This research broke down the TOR traffic into categories of usage, as depicted in the following graph:

Graph 1 – TOR Packet Distribution (Vea, 2009)
The fact that anyone may host a TOR server is another concern and a major security risk, which may be mitigated by educating TOR users about which aspects of their internet usage are really anonymous. This leads to an important question: how many compromised TOR nodes are there? It takes only one compromised node on a TOR circuit (the exit node) to expose all of that circuit's unencrypted traffic. This issue has prompted some to investigate whether certain TOR nodes are in fact compromised and acting maliciously (Team Furry, 2007a):

"Since the TOR exit-nodes can decide what traffic (or rather, what ports) it wants to relay it's easy to set up a rogue exit-node that relays only cleartext traffic (and of course sniffs it on the fly)..." (Team Furry, 2007a)

This research identified numerous TOR exit nodes restricting traffic based on port numbers. For example, one node was found to accept only unencrypted IMAP, AOL Instant Messenger, MSN Messenger and Yahoo Messenger traffic and to reject all other forms of internet traffic (Team Furry, 2007a). The person hosting this server may be doing so to help people communicate over TOR, yet it is equally possible that the node is compromised and capturing unencrypted packets. Even if this node is not compromised, it could become compromised as easily as turning on WireShark.

Besides being selective about internet traffic, TOR nodes can also be used to carry out MITM attacks. Running an SSL-enabled server, the same researcher connected to his website through TOR to check whether any exit nodes were modifying the website's SSL certificate (Team Furry, 2007b). One TOR exit node was found to have modified the certificate, indicating that a MITM attack was being carried out through that particular exit node (Team Furry, 2007b). It was unclear what the MITM attack was being used for, but it is important to be aware of the potential dangers when using the TOR service.
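The certificate check described above can be sketched as a fingerprint comparison: record the SHA-256 fingerprint of one's own server certificate over a trusted, direct connection, then compare it with the certificate presented when the same site is reached through TOR. The following Python sketch uses only the standard library; the function names are hypothetical, and fetching the certificate through TOR's SOCKS proxy is omitted because the Python standard library has no SOCKS client.

```python
import hashlib
import ssl


def sha256_fingerprint(der_cert: bytes) -> str:
    """Colon-separated, upper-case SHA-256 fingerprint of a DER certificate."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_der_certificate(host: str, port: int = 443) -> bytes:
    """Fetch a server's certificate over a direct (non-TOR) connection.

    No chain validation is done here; the point is only to capture the
    certificate bytes whose fingerprint will be compared.
    """
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)


def certificate_was_swapped(observed_der: bytes, known_good_fp: str) -> bool:
    """True if the certificate seen through an exit node differs from the
    fingerprint recorded over a trusted path, which suggests a MITM."""
    return sha256_fingerprint(observed_der) != known_good_fp.upper()
```

In this sketch the baseline would come from `sha256_fingerprint(fetch_der_certificate(host))` run outside TOR, and each certificate observed through a different exit node would be passed to `certificate_was_swapped`. Comparing fingerprints rather than issuer names catches a substituted certificate even when the forged one imitates the original's subject fields.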
Man-In-The-Middle attacks against TOR are not new. In fact, there is a tool called "SSLStrip" designed to facilitate these types of attacks against SSL traffic (Marlinspike, 2009). The tool is designed to run on a proxy between the user and the internet, such as a TOR node. Whenever a user attempts to access an SSL website, "a program on the proxy server sends the request to the website, handles any redirect to an SSL-encrypted page and returns an exact duplicate to the user, without the encryption" (Security Focus, 2009). To the end user the website looks legitimate; even the ubiquitous SSL "padlock" symbol can be spoofed with this tool (Marlinspike, 2009). Running the tool on a TOR node, its creator captured, in plain text, packets relating to account logins, including 114 Yahoo credentials and 50 Gmail credentials, as well as packets containing credit-card numbers (Security Focus, 2009). This research indicates that TOR users send login details and credit card numbers while assuming that the TOR system will keep these packets secure and anonymous.

Analysis of TOR traffic captured at TOR exit nodes shows that users misunderstand the abilities of TOR, specifically users who use TOR to "anonymously" log into their own email accounts or other websites with their own personally identifiable information (for example, social networking or blogging sites). In fact, the TOR developers clearly state that users should incorporate end-to-end encryption for security (Lemos, 2007). The TOR developers also state that TOR does not guarantee protection against global adversaries, for example corrupt or compromised nodes (Dingledine et al., 2004).

TOR users can currently select manually which exit nodes they wish to use. Although good in theory, this raises the issue of how users should choose which nodes to trust. A system similar to a Certificate Authority (CA) system could therefore be put into place to let users verify the integrity of the exit nodes they are using. This would, however, violate the anonymity of the people or organisations hosting these exit nodes, which would most likely lead to a reduction in the number of privately operated exit nodes and, in turn, to fewer available relays and slower connections.
CONCLUSION/RECOMMENDATIONS

The traffic analysis has shown that the originating IP addresses cannot be recovered from TOR packets, except when the TOR packets are captured on a local area network. On the other hand, social engineering has succeeded in identifying users of the TOR network through insecure and non-anonymous logins. Although this method does not always recover the originating IP address, recovering the real identity of the user is much more important from a law enforcement perspective.

This paper has shown that TOR traffic can be distinguished from regular internet traffic through traffic analysis techniques. Specifically, the port numbers that TOR uses, along with its frequent use of SSL traffic, help to locate packets belonging to the TOR network. This knowledge greatly assists network observers, whether law enforcement or corporations, in recognising TOR and then implementing suitable measures to conduct further traffic analysis on these packets.

Although at first glance the TOR system may appear to provide adequate protection of users' anonymity, and to a certain degree their security, its weaknesses can be easily exploited. From a law enforcement perspective, these weaknesses can be exploited to capture the packets and conduct a forensic analysis of their content.

A few practices are recommended to minimise the loss of anonymity while using TOR. Firstly, using TOR with a user's real email address or account logins is fundamentally flawed, as it undermines any anonymity provided by the TOR service. Instead, it is highly recommended that only anonymous, or temporary, email addresses and logins are used within the TOR network. Secondly, TOR should not be used to make any purchases over the internet, as using a credit card number or a physical shipping address will also undermine any anonymity provided by TOR. Finally, no reference to the user's physical location or any other personally identifiable information should be made while using the TOR service, to ensure the anonymity of TOR users.
If, for example, a user or computer had been identified as using TOR and a law enforcement agency wanted to know what TOR was being used for, the agency could launch a MITM attack using a tool such as "SSLStrip". SSLStrip could be run either on a compromised TOR node or between the user's internet connection and the TOR system itself. From either point of attack, a law enforcement agency would be able to "tap" the user's network packets and view their content in clear text (see Diagram 8).

Diagram 8 – TOR MITM Attack

By using open-source tools such as WireShark and SSLStrip, a law enforcement agency would be able to effectively capture and analyse a user's TOR packets. For this type of capture and analysis to succeed, the agency would need prior knowledge of the person using the TOR service. Without knowing the person's IP address, the MITM attack would not be feasible, because the attack must be positioned between the user's computer and the TOR system. If a law enforcement agency were to run a MITM attack on a compromised TOR node, it would not be able to determine which TOR users were connected to that node; therefore, in a law enforcement context, knowing the target is a necessity.

This paper has demonstrated that the TOR system is not immune to traffic analysis techniques. Indeed, traffic analysis plays an important part in locating TOR packets and subsequently implementing attacks that compromise the anonymity of the TOR network. The attacks presented in this paper allow law enforcement agencies to implement systems that decrypt TOR packets to gain high-level access to the original HTML. When used in a network forensic context, these attacks reduce TOR from an anonymity system to nothing more than a slight inconvenience.
REFERENCE LIST

Androulaki, E., Raykova, M., Srivatsan, S., Stavrou, A., & Bellovin, S. M. (2008). PAR: Payment for anonymous routing. Lecture Notes in Computer Science, 5134, 219-236.
Bauer, K., McCoy, D., Grunwald, D., Kohno, T., & Sicker, D. (2007). Low-resource routing attacks against anonymous systems. Paper presented at the Proceedings of the 2007 ACM Workshop on Privacy in Electronic Society.
Chaum, D. L. (1981). Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM.
Cohen, M. I. (2008). PyFlag – An advanced network forensic framework. Digital Investigation, 5, 112-120.
Danezis, G., & Diaz, C. (2008). A survey of anonymous communication channels. Journal of Privacy Technology.
Dingledine, R., Mathewson, N., & Syverson, P. (2004). Tor: The second-generation onion router. Paper presented at the Proceedings of the 13th USENIX Security Symposium.
Dingledine, R., Mathewson, N., & Syverson, P. (2005). Challenges in deploying low-latency anonymity. NRL CHACS Report, 5540-5265.
Fraser, N. A., Raines, R. A., & Baldwin, R. O. (2005). Tor: An Anonymous Routing Network for Covert On-line Operations. IOSphere: the Professional Journal of Joint Information Operations, 44-47.
Fu, X., Graham, B., Bettati, R., & Zhao, W. (2003). Active traffic analysis attacks and countermeasures.
Gomułkiewicz, M., Klonowski, M., & Kutylowski, M. (2004). Onions Based on Universal Re-Encryption – Anonymous Communication Immune Against Repetitive Attack.
Hafner, K., & Lyon, M. (2000). Where wizards stay up late: The origins of the Internet. Touchstone Books.
Hintz, A. (2003). Fingerprinting websites using traffic analysis. Lecture Notes in Computer Science, 171-178.
Hopper, N., Vasserman, E. Y., & Chan-Tin, E. (2007). How much anonymity does network latency leak?
Lemos, R. (2007). Embassy leaks highlight pitfalls of Tor [Electronic Version]. SecurityFocus. Retrieved 09/10/2009, from
Marlinspike, M. (2009). SSLSTRIP [Electronic Version]. Retrieved 05/02/2010, from
Murdoch, S. J., & Danezis, G. (2005). Low-cost traffic analysis of Tor. Paper presented at the IEEE Symposium on Security and Privacy.
Naval Research Laboratory. Onion Routing - Brief Selected History. Retrieved 09/10/2009, from
Nielsen. (2010). Top 10 Global Web Parent Companies. Retrieved 22/01/2010, from
Øverlier, L., & Syverson, P. (2006). Locating hidden servers. Paper presented at the IEEE Symposium on Security and Privacy.
Perry, M. (2007). Securing the Tor Network. Defcon.
Privoxy Developers. (2010). Privoxy 3.0.16 User Manual. Retrieved 04/01/2010, from
Reiter, M. K., & Rubin, A. D. (1998). Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security (TISSEC), 1(1), 66-92.
Rennhard, M., & Plattner, B. (2002). Introducing MorphMix: Peer-to-peer based anonymous internet usage with collusion detection.
Security Focus. (2009). Man-in-the-middle attack sidesteps SSL [Electronic Version]. Retrieved 05/02/2010, from
Sun, Q., Simon, D. R., Wang, Y. M., Russell, W., Padmanabhan, V. N., & Qiu, L. (2002). Statistical Identification of Encrypted Web Browsing Traffic. Paper presented at the Proceedings of the IEEE Symposium on Security and Privacy.
Team Furry. (2007a). On TOR. MW-Blog. Retrieved 05/02/2010, from
Team Furry. (2007b). TOR Exit Nodes Doing MITM Attacks. MW-Blog. Retrieved 05/02/2010, from
The Tor Project Inc. (2009a). Tor: anonymity online. Retrieved 09/10/2009, from
The Tor Project Inc. (2009b). Tor: Sponsors. Retrieved 09/10/2009, from
The Tor Project Inc. (2009c). Tor: Users. Retrieved 09/10/2009, from
Vea, M. (2009). What Traffic is on a TOR Relay? Retrieved 04/01/2010, from
Wright, M., Adler, M., Levine, B. N., & Shields, C. (2002). An analysis of the degradation of anonymous protocols.