FreeSpeak: Anonymous messaging over on-demand
cloud services
Pablo Panero
Department of Computer Science Engineering
University of Oulu
pablopanerovz@gmail.com
Raúl Jiménez Redondo
Department of Computer Science Engineering
University of Oulu
rauljredondo@gmail.com
ABSTRACT
In the last decade, privacy and security have grown in popularity,
and as a consequence more and more solutions are being offered in
the market. In this paper we propose the design of an anonymous
messaging system, FreeSpeak. The system makes use of cloud
computing, specifically cloud virtualization. Its main
anonymity-providing feature consists of routing messages through
virtualized nodes that mimic the fleetingness and disposability of
Fast-Flux network components. This makes consistency a challenge,
since routing changes must be tracked and applied properly. Not
all components of the system can be made ephemeral, however: some
need to be highly available. These permanent, key components are
therefore protected and masked using Tor hidden services. Because
they are not replicated, they remain a single point of failure
and a more attractive target. Moreover, making virtual nodes
disposable and ephemeral is not enough to achieve anonymity; a
layer of encryption is needed to protect information from
unauthorized access. Consequently, onion routing and public-key
encryption are used.
Categories and Subject Descriptors
C.2.4 [Computer Communication Networks]: Distributed
Systems – Client/Server, distributed applications.
General Terms
Design.
Keywords
Cloud computing, privacy, anonymity.
1. INTRODUCTION
In recent years the privacy problem has surfaced with the
discovery of mass surveillance programs such as PRISM, run by the
NSA, instilling concern among Internet users. In addition, the
law is not well defined because of the global reach of the World
Wide Web: it falls under almost every country's jurisdiction,
each with its own laws, which makes this a chaotic and
slow-progressing field. In order to address some of these issues
and to take advantage of the data flowing through the Internet,
many countries are approving, or already have, Internet
surveillance laws. One example is France [1], which approved a
surveillance law with the aim of tackling terrorist attacks.
However, mass surveillance carries big responsibilities and
risks.
The rapid growth in popularity of mobile devices and the Internet,
along with users' lack of attention to privacy and permissions,
has aggravated the problem. Gathering personal information, and
consequently surveillance, has become far easier. With every
advertisement clicked, product purchased online, website browsed,
photo shared, etc., users leave a digital trail and paint a
picture of themselves for anyone who has the means to see it,
blurring the thin line between personalization and privacy
invasion.
These developments have had a direct effect on privacy-providing
services and applications, which have grown in popularity and
demand. A problem faced by anonymity networks and services is
that, in general, they require at least basic networking
knowledge to be set up properly. Moreover, the most popular of
those networks (i.e. Tor and JonDonym) have their nodes or relays
at fixed locations, whether publicly known or not, which can lead
to direct attacks. Our goal is to design an application that
provides anonymity while chatting by exploiting cloud computing
and virtualization features that are not yet used to their full
potential. Even though the system targets messaging, it could be
adapted for secure teleconferencing, data sharing, web surfing,
etc.
2. RELATED WORK
The first thing to understand is that anonymity is extremely
difficult to provide; most of the time what is actually provided
is unlinkability, unobservability, or pseudonymity, which means
that users can still be deanonymized by some attackers.
As defined by Pfitzmann and Hansen [2], anonymity is the
characteristic of not being identifiable among a set of possible
subjects, the anonymity set. A security requirement for anonymity
is that a subject is identified with a probability of 1/N, where
N is the number of subjects in the anonymity set.
Unlinkability is intended to prevent user profiling. In terms of
anonymous messaging it implies that there is no way to link a
message to either the sender or the receiver, or to link the
sender and the receiver to each other.
Unobservability means that no message, sender, or receiver can be
distinguished from "random noise". Therefore, it would not be
noticeable whether a message has been exchanged between any
sender-receiver pair.
Finally, pseudonymity is defined by the use of pseudonyms as
identifiers. This means that an attacker cannot identify a
subject using the information given by the pseudonym, which can
be understood as an alias.
Nowadays there is a wide variety of networks that provide
anonymization services, such as Tor [3], Freenet [4], JonDonym
[5], etc. In general terms, most of those networks are built on
the same base, data encryption and routing through a series of
nodes, on top of which they apply their own modifications. Some
of them, such as Tor [3] and Freenet [4], rely on participants
sharing their own bandwidth (and, in the case of Freenet [4],
also storage) to route packets in order to achieve
anonymization. Other networks, such as JonDonym [5], make use of
certified operators with strong privacy contracts.
Two related examples of chat applications that make use of these
anonymization networks are Fast-Flux chat [6], which uses Tor
hidden services [7][8] to protect key infrastructure but leaves
the rest outside Tor in order to increase performance, and
Scatterchat [9], which integrates onion routing through the Tor
network.
Tor hidden services [7][8] provide an important layer of
protection by preventing others from locating the service. This
is achieved by using a "rendezvous point", or meeting point, to
which both client and server connect in order to communicate.
Although these networks provide anonymity, there are many ways to
attack them. Tor has been the victim of numerous attacks, perhaps
because of its popularity. Even though most of them require high
computational power and may not succeed, they can be carried out.
As stated by Salo [10], there is a long list of ways to attack
these networks. As shown by Bauer et al. [11], low-cost traffic
analysis can link otherwise unrelated streams by timing and
loading Tor nodes. Back et al. [12] also showed the possibility
of traffic analysis attacks, this time over Freenet, by counting
exchanged packets: since it is easy to count how many packets are
sent to the first node, the attack consists of counting the
packets that leave towards each destination in order to find the
path of the connection.
By using traffic analysis and probability (e.g. Bayesian
probability) we can estimate how much an adversary can learn.
However, as noted by Salo [10], there are still many
opportunities for research in this field.
In addition to traffic analysis attacks, denial-of-service attacks
also help deanonymize users. In this case one of the most common
techniques is to overload part of the network so that circuits
are routed through malicious nodes.
Moreover, attacks based on protocol vulnerabilities and
browser-based attacks (tampering with HTTP traffic) can be
carried out.
In the cloud computing field, privacy has been a hot topic from
the beginning, above all in terms of data storage. Most solutions
rely on encryption and authentication.
One approach to this problem is the one proposed by Khan and
Hamlen [13], which consists of anonymizing the owner, source, and
destination of data and computations that can be accessed in
unencrypted form in the cloud. This is achieved through the Tor
network and public-key encryption. Another way to tackle the
privacy problem is to grant access to the data only to authorized
users. This is the approach taken by Giweli et al. [14], using
symmetric and asymmetric keys that are securely attached to the
encrypted data. Only authorized users can access the keys and
therefore the encrypted data.
This paper aims to fill what we consider a technological gap,
cloud computing, left by other applications and service
providers. Although Tor does use cloud bridges [15] (on Amazon
AWS) as an extra measure against Internet censorship, none of the
existing networks exploits cloud computing and virtualization to
create nodes.
3. SYSTEM DESIGN
The system is designed following a mixnet-based scheme [16]. It
consists of a master server, a cloud black box, and the client
application. The cloud can be seen as a single system through
which the messages are exchanged, as shown in Figure 1.
The overall interaction with the system is as follows:
1. The users agree on the number of nodes to use for the black
box.
2. The master sets up the black box according to the previously
agreed privacy level, that is, the number of nodes.
3. After setting up the environment, the master server sends the
users the addresses to which they should establish
connections. Those addresses are the entry and exit points of
the black box.
4. The users establish the proper connections and start the
message exchange.
5. The messages pass through the black box, which covers their
tracks.
6. Finally, the messages are received by the targeted users.
Figure 1 shows an overall scheme of the system flow. Notice that
the users at the beginning and end of the communication are the
same.
Figure 1: Overall system flow
To establish the number of nodes there are five options:
1. The creator of the chat room decides the number. The other
users are forced to accept it.
2. In a more democratic fashion, users are asked for the desired
number of nodes; when all answers have been received, the
mean value is calculated. Users are shown the value, and if
all of them agree on it the system proceeds to the next
phase. If they do not agree, they are asked again.
3. A modification of the previous option would consider an
agreement reached if 50% or more of the users agree on the
number of nodes.
4. Another slight modification would be to only accept a value
closer to the calculated mean than the one the user entered
previously.
5. The last possibility would be to use the mode or some
quartile, instead of the mean, of the values entered by the
users.
Note that all the options except the first could be combined
with one another. The number of nodes must allow the nodes to be
arranged in a two-dimensional matrix.
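The negotiation options above can be illustrated with a minimal Python sketch. It covers the mean-based proposal (option 2), unanimous versus 50% agreement (options 2 and 3), and the matrix constraint. All function names are hypothetical, and the rule for choosing the matrix shape (the most square factorisation) is an assumption, since the design does not specify one.

```python
import math
from statistics import mean


def propose_node_count(requests):
    """Option 2: propose the rounded mean of the users' requested counts."""
    return round(mean(requests))


def is_agreed(votes, threshold=1.0):
    """True when the fraction of approving users reaches the threshold.

    threshold=1.0 models unanimous agreement (option 2);
    threshold=0.5 models the 50% variant (option 3).
    """
    return sum(votes) / len(votes) >= threshold


def matrix_shape(node_count):
    """Return (rows, columns) with rows * columns == node_count.

    Assumption: the most square factorisation is preferred.
    A prime node_count degenerates to a single row.
    """
    if node_count < 1:
        raise ValueError("at least one node is required")
    rows = math.isqrt(node_count)
    while node_count % rows != 0:
        rows -= 1
    return rows, node_count // rows


requested = [4, 6, 9, 5]
proposal = propose_node_count(requested)        # mean of the requests
votes = [True, True, False, True]               # each user's answer to the proposal
print(proposal, is_agreed(votes), is_agreed(votes, threshold=0.5))
print(matrix_shape(proposal))
```

A prime number of nodes can only form a single row or column, so a real negotiation would probably restrict proposals to counts that factor into a useful matrix.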
3.1 The master node
The master node is the brain of the system. It is a vital part
and therefore a single point of failure. One possible solution
would be to replicate the master node in order to achieve high
availability. However, we propose a solution similar to that
taken by Banks et al. [6] (Fast Flux Chat), in which the master
node is protected by making it a Tor hidden service. This makes
the master node very difficult to locate and identify, and
therefore less prone to successful attacks. However, as shown by
Biryukov et al. [17], hidden services can be deanonymized by
means of bandwidth inflation and descriptor tracking.
The master node is in charge of building the black box according
to the characteristics specified by the users.
3.2 The black box
The black box is the heart of the system. Its function is to route
the messages through the corresponding nodes, thereby covering
the tracks of who wrote the message, what it contains, and to
whom it is addressed as it travels through the web.
The relays of the box will be arranged in a matrix (of N rows and
M columns), where each one of them knows the addresses of the
relays of the following column. A column of the matrix could be
understood as a mixer with many entry points that receives a
message and forwards it to the next mixer.
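As an illustration of the matrix arrangement, the following Python sketch builds a table in which every relay knows only the relays of the following column. Identifying relays by (row, column) tuples is an assumption made for the example; in a deployment these would be the addresses handed out by the master node.

```python
def build_black_box(rows, columns):
    """Return {relay: [relays of the next column]} for an N x M matrix.

    Relays are identified by hypothetical (row, column) tuples.
    Relays in the last column are exit points and have no successors.
    """
    topology = {}
    for col in range(columns):
        for row in range(rows):
            if col + 1 < columns:
                successors = [(r, col + 1) for r in range(rows)]
            else:
                successors = []
            topology[(row, col)] = successors
    return topology


box = build_black_box(rows=2, columns=3)
for relay, successors in box.items():
    print(relay, "->", successors)
```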
Upon receiving a message, a relay decides which relay to route it
to based on a randomized factor and on the number of messages
that have already been routed to each node of the next stage, in
order to control the load on the relays. It is true that the more
messages pass through the same node the better mixed they are,
but nodes must not be overloaded, so as not to affect overall
performance.
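One possible reading of this selection rule is a weighted random choice in which lightly loaded relays are more likely to be picked. The design does not say how the randomized factor and the load count are combined, so the weight 1 / (1 + routed messages) used below is purely an assumption for illustration.

```python
import random


def choose_next_relay(routed_counts, rng=random):
    """Pick a next-column relay at random, favouring lightly loaded ones.

    routed_counts maps each next-column relay to the number of messages
    already routed to it and is updated in place.
    Assumed policy: selection weight is 1 / (1 + routed messages).
    """
    relays = list(routed_counts)
    weights = [1.0 / (1 + routed_counts[relay]) for relay in relays]
    chosen = rng.choices(relays, weights=weights, k=1)[0]
    routed_counts[chosen] += 1
    return chosen


counts = {"relay-a": 0, "relay-b": 0, "relay-c": 0}
for _ in range(30):
    choose_next_relay(counts)
print(counts)  # the load tends to even out, but the choice stays random
```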
When a message has been sent to the next relay and its reception
acknowledged, all traces of the message are erased from the
relay. In addition, the nodes of the black box have a TTL (Time
To Live), after which they are replaced by a new node. The new
node is created and configured first and replaces the old one
only when it is ready, so that no packets are lost and
performance is not affected.
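The create-then-swap replacement can be sketched as follows. The `provision_relay` function is a hypothetical stand-in for whatever cloud API boots and configures a virtual machine. Time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass
from itertools import count

_relay_ids = count(1)


@dataclass
class Relay:
    relay_id: int
    created_at: float
    ready: bool = False


def provision_relay(now):
    """Hypothetical stand-in for booting and configuring a VM in the cloud."""
    return Relay(relay_id=next(_relay_ids), created_at=now, ready=True)


def rotate_if_expired(slot, now, ttl):
    """Replace the relay held in a matrix slot once its TTL has elapsed.

    The replacement is created first and swapped in only when it is
    ready, so the slot is never left empty.
    """
    current = slot["relay"]
    if now - current.created_at < ttl:
        return current
    replacement = provision_relay(now)
    if replacement.ready:
        slot["relay"] = replacement  # the old VM can be destroyed after this
    return slot["relay"]


slot = {"relay": provision_relay(now=0.0)}
print(rotate_if_expired(slot, now=100.0, ttl=300.0))  # still the first relay
print(rotate_if_expired(slot, now=300.0, ttl=300.0))  # a fresh relay
```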
Encryption is provided in the form of onion routing [16], using
the ElGamal public-key cryptosystem. Even though onion routing
provides a socket connection through the mixers, this connection
needs to be established again whenever any of the nodes in the
path changes (i.e. a node is replaced by a new one).
Figure 2: Onion routing key layers
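To make the layering concrete, here is a toy Python sketch of onion encryption with textbook ElGamal over integers. It is not a secure implementation: the group parameters are small and chosen by assumption, there is no padding, and every layer doubles the payload size because each integer becomes a ciphertext pair. Practical systems typically use the public key only to protect a symmetric key. All names are hypothetical.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, not a secure parameter
G = 3           # assumed base; a real system would use a vetted group


def keygen():
    """Return (private, public) with public = G^private mod P."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def encrypt(m, public):
    """Textbook ElGamal encryption of an integer 0 <= m < P."""
    y = secrets.randbelow(P - 2) + 1
    return pow(G, y, P), (m * pow(public, y, P)) % P


def decrypt(c1, c2, private):
    shared = pow(c1, private, P)
    return (c2 * pow(shared, P - 2, P)) % P  # inverse via Fermat's little theorem


def wrap(values, public):
    """Add one onion layer: encrypt every integer of the payload."""
    layer = []
    for value in values:
        layer.extend(encrypt(value, public))
    return layer


def peel(values, private):
    """Remove one onion layer: decrypt consecutive (c1, c2) pairs."""
    return [decrypt(values[i], values[i + 1], private)
            for i in range(0, len(values), 2)]


def build_onion(message, public_keys):
    """Wrap for the last relay first so the first relay peels the outer layer."""
    payload = [message]
    for public in reversed(public_keys):
        payload = wrap(payload, public)
    return payload


relay_keys = [keygen() for _ in range(3)]
message = int.from_bytes(b"hi", "big")
onion = build_onion(message, [public for _, public in relay_keys])
for private, _ in relay_keys:   # each relay on the path removes one layer
    onion = peel(onion, private)
print(onion[0].to_bytes(2, "big"))
```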
Apart from the matrix structure, relays could instead be arranged
in a peer-to-peer network in which messages are sent to a random
node of the network (or to one chosen by some combinatorial
routing function) instead of passing through mixers as in the
previous approach. However, this approach is more prone to
failure and overload. With a P2P structure a maximum number of
hops would need to be established so that packets arrive at
their destination; otherwise they could be trapped forever in
the black box because of the random nature of the path.
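The hop limit can be illustrated with a small random-walk sketch. The peer names and the rule applied when the budget runs out (deliver straight to the exit point) are assumptions; the design only states that a maximum number of hops is needed.

```python
import random


def route_p2p(peers, exit_node, max_hops, rng=random):
    """Forward a message between random peers until it reaches the exit.

    Assumed policy: once max_hops relays have been visited, the message
    is sent directly to the exit instead of wandering any further.
    Returns the list of nodes visited, ending with exit_node.
    """
    current = rng.choice(peers)
    path = [current]
    while current != exit_node:
        if len(path) >= max_hops:
            current = exit_node          # hop budget exhausted
        else:
            current = rng.choice(peers)  # keep walking at random
        path.append(current)
    return path


print(route_p2p(["p1", "p2", "p3", "p4"], exit_node="exit", max_hops=5))
```

Without the `max_hops` check, this walk would never terminate when the exit is not among the randomly chosen peers. This is the trapped-packet situation described above.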
The virtual machines deployed in the black box would be
configured to forward packets to the designated addresses and
ports. Automatic updates, however, might slow down machine
creation or leave machines misconfigured. Therefore, the virtual
machine image itself would be changed whenever software updates
are released. As a consequence, the client would not need to
worry about configuration or updates.
Since no data is stored in the nodes, there is no need to keep
the views of the servers consistent; only the client applications
need to be consistent. Consequently, messages are routed to all
exit points.
4. DISCUSSION
The master node is a single point of failure; however, since it
is provided as a Tor hidden service it is more difficult to
locate and therefore to attack (breaking the cryptography behind
it would require high computational power). This problem could
be mitigated by running several replicas of the master node,
which would increase performance and reduce the impact of one of
them being down. Nevertheless, being hidden behind a Tor service
makes a geographic distribution of the replicas less effective;
if some other means of protection against attacks were applied,
a geographic distribution of master node replicas would
significantly increase performance.
The nodes of the black box have a short lifespan, and therefore
an attack on a single node would not have a big effect on overall
performance. However, some kind of buffer at the entry point of
the network could be useful in case a large number of black box
nodes are down. This would mean introducing a new kind of node
to hold the buffer for all the clients. Note that the entry
points of different clients may differ, so this would mean
losing some architectural transparency and making those nodes a
more attractive target.
Sending all messages to the exit points could be challenging
because of the continuously changing nature of the black box
nodes. The TTL of the nodes needs to strike a proper trade-off
between anonymity and performance, because the highest
computational effort is spent on establishing the connection and
the public/private keys for the onion routing.
Another problem that can affect performance is the startup time of
the virtual machines in the cloud. According to Mao and Humphrey
[18], this depends greatly on the provider; for Amazon the
startup time is approximately 1 minute and 30 seconds. A more
recent study by Hoffman [19] shows that Google Cloud is the
fastest, with a startup time of approximately 30 seconds, closely
followed by Amazon and Vexxhost with 47 seconds. Nevertheless, as
shown by Mao and Humphrey [18], the startup time also depends on
the operating system that the virtual machine runs, the best
option being a networking-oriented Linux distribution (e.g.
CentOS).
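The startup figures quoted above bound how late a replacement node can be requested before the TTL of the node it replaces expires. The arithmetic is sketched below in Python; the TTL of 300 seconds is a made-up value for illustration, and the function name is hypothetical.

```python
# Approximate startup times in seconds, as quoted above from [18] and [19].
STARTUP_SECONDS = {
    "Amazon [18]": 90,
    "Google Cloud [19]": 30,
    "Amazon / Vexxhost [19]": 47,
}


def provisioning_deadline(ttl_seconds, startup_seconds, margin_seconds=0):
    """Seconds after a node's creation by which its replacement must be
    requested so that the replacement is ready when the TTL expires."""
    deadline = ttl_seconds - startup_seconds - margin_seconds
    if deadline < 0:
        raise ValueError("TTL is shorter than the time needed to boot a replacement")
    return deadline


ASSUMED_TTL = 300  # hypothetical TTL, not a value taken from the design
for provider, startup in STARTUP_SECONDS.items():
    print(provider, provisioning_deadline(ASSUMED_TTL, startup))
```

With these figures a TTL shorter than the startup time cannot be sustained by a single replacement pipeline, which is one concrete form of the anonymity/performance trade-off discussed above.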
Regarding the reception of messages, consistency could be a
problem because of the variety of exit points and the different
delays and latencies of the mixers and nodes. However, eventual
consistency is achieved: if writes stop, all clients will
eventually have the same view of the chat, although the order of
some messages may vary between clients.
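A minimal sketch of this behaviour on the client side: each client ignores duplicate copies arriving from different exit points, so once writes stop all clients hold the same set of messages, even if they saw them in a different order. The per-message identifier is an assumption; the design does not say how duplicates are recognised.

```python
class ClientView:
    """Client-side chat view fed by several exit points."""

    def __init__(self):
        self.seen = set()  # hypothetical message identifiers already shown
        self.log = []      # messages in local arrival order

    def receive(self, message_id, text):
        """Store a message unless a copy has already arrived."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.log.append((message_id, text))
        return True


alice, bob = ClientView(), ClientView()
for message in [("m1", "hello"), ("m2", "hi"), ("m2", "hi"), ("m3", "bye")]:
    alice.receive(*message)
for message in [("m3", "bye"), ("m1", "hello"), ("m2", "hi"), ("m1", "hello")]:
    bob.receive(*message)

print(set(alice.log) == set(bob.log))  # same content on both clients
print(alice.log == bob.log)            # but not necessarily the same order
```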
Regarding possible attacks on the FreeSpeak network, a
denial-of-service attack on the entire network could be
successful if, as said before, it affects the master node. The
ephemerality of the black box nodes means that traffic analysis
attacks require high computational power in order to track and
analyze the whole ever-changing network.
A business-related problem that this application might face is
the cost of using virtualization in the cloud. Like the startup
time, it varies depending on the provider. Choosing a provider
would therefore be a trade-off between performance and price.
5. ACKNOWLEDGMENTS
We thank the University of Oulu for the help and guidance that
made this work possible.
6. REFERENCES
[1] BBC News. (May 6, 2015). French parliament approves new
surveillance rules. Retrieved from
http://www.bbc.com/news/world-europe-32587377
[2] Pfitzmann, A., & Hansen, M. (2010). A terminology for
talking about privacy by data minimization: Anonymity,
unlinkability, undetectability, unobservability,
pseudonymity, and identity management.
[3] The Tor Project, Inc. (n.d.). Tor Project. Retrieved from
https://www.torproject.org/index.html.en
[4] Freenet Project. (n.d.). Freenet Project. Retrieved from
https://freenetproject.org/index.html
[5] JonDos GmbH. (n.d.). JonDonym. Retrieved from
https://anonymous-proxy-servers.net/en/overview.html
[6] Banks, G., Childers, N., & Ford, S. (2008). Fast Flux Chat.
[7] The Tor Project, Inc. (n.d.). Tor Hidden Services. Retrieved
from https://www.torproject.org/docs/hidden-services.html.en
[8] Dingledine, R. (2005). Tor Hidden Services. Proc. What the
Hack.
[9] Hacktivismo. (n.d.). Scatterchat. Retrieved from
http://www.scatterchat.com/
[10] Salo, J. (2010). Recent Attacks On Tor. Aalto University.
[11] Bauer, K., McCoy, D., Grunwald, D., Kohno, T., & Sicker,
D. (2007, October). Low-resource routing attacks against Tor.
In Proceedings of the 2007 ACM workshop on Privacy in
electronic society (pp. 11-20). ACM.
[12] Back, A., Möller, U., & Stiglic, A. (2001, January). Traffic
analysis attacks and trade-offs in anonymity providing
systems. In Information Hiding (pp. 245-257). Springer
Berlin Heidelberg.
[13] Khan, S. M., & Hamlen, K. W. (2012, June).
AnonymousCloud: A data ownership privacy provider
framework in cloud computing. In Trust, Security and
Privacy in Computing and Communications (TrustCom),
2012 IEEE 11th International Conference on (pp. 170-176).
IEEE.
[14] Giweli, N., Shahrestani, S., & Cheung, H. (2013). Enhancing
data privacy and access anonymity in cloud
computing. Communications of the IBIMA,1(462966).
[15] The Tor Project, Inc. (n.d.). Tor Cloud. Retrieved from
https://cloud.torproject.org/
[16] Ren, J., & Wu, J. (2010). Survey on anonymous
communications in computer networks. Computer
Communications, 33(4), 420-431.
[17] Biryukov, A., Pustogarov, I., & Weinmann, R. (2013, May).
Trawling for tor hidden services: Detection, measurement,
deanonymization. In Security and Privacy (SP), 2013 IEEE
Symposium on (pp. 80-94). IEEE.
[18] Mao, M., & Humphrey, M. (2012, June). A performance
study on the vm startup time in the cloud. In Cloud
Computing (CLOUD), 2012 IEEE 5th International
Conference on (pp. 423-430). IEEE.
[19] Hoffman, K. (March 7, 2015). Ready. Steady. Go! The speed
of VM creation and SSH access on AWS, DigitalOcean,
Linode, Vexxhost, Google Cloud, Rackspace and Microsoft
Azure. Retrieved from
http://blog.cloud66.com/ready-steady-go-the-speed-of-vm-creation-and-ssh-key-access-on-aws-digitalocean-linode-vexxhost-google-cloud-rackspace-and-microsoft-azure/