UNIVERSITY OF ONTARIO INSTITUTE OF TECHNOLOGY, WEB-SERVICES AND E-BUSINESS SECURITY, APRIL 2015
Redis, a NoSQL key-value store used in Big
Data: a security analysis
George Logarakis, Gustavo El Khoury, Ian Dunbar
Abstract—An important topic in the field of Big Data is, without doubt, the storage technology to use. Such technology
must provide easy and fast access to the datasets, while ensuring their integrity, resiliency and even confidentiality. In this
paper, we present a review of Redis, a data store used in Big Data. We summarize its architecture
and utility, and discuss its adequacy in terms of security.
Keywords—Redis, NoSQL, Datastore, Big Data, Privacy, Security
1 INTRODUCTION
One of the most difficult challenges in a project that involves
Big Data is, essentially, where and how to store the data. The main
reason is that traditional technologies, such as databases, usually
don’t scale well enough, or take enormous amounts of time to perform
simple operations, because they were not developed with huge datasets
in mind. For this purpose, special storage solutions are currently
being developed, tailored to the needs of Big Data. One such solution
is Redis, a NoSQL database that stores key-value pairs, which can be
kept in great quantities and retrieved very quickly. In this paper we
give a quick overview of the technologies behind Redis, its
architecture and the reasons behind its performance. Most importantly,
we cover the security mechanisms embedded in Redis and compare them
against the industry’s best practices and guidelines.
April 5, 2015
• G. Logarakis, G. El Khoury, and I. Dunbar are with the University of
Ontario Institute of Technology.
Manuscript received April 8, 2015; revised April 5, 2015.
2 A BRIEF OVERVIEW OF REDIS
One of the best ways to describe Redis is as an in-memory, key-value
data store [1]. It is a type of NoSQL database, since the stored data
doesn’t follow any particular schema beyond a dictionary-like
structure: a value, which can be of any supported type (strings, lists,
sets, sorted sets and hashes), is uniquely associated with a key, and
only that key can be used to retrieve it. This simplicity can seem
restrictive at first, but it also provides great flexibility in the
kinds of projects in which Redis can be used. Furthermore, since Redis
uses system memory as the primary storage for the key-value pairs, it
is much faster than traditional storage solutions that rely on
secondary storage such as hard disks or SSDs. Secondary storage is
still used, but only to save the memory contents and provide
persistence. These features turn Redis into a powerful tool with the
following advantages:
• High performance: In environments in which the rate of information
received per second is very high, traditional databases fail to
deliver the required performance. Since Redis keeps the data in RAM,
it processes operations very quickly, reaching up to 500K operations
per second [2].
• Flexibility: Since it uses a simple key-value data structure, many
applications can easily take advantage of Redis.
• Scalability: Because of its design, Redis can store millions of
entries without a noticeable performance reduction.
Fig. 1. Redis Replication
In the industry, Redis is mainly used for two purposes: as a cache for
large-scale applications that must stay responsive and fast while
handling large numbers of entries, like Twitter [3]; or by Big Data
providers, especially in cases in which the number of entries received
per second exceeds what most commercial database systems can handle.
3 REDIS ARCHITECTURE
In terms of roles, Redis distinguishes two: a client, which can be any
process accessing the store, and a server, which provides services to
any number of clients. The server and the client can be on different
machines, and even on different networks, but this last scenario is
discouraged for security reasons that are discussed in Section 4.
Redis provides two mechanisms to make the data persistent and ensure
it is not lost [4]:
• RDB persistence: At specific intervals or on triggering conditions
(e.g. after 100 write operations, or after 5 minutes of runtime), the
server takes a snapshot of its contents. A snapshot can also be taken
manually, either with the SAVE command, which blocks the server until
the snapshot is written, or with BGSAVE, which works in the
background. Background snapshots involve a fork() system call, which
can be time-consuming depending on the size of the data store and can
make Redis stop serving clients for a short time.
• AOF persistence: Every write operation on the data store is logged
in an append-only logfile. Although it provides more flexibility in
terms of storage, and is more resilient to corruption because of the
append-only property of the logfile, it can take more space than an
RDB snapshot.
Fig. 2. Redis Clustering
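The append-only idea can be illustrated with a small, self-contained sketch. It is not the Redis implementation: the `ToyStore` class, the `replay` function and the tuple-based log format are assumptions made for illustration (a real AOF file stores commands in the Redis wire protocol). The sketch only shows why logging every write in order is enough to rebuild the dataset, and why the log can grow larger than a snapshot.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (command, key, value). Tuples keep the sketch short;
# they are not the format Redis writes to disk.
LogEntry = Tuple[str, str, str]


class ToyStore:
    """In-memory key-value store that appends every write to a log."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[LogEntry] = []

    def set(self, key: str, value: str) -> None:
        self.log.append(("SET", key, value))  # record the write, then apply it
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.log.append(("DEL", key, ""))
        self.data.pop(key, None)


def replay(log: List[LogEntry]) -> Dict[str, str]:
    """Rebuild the dataset by re-applying every logged write in order."""
    data: Dict[str, str] = {}
    for command, key, value in log:
        if command == "SET":
            data[key] = value
        elif command == "DEL":
            data.pop(key, None)
    return data


store = ToyStore()
store.set("user:1", "alice")
store.set("user:2", "bob")
store.set("user:1", "carol")  # overwrites the value, but both writes stay in the log
store.delete("user:2")

restored = replay(store.log)  # same contents as store.data
```

The final dataset holds a single key, while the log holds four entries; this is the space overhead mentioned above.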
To provide resiliency and fault tolerance, Redis supports replication
(see figure 1) [1]. In this scenario, a master server is designated,
and it replicates the write operations to the other Redis servers
(slaves) in real time. At the same time, read-intensive workloads can
be divided among as many slave servers as needed. It’s important to
note that even though the master server keeps the slave servers in
sync, each slave must ensure persistence for its own dataset.
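As a rough illustration of this master/slave arrangement, the following toy model forwards every write from a master to its replicas and serves reads from the replicas in turn. The `Master` and `Replica` classes are hypothetical, and the forwarding here is synchronous and in-process, which is a simplification; it is a sketch of the idea, not of how Redis transfers data between servers.

```python
from itertools import cycle
from typing import Dict, List, Optional


class Replica:
    """Read-only copy that applies the writes forwarded by the master."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class Master:
    """Accepts writes and forwards each one to every replica."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)


replicas = [Replica("slave-1"), Replica("slave-2")]
master = Master(replicas)
master.set("page:home", "cached html")

# Spread two reads over the replicas in round-robin order.
readers = cycle(replicas)
answers = [(r.name, r.get("page:home")) for r in (next(readers), next(readers))]
```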
In order to extend the storage capacity of Redis, clustering can be
used (see figure 2). This allows a data store to be sharded across
multiple servers, so that their combined RAM is used as the data
store. As a consequence of this architecture, a single failure in one
of the nodes causes the cluster to stop working. However, clustering
and replication can be combined in more complex structures to achieve
fault tolerance and large capacity at the same time (see figure 3).
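Sharding requires a deterministic rule that maps each key to a server. The sketch below follows the rule described in the Redis Cluster specification, which is not part of the sources cited in this paper: a CRC16 of the key, modulo 16384 hash slots. The `node_for` function and the equal, contiguous split of slots among nodes are assumptions for illustration, and hash tags (the `{...}` syntax) are ignored.

```python
import binascii
from typing import Dict, List

HASH_SLOTS = 16384  # size of the slot space in the Redis Cluster specification


def key_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM variant) modulo 16384."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


def node_for(key: str, nodes: List[str]) -> str:
    """Pick a node, assuming slots are split into equal contiguous ranges."""
    slots_per_node = -(-HASH_SLOTS // len(nodes))  # ceiling division
    return nodes[key_slot(key) // slots_per_node]


nodes = ["node-a", "node-b", "node-c"]
placement: Dict[str, str] = {
    key: node_for(key, nodes) for key in ("user:1", "user:2", "cart:9")
}
```

Because the mapping depends only on the key, every client computes the same placement without consulting a central directory. It also makes the failure mode described above concrete: if one node is lost, the keys in its slot range have nowhere to go unless that node has a replica.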
Fig. 3. Redis Clustering and Replication simultaneously
4 REDIS SECURITY MODEL
4.1 General Model
Redis is designed to be used in an isolated, trusted environment (e.g.
the client’s own machine), and it is recommended that the Redis
instance not be directly exposed to the internet or to any environment
where untrusted clients can directly access the TCP port or UNIX
socket [1]. Redis is not optimized for maximum security; it is
optimized for maximum performance and simplicity [1]. The following
sections outline the three major security areas in Redis, along with
the features and drawbacks of each.
4.2 Network Security
Redis does not provide any network security features on install; if
network security is desired, it is the user’s responsibility to
implement it. Redis does provide recommendations on which security
measures to implement. The first recommended action is to ensure that
access to the Redis port is denied to everybody but the trusted
clients in the network [1], so that servers running Redis are only
accessible by the computers implementing the application that uses
Redis [1]. If Redis is running on a single computer connected to the
internet, the Redis port should be firewalled to prevent access from
the outside [1]. Failure to protect the Redis port can have major
consequences: for example, an attacker can delete the whole dataset
with a single FLUSHALL command. The Redis documentation informs users
of the risks but leaves the implementation of security up to the user.
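Part of this responsibility can be checked mechanically. The sketch below scans redis.conf-style text for two settings related to the risks above: the `bind` directive, which limits the interfaces the server listens on, and `rename-command`, which can rename or disable a command such as FLUSHALL. Both are real redis.conf directives, but the `parse_conf` and `audit` helpers, the warning texts and the sample configuration are hypothetical. The sketch also assumes a Redis version in which a missing `bind` line means listening on all interfaces.

```python
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, List[List[str]]]:
    """Collect directives ('name arg ...' per line), skipping comments and blanks."""
    directives: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        directives.setdefault(name.lower(), []).append(args)
    return directives


def audit(text: str) -> List[str]:
    """Return warnings for settings that leave the Redis port exposed."""
    conf = parse_conf(text)
    warnings: List[str] = []

    addresses = [addr for args in conf.get("bind", []) for addr in args]
    if not addresses or "0.0.0.0" in addresses:
        warnings.append("server listens on all interfaces; restrict with bind or a firewall")

    renamed = {args[0].upper() for args in conf.get("rename-command", []) if args}
    if "FLUSHALL" not in renamed:
        warnings.append("FLUSHALL is available under its default name")

    return warnings


# Hypothetical configuration used only to exercise the check.
sample = """
# local-only instance
port 6379
bind 127.0.0.1
"""
findings = audit(sample)
```

A check like this does not replace a firewall; it only catches configurations that rely on defaults.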
4.3 Authentication and Data Encryption
Redis does not enable any access control features by default, but a
small layer of access control can be turned on by editing the
redis.conf file. When the authentication feature is enabled, Redis
refuses any query from unauthenticated clients. To authenticate
itself, a client must issue the AUTH command followed by a password.
The main problem with this feature is that the password set by the
administrator is stored and sent in clear text: when a client issues
an AUTH command, the entire command (including the password) travels
over the network unencrypted. If proper network security is not in
place, an attacker can eavesdrop, learn the password and ultimately
gain access to Redis. The authentication layer is designed to keep
external attackers out of the Redis instance, but an attacker who
gains access to the network can obtain the password, rendering the
authentication layer useless. As the authentication layer shows, Redis
does not support data encryption. As with network security, it is the
user’s responsibility to add a layer of protection (e.g. an SSL proxy)
if parties need to access Redis over the internet.
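To make the clear-text problem concrete, the sketch below builds the bytes a client would send for an AUTH command, using the array-of-bulk-strings framing of the Redis serialization protocol. The `encode_command` helper and the example password are assumptions for illustration; no connection is opened.

```python
def encode_command(*parts: str) -> bytes:
    """Encode a command as an array of bulk strings, as Redis clients send it."""
    chunks = [b"*%d\r\n" % len(parts)]
    for part in parts:
        raw = part.encode("utf-8")
        chunks.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(chunks)


# Hypothetical password, visible verbatim in the bytes on the wire.
wire = encode_command("AUTH", "s3cret-password")
```

The password appears unmodified inside `wire`, so anyone able to capture the traffic can read it unless the connection is wrapped in an encrypted tunnel.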
5 BEST PRACTICES IN BIG DATA AND REDIS
With the widespread use of Big Data, there are several best practices
that should be followed to ensure that data is properly represented
and that its confidentiality is maintained. The three best practices
that we examine are: creating a data firewall, protecting the data,
and gathering security intelligence.
The data firewall revolves around restricting access to data by
unauthorized users. Policies that allow only authorized users to
access the data ensure that users cannot reach information they should
not have access to. Good security practice also includes detecting and
preventing unauthorized access by privileged users. With Redis, many
of the security and design decisions are left to the user to
configure. Redis does not natively support any access restriction on
the databases it creates; instead, it relies on the user to restrict
access to the port it listens on, on the individual machines in the
network. This provides some level of security, but leaves Redis open
to vulnerabilities, since it does not allow a broader range of
security tools to be used to secure the database.
The next best practice is protecting the data, which focuses on
encryption. The data should be protected both while it is in storage
at a specific location and while it is in transit. The best way to
achieve this is to use encryption together with a centralized key
management process, so that the data cannot be easily decrypted. Redis
has been designed for local use and not for use over a WAN; as such,
it does not provide any level of encryption for the data it stores.
Since the data is kept in plain text, if any user gets unauthorized
access to it there are no safeguards to ensure that the data remains
secure.
The final best practice that we examined is gathering security
intelligence, which concerns the analysis and audit of access to the
data. Proper logging makes it possible to detect suspicious or
unusually frequent access to sensitive data and helps to prevent
threats. One of the benefits of Redis is the high level of
customization granted to its users, which allows many different
configurations of the system, including variations in what is logged
and in how the database is searched. One feature that helps with
logging is AOF persistence, which logs every write operation received
by the server; these operations are replayed at startup in order to
reconstruct the original dataset.
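Detecting frequent access to sensitive data can be sketched as a simple count over access records. Because the AOF only contains write operations, the records below are assumed to come from some other source, such as logging in the application layer or in a proxy placed in front of Redis. The record format, the `frequent_readers` function, the key prefix and the threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical audit record: (client address, command, key).
Access = Tuple[str, str, str]


def frequent_readers(records: Iterable[Access], prefix: str, threshold: int) -> List[str]:
    """Return the clients that touched keys under `prefix` at least `threshold` times."""
    hits = Counter(client for client, _command, key in records if key.startswith(prefix))
    return sorted(client for client, count in hits.items() if count >= threshold)


records = [
    ("10.0.0.5", "GET", "card:1001"),
    ("10.0.0.5", "GET", "card:1002"),
    ("10.0.0.5", "GET", "card:1003"),
    ("10.0.0.7", "GET", "card:1001"),
    ("10.0.0.7", "SET", "page:home"),
]
suspects = frequent_readers(records, prefix="card:", threshold=3)
```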
6 RECOMMENDATIONS TO IMPROVE THE SECURITY OF REDIS
To improve the security of Redis, there are several recommendations we
would like to make so that deployments conform to best practices. To
ensure that only authorized users have access to the data, all servers
running Redis should be accessible only by the computers implementing
the application that uses Redis. Also, if Redis runs on a single
computer connected to the internet, the Redis port should be
firewalled to prevent unauthorized access from outside the network.
Since Redis does not provide any encryption for its data, it is
important that this feature be added in later revisions so that users
can have a secure database. If Redis is used over the internet, steps
should be taken to add layers of security and protection, such as an
SSL proxy. For such a customizable Big Data system, integrating
security is a logical next step; doing so would give users a greater
sense of security and ensure that the data they store is protected
from unauthorized users.
REFERENCES
[1] N. Prusty, “Overview of redis architecture,”
http://qnimate.com/overview-of-redis-architecture/,
2014.
[2] Redis, “How fast is redis?”
http://redis.io/topics/benchmarks, 2014.
[3] T. Hoff, “How twitter uses redis to scale:
105tb ram, 39mm qps, 10,000+ instances,”
http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html,
2014.
[4] Redis, “Redis persistence,”
http://redis.io/topics/persistence, 2014.