UNIVERSITY OF ONTARIO INSTITUTE OF TECHNOLOGY, WEB-SERVICES AND E-BUSINESS SECURITY, APRIL 2015
Redis, a NoSQL key-value store used in Big
Data: a security analysis
George Logarakis, Gustavo El Khoury, Ian Dunbar
Abstract—An important topic in the field of Big Data is, without doubt, the storage technology to use. Such technology
must provide easy and fast access to the datasets while ensuring their integrity, resiliency and even confidentiality. This
paper reviews Redis, a data store used in Big Data. It summarizes the architecture and utility of Redis
and discusses its competence in terms of security.
Keywords—Redis, NoSQL, Datastore, Big Data, Privacy, Security
1 INTRODUCTION
One of the most difficult challenges in a project that involves Big Data is where and how to store the data. Traditional technologies, such as databases, usually do not scale well enough, or take enormous amounts of time to perform simple operations, because they were not developed with huge datasets in mind. Special storage solutions tailored to the needs of Big Data are therefore being developed. One such solution is Redis, a NoSQL database that stores key-value pairs, which can be held in great quantities and retrieved very quickly. In this paper we give a brief overview of the technologies behind Redis, its architecture and the reasons behind its performance. Most importantly, we cover the security mechanisms embedded in Redis and compare them against the industry's best practices and guidelines.
April 5, 2015
• G. Logarakis, G. El Khoury, and I. Dunbar are with the University of Ontario Institute of Technology.
Manuscript received April 8, 2015; revised April 5, 2015.
2 A BRIEF OVERVIEW OF REDIS
One of the best ways to describe Redis is as an in-memory, key-value data store [1]. It is a type of NoSQL database, since the stored data does not follow any particular schema beyond a dictionary-like structure: a value, which can be of any of the supported types (strings, lists, sets, sorted sets and hashes), is uniquely associated with a key. Only the key can be used to retrieve the value. This simplicity can seem restrictive at first, but it also makes Redis flexible enough to fit many kinds of projects. Furthermore, since Redis uses system memory as the primary storage for the key-value pairs, it is very fast compared with traditional storage solutions that rely on secondary storage such as hard disks or SSDs. Secondary storage is still used to save the memory contents and so provide persistence. These features make Redis a powerful tool with the following advantages:
• Performance: In environments where the rate of incoming information is very high, traditional databases fail to deliver the required performance. Since Redis stores the data in RAM, it processes operations very quickly, reaching up to 500K operations per second [2].
• Flexibility: Since it uses a simple key-value data structure, many applications can easily take advantage of Redis.
• Scalability: Because of its design, Redis can store millions of entries without a noticeable reduction in performance.
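As a rough illustration of the dictionary-like model described in this section, the following Python sketch keeps values of different types under unique keys in an ordinary in-memory dict. The class `TinyStore`, its method names and the example keys are hypothetical; they are not part of Redis or of any Redis client library and only mimic the idea that a value is reachable through its key alone.

```python
class TinyStore:
    """Hypothetical in-memory key-value store (illustration only, not Redis)."""

    def __init__(self):
        self._data = {}  # every value lives in RAM, indexed by its key

    def set(self, key, value):
        # A key maps to exactly one value; setting it again overwrites.
        self._data[key] = value

    def get(self, key):
        # The key is the only way to reach a value; missing keys give None.
        return self._data.get(key)


store = TinyStore()
store.set("user:1:name", "alice")               # string value
store.set("user:1:tags", {"admin", "dev"})      # set value
store.set("user:1:profile", {"lang": "en"})     # hash-like value
```

There is no way to query by value or by field in this model, which is the restriction, and the simplicity, the text refers to.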
Fig. 1. Redis Replication
In industry, Redis is mainly used for two purposes: as a cache for large-scale applications that must be responsive and fast while holding thousands of entries, such as Twitter [3]; and by Big Data providers, especially in cases where the number of entries received per second exceeds what most commercial database systems can handle.
3 REDIS ARCHITECTURE
Redis distinguishes two roles: a client, which can be any system process accessing the store, and a server, which provides services to any number of clients. The server and the client can be on different machines, and even on different networks, although the latter is discouraged for security reasons discussed in Section 4.
Redis provides two mechanisms to make the data persistent and ensure it is not lost [4]:
• RDB persistence: At specific intervals or triggering conditions (e.g. after 100 write operations, or after 5 minutes of runtime), the server takes a snapshot of its contents. A snapshot can also be taken manually, either with the SAVE command, which blocks the server until the snapshot is written, or with the BGSAVE command, which writes it in the background. Background snapshots involve a fork() system call, which can be time-consuming depending on the size of the data store and can make Redis stop serving clients for a short time.
• AOF persistence: Every write operation on the data store is logged to an append-only log file. Although this provides more flexibility in terms of storage, and is more resilient to corruption because of the append-only property of the log file, it can take more space than an RDB snapshot.
Fig. 2. Redis Clustering
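The two persistence mechanisms listed above can be contrasted with a small Python sketch. It is a toy model under stated assumptions, not the on-disk format Redis uses: the class `LoggedStore`, the function `replay` and the JSON encoding of log entries are all hypothetical. The snapshot serializes the whole dataset at one moment, while the append-only log records every write and is replayed in order to rebuild the data.

```python
import json


class LoggedStore:
    """Hypothetical store that illustrates snapshot vs. append-only persistence."""

    def __init__(self):
        self.data = {}
        self.aof = []  # stands in for the append-only log file

    def set(self, key, value):
        self.data[key] = value
        # AOF idea: record every write, in order, at the end of the log.
        self.aof.append(json.dumps(["SET", key, value]))

    def delete(self, key):
        self.data.pop(key, None)
        self.aof.append(json.dumps(["DEL", key]))

    def snapshot(self):
        # RDB idea: serialize the whole dataset as it is right now.
        return json.dumps(self.data, sort_keys=True)


def replay(log_lines):
    """Rebuild a dataset by re-applying logged writes from first to last."""
    data = {}
    for line in log_lines:
        op, key, *rest = json.loads(line)
        if op == "SET":
            data[key] = rest[0]
        elif op == "DEL":
            data.pop(key, None)
    return data


store = LoggedStore()
store.set("a", 1)
store.set("a", 2)
store.set("b", 3)
store.delete("b")
restored = replay(store.aof)
```

In this run the log holds four entries while the snapshot describes a single key, which illustrates why an append-only log can take more space than a snapshot of the same dataset.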
To provide resiliency and fault tolerance, Redis incorporates replication (see Figure 1) [1]. In this scenario, a master server is designated, and it replicates write operations to the other Redis servers as they occur. Intensive read operations, in turn, can be divided among as many slave servers as needed. It is important to note that even though the master server keeps the slave servers in sync, each slave must ensure persistence for its own dataset.
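The replication scheme described above can be sketched in a few lines of Python. This is a simplified model, not how Redis implements replication: the classes `Master` and `Replica` are hypothetical, the network transfer is replaced by a direct method call, and the round-robin distribution of reads is an assumption made for the example.

```python
import itertools


class Replica:
    """Hypothetical read-only copy of the master's dataset."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class Master:
    """Hypothetical master that forwards each write to its replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = list(replicas)

    def set(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            # Real replication travels over the network; here it is a direct call.
            replica.apply(key, value)


replicas = [Replica(), Replica()]
master = Master(replicas)
master.set("counter", 41)
master.set("counter", 42)

# Reads are spread over the replicas in round-robin order (an assumption).
read_from = itertools.cycle(replicas)
values = [next(read_from).get("counter") for _ in range(4)]
```

Writes go through a single server while reads can be served by any copy, which is what lets read-heavy workloads scale with the number of replicas.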
To extend the storage capacity of Redis, clustering can be used (see Figure 2). This allows a data store to be sharded across multiple servers, so that their combined RAM serves as the datastore. As a consequence of this architecture, the failure of a single node can cause the cluster to stop working. However, clustering and replication can be combined in more complex structures to achieve fault tolerance and large capacity at the same time (see Figure 3).
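The text does not say how keys are assigned to servers. The Redis Cluster specification describes a scheme in which each key is mapped to one of 16384 hash slots by taking a CRC16 checksum of the key modulo 16384. The Python sketch below follows that idea in simplified form. It ignores hash tags, uses Python's `binascii.crc_hqx` as the CRC16 function, and assumes that slots are split into equal contiguous ranges; the node names and the helper `node_for` are hypothetical, so the exact placements should be treated as illustrative.

```python
import binascii

NUM_SLOTS = 16384  # number of hash slots described for Redis Cluster


def key_slot(key: str) -> int:
    # binascii.crc_hqx computes a CRC-CCITT (polynomial 0x1021) checksum.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % NUM_SLOTS


def node_for(key: str, nodes: list) -> str:
    # Assumption: slots are split into equal contiguous ranges, one per node.
    slots_per_node = NUM_SLOTS // len(nodes)
    index = min(key_slot(key) // slots_per_node, len(nodes) - 1)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
placement = {key: node_for(key, nodes) for key in ("user:1", "user:2", "cart:9")}
```

Because each key lives on exactly one node in this model, losing a node makes part of the key space unavailable, which is why clustering is combined with replication when fault tolerance is needed.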
4 REDIS SECURITY MODEL
4.1 General Model
Redis is designed to be used in an isolated environment (e.g. a client PC), and it is recommended that a Redis instance not be directly exposed to the internet or to any environment where untrusted clients can directly access its TCP port or UNIX socket [1]. Redis is optimized for maximum performance and simplicity rather than for maximum security [1]. The following sections outline the three major security areas in Redis (network security, authentication and data encryption) and the features and drawbacks of each.
Fig. 3. Redis Clustering and Replication simultaneously
4.2 Network Security
Redis does not provide any network security features on install; if network security is desired, it is the client's responsibility to implement it. Redis does provide recommendations on which security measures to implement. The first recommended action is to ensure that access to the Redis port is denied to everybody but the trusted clients in the network [1], so that servers running Redis are only accessible by the computers implementing the application that uses Redis [1]. If Redis is running on a single computer connected to the internet, the Redis port should be firewalled to prevent access from the outside [1]. Failure to protect the Redis port can have major consequences: for example, an attacker can delete the whole dataset with a single FLUSHALL command. The Redis documentation informs users of the risks but leaves the implementation of security up to the user.
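One simple way to check whether a firewall rule has the intended effect is to try to open a TCP connection to the Redis port from a machine that should not have access. The Python sketch below shows such a check using only the standard library. The function name `port_is_reachable` is hypothetical, and the demonstration runs against a throwaway local listener rather than a real Redis server (which listens on port 6379 by default), so it is an illustration of the idea and not an auditing tool.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstration against a throwaway local listener instead of a real Redis server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable_while_listening = port_is_reachable("127.0.0.1", open_port)
listener.close()
reachable_after_close = port_is_reachable("127.0.0.1", open_port)
```

Run from an untrusted network segment against the real server address, a result of True would indicate that the port is exposed and that commands such as FLUSHALL could be sent by anyone who can reach it.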
4.3 Authentication and Data Encryption
Redis does not implement any access control features by default, but a small layer of access control can be enabled by editing the redis.conf file. When the authentication feature is enabled, Redis refuses any query from unauthenticated clients. To authenticate itself, a client must issue the AUTH command followed by a password. The main problem with this feature is that the password set by the administrator is stored and sent in clear text. When a client issues an AUTH command, the entire command (including the password) is sent in plain text. If proper network security is not in place, an attacker can eavesdrop on the connection, learn the password and ultimately gain access to Redis. The authentication layer is designed to prevent external attackers from accessing the Redis instance, but an attacker who gains access to the network can obtain the password, rendering the authentication layer useless. As the authentication layer shows, Redis does not support data encryption. As with network security, it is the client's responsibility to implement an additional layer of protection (e.g. an SSL proxy) if parties want to access Redis over the internet.
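To make the clear-text problem concrete, the Python sketch below builds the bytes a client would put on the wire for an AUTH command, following the Redis serialization protocol (RESP), in which a command is sent as an array of bulk strings. The helper `encode_command` and the password are hypothetical. In redis.conf the password itself is set with the `requirepass` directive.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


# Hypothetical password, for illustration only.
wire_bytes = encode_command("AUTH", "s3cret-pass")
password_visible = b"s3cret-pass" in wire_bytes
```

The password appears unmodified in `wire_bytes`, so anyone able to capture the traffic between client and server can read it unless the connection is wrapped in an encrypted tunnel.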
5 BEST PRACTICES IN BIG DATA AND
REDIS
With the widespread use of Big Data, several best practices should be followed to ensure that data is properly represented and that its confidentiality is maintained. The three main best practices we examine are: creating a data firewall, protecting the data, and gathering security intelligence.
The data firewall aspect of Big Data revolves around restricting access to data by unauthorized users. Policies that allow only authorized users to access data ensure that users cannot reach information they should not have access to. Good security practice also includes the ability to detect and prevent unauthorized access by privileged users. In Redis, many of the security and design decisions are left to the user. Redis does not natively support any access restriction on the databases it creates; instead, it relies on the user to restrict access to the port it listens on, on each machine in the network. This provides some level of security but leaves the database open to vulnerabilities, because it does not allow a broader range of security tools to be used to secure it.
The next best practice is to protect the data, which focuses on encryption. Data should be protected both at rest, while it is stored at a specific location, and in transit. The best way to ensure this is to use encryption together with a centralized key management process, so that the data is protected and cannot be easily decrypted. Redis has been designed for local use and not for use over a WAN; as such, it does not provide any encryption for the data it stores. Since the data is not encrypted, it is held in plain text, so if any user gains unauthorized access to it there are no safeguards to keep it secure.
The final best practice we examined is the gathering of security intelligence, which concerns the analysis and auditing of access to the data. Proper logging makes it possible to detect suspicious or frequent access to sensitive data and helps to prevent threats from materializing. One benefit of Redis is the high level of customization granted to its users, allowing for many different configurations of the system, including variations in logging and in how the database is searched. One feature that helps with logging in Redis is AOF persistence, which logs every write operation received by the server; these operations are replayed at startup to reconstruct the original dataset.
6 RECOMMENDATIONS TO IMPROVE
THE SECURITY OF REDIS
We make several recommendations to bring Redis deployments in line with these best practices. To ensure that only authorized users have access to the data, all servers running Redis should be accessible only by the computers implementing the application that uses Redis. In addition, if Redis runs on a single computer connected to the internet, the Redis port should be firewalled to prevent unauthorized access from outside the network.
Since Redis does not provide any encryption for its database, it is important that Redis add this feature in later revisions so that users can have a secure database. If Redis is used over the internet, additional layers of security and protection, such as an SSL proxy, should be put in place. For such a customizable Big Data system, integrating security is a logical next step: doing so would give users a greater sense of security and ensure that the data they store is protected from unauthorized users.
REFERENCES
[1] N. Prusty, “Overview of redis architecture,” http://qnimate.com/overview-of-redis-architecture/, 2014.
[2] Redis, “How fast is redis?” http://redis.io/topics/benchmarks, 2014.
[3] T. Hoff, “How twitter uses redis to scale: 105tb ram, 39mm qps, 10,000+ instances,” http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html, 2014.
[4] Redis, “Redis persistence,” http://redis.io/topics/persistence, 2014.

More Related Content

What's hot

PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...
PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...
PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...IJNSA Journal
 
Cryoserver Technical Overview
Cryoserver Technical OverviewCryoserver Technical Overview
Cryoserver Technical Overviewcryoserver
 
Dynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudDynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudAM Publications
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersIRJET Journal
 
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky Model
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky ModelSecurity Threat Solution over Single Cloud To Multi-Cloud Using DepSky Model
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky ModelIOSR Journals
 
Secure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved ReliabilitySecure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved Reliability1crore projects
 
An approach for secured data transmission at client end in cloud computing
An approach for secured data transmission at client end in cloud computingAn approach for secured data transmission at client end in cloud computing
An approach for secured data transmission at client end in cloud computingIAEME Publication
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerIOSR Journals
 
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTURE
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTUREA SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTURE
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTUREIJNSA Journal
 
Secure Auditing and Deduplicating Data on Cloud
Secure Auditing and Deduplicating Data on CloudSecure Auditing and Deduplicating Data on Cloud
Secure Auditing and Deduplicating Data on CloudIJMTST Journal
 
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageA Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageIRJET Journal
 
data storage security technique for cloud computing
data storage security technique for cloud computingdata storage security technique for cloud computing
data storage security technique for cloud computinghasimshah
 
Enhancing Availability of Data in Mixed Homomorphic Encryption in Cloud
Enhancing Availability of Data in Mixed Homomorphic Encryption in CloudEnhancing Availability of Data in Mixed Homomorphic Encryption in Cloud
Enhancing Availability of Data in Mixed Homomorphic Encryption in Cloudijtsrd
 
Secure distributed deduplication systems
Secure distributed deduplication systemsSecure distributed deduplication systems
Secure distributed deduplication systemsPvrtechnologies Nellore
 
Revocation based De-duplication Systems for Improving Reliability in Cloud St...
Revocation based De-duplication Systems for Improving Reliability in Cloud St...Revocation based De-duplication Systems for Improving Reliability in Cloud St...
Revocation based De-duplication Systems for Improving Reliability in Cloud St...IRJET Journal
 
Securely Data Forwarding and Maintaining Reliability of Data in Cloud Computing
Securely Data Forwarding and Maintaining Reliability of Data in Cloud ComputingSecurely Data Forwarding and Maintaining Reliability of Data in Cloud Computing
Securely Data Forwarding and Maintaining Reliability of Data in Cloud ComputingIJERA Editor
 
Database security technique with database cache
Database security technique with database cacheDatabase security technique with database cache
Database security technique with database cacheIJARIIT
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudeSAT Publishing House
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudeSAT Journals
 

What's hot (20)

PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...
PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...
PLEDGE: A POLICY-BASED SECURITY PROTOCOL FOR PROTECTING CONTENT ADDRESSABLE S...
 
Cryoserver Technical Overview
Cryoserver Technical OverviewCryoserver Technical Overview
Cryoserver Technical Overview
 
Dynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for CloudDynamic Resource Allocation and Data Security for Cloud
Dynamic Resource Allocation and Data Security for Cloud
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providers
 
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky Model
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky ModelSecurity Threat Solution over Single Cloud To Multi-Cloud Using DepSky Model
Security Threat Solution over Single Cloud To Multi-Cloud Using DepSky Model
 
Secure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved ReliabilitySecure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved Reliability
 
An approach for secured data transmission at client end in cloud computing
An approach for secured data transmission at client end in cloud computingAn approach for secured data transmission at client end in cloud computing
An approach for secured data transmission at client end in cloud computing
 
Enabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud ServerEnabling Integrity for the Compressed Files in Cloud Server
Enabling Integrity for the Compressed Files in Cloud Server
 
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTURE
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTUREA SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTURE
A SECURITY FRAMEWORK IN CLOUD COMPUTING INFRASTRUCTURE
 
Secure Auditing and Deduplicating Data on Cloud
Secure Auditing and Deduplicating Data on CloudSecure Auditing and Deduplicating Data on Cloud
Secure Auditing and Deduplicating Data on Cloud
 
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud StorageA Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
A Trusted TPA Model, to Improve Security & Reliability for Cloud Storage
 
data storage security technique for cloud computing
data storage security technique for cloud computingdata storage security technique for cloud computing
data storage security technique for cloud computing
 
Enhancing Availability of Data in Mixed Homomorphic Encryption in Cloud
Enhancing Availability of Data in Mixed Homomorphic Encryption in CloudEnhancing Availability of Data in Mixed Homomorphic Encryption in Cloud
Enhancing Availability of Data in Mixed Homomorphic Encryption in Cloud
 
Secure distributed deduplication systems
Secure distributed deduplication systemsSecure distributed deduplication systems
Secure distributed deduplication systems
 
L04302088092
L04302088092L04302088092
L04302088092
 
Revocation based De-duplication Systems for Improving Reliability in Cloud St...
Revocation based De-duplication Systems for Improving Reliability in Cloud St...Revocation based De-duplication Systems for Improving Reliability in Cloud St...
Revocation based De-duplication Systems for Improving Reliability in Cloud St...
 
Securely Data Forwarding and Maintaining Reliability of Data in Cloud Computing
Securely Data Forwarding and Maintaining Reliability of Data in Cloud ComputingSecurely Data Forwarding and Maintaining Reliability of Data in Cloud Computing
Securely Data Forwarding and Maintaining Reliability of Data in Cloud Computing
 
Database security technique with database cache
Database security technique with database cacheDatabase security technique with database cache
Database security technique with database cache
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloud
 
Survey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloudSurvey on securing outsourced storages in cloud
Survey on securing outsourced storages in cloud
 

Similar to paper

Transaction management techniques and practices in current cloud computing en...
Transaction management techniques and practices in current cloud computing en...Transaction management techniques and practices in current cloud computing en...
Transaction management techniques and practices in current cloud computing en...ijdms
 
Cloud Computing: Provide privacy and Security in Database-as-a-Service
Cloud Computing: Provide privacy and Security in Database-as-a-ServiceCloud Computing: Provide privacy and Security in Database-as-a-Service
Cloud Computing: Provide privacy and Security in Database-as-a-ServiceEditor Jacotech
 
A Survey Paper on Removal of Data Duplication in a Hybrid Cloud
 A Survey Paper on Removal of Data Duplication in a Hybrid Cloud  A Survey Paper on Removal of Data Duplication in a Hybrid Cloud
A Survey Paper on Removal of Data Duplication in a Hybrid Cloud IRJET Journal
 
An Approach towards Shuffling of Data to Avoid Tampering in Cloud
An Approach towards Shuffling of Data to Avoid Tampering in CloudAn Approach towards Shuffling of Data to Avoid Tampering in Cloud
An Approach towards Shuffling of Data to Avoid Tampering in CloudIRJET Journal
 
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...IJMER
 
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITY
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITYBLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITY
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITYijccsa
 
CRITICISMS OF THE FUTURE AVAILABILITY IN SUSTAINABLE GENDER GOAL, ACCESS TO L...
CRITICISMS OF THE FUTURE AVAILABILITY IN SUSTAINABLE GENDER GOAL, ACCESS TO L...CRITICISMS OF THE FUTURE AVAILABILITY IN SUSTAINABLE GENDER GOAL, ACCESS TO L...
CRITICISMS OF THE FUTURE AVAILABILITY IN SUSTAINABLE GENDER GOAL, ACCESS TO L...csijjournal
 
Blockchain based Data Security as a Service in Cloud Platform Security
Blockchain based Data Security as a Service in Cloud Platform SecurityBlockchain based Data Security as a Service in Cloud Platform Security
Blockchain based Data Security as a Service in Cloud Platform Securityijccsa
 
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITY
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITYBLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITY
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITYijccsa
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32jujukoko
 
Megastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesMegastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesJoão Gabriel Lima
 
Enterprise level security, the Huddle way
Enterprise level security, the Huddle wayEnterprise level security, the Huddle way
Enterprise level security, the Huddle wayHuddleHQ
 
Overview of cloud computing
Overview of cloud computingOverview of cloud computing
Overview of cloud computingTarek Nader
 
wp-security-dbsec-cloud-3225125
wp-security-dbsec-cloud-3225125wp-security-dbsec-cloud-3225125
wp-security-dbsec-cloud-3225125Gabor Bokor
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detectionijsrd.com
 
A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...A robust and verifiable threshold multi authority access control system in pu...
A robust and verifiable threshold multi authority access control system in pu...IJARIIT
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET Journal
 
A Study of A Method To Provide Minimized Bandwidth Consumption Using Regenera...
A Study of A Method To Provide Minimized Bandwidth Consumption Using Regenera...A Study of A Method To Provide Minimized Bandwidth Consumption Using Regenera...
A Study of A Method To Provide Minimized Bandwidth Consumption Using Regenera...IJERA Editor
 

Similar to paper (20)

Transaction management techniques and practices in current cloud computing en...
Transaction management techniques and practices in current cloud computing en...Transaction management techniques and practices in current cloud computing en...
Transaction management techniques and practices in current cloud computing en...
 
Cloud Computing: Provide privacy and Security in Database-as-a-Service
Cloud Computing: Provide privacy and Security in Database-as-a-ServiceCloud Computing: Provide privacy and Security in Database-as-a-Service
Cloud Computing: Provide privacy and Security in Database-as-a-Service
 
1376842823 2982373
1376842823  29823731376842823  2982373
1376842823 2982373
 
1376842823 2982373
1376842823  29823731376842823  2982373
1376842823 2982373
 
A Survey Paper on Removal of Data Duplication in a Hybrid Cloud
 A Survey Paper on Removal of Data Duplication in a Hybrid Cloud  A Survey Paper on Removal of Data Duplication in a Hybrid Cloud
A Survey Paper on Removal of Data Duplication in a Hybrid Cloud
 
An Approach towards Shuffling of Data to Avoid Tampering in Cloud
An Approach towards Shuffling of Data to Avoid Tampering in CloudAn Approach towards Shuffling of Data to Avoid Tampering in Cloud
An Approach towards Shuffling of Data to Avoid Tampering in Cloud
 
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...
Cooperative Schedule Data Possession for Integrity Verification in Multi-Clou...
 
BLOCKCHAIN BASED DATA SECURITY AS A SERVICE IN CLOUD PLATFORM SECURITY
UNIVERSITY OF ONTARIO INSTITUTE OF TECHNOLOGY, WEB-SERVICES AND E-BUSINESS SECURITY, APRIL 2015

Redis, a NoSQL key-value store used in Big Data: a security analysis

George Logarakis, Gustavo El Khoury, Ian Dunbar

Abstract: An important topic in the field of Big Data is, without doubt, the storage technology to use. Such technology must provide easy and fast access to the datasets while ensuring their integrity, resiliency and even confidentiality. This paper reviews Redis, a data store used in Big Data. It summarizes the architecture and utility of Redis and discusses its competence in terms of security.

Keywords: Redis, NoSQL, Datastore, Big Data, Privacy, Security

• G. Logarakis, G. El Khoury, and I. Dunbar are with the University of Ontario Institute of Technology. Manuscript received April 8, 2015; revised April 5, 2015.

1 INTRODUCTION
One of the most difficult challenges to face while working on a project that involves Big Data is, essentially, where and how to store this data. The main reason is that traditional technologies, such as databases, usually don't scale well enough, or take enormous amounts of time to perform simple operations. This is usually because such technologies were not developed with huge datasets in mind. For such purposes, special storage solutions are currently being developed, tailored particularly to the needs of Big Data. One such solution is Redis, a NoSQL database that stores key-value pairs, which can be held in great quantities and retrieved very quickly. In this paper we give a quick overview of the technologies behind Redis, its architecture and the reasons behind its performance. Most importantly, we cover the security mechanisms embedded in Redis and compare them against the industry's best practices and guidelines.

April 5, 2015

2 A BRIEF OVERVIEW OF REDIS
One of the best ways to describe Redis is as an in-memory, key-value data store [1]. It is a type of NoSQL database, since the data stored does not follow any particular schema beyond a dictionary-like structure: a value, which can be of any of the supported types (strings, lists, sets, sorted sets and hashes), is uniquely associated with a key. To retrieve the value, the key, and only the key, can be used. This level of simplicity can seem restrictive at first, but it also provides great flexibility in terms of the projects in which Redis can be used. Furthermore, since Redis uses system memory as the primary storage for the key-value pairs, it is very fast compared with traditional storage solutions that rely on secondary storage such as hard disks or SSDs. Secondary storage is still used, but only to save the memory contents for persistence. These features turn Redis into a powerful tool with the following advantages:
• Incredible performance: In environments in which the rate of information received per second is very high, traditional databases fail to deliver the required performance. Since Redis uses RAM to store the data, the speed at which it processes operations is very high, reaching up to 500K operations per second [2].
• Flexibility: Since it uses a simple key-value data structure, many applications can easily take advantage of Redis.
• Scalability: Because of its design, Redis can store millions of entries without a significant reduction in performance.
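The dictionary-like model described above can be illustrated with a small sketch. This is not Redis code: `ToyKeyValueStore` and the key names are hypothetical, and the sketch only assumes that each key maps to exactly one value and that lookups go through the key alone.

```python
from typing import Dict, Optional, Set, Union

# Value types loosely mirroring some of those named in the text.
Value = Union[str, Set[str], Dict[str, str]]


class ToyKeyValueStore:
    """In-memory store where each key maps to exactly one value (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Value] = {}

    def set(self, key: str, value: Value) -> None:
        # Writing to an existing key replaces its value, so keys stay unique.
        self._data[key] = value

    def get(self, key: str) -> Optional[Value]:
        # The key is the only way to reach a value; there is no query by content.
        return self._data.get(key)


store = ToyKeyValueStore()
store.set("user:1:name", "alice")                 # string value
store.set("user:1:tags", {"admin", "ops"})        # set value
store.set("user:1:profile", {"lang": "en"})       # hash-like value
```

Because the store imposes no schema, the three values above have different shapes yet live side by side under their own keys.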
In the industry, Redis is mainly used for two purposes: as a cache for large-scale applications that must stay responsive and fast while holding thousands of entries, such as Twitter [3]; or by Big Data providers, especially in cases in which the number of entries received per second exceeds what most commercial database systems can handle.

3 REDIS ARCHITECTURE
In terms of roles, Redis distinguishes two: a client, which can be any system process accessing the store, and a server, which provides services to any number of clients. The server and the client can be on different machines, and even on different networks, but this last scenario is discouraged for security reasons that are discussed further on.

To make data persistent, Redis provides two mechanisms [4]:
• RDB persistence: At specific intervals or on triggering conditions (e.g. after 100 write operations, or after 5 minutes of runtime), the server takes a snapshot of the contents. A snapshot can also be taken manually with the SAVE command, which blocks the server while it runs, or with BGSAVE, which works in the background. Background snapshots involve a fork() system call, which can be time-consuming depending on the size of the data store and can briefly stop Redis from serving clients.
• AOF persistence logs: Every write operation on the data store is logged to an append-only log file. Although this provides more flexibility in terms of storage and is more resilient to corruption because of the append-only property of the log file, the log can take more space than an RDB snapshot.

Fig. 1. Redis Replication
Fig. 2. Redis Clustering

To provide resiliency, fault tolerance and data accessibility, Redis incorporates replication techniques (see figure 1) [1]. In this scenario, a master server is designated, and it replicates write operations to the other Redis servers (the slaves) as they occur.
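The write path just described can be sketched in a few lines. This is a simplified, in-process model under stated assumptions: `ToyServer` and `ToyMaster` are hypothetical names, and writes are forwarded synchronously, whereas a real deployment sends them over the network.

```python
from typing import Dict, List


class ToyServer:
    """Stand-in for one server holding its own copy of the dataset."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply_write(self, key: str, value: str) -> None:
        self.data[key] = value


class ToyMaster(ToyServer):
    """Applies each write locally, then forwards it to every attached slave."""

    def __init__(self) -> None:
        super().__init__()
        self.slaves: List[ToyServer] = []

    def attach(self, slave: ToyServer) -> None:
        # Bring a new slave up to date with a copy of the current dataset.
        slave.data = dict(self.data)
        self.slaves.append(slave)

    def write(self, key: str, value: str) -> None:
        self.apply_write(key, value)
        for slave in self.slaves:
            slave.apply_write(key, value)


master = ToyMaster()
master.write("counter", "1")          # written before any slave exists
slave_a, slave_b = ToyServer(), ToyServer()
master.attach(slave_a)
master.attach(slave_b)
master.write("counter", "2")          # forwarded to both slaves
```

Each slave keeps its own dictionary rather than sharing the master's, which mirrors the point that every server holds a full, separate copy of the data.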
At the same time, intensive read workloads can be spread across as many slave servers as needed. It is important to note that even though the master server keeps the slave servers synced, each slave must ensure persistence for its own dataset.

To extend the storage capacity of Redis, clustering can be used (see figure 2). This allows a data store to be sharded across multiple servers, using their combined RAM as the datastore. As a consequence of this architecture, the failure of a single node can cause the cluster to stop working. However, clustering and replication can be combined in more complex structures to achieve fault tolerance and large capacity at the same time (see figure 3).

Fig. 3. Redis Clustering and Replication simultaneously

4 REDIS SECURITY MODEL
4.1 General Model
Redis is designed to be used in an isolated environment (e.g. a client PC), and it is recommended that the Redis instance not be directly exposed to the internet or to any environment where untrusted clients can directly access the TCP port or UNIX socket [1]. Redis is not optimized for maximum security; it is optimized for maximum performance and simplicity [1]. The following sections outline the three major security areas in Redis and the features and drawbacks of each.

4.2 Network Security
Redis does not provide any network security features on install. If network security is desired, it is the user's responsibility to implement it. Redis does provide recommendations on which security measures to implement. The first recommended action is to ensure that access to the Redis port is denied to everybody but the trusted clients in the network [1], so that servers running Redis are only reachable by the computers implementing the application that uses Redis [1]. If Redis is running on a single computer connected to the internet, the Redis port should be firewalled to prevent access from outside [1]. Failure to protect the Redis port can have major consequences: for example, an attacker can delete the whole dataset with a single FLUSHALL command. The Redis documentation informs users of the risks but leaves the implementation of security up to the user.

4.3 Authentication and Data Encryption
Redis does not implement any access control features by default, but a small layer of access control can be enabled by editing the redis.conf file. When the authentication feature is enabled, Redis refuses any query from unauthenticated clients. To authenticate itself, a client must issue the AUTH command followed by a password.
The main problem with the Redis access control feature is that the password set by the administrator is stored and sent in clear text. When an administrator issues an AUTH command, the entire command (including the password) is sent in plain text. If proper network security is not in place, an external attacker can eavesdrop, learn the password and ultimately gain access to Redis. The authentication layer in Redis is designed to prevent external attackers from accessing the Redis instance, but an attacker who gains access to the network can obtain the password, rendering the authentication layer useless. As the authentication layer shows, Redis does not support data encryption. As with network security, it is the user's responsibility to implement an additional layer of protection (e.g. an SSL proxy) if parties want to access Redis over the internet.
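To make the clear-text problem concrete, the sketch below builds the bytes a client would put on the wire for an AUTH command, using the Redis serialization protocol (RESP) framing of an array of bulk strings. The helper name `encode_command` and the password `s3cret` are made up for illustration.

```python
def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]           # array header: number of parts
    for part in parts:
        raw = part.encode("utf-8")
        out.append(b"$%d\r\n" % len(raw))     # bulk string header: byte length
        out.append(raw + b"\r\n")             # the payload itself, unmodified
    return b"".join(out)


# "s3cret" is a made-up password used only for illustration.
wire_bytes = encode_command("AUTH", "s3cret")
password_visible = b"s3cret" in wire_bytes
```

The payload is copied into the message unchanged, so anyone able to capture the traffic reads the password directly. This is why the text recommends an encrypted tunnel or proxy in front of Redis.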

5 BEST PRACTICES IN BIG DATA AND REDIS
With the widespread use of Big Data, several best practices should be followed to ensure that data is properly represented and that its confidentiality is maintained. The three main best practices we look at are: creating a data firewall, protecting the data, and gathering security intelligence.

The data firewall aspect of Big Data revolves around restricting access to data by unauthorized users. Policies that admit only authorized users ensure that data is not accessed by anyone who should not have access to it. Good security practice also means being able to detect and prevent unauthorized access by privileged users. With Redis, many of the security and design decisions are left to the user to configure. Redis does not natively support any access restriction on the databases it creates; instead it relies on the user to restrict access to the port it listens on, on each machine in the network. This provides some level of security, but it leaves the database open to vulnerabilities because a broader range of security tools cannot be applied to it.

The next best practice is to protect the data. This practice is focused on encryption and on ensuring that data is protected both while it is in storage at a specific location and while it is in transit. The best way to achieve this is to use encryption together with a centralized key management process, so that the data cannot be easily decrypted. Redis has been designed for local use and not for use over a WAN; as such, it does not provide any level of encryption for the data it stores.
Since the data is not encrypted, it is held in plain text, so if any user gets unauthorized access to it there are no safeguards to keep it secure.

The final best practice that we examined is the gathering of security intelligence. This practice concerns the analysis and audit of access to your data. Proper logging helps to detect suspicious or frequent access to sensitive data and to prevent threats from materializing. One of the main benefits of using Redis is the high level of customization granted to its users, which allows for many different configurations of the system. This customization allows for many different variations in the logs as well as in searching through the database. One feature that improves logging with Redis is the AOF persistence log, which records every write operation received by the server; these operations are replayed at startup to reconstruct the original dataset.

6 RECOMMENDATIONS TO IMPROVE THE SECURITY OF REDIS
We make several recommendations so that Redis deployments conform to these best practices. To ensure that only authorized users have access to your data, all servers running Redis should only be accessible by the computers implementing the application that uses Redis. If Redis runs on a single computer connected to the internet, the Redis port should be firewalled to prevent unauthorized access from outside the network. Since Redis does not provide any encryption for its database, it is important for Redis to add this feature in later revisions so that users can have a secure database. If Redis is used over the internet, steps should be taken to add layers of security and protection, such as an SSL proxy.
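The first recommendation amounts to an allowlist decision: a connection to the Redis port is accepted only if it comes from a known application host. The sketch below shows that decision rule only; in practice it would normally be enforced by a host firewall or network ACL rather than by application code. The network ranges and the name `is_trusted` are hypothetical.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlist: networks permitted to reach the Redis port.
TRUSTED_NETWORKS = ("10.0.5.0/24", "127.0.0.1/32")


def is_trusted(client_ip: str, networks: Iterable[str] = TRUSTED_NETWORKS) -> bool:
    """Return True only if client_ip falls inside one of the trusted networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)
```

Anything not explicitly listed is rejected, which matches the default-deny stance recommended in the text.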
With such a customizable Big Data system, integrating security is a logical next step for Redis. Doing so will give users a greater sense of security and ensure that the data they store is protected from unauthorized users.

REFERENCES
[1] N. Prusty, “Overview of redis architecture,” http://qnimate.com/overview-of-redis-architecture/, 2014.
[2] Redis, “How fast is redis?” http://redis.io/topics/benchmarks, 2014.
[3] T. Hoff, “How twitter uses redis to scale: 105tb ram, 39mm qps, 10,000+ instances,” http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html, 2014.
[4] Redis, “Redis persistence,” http://redis.io/topics/persistence, 2014.