Comparison of In-Memory Data Platforms
Amirmahdi Akbari, Hasan Dağ
Kadir Has University
{amirmahdi.akbar, hasan.dag}@khas.edu.tr
The unstoppable growth of data has prompted us to process
and analyze big data as quickly as possible. One approach
is to use RAM (Random Access Memory) as the main
storage device instead of traditional disks. This study
compares two well-known platforms, Hazelcast 3.0 and
Infinispan 6.0, in terms of their "Read" and "Write"
performance and the efficiency of their resource usage
(CPU and main memory), in order to gain a general
understanding of the most important elements and
conditions to consider when choosing between in-memory
platforms.
1. Introduction
Today's data growth could hardly have been predicted
just a couple of decades ago. For example, the amount of
data produced from the dawn of time up until 2003 can
now be produced in just two days. This data comes mainly
from social media, all types of transactions, logs, and the
Internet of Things, to name a few sources. Big data is
commonly described by its speed, volume, and versatility;
some argue that veracity and value should also be part of
the description, giving the "5 Vs". What matters most is
the ability to mine this data in real time for all types of
business functions.
Since processing big data fully in real time is seen as
almost impossible, the increasing demands of big data
applications have led researchers and practitioners to turn
to in-memory computing to speed up processing. For
instance, the Apache Spark framework stores
intermediate results in memory to deliver good
performance on iterative machine learning and interactive
data analysis tasks [1].
Among the most important reasons for using in-memory
data platforms are analyzing huge amounts of cellular
data to enhance communication security, providing fast
fraud detection in banking and finance, and real-time
remarketing and retargeting, such as Facebook's ad
exchange program, known as FBX.
Before going into the details of in-memory data
platforms, let us provide some background to help
understand them better.
2. In-Memory processing
Growing main memory capacity has fueled the
development of in-memory big data management and
processing. By eliminating the disk I/O bottleneck, it is
now possible to support interactive data analytics [3].
Nowadays, the quantity of data created every two days is
estimated at 5 exabytes, which is similar to the amount of
data created from the dawn of time up until 2003.
Moreover, it has been estimated that 2007 was the first
year in which it was not possible to store all the data we
produce. This massive amount of data opens up new and
challenging discovery tasks [4].
Real-time data stream analytics are needed to manage the
data currently generated, at an ever-increasing rate, by
applications such as sensor networks, measurements in
network monitoring and traffic management, log records
or click-streams in web browsing, manufacturing
processes, call detail records, email, blogging, Twitter
posts, and others [5].
The idea of in-memory computing is to keep the needed
data as close to the CPU as possible. One classic problem
of traditional computing is that the data comes from disk.
Although disk capacity has increased and the price per
gigabyte and terabyte has dropped accordingly, disk
storage performance has barely improved.
2.1. Factors that Made In-Memory so Popular
Several factors have made in-memory computing popular,
such as the increasing demand for real-time analytics for
security and advertising purposes, recent software
developments, and the speed difference between memory
and disk. We discuss some of them in more detail below:
2.1.1. Need for Real-time OLAP
Online analytical processing (OLAP) is one of the most
important reasons to adopt in-memory computing.
Combined with recent technologies, it enables an iterative
link between the real-time analysis of data to predict
business trends and the immediate execution of business
decisions.
2.1.2. Speed of Main Memory
As Table 1 shows, reading 1 MB sequentially from
memory is 80 times faster than reading it from disk and
about four times faster than reading it from an SSD.
Table 1: Access and read times for disk, SSD, and main memory [7]
Action                    Time
Main memory access        100 ns
Read 1 MB from memory     250,000 ns
Read 1 MB from SSD        1,000,000 ns
Disk seek                 10,000,000 ns
Read 1 MB from disk       20,000,000 ns
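The ratios quoted above follow directly from Table 1. A small Python check, using only the three 1 MB read times from the table (the dictionary and function names are illustrative):

```python
# 1 MB sequential read times from Table 1, in nanoseconds.
READ_1MB_NS = {"memory": 250_000, "ssd": 1_000_000, "disk": 20_000_000}


def slowdown_vs_memory(medium: str) -> float:
    """How many times longer reading 1 MB from `medium` takes than from memory."""
    return READ_1MB_NS[medium] / READ_1MB_NS["memory"]


for medium in ("ssd", "disk"):
    print(f"1 MB from {medium}: {slowdown_vs_memory(medium):.0f}x the memory read time")
```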
2.1.3. Reduced Costs and Growing Capacity of Memory
Main memory, as the primary storage location, is
becoming increasingly attractive because of its
decreasing cost per unit of capacity. Figure 2 shows how
main memory prices have developed over the past years.
2.1.4. Energy Efficiency by Employing In-Memory Technology
Energy consumption is another important factor in data
center operations. According to a benchmark discussed in
the book “In-Memory Data Management: Technology and
Applications” [2], the in-memory configuration offers the
best performance and consumes the least power among
the tested configurations.
Only the configuration with 100 parallel disks provides
throughput close to that of the main memory variant, and
it consumes more than three times as much power as the
main memory variant [2].
2.1.5. Software Development
Developments in software technologies such as the
following have also played a remarkable role in
convincing data-centered organizations to use, or consider
using, in-memory processing and analysis:
• Columnar storage
• Insert only (append-only file)
• Compression and parallelization
$$\text{max.speedup}(N) = \frac{1}{(1 - P) + \frac{P}{N}} \qquad (1)$$
Equation (1) defines Amdahl's law, where P is the
fraction of the code that can be executed in parallel and
N is the number of CPU cores, i.e., the level of
parallelism in the program. The parallelizability of
application code is another good reason to take as much
advantage as possible of the speed of in-memory
processing: the larger P is, the more additional cores can
contribute.
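Equation (1) can be evaluated directly. The following Python sketch (the function name is hypothetical) computes the maximum speedup for a given parallel fraction P and core count N:

```python
def amdahl_max_speedup(p: float, n: int) -> float:
    """Maximum speedup per Equation (1): p = parallel fraction, n = number of cores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p   # share of the work that cannot be parallelized
    parallel = p / n   # parallel share, split evenly across n cores
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    for cores in (2, 8, 64, 1024):
        print(cores, round(amdahl_max_speedup(0.9, cores), 2))
```

With P = 0.9 the bound approaches, but never exceeds, 1/(1 - P) = 10 however many cores are added, which is why the serial fraction matters as much as the core count.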
3. Comparison of two In-Memory platforms (Hazelcast 3.0 vs. Infinispan 6.0)
According to its vendor, Hazelcast is the leading provider
of operational in-memory computing, with tens of
thousands of installed clusters and over 16 million servers
starting each month. It is free and open source software
provided under the Apache 2 license. Organizations are
encouraged to download Hazelcast freely, start a proof of
concept (POC), and even go into production deployment
on open source Hazelcast [14].
Infinispan, on the other hand, is known for its near cache
processing and is an extremely scalable, highly available
key/value data store and data grid platform. It is 100%
open source and written in Java. Infinispan aims to
expose a data structure that is distributed, highly
concurrent, and designed from the ground up to make the
most of modern multi-processor and multi-core
architectures. It is often used as a distributed cache, but
also as a NoSQL key/value store or object database [15].
Figure 1: The speed, price and the capacity comparison of storage units [6].
4. Benchmark details and configuration
We compare two- and four-server Infinispan 6.0
clusters with two- and four-server Hazelcast 3.0 clusters,
using the standard cache benchmarking tool RadarGun.
To benchmark distributed caches, RadarGun uses a
master/slave architecture in which the RadarGun control
node (master) coordinates multiple cluster nodes (slaves).
Each slave runs as an independent process that handles
one of the nodes of the benchmarked cluster. The master
has the following responsibilities:
- Parse the configuration and, based on it, give work to
the slaves
- The unit of work the master gives to the slaves is called
a stage
One of the most important purposes of RadarGun is to
support benchmarking of distributed caches/data grids.
Generally speaking, a benchmark on a distributed cache
is performed as follows:
1. A number of nodes are started. A node is an instance
of a distributed cache.
2. RadarGun waits until all these nodes see each other
and form a cluster.
3. Once the cluster is formed, RadarGun will warm up
the cluster: run a set of operations, Get (read) and Put
(write) against each node in the cluster.
4. After the warm up is finished, the actual benchmark is
executed. Each node in the cluster runs the
benchmark and produces a load. Each benchmark
stresses each node and records performance data, e.g.
average write/read duration.
5. The benchmark is iterated over cluster size and
number of load-producing threads.
6. At each cluster size (2 and 4 nodes), the
basic-operation-test is run using 10, 20, and 30
load-producing threads.
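The six steps above can be sketched as a loop. This is not RadarGun's code or configuration format: `LocalCache` is a hypothetical single-process stand-in for a cluster, and the warm-up size, write ratio, and operation counts are illustrative assumptions. Only the cluster sizes, thread counts, and key-space size come from the text.

```python
import random
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

CLUSTER_SIZES = (2, 4)        # cluster sizes used in the paper
THREAD_COUNTS = (10, 20, 30)  # load-producing threads per iteration
KEY_SPACE = 15000             # random keys, as in Section 5.1


class LocalCache:
    """Hypothetical stand-in for a distributed cache: a dict guarded by one lock."""

    def __init__(self, nodes: int) -> None:
        self.nodes = nodes  # recorded only; this toy does not distribute data
        self._data: Dict[int, bytes] = {}
        self._lock = threading.Lock()

    def put(self, key: int, value: bytes) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: int) -> Optional[bytes]:
        with self._lock:
            return self._data.get(key)


def run_load(cache: LocalCache, threads: int, ops_per_thread: int,
             write_ratio: float) -> List[float]:
    """Run `threads` workers issuing random Get/Put requests; return latencies in seconds."""
    latencies: List[float] = []
    merge_lock = threading.Lock()

    def worker(seed: int) -> None:
        rng = random.Random(seed)
        local: List[float] = []
        for _ in range(ops_per_thread):
            key = rng.randrange(KEY_SPACE)
            start = time.perf_counter()
            if rng.random() < write_ratio:
                cache.put(key, b"value")
            else:
                cache.get(key)
            local.append(time.perf_counter() - start)
        with merge_lock:
            latencies.extend(local)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return latencies


def benchmark(make_cache: Callable[[int], LocalCache], ops_per_thread: int = 200,
              write_ratio: float = 0.2) -> Dict[Tuple[int, int], List[float]]:
    results: Dict[Tuple[int, int], List[float]] = {}
    for size in CLUSTER_SIZES:
        cache = make_cache(size)  # steps 1-2: start the nodes and form a cluster
        # step 3: warm-up with mixed Get/Put; its latencies are discarded
        run_load(cache, threads=5, ops_per_thread=ops_per_thread, write_ratio=0.5)
        for threads in THREAD_COUNTS:  # steps 4-6: measured iterations
            results[(size, threads)] = run_load(cache, threads, ops_per_thread, write_ratio)
    return results
```

Calling `benchmark(LocalCache)` returns one latency list per (cluster size, thread count) pair; a real harness would start actual cluster nodes in steps 1 and 2 instead of constructing a local object.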
5. Comparisons
We compare the two products in terms of the following:
• Get (read) performance
• Put (write) performance
• Resource usage (CPU and memory) and efficiency
5.1. Comparison of Get (read) and Put (write) Operations
We use 10 (1st iteration), 20 (2nd iteration), and 30
(3rd iteration) load-producing threads with clusters of 2
and 4 nodes to execute random requests (15000 key/maps)
against a default test cache.
For instance, Table 2 shows the results of comparing the
two platforms in the third iteration, including:
• Requests: total number of requests made by the test on
each platform, iterated three times over 10, 20, and 30
load-producing threads.
• Mean: average (mean) response time of each iteration.
• Std.dev: standard deviation of the response time from
the mean.
• Net/Gross throughput: number of get (read) requests
per second.
• RTM (Response Time Maximum) at 95% and 99%:
the maximum response time within the 95th and 99th
percentiles, i.e., the time within which 95% and 99% of
requests completed.
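The metrics listed above can be computed from raw latency samples roughly as follows. This sketch assumes a population standard deviation and nearest-rank percentiles; RadarGun's exact definitions (including how it separates net from gross throughput) may differ, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def summarize(latencies_us: Sequence[float], duration_s: float) -> Dict[str, float]:
    """Requests, mean, std.dev, throughput, and RTM at 95%/99% for one iteration."""
    n = len(latencies_us)
    if n == 0 or duration_s <= 0:
        raise ValueError("need at least one latency and a positive duration")
    mean = sum(latencies_us) / n
    # Population standard deviation of the response times around the mean.
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in latencies_us) / n)
    ordered = sorted(latencies_us)

    def percentile(q: int) -> float:
        # Nearest rank: smallest sample with at least q% of samples at or below it.
        rank = max(1, math.ceil(q * n / 100))
        return ordered[rank - 1]

    return {
        "requests": float(n),
        "mean": mean,
        "std_dev": std_dev,
        "throughput_rps": n / duration_s,  # requests completed per second
        "rtm_95": percentile(95),
        "rtm_99": percentile(99),
    }
```

For example, `summarize(list(range(1, 101)), duration_s=2.0)` reports 100 requests, a mean of 50.5 us, a throughput of 50 requests per second, and RTM values of 95 us and 99 us.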
Figure 2: Main memory and storage price development [8, 9, 10, 11, 12] (chart "Historical Cost of Computer Memory and Storage": price in $/MB on a logarithmic scale versus year, 1955 to 2020, for main memory, big drives, and SSD)
Table 2: Hazelcast vs. Infinispan Get (read) operation on a cluster of 2 nodes
Get (read), 3rd iteration    Hazelcast     Infinispan
Requests                     2999406       2014107
Mean                         802.27 us     4.69 us
Std.dev                      1.14 ms       160.78 us
Net throughput               49979 r/s     33560 r/s
RTM at 95.0%                 1.98 ms       5.09 us
RTM at 99.0%                 3.67 ms       21.63 us
We attribute Infinispan's huge standard deviation in
Table 2 to high contention between slaves, which we
discuss further in Section 5.2.
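To put the deviation in perspective, dividing each standard deviation by its mean (the Std.dev/Mean ratio also reported in Tables 7 and 8) gives a scale-free measure of spread. The sketch uses the Table 2 values as we read them (Hazelcast: mean 802.27 us, std.dev 1.14 ms; Infinispan: mean 4.69 us, std.dev 160.78 us); the names are illustrative.

```python
# Get (read), 3rd iteration, 2-node cluster, from Table 2; all values in microseconds.
TABLE_2 = {
    "Hazelcast": {"mean_us": 802.27, "std_dev_us": 1140.0},  # 1.14 ms
    "Infinispan": {"mean_us": 4.69, "std_dev_us": 160.78},
}


def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Std.dev/Mean: spread of the response times relative to their average."""
    return std_dev / mean


cv = {name: coefficient_of_variation(v["mean_us"], v["std_dev_us"]) for name, v in TABLE_2.items()}
for name, ratio in cv.items():
    print(f"{name}: std.dev/mean = {ratio:.2f}")
```

Hazelcast's ratio is about 1.4, while Infinispan's is about 34: Infinispan's mean read time is small, but its response times are spread very widely around that mean.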
In Figure 3, the X-axis represents the load-producing
thread count (10, 20, 30, labelled as iterations 0, 1, 2)
and the Y-axis represents gross Get (read) throughput;
Hazelcast shows an obvious advantage over Infinispan.
Figure 3: Reading operation throughput difference in a 2-node cluster
Table 3: Hazelcast vs. Infinispan Put (write) operation on a cluster of 4 nodes
Put (write), 3rd iteration   Hazelcast     Infinispan
Requests                     329915        161617
Mean                         7.99 ms       36.61 ms
Std.dev                      26.04 ms      137.13 ms
Net throughput               5474 r/s      2624 r/s
RTM at 95.0%                 30.93 ms      63.7 ms
RTM at 99.0%                 78.12 ms      910.16 ms
As depicted in Figure 4, there is a very large difference
between the write response times of Hazelcast and
Infinispan: Hazelcast is 3, 4, and 5 times faster than
Infinispan in the three iterations, respectively.
Figure 4: Response time difference of write operation in a 4-node cluster
5.2. Comparisons of CPU and Memory Usage
Raw data from the tests were used to create combined
charts comparing both platforms' CPU and memory usage;
for example, we charted the combined CPU usage of the
two platforms for a cluster with 2 nodes (slaves), and
Figure 5 shows their memory usage for a cluster with 4
nodes. From these data, the average, maximum, total, and
standard deviation of each platform's usage are
calculated as follows:
$$x = \sum_{j=1}^{2} S_j \qquad (2)$$
where S1 and S2 are the activities (resource usage) of the
first and second slave, so that x is the sum of both slaves'
usage for every request.
Average (mean) CPU/memory usage is computed by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \qquad (3)$$
Total CPU/memory usage is computed by
$$T = \sum_{i=1}^{n} x_i \qquad (4)$$
Finally, the standard deviation over the entire test, with
n the number of requests, is computed by
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \qquad (5)$$
Summarizing the test results for both the 2-node and
4-node clusters with respect to CPU and main memory
usage provides a sound basis for a final decision about
each platform's efficiency.
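Equations (2) to (5) can be applied to per-slave samples as sketched below, using Infinispan's memory readings (Mb) from Table 6. Treating each printed row of that table as one sample is our assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Sequence[Optional[float]]  # usage of slaves S1..S4 in one sample; None where the table shows "-"


def combined_usage(row: Row) -> float:
    """Equation (2): x = sum of the slaves' usage in one sample."""
    return sum(v for v in row if v is not None)


def summarize_usage(rows: Sequence[Row]) -> Dict[str, float]:
    xs: List[float] = [combined_usage(r) for r in rows]
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(xs) / n                                       # Equation (3)
    total = sum(xs)                                          # Equation (4)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # Equation (5)
    return {"mean": mean, "max": max(xs), "total": total, "std_dev": sigma}


# Infinispan memory usage (Mb) of slaves S1..S4, Table 6, one list per printed row.
INFINISPAN_TABLE_6 = [
    [292, 335, 288, 168],
    [348, 387, 344, 222],
    [398, 126, 98, 277],
    [None, None, 151, 331],
    [142, 182, None, None],
]

summary = summarize_usage(INFINISPAN_TABLE_6)
print(summary)
```

The maximum combined sample, 1301 Mb, is the same value Table 4 lists as Infinispan's maximum usage.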
Figure 5 shows the memory usage timeline of both
platforms in a combined view, covering the whole
operation, and Table 4 presents the numerical values for
the same operations.
Table 4: Comparing memory usage of the entire process in a 4-node cluster
                        Hazelcast     Infinispan
Average (mean) usage    171 Mb        381 Mb
Maximum usage           383 Mb        1301 Mb
Total usage             166586 Mb     196525 Mb
Std.dev                 90            240
Elapsed time            00:04:01      00:04:14
Figure 5: Comparing Memory usage of both platforms in a combined view (4 nodes cluster)
Tables 5 and 6 show each slave's memory usage during a
selected period of the operation (from 3:56 to 3:59) for
Hazelcast and Infinispan, respectively. This four-second
window is a small sample of the bigger picture in Figure 5,
which shows the entire resource usage of each platform.
Within every second, we can clearly see a neat,
well-ordered distribution of duties among the slaves in
Hazelcast, compared with Infinispan, where slave
activities are in full contention, with some very large
outliers as well (see Table 4).
Table 5: Hazelcast distribution of duties (memory usage) among slaves between 03:56 and 03:59
Time       S1    S2    S3    S4    Total
0:03:56    -     237   -     -     836 Mb
           229   -     -     -
           -     -     -     80
           -     -     290   -
0:03:57    -     316   -     -     828 Mb
           299   -     -     -
           -     -     -     158
           -     -     55    -
0:03:58    -     71    -     -     786 Mb
           366   -     -     -
           -     -     -     219
           -     -     130   -
0:03:59    -     135   -     -     735 Mb
           121   -     -     -
           -     -     -     281
           -     -     198   -
Table 6: Infinispan distribution of duties (memory usage) among slaves between 03:56 and 03:59
Time       S1    S2    S3    S4    Total
0:03:56    292   335   288   168   1083 Mb
0:03:57    348   387   344   222   1301 Mb
0:03:58    398   126   98    277   899 Mb
0:03:59    -     -     151   331   806 Mb
           142   182   -     -
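The contrast between the two tables can be quantified by counting how many slaves are active in each sample. This sketch encodes Tables 5 and 6 as printed (assuming each row is one sample; the names are illustrative) and checks that the per-second sums match the Total column:

```python
from typing import Dict, List, Optional

Sample = List[Optional[int]]  # memory usage (Mb) of slaves S1..S4; None where the table shows "-"

HAZELCAST_TABLE_5: Dict[str, List[Sample]] = {
    "0:03:56": [[None, 237, None, None], [229, None, None, None], [None, None, None, 80], [None, None, 290, None]],
    "0:03:57": [[None, 316, None, None], [299, None, None, None], [None, None, None, 158], [None, None, 55, None]],
    "0:03:58": [[None, 71, None, None], [366, None, None, None], [None, None, None, 219], [None, None, 130, None]],
    "0:03:59": [[None, 135, None, None], [121, None, None, None], [None, None, None, 281], [None, None, 198, None]],
}
INFINISPAN_TABLE_6: Dict[str, List[Sample]] = {
    "0:03:56": [[292, 335, 288, 168]],
    "0:03:57": [[348, 387, 344, 222]],
    "0:03:58": [[398, 126, 98, 277]],
    "0:03:59": [[None, None, 151, 331], [142, 182, None, None]],
}


def active_slaves(sample: Sample) -> int:
    """Number of slaves reporting usage in one sample."""
    return sum(value is not None for value in sample)


def per_second_total(samples: List[Sample]) -> int:
    """Sum of all slave readings within one second (the Total column)."""
    return sum(value for sample in samples for value in sample if value is not None)


def max_concurrent_slaves(table: Dict[str, List[Sample]]) -> int:
    return max(active_slaves(s) for samples in table.values() for s in samples)


for name, table in (("Hazelcast", HAZELCAST_TABLE_5), ("Infinispan", INFINISPAN_TABLE_6)):
    totals = [per_second_total(samples) for samples in table.values()]
    print(name, "max active slaves per sample:", max_concurrent_slaves(table), "totals:", totals)
```

In Hazelcast at most one slave is active per sample, whereas in Infinispan all four are active at once in most samples.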
6. Overall results and conclusion
The overall results in Tables 7 and 8 show that Hazelcast
has the advantage in most areas of the benchmark tests for
both cluster sizes (especially the 4-node cluster), with
smaller deviations and fewer outliers in response time and
resource usage. One of the main reasons for Hazelcast's
success is likely its efficient asynchronous I/O, in which
each thread owns its partitions, so there is little or no
contention between threads.
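The partition-ownership idea mentioned above can be illustrated with a toy store in which every partition is served by exactly one worker thread, so partition data is never shared between threads and needs no lock. This is only a sketch: it is not Hazelcast's implementation and does not model asynchronous I/O. The partition count of 271 (which we understand to be Hazelcast's default, but treat here as an assumption), the four worker threads, and all names are illustrative.

```python
import queue
import threading
import zlib
from typing import Any, Dict, List, Optional

PARTITION_COUNT = 271  # assumption: illustrative partition count
THREAD_COUNT = 4       # hypothetical number of partition-owning worker threads


def partition_id(key: str) -> int:
    """Map a key to a partition with a stable hash (CRC32), independent of PYTHONHASHSEED."""
    return zlib.crc32(key.encode("utf-8")) % PARTITION_COUNT


def owner_thread(pid: int) -> int:
    """Each partition is owned by exactly one worker thread."""
    return pid % THREAD_COUNT


class PartitionedStore:
    def __init__(self) -> None:
        # One dict per partition; only the owning worker ever reads or writes it.
        self._partitions: List[Dict[str, Any]] = [{} for _ in range(PARTITION_COUNT)]
        self._queues: List["queue.Queue"] = [queue.Queue() for _ in range(THREAD_COUNT)]
        self._workers = [
            threading.Thread(target=self._run, args=(q,), daemon=True) for q in self._queues
        ]
        for worker in self._workers:
            worker.start()

    def _run(self, inbox: "queue.Queue") -> None:
        while True:
            op, key, value, reply = inbox.get()
            if op == "stop":
                return
            data = self._partitions[partition_id(key)]
            if op == "put":
                data[key] = value
                reply.put(None)
            else:  # "get"
                reply.put(data.get(key))

    def _submit(self, op: str, key: str, value: Any = None) -> Any:
        reply: "queue.Queue" = queue.Queue(maxsize=1)
        self._queues[owner_thread(partition_id(key))].put((op, key, value, reply))
        return reply.get()  # the caller waits until the owning thread answers

    def put(self, key: str, value: Any) -> None:
        self._submit("put", key, value)

    def get(self, key: str) -> Optional[Any]:
        return self._submit("get", key)

    def close(self) -> None:
        for inbox in self._queues:
            inbox.put(("stop", None, None, None))
        for worker in self._workers:
            worker.join()
```

Requests from many client threads only contend on the thread-safe queues; the partition dictionaries themselves are touched by a single thread each, which is the property the text credits for Hazelcast's low contention.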
In conclusion, given the intense competition between
large and small companies in the in-memory sector,
benchmark results released by these companies may not
be as reliable or comprehensive as expected, because
many factors can affect the results, such as the testing
environment, the testing framework, and the configuration
setup (cluster size, testing time, and amount of input
data).
Table 7: Comparing overall results, 2-node cluster (average of 3 iterations)
                             Hazelcast     Infinispan
Read throughput              47599 r/s     30010 r/s
Read mean response time      765 us        499 us
Read Std.dev/Mean            1.39          89
Write throughput             11891 r/s     7469 r/s
Write mean response time     1056 us       5050 us
Write Std.dev/Mean           0.85          5.82
CPU usage                    172%          153%
Total memory usage           147126 Mb     119532 Mb
CPU Std.dev/Mean             0.15          0.27
Memory Std.dev/Mean          0.56          0.45
Table 8: Comparing overall results, 4-node cluster (average of 3 iterations)
                             Hazelcast     Infinispan
Read throughput              20869 r/s     10842 r/s
Read mean response time      2403 us       1481 us
Read Std.dev/Mean            4.83          10.71
Write throughput             5216 r/s      2678 r/s
Write mean response time     5186 us       23506 us
Write Std.dev/Mean           3.43          3.92
CPU usage                    183%          213%
Total memory usage           166586 Mb     196525 Mb
CPU Std.dev/Mean             0.26          0.52
Memory Std.dev/Mean          0.52          0.62
For example, a testing environment or configuration can
easily be set up, or manipulated, to favor a particular
feature or strength of one platform over another for
marketing purposes.
Moreover, by choosing the wrong in-memory platform, we
may compromise the energy efficiency or resource usage
(CPU and RAM) of our system in pursuit of better
performance, as in our experiment, where Infinispan used
more resources yet still delivered lower performance than
Hazelcast, especially in the 4-node cluster.
7. References
[1] Akhil Goyal, Bharti, "Study on emerging
implementations of MapReduce", Computing
Communication & Automation (ICCCA) 2015
International Conference on, pp. 16-21, 2015.
[2] Hasso Plattner and Alexander Zeier. In-Memory
Data Management: Technology and Applications.
Second edition. Springer, 2012.
[3] Xianqiang Bao, Ling Liu, Nong Xiao, Yutong Lu,
Wenqi Cao, "Persistence and Recovery for In-
Memory NoSQL Services: A Measurement
Study", Web Services (ICWS) 2016 IEEE
International Conference on, pp. 530-537, 2016.
[4] J. Gama. Knowledge Discovery from Data Streams.
Chapman & Hall/CRC, 2010.
[5] Bifet, Albert. "Mining big data in real
time." Informatica 37.1 (2013).
[6] Illustration by Ryan J Leng. URL:
http://cinf401.artifice.cc/notes/big-data.html
[7] Peter Norvig. URL: http://norvig.com/21-days.html#answers
[8] M. Phister Jr. Data Processing Technology and
Economics. Santa Monica Publishing Co., 1976.
[9] John C. McCallum. Price-Performance of Computer
Technology, Chapter 4 in The Computer Engineering
Handbook. Second edition. CRC Press, 2002.
[10] John C. McCallum. Complete List of references.
URL: http://www.jcmit.net/references.htm
[11] John C. McCallum. Memory Prices (1957-2016).
URL: http://www.jcmit.net/memoryprice.htm
[12] John C. McCallum. Disk Drive Prices (1955-2016).
URL: http://www.jcmit.net/diskprice.htm
[13] Gordon E. Moore. "Cramming more components onto
integrated circuits". Electronics, 1965-04-19. Retrieved
2011-08-22.
[14] About Hazelcast. 2016. URL:
https://hazelcast.com/company/about/
[15] About Infinispan. 2017. URL:
http://infinispan.org/about/

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Comparison of In-memory Data Platforms

Comparison of In-Memory Data Platforms
Amirmahdi Akbari, Hasan Dağ
Kadir Has University
{amirmahdi.akbar, hasan.dag}@khas.edu.tr

The unstoppable growth of data has pushed us to process and analyze big data as quickly as possible. One approach is to use RAM (Random Access Memory) as the main storage device instead of traditional disks. This study compares two well-known platforms, Hazelcast 3.0 and Infinispan 6.0, in terms of their "Read" and "Write" performance and the efficiency of their resource usage (CPU and main memory). The goal is to gain a general understanding of the most important factors and conditions to consider when choosing between in-memory platforms.

1. Introduction
Today's data growth could not have been predicted just a couple of decades ago. For example, the amount of data produced from the dawn of time up to the year 2003 can now be produced in just two days. This data comes mainly from social media, all types of transactions, logs and the Internet of Things, to name a few sources. Big data is commonly described in terms of speed, volume and versatility; some argue that veracity and value should also be included, giving the "5 V's". What matters is the ability to mine this data in real time for all types of business functions.

While processing big data in real time and to its full extent still seems almost impossible, the increasing demands of big data applications have led researchers and practitioners to turn to in-memory computing to speed up processing. For instance, the Apache Spark framework stores intermediate results in memory to deliver good performance on iterative machine learning and interactive data analysis tasks [1].
Among the most important reasons for using in-memory data platforms are analyzing huge amounts of cellular data to enhance the security of communications, providing fast fraud detection in banking and finance, and real-time remarketing and retargeting, such as Facebook's ad exchange program, FBX. Before going into the details of in-memory data platforms, let us first look at in-memory processing more generally.

2. In-Memory Processing
Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating the disk I/O bottleneck, it is now possible to support interactive data analytics [3]. Nowadays, the quantity of data created every two days is estimated at 5 exabytes, which is similar to the amount of data created from the dawn of time up until 2003. Moreover, 2007 is estimated to have been the first year in which it was not possible to store all the data we were producing. This massive amount of data opens up new and challenging discovery tasks [4].

Real-time analytics over data streams are needed to manage the data currently generated, at an ever-increasing rate, by applications such as sensor networks, measurements in network monitoring and traffic management, log records or click-streams in web browsing, manufacturing processes, call detail records, email, blogging, Twitter posts and others [5].

The idea of in-memory processing is to keep the needed data as close as possible to the CPU. A classic problem of traditional computing is that the data comes from disk: although disk capacity has increased and the price per gigabyte and terabyte has dropped, disk storage performance has not improved at anything like the same rate.

2.1. Factors that Made In-Memory So Popular
Several factors have made in-memory processing popular, such as the increasing demand for real-time analytics for security and advertising purposes, recent software developments, and the speed difference between memory and disk. Some of these are discussed in more detail below.

2.1.1. Need for Real-time OLAP
Online analytical processing (OLAP) is one of the most important reasons to adopt in-memory technology. Combined with other recent technologies, it enables iterative links between real-time analysis of data to predict business trends and the immediate execution of business decisions.

2.1.2. Speed of Main Memory
As Table 1 shows, reading 1 MB sequentially from memory is 80 times faster than reading it from disk and four times faster than reading it from an SSD.

Table 1: Access and read times for disk, SSD and main memory [7]
Action                     Time
Main memory access         100 ns
Read 1 MB from memory      250,000 ns
Read 1 MB from SSD         1,000,000 ns
Disk seek                  10,000,000 ns
Read 1 MB from disk        20,000,000 ns
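The speed ratios quoted in Section 2.1.2 follow directly from Table 1. The short Python sketch below simply recomputes them; the dictionary keys and helper names are illustrative rather than taken from the paper, and the bandwidth figures assume that each "Read 1 MB" row describes a purely sequential transfer.

```python
# Latencies from Table 1, in nanoseconds.
TABLE_1_NS = {
    "main memory access": 100,
    "read 1 MB from memory": 250_000,
    "read 1 MB from SSD": 1_000_000,
    "disk seek": 10_000_000,
    "read 1 MB from disk": 20_000_000,
}


def times_slower(slow: str, fast: str) -> float:
    """Return how many times longer the `slow` action takes than the `fast` one."""
    return TABLE_1_NS[slow] / TABLE_1_NS[fast]


def sequential_mb_per_s(action: str) -> float:
    """Bandwidth implied by a 'read 1 MB' row: 1 MB divided by its time in seconds."""
    return 1e9 / TABLE_1_NS[action]


print(times_slower("read 1 MB from disk", "read 1 MB from memory"))  # 80.0
print(times_slower("read 1 MB from SSD", "read 1 MB from memory"))   # 4.0
for medium in ("memory", "SSD", "disk"):
    # memory 4000.0, SSD 1000.0, disk 50.0 (MB/s)
    print(medium, sequential_mb_per_s(f"read 1 MB from {medium}"), "MB/s")
```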
2.1.3. Reduced Costs and Growing Capacity of Memory
Main memory is becoming increasingly attractive as the primary storage location because of its decreasing cost per unit of capacity. Figure 2 shows how main memory prices have developed over the years.

2.1.4. Energy Efficiency by Employing In-Memory Technology
Energy consumption is another important factor in data center operations. According to a benchmark discussed in the book "In-Memory Data Management: Technology and Applications", the in-memory configuration offers the best performance and consumes the least power among the tested configurations. Only the configuration with 100 parallel disks provides a throughput close to that of the main memory variant, and it consumes more than three times as much power [2].

2.1.5. Software Development
Developments in software technologies such as:
• Columnar storage
• Insert-only storage (append-only files)
• Compression and parallelization
have also had a remarkable effect in convincing data-centered organizations to use, or consider using, in-memory processing for analyzing data.

$$\text{max speedup}(N) = \frac{1}{(1 - P) + \frac{P}{N}} \qquad (1)$$

Equation (1) is Amdahl's law, where P is the fraction of the code that can be executed in parallel and N is the number of CPU cores, i.e. the level of parallelism in the program. The parallelizability of application code is another good reason to exploit the speed of in-memory processing as fully as possible.

3. Comparison of Two In-Memory Platforms (Hazelcast 3.0 vs. Infinispan 6.0)
Hazelcast describes itself as the leading provider of operational in-memory computing, with tens of thousands of installed clusters and over 16 million server starts each month. It is free and open source software provided under the Apache 2 license. Organizations are encouraged to download Hazelcast freely, start a proof of concept (POC), and even go into production on open source Hazelcast [14].
Infinispan, on the other hand, is known for its near-cache processing and is an extremely scalable, highly available key/value data store and data grid platform. It is 100% open source and written in Java. Its purpose is to expose a distributed, highly concurrent data structure designed from the ground up to make the most of modern multi-processor and multi-core architectures. It is often used as a distributed cache, but also as a NoSQL key/value store or object database [15].

Figure 1: The speed, price and capacity comparison of storage units [6].
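To make Amdahl's law (Equation (1) in Section 2.1.5) concrete, the following Python sketch evaluates the speedup bound for a few core counts. The parallel fraction p = 0.9 is a hypothetical value chosen for illustration, not a measurement from this study; the lowercase p and n correspond to P and N in Equation (1).

```python
def amdahl_max_speedup(p: float, n: int) -> float:
    """Equation (1): upper bound on speedup when a fraction p runs in parallel on n cores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is unaffected by extra cores; only p is divided by n.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the work parallelizes
    for n in (1, 2, 4, 8, 16, 64):
        print(f"N={n:>2}: {amdahl_max_speedup(p, n):.2f}x")
    # As n grows, the bound approaches 1 / (1 - p), i.e. 10x for p = 0.9.
```

The serial fraction dominates quickly: with p = 0.9, going from 16 to 64 cores raises the bound far less than going from 1 to 16 did, which is why the parallelizability of the code matters as much as the speed of memory.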
4. Benchmark Details and Configuration
This is a comparison between two- and four-server Infinispan 6.0 clusters and two- and four-server Hazelcast 3.0 clusters, prepared using the standard caching benchmarking tool RadarGun. To benchmark distributed caches, RadarGun uses a master/slave architecture in which the RadarGun control node (master) coordinates multiple cluster nodes (slaves). Each slave runs as an independent process that handles one node of the benchmarked cluster. The master has the following responsibilities:
- Parse the configuration and, based on it, assign work to the slaves.
- Hand out that work to the slaves in units called stages.

One of the main purposes of RadarGun is to support benchmarking of distributed caches and data grids. Generally speaking, a benchmark on a distributed cache is performed as follows:
1. A number of nodes are started. A node is an instance of a distributed cache.
2. RadarGun waits until all these nodes see each other and form a cluster.
3. Once the cluster is formed, RadarGun warms it up by running a set of Get (read) and Put (write) operations against each node in the cluster.
4. After the warm-up is finished, the actual benchmark is executed. Each node in the cluster runs the benchmark and produces load; the benchmark stresses each node and records performance data, e.g. the average write/read duration.
5. The benchmark is iterated over the cluster size and the number of load-producing threads.
6. At each cluster size (2 and 4), the basic-operation-test is run using 10, 20 and 30 load-producing threads.

5. Comparisons
We compare the two products in terms of the following:
• Get (read) performance
• Put (write) performance
• Resource usage (CPU and memory) and efficiency

5.1. Comparison of Get (Read) and Put (Write) Operations
We use 10 (1st iteration), 20 (2nd iteration) and 30 (3rd iteration) load-producing threads with clusters of 2 and 4 nodes to execute random requests (15000 keys) against a default test cache. For instance, Table 2 shows the results of comparing the two platforms in the third iteration, including:
• Requests: the total number of requests issued by the test, which is run identically on both platforms and iterated three times over 10, 20 and 30 load-producing threads.
• Mean: the average response time in each iteration.
• Std.dev: the standard deviation of the response time from the mean.
• Net/gross throughput: the number of requests (gets in Table 2, puts in Table 3) completed per second.
• RTM (Response Time Maximum) at 95% and 99%: the maximum response time within the 95th and 99th percentiles, i.e. 95% (or 99%) of requests completed within this time.

Figure 2: Main memory and storage price development [8, 9, 10, 11, 12]. (Log-scale plot titled "Historical Cost of Computer Memory and Storage": price in $/MB against year, 1955 to 2020, for main memory, big drives and SSD.)
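RadarGun's real configuration format and stage classes are not reproduced here. As a rough, hypothetical illustration of the procedure and metrics described in Sections 4 and 5.1, the Python sketch below runs the same schedule (a discarded warm-up, then 10, 20 and 30 load-producing threads on clusters of 2 and 4 nodes, random requests over 15000 keys) against plain in-process dictionaries standing in for cache nodes, and reports the Table 2 metrics. The partitioning rule `key % len(nodes)`, the 50/50 Get/Put mix and all function names are assumptions; RTM is computed with the nearest-rank percentile, which may differ from RadarGun's own method; and because of Python's global interpreter lock, the timings say nothing about real distributed caches.

```python
import math
import random
import statistics
import threading
import time


def summarize(durations_us, elapsed_s):
    """Table 2-style summary of per-request response times (microseconds)."""
    ordered = sorted(durations_us)

    def rtm(pct):
        # Nearest-rank percentile: smallest response time covering pct% of requests.
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "requests": len(ordered),
        "mean_us": statistics.fmean(ordered),
        "stddev_us": statistics.pstdev(ordered),
        "net_throughput_rps": len(ordered) / elapsed_s,
        "rtm95_us": rtm(95),
        "rtm99_us": rtm(99),
    }


def run_stage(nodes, threads, ops_per_thread, key_count=15_000, seed=0):
    """Let `threads` workers issue random Get/Put requests and time each one."""
    timings = {"get": [], "put": []}
    lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        local = {"get": [], "put": []}
        for _ in range(ops_per_thread):
            key = rng.randrange(key_count)
            node = nodes[key % len(nodes)]  # hypothetical partitioning rule
            op = "get" if rng.random() < 0.5 else "put"
            start = time.perf_counter()
            if op == "get":
                node.get(key)
            else:
                node[key] = worker_id
            local[op].append((time.perf_counter() - start) * 1e6)
        with lock:
            for name, values in local.items():
                timings[name].extend(values)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    began = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return timings, time.perf_counter() - began


def benchmark(cluster_sizes=(2, 4), thread_counts=(10, 20, 30), ops_per_thread=1_000):
    results = {}
    for size in cluster_sizes:
        nodes = [dict() for _ in range(size)]  # steps 1-2: stand-in "cluster"
        run_stage(nodes, threads=2, ops_per_thread=ops_per_thread)  # step 3: warm-up, discarded
        for threads in thread_counts:  # steps 4-6: measured iterations
            timings, elapsed = run_stage(nodes, threads, ops_per_thread)
            results[(size, threads)] = {
                op: summarize(values, elapsed) for op, values in timings.items()
            }
    return results


if __name__ == "__main__":
    for (size, threads), per_op in benchmark(ops_per_thread=200).items():
        get = per_op["get"]
        print(size, threads, get["requests"], f"{get['net_throughput_rps']:.0f} r/s")
```

Keeping the warm-up on the same node objects as the measured stages mirrors step 3, where the already-formed cluster is warmed before measurement rather than a separate instance.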
Table 2: Hazelcast vs. Infinispan Get (read) operation on a cluster of 2 nodes
Get (read), 3rd iteration   Hazelcast              Infinispan
Requests                    2999406                2014107
Mean / std.dev              802.27 us / 1.14 ms    4.69 us / 160.78 us
Net throughput              49979 r/s              33560 r/s
RTM at 95.0%                1.98 ms                5.09 us
RTM at 99.0%                3.67 ms                21.63 us

The huge standard deviation of Infinispan observable in Table 2 is caused by high contention between slaves, which we discuss further in the following sections.

In Figure 3, the x-axis represents the load-producing thread counts 10, 20 and 30 (labelled as iterations 0, 1 and 2) and the y-axis represents gross Get (read) throughput; Hazelcast has an obvious advantage over Infinispan.

Figure 3: Read throughput difference in a 2-node cluster

Table 3: Hazelcast vs. Infinispan Put (write) operation on a cluster of 4 nodes
Put (write), 3rd iteration  Hazelcast              Infinispan
Requests                    329915                 161617
Mean / std.dev              7.99 ms / 26.04 ms     36.61 ms / 137.13 ms
Net throughput              5474 r/s               2624 r/s
RTM at 95.0%                30.93 ms               63.7 ms
RTM at 99.0%                78.12 ms               910.16 ms

As depicted in Figure 4, there is a very large difference between the response times of Hazelcast and Infinispan: Hazelcast is 3, 4 and 5 times faster than Infinispan in the three iterations, respectively.

Figure 4: Response time difference of the write operation in a 4-node cluster

5.2. Comparison of CPU and Memory Usage
The raw data from the tests were used to create combined charts comparing both platforms' CPU and memory usage; for example, one chart combines the CPU usage of both platforms for a cluster with 2 nodes (slaves), and Figure 5 combines their memory usage for a cluster with 4 nodes. The average, maximum, total and standard deviation of both platforms' usage are calculated from these data as follows:

$$x = \sum_{k=1}^{2} S_k = S_1 + S_2 \qquad (2)$$

With S_1 denoting the activity of the first slave and S_2 that of the second, x is the sum of both slaves' activity for every request (for a 4-node cluster the sum runs over S_1 to S_4).
The average (mean) CPU/memory usage is computed by

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \qquad (3)$$

the total CPU/memory usage by

$$T = \sum_{i=1}^{n} x_i \qquad (4)$$

and, finally, the standard deviation over the entire test, with n giving the number of requests, by

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \qquad (5)$$

By summarizing the test results of both the 2-node and 4-node clusters with respect to CPU and main memory usage, one can make a sound comparison and reach a final decision about each platform's efficiency. Figure 5 shows the memory usage timeline of both platforms in a combined view, covering the whole operation, and Table 4 presents the numerical values for the same operation.

Table 4: Comparing memory usage over the entire process in a 4-node cluster
                        Hazelcast     Infinispan
Average (mean) usage    171 Mb        381 Mb
Maximum usage           383 Mb        1301 Mb
Total usage             166586 Mb     196525 Mb
Std.dev                 90            240
Elapsed time            00:04:01      00:04:14
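Equations (2) to (5) translate into two small Python helpers. This is a sketch only: the function names are hypothetical, the input readings are invented for illustration rather than taken from the measurements, and Equation (5) is implemented in its population form (dividing by n), as written above.

```python
import math
from typing import Dict, List, Sequence


def combine_slaves(samples: Sequence[Sequence[float]]) -> List[float]:
    """Equation (2): x for one sample is the sum of the readings of slaves S1..Sm."""
    return [sum(readings) for readings in samples]


def usage_summary(x: Sequence[float]) -> Dict[str, float]:
    """Average, maximum, total and standard deviation of the combined usage x."""
    n = len(x)
    total = sum(x)                                              # Equation (4)
    mean = total / n                                            # Equation (3)
    stddev = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)   # Equation (5)
    return {"mean": mean, "max": max(x), "total": total, "stddev": stddev}


# Hypothetical memory readings (Mb) from two slaves at three sampling points.
samples = [(100, 50), (120, 80), (90, 110)]
x = combine_slaves(samples)  # [150, 200, 200]
print(usage_summary(x))
```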
Figure 5: Comparing memory usage of both platforms in a combined view (4-node cluster)

Tables 5 and 6 show each slave's memory usage during a selected period of the operation (from 0:03:56 to 0:03:59) for Hazelcast and Infinispan, respectively. This four-second window is a small sample of the bigger picture in Figure 5, which shows the entire resource usage of each platform. In Hazelcast, duties are distributed among the slaves in a neat, well-ordered way every second, whereas in Infinispan the slaves' activities are in full contention, with some very large outliers as well (see Table 4).

Table 5: Hazelcast distribution of duties (memory usage) among slaves between 0:03:56 and 0:03:59
Time       S1     S2     S3     S4     Total
0:03:56    -      237    -      -      836 Mb
           229    -      -      -
           -      -      -      80
           -      -      290    -
0:03:57    -      316    -      -      828 Mb
           299    -      -      -
           -      -      -      158
           -      -      55     -
0:03:58    -      71     -      -      786 Mb
           366    -      -      -
           -      -      -      219
           -      -      130    -
0:03:59    -      135    -      -      735 Mb
           121    -      -      -
           -      -      -      281
           -      -      198    -

Table 6: Infinispan distribution of duties (memory usage) among slaves between 0:03:56 and 0:03:59
Time       S1     S2     S3     S4     Total
0:03:56    292    335    288    168    1083 Mb
0:03:57    348    387    344    222    1301 Mb
0:03:58    398    126    98     277    899 Mb
0:03:59    -      -      151    331    806 Mb
           142    182    -      -
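Applying Equation (2) to the rows of Tables 5 and 6 reproduces their Total column, and counting how many slaves report in the same sample row gives a simple proxy for the contention described in Section 5.2. That proxy is our own illustration, not a metric defined by the study. The sketch assumes the row layout of the tables as reconstructed from the extracted data; the data structures and helper names are hypothetical.

```python
from typing import Dict, List, Optional

NA = None  # a "-" cell in Tables 5 and 6
Row = List[Optional[int]]  # memory readings (Mb) of S1..S4

# Table 5 (Hazelcast): four sample rows per second, one active slave per row.
HAZELCAST: Dict[str, List[Row]] = {
    "0:03:56": [[NA, 237, NA, NA], [229, NA, NA, NA], [NA, NA, NA, 80], [NA, NA, 290, NA]],
    "0:03:57": [[NA, 316, NA, NA], [299, NA, NA, NA], [NA, NA, NA, 158], [NA, NA, 55, NA]],
    "0:03:58": [[NA, 71, NA, NA], [366, NA, NA, NA], [NA, NA, NA, 219], [NA, NA, 130, NA]],
    "0:03:59": [[NA, 135, NA, NA], [121, NA, NA, NA], [NA, NA, NA, 281], [NA, NA, 198, NA]],
}

# Table 6 (Infinispan): most seconds have all four slaves active in the same row.
INFINISPAN: Dict[str, List[Row]] = {
    "0:03:56": [[292, 335, 288, 168]],
    "0:03:57": [[348, 387, 344, 222]],
    "0:03:58": [[398, 126, 98, 277]],
    "0:03:59": [[NA, NA, 151, 331], [142, 182, NA, NA]],
}


def second_total(rows: List[Row]) -> int:
    """Equation (2) applied to one second: sum every slave reading in its rows."""
    return sum(v for row in rows for v in row if v is not None)


def max_concurrent(rows: List[Row]) -> int:
    """Largest number of slaves reporting in the same sample row."""
    return max(sum(v is not None for v in row) for row in rows)


for name, table in (("Hazelcast", HAZELCAST), ("Infinispan", INFINISPAN)):
    for second, rows in table.items():
        print(name, second, second_total(rows), "Mb, max concurrent slaves:", max_concurrent(rows))
```

The Hazelcast totals come out as 836, 828, 786 and 735 Mb with at most one slave per row, while the Infinispan totals are 1083, 1301, 899 and 806 Mb with up to four slaves reporting together.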
6. Overall Results and Conclusion
Looking at the overall results in Tables 7 and 8, Hazelcast has the advantage in most areas for both cluster sizes (especially the 4-node cluster), with smaller deviations and fewer outliers in response time and resource usage. One of the main reasons for Hazelcast's success is likely its use of efficient asynchronous I/O in which each thread owns its partitions, so there is little or no contention between threads.

In conclusion, because of the intense competition between large and small companies in the in-memory sector, test results released by vendors may not be as reliable and comprehensive as expected, since many elements can affect the results, such as testing environments, testing frameworks and configuration setups (cluster size, testing time and amount of input data). For example, a testing environment or configuration could easily be set up, or manipulated, to favor a particular feature or strength of one platform over another for marketing purposes.

Table 7: Overall results for the 2-node cluster (average of 3 iterations)
                            Hazelcast     Infinispan
Read throughput             47599 r/s     30010 r/s
Read mean response time     765 us        499 us
Read std.dev/mean           1.39          89
Write throughput            11891 r/s     7469 r/s
Write mean response time    1056 us       5050 us
Write std.dev/mean          0.85          5.82
CPU usage                   172%          153%
Total memory usage          147126 Mb     119532 Mb
CPU std.dev/mean            0.15          0.27
Memory std.dev/mean         0.56          0.45

Table 8: Overall results for the 4-node cluster (average of 3 iterations)
                            Hazelcast     Infinispan
Read throughput             20869 r/s     10842 r/s
Read mean response time     2403 us       1481 us
Read std.dev/mean           4.83          10.71
Write throughput            5216 r/s      2678 r/s
Write mean response time    5186 us       23506 us
Write std.dev/mean          3.43          3.92
CPU usage                   183%          213%
Total memory usage          166586 Mb     196525 Mb
CPU std.dev/mean            0.26          0.52
Memory std.dev/mean         0.52          0.62
Moreover, choosing the wrong product (in-memory platform) may compromise the energy efficiency or resource usage (CPU and RAM) of a system without delivering better performance: in our experiment, Infinispan uses more resources yet still performs worse than Hazelcast, especially in the 4-node cluster.

7. References
[1] Akhil Goyal, Bharti, "Study on emerging implementations of MapReduce", International Conference on Computing Communication & Automation (ICCCA), pp. 16-21, 2015.
[2] Hasso Plattner, Alexander Zeier, In-Memory Data Management, 2nd ed., Springer, 2012.
[3] Xianqiang Bao, Ling Liu, Nong Xiao, Yutong Lu, Wenqi Cao, "Persistence and Recovery for In-Memory NoSQL Services: A Measurement Study", IEEE International Conference on Web Services (ICWS), pp. 530-537, 2016.
[4] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.
[5] Albert Bifet, "Mining big data in real time", Informatica, vol. 37, no. 1, 2013.
[6] Ryan J. Leng, illustration. URL: http://cinf401.artifice.cc/notes/big-data.html
[7] Peter Norvig, answers section. URL: http://norvig.com/21-days.html
[8] M. Phister Jr., Data Processing Technology and Economics, Santa Monica Publishing Co., 1976.
[9] John C. McCallum, "Price-Performance of Computer Technology", Chapter 4 in The Computer Engineering Handbook, 2nd ed., CRC Press, 2002.
[10] John C. McCallum, Complete List of References. URL: http://www.jcmit.net/references.htm
[11] John C. McCallum, Memory Prices (1957-2016). URL: http://www.jcmit.net/memoryprice.htm
[12] John C. McCallum, Disk Drive Prices (1955-2016). URL: http://www.jcmit.net/diskprice.htm
[13] Gordon E. Moore, "Cramming more components onto integrated circuits", Electronics, 1965-04-19. Retrieved 2011-08-22.
[14] About Hazelcast, 2016. URL: https://hazelcast.com/company/about/
[15] About Infinispan, 2017. URL: http://infinispan.org/about/