From cache to in-memory data grid. 
Introduction to Hazelcast. 
By Taras Matyashovsky
Introduction
About me 
• Software engineer/TL 
• Worked for outsourcing companies and product 
companies, and tried my hand at startups and 
freelancing 
• 7+ years of production Java experience 
• Fan of Agile methodologies, CSM
What? 
• This presentation: 
• covers basics of caching and popular cache 
types 
• explains evolution from simple cache to 
distributed, and from distributed to IMDG 
• does not describe the use of NoSQL solutions 
for caching 
• is not intended as a product comparison or 
as a promotion of Hazelcast as the best 
solution
Why? 
• to expand horizons regarding modern 
distributed architectures and solutions 
• to share experience from my current 
project, where Infinispan was replaced 
with Hazelcast as the in-memory distributed 
cache solution
Agenda 
1st part: 
• Why software caches? 
• Common cache attributes 
• Cache access patterns 
• Cache types 
• Distributed cache vs. IMDG
Agenda 
2nd part: 
• Hazelcast in a nutshell 
• Hazelcast configuration 
• Live demo sessions 
• in-memory distributed cache 
• write-through cache with Postgres as storage 
• search in distributed cache 
• parallel processing using executor service and entry 
processor 
• Infinispan vs. Hazelcast 
• Best practices and personal recommendations
Caching Basics
Why Software Caching? 
• application performance: 
• many concurrent users 
• time and cost overhead of accessing 
application data stored in an RDBMS or file 
system 
• database-access bottlenecks caused by too 
many simultaneous requests
So Software Caches 
• improve response times by reducing data access 
latency 
• offload persistent storage by reducing the number 
of trips to data sources 
• avoid the cost of repeatedly creating objects 
• share objects between threads 
• only work for IO-bound applications
So Software Caches 
are essential for modern 
high-load applications
But 
• memory size 
• is limited 
• can become unacceptably huge 
• synchronization complexity 
• consistency between the cached data state 
and data source’s original data 
• durability 
• correct cache invalidation 
• scalability
Common Cache Attributes 
• maximum size, e.g. quantity of entries 
• cache algorithm used for invalidation/eviction, 
e.g.: 
• least recently used (LRU) 
• least frequently used (LFU) 
• FIFO 
• eviction percentage 
• expiration, e.g.: 
• time-to-live (TTL) 
• absolute/relative time-based expiration
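As an illustration of the maximum-size and LRU attributes above, here is a minimal sketch built only on the JDK's LinkedHashMap. The class name and the maxSize parameter are hypothetical; real cache products implement eviction differently and usually add expiration as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch: bounded size, least recently used entry evicted first. */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize; // hypothetical "maximum size" attribute, counted in entries

    LruCacheSketch(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: iteration order follows recency of access
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // size exceeds 2, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The sketch is not thread-safe; a shared cache would need external synchronization or a concurrent implementation.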
Cache Access Patterns 
• cache aside 
• read-through 
• refresh-ahead 
• write-through 
• write-behind
Cache Aside Pattern 
• the application is responsible for reading from and 
writing to the storage; the cache doesn't interact 
with the storage at all 
• the cache is “kept aside” as a faster and more 
scalable in-memory data store 
[Diagram: Client, Cache, Storage]
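The cache-aside flow can be sketched with plain JDK maps. Both maps and the storageReads counter are hypothetical stand-ins for illustration; in a real application the storage would be an RDBMS or file system.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Cache-aside sketch: the application talks to both the cache and the storage. */
class CacheAsideSketch {
    // Hypothetical stand-ins: "storage" plays the role of the database.
    final Map<String, String> storage = new ConcurrentHashMap<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();
    int storageReads = 0; // counts trips to the storage, for illustration only

    String read(String key) {
        String value = cache.get(key);
        if (value == null) {           // cache miss: the application loads from storage itself
            value = storage.get(key);
            storageReads++;
            if (value != null) {
                cache.put(key, value); // and populates the cache for later reads
            }
        }
        return value;
    }

    void write(String key, String value) {
        storage.put(key, value);       // the application writes to the storage
        cache.remove(key);             // and invalidates the now stale cache entry
    }
}
```

Invalidating on write, rather than updating the cache, is one possible design choice here; it keeps the write path simple at the cost of one extra storage read afterwards.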
Read-Through/Write-Through 
• the application treats the cache as the main data 
store and reads/writes data from/to it 
• the cache is responsible for reading this data from 
and writing it to the database 
[Diagram: Client, Cache, Storage]
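A rough sketch of read-through/write-through, assuming the storage is reachable through two hypothetical hooks (loader and writer) that the cache owns. The application only ever calls get and put on the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Read-through/write-through sketch: only the cache talks to the storage. */
class ReadWriteThroughCacheSketch<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;   // hypothetical hook: reads a value from the storage
    private final BiConsumer<K, V> writer; // hypothetical hook: writes a value to the storage

    ReadWriteThroughCacheSketch(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);     // read-through: the cache loads the missing entry
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    synchronized void put(K key, V value) {
        writer.accept(key, value);         // write-through: the storage is updated synchronously
        entries.put(key, value);
    }
}
```

Compared with cache-aside, the storage access logic lives in one place, which is what simplifies the application code.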
Write-Behind Pattern 
• modified cache entries are asynchronously 
written to the storage after a configurable delay 
[Diagram: Client, Cache, Storage]
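Write-behind can be sketched as a cache that records dirty entries and writes them to the storage later. In this simplified, hypothetical version flush() is called by hand; a real cache would trigger it from a background thread after the configured delay.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Write-behind sketch: writes hit the cache immediately and reach the storage later. */
class WriteBehindCacheSketch {
    private final Map<String, String> entries = new HashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>(); // pending writes, coalesced per key
    private final Map<String, String> storage;                       // hypothetical stand-in for a database
    int storageWrites = 0;                                           // for illustration only

    WriteBehindCacheSketch(Map<String, String> storage) {
        this.storage = storage;
    }

    synchronized void put(String key, String value) {
        entries.put(key, value);
        dirty.put(key, value); // remember the change; the storage is not touched yet
    }

    synchronized String get(String key) {
        return entries.get(key);
    }

    /** In a real cache a background thread would call this after the configured delay. */
    synchronized void flush() {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            storage.put(e.getKey(), e.getValue());
            storageWrites++;
        }
        dirty.clear();
    }
}
```

Two updates to the same key before a flush result in a single storage write, which is one source of the throughput gain; the price is that unflushed changes are lost if the node fails.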
Refresh-Ahead Pattern 
• automatically and asynchronously reload 
(refresh) any recently accessed cache entry from 
the cache loader prior to its expiration 
[Diagram: Client, Cache, Storage]
Cache Strategy Selection 
RT/WT vs. cache-aside: 
• RT/WT simplifies application code 
• cache-aside may have blocking behavior 
• cache-aside may be preferable when there are 
multiple cache updates triggered to the same 
storage from different cache servers
Cache Strategy Selection 
Write-through vs. write-behind: 
• write-behind caching may deliver considerably 
higher throughput and reduced latency 
compared to write-through caching 
• an implication of write-behind caching is that 
database updates occur outside of the cache 
transaction 
• a write-behind transaction can conflict with an 
external update
Cache Types
Cache Types 
• local cache 
• replicated cache 
• distributed cache 
• remote cache 
• near cache
Local Cache 
a cache that is local to 
(completely contained within) 
a particular cluster node
Local Cache 
Pros: 
• simplicity 
• performance 
• no serialization/deserialization overhead 
Cons: 
• not fault-tolerant 
• limited scalability
Local Cache 
Solutions: 
• EhCache 
• Google Guava 
• Infinispan local cache mode
Replicated Cache 
a cache that replicates its data 
to all cluster nodes
Get in Replicated Cache 
Each cluster node (JVM) accesses the data from its 
own memory, i.e. local read:
Put in Replicated Cache 
Pushing the new version of the data to all other 
cluster nodes:
Replicated Cache 
Pros: 
• best read performance 
• fault-tolerant 
• linear performance scalability for reads 
Cons: 
• poor write performance 
• additional network load 
• poor and limited scalability for writes 
• memory consumption
Replicated Cache 
Solutions: 
• open-source: 
• Infinispan 
• commercial: 
• Oracle Coherence 
• EhCache + Terracotta
Distributed Cache 
a cache that partitions its data 
among all cluster nodes
Get in Distributed Cache 
Access often must go over the network to another 
cluster node:
Put in Distributed Cache 
Resolving a known limitation of the replicated cache:
Put in Distributed Cache 
• the data is sent to a primary cluster node and, 
if the backup count is 1, to one backup cluster node 
• modifications to the cache are not considered 
complete until all backups have acknowledged 
receipt of the modification, i.e. slight 
performance penalty 
• such overhead guarantees that data consistency 
is maintained and no data is lost
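The put path with a backup count of 1 can be sketched as follows. The partition-to-node assignment (modulo plus "next node" as backup) is a deliberately naive assumption for this sketch and not how any particular product assigns partitions; the partition count of 271 is likewise just a constant chosen here (it happens to match Hazelcast's default).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Partitioned-cache sketch with one synchronous backup per entry. */
class PartitionedCacheSketch {
    static final int PARTITION_COUNT = 271; // assumption for this sketch
    private final List<Map<String, String>> nodes = new ArrayList<>(); // each map is one node's memory

    PartitionedCacheSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    int primaryNode(String key) {
        int partition = Math.floorMod(key.hashCode(), PARTITION_COUNT);
        return partition % nodes.size();              // naive partition-to-node assignment
    }

    int backupNode(String key) {
        return (primaryNode(key) + 1) % nodes.size(); // the backup lives on the next node
    }

    void put(String key, String value) {
        nodes.get(primaryNode(key)).put(key, value);
        nodes.get(backupNode(key)).put(key, value);   // put returns only after the backup is written
    }

    String get(String key) {
        return nodes.get(primaryNode(key)).get(key);  // reads go to the primary owner
    }

    /** Number of nodes currently holding the key. */
    int copies(String key) {
        int n = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) n++;
        }
        return n;
    }
}
```

With three nodes every entry is stored on exactly two of them, so memory use per node shrinks as nodes are added, unlike in a replicated cache.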
Failover in Distributed Cache 
Failover involves promoting backup data to be 
primary storage:
Local Storage in Distributed Cache 
Certain cluster nodes can be configured to store 
data, and others can be configured not to store 
data:
Distributed Cache 
Pros: 
• linear performance scalability for reads and 
writes 
• fault-tolerant 
Cons: 
• increased latency of reads (due to network 
round-trip and serialization/deserialization 
expenses)
Distributed Cache Summary 
Distributed in-memory key/value stores 
support a simple set of “put” and “get” 
operations and, optionally, read-through and 
write-through behavior for reading and 
writing values from and to underlying 
disk-based storage such as an RDBMS
Distributed Cache Summary 
Depending on the product, additional 
features such as: 
• ACID transactions 
• eviction policies 
• replication vs. partitioning 
• active backups 
also became available as the products 
matured
Distributed Cache 
Solutions: 
• open-source: 
• Infinispan 
• Hazelcast 
• NoSQL storages, e.g. Redis, Cassandra, 
MongoDB, etc. 
• commercial: 
• Oracle Coherence 
• Terracotta
Remote Cache 
a cache that is located remotely and 
is accessed by one or more clients
Remote Cache 
The majority of existing distributed/replicated 
cache solutions support two modes: 
• embedded mode 
• the cache instance is started within the same JVM 
as your application 
• client-server mode 
• a remote cache instance is started and clients 
connect to it using a variety of different protocols
Remote Cache 
Solutions: 
• Infinispan remote cache mode 
• Hazelcast client-server mode 
• Memcached
Near Cache 
a hybrid cache; 
it typically fronts a distributed cache or a 
remote cache with a local cache
Get in Near Cache 
When an object is fetched from a remote node, it is 
put into the local cache, so subsequent requests are 
handled by the local node from its local cache:
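A minimal sketch of this behavior, with a plain map standing in for the distributed or remote cache. The class and the remoteGets counter are hypothetical; real near caches also support invalidation and eviction, which are left out here.

```java
import java.util.HashMap;
import java.util.Map;

/** Near-cache sketch: a local map in front of a (simulated) remote cache. */
class NearCacheSketch {
    private final Map<String, String> remote;                  // stand-in for a distributed/remote cache
    private final Map<String, String> local = new HashMap<>(); // the near cache on this node
    int remoteGets = 0;                                        // counts simulated network round trips

    NearCacheSketch(Map<String, String> remote) {
        this.remote = remote;
    }

    String get(String key) {
        String value = local.get(key);
        if (value == null) {
            value = remote.get(key);   // first access goes over the "network"
            remoteGets++;
            if (value != null) {
                local.put(key, value); // later reads are served locally
            }
        }
        return value;
    }
}
```

Because nothing invalidates the local copy in this sketch, a later change in the remote cache is not visible through get, which is the reduced consistency that near caches trade for read speed.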
Near Cache 
Pros: 
• it is best used for read-only data 
Cons: 
• increases memory usage since the near cache 
items need to be stored in the memory of the 
member 
• reduces consistency
In-memory Data Grid
In-memory Data Grid (IMDG)
In-memory Data Grid 
In-memory distributed cache plus: 
• ability to support co-location of computations 
with data in a distributed context and move 
computation to data 
• distributed MPP processing based on standard 
SQL and/or Map/Reduce, which allows efficient 
computation over data stored in memory 
across the cluster
IMDC vs. IMDG 
• in-memory distributed caches were 
developed in response to a growing need 
for data high-availability 
• in-memory data grids were developed to 
respond to the growing complexities of 
data processing
IMDG in a nutshell 
Adding distributed SQL and/or MapReduce 
type processing required a complete 
re-thinking of distributed caches, as focus 
has shifted from pure data management to 
hybrid data and compute management
In-memory Data Grid Solutions
Hazelcast
Hazelcast 
The leading open-source 
in-memory data grid: 
a free alternative to proprietary solutions 
such as Oracle Coherence, 
VMware Pivotal GemFire and 
Software AG Terracotta
Hazelcast Use-Cases 
• scale your application 
• share data across cluster 
• partition your data 
• balance the load 
• send/receive messages 
• process in parallel on many JVMs, i.e. MPP
Hazelcast Features 
• dynamic clustering, backup, discovery, 
fail-over 
• distributed map, queue, set, list, lock, 
semaphore, topic, executor service, etc. 
• transaction support 
• map/reduce API 
• Java client for accessing the cluster 
remotely
Hazelcast Configuration 
• programmatic configuration 
• XML configuration 
• Spring configuration 
Nuance: 
It is very important that the configuration on all 
members in the cluster is exactly the same, 
regardless of whether you use the XML-based 
configuration or the programmatic configuration.
Sample Application
Live Demo “Configuration”
Sample Application 
Technologies: 
• Spring Boot 1.0.1 
• Hazelcast 3.2 
• Postgres 9.3 
Application: 
• RESTful web service to get/put data from/to cache 
• RESTful web service to execute tasks in the cluster 
• one instance of Hazelcast per application 
* Some samples are not optimal and were created just to demonstrate usage of the existing Hazelcast API
Global Hazelcast Configuration 
The global Hazelcast configuration is defined in a separate 
config in the common module. It contains a skeleton for 
the future Hazelcast instance as well as global 
configuration settings: 
• instance configuration skeleton 
• common properties 
• group name and password 
• TCP-based network configuration 
• join config 
• multicast and TCP/IP config 
• default distributed map configuration skeleton
Hazelcast Instance 
Each module that uses Hazelcast for distributed 
cache should have its own separate Hazelcast 
instance. 
The “Hazelcast Instance” is a factory for creating 
individual cache objects. 
Each cache has a name and potentially distinct 
configuration settings (expiration, eviction, 
replication, and more). 
Multiple instances can live within the same JVM.
Hazelcast Cluster Group 
Groups are used in order to have multiple isolated 
clusters on the same network instead of a single 
cluster. 
A JVM can host multiple Hazelcast instances (nodes). 
Each node can participate in only one group; it 
joins only its own group and does not interfere with 
others. 
To achieve this, the group name and group 
password configuration properties are used.
Hazelcast Network Config 
In our environment the multicast mechanism for 
joining the cluster is not supported, so only the TCP/IP 
cluster approach will be used. 
In this case there should be one or more well-known 
members to connect to.
Live Demo “Map Store”
Hazelcast Map Store 
• useful for reading and writing map entries from 
and to an external data source 
• one instance per map per node will be created 
• word of caution: the map store should NOT call 
distributed map operations, otherwise you 
might run into deadlocks
Hazelcast Map Store 
• map pre-population via loadAllKeys method that 
returns the set of all “hot” keys that need to be 
loaded for the partitions owned by the member 
• write-through vs. write-behind using the “write-delay-seconds” 
configuration (0 for write-through, greater than 0 for write-behind) 
• MapLoaderLifecycleSupport to be notified of 
lifecycle events, i.e. init and destroy
Live Demo “Executor Service”
Hazelcast Executor Service 
• extends the java.util.concurrent.ExecutorService, 
but is designed to be used in a distributed 
environment 
• scaling up via thread pool size 
• scaling out is automatic via addition of new 
Hazelcast instances
Hazelcast Executor Service 
• provides different ways to route tasks: 
• any member 
• specific member 
• the member hosting a specific key 
• all or subset of members 
• supports execution callback
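The routing options can be imitated with JDK executors alone. In the sketch below each "member" is just a local single-thread executor, the key owner is chosen by a naive hash, and the method names are hypothetical, chosen only to echo the routing options listed above; nothing here is distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Routing sketch: each "member" is simulated by a local single-thread executor. */
class TaskRoutingSketch {
    private final List<ExecutorService> members = new ArrayList<>();

    TaskRoutingSketch(int memberCount) {
        for (int i = 0; i < memberCount; i++) {
            members.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Index of the member that would own the key (naive hash-based assumption). */
    int ownerOf(Object key) {
        return Math.floorMod(key.hashCode(), members.size());
    }

    /** Route to a specific member. */
    <T> Future<T> submitToMember(int member, Callable<T> task) {
        return members.get(member).submit(task);
    }

    /** Route to the member hosting a specific key, so the task runs next to its data. */
    <T> Future<T> submitToKeyOwner(Object key, Callable<T> task) {
        return submitToMember(ownerOf(key), task);
    }

    /** Route to all members and collect one future per member. */
    <T> List<Future<T>> submitToAllMembers(Callable<T> task) {
        List<Future<T>> futures = new ArrayList<>();
        for (ExecutorService member : members) {
            futures.add(member.submit(task));
        }
        return futures;
    }

    void shutdown() {
        for (ExecutorService member : members) {
            member.shutdown();
        }
    }
}
```

Routing by key owner is the option that moves computation to the data: the task runs on the member that already holds the entry, so the entry does not have to travel over the network.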
Hazelcast Executor Service 
Drawbacks: 
• work-queue has no high availability: 
• each member will create local ThreadPoolExecutors 
with ordinary work-queues that do the real work but 
are not backed up by Hazelcast 
• work-queue is not partitioned: 
• it could be that one member has a lot of unprocessed 
work, and another is idle 
• no customizable load balancing
Hazelcast Features 
More useful features: 
• entry listener 
• transaction support, e.g. local, distributed 
• map reduce API out-of-the-box 
• custom serialization/deserialization mechanism 
• distributed topic 
• clients
Hazelcast Missing Features 
Missing useful features: 
• update configuration in running cluster 
• load balancing for executor service
Infinispan vs. Hazelcast
Infinispan vs. Hazelcast 
Pros 
Infinispan: 
• backed by a relatively large company (JBoss) 
for use in largely distributed environments 
• has been in active use for several years 
• well-written documentation 
• a lot of examples of different configurations 
as well as solutions to common problems 
Hazelcast: 
• easy setup 
• more performant than Infinispan 
• simple node/cluster discovery mechanism 
• relies on only 1 jar to be included on the classpath 
• brief documentation complemented by simple 
code samples
Infinispan vs. Hazelcast 
Cons 
Infinispan: 
• relies on JGroups, which has proven to be buggy, 
especially under high load 
• configuration can be overly complex 
• ~9 jars are needed in order to get Infinispan up 
and running 
• code appears very complex and hard to 
debug/trace 
Hazelcast: 
• backed by a startup based in Palo Alto and Turkey, 
which just received Series A funding of 2.5 M from 
Bain Capital Ventures 
• customization points are fairly limited 
• some exceptions can be difficult to diagnose due to 
poorly written exception messages 
• still quite buggy
Hazelcast Summary
Best Practices 
• each specific Hazelcast instance should have its 
own unique instance name 
• each specific Hazelcast instance should have its 
own unique group name and password 
• each specific Hazelcast instance should start on 
a separate port according to predefined ranges
Personal Recommendations 
• use XML configuration in production, but don’t 
use the spring:hz schema. Our Spring-based “lego 
bricks” approach to building the resulting Hazelcast 
instance is quite decent. 
• don’t use Hazelcast for local caches, as it was 
never designed for that purpose and always 
performs serialization/deserialization 
• don’t use library-specific classes; use common 
collections, e.g. ConcurrentMap, and you will be 
able to replace the underlying cache solution easily
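The last recommendation can be sketched as a service that depends only on java.util.concurrent.ConcurrentMap. The SessionRegistrySketch class is hypothetical; the point is that any ConcurrentMap implementation, including a distributed one, can be passed in without touching the service code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Service coded against ConcurrentMap so the backing cache can be swapped. */
class SessionRegistrySketch {
    private final ConcurrentMap<String, String> sessions;

    SessionRegistrySketch(ConcurrentMap<String, String> sessions) {
        this.sessions = sessions;
    }

    /** Registers the session unless one already exists; returns the owner that holds it. */
    String register(String sessionId, String owner) {
        String existing = sessions.putIfAbsent(sessionId, owner);
        return existing != null ? existing : owner;
    }

    public static void main(String[] args) {
        // A local map here; a distributed map implementing ConcurrentMap could be passed instead.
        SessionRegistrySketch registry = new SessionRegistrySketch(new ConcurrentHashMap<>());
        System.out.println(registry.register("s1", "alice")); // prints alice
        System.out.println(registry.register("s1", "bob"));   // prints alice
    }
}
```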
Hazelcast Drawbacks 
• still quite buggy 
• poor documentation for more complex 
cases 
• enterprise edition costs money, but 
includes: 
• elastic memory 
• JAAS security 
• .NET and C++ clients
Q/A?
Thank you! 
by Taras Matyashovsky

More Related Content

What's hot

Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSASrikrupa Srivatsan
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLScyllaDB
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 

What's hot (20)

kafka
kafkakafka
kafka
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSA
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 

Similar to From cache to in-memory data grid. Introduction to Hazelcast.

Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutionspmanvi
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingOutSystems
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using HazelcastTaras Matyashovsky
 
Oracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionOracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionMarkus Michalewicz
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Manik Surtani
 
Distributed caching with java JCache
Distributed caching with java JCacheDistributed caching with java JCache
Distributed caching with java JCacheKasun Gajasinghe
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 
Introducing Oxia: A Scalable Zookeeper Alternative
Introducing Oxia: A Scalable Zookeeper AlternativeIntroducing Oxia: A Scalable Zookeeper Alternative
Introducing Oxia: A Scalable Zookeeper AlternativeHostedbyConfluent
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupMayaData Inc
 
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase Create
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase CreateWebinar: Overcoming the Storage Challenges Cassandra and Couchbase Create
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase CreateStorage Switzerland
 
Selecting the right cache framework
Selecting the right cache frameworkSelecting the right cache framework
Selecting the right cache frameworkMohammed Fazuluddin
 
4. (mjk) extreme performance 2
4. (mjk) extreme performance 24. (mjk) extreme performance 2
4. (mjk) extreme performance 2Doina Draganescu
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Tony Pearson
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Gridsjlorenzocima
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 

Similar to From cache to in-memory data grid. Introduction to Hazelcast. (20)

Caching principles-solutions
Caching principles-solutionsCaching principles-solutions
Caching principles-solutions
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed caching
 
Distributed applications using Hazelcast
Distributed applications using HazelcastDistributed applications using Hazelcast
Distributed applications using Hazelcast
 
Mini-Training: To cache or not to cache
Mini-Training: To cache or not to cacheMini-Training: To cache or not to cache
Mini-Training: To cache or not to cache
 
Oracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion EditionOracle RAC Internals - The Cache Fusion Edition
Oracle RAC Internals - The Cache Fusion Edition
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
 
Distributed caching with java JCache
Distributed caching with java JCacheDistributed caching with java JCache
Distributed caching with java JCache
 
SpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud ComputingSpringPeople - Introduction to Cloud Computing
SpringPeople - Introduction to Cloud Computing
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
Introducing Oxia: A Scalable Zookeeper Alternative
Introducing Oxia: A Scalable Zookeeper AlternativeIntroducing Oxia: A Scalable Zookeeper Alternative
Introducing Oxia: A Scalable Zookeeper Alternative
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris Meetup
 
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase Create
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase CreateWebinar: Overcoming the Storage Challenges Cassandra and Couchbase Create
Webinar: Overcoming the Storage Challenges Cassandra and Couchbase Create
 
Selecting the right cache framework
Selecting the right cache frameworkSelecting the right cache framework
Selecting the right cache framework
 
4. (mjk) extreme performance 2
4. (mjk) extreme performance 24. (mjk) extreme performance 2
4. (mjk) extreme performance 2
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4Inter connect2016 yss1841-cloud-storage-options-v4
Inter connect2016 yss1841-cloud-storage-options-v4
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Grids
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 

More from Taras Matyashovsky

Distinguish Pop from Heavy Metal using Apache Spark MLlib
Distinguish Pop from Heavy Metal using Apache Spark MLlibDistinguish Pop from Heavy Metal using Apache Spark MLlib
Distinguish Pop from Heavy Metal using Apache Spark MLlibTaras Matyashovsky
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibTaras Matyashovsky
 
Morning at Lohika 2nd anniversary
Morning at Lohika 2nd anniversaryMorning at Lohika 2nd anniversary
Morning at Lohika 2nd anniversaryTaras Matyashovsky
 
Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)Influence. The Psychology of Persuasion (in IT)
Influence. The Psychology of Persuasion (in IT)Taras Matyashovsky
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkTaras Matyashovsky
 
Morning at Lohika 1st anniversary
Morning at Lohika 1st anniversaryMorning at Lohika 1st anniversary
Morning at Lohika 1st anniversaryTaras Matyashovsky
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 
New life inside monolithic application
New life inside monolithic applicationNew life inside monolithic application
New life inside monolithic applicationTaras Matyashovsky
 

From cache to in-memory data grid. Introduction to Hazelcast.

  • 1. From cache to in-memory data grid. Introduction to Hazelcast. By Taras Matyashovsky
  • 3. About me • Software engineer/TL • Worked for outsource companies, product companies and tried myself in startups/ freelancing • 7+ years production Java experience • Fan of Agile methodologies, CSM
  • 4. What? • This presentation: • covers basics of caching and popular cache types • explains evolution from simple cache to distributed, and from distributed to IMDG • does not describe usage of NoSQL solutions for caching • is not intended for product comparison or for promotion of Hazelcast as the best solution
  • 5. Why? • to expand horizons regarding modern distributed architectures and solutions • to share experience from my current project where Infinispan was replaced with Hazelcast as in-memory distributed cache solution
  • 6. Agenda 1st part: • Why software caches? • Common cache attributes • Cache access patterns • Cache types • Distributed cache vs. IMDG
  • 7. Agenda 2nd part: • Hazelcast in a nutshell • Hazelcast configuration • Live demo sessions • in-memory distributed cache • write-through cache with Postgres as storage • search in distributed cache • parallel processing using executor service and entry processor • Infinispan vs. Hazelcast • Best practices and personal recommendations
  • 9. Why Software Caching? • application performance: • many concurrent users • time and cost overhead to access application’s data stored in RDBMS or file system • database-access bottlenecks caused by too many simultaneous requests
  • 10. So Software Caches • improve response times by reducing data access latency • offload persistent storage by reducing the number of trips to data sources • avoid the cost of repeatedly creating objects • share objects between threads • only work for IO-bound applications
  • 11. So Software Caches are essential for modern high-loaded applications
  • 12. But • memory size • is limited • can become unacceptably huge • synchronization complexity • consistency between the cached data state and data source’s original data • durability • correct cache invalidation • scalability
  • 13. Common Cache Attributes • maximum size, e.g. quantity of entries • cache algorithm used for invalidation/eviction, e.g.: • least recently used (LRU) • least frequently used (LFU) • FIFO • eviction percentage • expiration, e.g.: • time-to-live (TTL) • absolute/relative time-based expiration
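The maximum-size and LRU attributes listed above can be illustrated with a minimal sketch in plain Java. The class name `LruCache` is hypothetical and is not taken from the talk or from any caching library; it relies only on the access-order mode of `java.util.LinkedHashMap`. Expiration (TTL) is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class: a size-bounded cache with LRU eviction.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; evicts the least recently used entry
        // once the configured maximum size is exceeded
        return size() > maxEntries;
    }
}
```

With a limit of two entries, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `b` is the entry that was touched least recently.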
  • 14. Cache Access Patterns • cache aside • read-through • refresh-ahead • write-through • write-behind
  • 15. Cache Aside Pattern • application is responsible for reading and writing from the storage and the cache doesn't interact with the storage at all • the cache is “kept aside” as a faster and more scalable in-memory data store Client Cache Storage
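A minimal sketch of the cache-aside pattern, assuming an in-memory `Map` stands in for the real storage. The class `CacheAsideRepository` and the `storageReads` counter are hypothetical names introduced only for illustration. The point is that the application code, not the cache, talks to the storage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of cache-aside: the application owns all storage access.
class CacheAsideRepository {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> storage; // stand-in for an RDBMS or file system
    int storageReads = 0;                      // counts trips to the storage

    CacheAsideRepository(Map<String, String> storage) {
        this.storage = storage;
    }

    String find(String key) {
        String value = cache.get(key);
        if (value == null) {
            // Cache miss: the application reads the storage itself...
            storageReads++;
            value = storage.get(key);
            if (value != null) {
                cache.put(key, value); // ...and populates the cache
            }
        }
        return value;
    }

    void save(String key, String value) {
        storage.put(key, value); // the application writes to the storage directly
        cache.remove(key);       // and invalidates the cached copy
    }
}
```

Invalidating on write rather than updating the cache is one possible choice; updating the cached value in `save` would be an equally valid variant of the same pattern.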
  • 16. Read-Through/Write-Through • the application treats cache as the main data store and reads/writes data from/to it • the cache is responsible for reading and writing this data to the database Client Cache Storage
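For contrast with cache-aside, here is a rough sketch of read-through/write-through in plain Java. The `BackingStore` interface and both classes are hypothetical names, not part of Hazelcast or any other product; the in-memory store only stands in for a database. The application calls the cache only, and the cache talks to the store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract between the cache and the underlying storage.
interface BackingStore<K, V> {
    V load(K key);
    void store(K key, V value);
}

// In-memory stand-in for a database, used only to keep the sketch self-contained.
class InMemoryStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> rows = new HashMap<>();
    public V load(K key) { return rows.get(key); }
    public void store(K key, V value) { rows.put(key, value); }
}

class ReadWriteThroughCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final BackingStore<K, V> store;

    ReadWriteThroughCache(BackingStore<K, V> store) {
        this.store = store;
    }

    V get(K key) {
        // Read-through: on a miss the cache itself loads the value from the store.
        // If the store returns null, nothing is cached and null is returned.
        return entries.computeIfAbsent(key, store::load);
    }

    void put(K key, V value) {
        // Write-through: the store is updated synchronously before the cache entry
        store.store(key, value);
        entries.put(key, value);
    }
}
```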
  • 17. Write-Behind Pattern • modified cache entries are asynchronously written to the storage after a configurable delay Client Cache Storage
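The write-behind idea can be sketched as a cache that records modified ("dirty") entries and writes them to the storage later. `WriteBehindCache` and its `flush` method are hypothetical; in a real setup a scheduler (for example a `ScheduledExecutorService`) would presumably call `flush` after the configured delay, while here it is called explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of write-behind: storage writes are deferred and batched.
class WriteBehindCache {
    private final Map<String, String> entries = new HashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>(); // pending writes
    private final Map<String, String> storage;                       // stand-in for a database

    WriteBehindCache(Map<String, String> storage) {
        this.storage = storage;
    }

    synchronized void put(String key, String value) {
        entries.put(key, value);
        dirty.put(key, value); // repeated updates of one key collapse into a single pending write
    }

    synchronized String get(String key) {
        return entries.get(key);
    }

    // Writes all pending modifications to the storage and returns how many keys were written.
    synchronized int flush() {
        int written = dirty.size();
        storage.putAll(dirty);
        dirty.clear();
        return written;
    }
}
```

Two updates to the same key before a flush result in one storage write, which is where the throughput benefit comes from. The cost is that the storage lags behind the cache until the flush runs.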
  • 18. Refresh-Ahead Pattern • automatically and asynchronously reload (refresh) any recently accessed cache entry from the cache loader prior to its expiration Client Cache Storage
  • 19. Cache Strategy Selection RT/WT vs. cache-aside: • RT/WT simplifies application code • cache-aside may have blocking behavior • cache-aside may be preferable when there are multiple cache updates triggered to the same storage from different cache servers
  • 20. Cache Strategy Selection Write-through vs. write-behind: • write-behind caching may deliver considerably higher throughput and reduced latency compared to write-through caching • implication of write-behind caching is that database updates occur outside of the cache transaction • write-behind transaction can conflict with an external update
  • 22. Cache Types • local cache • replicated cache • distributed cache • remote cache • near cache
  • 23. Local Cache a cache that is local to (completely contained within) a particular cluster node
  • 24. Local Cache Pros: • simplicity • performance • no serialization/deserialization overhead Cons: • not fault-tolerant • scalability
  • 25. Local Cache Solutions: • EhCache • Google Guava • Infinispan local cache mode
  • 26. Replicated Cache a cache that replicates its data to all cluster nodes
  • 27. Get in Replicated Cache Each cluster node (JVM) accesses the data from its own memory, i.e. local read:
  • 28. Put in Replicated Cache Pushing the new version of the data to all other cluster nodes:
  • 29. Replicated Cache Pros: • best read performance • fault-tolerant • linear performance scalability for reads Cons: • poor write performance • additional network load • poor and limited scalability for writes • memory consumption
  • 30. Replicated Cache Solutions: • open-source: • Infinispan • commercial: • Oracle Coherence • EhCache + Terracotta
  • 31. Distributed Cache a cache that partitions its data among all cluster nodes
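How a key might be mapped to a partition and then to a cluster node can be sketched as follows. This is a deliberate simplification: `PartitionRouter` is a hypothetical class, the modulo-based assignment is an assumption made for illustration, and real products typically hash the serialized key and maintain a partition table that is rebalanced as members join and leave.

```java
import java.util.List;

// Hypothetical illustration of key -> partition -> member routing.
class PartitionRouter {
    private final int partitionCount;
    private final List<String> members;

    PartitionRouter(int partitionCount, List<String> members) {
        this.partitionCount = partitionCount;
        this.members = members;
    }

    int partitionOf(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    String ownerOf(Object key) {
        // Simplified assignment: partitions are spread round-robin over the members
        return members.get(partitionOf(key) % members.size());
    }
}
```

Because the mapping depends only on the key, every node computes the same owner for a given key, so a get or put can be sent directly to the node that holds the data.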
  • 32. Get in Distributed Cache Access often must go over the network to another cluster node:
  • 33. Put in Distributed Cache Resolving known limitation of replicated cache:
  • 34. Put in Distributed Cache • the data is being sent to a primary cluster node and a backup cluster node if backup count is 1 • modifications to the cache are not considered complete until all backups have acknowledged receipt of the modification, i.e. slight performance penalty • such overhead guarantees that data consistency is maintained and no data is lost
  • 35. Failover in Distributed Cache Failover involves promoting backup data to be primary storage:
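The synchronous backup on put and the promotion of backup data on failover can be sketched for a single partition with a backup count of 1. `CacheNode` and `BackedUpPartition` are hypothetical names; network calls are replaced by direct method calls, so returning from `put` models "all backups have acknowledged".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cluster node holding primary and backup data.
class CacheNode {
    final String name;
    final Map<String, String> primary = new HashMap<>();
    final Map<String, String> backup = new HashMap<>();

    CacheNode(String name) {
        this.name = name;
    }
}

// One partition with a primary owner and (at most) one backup node.
class BackedUpPartition {
    private CacheNode owner;
    private CacheNode backupNode; // null while no backup is available

    BackedUpPartition(CacheNode owner, CacheNode backupNode) {
        this.owner = owner;
        this.backupNode = backupNode;
    }

    void put(String key, String value) {
        owner.primary.put(key, value);
        if (backupNode != null) {
            // The put completes only after the backup copy has been written
            backupNode.backup.put(key, value);
        }
    }

    String get(String key) {
        return owner.primary.get(key);
    }

    String ownerName() {
        return owner.name;
    }

    // Called when the primary owner is lost: backup data becomes primary storage.
    void failOver() {
        if (backupNode == null) {
            throw new IllegalStateException("no backup to promote");
        }
        backupNode.primary.putAll(backupNode.backup);
        backupNode.backup.clear();
        owner = backupNode;
        backupNode = null; // a new backup would be assigned once another node is available
    }
}
```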
  • 36. Local Storage in Distributed Cache Certain cluster nodes can be configured to store data, and others to be configured to not store data:
  • 37. Distributed Cache Pros: • linear performance scalability for reads and writes • fault-tolerant Cons: • increased latency of reads (due to network round-trip and serialization/deserialization expenses)
  • 38. Distributed Cache Summary Distributed in-memory key/value stores support a simple set of “put” and “get” operations and optionally read-through and write-through behavior for writing and reading values to and from underlying disk-based storage such as an RDBMS
  • 39. Distributed Cache Summary Depending on the product additional features like: • ACID transactions • eviction policies • replication vs. partitioning • active backups also became available as the products matured
  • 40. Distributed Cache Solutions: • open-source: • Infinispan • Hazelcast • NoSQL storages, e.g. Redis, Cassandra, MongoDB, etc. • commercial: • Oracle Coherence • Terracotta
  • 41. Remote Cache a cache that is located remotely and should be accessed by a client(s)
  • 42. Remote Cache The majority of existing distributed/replicated cache solutions support 2 modes: • embedded mode • when the cache instance is started within the same JVM as your application • client-server mode • when a remote cache instance is started and clients connect to it using a variety of different protocols
  • 43. Remote Cache Solutions: • Infinispan remote cache mode • Hazelcast client-server mode • Memcached
  • 44. Near Cache a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache
  • 45. Get in Near Cache When an object is fetched from remote node, it is put to local cache, so subsequent requests are handled by local node retrieving from local cache:
  • 46. Near Cache Pros: • it is best used for read only data Cons: • increases memory usage since the near cache items need to be stored in the memory of the member • reduces consistency
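A minimal near-cache sketch, assuming a plain `Map` stands in for the distributed or remote cache. `NearCache`, `remoteFetches`, and `invalidate` are hypothetical names used for illustration. It shows both the benefit (repeated reads stay local) and the consistency cost (a remote update is not seen until the local copy is invalidated).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a local cache fronting a remote/distributed cache.
class NearCache {
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> remote; // stand-in for the remote cache
    int remoteFetches = 0;                    // counts simulated network round-trips

    NearCache(Map<String, String> remote) {
        this.remote = remote;
    }

    String get(String key) {
        String value = local.get(key);
        if (value == null) {
            // Local miss: fetch from the remote cache and keep a local copy
            remoteFetches++;
            value = remote.get(key);
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    // Drops the local copy so the next read goes to the remote cache again
    void invalidate(String key) {
        local.remove(key);
    }
}
```

If the remote value changes after the first read, `get` keeps returning the old local copy until `invalidate` is called, which is why the pattern fits read-only or rarely changing data best.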
  • 49. In-memory Data Grid In-memory distributed cache plus: • ability to support co-location of computations with data in a distributed context and move computation to data • distributed MPP processing based on standard SQL and/or Map/Reduce, which makes it possible to compute effectively over data stored in-memory across the cluster
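The "move computation to data" idea can be approximated in plain Java: instead of pulling every entry to the caller, each node computes a partial result over its own data and only the partial results are combined. `DataGridSketch` is a hypothetical class; each map stands in for the data held by one cluster node, and a local thread pool stands in for the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of computing over partitioned data in parallel.
class DataGridSketch {
    // Each map stands in for the entries stored on one cluster node
    private final List<Map<String, Integer>> nodes;

    DataGridSketch(List<Map<String, Integer>> nodes) {
        this.nodes = nodes;
    }

    long sumAll() {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (Map<String, Integer> localData : nodes) {
                // "Map" step: each task only touches the data of one node
                tasks.add(() -> {
                    long partial = 0;
                    for (int value : localData.values()) {
                        partial += value;
                    }
                    return partial;
                });
            }
            // "Reduce" step: only the small partial sums travel back to the caller
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("aggregation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```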
  • 50. IMDC vs. IMDG • in-memory distributed caches were developed in response to a growing need for data high-availability • in-memory data grids were developed to respond to the growing complexities of data processing
  • 51. IMDG in a nutshell Adding distributed SQL and/or MapReduce type processing required a complete re-thinking of distributed caches, as focus has shifted from pure data management to hybrid data and compute management
  • 52. In-memory Data Grid Solutions
  • 54. Hazelcast The leading open source in-memory data grid; a free alternative to proprietary solutions, such as Oracle Coherence, VMware Pivotal GemFire and Software AG Terracotta
  • 55. Hazelcast Use-Cases • scale your application • share data across cluster • partition your data • balance the load • send/receive messages • process in parallel on many JVMs, i.e. MPP
  • 56. Hazelcast Features • dynamic clustering, backup, discovery, fail-over • distributed map, queue, set, list, lock, semaphore, topic, executor service, etc. • transaction support • map/reduce API • Java client for accessing the cluster remotely
  • 57. Hazelcast Configuration • programmatic configuration • XML configuration • Spring configuration Nuance: It is very important that the configuration on all members in the cluster is exactly the same, regardless of whether you use the XML-based configuration or the programmatic configuration.
  • 60. Sample Application Technologies: • Spring Boot 1.0.1 • Hazelcast 3.2 • Postgres 9.3 Application: • RESTful web service to get/put data from/to cache • RESTful web service to execute tasks in the cluster • one instance of Hazelcast per application * Some samples are not optimal and were created just to demonstrate usage of the existing Hazelcast API
  • 61. Global Hazelcast Configuration Defined global Hazelcast configuration in separate config in common module. It contains skeleton for future Hazelcast instance as well as global configuration settings: • instance configuration skeleton • common properties • group name and password • TCP based network configuration • join config • multicast and TCP/IP config • default distributed map configuration skeleton
  • 62. Hazelcast Instance Each module that uses Hazelcast for distributed cache should have its own separate Hazelcast instance. The “Hazelcast Instance” is a factory for creating individual cache objects. Each cache has a name and potentially distinct configuration settings (expiration, eviction, replication, and more). Multiple instances can live within the same JVM.
  • 63. Hazelcast Cluster Group Groups are used in order to have multiple isolated clusters on the same network instead of a single cluster. A JVM can host multiple Hazelcast instances (nodes). Each node can participate in only one group; it joins only its own group and does not interfere with others. To achieve this, the group name and group password configuration properties are used.
  • 64. Hazelcast Network Config In our environment the multicast mechanism for joining the cluster is not supported, so only the TCP/IP cluster approach will be used. In this case there should be one or more well-known members to connect to.
  • 65. Live Demo “Map Store”
  • 66. Hazelcast Map Store • useful for reading and writing map entries from and to an external data source • one instance per map per node will be created • word of caution: the map store should NOT call distributed map operations, otherwise you might run into deadlocks
  • 67. Hazelcast Map Store • map pre-population via loadAllKeys method that returns the set of all “hot” keys that need to be loaded for the partitions owned by the member • write through vs. write behind using “write-delay-seconds” configuration (0 or bigger) • MapLoaderLifecycleSupport to be notified of lifecycle events, i.e. init and destroy
  • 68. Live Demo “Executor Service”
  • 69. Hazelcast Executor Service • extends the java.util.concurrent.ExecutorService, but is designed to be used in a distributed environment • scaling up via thread pool size • scaling out is automatic via addition of new Hazelcast instances
  • 70. Hazelcast Executor Service • provides different ways to route tasks: • any member • specific member • the member hosting a specific key • all or subset of members • supports execution callback
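Since the Hazelcast executor service extends `java.util.concurrent.ExecutorService`, task code can be written against the standard interface. The sketch below uses only the JDK and a local thread pool; `WordCountTask` and `ExecutorSketch` are hypothetical names. The task is made `Serializable` on the assumption that it would have to be shipped to another member in a distributed setup; the routing options from the slide (specific member, key owner, all members) are not shown because they go beyond the standard interface.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

// Hypothetical task: counts whitespace-separated words in a text.
// Serializable so that it could, in principle, be sent to another JVM.
class WordCountTask implements Callable<Integer>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String text;

    WordCountTask(String text) {
        this.text = text;
    }

    @Override
    public Integer call() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

class ExecutorSketch {
    // Depends only on the standard ExecutorService interface, not on a concrete executor
    static int countWords(ExecutorService executor, String text) {
        try {
            return executor.submit(new WordCountTask(text)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task execution failed", e);
        }
    }
}
```

Locally this can be exercised with `Executors.newFixedThreadPool(2)`; the caller is responsible for shutting the executor down.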
  • 71. Hazelcast Executor Service Drawbacks: • work-queue has no high availability: • each member will create local ThreadPoolExecutors with ordinary work-queues that do the real work but not backed up by Hazelcast • work-queue is not partitioned: • it could be that one member has a lot of unprocessed work, and another is idle • no customizable load balancing
  • 72. Hazelcast Features More useful features: • entry listener • transactions support, e.g. local, distributed • map reduce API out-of-the-box • custom serialization/deserialization mechanism • distributed topic • clients
  • 73. Hazelcast Missing Features Missing useful features: • update configuration in running cluster • load balancing for executor service
  • 75. Infinispan vs. Hazelcast: Pros Infinispan: • backed by a relatively large company (JBoss) for use in largely distributed environments • has been in active use for several years • well-written documentation • a lot of examples of different configurations as well as solutions to common problems Hazelcast: • easy setup • more performant than Infinispan • simple node/cluster discovery mechanism • relies on only 1 jar to be included on classpath • brief documentation complemented with simple code samples
  • 76. Infinispan vs. Hazelcast: Cons Infinispan: • relies on JGroups, which has proven to be buggy, especially under high load • configuration can be overly complex • ~9 jars are needed in order to get Infinispan up and running • code appears very complex and hard to debug/trace Hazelcast: • backed by a startup based in Palo Alto and Turkey that has just received Series A 2.5 M funding from Bain Capital Ventures • customization points are fairly limited • some exceptions can be difficult to diagnose due to poorly written exception messages • still quite buggy
  • 78. Best Practices • each specific Hazelcast instance should have its unique instance name • each specific Hazelcast instance should have its unique group name and password • each specific Hazelcast instance should start on separate port according to predefined ranges
  • 79. Personal Recommendations • use XML configuration in production, but don’t use spring:hz schema. Our Spring based “lego bricks” approach for building resulting Hazelcast instance is quite decent. • don’t use Hazelcast for local caches as it was never designed with that purpose and always performs serialization/deserialization • don’t use library specific classes, use common collections, e.g. ConcurrentMap, and you will be able to replace underlying cache solution easily
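The last recommendation can be sketched as application code that depends only on `java.util.concurrent.ConcurrentMap`. `SessionRegistry` is a hypothetical class. The assumption is that the distributed map supplied by the cache library implements `ConcurrentMap` (as Hazelcast's `IMap` does), so it could be passed in place of the local `ConcurrentHashMap` without touching this class.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical service that knows nothing about the underlying cache product.
class SessionRegistry {
    private final ConcurrentMap<String, String> sessions;

    // Any ConcurrentMap works: a local ConcurrentHashMap in tests,
    // a distributed map from the cache library in production
    SessionRegistry(ConcurrentMap<String, String> sessions) {
        this.sessions = sessions;
    }

    // Returns true only for the first registration of a session id
    boolean register(String sessionId, String user) {
        return sessions.putIfAbsent(sessionId, user) == null;
    }

    String userOf(String sessionId) {
        return sessions.get(sessionId);
    }
}
```

Swapping the cache solution then comes down to changing the one place where the map instance is created.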
  • 80. Hazelcast Drawbacks • still quite buggy • poor documentation for more complex cases • enterprise edition costs money, but includes: • elastic memory • JAAS security • .NET and C++ clients
  • 81. Q/A?
  • 82. Thank you! by Taras Matyashovsky
  • 83. References • http://docs.oracle.com/cd/E18686_01/coh.37/e18677/cache_intro.htm • http://coherence.oracle.com/display/COH31UG/Read-Through,+Write-Through,+Refresh-Ahead+and+Write-Behind+Caching • http://blog.tekmindsolutions.com/oracle-coherence-diffrence-between-replicated-cache-vs-partitioneddistributed-cache/ • http://www.slideshare.net/MaxAlexejev/from-distributed-caches-to-inmemory-data-grids • http://www.slideshare.net/jaxlondon2012/clustering-your-application-with-hazelcast • http://www.gridgain.com/blog/fyi/cache-data-grid-database/ • http://gridgaintech.wordpress.com/2013/10/19/distributed-caching-is-dead-long-live/ • http://www.hazelcast.com/resources/the-book-of-hazelcast/ • https://labs.consol.de/java-caches/part-3-3-peer-to-peer-with-hazelcast/ • http://hazelcast.com/resources/thinking-distributed-the-hazelcast-way/ • https://github.com/tmatyashovsky/hazelcast-samples/