Apache Ignite v1.3

How to make your DBMS
1000x faster
19/12/2017 – City College

Presentation’s Overview
Part 1 – Theory
• What is the problem with RDBMS and NoSQL solutions
• What is Apache Ignite
• Main Features of Apache Ignite
• Use cases and Integrations
• Supported platforms
Part 2 – Examples from Apache Ignite’s repository

Scaling relational Databases is hard
• RDBMS mainly scale up / vertically (bigger/faster
machines)
• Limited scalability compared with Big Data requirements
• Shared-everything approach
• Same data files are available to all nodes (instances)
• Distributed locks required
• When an instance's data has not yet been persisted to the
file system, other instances must search the network for the
most recently committed values. This approach is not scalable.
Source: http://www.marklogic.com/blog/relational-databases-scale/

NoSQL databases as a solution
• Provide horizontal scalability with the use of shared
nothing architectures and partitioning.
• But the following functionalities are not easily supported
in most NoSQL platforms:
• Joins
• Set operations (union/intersect/minus)
• Transactions
• Full ANSI SQL support
• Constraints as we know from the RDBMS

What is Apache Ignite
Apache Ignite is the in-memory computing platform
that is durable, strongly consistent, and highly available
with powerful SQL, key-value and processing APIs.
• Durable Memory
• Ignite Persistence
• ACID Compliance
• Complete SQL Support
• Key-Value
• Collocated Processing
• Scalability and Durability

What can you do with Apache Ignite?
Apache Ignite is an in-memory computing platform that
enables you to dramatically accelerate and scale out your
existing data-intensive applications without ripping and
replacing your existing databases. It can reduce query
times by up to 1,000x versus disk-based systems. You can scale
out by adding new nodes to your cluster, which can handle
hundreds of terabytes of data from multiple databases.

What can you do with Apache Ignite? (cont.)
You can modernize your existing data-intensive
architecture by inserting Apache Ignite between your
existing application and database layers. Apache Ignite
integrates seamlessly with RDBMS, NoSQL and Apache®
Hadoop™ databases. It features a unified API which
supports SQL, C++, .NET, PHP, MapReduce,
Java/Scala/Groovy, and Node.js protocols for the
application layer. Your Apache Ignite cluster, applications,
and databases can run on premise, in a hybrid
environment, or on a cloud platform such as AWS® or
Microsoft Azure.

Nikita Ivanov
Founder and CTO at GridGain systems
“You can buy a 10-blade server that has a terabyte of RAM for less than
$25,000 (~year 2015). RAM does push up the initial price but because RAM’s
lower power and cooling costs, and no moving parts to break, analysts say that
the TCO (Total Cost of Ownership) for using RAM instead of rotating or solid-
state storage as primary storage breaks even in about three years. And that's
just looking at TCO, not including the delivered value from getting much faster
processing performance.”
Source: https://www.linux.com/news/gridgain-memory-data-fabric-becomes-apache-ignite

Source: https://gist.github.com/jboner/2841832
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
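
The ratios quoted in the right-hand column follow directly from the nanosecond values. A minimal sketch of that arithmetic in Java, using only numbers listed in the table (the class, constant, and method names are illustrative and not part of any library):

```java
// Ratios derived from the latency table above (all values in nanoseconds).
class LatencyRatios {
    static final long MEMORY_1MB_NS = 250_000L;     // Read 1 MB sequentially from memory
    static final long SSD_1MB_NS    = 1_000_000L;   // Read 1 MB sequentially from SSD
    static final long DISK_1MB_NS   = 20_000_000L;  // Read 1 MB sequentially from disk

    // How many times slower the first operation is than the second.
    static long ratio(long slowerNs, long fasterNs) {
        return slowerNs / fasterNs;
    }

    public static void main(String[] args) {
        System.out.println("SSD vs memory:  " + ratio(SSD_1MB_NS, MEMORY_1MB_NS) + "x");
        System.out.println("Disk vs memory: " + ratio(DISK_1MB_NS, MEMORY_1MB_NS) + "x");
        System.out.println("Disk vs SSD:    " + ratio(DISK_1MB_NS, SSD_1MB_NS) + "x");
    }
}
```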

In-Memory Database (IMDB)
Apache Ignite can be used as a distributed and
horizontally scalable in-memory database
(IMDB) that supports ACID transactions and can
be used with SQL, key-value, compute, machine
learning and other data processing APIs.
One of the distinguishing characteristics of
Ignite SQL is its support for distributed SQL
JOINs, which work in both collocated and
non-collocated fashions.
When collocated, the JOINs are executed on the local data available on each node
without having to move large data sets across the network. This collocated
approach provides the best scalability and performance in distributed clusters.
More information about the IMDB can be found here.

Quiz 1
1. How many times faster is RAM than an SSD for a
1 MB sequential read?
   a) 4
   b) 10
   c) 8
2. How many times faster is RAM than a typical HDD
for a 1 MB sequential read?
   a) 80
   b) 100
   c) 1000

Quiz 2
1. Does Apache Ignite support transactions?
   a) Yes
   b) No
2. Does Apache Ignite support distributed SQL joins?
   a) Yes
   b) No

Distributed SQL Database
Apache Ignite is an ANSI-99 compliant,
horizontally scalable and fault-tolerant
distributed SQL database.
The distribution is provided either by
partitioning the data across cluster nodes
or by full replication, depending on the
use case.
You can interact with Ignite as you would
with any other SQL storage, using standard
JDBC or ODBC connectivity. Ignite also
provides native SQL APIs for Java, .NET
and C++ developers for better
performance.
More information about distributed SQL databases can be found here.
Distributed Collocated SQL Query

In-Memory Data Grid (IMDG) – Key-Value store
The data grid has been built from
the ground up to linearly scale to
hundreds of nodes with strong
semantics for data locality and
affinity data routing to reduce
redundant data noise.
It can be viewed as a distributed partitioned hash map with every cluster
node owning a portion of the overall data. This way the more cluster nodes
we add, the more data we can cache.
More information about the IMDG can be found here.
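
The "distributed partitioned hash map" idea can be sketched in plain Java. This is a toy model under stated assumptions, not Ignite's API: each "node" is an in-process HashMap, and keys are placed by a simple modulo of the hash code, which is a stand-in for whatever affinity function a real grid uses. The class name PartitionedMapSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a partitioned key-value store: every key is owned by exactly one node.
class PartitionedMapSketch<K, V> {
    private final List<Map<K, V>> nodes = new ArrayList<>();

    PartitionedMapSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Assumption: modulo placement, chosen only to keep the sketch short.
    int ownerOf(K key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(K key, V value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    V get(K key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    // Number of entries stored on the given node.
    int entriesOn(int node) {
        return nodes.get(node).size();
    }

    public static void main(String[] args) {
        PartitionedMapSketch<Integer, String> grid = new PartitionedMapSketch<>(4);
        for (int i = 0; i < 100; i++) {
            grid.put(i, "value-" + i);
        }
        for (int n = 0; n < 4; n++) {
            System.out.println("node " + n + " holds " + grid.entriesOn(n) + " entries");
        }
    }
}
```

Because each node holds only its own share of the keys, adding nodes increases the total amount of data the grid can hold, which is the point the slide makes.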

Compute Grid
Distributed computations are performed in
parallel fashion to gain high performance,
low latency, and linear scalability.
The Ignite compute grid provides a set of
simple APIs that allow users to distribute
computations and data processing across
multiple computers in the cluster.
Distributed parallel processing is based on
the ability to take any computation and
execute it on any set of cluster nodes and
return the results back.
More information about the Compute grid can be found here.
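
The split, execute, and collect pattern described above can be illustrated without a cluster. In the following sketch a local thread pool stands in for the cluster nodes; it uses only java.util.concurrent and is an analogy, not the Ignite compute API. The class name ComputeSketch and the example phrase are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ComputeSketch {
    // Splits the phrase into words, computes each word's length as a separate
    // job on the worker pool, then sums the partial results on the caller side.
    static int countNonSpaceChars(String phrase, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : phrase.trim().split("\\s+")) {
                jobs.add(word::length);
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countNonSpaceChars("Count characters using a closure", 3));
    }
}
```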

Service Grid
Service Grid allows for deployments
of arbitrary user-defined services on
the cluster. You can implement and
deploy any service, such as custom
counters, ID generators, hierarchical
maps, etc.
• Continuous availability of deployed services regardless of topology changes or crashes.
• Automatically deploy any number of distributed service instances in the cluster.
• Automatically deploy singletons, including cluster-singleton, node-singleton, or key-affinity-singleton.
• Automatically deploy distributed services on node start-up by specifying them in the configuration.
• Undeploy any of the deployed services.
• Get information about service deployment topology within the cluster.
• Create service proxy for accessing remotely deployed distributed services.
More information about the Service grid can be found here.

Distributed Data Structures
Ignite allows most of the data structures
from the java.util.concurrent framework to be
used in a distributed fashion.
Ignite gives you the capability to take a data
structure you are familiar with and use it in a
clustered fashion.
For example, you can take
java.util.concurrent.BlockingDeque and add
something to it on one node and poll it from
another node.
Or have a distributed primary key generator
which would guarantee uniqueness on all
nodes.
More information about the Distributed Data Structures can be found here.
• Queue and Set
• Atomic Types
• CountDownLatch
• ID Generator
• Semaphore
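
One common way to build a primary key generator that stays unique across nodes is range reservation: each node reserves a block of IDs from shared state and then hands them out locally. The sketch below illustrates that idea only. It is an assumption about a possible design, not a description of Ignite's internals, and an AtomicLong stands in for the cluster-wide counter. All names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Each generator instance plays the role of one node.
class RangeIdGeneratorSketch {
    private final AtomicLong sharedCounter; // stand-in for cluster-wide state
    private final long reserveSize;         // how many IDs to reserve at once
    private long next;                      // next ID to hand out locally
    private long limit;                     // exclusive end of the reserved range

    RangeIdGeneratorSketch(AtomicLong sharedCounter, long reserveSize) {
        this.sharedCounter = sharedCounter;
        this.reserveSize = reserveSize;
    }

    // Returns an ID that no other generator sharing the same counter will return.
    synchronized long nextId() {
        if (next == limit) {
            // Local range exhausted (or never reserved): claim a fresh block.
            next = sharedCounter.getAndAdd(reserveSize);
            limit = next + reserveSize;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong();
        RangeIdGeneratorSketch nodeA = new RangeIdGeneratorSketch(shared, 10);
        RangeIdGeneratorSketch nodeB = new RangeIdGeneratorSketch(shared, 10);
        System.out.println("A: " + nodeA.nextId() + ", B: " + nodeB.nextId()
            + ", A: " + nodeA.nextId());
    }
}
```

The trade-off in this design is that IDs are unique but not globally ordered by time, since each node works through its own block.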

Quiz 3
1. Is Apache Ignite able to scale up and down by simply
adding/removing nodes from the cluster?
   a) Yes
   b) No
2. Does Apache Ignite have the concept of servers and
clients?
   a) Yes
   b) No
3. Is it possible to manage an Apache Ignite cluster
remotely?
   a) Yes
   b) No

Data Streamers
1. Client nodes inject finite or
continuous streams of data into
Ignite caches using Ignite Data
Streamers.
2. Data is automatically partitioned
between Ignite data nodes, and each
node gets an equal amount of data.
3. Streamed data can be concurrently
processed directly on the Ignite data
nodes in collocated fashion.
4. Clients can also perform concurrent
SQL queries on the streamed data.
More information about the Data Streamers can be found here.

Integration with major streaming technologies
Apache Ignite integrates with major streaming technologies and frameworks
in order to bring even more advanced streaming capabilities to Ignite-based
architectures:
1. Kafka Streamer
2. Camel Streamer
3. JMS Streamer
4. MQTT Streamer
5. Storm Streamer
6. Flink Streamer
7. Twitter Streamer
8. Flume Streamer
9. ZeroMQ Streamer
10. RocketMQ Streamer
More information about Integrating Ignite with Data
Streamers can be found here.

Messaging & Events
Exchange custom messages between nodes across the cluster.
Ignite distributed messaging allows for topic-based cluster-wide
communication between all nodes.
Messages with a specified message topic can be distributed to all nodes,
or to a subgroup of nodes, that have subscribed to that topic.
Ignite messaging is based on the publish-subscribe paradigm, where publishers
and subscribers are connected together by a common topic.
When one of the nodes sends a message A for topic T, it is published on all
nodes that have subscribed to T.
More information about Messaging & Events can be found here.
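
The topic-based publish-subscribe flow described above can be modelled in a few lines of plain Java. This is an in-process sketch, not the Ignite messaging API: listeners stand in for cluster nodes, and the class name TopicBusSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-process model of topic-based publish-subscribe.
class TopicBusSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Registers a listener (a stand-in for a node) for the given topic.
    void subscribe(String topic, Consumer<String> listener) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    // Delivers the message to every listener of the topic and returns the delivery count.
    int publish(String topic, String message) {
        List<Consumer<String>> listeners =
            subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> listener : listeners) {
            listener.accept(message);
        }
        return listeners.size();
    }

    public static void main(String[] args) {
        TopicBusSketch bus = new TopicBusSketch();
        bus.subscribe("T", msg -> System.out.println("node 1 received " + msg));
        bus.subscribe("T", msg -> System.out.println("node 2 received " + msg));
        bus.subscribe("U", msg -> System.out.println("node 3 received " + msg));
        int delivered = bus.publish("T", "A");
        System.out.println("delivered to " + delivered + " subscribers");
    }
}
```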

Sliding Windows
More information about Sliding Windows can be found here.
Sliding windows are configured as Ignite cache eviction policies,
and can be:
• Time-based sliding windows
• FIFO sliding windows
• LRU sliding windows
• Querying sliding windows
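
The difference between FIFO and LRU windows comes down to which entry is evicted when the window is full. A minimal sketch using the JDK's LinkedHashMap, which supports both orderings, is shown below. It illustrates the eviction policies only and is not Ignite's eviction configuration; WindowSketch is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded map that evicts one entry whenever it grows past maxSize.
// accessOrder = false evicts the oldest insertion (FIFO);
// accessOrder = true evicts the least recently used entry (LRU).
class WindowSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    WindowSketch(int maxSize, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        WindowSketch<String, Integer> fifo = new WindowSketch<>(3, false);
        WindowSketch<String, Integer> lru = new WindowSketch<>(3, true);
        for (WindowSketch<String, Integer> window : java.util.Arrays.asList(fifo, lru)) {
            window.put("a", 1);
            window.put("b", 2);
            window.put("c", 3);
            window.get("a");    // touching "a" matters only for the LRU window
            window.put("d", 4); // exceeds the window size and triggers one eviction
        }
        System.out.println("FIFO keeps " + fifo.keySet()); // "a" was inserted first, so it is evicted
        System.out.println("LRU keeps  " + lru.keySet());  // "b" was used least recently, so it is evicted
    }
}
```

A time-based window would follow the same pattern with an expiry timestamp per entry instead of a size limit.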

Web Session clustering
More information about Web Session Clustering can be found here.
Ignite In-Memory Data Fabric is capable of
caching web sessions of all Java Servlet
containers that follow the Java Servlet 3.0
Specification, including Apache Tomcat,
Eclipse Jetty, Oracle WebLogic, and others.
• No need for sticky sessions provided by
the Load Balancer.

Hibernate L2 Cache
More information about Hibernate L2 cache can be found here.
Ignite In-Memory Data Fabric can be used as
Hibernate Second-Level cache (or L2 cache),
which can significantly speed up the
persistence layer of your application.
Hibernate is a well-known and widely used
framework for Object-Relational Mapping
(ORM). While interacting closely with an SQL
database, it performs caching of retrieved
data to minimize expensive database requests.

Spring Caching
More information about Spring Cache can be found here.
Ignite is shipped with SpringCacheManager - an implementation of Spring
Cache Abstraction. It provides an annotation-based way to enable caching
for Java methods so that the result of a method execution is stored in the
Ignite cache. Later, if the same method is called with the same set of
parameter values, the result will be retrieved from the cache instead of
actually executing the method.
Example:
private JdbcTemplate jdbc;

@Cacheable("averageSalary")
public long averageSalary(int organizationId) {
    String sql = "SELECT AVG(e.salary) "
        + "FROM Employee e "
        + "WHERE e.organizationId = ?";
    return jdbc.queryForObject(sql, Long.class, organizationId);
}

Spring Data
More information about Spring Data can be found here.
Spring Data Framework provides a unified and
widely used API that allows abstracting an
underlying data storage from the application
layer.
Spring Data helps you avoid lock-in to a specific
database vendor, making it easy to switch from
one database to another with minimal effort.
Apache Ignite implements Spring Data
CrudRepository interface that not only supports
basic CRUD operations but also provides access
to the Apache Ignite SQL Grid via the unified
Spring Data API.
@RepositoryConfig(cacheName = "PersonCache")
public interface PersonRepository extends IgniteRepository<Person, Long> {
    /**
     * Gets all the persons with the given name.
     * @param name Person name.
     * @return A list of Persons with the given first name.
     */
    public List<Person> findByFirstName(String name);

    /**
     * Returns top Person with the specified surname.
     * @param name Person surname.
     * @return Person that satisfies the query.
     */
    public Cache.Entry<Long, Person> findTopByLastNameLike(String name);
}

Apache Spark
More information about Ignite for Spark can be found here.
Apache Ignite provides an implementation of the
Spark RDD (Resilient Distributed Dataset)
abstraction, which makes it easy to share
state in memory across Spark jobs. The main
difference between a native Spark RDD and
IgniteRDD is that IgniteRDD provides a
shared in-memory view of data across
different Spark jobs, workers, or
applications, while a native Spark RDD cannot
be seen by other Spark jobs or applications.

Other integrations
More information about integrations can be found here.
Apache Ignite integrates with:
• Hadoop
• Apache Cassandra
• PHP PDO – Data Objects
• MyBatis L2 Cache
• OSGi

Ignite Native Persistence
Ignite native persistence is a distributed,
ACID, and SQL-compliant disk store that
transparently integrates with Ignite's durable
memory. Ignite persistence is optional and can
be turned on and off. When turned off, Ignite
becomes a pure in-memory store.
With the native persistence enabled, Ignite always stores a superset of data
on disk, and as much as possible in RAM. For example, if there are 100 entries
and RAM has the capacity to store only 20, then all 100 will be stored on disk
and only 20 will be cached in RAM for better performance.
More information about the Ignite Native Persistence can be found here.

3rd Party Persistence
The JCache specification comes with APIs for
javax.cache.integration.CacheLoader and
javax.cache.integration.CacheWriter, which are used for read-through
and write-through from and to an underlying persistent storage
respectively (e.g. an RDBMS like Oracle or MySQL, or a NoSQL
database like MongoDB or Couchbase).
It supports:
• Read/Write Through
• Write-Behind
More information about the 3rd Party Persistence can be found here.
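
The read-through and write-through behaviour listed above can be sketched in plain Java. In this toy model the "database" is just a Map, and the class ReadWriteThroughSketch is hypothetical. It does not use the JCache or Ignite APIs and only shows the order of operations.

```java
import java.util.HashMap;
import java.util.Map;

// Cache sitting in front of a slower store.
class ReadWriteThroughSketch {
    private final Map<Long, String> database;                // stand-in for the RDBMS/NoSQL store
    private final Map<Long, String> cache = new HashMap<>();
    private int databaseReads = 0;                            // counts how often the store is hit

    ReadWriteThroughSketch(Map<Long, String> database) {
        this.database = database;
    }

    // Read-through: on a cache miss, load the value from the store and keep it.
    String get(Long key) {
        String value = cache.get(key);
        if (value == null) {
            databaseReads++;
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write-through: update the store synchronously, then the cache.
    void put(Long key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "Alice");
        ReadWriteThroughSketch cache = new ReadWriteThroughSketch(db);
        cache.get(1L); // miss: loaded from the store
        cache.get(1L); // hit: served from memory
        cache.put(2L, "Bob");
        System.out.println("store reads: " + cache.databaseReads() + ", store has Bob: "
            + db.containsKey(2L));
    }
}
```

Write-behind differs from write-through in that the store update is queued and flushed asynchronously, typically in batches, instead of happening inside put.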

Supported platforms & protocols
• Java
• .NET
• C++
• REST API
• Memcached
• Redis
• PHP
More information about the Platforms & Protocols can be found here.
Apache Ignite has a rich set of APIs that
are covered throughout the
documentation.
The APIs are implemented in the form of
native libraries that support major
languages such as Java, .NET and C++, as
well as a variety of protocols like REST,
Memcached, and Redis.

Automatic RDBMS Integration
More information about the Automatic RDBMS Integration can be found here.

SqlLine with version 2.3.0
More information about the sqlline tool can be found here.

Typical deployment for Apache Ignite
More information can be found here.
1. One or more applications connect to the Apache Ignite cluster in
order to manipulate the data in memory.
2. The application never communicates directly with the database.
3. Apache Ignite is responsible for synchronising the data with the database.

Legacy systems?
1. Existing legacy systems updating a database.
2. New systems that rely on Apache Ignite with 3rd Party
Persistence enabled.
3. How do we guarantee that stale data won’t reside on the Ignite
cluster for a long time and will be updated as soon as the
database receives updates from the legacy application?

Legacy systems. Possible solution 1.
Connect the legacy system to the Ignite cluster directly.
Cons:
1. Development is required in order
to make the transition from the
existing DB to Apache Ignite.
2. Complex PL/SQL stored
procedures need rewriting.
3. Many legacy applications.
Pros:
1. Simple solution.
2. No added costs.

Legacy systems. Possible solution 2 (Push).
Custom logic on the third party database that would propagate the
committed changes back to the Apache Ignite cluster.
Cons:
1. Development cost.
Pros:
1. The data are replicated on time.

Legacy systems. Possible solution 3 (GridGain
and Oracle GoldenGate Integrator).
Use a GridGain cluster and Oracle GoldenGate Integrator.
Cons:
1. License costs ($).
Pros:
1. No need to develop complex code.

Examples from Apache Ignite’s GitHub repo
1. Start up a cluster
2. Run the JdbcExample/modified and show the console online
3. CacheTransactionExample
4. CacheQueryExample
5. CacheDataStreamerExample
6. CacheContinuousQueryExample (Show partitions from the web console)
7. CacheAffinityExample (Java 8)
8. ComputeClosureExample (Java 8)
9. IgniteAtomicSequenceExample
10. MessagingExample
11. PersistenceStoreExample

Similar products/solutions
Hazelcast
Oracle Coherence
Pivotal GemFire
Terracotta
GigaSpaces
Redis
Detailed comparisons between GridGain and the
previous products can be found here.

Thank you!
Useful Resources
• https://github.com/apache/ignite
• https://www.gridgain.com/resources/documentation
• https://github.com/srecon/ignite-book-code-samples
