Machine learning is a method of data analysis that automates the building of analytical models. By using algorithms that iteratively learn from data, computers can find hidden insights without being explicitly programmed. These insights bring tremendous benefits to many different domains. For business users in particular, they help organizations improve customer experience, become more competitive, and respond much faster to opportunities or threats. The availability of very powerful in-memory computing platforms, such as the open-source Apache Ignite (https://ignite.apache.org/), means that more organizations can benefit from machine learning today.
In this presentation, Denis will look at some of the main components of Apache Ignite, such as a distributed database, distributed computations, and machine learning toolkit. Through examples, attendees will learn how Apache Ignite can be used for data analysis.
The Apache Ignite Platform
Apache Ignite is a memory-centric data platform that is used to build fast, scalable & resilient solutions.
At the heart of the Apache Ignite platform lies a distributed, memory-centric data storage layer with ACID semantics and powerful processing APIs, including SQL, compute, key-value, and transactions. This memory-centric approach enables Apache Ignite to leverage memory for high throughput and low latency while using local disk or SSD to provide durability and fast recovery.
The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not merely as a caching layer, as in most databases. For example, Apache Ignite can function in a pure in-memory mode, in which case it can be treated as an In-Memory Database (IMDB) and In-Memory Data Grid (IMDG) in one.
On the other hand, when persistence is turned on, Ignite begins to function as a memory-centric system where most of the processing happens in memory, but the data and indexes get persisted to disk. The main difference here from the traditional disk-centric RDBMS or NoSQL system is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs.
The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, and comprehensive security and auditing, alongside advanced monitoring & management.
The Apache Ignite platform caters for a range of use cases, including core banking services, real-time product pricing, reconciliation and risk-calculation engines, analytics, and machine learning.
Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk within distributed clusters and provides extensive APIs. Ignite Data Grid can be viewed as a distributed partitioned hash map with every cluster node owning a portion of the overall data. This way the more cluster nodes we add, the more data we can store.
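The partitioned-hash-map idea can be sketched in a few lines. This is a deliberately simplified, self-contained illustration — the node names, partition count, and hashing scheme below are invented for the example and do not reflect Ignite's actual affinity function or internals.

```python
# Toy sketch of a partitioned key-value store: each cluster node owns a
# subset of partitions, and a key's partition is derived from its hash.
# All names and sizes here are illustrative, not Ignite internals.
import hashlib

NUM_PARTITIONS = 8

def partition_of(key):
    # Stable hash -> partition id (Ignite uses its own affinity function).
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

class ToyDataGrid:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Round-robin assignment of partitions to nodes.
        self.owner = {p: self.nodes[p % len(self.nodes)]
                      for p in range(NUM_PARTITIONS)}
        self.storage = {n: {} for n in self.nodes}

    def put(self, key, value):
        # Route the entry to the node owning the key's partition.
        self.storage[self.owner[partition_of(key)]][key] = value

    def get(self, key):
        return self.storage[self.owner[partition_of(key)]].get(key)

grid = ToyDataGrid(["node-1", "node-2", "node-3"])
grid.put("alice", 30)
grid.put("bob", 25)
print(grid.get("alice"))  # -> 30
```

Because ownership is derived from the key's hash, adding nodes simply spreads the partitions (and thus the data) across more machines — the "more nodes, more capacity" property described above.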
The Apache Ignite memory-centric platform is based on a Durable Memory architecture that allows storing and processing data and indexes both in memory and on disk when the Ignite Persistent Store feature is enabled. This memory architecture helps achieve in-memory performance with the durability of disk, using all the available resources of the cluster.

Ignite's Durable Memory is built and operates much like the virtual memory of operating systems such as Linux. One significant difference between the two architectures, however, is that Durable Memory always keeps the whole data set and indexes on disk when the Ignite Persistent Store is used, while virtual memory uses the disk for swapping purposes only.
In-Memory
• Off-Heap memory
• Removes noticeable GC pauses
• Automatic Defragmentation
• Predictable memory consumption
• Boosts SQL performance
On Disk
• Optional Persistence
• Support of flash, SSD, Intel 3D XPoint
• Stores superset of data
• Fully Transactional
◦ Write-Ahead-Log (WAL)
• Instantaneous Cluster Restarts
Ignite Native Persistence is a distributed, ACID- and SQL-compliant disk store that transparently integrates with Ignite's Durable Memory as an optional disk layer, storing data and indexes on SSD, flash, 3D XPoint, and other types of non-volatile storage.
With Ignite Persistence enabled, you no longer need to keep all the data and indexes in memory, or warm them up after a node or cluster restart, because Durable Memory is tightly coupled with persistence and treats it as a secondary memory tier. This means that if a subset of the data or an index is missing in RAM, Durable Memory will read it from disk.
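The read-through behavior of a memory tier backed by a durable tier can be illustrated with a small sketch. This is a toy model only — the LRU policy and capacities are assumptions for the example, not how Ignite's page memory is actually implemented.

```python
# Toy two-tier "durable memory": a bounded in-memory cache backed by a
# durable tier that always holds the full data set. A miss in RAM falls
# through to the durable tier -- no warm-up needed after a restart.
from collections import OrderedDict

class TwoTierStore:
    def __init__(self, ram_capacity):
        self.ram = OrderedDict()          # hot tier (bounded, LRU-evicted)
        self.disk = {}                    # durable tier (full superset)
        self.ram_capacity = ram_capacity

    def put(self, key, value):
        self.disk[key] = value            # durable tier keeps everything
        self._cache(key, value)

    def get(self, key):
        if key in self.ram:               # in-memory hit
            self.ram.move_to_end(key)
            return self.ram[key]
        value = self.disk[key]            # miss: fall through to disk tier
        self._cache(key, value)
        return value

    def _cache(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)  # evict least recently used

store = TwoTierStore(ram_capacity=2)
for i in range(5):
    store.put(i, i * 10)
print(store.get(0))  # evicted from RAM, served from the durable tier -> 0
```

Note how the caller never distinguishes between the two tiers: the store transparently promotes cold entries back into RAM on access, which is the essence of treating disk as a secondary memory tier rather than a separate database.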
Apache Ignite incorporates distributed SQL database capabilities as part of its platform. The database is horizontally scalable, fault tolerant, and ANSI SQL-99 compliant. It supports all DML commands, including SELECT, UPDATE, INSERT, MERGE, and DELETE queries, as well as the subset of DDL commands relevant for distributed databases.
Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows executing distributed SQL operations across different memory layers achieving in-memory performance with durability of disk.
You can interact with Apache Ignite using SQL via natively developed APIs for Java, .NET, and C++, or via the Ignite JDBC or ODBC drivers. This provides true cross-platform connectivity from languages such as PHP, Ruby, and more.
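To keep this snippet self-contained, the ANSI-style DML statements of the kind described above are shown here against Python's built-in sqlite3 module as a stand-in engine — the table, columns, and values are hypothetical, and in practice you would issue the same statements through Ignite's JDBC/ODBC drivers or native APIs instead.

```python
# Stand-in illustration of ANSI-style DML (SELECT/INSERT/UPDATE/DELETE)
# using the in-process sqlite3 module. The schema is made up for the
# example; against Ignite these statements would go over JDBC/ODBC.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, "
            "name TEXT, population INTEGER)")
cur.execute("INSERT INTO city (id, name, population) "
            "VALUES (1, 'New York', 8000000)")
cur.execute("UPDATE city SET population = 8400000 WHERE name = 'New York'")
cur.execute("SELECT population FROM city WHERE id = 1")
population = cur.fetchone()[0]
print(population)  # -> 8400000
cur.execute("DELETE FROM city WHERE id = 1")
remaining = cur.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```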
Ignite In-Memory Compute Grid allows executing distributed computations in a parallel fashion to gain high performance, low latency, and linear scalability. The compute grid provides a set of simple APIs that let users distribute computations and data processing across multiple nodes in the cluster.
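The fork/join pattern behind such a compute grid can be sketched locally: split a job into per-chunk tasks, run them in parallel, and reduce the partial results. The word-counting job and worker count below are invented for the example; Ignite distributes the tasks across cluster nodes rather than local threads.

```python
# Toy fork/join computation: partition the work, "fork" tasks to parallel
# workers, then "join" (reduce) the partial results -- the same pattern a
# compute grid applies across cluster nodes. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    # Each worker processes only its own partition of the input.
    return sum(len(line.split()) for line in chunk)

lines = ["the quick brown fox", "jumps over", "the lazy dog"] * 100
chunks = [lines[i::4] for i in range(4)]            # partition the work

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(count_words, chunks))  # fork to workers

total = sum(partials)                               # join / reduce step
print(total)  # -> 900
```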
Disk-centric systems, such as RDBMS or NoSQL databases, generally utilize the classic client-server approach: the data is brought from the server to the client side, where it gets processed and is then usually discarded. This approach does not scale well, as moving data over the network is the most expensive operation in a distributed system.
A much more scalable approach is collocated processing that reverses the flow by bringing the computations to the servers where the data actually resides. This approach allows you to execute advanced logic or distributed SQL with JOINs exactly where the data is stored avoiding expensive serialization and network trips.
https://ignite.apache.org/collocatedprocessing.html
Collocation of computations with data minimizes data serialization over the network and can significantly improve the performance and scalability of your application. Whenever possible, you should make a best effort to collocate your computations with the cluster nodes that cache the data to be processed.
Let's assume that a blizzard is approaching New York City. As a telecommunications company, you have to warn everyone by sending a text message with precise instructions on how to behave during such weather conditions. There are around 8 million New Yorkers in your database who have to receive the message.
With the client-server approach, the company has to connect to the database and move all 8 million (!) records to a client application that texts everyone. This is highly inefficient and wastes the network and computational resources of the company's IT infrastructure.
However, if the company initially collocates each city it covers with the people who live there, it can instead send a single computation (!) to the cluster node that stores information about all New Yorkers and send the text messages from there. This approach avoids moving 8 million records over the network and puts the cluster's resources to work on the computation. That's collocated processing in action!
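The quantitative gap in the blizzard scenario can be made concrete with a back-of-the-envelope sketch. The record size and closure size below are assumptions chosen for illustration, not measured figures.

```python
# Toy contrast between the client-server flow (ship every record to the
# client) and collocated processing (ship one small closure to the node
# owning the data). Record and closure sizes are assumed for the example.
records_per_city = {"new_york": 8_000_000, "boston": 700_000}

def client_server_bytes(city, bytes_per_record=100):
    # Every record crosses the network before any processing happens.
    return records_per_city[city] * bytes_per_record

def collocated_bytes(closure_size=1_000):
    # Only the computation itself crosses the network.
    return closure_size

print(client_server_bytes("new_york"))  # -> 800000000 bytes moved
print(collocated_bytes())               # -> 1000 bytes moved
```

Under these assumptions, collocated processing moves roughly six orders of magnitude less data over the network for the New York case — which is why it scales where the client-server flow does not.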
https://github.com/techbysample/gagrid
GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributed GA library built on top of the mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real-world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing, and more.
Glossary
Chromosome is a sequence of Genes. A Chromosome represents a potential solution.
Crossover is the process in which the genes within chromosomes are combined to derive new chromosomes.
Fitness Score is a numerical score that measures the value of a particular Chromosome (i.e., solution) relative to the other Chromosomes in the population.
Gene is a discrete building block that makes up a Chromosome.
Genetic Algorithm (GA) is a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution.
Mutation is the process where genes within a chromosome are randomly updated to produce new characteristics.
Population is the collection of potential solutions or Chromosomes.
Selection is the process of choosing candidate solutions (Chromosomes) for the next generation.
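The glossary terms above fit together in a short, self-contained sketch: a population of bit-string chromosomes evolves toward the all-ones solution (the classic "OneMax" toy problem). All parameters — population size, mutation rate, generation count — are illustrative choices, and this single-process loop stands in for what GA Grid runs in distributed fashion on Ignite.

```python
# Minimal GA sketch using the glossary's terms: chromosomes are bit
# strings, fitness = number of 1-bits, with selection, crossover, and
# mutation each generation. Parameters are illustrative only.
import random

GENES = 12           # genes per chromosome
POP = 20             # population size
GENERATIONS = 60

def fitness(chrom):
    return sum(chrom)                       # fitness score: count of 1s

def crossover(a, b):
    cut = random.randrange(1, GENES)        # single-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    # Randomly flip genes to introduce new characteristics.
    return [g ^ 1 if random.random() < rate else g for g in chrom]

random.seed(42)
population = [[random.randint(0, 1) for _ in range(GENES)]
              for _ in range(POP)]

for _ in range(GENERATIONS):
    # Selection: keep the fitter half of the population as parents.
    parents = sorted(population, key=fitness, reverse=True)[:POP // 2]
    elite = parents[0]                      # elitism: best survives as-is
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - 1)]
    population = [elite] + children

best = max(population, key=fitness)
print(fitness(best))
```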
DEMO: run several ML samples from the standard distribution.
Main benefits:
No ETL – online “in place” ML
In-memory speed & scale
Large scale parallelization
Optimized ML/DL algorithms
Last-mile GPU optimization
The rationale for building ML Grid is quite simple. Many users employ Ignite as the central high-performance storage and processing system for various data sets. If they wanted to perform ML or Deep Learning (DL) on these data sets (i.e., training or model inference), they first had to ETL them into other systems such as Apache Mahout or Apache Spark.
The roadmap for ML Grid is to start with a core algebra implementation based on Ignite's collocated distributed processing. The initial version was released with Ignite 2.0. Future releases will introduce custom DSLs for Python, R, and Scala, a growing collection of optimized ML algorithms such as Linear and Logistic Regression, Decision Tree/Random Forest, SVM, and Naive Bayes, as well as support for Ignite-optimized Neural Networks and integration with TensorFlow.
The current beta version of Apache Ignite Machine Learning Grid (ML Grid) provides a distributed machine learning library built on top of the highly optimized and scalable Apache Ignite platform, implementing local and distributed vector and matrix algebra operations as well as distributed versions of widely used algorithms.
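A distributed algebra primitive like the one ML Grid provides can be sketched as a map-reduce over vector partitions: each "node" computes a local partial result over its slice, and the partials are reduced into the final answer. The vectors and partition count below are made up for the example.

```python
# Sketch of a distributed vector dot product: each "node" holds one
# partition of the two vectors, computes a local partial product, and
# the partials are reduced to the final result. Illustrative only.
x = list(range(1, 9))          # [1, 2, ..., 8]
y = [2] * 8

def split(v, parts):
    # Partition a vector into equal-sized contiguous slices.
    k = len(v) // parts
    return [v[i * k:(i + 1) * k] for i in range(parts)]

def local_dot(xs, ys):
    # Per-node work: dot product over the local partition only.
    return sum(a * b for a, b in zip(xs, ys))

partials = [local_dot(xs, ys)
            for xs, ys in zip(split(x, 4), split(y, 4))]
print(sum(partials))  # -> 72
```

Because the per-partition work touches only locally owned data, this is the same collocated-processing principle described earlier, applied to linear algebra.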