Apache Ignite: Real-Time Big Data with In-Memory Data Fabric
NIKITA IVANOV
Founder & CTO
@c64hacker
© 2014 GridGain Systems, Inc.
www.gridgain.com
#gridgain

Agenda
• History of GridGain/Apache Ignite
• Evolution of In-Memory Computing
• In-Memory Data Fabric
• Distributed Cluster & Compute – Coding Example
• Distributed Data Grid – Coding Examples
• Distributed Streaming & CEP
• Plug-n-Play Hadoop Accelerator

What is In-Memory Computing
• High Performance & Low Latencies
• Faster than Disk and Flash
• Cost Effective
• Distributed or Not
• Caching, Streaming, Computations
• Data Querying – SQL or Unstructured
• Volatile and Persistent
• OLAP and OLTP Use Cases

Evolution of In-Memory Computing
Caching, Distributed Caching, Database IM options, In-Memory Data Grids, IMDBs, Hadoop accelerators, Streaming, BI accelerators
Data Grid | Streaming | Clustering & Compute Grid | Hadoop Acceleration

Existing Market Is Fragmented
Company | Product | Proprietary/Open Source | Characterization
Oracle | In-Memory Option for Oracle Database | Proprietary | Cost Option
Oracle | TimesTen | Proprietary | Point Solution – IMDB
Oracle | Coherence | Proprietary | Point Solution – IMDG
SAP | HANA | Proprietary | Point Solution – IMDB
Microsoft | SQL Server 2014 | Proprietary | Feature Upgrade
Databricks | Apache Spark | Open Source | Point Solution – Hadoop
VoltDB | VoltDB | Open Source | Point Solution – IMDB
Aerospike | Aerospike | Open Source | Point Solution – NoSQL DB
IBM | DB2 with BLU Acceleration | Proprietary | Feature Upgrade
Software AG | Terracotta | Open Source | Point Solution – IMDG
Hazelcast | Hazelcast | Open Source | Point Solution – IMDG

In-Memory Data Fabric: Strategic Approach to IMC
• Supports all Apps
• Open Source – Apache 2.0
• Simple Java APIs
• 1 JAR Dependency
• High Performance & Scale
• Automatic Fault Tolerance
• Management/Monitoring
• Runs on Commodity Hardware
• Supports existing & new data sources
• No need to rip & replace
Data Grid | Streaming | Clustering & Compute Grid | Hadoop Acceleration

Clustering & Compute
• Zero Deployment
• Pluggable SPI Design
• Full Cluster Management
• Direct API for MapReduce
• Direct API for Fork/Join
• Cron-like Task Scheduling
• State Checkpoints
• Early and Late Load Balancing
• Automatic Failover

Automatic Cluster Discovery

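The discovery diagram on this slide did not survive extraction. A common way to detect joins and failures automatically is heartbeating. Each node announces itself at regular intervals, and a node that stays silent longer than a timeout is treated as failed. The hypothetical `MembershipView` below sketches only that bookkeeping, with timestamps passed in explicitly so the logic stays deterministic. It does not reproduce Ignite's pluggable discovery SPI or its protocol.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical heartbeat-based membership view. Times are in milliseconds and
// supplied by the caller, so the logic is deterministic and easy to test.
class MembershipView {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    MembershipView(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a heartbeat from nodeId arrives; an unknown id means a node joined.
    void onHeartbeat(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    // Removes nodes whose last heartbeat is older than the timeout and
    // returns the ids that were considered failed (sorted).
    Set<String> evictFailed(long nowMs) {
        Set<String> failed = new TreeSet<>();
        lastSeen.forEach((id, seen) -> {
            if (nowMs - seen > timeoutMs) failed.add(id);
        });
        failed.forEach(lastSeen::remove);
        return failed;
    }

    Set<String> liveNodes() {
        return new TreeSet<>(lastSeen.keySet());
    }

    public static void main(String[] args) {
        MembershipView view = new MembershipView(3_000);
        view.onHeartbeat("node-A", 0);
        view.onHeartbeat("node-B", 0);
        view.onHeartbeat("node-A", 2_500);            // node-B stays silent
        System.out.println(view.evictFailed(4_000));  // [node-B]
        System.out.println(view.liveNodes());         // [node-A]
    }
}
```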
Closure Execution

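The code shown on the closure execution slides was lost in extraction. The sketch below illustrates the general idea: send the same closure to every node and collect one result per node. The cluster is simulated with one single-threaded executor per "node". `LocalCluster` and `broadcast` are hypothetical names, not Ignite's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical in-process stand-in for a cluster: each "node" is a
// single-threaded executor identified by its index.
class LocalCluster implements AutoCloseable {
    private final List<ExecutorService> nodes = new ArrayList<>();

    LocalCluster(int size) {
        for (int i = 0; i < size; i++) nodes.add(Executors.newSingleThreadExecutor());
    }

    // Runs the same closure on every node (passing the node's index)
    // and returns the results in node order.
    <R> List<R> broadcast(Function<Integer, R> closure)
            throws InterruptedException, ExecutionException {
        List<Future<R>> futures = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            final int nodeId = i;
            Callable<R> job = () -> closure.apply(nodeId);
            futures.add(nodes.get(i).submit(job));
        }
        List<R> results = new ArrayList<>();
        for (Future<R> f : futures) results.add(f.get()); // wait for every node
        return results;
    }

    @Override
    public void close() {
        nodes.forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) throws Exception {
        try (LocalCluster cluster = new LocalCluster(3)) {
            List<String> replies = cluster.broadcast(id -> "Hello from node " + id);
            replies.forEach(System.out::println);
        }
    }
}
```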
In-Memory Caching and Data Grid
• Distributed In-Memory Key-Value Store
• Replicated and Partitioned
• TBs of data, of any type
• On-Heap and Off-Heap Storage
• Backup Replicas / Automatic Failover
• Distributed ACID Transactions
• SQL queries and JDBC driver
• Collocation of Compute and Data

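The sketch below makes "Partitioned" and "Backup Replicas" concrete with a simplified, hypothetical affinity mapping. Each key hashes to one of a fixed number of partitions. Each partition gets a primary node plus `backups` copies on the next nodes in round-robin order. This scheme is an assumption chosen for clarity, and Ignite's own affinity function differs from it. In replicated mode, by contrast, every node would hold every entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified affinity mapping: key -> partition by hash,
// partition -> primary node plus backups on the following nodes (round-robin).
class PartitionMap {
    private final int partitions;
    private final List<String> nodes;
    private final int backups;

    PartitionMap(int partitions, List<String> nodes, int backups) {
        if (backups >= nodes.size())
            throw new IllegalArgumentException("need more nodes than backups");
        this.partitions = partitions;
        this.nodes = List.copyOf(nodes);
        this.backups = backups;
    }

    // Math.floorMod keeps the result non-negative even for negative hash codes.
    int partition(Object key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // First entry is the primary owner; the remaining entries hold backup copies.
    List<String> owners(Object key) {
        int p = partition(key);
        List<String> owners = new ArrayList<>();
        for (int i = 0; i <= backups; i++) {
            owners.add(nodes.get((p + i) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        PartitionMap map = new PartitionMap(1024, List.of("A", "B", "C"), 1);
        for (String key : List.of("alice", "bob", "carol")) {
            System.out.println(key + " -> partition " + map.partition(key)
                    + ", owners " + map.owners(key));
        }
    }
}
```

If the primary owner fails, the first backup in the list still holds a copy. That is the idea behind the "Backup Replicas / Automatic Failover" bullet.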
Cache Operations
Find a Bug?

Cache Transaction

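The transaction example on this slide was lost in extraction. The sketch below is a local analogy of what an ACID cache transaction guarantees: either all writes are applied or none are, and a concurrent transfer cannot interleave with them. It performs a pessimistic two-key transfer. Both keys are locked in a fixed order to avoid deadlock, the balance is checked, and then both writes happen. `AccountStore` and `transfer` are hypothetical. This is not Ignite's transaction API, and it does not span nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical local analogy of a pessimistic two-key transaction.
// Note: get() outside transfer() takes no locks, so it is not isolated.
class AccountStore {
    private final Map<String, Long> balances = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void put(String account, long amount) {
        balances.put(account, amount);
    }

    long get(String account) {
        return balances.getOrDefault(account, 0L);
    }

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Moves amount from one account to another; returns false and writes
    // nothing if the source account has insufficient funds.
    boolean transfer(String from, String to, long amount) {
        if (from.equals(to)) throw new IllegalArgumentException("same account");
        // Acquire locks in key order so two opposite transfers cannot deadlock.
        String first = from.compareTo(to) < 0 ? from : to;
        String second = first.equals(from) ? to : from;
        ReentrantLock l1 = lockFor(first);
        ReentrantLock l2 = lockFor(second);
        l1.lock();
        try {
            l2.lock();
            try {
                long src = get(from);
                if (src < amount) return false;      // "rollback": nothing written yet
                balances.put(from, src - amount);
                balances.put(to, get(to) + amount);
                return true;                          // "commit"
            } finally {
                l2.unlock();
            }
        } finally {
            l1.unlock();
        }
    }

    public static void main(String[] args) {
        AccountStore store = new AccountStore();
        store.put("alice", 100);
        store.put("bob", 50);
        System.out.println(store.transfer("alice", "bob", 30));          // true
        System.out.println(store.transfer("bob", "alice", 500));         // false
        System.out.println(store.get("alice") + " " + store.get("bob")); // 70 80
    }
}
```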
Distributed Java Data Structures
• Distributed Map (cache)
• Distributed Set
• Distributed Queue
• CountDownLatch
• AtomicLong
• AtomicSequence
• AtomicReference
• Distributed ExecutorService

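Several names in this list mirror `java.util.concurrent`. For reference, the example below uses the single-JVM versions of two of them together. Workers increment a shared `AtomicLong` and then count down a `CountDownLatch` that the caller waits on. A distributed variant would apply the same coordination across processes. The `LatchCounterDemo` class is illustrative only and does not use Ignite.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Single-JVM illustration: each worker adds to a shared AtomicLong and then
// counts down a latch; the caller blocks until every worker has finished.
class LatchCounterDemo {
    static long run(int workers, int incrementsPerWorker) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        CountDownLatch done = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int w = 0; w < workers; w++) {
                pool.execute(() -> {
                    try {
                        for (int i = 0; i < incrementsPerWorker; i++) counter.incrementAndGet();
                    } finally {
                        done.countDown(); // signal completion even if the loop throws
                    }
                });
            }
            done.await(); // blocks until the count reaches zero
            return counter.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1_000)); // 4000
    }
}
```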
Client-Server vs. Affinity Colocation
Client-Server | Affinity Colocation

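The two diagrams on this slide contrast moving the data to the computation with moving the computation to the data. The hypothetical model below sums a dataset both ways and counts how many values cross node boundaries. Client-server ships every value to the client. The colocated version runs the sum on each node and ships back one partial result per node. The node layout (key modulo node count) and the `ColocationDemo` class are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each node stores the entries whose key maps to it.
// "transfers" counts values that cross a node boundary.
class ColocationDemo {
    final List<Map<Integer, Long>> nodes = new ArrayList<>();
    long transfers;

    ColocationDemo(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    void put(int key, long value) {
        nodes.get(Math.floorMod(key, nodes.size())).put(key, value);
    }

    // Client-server: pull every value to the client, then compute there.
    long sumClientServer() {
        long sum = 0;
        for (Map<Integer, Long> node : nodes) {
            for (long v : node.values()) {
                transfers++;   // each value is shipped to the client
                sum += v;
            }
        }
        return sum;
    }

    // Affinity colocation: run the computation where the data lives and
    // return only one partial result per node.
    long sumColocated() {
        long sum = 0;
        for (Map<Integer, Long> node : nodes) {
            long partial = node.values().stream().mapToLong(Long::longValue).sum(); // "on" the node
            transfers++;       // only the partial result crosses the network
            sum += partial;
        }
        return sum;
    }

    public static void main(String[] args) {
        ColocationDemo demo = new ColocationDemo(4);
        for (int k = 0; k < 1_000; k++) demo.put(k, k);
        demo.transfers = 0;
        long a = demo.sumClientServer();
        System.out.println("client-server: " + a + " with " + demo.transfers + " transfers");
        demo.transfers = 0;
        long b = demo.sumColocated();
        System.out.println("colocated: " + b + " with " + demo.transfers + " transfers");
    }
}
```

Both approaches compute the same sum. The difference is that network traffic scales with the data size in one case and with the node count in the other.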
In-Memory Streaming & CEP
• Streaming Data Never Ends
• Branching Pipelines
• CEP Sliding Windows
• Pluggable Routing
• Real Time Analysis
• At Least Once Guarantee

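A CEP sliding window keeps only the most recent part of a stream that never ends. The sketch below is a hypothetical time-based window. It retains events whose timestamps fall in `(now - windowMs, now]` and maintains a running sum, so the count and average are available in constant time. It assumes events arrive in timestamp order and is not Ignite's streaming API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical time-based sliding window over an unbounded stream.
// Events with timestamp <= now - windowMs are evicted as new events arrive.
class SlidingWindow {
    private static final class Event {
        final long timestampMs;
        final double value;

        Event(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Deque<Event> events = new ArrayDeque<>();
    private double sum; // running sum of the values currently in the window

    SlidingWindow(long windowMs) {
        this.windowMs = windowMs;
    }

    // Assumes events arrive in non-decreasing timestamp order.
    void add(long timestampMs, double value) {
        events.addLast(new Event(timestampMs, value));
        sum += value;
        evict(timestampMs);
    }

    // Drops events that have slid out of the window.
    private void evict(long nowMs) {
        while (!events.isEmpty() && events.peekFirst().timestampMs <= nowMs - windowMs) {
            sum -= events.pollFirst().value;
        }
    }

    int count() {
        return events.size();
    }

    double average() {
        return events.isEmpty() ? 0.0 : sum / events.size();
    }

    public static void main(String[] args) {
        SlidingWindow w = new SlidingWindow(1_000);
        w.add(0, 10);
        w.add(500, 20);
        w.add(900, 30);
        System.out.println(w.count() + " events, avg " + w.average()); // 3 events, avg 20.0
        w.add(1_200, 40); // the event at t=0 slides out of the window
        System.out.println(w.count() + " events, avg " + w.average()); // 3 events, avg 30.0
    }
}
```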
Plug-n-Play Hadoop Accelerator
• Up to 100x Acceleration
• In-Memory Native MapReduce
  – In-Process Data Colocation
  – Eager Push Scheduling
• GGFS In-Memory File System
  – Pure In-Memory
  – Write-Through to HDFS
  – Read-Through from HDFS
• Sync and Async Persistence

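The GGFS bullets describe two behaviors. Read-through loads a file from HDFS on a miss, and write-through updates memory and HDFS together. The sketch below shows that pattern in front of a hypothetical `BackingStore` interface standing in for HDFS, with a `HashMap` playing the slow store. The asynchronous persistence option would instead queue writes to the store in the background, which is not shown. None of these names are GGFS or Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface standing in for a slower persistent store such as HDFS.
interface BackingStore {
    byte[] read(String path);             // returns null if the path does not exist
    void write(String path, byte[] data);
}

// Toy backing store held in a HashMap, used only to exercise the cache.
class MapBackingStore implements BackingStore {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public byte[] read(String path) { return files.get(path); }

    @Override
    public void write(String path, byte[] data) { files.put(path, data); }
}

// In-memory layer with read-through (load on miss) and synchronous write-through.
class ReadWriteThroughCache {
    private final Map<String, byte[]> memory = new ConcurrentHashMap<>();
    private final BackingStore store;
    int storeReads; // reads that went to the backing store (plain int, not thread-safe)

    ReadWriteThroughCache(BackingStore store) {
        this.store = store;
    }

    byte[] read(String path) {
        byte[] cached = memory.get(path);
        if (cached != null) return cached;       // hit: served from memory
        storeReads++;
        byte[] loaded = store.read(path);        // miss: read through to the store
        if (loaded != null) memory.put(path, loaded);
        return loaded;
    }

    void write(String path, byte[] data) {
        store.write(path, data); // persist first, so memory never gets ahead of the store
        memory.put(path, data);
    }

    public static void main(String[] args) {
        MapBackingStore hdfs = new MapBackingStore();
        hdfs.write("/logs/day1", new byte[] {42});
        ReadWriteThroughCache cache = new ReadWriteThroughCache(hdfs);
        cache.read("/logs/day1"); // miss: loaded from the store
        cache.read("/logs/day1"); // hit: served from memory
        System.out.println("store reads: " + cache.storeReads); // store reads: 1
    }
}
```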
In-Memory Native MapReduce
• In-Memory Native MapReduce
  – Zero Code Change
  – Use existing MR code
  – Use existing Hive queries
• No Name Node
• No Network Noise
• In-Process Data Colocation
• Eager Push Scheduling

DevOps Management and Monitoring

THANK YOU
