How we broke Apache Ignite by adding persistence
Stephen Darlington
16 December, 2019
2018 © GridGain Systems
2019 © GridGain Systems GridGain Company Confidential
(spoiler: already fixed)
What is Ignite?
• Distributed memory-centric storage: combines the performance and scale of in-memory computing with the disk durability and strong consistency in one system
• Co-located Computations: brings the computations to the servers where the data actually resides, eliminating the need to move data over the network
• Distributed Key-Value: read, write and transact with fast key-value APIs
• Distributed SQL: a horizontally scalable, fault-tolerant distributed SQL database that treats memory and disk as active storage tiers
• ACID Transactions: supports distributed ACID transactions for key-value as well as SQL operations
• Machine and Deep Learning: a set of simple, scalable and efficient tools for building predictive machine learning models without costly data transfers (ETL)
Apache Ignite In-Memory Computing Platform
[Diagram: Application Layer (Web, SaaS, Social, Mobile, IoT) on top of the In-Memory Data Store (Key-Value, SQL, Transactions, Compute Grid, Service Grid, Machine and Deep Learning, Streaming, Events, Messaging), on top of the Persistent Layer (Ignite Persistence, RDBMS, NoSQL, Hadoop, Mainframe)]
Apache Ignite’s History
[Diagram: timeline Data Grid → Local Store → Transactional Persistence → ?]
Circa 2011
[Diagram: timeline Data Grid → Local Store → Transactional Persistence → ?]
Circa 2014
[Diagram: timeline Data Grid → Local Store → Transactional Persistence → ?]
Circa 2017
• Start time does not depend on the data volume
• Can store more data than memory
• Crash recovery
• Single in-memory & native persistence architecture
Ignite 2.0: What we wanted
“We will just save everything to disk”
Circa 2017
[Diagram: timeline Data Grid → Local Store → Transactional Persistence → ?]
Beginning: Durable Memory
• ARIES Architecture
• Page-based
• Write-ahead log (when persistence is enabled)
• Everything is off heap
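To make the four bullets concrete, here is a toy model, not Ignite's code: a single off-heap region (a direct `ByteBuffer`) split into fixed-size pages, a log that records every change before the page is touched, and a recovery routine that rebuilds the pages by replaying that log. This is only the "redo" half of ARIES; real ARIES recovery also starts from a checkpoint and undoes unfinished transactions, which the sketch leaves out. The class `ToyDurableMemory`, its `WalRecord` type and the 4 KB page size are hypothetical, and the log lives in a Java list rather than a file for brevity.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Toy page memory guarded by a write-ahead log; every name here is hypothetical. */
class ToyDurableMemory {
    static final int PAGE_SIZE = 4096; // assumed page size in bytes

    /** A redo record: "store this long at this offset of this page". */
    static final class WalRecord {
        final long lsn;
        final int pageId;
        final int offset;
        final long value;

        WalRecord(long lsn, int pageId, int offset, long value) {
            this.lsn = lsn;
            this.pageId = pageId;
            this.offset = offset;
            this.value = value;
        }
    }

    private final ByteBuffer pages;                        // off-heap memory holding all pages
    private final List<WalRecord> wal = new ArrayList<>(); // stands in for the WAL file on disk
    private long nextLsn = 1;                              // next log sequence number

    ToyDurableMemory(int pageCount) {
        pages = ByteBuffer.allocateDirect(pageCount * PAGE_SIZE);
    }

    /** Write-ahead rule: log the change first, only then modify the page. */
    void putLong(int pageId, int offset, long value) {
        wal.add(new WalRecord(nextLsn++, pageId, offset, value));
        pages.putLong(pageId * PAGE_SIZE + offset, value);
    }

    long getLong(int pageId, int offset) {
        return pages.getLong(pageId * PAGE_SIZE + offset);
    }

    List<WalRecord> log() {
        return wal;
    }

    /** Redo-only recovery (no checkpoint, no undo): replay every record in LSN order. */
    static ToyDurableMemory recover(int pageCount, List<WalRecord> log) {
        ToyDurableMemory mem = new ToyDurableMemory(pageCount);
        for (WalRecord r : log) {
            mem.pages.putLong(r.pageId * PAGE_SIZE + r.offset, r.value);
            mem.wal.add(r);
            mem.nextLsn = r.lsn + 1;
        }
        return mem;
    }
}
```

A "crash" here simply means throwing away the `ToyDurableMemory` instance and keeping only its `log()`; `recover` then produces pages with the same contents.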
Beginning: Durable Memory
• PK Index: how to replace a HashMap
• Concurrent B+ Tree: a well-known data structure
• Separate PK index for each partition
• Compare key hash first
• Bonus: guaranteed iteration order in a hash map
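The ordering trick can be sketched with the standard library. Below, `java.util.TreeMap` (a red-black tree) stands in for Ignite's off-heap concurrent B+ tree, and its comparator looks at the key's `hashCode()` first, falling back to a full key comparison only when two hashes collide. Comparing an `int` is far cheaper than comparing whole keys, and the resulting iteration order depends only on the keys present, which a `HashMap` does not promise. One tree per partition mirrors the per-partition index; the `ToyPkIndex` class, `String` keys and the modulo partition function are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Per-partition primary-key index ordered by (hash, key); a hypothetical B+ tree stand-in. */
class ToyPkIndex<V> {
    // Compare the cheap int hash first; compare full keys only on a hash collision.
    private static final Comparator<String> HASH_FIRST =
            Comparator.comparingInt(String::hashCode).thenComparing(Comparator.naturalOrder());

    private final List<TreeMap<String, V>> partitions = new ArrayList<>();

    ToyPkIndex(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new TreeMap<>(HASH_FIRST));
        }
    }

    /** Assumed partition function: non-negative hash modulo the partition count. */
    int partition(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, V value) {
        partitions.get(partition(key)).put(key, value);
    }

    V get(String key) {
        return partitions.get(partition(key)).get(key);
    }

    /** Keys of one partition in (hash, key) order, independent of insertion order. */
    List<String> keysOf(int partition) {
        return new ArrayList<>(partitions.get(partition).keySet());
    }
}
```

For example, `"Aa"` and `"BB"` share the hash code 2112, so they land in the same partition and the tie is broken by comparing the strings themselves.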
Baseline Topology
• [16:21:01] Ignite node started OK (id=326bab44)
• [16:21:01] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
• [16:21:01] Topology snapshot [ver=1, locNode=326bab44, servers=1, clients=0, state=INACTIVE, CPUs=8, offheap=3.2GB, heap=3.6GB]
• [16:21:01] ^-- Baseline [id=11, size=3, online=1, offline=2]
• [16:21:01] ^-- 2 nodes left for auto-activation [6213b7af-23bb-4c8d-a045-157d7f2d7718, db969788-fc01-41f4-a91c-c03f2d201f76]
• [16:21:19] Joining node doesn't have encryption data [node=89b6ef6c-1055-4678-bcfa-00fb222208ce]
• [16:21:19] Topology snapshot [ver=2, locNode=326bab44, servers=2, clients=0, state=INACTIVE, CPUs=8, offheap=6.4GB, heap=7.1GB]
• [16:21:19] ^-- Baseline [id=11, size=3, online=2, offline=1]
• [16:21:19] ^-- 1 nodes left for auto-activation [6213b7af-23bb-4c8d-a045-157d7f2d7718]
• [16:21:37] Joining node doesn't have encryption data [node=dd55ff24-da61-42cd-bbaf-c7940fab07d3]
• [16:21:37] Topology snapshot [ver=3, locNode=326bab44, servers=3, clients=0, state=INACTIVE, CPUs=8, offheap=9.6GB, heap=11.0GB]
• [16:21:37] ^-- Baseline [id=11, size=3, online=3, offline=0]
• [16:21:37] ^-- All baseline nodes are online, will start auto-activation
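The log shows the rule at work: with a baseline of three nodes, the cluster stays INACTIVE while any baseline node is missing, and activates itself once the last one joins. The toy tracker below models only that counting logic. `ToyBaseline` and `nodeJoined` are hypothetical names, its status strings merely imitate the log lines above, and in a real cluster activation is performed by Ignite itself (or manually through `control.sh` / the `IgniteCluster` interface, as the log suggests).

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of baseline auto-activation: activate once every baseline node is online. */
class ToyBaseline {
    private final Set<String> baseline;                        // IDs of the nodes that own persisted data
    private final Set<String> online = new LinkedHashSet<>();
    private boolean active;

    ToyBaseline(Set<String> baselineNodes) {
        this.baseline = new LinkedHashSet<>(baselineNodes);
    }

    /** Records a joining server node and returns a status line modelled on the log. */
    String nodeJoined(String nodeId) {
        if (baseline.contains(nodeId)) {
            online.add(nodeId); // nodes outside the baseline do not count toward activation
        }
        Set<String> missing = new LinkedHashSet<>(baseline);
        missing.removeAll(online);
        if (missing.isEmpty()) {
            active = true;
            return "All baseline nodes are online, will start auto-activation";
        }
        return missing.size() + " nodes left for auto-activation " + missing;
    }

    boolean isActive() {
        return active;
    }
}
```

The point of the design is that a restart with only part of the persisted data available does not silently start serving an incomplete data set.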
Disk. Predictable access speed
• Disks are slow (even NVMe)
• At peak load, a naïve implementation easily steps on its own tail
• Performance suddenly drops to zero
Disk. Predictable access speed
• So, we need to… make Ignite slower
• Throttle the incoming load depending on:
  • how fast we produce “dirty” pages
  • how fast we write them to disk
  • how full the copy-on-write buffer is
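A minimal sketch of the throttling idea, assuming a simple rule that is not Ignite's actual algorithm: when pages are dirtied faster than the disk writes them, each writer pauses in proportion to the overshoot, and when the copy-on-write buffer passes a fill threshold, the pause grows exponentially as the buffer approaches full. The class name, the 2/3 threshold, the 10-microsecond base pause and the formula are all hypothetical.

```java
/** Hypothetical write throttle; thresholds and formula are illustrative only. */
class ToyThrottle {
    static final double COW_BUFFER_THRESHOLD = 2.0 / 3.0; // assumed "buffer getting full" mark
    static final long BASE_PARK_NANOS = 10_000;           // assumed base pause: 10 microseconds

    /**
     * @param dirtyPagesPerSec   rate at which user updates dirty pages
     * @param writtenPagesPerSec rate at which pages are written to disk (0 = no sample yet)
     * @param cowBufferUsed      fraction (0..1) of the copy-on-write buffer in use
     * @return nanoseconds a writer should pause before its next update (0 = no throttling)
     */
    static long parkNanos(double dirtyPagesPerSec, double writtenPagesPerSec, double cowBufferUsed) {
        long delay = 0;
        // Dirtying pages faster than the disk absorbs them: slow down in proportion to the overshoot.
        if (writtenPagesPerSec > 0 && dirtyPagesPerSec > writtenPagesPerSec) {
            double overshoot = dirtyPagesPerSec / writtenPagesPerSec - 1.0;
            delay = (long) (BASE_PARK_NANOS * overshoot);
        }
        // Copy-on-write buffer filling up: back off exponentially as it nears 100%.
        if (cowBufferUsed > COW_BUFFER_THRESHOLD) {
            double pressure = (cowBufferUsed - COW_BUFFER_THRESHOLD) / (1.0 - COW_BUFFER_THRESHOLD);
            delay = Math.max(delay, (long) (BASE_PARK_NANOS * Math.pow(2, 10 * pressure)));
        }
        return delay;
    }
}
```

A writer thread would pass the returned value to `java.util.concurrent.locks.LockSupport.parkNanos(long)` before each update. Trading a small, steady slowdown for the absence of a full stall is exactly the "make Ignite slower" point of the slide.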
Disk. Predictable access speed
• Page cache: what can go wrong
• We already have a page cache in Ignite (durable memory)
• The OS keeps its own page cache of the same files
• Together they effectively double the memory consumption
Disk. Predictable access speed
• Page cache: the solution is Direct IO
• Available in Java 10, but we build on Java 8
• Needs native, platform-specific calls or a module that depends on the Java version
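For reference, this is roughly what Direct IO looks like on JDK 10+, which added `com.sun.nio.file.ExtendedOpenOption.DIRECT` (the reason a Java 8 build needs native calls instead). O_DIRECT requires the buffer address, file offset and transfer size to be multiples of the file-store block size, so the sketch sizes the write with `FileStore.getBlockSize()` and takes a block-aligned slice of a direct buffer with `ByteBuffer.alignedSlice`. Not every file system or platform accepts O_DIRECT, so it falls back to an ordinary, page-cached write. `DirectPageWriter`, `writePage` and the 4 KB page size are assumptions for illustration.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: write one page with O_DIRECT on JDK 10+, falling back to a buffered write. */
class DirectPageWriter {
    static final int PAGE_SIZE = 4096; // assumed page size in bytes

    /** Writes one page filled with {@code fill} at offset 0 and returns the bytes written. */
    static int writePage(Path file, byte fill) throws IOException {
        int block = blockSize(file.toAbsolutePath().getParent());
        // O_DIRECT needs the transfer size to be a whole number of blocks.
        int size = ((PAGE_SIZE + block - 1) / block) * block;
        // Over-allocate off-heap, then take a slice that starts on a block boundary.
        ByteBuffer buf = ByteBuffer.allocateDirect(size + block).alignedSlice(block);
        for (int i = 0; i < size; i++) {
            buf.put(fill);
        }
        buf.flip(); // position 0, limit = size

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT)) {
            return ch.write(buf, 0); // bypasses the OS page cache
        } catch (IOException | UnsupportedOperationException e) {
            // O_DIRECT rejected by this file system or platform: use a page-cached write.
            buf.rewind();
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE)) {
                return ch.write(buf, 0);
            }
        }
    }

    /** File-store block size (JDK 10+); 4096 if unavailable or not a usable power of two. */
    static int blockSize(Path dir) {
        try {
            long b = Files.getFileStore(dir).getBlockSize();
            if (b > 0 && b <= (1 << 20) && Long.bitCount(b) == 1) {
                return (int) b;
            }
        } catch (IOException | UnsupportedOperationException e) {
            // fall through to the default below
        }
        return 4096;
    }
}
```

Because the buffer is already off-heap and block-aligned, the same memory could in principle back both Ignite's page cache and the disk write, which is what avoids the doubled memory described above.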
The future?
[Diagram: timeline Data Grid → Local Store → Transactional Persistence → ?]
Questions?
More information
• Main landing page: https://ignite.apache.org
• Documentation: https://apacheignite.readme.io/docs
• Please complete our survey on how Apache Ignite should evolve: https://docs.google.com/forms/d/e/1FAIpQLSdUveEVXer3lpkyiqfFw4175TvZzGHUOS4snPfnkO0NDku0eQ/viewform
• Real-time data loading: https://www.imcsummit.org/2019/us/session/best-practices-loading-real-time-data-distributed-systems-change-data-capture
Stephen Darlington
Senior Consultant, GridGain Systems
@sdarlington
