IN-MEMORY COMPUTING
ROBERT FRIBERG, DEVREX LABS
HTTP://DEVREXLABS.COM/
@ROBERTFRIBERG
About me
◦ Independent Developer and Trainer
◦ SQL Server DBA since 6.5
◦ C#, JavaScript, Perl, Java, +++
◦ Machine learning, AI
◦ Squash fanatic
Agenda
◦ Revisiting Traditional RDBMS
◦ Defining IMDB
◦ A look at a few in-memory products
◦ OrigoDB in depth
◦ Goals
◦ Learn technical stuff
◦ Thinking different
What is a database?
◦ An organized collection of information
◦ Allows reading and writing
◦ Provides authorization and authentication
◦ Provides some level of data safety
Demand drives change
◦ Performance
◦ Data volume
◦ Scalability
◦ Availability
◦ Modeling
• NoSQL
• Big data
• Graph
• Real time analytics
• In-memory computing
• Column stores
One size no longer fits all
B-TREE data structure
B-trees and Transactions
◦ LOG: transactions append inserted, deleted, original and modified pages to the log
◦ DATA: 64 KB blocks with 8 x 8 KB pages
◦ Buffer Manager: logical B-tree of 8 KB data pages in the buffer pool (cache)
• Fill factor
• Page splits
• Clustered index
• Checkpoint
Transactions
A Atomic
C Consistent
I Isolated
D Durable
[Diagram: transactions t1 and t2 move the database through states s0, s1 and s2]
What is s?
Isolation Levels
◦ SERIALIZABLE
◦ REPEATABLE_READ
◦ READ_COMMITTED_SNAPSHOT (MVCC) (Row versioning)
◦ READ_COMMITTED
◦ READ_UNCOMMITTED – don't worry, be happy
[Diagram axes: consistency increases toward the top of the list, performance toward the bottom]
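The row versioning idea behind the MVCC level can be sketched in a few lines. This is a toy illustration, not how SQL Server implements it: the `VersionedStore` class and its methods are hypothetical names chosen for this example. Each commit adds a new version instead of overwriting, so a reader that started earlier keeps seeing the data as it was, without blocking the writer.

```python
class VersionedStore:
    """Toy row-versioning store (hypothetical, for illustration only)."""

    def __init__(self):
        self.versions = {}    # key -> list of (commit_number, value), oldest first
        self.last_commit = 0

    def commit(self, changes):
        """Commit a dict of key -> value as one new version."""
        self.last_commit += 1
        for key, value in changes.items():
            self.versions.setdefault(key, []).append((self.last_commit, value))
        return self.last_commit

    def snapshot(self):
        """A reader remembers the newest commit that existed when it started."""
        return self.last_commit

    def read(self, key, snapshot):
        """Return the newest value committed at or before the given snapshot."""
        for commit_number, value in reversed(self.versions.get(key, [])):
            if commit_number <= snapshot:
                return value
        return None


store = VersionedStore()
store.commit({"stock": 10})
reader = store.snapshot()                    # a reader starts here
store.commit({"stock": 7})                   # a writer commits afterwards
old = store.read("stock", reader)            # the reader still sees its snapshot
new = store.read("stock", store.snapshot())  # a new reader sees the latest commit
```

The price of this scheme is that old versions must be kept around until no reader needs them, which this sketch never cleans up.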
“the B-tree is optimized for systems that read and write large blocks of data”
- Wikipedia
The Traditional RDBMS Architecture
“... is obsolete”
- Michael Stonebraker
Reference: OLTP through the looking glass, Stonebraker et al
OLTP vs. OLAP mismatch
◦ OLAP (read load): read intensive, touches a lot of data, benefits from indexes
◦ OLTP (write load): write intensive, small writes, small reads, hot spots
◦ Indexes hurt write performance
[Diagram: read load vs. write load, with regions labeled In-memory and Disk-based]
What is an in-memory database?
◦ PRIMARY representation is in-memory
◦ Memory optimized data structures
◦ ALL the data in memory (possibly distributed)
(in-memory is not necessarily in-process)
Transaction logging
◦ Write-ahead logging – write to disk before commit
◦ Effect logging – persist the affected data pages
◦ Command logging – persist the cause
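The difference between the two logging styles can be made concrete with a small sketch. Everything here is an assumption made for illustration: the account model, the `transfer` command and the function names do not come from any particular product, and Python lists stand in for log files on disk. The command log stores the cause (what was asked), the effect log stores the resulting values, and either one is enough to rebuild the state.

```python
import json


def apply_command(state, command):
    """Apply one command to the in-memory state and return the changed keys."""
    if command["name"] == "transfer":
        src, dst, amount = command["src"], command["dst"], command["amount"]
        state[src] -= amount
        state[dst] += amount
        return [src, dst]
    raise ValueError("unknown command: " + command["name"])


def execute(state, command, command_log, effect_log):
    # Command logging: persist the cause before the state changes.
    # (A command that raises would stay in the log; not handled in this sketch.)
    command_log.append(json.dumps(command))
    changed = apply_command(state, command)
    # Effect logging: persist the new values of everything that changed.
    effect_log.append(json.dumps({key: state[key] for key in changed}))


def recover_from_commands(initial, command_log):
    """Rebuild the state by re-executing every logged command in order."""
    state = dict(initial)
    for line in command_log:
        apply_command(state, json.loads(line))
    return state


def recover_from_effects(initial, effect_log):
    """Rebuild the state by overwriting values with every logged effect."""
    state = dict(initial)
    for line in effect_log:
        state.update(json.loads(line))
    return state


initial = {"alice": 100, "bob": 20}
state = dict(initial)
command_log, effect_log = [], []
execute(state, {"name": "transfer", "src": "alice", "dst": "bob", "amount": 30},
        command_log, effect_log)
execute(state, {"name": "transfer", "src": "bob", "dst": "alice", "amount": 5},
        command_log, effect_log)
```

Command entries stay small no matter how much data a command touches, but replay only works if commands are deterministic. Effect entries grow with the amount of changed data but can be applied without re-running any logic.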
IMDB Applications
◦ Real time applications with no durability requirements
◦ Embedded, router, online gaming
◦ Real time applications with durability requirements, low latency, high throughput
◦ Traditional applications during test and development (and production)
◦ Whenever data fits in RAM or can be distributed
◦ General OLTP replacement when DB < 2TB
Some In-memory Products
memcached
In-memory computing
SQL Server Hekaton
◦ Memory optimized tree structure
◦ Almost lock-free MVCC concurrency control
◦ Command logging
◦ Seamlessly integrated in the traditional model
◦ Indexing
◦ Joins
◦ Querying
redis
◦ Redis is an open source, BSD licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
◦ Extremely popular and widespread (Twitter, Flickr, GitHub, Digg, Disqus, Instagram, Stack Overflow)
◦ Written in C, great performance
Comparison Matrix
Product     License  Datamodel     Interface  ACID      Distributed        Concurrency control
VoltDB      OSS      Relational    Java/SQL   Yes       Yes (2PC)          Serialized
MemSQL      $$       Relational    SQL        Almost    Yes                MVCC
Aerospike   $$       Key/value     Many       Yes       Yes (2PC)          CAS
SQL Server  $$       Relational +  T-SQL      Yes (no)  No                 Locking, MVCC
NuoDB       $$       Relational    SQL        Yes
Hazelcast   OSS      Key/value +   Java       Almost    Yes (2PC)
GridGain    OSS      Key/value     Java, SQL  Yes       Yes (2PC)          MVCC
OrigoDB     OSS +    User defined  .NET/REST  Yes       No (master/slave)  Serialized +
Redis       OSS      Key/value +   Many/Lua   Yes       No (master/slave)  Serialized
OrigoDB
◦ Is it a database? (first name was Livedomain)
◦ Database Toolkit - Define your own datamodel
◦ Write ahead command logging + snapshots
◦ Single writer + multiple reader concurrency (serialized)
◦ Open source embedded engine
◦ 100% ACID
◦ Commercial server with master/slave replication
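The combination of write-ahead command logging, snapshots and serialized access can be sketched as a tiny engine. This is an assumption-laden illustration, not OrigoDB's API: the `Engine` class, the `(name, argument)` command tuples and the list-based todo model are all made up for this example, in-memory lists stand in for the journal and snapshot files, and a single lock serializes readers as well as writers (a reader/writer lock would let readers run in parallel).

```python
import copy
import threading


def apply(model, command):
    """Apply a (name, argument) command to a list-based todo model."""
    name, arg = command
    if name == "add":
        model.append(arg)
    elif name == "remove":
        model.remove(arg)
    else:
        raise ValueError("unknown command: " + name)


class Engine:
    """Hypothetical engine: journal of commands plus periodic snapshots."""

    def __init__(self, snapshot_every=3):
        self.model = []                  # the live in-memory model
        self.snapshot = []               # stands in for a snapshot file
        self.journal = []                # commands logged since the last snapshot
        self.snapshot_every = snapshot_every
        self.lock = threading.Lock()     # one command or query at a time

    def execute(self, command):
        with self.lock:
            self.journal.append(command)     # write ahead: log, then apply
            apply(self.model, command)
            if len(self.journal) >= self.snapshot_every:
                # Snapshot the whole model and start a fresh journal.
                self.snapshot = copy.deepcopy(self.model)
                self.journal = []

    def query(self, reader):
        with self.lock:
            return reader(self.model)

    def recover(self):
        """Rebuild the model as a restart would: snapshot plus journal replay."""
        model = copy.deepcopy(self.snapshot)
        for command in self.journal:
            apply(model, command)
        return model


engine = Engine(snapshot_every=3)
for command in [("add", "a"), ("add", "b"), ("add", "c"),
                ("add", "d"), ("remove", "a")]:
    engine.execute(command)
live = engine.query(list)      # copy of the live model
rebuilt = engine.recover()     # model rebuilt from snapshot + journal
```

Snapshots bound the recovery time: only the commands logged after the latest snapshot need to be replayed.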
Design goals
◦ Simplicity and correctness before performance
◦ Flexibility
◦ Rapid development
Modular Kernels
◦ Optimistic Kernel – write ahead logging assumes command will succeed
◦ Royal Food Taster – 2 identical in-memory models
◦ Immutability Kernel – Lock free, writers don’t block readers
◦ Requires immutable model
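One way to read the Royal Food Taster idea is sketched below. This is an interpretation of the bullet, not OrigoDB's actual code: the `FoodTasterKernel` class and the `transfer` helper are hypothetical. Every command is first tried on a second, identical copy of the model (the taster). If it fails there, the possibly half-modified taster is thrown away and rebuilt, and the real model is never touched.

```python
import copy


class FoodTasterKernel:
    """Hypothetical kernel that keeps two identical in-memory models."""

    def __init__(self, model):
        self.model = model
        self.taster = copy.deepcopy(model)

    def execute(self, command):
        try:
            command(self.taster)         # the taster tries the command first
        except Exception:
            # The taster may be half-modified: rebuild it from the real model.
            self.taster = copy.deepcopy(self.model)
            raise
        command(self.model)              # safe to apply to the real model


def transfer(amount):
    """Build a command that mutates first and validates afterwards."""
    def command(accounts):
        accounts["alice"] -= amount
        if accounts["alice"] < 0:
            raise ValueError("insufficient funds")
        accounts["bob"] += amount
    return command


kernel = FoodTasterKernel({"alice": 100, "bob": 0})
kernel.execute(transfer(30))
try:
    kernel.execute(transfer(500))        # fails halfway through on the taster
    rejected = False
except ValueError:
    rejected = True
```

The cost is twice the memory and twice the work per command, in exchange for rollback without any undo logic in the commands themselves.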
Evolution of OrigoDB
◦ File formats for document-oriented desktop applications (Java 1996)
◦ Cache invalidation experiments
◦ In-memory search indexes (offline built snapshots -> live updates)
◦ Inception: lambda expressions (.NET 2008)
Cousins of OrigoDB
◦ Prevayler (Java)
◦ Bamboo (.NET)
◦ Madeleine (Ruby)
◦ Perlvayler
◦ Twisted-python
◦ LMAX
◦ Java-chronicle
Bring your own data model
◦ Generic models require an extra schema and complex mapping, so why use them?
◦ Relational
◦ Key/Value (value is a blob)
◦ Document (document is structured and queryable)
◦ Graph, nodes and edges
◦ Domain specific models
◦ OO Domain model (DDD) (typed graph)
◦ JavaScript V8 environment (persisted Node.js)
◦ Machine learning models (Accord.NET)
◦ Lucene.NET indexes
Demo time!
◦ TODO example – Anemic model, transaction script pattern (fat commands)
◦ Twitter clone – rich model with proxy, no commands
◦ Geekstream http://geekstream.devrexlabs.com/
◦ OrigoDB Server http://origodb.com/
Last words
◦ Times are changing! Embrace it!
◦ One size does not fit all – go polyglot persistence!
◦ Choose the most appropriate data model
◦ If data fits in RAM go in-memory!
Thank you!
robert@devrexlabs.com
@robertfriberg

In-memory Databases


Editor's Notes

  • #7 Explain the basic operations like insert, seek and scan
  • #9 Explain the basics quickly. Talk about the boundaries of s. Ask: Is an RDBMS ACID? Answer on next slide.
  • #10 Consistency and isolation are not binary.
  • #13 Reporting. In-memory pushes the boundaries.
  • #21 Explain each of the bullets, relating them to previous topics. Recall the slide “What is a database?”
  • #22 Great performance comes for free but could be optimized.
  • #25 Some other frameworks based on or supporting write-ahead command logging and snapshots with a user defined in-memory model.
  • #26 Defining a custom data model is what makes OrigoDB unique.