
OrigoDB - Your data fits in RAM

This document summarizes OrigoDB, a database that keeps data in memory and persists operations rather than system state. It discusses how OrigoDB is faster than traditional disk-based databases by avoiding I/O bottlenecks. The document provides examples of how to model data, write commands and queries using OrigoDB's API, and run a simple program with it. It concludes by inviting the reader to try OrigoDB and get involved in its open source community.


Speed of light vs spinning metal
What                    Time      Scale (RAM = 1)   As distance   As time
L1 cache                0.5 ns    0.008             2 m           -
L2 cache                7 ns      0.23              -             -
RAM                     60 ns     1                 240 m         1 second
1K over Gbit network    10 µs     167               -             2.5 minutes
4K read SSD             150 µs    2500              -             -
Rotating disk seek      10 ms     167000            40000 km      46 hours
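The Scale column can be reproduced by dividing each latency by the 60 ns RAM latency, and reading the resulting factor as seconds gives the human-scale durations. A minimal sketch of that arithmetic, using only figures from the table (the class and method names are made up for illustration):

```csharp
using System;

public static class LatencyScale
{
    // RAM latency from the table, used as the unit of comparison.
    public const double RamNanoseconds = 60.0;

    // How many RAM accesses fit in the given latency.
    public static double RelativeToRam(double nanoseconds)
    {
        return nanoseconds / RamNanoseconds;
    }

    // Reads the scale factor as seconds ("if a RAM access took 1 second").
    public static TimeSpan AsHumanTime(double nanoseconds)
    {
        return TimeSpan.FromSeconds(RelativeToRam(nanoseconds));
    }

    public static void Main()
    {
        double networkNs = 10000;      // 1K over Gbit network: 10 µs
        double ssdNs = 150000;         // 4K read SSD: 150 µs
        double diskNs = 10000000;      // rotating disk seek: 10 ms

        Console.WriteLine("Network: {0:F0}x RAM", RelativeToRam(networkNs));
        Console.WriteLine("SSD:     {0:F0}x RAM", RelativeToRam(ssdNs));
        Console.WriteLine("Disk:    {0:F0}x RAM, about {1:F0} hours",
            RelativeToRam(diskNs), AsHumanTime(diskNs).TotalHours);
    }
}
```

With these inputs the disk seek comes out at roughly 46 hours on the "RAM access = 1 second" scale, which matches the last row of the table.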

What’s the problem?
Service Layer
Domain Layer
Data Access Layer
Relational Model
Views/SP’s
Cache

B-trees and Transactions
LOG
DATA: 64 KB blocks with 8 x 8 KB pages
Logical B-tree of 8 KB data pages
In the buffer pool (cache)
Buffer Manager
Transactions append inserted, deleted, original and modified pages to the LOG
CHECKPOINT

One simple idea...
Keep state in memory
Persist operations, not system state
s0 --op1--> s1 --op2--> s2
$S_n = \mathrm{apply}(op_n, S_{n-1})$
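Read as code, the recurrence is a fold of the operation sequence over the initial state. A minimal sketch, assuming operations can be modeled as pure functions from state to state (the `Replay` class and its `Restore` method are illustrative names, not part of OrigoDB):

```csharp
using System;
using System.Collections.Generic;

public static class Replay
{
    // S_n = apply(op_n, S_{n-1}): fold the operations over the initial state.
    public static TState Restore<TState>(
        TState initial,
        IEnumerable<Func<TState, TState>> operations)
    {
        TState state = initial;
        foreach (var op in operations)
        {
            state = op(state);
        }
        return state;
    }

    public static void Main()
    {
        // The "journal": two deterministic operations on an integer balance.
        var journal = new List<Func<int, int>>
        {
            balance => balance + 100,   // op1: deposit 100
            balance => balance - 30     // op2: withdraw 30
        };

        int s2 = Restore(0, journal);   // s0 = 0
        Console.WriteLine(s2);          // prints 70
    }
}
```

Because only the operations are stored, any intermediate state can be rebuilt by replaying a prefix of the same list.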

... with many names
• System prevalence – Prevayler, Java
• MongoDB op log
• Redis AOF
• Memory Image – Martin Fowler
• VoltDB – logical logging
• Akka persistence – logging per actor
• Event Sourcing

OrigoDB
Kernel
Engine
Model
Storage
App Code
Server
Command
Query
File
Sql
Event Store
Custom
Consistency
Isolation
concurrency
Sends commands and queries
Journaling
Snapshots
BinaryFormatter
ProtoBuf
JSON
TCP
JSON/HTTP
In-process calls
Domain specific object-graph
Domain specific operations
Replication
Ad-hoc queries
Web UI
Console or Windows service

Complete history of events
• Point in time
• Debugging
• Restore
• Queries
• Audit trail
• New interpretations

Example – the model
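using System;
using System.Collections.Generic;
using OrigoDB.Core;

// The model is the in-memory database: a serializable object graph.
[Serializable]
public class CommerceModel : Model
{
    // Internal, so the collections are reachable only from commands and queries in this assembly.
    internal SortedDictionary<Guid, Customer> Customers { get; set; }
    internal SortedDictionary<Guid, Order> Orders { get; set; }
    internal SortedDictionary<Guid, Product> Products { get; set; }

    // The default constructor creates the empty initial state.
    public CommerceModel()
    {
        Customers = new SortedDictionary<Guid, Customer>();
        Orders = new SortedDictionary<Guid, Order>();
        Products = new SortedDictionary<Guid, Product>();
    }
}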

Command
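using System;
using OrigoDB.Core;

// A command is a write transaction. It is serializable so it can be journaled.
[Serializable]
public class AddCustomer : Command<CommerceModel>
{
    // Immutable parameters: everything needed to replay the command.
    public readonly Guid Id;
    public readonly string Name;

    public AddCustomer(Guid id, string name)
    {
        Id = id;
        Name = name;
    }

    // Called by the engine with a reference to the in-memory model.
    public override void Execute(CommerceModel model)
    {
        // Reject the command before the model is modified.
        if (model.Customers.ContainsKey(Id)) Abort("Duplicate customer id");
        var customer = new Customer { Id = Id, Name = Name };
        model.Customers.Add(Id, customer);
    }
}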

Query
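using System;
using OrigoDB.Core;

// A query is a read transaction that returns a result to the caller.
[Serializable]
public class CustomerById : Query<CommerceModel, CustomerView>
{
    public readonly Guid Id;

    public CustomerById(Guid id)
    {
        Id = id;
    }

    public override CustomerView Execute(CommerceModel model)
    {
        // One lookup instead of ContainsKey followed by the indexer.
        Customer customer;
        if (!model.Customers.TryGetValue(Id, out customer)) throw new Exception("no such customer");
        // Return a view rather than the live Customer object.
        return new CustomerView(customer);
    }
}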

Start your engines!
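using System;
using OrigoDB.Core;

class Program
{
    static void Main(string[] args)
    {
        // Obtain an engine hosting the CommerceModel.
        var engine = Engine.For<CommerceModel>();
        Guid id = Guid.NewGuid();

        // Write: the command is passed to the engine for execution.
        var customerCommand = new AddCustomer(id, "Homer");
        engine.Execute(customerCommand);

        // Read: the query returns a view of the customer.
        var customerView = engine.Execute(new CustomerById(id));
        Console.WriteLine(customerView.Name);
        Console.WriteLine("{0} orders", customerView.OrderIds.Count);
        Console.ReadLine();
    }
}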
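The slides use `Customer` and `CustomerView` without showing them. The following is a guess at their minimal shape, inferred only from how they are used above (`Id`, `Name`, `OrderIds.Count`, and a constructor taking a `Customer`); the actual classes in the talk may well differ. The sketch assumes the view copies data out of the entity, so that query results do not hand out references into the model:

```csharp
using System;
using System.Collections.Generic;

// Assumed entity shape; only Id and Name appear in the slides.
[Serializable]
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }

    // Assumption: a customer tracks its orders by id.
    public List<Guid> OrderIds { get; } = new List<Guid>();
}

// Assumed read-only snapshot returned from queries.
[Serializable]
public class CustomerView
{
    public Guid Id { get; private set; }
    public string Name { get; private set; }
    public IList<Guid> OrderIds { get; private set; }

    public CustomerView(Customer customer)
    {
        Id = customer.Id;
        Name = customer.Name;
        // Copy the list so the view does not alias mutable model state.
        OrderIds = new List<Guid>(customer.OrderIds).AsReadOnly();
    }
}

public static class ViewDemo
{
    public static void Main()
    {
        var customer = new Customer { Id = Guid.NewGuid(), Name = "Homer" };
        customer.OrderIds.Add(Guid.NewGuid());

        var view = new CustomerView(customer);

        // A later change to the entity is not visible through the view.
        customer.OrderIds.Add(Guid.NewGuid());

        Console.WriteLine(view.Name);
        Console.WriteLine("{0} orders", view.OrderIds.Count);
    }
}
```

Copying into a view costs an allocation per query, but it keeps callers from mutating the model outside a command.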

Demos!
Geekstream
Graphs
origodb.com
spatial modeling – shortest path

Thank you!
• Try Origo!
• Contribute, it’s open source
• http://origodb.com
• @robertfriberg, robert@devrexlabs.com


1.
OrigoDB: Your data fits in RAM. @robertfriberg, robert@devrexlabs.com, origodb.com
2.
Price/GB vs GB/Server, 1980-2015: price per GB from 6,480,000 USD down to 8 USD; RAM per server from 0.001 GB up to 2000 GB
3.
Your data fits in RAM!
7.
THE RDBMS Architecture is Obsolete
8.
3. OrigoDB: Build faster systems faster

Editor's Notes

#3 The cost of RAM has dropped from millions of dollars per GB in 1980 to 8 dollars per GB in 2015. During the same period, the maximum RAM in a single commodity server has increased from around 1 MB to 2 TB. "99% of all OLTP databases are < 1 TB" (Michael Stonebraker). This means your data fits in RAM.
RAM in the cloud as of 2015-06-01:
• AWS (https://aws.amazon.com/ec2/instance-types/): R3.8xlarge, 32 cores, 244 GB
• Azure: 112 GB
#5 Why is memory so much faster? To give you some perspective: the convenience store down the block is 200 meters away, so there and back is 400 meters. 40000 km is the circumference of the earth. So why isn’t in-memory the default? Next slide...
#6 An application using a traditional disk-based database is not only slower, it is also more complex: more layers and subsystems that take time and cost money to develop and maintain. Some concrete problems:
• Moving data back and forth takes time, and the CRUD pattern brings concurrency issues
• Dual domain models: you have two models to maintain, the object-oriented domain model and the data model
• Mapping: object/relational mapping between the two models
• Caching: a disk-based database is so slow that you need a cache. Now you have three models to maintain, plus cache invalidation problems
• Source control of database objects
• Database migrations
• Debugging stored procedures
• Concurrency bugs due to the default isolation level of read committed
• Concurrency bugs and deadlocks due to explicit transaction management
#7 I use this slide to describe how a relational database processes write transactions: how the log works, the logical structure of data pages, B-trees, the buffer pool, the buffer manager, checkpoints and so on. Each table is stored as either a B-tree or a heap. Each secondary index is also represented as a B-tree.
While processing write transactions, the RDBMS writes new, deleted, original and modified data pages (of each table and index affected) to the transaction log. This is called effect logging. During periodic checkpoints, the data files are updated to reflect changes since the previous checkpoint. Data pages required by queries and transactions must be present in memory (the buffer pool).
Disk I/O is the main bottleneck in an RDBMS. Every aspect of the design and architecture is centered around this fact: the main goal is to minimize the number of random disk reads and writes.
#8 The relational database architecture is an impressive piece of engineering, highly optimized and evolved over more than 30 years. But the premises that were true when the architecture was conceived no longer hold. Your data fits in RAM.
#9 Not faster but easier. Simplicity. Consistency. Testing.
#10 So how do we do persistence if the data is in memory? How do we achieve durability, the D in ACID? The short answer: write the operations to a log, similar to the transaction log in an RDBMS, but log the operations themselves, not their effects. The current state of a system is a function of the previous state and the most recent operation applied to it. If we know the complete sequence of operations and the initial state, the current (and any intermediate) state can be reconstructed.
• OrigoDB state is an object graph defined using .NET types and collections
• The initial state is either provided by the user or created by calling a default constructor
• The entire sequence of operations is persisted to the journal
• The system is restored during startup by re-applying the operations to the initial state
• Operations must be deterministic and side-effect free
#11 One simple idea with many names and applications. Logging operations is not a new concept at all. Disk-based systems use it to persist transactions until the actual data is written to disk, and some systems use it for replication.
OrigoDB is nearly identical to Prevayler. Both have user-defined transactions, queries and an in-memory model, written in Java (Prevayler) and C# (OrigoDB). Both achieve persistence by logging and snapshots (optional in the case of OrigoDB). Martin Fowler calls the pattern "Memory Image"; Klaus Wuestefeld, founder of Prevayler, calls it System Prevalence.
Redis is similar to OrigoDB in that data is in-memory only and logging is used for persistence. Redis differs by having a predefined key/value store model where values can be simple values or complex data structures. And of course Redis is written in highly optimized C with superior performance.
Event Sourcing, coined by Greg Young, is an extension to Domain Driven Design where the state of a single Aggregate is defined by a sequence of Domain Events. One could say that OrigoDB is an event-sourced single aggregate.
#12 Here are the components of an OrigoDB application. Blue things are OrigoDB components that your application interacts with. Peach-colored things are things that you define or derive from.
In-memory database engine/server
• Code and data in same process
• Write-ahead command logging and snapshots
• Open source single DLL for .NET/Mono
• Commercial server with mirror replication
In-memory
• In-memory object graph, user defined. Probably collections, entities and references. Your choice.
• Is it a database? Yes. Is it an object database? Yes. Is it a graph database? Yes.
• Database. Linq queries.
Toolkit
• Flexible and configurable: kernels, storage, data model, persistence modes, formatting
• Bring your own model. This is key. Usually a product is based on a specific data model (VoltDB, Raven).
Naming: LiveDomain -> LiveDB -> OrigoDB
What is OrigoDB? OrigoDB is an in-memory database toolkit. The core component is the Engine. The engine is 100% ACID, runs in-process and hosts a user-defined data model. The data model can be domain specific or generic and is defined using plain old .NET types. Persistence is based on snapshots and write-ahead command logging to the underlying storage.
The Model
• is an instance of the user-defined data model
• lives in RAM only
• is the data
• is a projection of the entire sequence of commands applied to the initial model, which is usually empty
• can only be accessed through the engine
The Client
• has no direct reference to the model
• interacts directly with the Engine, either in-process or remote, or indirectly via a proxy with the same interface as the model
• passes query and command objects to the engine
The Engine
The Engine encapsulates an instance of the model and is responsible for atomicity, consistency, isolation and durability. It performs the following tasks:
• writes commands to the journal
• executes commands and queries
• reads and writes snapshots
• restores the model on startup
We call it a toolkit because you have a lot of options:
• Modelling: define your own model or use an existing one. Generic or domain specific. It’s up to you.
• Storage: the default is FileStore. Use SqlStore or write your own module.
• Data format: choose wire and storage format by plugging in different IFormatter implementations. Binary, JSON, ProtoBuf, etc.
Read more in the docs on Extensibility.
Design goals
Our initial design goals were focused on rapid development, testability, simplicity, correctness, modularity, flexibility and extensibility. Performance was never a goal, but running in-memory with memory-optimized data structures outperforms any disk-oriented system. But of course a lot of optimization is possible.
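The engine tasks listed above (journal first, then execute, and restore on startup by replay) can be sketched in a few dozen lines. This is a toy stand-in, not OrigoDB's implementation: `MiniEngine`, `MiniCommand`, `Counter` and `Increment` are hypothetical names, the journal is an in-memory list rather than durable storage, snapshots and rollback on failure are omitted, and the reader/writer lock is only one assumed way to isolate commands from queries.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical command base type for this sketch.
public abstract class MiniCommand<TModel>
{
    public abstract void Execute(TModel model);
}

public sealed class MiniEngine<TModel>
{
    private readonly TModel _model;
    private readonly List<MiniCommand<TModel>> _journal;
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Restores the model by replaying an existing journal against the initial state.
    public MiniEngine(TModel initial, IEnumerable<MiniCommand<TModel>> journal)
    {
        _model = initial;
        _journal = new List<MiniCommand<TModel>>(journal);
        foreach (var command in _journal) command.Execute(_model);
    }

    public IReadOnlyList<MiniCommand<TModel>> Journal
    {
        get { return _journal; }
    }

    // Commands get exclusive access and are journaled before they run.
    public void Execute(MiniCommand<TModel> command)
    {
        _lock.EnterWriteLock();
        try
        {
            _journal.Add(command);      // write-ahead: log first
            command.Execute(_model);    // then apply to the in-memory model
        }
        finally { _lock.ExitWriteLock(); }
    }

    // Queries share a read lock, so several can run concurrently.
    public TResult Execute<TResult>(Func<TModel, TResult> query)
    {
        _lock.EnterReadLock();
        try { return query(_model); }
        finally { _lock.ExitReadLock(); }
    }
}

public sealed class Counter { public int Value; }

public sealed class Increment : MiniCommand<Counter>
{
    public override void Execute(Counter model) { model.Value++; }
}

public static class MiniEngineDemo
{
    public static void Main()
    {
        var engine = new MiniEngine<Counter>(new Counter(), new MiniCommand<Counter>[0]);
        engine.Execute(new Increment());
        engine.Execute(new Increment());
        Console.WriteLine(engine.Execute(m => m.Value));    // prints 2

        // Simulated restart: a fresh model rebuilt from the journal alone.
        var restored = new MiniEngine<Counter>(new Counter(), engine.Journal);
        Console.WriteLine(restored.Execute(m => m.Value));  // prints 2
    }
}
```

Note that the client never touches the `Counter` directly; it only hands commands and query functions to the engine, which mirrors the rule that the model can only be accessed through the engine.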
#13 OrigoDB is a cousin of Event Sourcing. The entire database is a single aggregate and there is a single stream of events: the commands that were executed. We call it command journaling. If the database only stores current state, then previous states and the commands that caused the transitions to new states are lost. With command journaling you have a complete history of every single command that was executed. During system startup the commands in the journal are replayed, but there are other benefits. It’s possible to restore to a specific command or point in time. This is useful if you need to discard commands and roll back. It’s also possible to step through the code in a debugger or execute a query at a given point in time. In some applications it’s necessary to keep an audit log of every single change made. With OrigoDB this is automatic. The journal contains every single command including parameters, when it was executed and who made the request.
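Point-in-time restore follows directly from keeping a timestamp with each journaled command: replay from the initial state and stop at the requested moment. A minimal sketch under stated assumptions: `JournalEntry` and `PointInTime` are hypothetical types, the model is just a list of customer names, and the journal is assumed to be ordered by timestamp. OrigoDB's own journal format is not shown in the slides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical journal record: the command plus when and by whom it was issued.
public sealed class JournalEntry
{
    public DateTime Timestamp { get; }
    public string User { get; }
    public Action<List<string>> Command { get; }

    public JournalEntry(DateTime timestamp, string user, Action<List<string>> command)
    {
        Timestamp = timestamp;
        User = user;
        Command = command;
    }
}

public static class PointInTime
{
    // Rebuilds the model from an empty initial state, applying only the
    // entries recorded at or before the requested point in time.
    public static List<string> RestoreAsOf(IEnumerable<JournalEntry> journal, DateTime pointInTime)
    {
        var model = new List<string>();
        foreach (var entry in journal.TakeWhile(e => e.Timestamp <= pointInTime))
        {
            entry.Command(model);
        }
        return model;
    }

    public static void Main()
    {
        var t0 = new DateTime(2015, 6, 1, 9, 0, 0, DateTimeKind.Utc);
        var journal = new List<JournalEntry>
        {
            new JournalEntry(t0, "alice", m => m.Add("Homer")),
            new JournalEntry(t0.AddHours(1), "bob", m => m.Add("Marge")),
            new JournalEntry(t0.AddHours(2), "alice", m => m.Remove("Homer"))
        };

        // State just before the third command: both customers are still present.
        var model = RestoreAsOf(journal, t0.AddMinutes(90));
        Console.WriteLine(string.Join(", ", model));   // prints Homer, Marge

        // Audit trail: who issued a command, and when.
        foreach (var entry in journal)
        {
            Console.WriteLine("{0:u} {1}", entry.Timestamp, entry.User);
        }
    }
}
```

The same loop serves debugging and "what did the data look like last Tuesday" queries, since every intermediate state is reachable by choosing a different cutoff.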
#14 An instance of the model IS the database. Create your own domain specific model or choose a generic one. An object IS a strongly typed graph.
#15 A command is a write transaction. The engine calls the Execute method, passing a reference to the in-memory model. Command authoring guidelines:
• No side effects or external actions, like sending an email
• No external dependencies, like DateTime.Now or Random
• Unhandled exceptions trigger rollback (full restore)
• Call Command.Abort() to signal an exception, or throw CommandAbortedException
• Immutable is good
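The rule against external dependencies matters because commands are replayed during restore: anything read from the environment inside Execute can differ on replay. A small sketch of the difference, using hypothetical types (`OrderLog`, `PlaceOrderNow`, `PlaceOrder`) that do not depend on OrigoDB:

```csharp
using System;
using System.Collections.Generic;

// Toy model for illustration: the times at which orders were placed.
public sealed class OrderLog
{
    public readonly List<DateTime> PlacedAt = new List<DateTime>();
}

// Breaks the guideline: reads the clock inside Execute, so replaying the
// command later would record a different time than the original run.
public sealed class PlaceOrderNow
{
    public void Execute(OrderLog model) { model.PlacedAt.Add(DateTime.UtcNow); }
}

// Follows the guideline: the time is captured once when the command is
// created and travels with the command as immutable data.
public sealed class PlaceOrder
{
    public readonly DateTime PlacedAt;

    public PlaceOrder(DateTime placedAt) { PlacedAt = placedAt; }

    public void Execute(OrderLog model) { model.PlacedAt.Add(PlacedAt); }
}

public static class DeterminismDemo
{
    public static void Main()
    {
        var command = new PlaceOrder(DateTime.UtcNow);   // clock read once, outside Execute

        var original = new OrderLog();
        command.Execute(original);                       // first execution

        var replayed = new OrderLog();
        command.Execute(replayed);                       // simulated replay during restore

        Console.WriteLine(original.PlacedAt[0] == replayed.PlacedAt[0]);   // prints True
    }
}
```

The same approach applies to random values and generated ids: produce them before constructing the command, as the `Guid.NewGuid()` call in the Main example does.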
#16 A query is a read transaction. You have read access to the model in the Execute method.
#17 Engine.For<T>() is a complex method. It will look for a configuration string in the application configuration file and create either a local or a remote client. If local, it will look for a journal in the current directory, or in App_Data when running in a web context. If no journal exists, it will create a new one. The returned object is an IEngine<T>. The engine is thread safe: just pass commands and queries to it and that’s it. Now go write some code!
#18
• GeekStream: http://geekstream.devrexlabs.com, https://github.com/DevrexLabs/GeekStream
• OrigoDB: http://origodb.com, http://github.com/origodb
• OrigoDB Lite, the core essentials in 250 lines of code: http://github.com/rofr/origolite
• OrigoDB GeoSpatial and GraphModel using QuickGraph (quickgraph.codeplex.com): https://gist.github.com/rofr/d5fe5f553327dc00a26a
