Bitsy graph database



Bitsy is a small, fast, embeddable, durable in-memory graph database that implements the Blueprints API. It supports [ACID] transactions with optimistic concurrency control and on-disk persistence.





Usage Rights

© All Rights Reserved

Bitsy graph database: Presentation Transcript

  • Bitsy Graph Database. Sridhar Ramachandran, Founder, LambdaZen LLC
  • What is Bitsy? ● A small, fast, embeddable, durable, in-memory graph database. ● Maintains an on-disk copy of the graph database. ● Designed for multi-threaded OLTP applications. ● Provides ACID guarantees and optimistic concurrency control for transactions. ● Compatible with TinkerPop/Blueprints -- the graph database standard. [Figure: TinkerPop software stack, from https://github.com/tinkerpop/blueprints/wiki]
  • In-memory and durable? ● Bitsy maintains a copy of the entire graph in in-memory data structures. ● Bitsy saves all changes made to the database to disk during a commit operation. ● Commits from different threads are forced to disk together, which improves write performance in a multi-threaded OLTP environment. ● The database is loaded from files during startup. ● All database files are append-only text files with JSON-encoded vertices and edges. ● The database files are periodically compacted by a background thread.
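The combination of an in-memory graph and an append-only JSON file can be illustrated with a minimal Python sketch. It is not Bitsy's code or file format: the class name `AppendOnlyGraphLog` and the record fields (`type`, `id`, `props`, `out`, `in`, `label`) are assumptions made for this example, and compaction is left out.

```python
import json
import os
import tempfile


class AppendOnlyGraphLog:
    """Toy in-memory graph backed by an append-only JSON-lines file.

    Hypothetical illustration only; the record layout is an assumption.
    """

    def __init__(self, path):
        self.path = path
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # edge id -> (out vertex id, in vertex id, label)
        self._replay()

    def _apply(self, record):
        # Update the in-memory copy from a single log record.
        if record["type"] == "V":
            self.vertices[record["id"]] = record["props"]
        elif record["type"] == "E":
            self.edges[record["id"]] = (record["out"], record["in"], record["label"])

    def _replay(self):
        # Startup: rebuild the in-memory graph by reading the log front to back.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def commit(self, records):
        # Append all records of the transaction, then force them to disk
        # before the in-memory copy is updated.
        with open(self.path, "a", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        for record in records:
            self._apply(record)


# Usage: commit a vertex pair and an edge, then reopen the same file.
path = os.path.join(tempfile.mkdtemp(), "graph.log")
db = AppendOnlyGraphLog(path)
db.commit([
    {"type": "V", "id": "a", "props": {"name": "Alice"}},
    {"type": "V", "id": "b", "props": {"name": "Bob"}},
    {"type": "E", "id": "e1", "out": "a", "in": "b", "label": "knows"},
])
reopened = AppendOnlyGraphLog(path)  # replays the log into memory
```

Because records are only ever appended, a later record for the same id simply overrides the earlier one during replay, which is why periodic compaction of the files is useful.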
  • Design Principle #1: No Seek ● Bitsy appends all changes to an unordered transaction log, unlike most databases, which persist data in B-Trees and other ordered structures. ● Ordered data structures perform multiple seeks per updated element. ● Seek operations on the hard disk are expensive (5-15 ms). ● Bitsy avoids per-element seeks, and addresses rotational latency by combining commits from concurrent transactions. [Figure: Hard disk head. Seek operations require a mechanical movement of the hard disk head, which takes 5-15 ms. Rotational latency is the time taken for the requested sector on the rotating platter to reach the head, which takes 2-4 ms.]
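A back-of-the-envelope calculation shows why these latencies matter. The sketch below uses only the figures from the slide (5-15 ms per seek, 2-4 ms rotational latency). The cost model is a deliberate simplification that ignores transfer time and caching, and the batch size of 200 commits per flush is an arbitrary assumption, not a measured value.

```python
def max_commits_per_second(flush_latency_ms, commits_per_flush=1):
    """Upper bound on commit rate when each flush costs flush_latency_ms."""
    flushes_per_second = 1000.0 / flush_latency_ms
    return flushes_per_second * commits_per_flush


# Worst case from the slide: one 15 ms seek plus 4 ms rotational latency
# for every commit.
seek_per_commit = max_commits_per_second(15 + 4)

# Append-only log, no seek: only the 4 ms rotational latency per commit.
append_only = max_commits_per_second(4)

# Same log, but 200 concurrent commits (assumed) share a single flush.
grouped = max_commits_per_second(4, commits_per_flush=200)

print(round(seek_per_commit, 1), append_only, grouped)
```

Under this model a single seek per commit caps throughput at roughly 53 commits per second, removing the seek raises the cap to 250, and sharing one flush among many transactions multiplies the cap by the batch size.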
  • Design Principle #2: No Socket ● Typical databases run in a separate process, exposing a socket-based protocol to applications. ● The cost of serializing and deserializing requests and responses, and of calling OS-level functions, reduces the overall throughput of the database. ● By avoiding a socket-based protocol between the application and the database, Bitsy can achieve sub-microsecond query latencies. [Figure: The OSI model requires serialization and deserialization as the packet crosses from one layer to another.]
  • Design Principle #3: No SQL ● Tuning a SQL database is a non-trivial task. ● The biggest factor in a SQL query's efficiency is its execution plan. ● By avoiding SQL and the execution plans that come with it, Bitsy ensures that all queries and updates are efficient*. [Figure: An example execution plan from Oracle's documentation.] * The "allow full-graph scan" option must be disabled to guarantee quick responses.
  • Concurrency Model ● Bitsy is designed to work in multi-threaded OLTP environments. ● It implements optimistic concurrency control, in which edges and vertices are tied to version numbers that are incremented on updates. ● A BitsyRetryException is raised during a transaction commit if an updated vertex/edge has a different version at commit time than it had at query time. ● The application should retry the entire transaction in case of a conflict.
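The version check and the retry loop can be sketched in a few lines of Python. This is a generic illustration of optimistic concurrency control, not Bitsy's implementation: `VersionedStore`, `RetryException` (a stand-in for the BitsyRetryException named above), and `run_with_retry` are hypothetical names introduced for this example.

```python
import threading


class RetryException(Exception):
    """Hypothetical stand-in for the BitsyRetryException named in the slide."""


class VersionedStore:
    """Toy key-value store in which every element carries a version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Return the version seen at query time together with the value.
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        # Reject the commit if someone else updated the element since the read.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != read_version:
                raise RetryException(key)
            self._data[key] = (current_version + 1, new_value)


def run_with_retry(store, key, update, max_attempts=10):
    """Re-run the whole read-modify-commit transaction on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            store.commit(key, version, update(value))
            return
        except RetryException:
            continue  # stale read: start the transaction over
    raise RuntimeError("gave up after %d attempts" % max_attempts)


# Usage: a stale commit is rejected, a retried transaction succeeds.
store = VersionedStore()
version, _ = store.read("v1")
store.commit("v1", version, 10)       # succeeds, version becomes 1
try:
    store.commit("v1", version, 20)   # still uses version 0: conflict
except RetryException:
    pass
run_with_retry(store, "v1", lambda old: old + 1)
```

Note that the retry wraps the read as well as the write. Retrying only the commit would reuse the stale version and fail again.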
  • Write Algorithms ● The write algorithms operate on three levels of "double buffers". ● The transaction buffers capture transactions to be committed simultaneously. ● The commit waits for the buffer to flush to a transaction file (A/B). ● Transaction files are moved to vertex and edge files on exceeding a threshold size (default is 4MB). ● Vertex and edge files are reorganized after a period of growth (default is +1x initial size). ● Online backups trigger a transaction flush, and then copy the vertex and edge files representing the DB snapshot.
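The first level, a double buffer that collects transactions for a combined flush, might look like the Python sketch below. It shows only the buffer swap. The class `DoubleBufferedLog` and its `sink` callback are assumptions made for this example, and the sketch omits what the slide describes beyond the swap: committers blocking until the flush completes, the A/B transaction files, and the later moves into vertex and edge files.

```python
import threading


class DoubleBufferedLog:
    """Group commit with two buffers: one fills while the other is flushed.

    Hypothetical sketch; `sink` is any callable that receives one list of
    records per flush (for example, a function that appends them to a file).
    """

    def __init__(self, sink):
        self._sink = sink
        self._lock = threading.Lock()        # guards the buffer references
        self._flush_lock = threading.Lock()  # allows one flush at a time
        self._active = []                    # receives new transactions
        self._spare = []                     # empty buffer used for the swap

    def enqueue(self, record):
        with self._lock:
            self._active.append(record)

    def flush(self):
        """Swap buffers, write the full one in a single call, return its size."""
        with self._flush_lock:
            # Writers are blocked only for the duration of the swap.
            with self._lock:
                batch, self._active = self._active, self._spare
            count = len(batch)
            if count:
                self._sink(list(batch))      # one write for the whole batch
            batch.clear()
            with self._lock:
                self._spare = batch          # recycle the flushed buffer
            return count


# Usage: three transactions enqueued before the flush reach the sink together.
written = []
log = DoubleBufferedLog(written.append)
for txn in ("t1", "t2", "t3"):
    log.enqueue(txn)
flushed = log.flush()
```

While the sink is busy with one buffer, new transactions accumulate in the other, so the next flush again carries a whole batch rather than a single transaction.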
  • Write throughput in an OLTP setting ● The plot below shows the throughput of a test application* that repeatedly commits a small transaction (1 vertex + 1 edge) from multiple threads. ● The throughput exceeds 50K ops/second at 750 concurrent threads. ● The comparison with Neo4J 1.9.2 illustrates the benefit of "No Seek". * Tests performed on a $600 HP p7-1287c desktop PC with a single 7200 rpm hard disk.
  • Read throughput in an OLTP setting ● The plot below shows the read throughput of threads repeatedly traversing separate portions of the graph on a desktop PC*. ● Bitsy implements mostly lock-free read algorithms that can perform close to 20M ops/second at 1000 threads -- on par with Neo4J's warm caches. * Tests performed on a $600 HP p7-1287c desktop PC with 4 cores.
  • Monitoring and Management ● Offline backup and restore operations are simple file copy operations on the database directory. ● Bitsy exposes a JMX interface to make online backups and adjust database parameters. ● Bitsy logs messages using the SLF4J API with logger names starting with "com.lambdazen". [Figure: Online backup through jconsole.]
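Since an offline backup is described as a plain copy of the database directory, it can be scripted with standard file utilities. The Python sketch below assumes the database has been shut down before either function runs. The function names and directory arguments are hypothetical and not part of Bitsy.

```python
import shutil
from pathlib import Path


def offline_backup(db_dir, backup_dir):
    """Copy the (closed) database directory to a new backup directory."""
    # copytree fails if backup_dir already exists, which protects old backups.
    shutil.copytree(db_dir, backup_dir)


def offline_restore(backup_dir, db_dir):
    """Replace the (closed) database directory with the backup's contents."""
    if Path(db_dir).exists():
        shutil.rmtree(db_dir)  # remove stale files before copying back
    shutil.copytree(backup_dir, db_dir)
```

Online backups, in contrast, go through the JMX interface while the database stays open.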
  • Dependencies ● Blueprints Core ● Jackson JSON Processor ● SLF4J API ● Ness Computing Core Component: for fast UUID serialization/deserialization
  • License ● Bitsy is a dual-licensed product. ● The AGPL v3 license can be used for open-source projects and internally-used closed-source projects. ● The commercial license is an extremely liberal license that provides rights to modify and use Bitsy in an unlimited number of instances, products* and services. Pricing details with a 15% promotional discount (till Feb 2014): ○ Startups and small businesses (1-10 employees): $425 annual, $1699 perpetual ○ Medium-sized enterprises (10-500 employees): $849 annual, $3399 perpetual ○ Large-sized enterprises (500+ employees): $1275 annual, $5099 perpetual * The products must not encourage the direct use of Bitsy APIs.
  • Wrap-up ● Bitsy is a small, fast, embeddable, durable, in-memory graph database, with the following features: ○ ACID guarantees and clean recovery from crashes ○ Sub-microsecond query latency ○ High transaction throughput in an OLTP setting with multiple clients/threads accessing the database ○ Well-defined optimistic concurrency model ○ Support for online backups ○ Human-readable database files ○ Small code footprint (~1.5MB with dependencies) ● Bitsy is dual-licensed under AGPL and a liberal commercial license for unlimited enterprise-wide use.
  • Questions and Feedback ● The project is hosted at https://bitbucket.org/lambdazen/bitsy with publicly accessible: ○ Documentation and install instructions (in Wiki) ○ Links to downloads ○ Issue management ● Please email your questions and feedback to bisty@lambdazen.com