Bitsy graph database



Bitsy is a small, fast, embeddable, durable, in-memory graph database that implements the Blueprints API. It supports ACID transactions with optimistic concurrency control and on-disk persistence.

Published in: Technology


  1. Bitsy Graph Database
     Sridhar Ramachandran, Founder, LambdaZen LLC
  2. What is Bitsy?
     ● A small, fast, embeddable, durable, in-memory graph database.
     ● Maintains an on-disk copy of the graph database.
     ● Designed for multi-threaded OLTP applications.
     ● Provides ACID guarantees and optimistic concurrency control for transactions.
     ● Compatible with TinkerPop/Blueprints, the graph database standard.
     [Figure: TinkerPop software stack]
  3. In-memory and durable?
     ● Bitsy maintains a copy of the entire graph in in-memory data structures.
     ● Bitsy writes all changes made to the database to disk during a commit operation.
     ● Commits from different threads are forced to disk together, which improves write performance in a multi-threaded OLTP environment.
     ● The database is loaded from files during startup.
     ● All database files are append-only text files with JSON-encoded vertices and edges.
     ● The database files are periodically compacted by a background thread.
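The append-on-commit and load-at-startup behaviour described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bitsy's file format or code: the class `AppendOnlyStore`, the tab-separated record layout, and the "last record wins" replay rule are all hypothetical, and the JSON text is assumed to contain no raw tab or newline characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; Bitsy's real classes and file layout differ.
class AppendOnlyStore {
    private final Path file;
    private final Map<String, String> vertices = new HashMap<>();

    // Startup: rebuild the in-memory copy by replaying the file in order.
    AppendOnlyStore(Path file) {
        this.file = file;
        if (!Files.exists(file)) {
            return;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // ignore an empty or malformed line
                }
                // A later record for the same id replaces the earlier one.
                vertices.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Commit: append the record to the file, then update the in-memory copy.
    void put(String id, String json) {
        String record = id + "\t" + json + "\n";
        try {
            // SYNC asks the OS to write the data through to the storage device.
            Files.write(file, record.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                    StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        vertices.put(id, json);
    }

    // Reads are served from memory only.
    String get(String id) {
        return vertices.get(id);
    }
}
```

Because records are only ever appended, superseded records accumulate in the file; that is the space a periodic compaction pass would reclaim.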
  4. Design Principle #1: No Seek
     ● Bitsy appends all changes to an unordered transaction log, unlike most databases, which persist data in B-Trees and other ordered structures.
     ● Ordered data structures perform multiple seeks per updated element.
     ● Seek operations on the hard disk are expensive (5-15 ms).
     ● Bitsy avoids seeks per element, and addresses rotational latency by combining commits from concurrent transactions.
     [Figure: Hard disk head. Seek operations require a mechanical movement of the hard disk head, which takes 5-15 ms. Rotational latency is the time taken for the requested sector in the rotating platter to reach the head; it takes 2-4 ms.]
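A back-of-envelope model makes the argument above concrete. The sketch below is an assumption-laden estimate, not a measurement: the class `DiskCostModel` is hypothetical, transfer time and caching are ignored, and the figures of 2 seeks per commit and 100 commits per batch are illustrative values chosen here, not numbers from the slides. Only the 5-15 ms seek and 2-4 ms rotational latency ranges come from the slide.

```java
// Hypothetical cost model; ignores transfer time, caches, and OS scheduling.
class DiskCostModel {
    // Commits per second when every commit pays for its own seeks
    // plus one rotational wait.
    static double perCommit(double seekMs, int seeksPerCommit, double rotationalMs) {
        return 1000.0 / (seekMs * seeksPerCommit + rotationalMs);
    }

    // Commits per second when batchSize commits are appended together,
    // paying no seek and sharing a single rotational wait.
    static double grouped(double rotationalMs, int batchSize) {
        return batchSize * 1000.0 / rotationalMs;
    }
}
```

With a 10 ms seek, 2 seeks per commit, and a 4 ms rotational wait, `perCommit(10, 2, 4)` gives roughly 42 commits per second, while `grouped(4, 100)` gives 25,000 under the same assumptions.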
  5. Design Principle #2: No Socket
     ● Typical databases run in a separate process exposing a socket-based protocol to applications.
     ● The cost of serializing and deserializing the requests and responses, and of calling OS-level functions, reduces the overall throughput of the database.
     ● By avoiding a socket-based protocol between the application and the database, Bitsy can achieve sub-microsecond query latencies.
     [Figure: The OSI model requires serialization and deserialization as the packet crosses from one layer to another.]
  6. Design Principle #3: No SQL
     ● Tuning a SQL database is a non-trivial task.
     ● The biggest factor in a SQL query's efficiency is its execution plan.
     ● By avoiding SQL and the execution plans that come with it, Bitsy ensures that all queries and updates are efficient*.
     [Figure: An example execution plan from Oracle's documentation]
     * The "allow full-graph scan" option must be disabled to guarantee quick responses.
  7. Concurrency Model
     ● Bitsy is designed to work in multi-threaded OLTP environments.
     ● It implements optimistic concurrency control, in which edges and vertices carry version numbers that are incremented on updates.
     ● A BitsyRetryException is raised during a transaction commit if an updated vertex or edge has a different version at commit time than it had when it was queried.
     ● The application should retry the entire transaction in case of conflict.
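The version check and the retry loop described on this slide can be sketched as follows. This is an illustration of the general technique, not Bitsy's implementation: `OptimisticStore`, `Versioned`, and `RetryException` (a stand-in for BitsyRetryException) are hypothetical names, and a single string value stands in for a vertex or edge.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch of optimistic concurrency control with version numbers.
class OptimisticStore {
    // Stand-in for BitsyRetryException; not the real class.
    static class RetryException extends RuntimeException {
        RetryException(String message) {
            super(message);
        }
    }

    // Immutable snapshot of an element: its value and its version.
    static final class Versioned {
        final String value;
        final int version;

        Versioned(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final ConcurrentHashMap<String, Versioned> data = new ConcurrentHashMap<>();

    void insert(String id, String value) {
        data.put(id, new Versioned(value, 0));
    }

    Versioned read(String id) {
        return data.get(id);
    }

    // Commit succeeds only if the stored version still matches the version
    // seen at read time; otherwise the mapping is left unchanged.
    void commit(String id, Versioned seen, String newValue) {
        data.compute(id, (key, current) -> {
            if (current == null || current.version != seen.version) {
                throw new RetryException("version conflict on " + key);
            }
            return new Versioned(newValue, current.version + 1);
        });
    }

    // Retry the whole read-modify-commit cycle when a conflict is reported.
    String update(String id, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned seen = read(id);
            String updated = change.apply(seen.value);
            try {
                commit(id, seen, updated);
                return updated;
            } catch (RetryException e) {
                // Another transaction committed first; re-read and try again.
            }
        }
        throw new RetryException("gave up after " + maxAttempts + " attempts");
    }
}
```

The retry loop re-reads before recomputing, so the change function is always applied to the latest committed value rather than to stale data.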
  8. Write Algorithms
     ● The write algorithms operate on three levels of "double buffers".
     ● The transaction buffers capture transactions to be committed simultaneously.
     ● The commit waits for the buffer to flush to a transaction file (A/B).
     ● Transaction files are moved to vertex and edge files on exceeding a threshold size (default is 4MB).
     ● Vertex and edge files are reorganized after a period of growth (default is +1x initial size).
     ● Online backups trigger a transaction flush, and then copy the vertex and edge files representing the DB snapshot.
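The "double buffer" idea at the first level can be sketched as below: writers fill one buffer while a flusher drains the other, and the two swap roles. This is a generic sketch, not Bitsy's write path; the class `DoubleBuffer` is hypothetical, and it assumes a single flusher thread that finishes with a returned batch before calling `swap()` again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical double buffer: one list accepts writes while the other is flushed.
class DoubleBuffer<T> {
    private List<T> active = new ArrayList<>();
    private List<T> spare = new ArrayList<>();

    // Called by committing threads; only the active buffer is touched.
    synchronized void add(T item) {
        active.add(item);
    }

    // Called by the single flusher thread. Swaps the buffers under the lock
    // and returns the filled one, which can be written to disk outside the
    // lock. The returned list is reused (cleared) on the NEXT call to swap().
    synchronized List<T> swap() {
        List<T> full = active;
        spare.clear();
        active = spare;
        spare = full;
        return full;
    }
}
```

Writers are blocked only for the duration of the swap, not for the duration of the disk write, which is why every commit captured in one batch can share a single flush.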
  9. Write throughput in an OLTP setting
     ● The plot below shows the throughput of a test application* that repeatedly commits a small transaction (1 vertex + 1 edge) from multiple threads.
     ● The throughput exceeds 50K ops/second at 750 concurrent threads.
     ● The comparison with Neo4J 1.9.2 illustrates the benefit of "No Seek".
     * Tests performed on a $600 HP p7-1287c desktop PC with a single 7200 rpm hard disk.
  10. Read throughput in an OLTP setting
      ● The plot below shows the read throughput of threads that repeatedly traverse separate portions of the graph on a desktop PC*.
      ● Bitsy implements mostly lock-free read algorithms that can perform close to 20M ops/second at 1000 threads, on par with Neo4J's warm caches.
      * Tests performed on a $600 HP p7-1287c desktop PC with 4 cores.
  11. Monitoring and Management
      ● Offline backup and restore operations are simple file copy operations on the database directory.
      ● Bitsy exposes a JMX interface to make online backups and adjust database parameters.
      ● Bitsy logs messages using the SLF4J API with logger names starting with "com.lambdazen".
      [Figure: Online backup through jconsole]
  12. Dependencies
      ● Blueprints Core
      ● Jackson JSON Processor
      ● SLF4J API
      ● Ness Computing Core Component: for fast UUID serialization/deserialization
  13. License
      ● Bitsy is a dual-licensed product.
      ● The AGPL v3 license can be used for open-source projects and internally-used closed-source projects.
      ● The commercial license is an extremely liberal license that provides rights to modify and use Bitsy in an unlimited number of instances, products* and services.
      Pricing details with a 15% promotional discount (till Feb 2014):
        Customer size                                     Annual    Perpetual
        Startups and small businesses (1-10 employees)    $425      $1699
        Medium-sized enterprises (10-500 employees)       $849      $3399
        Large-sized enterprises (500+ employees)          $1275     $5099
      * The products must not encourage the direct use of Bitsy APIs.
  14. Wrap-up
      ● Bitsy is a small, fast, embeddable, durable, in-memory graph database, with the following features:
        ○ ACID guarantees and clean recovery from crashes
        ○ Query latency in sub-microseconds
        ○ High transaction throughput in an OLTP setting with multiple clients/threads accessing the database
        ○ Well-defined optimistic concurrency model
        ○ Support for online backups
        ○ Human-readable database files
        ○ Small code footprint (~1.5MB with dependencies)
      ● Bitsy is dual-licensed under AGPL and a liberal commercial license for unlimited enterprise-wide use.
  15. Questions and Feedback
      ● The project is hosted at https://bitbucket.org/lambdazen/bitsy with publicly accessible
        ○ Documentation and install instructions (in Wiki)
        ○ Links to downloads
        ○ Issue management
      ● Please email your questions and feedback to