An introduction to Pincaster


Published on

An introduction to the Pincaster nosql data store.

Published in: Technology
1 Comment
No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

  • An introduction to Pincaster

    1. 1. This is just a kiwi, what were you thinking about?
    2. 2. What is Pincaster ? • Not Only SQL (well, not at all) • Not only a database for geolocalized apps • Not only a key/value store • Not only a persistent database • Not only a non-relational database • Not only a database.
    3. 3. Speaks HTTP / JSON • Adds a slight overhead over a binary protocol, but makes it dead simple to use in almost any language and environment. • Pincaster is fast and lightweight no matter what. And there's plenty of room for optimization. HTTP keep alive is supported. • Written in C. Runs on OSX / Linux / *BSD and doesn't require any external dependency. Valgrind doesn't cry. BSD license. • Event-driven model with asynchronous workers. Powered by the awesome libevent2 library. • And shamelessly reuses (in a different implementation) some of the nice concepts from Redis. Why not?
    4. 4. Threads HTTP server OpReply queue Worker Zero copy Worker Domain handler Op queue Worker Journal rewriter fork()ed
    5. 5. Layers • A layer is like a database, identified by a unique name. • A layer contains a set of records. • Layers are independent and can have different settings. • Layers can be created / deleted online. • Layers can mix different types of records.
    6. 6. Void records • Just unique keys. • Fast and memory efficient. • Serialized data can be embedded in keys. • Useful as flags and in range queries.
    7. 7. Hashes Property Binary-safe value Key Property Binary-safe value Property Binary-safe value
    8. 8. Atomic operations No transaction, but multiple changes can be combined as a single atomic operation: • Add new properties • Update properties • Delete properties • Change special properties • Increment/decrement counters which are automatically created if needed.
    9. 9. Points Latitude, longitude Key or x,y Location is set through the special _loc property. Indexed with quad-trees. Space efficient, points are grouped into buckets. Designed for dynamic data like geolocalized applications.
    10. 10. Layers types • Flat: rectangular area (x0, y0) - (x1, y1) • Flatwrap: rectangular area with wrapping. • Spherical / geoidal - WGS84 - GPS and map services friendly. Handle corner cases. • Pick your function for non-euclidian distance computation: rhomboid, fast, great circle or haversine.
    11. 11. Simple spatial queries • Find points within a radius (euclidian distance for flat/flatwrap or meters for spherical/geoidal layers). • Find points within a rectangle. Wraparound is properly handled. You can directly query according to the Google Map viewing area. • Overflow is reported. • Clustering is on the way.
    12. 12. Points+hashes Location Key Property Binary-safe value Property Binary-safe value Spatial queries can optionally return properties.
    13. 13. Expirable records • Records can automatically expire by setting an _expires_at property. • Expiration dates can be changed, removed and readded at will. • Can act like a memcache speaking HTTP/ JSON with a set of properties per record. Also useful for ephemeral geo data (e.g. when storing location of online users).
    14. 14. Range queries • Keys are lexically sorted (red-black tree). • Hence, range queries are cheap. • Pincaster currently offers prefix-matching. • Results of range queries can include keys, keys + overview or keys + overview + properties.
    15. 15. Linked records • Symbolic links to other records with special properties starting with $link: . • Implements N:1 relations but 1:N and N:N can be represented by multiple links. Records can have any number of links. • No referential integrity. • Useful for easy retrieval of related records. But nowhere an alternative to a graph database. • Just add link=1 to any query in order to traverse links and retrieve related records (duplicate records and loops are tracked).
    16. 16. Name Donald Duck Donald Location $link:favorite restaurant Rest1234 Rest1234 Description Mac Donald's Location
    17. 17. Public HTTP service • Public data can be embedded in records, through $content and $content_type properties. • Especially useful to store JSON data and HTML partials that can be directly served to browsers. You can also think about it like memcache with an embedded web server. • Should be used behind a proxy or a filtering load balancer, though.
    18. 18. Expiration Location Donald Name Donald Duck $content <html>Quack quack!</html> $content_type text/html http://host/api/1.0/records/users/Donald.json Public: http://host/public/users/Donald.html Quack quack!
    19. 19. Durability • Similar to Redis AOF, but with a non-binary (human readable and tweakable) journal. • Data set and indices are kept in memory. • But an append-only journal can log every query that is going to change a database. Timestamps allow point-in-time rollback. • Configurable fsync() policy: after every commit, never or after x seconds. • Combines efficiency of an in-memory database with durability.
    20. 20. Journal rewrite • Reduces startup time. • Constructs a new journal with only needed operations to reconstruct the data set. • Background operation happening in a child process with a low priority. • Takes advantage of copy-on-write through fork() and appends missing records afterwards. • The new journal atomically replaces the previous one after successful completion.
    21. 21. Coming soon • Replication • Clustering of spatial results (partly implemented). • Optimization (distance computation, finer grained rwlocks, use slabs in more areas). • Spidermonkey integration. • Automatically move expired records to another layer. • Observers (push a notification over a HTTP channel when there's an update in a geographic zone). • Client libraries and possibly a decent web site (help needed!)
    22. 22. jedisct1/Pincaster/
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.