An introduction to Pincaster

An introduction to the Pincaster NoSQL data store.

Transcript

  • 1. This is just a kiwi, what were you thinking about?
  • 2. What is Pincaster?
      • Not Only SQL (well, not at all)
      • Not only a database for geolocalized apps
      • Not only a key/value store
      • Not only a persistent database
      • Not only a non-relational database
      • Not only a database.
  • 3. Speaks HTTP / JSON
      • Adds a slight overhead over a binary protocol, but makes it dead simple to use in almost any language and environment.
      • Pincaster is fast and lightweight no matter what. And there's plenty of room for optimization. HTTP keep-alive is supported.
      • Written in C. Runs on OSX / Linux / *BSD and doesn't require any external dependency. Valgrind doesn't cry. BSD license.
      • Event-driven model with asynchronous workers. Powered by the awesome libevent2 library.
      • And shamelessly reuses (in a different implementation) some of the nice concepts from Redis. Why not?
  • 4. Threads (diagram; labels: HTTP server, Op, Reply queue, Worker, Zero copy, Worker, Domain handler, Op queue, Worker, Journal rewriter (fork()ed))
  • 5. Layers
      • A layer is like a database, identified by a unique name.
      • A layer contains a set of records.
      • Layers are independent and can have different settings.
      • Layers can be created / deleted online.
      • Layers can mix different types of records.
  • 6. Void records
      • Just unique keys.
      • Fast and memory efficient.
      • Serialized data can be embedded in keys.
      • Useful as flags and in range queries.
  • 7. Hashes (diagram: one key mapped to several properties, each holding a binary-safe value)
  • 8. Atomic operations
    No transactions, but multiple changes can be combined as a single atomic operation:
      • Add new properties
      • Update properties
      • Delete properties
      • Change special properties
      • Increment/decrement counters, which are automatically created if needed.
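
One way to picture "several changes, one atomic operation" is the sketch below. It is not Pincaster's code or API: `apply_atomically` and its parameters are hypothetical names, a plain dict stands in for a record, and a single lock stands in for whatever synchronization the server really uses. The changes are prepared on a copy and swapped in while the lock is held, so other threads that take the same lock never observe a half-applied update.

```python
import threading

_record_lock = threading.Lock()  # assumption: one global lock, for illustration only


def apply_atomically(record, set_props=None, delete_props=(), increments=None):
    """Apply sets, deletes and counter increments to `record` as one step."""
    with _record_lock:
        updated = dict(record)                # work on a copy
        updated.update(set_props or {})       # add new or update existing properties
        for name in delete_props:
            updated.pop(name, None)           # deleting a missing property is a no-op
        for name, delta in (increments or {}).items():
            # counters are created on first use, starting from 0
            updated[name] = int(updated.get(name, 0)) + delta
        record.clear()
        record.update(updated)                # swap in the fully built result
    return record


user = {"name": "Donald", "tmp": "x"}
apply_atomically(user, set_props={"city": "Duckburg"},
                 delete_props=["tmp"], increments={"visits": 1})
```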
  • 9. Points (diagram: a key mapped to a location, given as latitude, longitude or x, y)
      • Location is set through the special _loc property.
      • Indexed with quad-trees.
      • Space efficient, points are grouped into buckets.
      • Designed for dynamic data like geolocalized applications.
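
To illustrate what "quad-trees with buckets" means in general, here is a minimal point quad-tree sketch. It is a textbook structure, not Pincaster's implementation: the class name, `bucket_size`, and `max_depth` are assumptions made for this example. Each leaf holds a small bucket of points and splits into four quadrants once the bucket overflows.

```python
class QuadTree:
    """Point quad-tree over the rectangle (x0, y0) - (x1, y1) with leaf buckets."""

    def __init__(self, x0, y0, x1, y1, bucket_size=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.bucket_size = bucket_size
        self.depth = depth
        self.max_depth = max_depth      # guard against endless splits on identical points
        self.points = []                # (key, x, y) tuples while this node is a leaf
        self.children = None            # four sub-trees once the node has split

    def _middle(self):
        x0, y0, x1, y1 = self.bounds
        return (x0 + x1) / 2, (y0 + y1) / 2

    def _child_for(self, x, y):
        mx, my = self._middle()
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, key, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(key, x, y)
            return
        self.points.append((key, x, y))
        if len(self.points) > self.bucket_size and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = self._middle()
        args = (self.bucket_size, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        points, self.points = self.points, []
        for key, x, y in points:        # redistribute the bucket into the quadrants
            self._child_for(x, y).insert(key, x, y)

    def query_rect(self, qx0, qy0, qx1, qy1):
        """Return keys of points inside the query rectangle (edges inclusive)."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []                   # no overlap: prune this whole subtree
        if self.children is None:
            return [k for k, x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        found = []
        for child in self.children:
            found.extend(child.query_rect(qx0, qy0, qx1, qy1))
        return found
```

The pruning step in `query_rect` is what makes spatial queries cheap: quadrants that do not overlap the query area are never visited.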
  • 10. Layer types
      • Flat: rectangular area (x0, y0) - (x1, y1)
      • Flatwrap: rectangular area with wrapping.
      • Spherical / geoidal - WGS84 - GPS and map services friendly. Handles corner cases.
      • Pick your function for non-Euclidean distance computation: rhomboid, fast, great circle or haversine.
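
As a reference point for the distance functions named above, this is the standard haversine formula in a short sketch. It is not taken from Pincaster's source: the function name and the mean Earth radius constant are assumptions, and the formula treats the Earth as a perfect sphere, so it only approximates distances on the WGS84 ellipsoid.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing `a` slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude is roughly 111 km on this sphere.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
```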
  • 11. Simple spatial queries
      • Find points within a radius (Euclidean distance for flat/flatwrap layers, meters for spherical/geoidal layers).
      • Find points within a rectangle. Wraparound is properly handled. You can directly query according to the Google Maps viewing area.
      • Overflow is reported.
      • Clustering is on the way.
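
The wraparound case for rectangle queries can be sketched as follows. This is an illustration under assumptions, not Pincaster's query code: the function names are hypothetical, the scan is linear rather than index-based, and a rectangle whose west edge is greater than its east edge is taken to cross the 180° meridian, which is how a map viewport spanning the date line is commonly described.

```python
def in_rect(lat, lon, south, west, north, east):
    """True if (lat, lon) is inside the rectangle; west > east means it wraps."""
    if not (south <= lat <= north):
        return False
    if west <= east:
        return west <= lon <= east       # ordinary rectangle
    return lon >= west or lon <= east    # rectangle crossing the 180° meridian


def find_in_rect(points, south, west, north, east):
    """Return the sorted keys of all points inside the rectangle."""
    return sorted(key for key, (lat, lon) in points.items()
                  if in_rect(lat, lon, south, west, north, east))


points = {"fiji": (-17.7, 178.0), "samoa": (-13.8, -172.1), "paris": (48.85, 2.35)}
pacific = find_in_rect(points, south=-30, west=170, north=0, east=-170)
```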
  • 12. Points+hashes (diagram: one key mapped to a location and to several properties, each holding a binary-safe value)
      • Spatial queries can optionally return properties.
  • 13. Expirable records
      • Records can automatically expire by setting an _expires_at property.
      • Expiration dates can be changed, removed and re-added at will.
      • Can act like a memcache speaking HTTP/JSON with a set of properties per record. Also useful for ephemeral geo data (e.g. when storing the location of online users).
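
The expiration behaviour described above can be sketched with a small in-memory store. Everything except the `_expires_at` property name is an assumption made for this example: the class and method names are hypothetical, the expiration value is assumed to be a numeric timestamp, and records are dropped lazily on read, which may differ from how Pincaster actually reclaims them.

```python
class ExpiringStore:
    """Toy record store where the `_expires_at` property controls expiration."""

    def __init__(self):
        self._records = {}

    def set(self, key, properties):
        """Create the record if needed and merge the given properties into it."""
        self._records.setdefault(key, {}).update(properties)

    def unset(self, key, prop):
        """Remove one property, e.g. `_expires_at` to make a record permanent."""
        self._records.get(key, {}).pop(prop, None)

    def get(self, key, now):
        """Return the record, or None if it is missing or expired at time `now`."""
        record = self._records.get(key)
        if record is None:
            return None
        expires_at = record.get("_expires_at")
        if expires_at is not None and now >= expires_at:
            del self._records[key]       # lazy expiration on access
            return None
        return record


store = ExpiringStore()
store.set("Donald", {"Name": "Donald Duck", "_expires_at": 100})
store.set("Donald", {"_expires_at": 200})   # change the expiration date
```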
  • 14. Range queries
      • Keys are lexically sorted (red-black tree).
      • Hence, range queries are cheap.
      • Pincaster currently offers prefix-matching.
      • Results of range queries can include keys, keys + overview or keys + overview + properties.
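
Why sorted keys make prefix matching cheap can be shown in a few lines. This sketch uses a sorted Python list with `bisect` as a stand-in for the red-black tree mentioned above; `prefix_range` is a hypothetical helper, not a Pincaster function. A binary search finds the first candidate, and every matching key then sits in one contiguous run.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return all keys starting with `prefix` from a lexically sorted list."""
    start = bisect.bisect_left(sorted_keys, prefix)   # first key >= prefix
    end = start
    # Matching keys are contiguous, so stop at the first key that does not match.
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


keys = sorted(["user:1", "user:10", "user:2", "rest:1234", "zone:a"])
users = prefix_range(keys, "user:")
```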
  • 15. Linked records
      • Symbolic links to other records, stored in special properties whose names start with $link:.
      • Implements N:1 relations, but 1:N and N:N can be represented by multiple links. Records can have any number of links.
      • No referential integrity.
      • Useful for easy retrieval of related records. But nowhere near an alternative to a graph database.
      • Just add link=1 to any query in order to traverse links and retrieve related records (duplicate records and loops are tracked).
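
Link traversal with duplicate and loop tracking can be sketched as a breadth-first walk with a "seen" set. The `$link:` prefix comes from the slide; everything else is an assumption for illustration: records are plain dicts in a single layer, a link value is the key of the target record, and `traverse_links` is a hypothetical helper rather than Pincaster's traversal code.

```python
LINK_PREFIX = "$link:"


def traverse_links(records, start_key):
    """Return the keys reachable from `start_key`, each visited at most once."""
    seen = set()
    order = []
    pending = [start_key]
    while pending:
        key = pending.pop(0)
        if key in seen or key not in records:
            continue                     # skip duplicates, loops and dangling links
        seen.add(key)
        order.append(key)
        for prop, target in records[key].items():
            if prop.startswith(LINK_PREFIX):
                pending.append(target)
    return order


records = {
    "Donald": {"Name": "Donald Duck", "$link:favorite restaurant": "Rest1234"},
    "Rest1234": {"Description": "Mac Donald's", "$link:best customer": "Donald"},
}
related = traverse_links(records, "Donald")
```

Because there is no referential integrity, the sketch silently ignores links whose target no longer exists.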
  • 16. Diagram: two linked records.
      • Donald: Name = Donald Duck; Location; $link:favorite restaurant = Rest1234
      • Rest1234: Description = Mac Donald's; Location
  • 17. Public HTTP service
      • Public data can be embedded in records, through $content and $content_type properties.
      • Especially useful to store JSON data and HTML partials that can be directly served to browsers. You can also think of it as memcache with an embedded web server.
      • Should be used behind a proxy or a filtering load balancer, though.
  • 18. Diagram: a record served over the public HTTP service.
      • Donald: Expiration; Location; Name = Donald Duck; $content = <html>Quack quack!</html>; $content_type = text/html
      • http://host/api/1.0/records/users/Donald.json
      • Public: http://host/public/users/Donald.html displays "Quack quack!"
  • 19. Durability
      • Similar to Redis AOF, but with a non-binary (human readable and tweakable) journal.
      • Data set and indices are kept in memory.
      • But an append-only journal can log every query that is going to change a database. Timestamps allow point-in-time rollback.
      • Configurable fsync() policy: after every commit, never, or after x seconds.
      • Combines the efficiency of an in-memory database with durability.
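
The append-only journal idea can be sketched as below. This is not Pincaster's journal format: the JSON-lines layout, the field names `ts`, `key` and `set`, and the `Journal` and `replay` names are all assumptions for this example, and only the "after every commit" and "never" fsync policies are modelled. Replaying the file rebuilds the in-memory data set, and stopping at a timestamp gives a point-in-time view.

```python
import json
import os


class Journal:
    """Append-only, human-readable log of changes (one JSON object per line)."""

    def __init__(self, path, fsync_every_commit=True):
        self._file = open(path, "a", encoding="utf-8")
        self._fsync_every_commit = fsync_every_commit

    def log(self, timestamp, key, properties):
        entry = {"ts": timestamp, "key": key, "set": properties}
        self._file.write(json.dumps(entry) + "\n")
        self._file.flush()                      # hand the line to the OS
        if self._fsync_every_commit:
            os.fsync(self._file.fileno())       # force it to stable storage

    def close(self):
        self._file.close()


def replay(path, until=None):
    """Rebuild the data set from the journal, optionally stopping at `until`."""
    data = {}
    with open(path, encoding="utf-8") as journal:
        for line in journal:
            entry = json.loads(line)
            # Assumes entries were appended in non-decreasing timestamp order.
            if until is not None and entry["ts"] > until:
                break
            data.setdefault(entry["key"], {}).update(entry["set"])
    return data
```

Calling `replay(path, until=t)` with an earlier timestamp is the sketch's equivalent of a point-in-time rollback.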
  • 20. Journal rewrite
      • Reduces startup time.
      • Constructs a new journal with only the operations needed to reconstruct the data set.
      • Background operation happening in a child process with a low priority.
      • Takes advantage of copy-on-write through fork() and appends missing records afterwards.
      • The new journal atomically replaces the previous one after successful completion.
  • 21. Coming soon
      • Replication
      • Clustering of spatial results (partly implemented).
      • Optimization (distance computation, finer grained rwlocks, use slabs in more areas).
      • Spidermonkey integration.
      • Automatically move expired records to another layer.
      • Observers (push a notification over an HTTP channel when there's an update in a geographic zone).
      • Client libraries and possibly a decent web site (help needed!)
  • 22. http://github.com/jedisct1/Pincaster/