Ceph Day NYC: Developing With Librados

This talk, given by Sage Weil at Ceph Day NYC, introduces developers to librados and discusses the hidden capabilities of RADOS.

Published in: Technology


  1. librados
  2. object model
     ● pools
       – 1s to 100s
       – independent namespaces or object collections
       – replication level, placement policy
     ● objects
       – bazillions
       – blob of data (bytes to gigabytes)
       – attributes (e.g., “version=12”; bytes to kilobytes)
       – key/value bundle (bytes to gigabytes)
  3. atomic transactions
     ● client operations sent to the OSD cluster
       – operate on a single object
       – can contain a sequence of operations, e.g.
         ● truncate object
         ● write new object data
         ● set attribute
     ● atomicity
       – all operations commit or do not commit atomically
     ● conditional
       – 'guard' operations can control whether operation is performed
         ● verify xattr has specific value
         ● assert object is a specific version
       – allows atomic compare-and-swap etc.
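The guard-then-commit behaviour on this slide can be sketched with a small in-memory model. This is a toy, not librados: `ToyObject`, `apply_transaction`, `GuardFailed`, and the op helpers are hypothetical names invented for illustration, and the copy-then-publish commit is an assumption about how to model "all or nothing", not a description of how an OSD implements it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """In-memory stand-in for one RADOS object (hypothetical model)."""
    data: bytes = b""
    xattrs: dict = field(default_factory=dict)
    obj_version: int = 0


class GuardFailed(Exception):
    """Raised when a guard operation does not hold."""


def apply_transaction(obj, ops):
    """Apply every op to obj, or none of them if any op raises."""
    scratch = copy.deepcopy(obj)          # work on a private copy
    for op in ops:
        op(scratch)                       # a failing guard raises GuardFailed
    scratch.obj_version += 1
    # Commit point: publish the whole result at once.
    obj.data, obj.xattrs, obj.obj_version = (
        scratch.data, scratch.xattrs, scratch.obj_version)


def assert_xattr(name, expected):
    def op(o):
        if o.xattrs.get(name) != expected:
            raise GuardFailed(f"{name} != {expected!r}")
    return op


def truncate(size):
    def op(o):
        o.data = o.data[:size]
    return op


def write(offset, payload):
    def op(o):
        padded = o.data.ljust(offset, b"\0")
        o.data = padded[:offset] + payload + padded[offset + len(payload):]
    return op


def set_xattr(name, value):
    def op(o):
        o.xattrs[name] = value
    return op


obj = ToyObject(data=b"old contents", xattrs={"version": "12"})

# Compare-and-swap: rewrite the object only if it is still at version 12.
cas = [assert_xattr("version", "12"), truncate(0),
       write(0, b"new"), set_xattr("version", "13")]
apply_transaction(obj, cas)

try:
    apply_transaction(obj, cas)           # guard now fails; nothing changes
    second_attempt = "committed"
except GuardFailed:
    second_attempt = "rejected"
```

The second attempt is rejected as a unit: the truncate and write that follow the failed guard never touch the object.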
  4. key/value storage
     ● store key/value pairs in an object
       – independent from object attrs or byte data payload
     ● based on Google's LevelDB
       – efficient random and range insert/query/removal
       – based on BigTable SSTable design
     ● exposed via key/value API
       – insert, update, remove
       – individual keys or ranges of keys
     ● avoid read/modify/write cycle for updating complex objects
       – e.g., file system directory objects
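To illustrate why a per-object sorted key/value bundle avoids read/modify/write, here is a minimal in-memory sketch of a directory object. `ToyOmap` and its methods are hypothetical names, and the sorted list with `bisect` is only a stand-in for the ordered storage the slide attributes to LevelDB.

```python
import bisect


class ToyOmap:
    """Sorted key/value map standing in for one object's key/value bundle."""

    def __init__(self):
        self._keys = []      # kept sorted so range scans are cheap
        self._vals = {}

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def remove(self, key):
        if key in self._vals:
            del self._vals[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]

    def remove_range(self, start, end):
        for key, _ in self.range(start, end):
            self.remove(key)


# A directory object: one key per entry, so no whole-object rewrite is needed.
directory = ToyOmap()
for name, inode in [("a.txt", 101), ("b.txt", 102),
                    ("c.txt", 103), ("notes/", 104)]:
    directory.set(name, inode)

directory.set("b.txt", 202)             # update a single entry
directory.remove("c.txt")               # remove a single entry
listing = directory.range("a", "c")     # entries with "a" <= name < "c"
```

Each change touches one key; the rest of the directory is never read back or rewritten by the client.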
  5. watch/notify
     ● establish stateful 'watch' on an object
       – client interest persistently registered with object
       – client keeps session to OSD open
     ● send 'notify' messages to all watchers
       – notify message (and payload) is distributed to all watchers
       – variable timeout
       – notification on completion
         ● all watchers got and acknowledged the notify
     ● use any object as a communication/synchronization channel
       – locking, distributed coordination (à la ZooKeeper), etc.
  6. [Diagram: CLIENT #1, CLIENT #2, CLIENT #3, and the OSD. Each client sends watch and receives ack/commit; a notify is then delivered to the watchers, each returns an ack, and the notifier receives complete.]
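The watch, notify, ack, complete sequence can be modelled in a few lines. This is a single-process toy under stated assumptions: `ToyWatchedObject` is a hypothetical name, a callback returning normally stands in for a watcher's ack, and a raised exception stands in for a watcher that timed out. It does not use or mirror the librados API.

```python
class ToyWatchedObject:
    """In-memory model of one object's watcher list (hypothetical)."""

    def __init__(self):
        self._watchers = {}      # watch handle -> callback
        self._next_id = 0

    def watch(self, callback):
        """Register persistent interest; returns a handle for unwatch()."""
        self._next_id += 1
        self._watchers[self._next_id] = callback
        return self._next_id

    def unwatch(self, handle):
        self._watchers.pop(handle, None)

    def notify(self, payload):
        """Deliver payload to every watcher; return (complete, acked, missed)."""
        acked, missed = [], []
        for handle, callback in list(self._watchers.items()):
            try:
                callback(payload)        # returning normally counts as the ack
                acked.append(handle)
            except Exception:
                missed.append(handle)    # stands in for a timed-out watcher
        return (not missed, acked, missed)


channel = ToyWatchedObject()
received = {1: [], 2: [], 3: []}

# Clients #1 to #3 each establish a watch on the same object.
handles = [channel.watch(received[n].append) for n in (1, 2, 3)]

complete, acked, missed = channel.notify(b"hello")

channel.unwatch(handles[2])              # client #3 goes away
channel.notify(b"again")
```

The notifier learns from `complete` whether every registered watcher acknowledged the message, which is what makes the object usable as a coordination channel.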
  7. watch/notify example
     ● radosgw cache consistency
       – radosgw instances watch a single object (.rgw/notify)
       – locally cache bucket metadata
       – on bucket metadata changes (removal, ACL changes)
         ● write change to relevant bucket object
         ● send notify with bucket name to other radosgw instances
       – on receipt of notify
         ● invalidate relevant portion of cache
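The invalidation flow on this slide could look roughly like the following. It is a sketch, not radosgw code: `NotifyChannel`, `Gateway`, and the `store` dict are hypothetical stand-ins for the watched object, a gateway instance, and the bucket objects.

```python
class NotifyChannel:
    """Toy stand-in for the single watched object shared by all gateways."""

    def __init__(self):
        self.watchers = []

    def watch(self, watcher):
        self.watchers.append(watcher)

    def notify(self, payload, sender):
        for watcher in self.watchers:
            if watcher is not sender:
                watcher.on_notify(payload)


class Gateway:
    """Toy gateway with a local bucket-metadata cache."""

    def __init__(self, store, channel):
        self.store = store               # shared bucket objects
        self.channel = channel
        self.cache = {}                  # local, per-instance cache
        channel.watch(self)

    def get_bucket(self, name):
        if name not in self.cache:
            self.cache[name] = dict(self.store[name])
        return self.cache[name]

    def update_bucket(self, name, **changes):
        self.store[name].update(changes)          # write change to bucket object
        self.cache.pop(name, None)
        self.channel.notify(name, sender=self)    # send bucket name to the others

    def on_notify(self, bucket_name):
        self.cache.pop(bucket_name, None)         # invalidate only that entry


store = {"photos": {"acl": "private"}}
channel = NotifyChannel()
gw1, gw2 = Gateway(store, channel), Gateway(store, channel)

before = gw2.get_bucket("photos")["acl"]   # cached locally on gw2
gw1.update_bucket("photos", acl="public-read")
after = gw2.get_bucket("photos")["acl"]    # cache was invalidated, so re-read
```

Only the bucket name travels in the notify; each gateway re-reads the metadata on its next access instead of receiving the new value.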
  8. rados classes
     ● dynamically loaded .so
       – /var/lib/rados-classes/*
       – implement new object “methods” using existing methods
       – part of I/O pipeline
       – simple internal API
     ● reads
       – can call existing native or class methods
       – do whatever processing is appropriate
       – return data
     ● writes
       – can call existing native or class methods
       – do whatever processing is appropriate
       – generates a resulting transaction to be applied atomically
  9. class examples
     ● grep
       – read an object, filter out individual records, and return those
     ● sha1
       – read object, generate fingerprint, return that
     ● images
       – rotate, resize, crop image stored in object
       – remove red-eye
     ● crypto
       – encrypt/decrypt object data with provided key
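The read-side class examples (grep, sha1) can be imitated with a small method registry. Real rados classes are shared objects loaded by the OSD; the Python below only models the idea of running a named method next to the data and returning the result. `CLASS_METHODS`, `cls_method`, `call`, and the method names are all hypothetical, and treating records as newline-delimited is an assumption.

```python
import hashlib

CLASS_METHODS = {}       # (class name, method name) -> function


def cls_method(cls_name, method_name):
    """Register a function as an object-class method (toy registry)."""
    def register(fn):
        CLASS_METHODS[(cls_name, method_name)] = fn
        return fn
    return register


@cls_method("grep", "match")
def grep_match(obj_data, indata):
    """Return only the newline-delimited records containing the pattern."""
    records = obj_data.split(b"\n")
    return b"\n".join(r for r in records if indata in r)


@cls_method("sha1", "digest")
def sha1_digest(obj_data, indata):
    """Return the hex SHA-1 fingerprint of the object data."""
    return hashlib.sha1(obj_data).hexdigest().encode()


def call(objects, oid, cls_name, method_name, indata=b""):
    """Run a class method against one object; only the result comes back."""
    return CLASS_METHODS[(cls_name, method_name)](objects[oid], indata)


objects = {"log": b"ok boot\nerror disk\nok net\nerror fan"}

errors = call(objects, "log", "grep", "match", b"error")
digest = call(objects, "log", "sha1", "digest")
```

In both calls the caller receives a small result rather than the whole object, which is the point of pushing the method to where the data lives.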
  10. ideas
     ● rados mailbox (RMB?)
       – plug librados backend into dovecot, postfix, etc.
       – key/value object for each mailbox
         ● key = message id
         ● value = headers
       – object for each message or attachment
       – watch/notify for delivery notification
  11. ideas
     ● distributed key/value table
       – aggregate many k/v objects into one big 'table'
       – working prototype exists (thanks, Eleanor!)
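One way to picture aggregating many k/v objects into a single table is hash partitioning. The slide does not say how the prototype distributes keys, so the scheme below is purely an assumption for illustration; `ToyTable` is a hypothetical name and each shard is a plain dict standing in for one k/v object.

```python
import hashlib


class ToyTable:
    """One logical table spread over several k/v objects (dicts here)."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash so a key always maps to the same shard.
        digest = hashlib.sha1(key.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def items(self):
        """Merge all shards into one key-ordered view of the table."""
        merged = {}
        for shard in self.shards:
            merged.update(shard)
        return sorted(merged.items())


table = ToyTable(num_shards=4)
for n in range(10):
    table.put(f"user:{n}", n)

value = table.get("user:7")
row_count = len(table.items())
```

Point lookups touch a single shard, while a full scan has to visit every shard and merge the results.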
  12. ideas
     ● lua rados class
       – embed lua interpreter in a rados class
       – ship semi-arbitrary code for operations
     ● json class
       – parse, manipulate json structures
