PITHOS
@PYR
#CASSANDRASUMMIT
@PYR
CTO at Exoscale, Swiss Cloud Hosting.
Open source developer: pithos, cyanite, riemann, collectd.
AIM OF THIS TALK
Presenting object storage
Show-casing efficient uses of object storage
Presenting pithos
Feedback on usage
OUTLINE
Object Storage 101
6 things you should do with S3
Pithos, your personal Object Store
Pithos in production
OBJECT STORAGE 101
THE ELEVATOR PITCH
Object Storage is a storage architecture that
manages data as objects
Wikipedia
INCEPTION
Asset and content storage for large hosting platforms.
LiveJournal's MogileFS.
A shift in how we perceive distributed storage.
ESSENTIAL PROPERTIES
No POSIX guarantees
No atomicity
Eventual consistency
Pushes some responsibility back to the application.
THE OBJECT STORAGE LANDSCAPE
Mostly hosted solutions:
AWS S3
Rackspace Cloud Files
DreamObjects
Exoscale SOS
No real API standardisation
AWS S3 is the de-facto standard
THE ON-PREMISE OBJECT STORAGE LANDSCAPE
Some vendor-backed solutions:
EMC Atmos
Scality
Cloudian
Swift
Ceph
Riak CS
Pithos
A TYPICAL OBJECT STORE REQUEST
# curl -X PUT -d @file.txt https://mybucket.myprovider.com/some-file.txt
# curl https://mybucket.myprovider.com/some-file.txt
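The same two requests can be sketched in Python with only the standard library. This is a minimal illustration that builds the requests without sending them; the URL is the placeholder from the slide, and a real provider would also expect an Authorization header.

```python
import urllib.request

# Placeholder endpoint copied from the slide; not a real bucket.
URL = "https://mybucket.myprovider.com/some-file.txt"

def build_put(url: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an upload request for one object."""
    return urllib.request.Request(url, data=body, method="PUT")

def build_get(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a download request for one object."""
    return urllib.request.Request(url, method="GET")

put_req = build_put(URL, b"hello object storage\n")
get_req = build_get(URL)
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
# Sending would be: urllib.request.urlopen(put_req)
```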
S3 TERMINOLOGY
Region: Determines where objects will be stored.
Storage Class: Storage properties for objects.
Bucket: A named container for objects.
Object: A file.
THE S3 API
A global bucket namespace
Artificial hierarchy support
Authentication and Authorization through ACLs
Multipart uploads
CORS support & Form based uploads
Eventual consistency
A GLOBAL BUCKET NAMESPACE
A single consistent namespace for buckets:
Across tenants.
There is only one highlander bucket.
A bucket is located within a region.
HIERARCHY SUPPORT
Listing requests may supply a delimiter and prefix.
Emulates directories when keys contain slashes.
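A rough sketch of how prefix and delimiter turn a flat key space into something that looks like directories. The function name and the sample keys are made up for illustration; real services return the rolled-up entries as common prefixes in the listing response.

```python
def list_bucket(keys, prefix="", delimiter=""):
    """Return (contents, common_prefixes) the way an S3-style listing groups keys."""
    contents, common = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is rolled up.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common:
                common.append(rolled)
        else:
            contents.append(key)
    return contents, common

keys = ["sample.txt", "photos/2014/a.jpg", "photos/2014/b.jpg", "photos/index.html"]
top_level = list_bucket(keys, delimiter="/")
in_photos = list_bucket(keys, prefix="photos/", delimiter="/")
print(top_level)
print(in_photos)
```

Listing with only a delimiter shows the "root directory"; adding a prefix descends one level.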
HIERARCHY SUPPORT
GET /?delimiter=/ HTTP/1.1
Host: mybucket.service.uri
Date: <date>
Authorization: AWS <key>:<signature>
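The Authorization header above uses the legacy (V2) S3 signing scheme: an HMAC-SHA1 over a canonical string, base64-encoded. A minimal sketch, assuming no x-amz-* headers and no signed sub-resources; the function names and the date are illustrative, and the key pair is the well-known documentation example also used in the configuration slides.

```python
import base64
import hashlib
import hmac

def sign_v2(secret, verb, date, resource, content_md5="", content_type=""):
    """Base64(HMAC-SHA1(secret, string-to-sign)), without x-amz-* headers."""
    string_to_sign = "\n".join([verb, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def authorization_header(key, secret, verb, date, resource):
    """Build the value of the Authorization header for a V2-signed request."""
    return "AWS {}:{}".format(key, sign_v2(secret, verb, date, resource))

KEY = "AKIAIOSFODNN7EXAMPLE"
SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
DATE = "Fri, 17 Oct 2014 12:35:10 +0000"   # illustrative value for the Date header

# For a virtual-hosted bucket the canonical resource is "/<bucket>/".
header = authorization_header(KEY, SECRET, "GET", DATE, "/mybucket/")
print(header)
```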
HIERARCHY SUPPORT
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <Prefix></Prefix>
  <MaxKeys>100</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <LastModified>2014-10-17T12:35:10.423Z</LastModified>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
    <Owner>
      <ID>test@example.com</ID>
      <DisplayName>test@example.com</DisplayName>
    </Owner>
    <StorageClass>Standard</StorageClass>
  </Contents>
</ListBucketResult>
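Clients have to account for the XML namespace when reading such a listing. A small sketch using the standard library on an abridged copy of the response above; the helper name and the returned dictionary shape are illustrative choices.

```python
import xml.etree.ElementTree as ET

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Abridged copy of the listing shown on the slide.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>batman</Name>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.txt</Key>
    <ETag>"a4b7923f7b2df9bc96fb263978c8bc40"</ETag>
    <Size>1603</Size>
  </Contents>
</ListBucketResult>"""

def parse_listing(body):
    """Extract key, size and unquoted ETag for every object in a listing."""
    root = ET.fromstring(body)
    return [
        {
            "key": item.findtext("s3:Key", namespaces=NS),
            "size": int(item.findtext("s3:Size", namespaces=NS)),
            "etag": item.findtext("s3:ETag", namespaces=NS).strip('"'),
        }
        for item in root.findall("s3:Contents", NS)
    ]

objects = parse_listing(RESPONSE)
print(objects)
```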
AUTHENTICATION & AUTHORIZATION THROUGH ACLS
Simple canned ACLs allow common settings.
e.g. public-read.
An XML syntax is also available.
MULTIPART UPLOADS
Allows uploading a file in several chunks.
User-controlled re-aggregation step.
Limits the impact of upload failures for large files.
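The idea can be sketched without any network: split the payload into numbered parts, retry only the part that failed, then reassemble in order. The uploader below is a simulated stand-in, not an S3 client, and the part size is tiny for readability.

```python
def split_parts(data: bytes, part_size: int):
    """Cut a payload into numbered parts (part numbers start at 1, as in S3)."""
    return {n + 1: data[i:i + part_size]
            for n, i in enumerate(range(0, len(data), part_size))}

def upload_all(parts, upload_part, max_attempts=3):
    """Upload each part independently; a failure only re-sends that part."""
    done = {}
    for number, chunk in parts.items():
        for attempt in range(max_attempts):
            try:
                done[number] = upload_part(number, chunk)
                break
            except IOError:
                if attempt == max_attempts - 1:
                    raise
    return done

# Simulated backend (hypothetical): part 2 fails once, then succeeds.
store, failures = {}, {2: 1}

def flaky_upload(number, chunk):
    if failures.get(number, 0) > 0:
        failures[number] -= 1
        raise IOError("simulated network error")
    store[number] = chunk
    return len(chunk)

payload = bytes(range(256)) * 10
parts = split_parts(payload, 1000)
sizes = upload_all(parts, flaky_upload)
# Re-aggregation step: the client decides which parts form the object, in order.
assembled = b"".join(store[n] for n in sorted(store))
print(len(parts), assembled == payload)
```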
CORS SUPPORT AND FORM-BASED UPLOADS
Web interaction without any backend components.
CORS setup through an XML configuration syntax.
Form based uploads through pre-signed requests.
EVENTUAL CONSISTENCY
An easy sell at Cassandra Summit
Possible delay between PUT and GET availability.
Checksums avoid massive inconsistencies.
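One way an application can cope: remember the checksum of what it wrote and poll until a read returns matching content. A sketch under the assumption that the ETag of a simple (non-multipart) upload is the MD5 hex digest of the body; the fetch callable is a simulated store, not a real client.

```python
import hashlib
import time

def etag_matches(body: bytes, etag: str) -> bool:
    """Compare a body against an ETag, assuming the ETag is an MD5 hex digest."""
    return hashlib.md5(body).hexdigest() == etag.strip('"')

def read_when_consistent(fetch, expected_etag, attempts=5, delay=0.0):
    """Poll `fetch` until it returns a body matching the checksum we wrote."""
    for _ in range(attempts):
        body = fetch()
        if body is not None and etag_matches(body, expected_etag):
            return body
        time.sleep(delay)
    raise TimeoutError("object not visible yet")

# Simulated store: first not found, then a stale copy, then the new content.
replies = iter([None, b"stale", b"fresh"])
expected = hashlib.md5(b"fresh").hexdigest()
result = read_when_consistent(lambda: next(replies), expected)
print(result)
```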
6 THINGS TO DO WITH S3
12-FACTOR APP SUPPORT FOR PERSISTENCE
Eliminates the need for NFS
Eases interaction with PaaS type platforms
http://12factor.net/
STATIC CONTENT HOSTING
Perfect for hosting CSS, JS and other static assets
Simply requires setting a bucket's ACL to public
FORM BASED UPLOADS
Pre-signed requests
Requests encapsulate a policy
No proxying to the S3 service required
Great for supporting user generated content
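A sketch of what the backend hands to the browser for a form-based upload, using the legacy (V2) scheme: a base64-encoded JSON policy plus an HMAC-SHA1 signature of that encoded policy. The function name, bucket, prefix and limits are illustrative assumptions.

```python
import base64
import hashlib
import hmac
import json

def sign_post_policy(secret, bucket, key_prefix, expiration, max_bytes):
    """Return the `policy` and `signature` form fields for a browser upload."""
    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            ["content-length-range", 0, max_bytes],
        ],
    }
    encoded = base64.b64encode(json.dumps(policy).encode()).decode()
    digest = hmac.new(secret.encode(), encoded.encode(), hashlib.sha1).digest()
    return {"policy": encoded, "signature": base64.b64encode(digest).decode()}

fields = sign_post_policy(
    secret="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    bucket="mybucket",                      # hypothetical bucket
    key_prefix="uploads/",                  # users may only write under this prefix
    expiration="2014-10-17T13:00:00Z",      # illustrative expiry
    max_bytes=10 * 1024 * 1024,
)
print(sorted(fields))
```

The form then posts these fields together with the access key, the object key and the file; the secret never leaves the backend.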
ARTIFACT STORAGE
Supported in Maven
Supported in Docker Registry
Supported in Apt
Supported in Mesos fetcher
BACKUPS
Great Open-Source options like duplicity.
Commercial storage gateway support.
Some home NAS-type products support S3 as well.
CLIENT-SIDE ENCRYPTION
GPG encryption support.
Guarantees full data ownership, even when leveraging third-party providers.
Don't lose your keys!
PITHOS, YOUR PERSONAL OBJECT-STORE
FROM THE WEBSITE
Pithos is a daemon which provides an S3-compatible frontend for storing files in a Cassandra cluster.
WHY ?
Provide your own S3-compatible service (that's us!)
Restricted from using hosted object-storage services.
Willingness to fully own availability.
PITHOS ESSENTIAL PROPERTIES
Extensive S3 API coverage.
Fully Stateless.
Multi-region support.
Fully Cassandra-backed.
Extensible.
Open-Source.
MISC.
Runs on the JVM.
Written in Clojure.
Small codebase (~ 5300 LoC).
Can run an embedded Cassandra for testing purposes.
PITHOS ARCHITECTURE
A daemon built out of 5 isolated and pluggable components.
PITHOS ARCHITECTURE
Keystore
Bucketstore
Metastore
Blobstore
Reporter
OVERALL CONCEPT
THE KEYSTORE
Authentication & Authorization handled outside of pithos.
Only component which doesn't rely on Cassandra by default.
Default implementation relies on the pithos configuration file.
Maps an API key to a set of credentials.
Example alternative implementation in the documentation.
THE KEYSTORE
{
  "tenant": "tenantname",
  "secret": "secretkey",
  "memberof": ["group1", "group2"]
}
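Conceptually a keystore is just a lookup from API key to a record like the one above. The class and method names below are hypothetical, written in Python for illustration only, and say nothing about how pithos itself defines the component.

```python
class StaticKeystore:
    """Lookup table from API key to credentials, as a config file would provide."""

    def __init__(self, keys):
        self._keys = dict(keys)

    def fetch(self, api_key):
        """Return the credentials for `api_key`, or None when it is unknown."""
        return self._keys.get(api_key)

keystore = StaticKeystore({
    "AKIAIOSFODNN7EXAMPLE": {
        "tenant": "test@example.com",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "memberof": ["group1", "group2"],
    }
})
creds = keystore.fetch("AKIAIOSFODNN7EXAMPLE")
print(creds["tenant"], keystore.fetch("unknown-key"))
```

An alternative implementation only needs to answer the same question from another source, such as a database or an identity service.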
THE BUCKETSTORE
Stores essential bucket properties
Bucket tenant.
Region and storage-class where bucket is located.
Optional CORS properties.
THE BUCKETSTORE
Bucket ownership is transactional.
Cassandra is not the best suited for this task.
The lightweight transaction features help.
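Cassandra's lightweight transactions offer conditional writes such as INSERT ... IF NOT EXISTS, which is the behaviour a global bucket namespace needs: the first claim wins. The in-memory class below only simulates that semantic; its names are hypothetical, and whether pithos issues exactly this statement is an assumption.

```python
import threading

class BucketStore:
    """In-memory stand-in for a table written with conditional inserts."""

    def __init__(self):
        self._buckets = {}
        self._lock = threading.Lock()

    def create(self, bucket, tenant, region):
        """Return True only for the first tenant to claim the name."""
        with self._lock:
            if bucket in self._buckets:
                return False
            self._buckets[bucket] = {"tenant": tenant, "region": region}
            return True

    def owner(self, bucket):
        """Return the owning tenant of a bucket."""
        return self._buckets[bucket]["tenant"]

bucketstore = BucketStore()
first = bucketstore.create("batman", "test@example.com", "ch-dk-2")
second = bucketstore.create("batman", "someone@else.example", "ch-dk-2")
print(first, second, bucketstore.owner("batman"))
```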
THE BUCKETSTORE
{
  "bucket": "batman",
  "created": "2012-01-01 01:30:00",
  "tenant": "test@example.com",
  "region": "ch-dk-2",
  "acl": "...",
  "cors": "..."
}
THE METASTORE
Stores all object details.
References an inode and version in the bucketstore.
Using the path as a key in a wide column ensures keys are sorted.
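Why sorted keys matter: all keys sharing a prefix sit next to each other, so a listing becomes a single range scan. A small illustration with a sorted Python list standing in for the storage layer; the function name and sample keys are invented.

```python
import bisect

def prefix_scan(sorted_keys, prefix, limit=1000):
    """Return up to `limit` keys starting with `prefix` from a sorted sequence."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        # Matching keys are contiguous, so the first mismatch ends the scan.
        if not key.startswith(prefix) or len(result) == limit:
            break
        result.append(key)
    return result

object_keys = sorted([
    "docs/readme.txt",
    "photos/2013/z.jpg",
    "photos/2014/a.jpg",
    "photos/2014/b.jpg",
    "sample.txt",
])
matches = prefix_scan(object_keys, "photos/2014/")
print(matches)
```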
THE METASTORE
{
  "bucket": "test",
  "object": "file.txt",
  "inode": "4e682d3d-28fa-4ea6-aa28-282c2757f31b",
  "version": "c97894cd-e2cd-46d5-a217-1add544e88a4",
  "atime": "2012-01-01 01:30:00",
  "size": 1024,
  "checksum": "d41d8cd98f00b204e9800998ecf8427e",
  "storageclass": "standard",
  "acl": "...",
  "metadata": {}
}
THE BLOBSTORE
Stores data.
Inodes are lists of blocks.
Blocks are lists of chunks.
Chunks contain small (128k) pieces of the file.
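The layout can be sketched as simple arithmetic over the object size. This assumes that max-block-chunk in the configuration means "chunks per block", which is an interpretation rather than something the slides state; the function name is illustrative.

```python
MAX_CHUNK = 128 * 1024      # "128k", as in the max-chunk setting
MAX_BLOCK_CHUNK = 1024      # assumed meaning: chunks per block

def layout(size, max_chunk=MAX_CHUNK, max_block_chunk=MAX_BLOCK_CHUNK):
    """Return a list of blocks; each block is a list of (offset, length) chunks."""
    blocks, offset = [], 0
    while offset < size:
        # Start a new block when there is none yet or the current one is full.
        if not blocks or len(blocks[-1]) == max_block_chunk:
            blocks.append([])
        length = min(max_chunk, size - offset)
        blocks[-1].append((offset, length))
        offset += length
    return blocks

# Tiny limits make the structure visible: a 10-byte object, 4-byte chunks,
# 2 chunks per block.
small = layout(10, max_chunk=4, max_block_chunk=2)
print(small)

# With the defaults, a 1 MiB object fits in one block of eight chunks.
one_mib = layout(1024 * 1024)
print(len(one_mib), len(one_mib[0]))
```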
THE BLOBSTORE
Not what Cassandra was meant for.
Works surprisingly well.
THE REPORTER
Emits useful usage information.
Good basis for building billing extensions.
CONFIGURATION
A single configuration file to configure all aspects
Logging & server options.
Keystore, bucketstore, metastore and blobstore.
Each can have its own details / cassandra cluster.
CONFIGURATION
service:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  console: true
  overrides:
    io.pithos: debug
options:
  service-uri: s3.example.com
  default-region: myregion
CONFIGURATION
keystore:
  keys:
    AKIAIOSFODNN7EXAMPLE:
      tenant: test@example.com
      secret: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
bucketstore:
  default-region: myregion
  cluster: "localhost"
  keyspace: storage
regions:
  myregion:
    metastore:
      cluster: "localhost"
      keyspace: storage
    storage-classes:
      standard:
        cluster: "localhost"
        keyspace: storage
        max-chunk: "128k"
        max-block-chunk: 1024
AREAS OF IMPROVEMENT
V4 Signatures.
Overall S3 API coverage.
Overall S3 Client coverage.
Promoting Cassandra compact storage.
Simple web interface.
More contributors and users!
V4 SIGNATURES
V4 type signatures are still not supported in pithos and are item number 1 on the todo-list.
OVERALL S3 API COVERAGE
The S3 API is byzantine and corner cases are poorly documented.
Still missing some useful bits (versioning, bucket policies, session tokens).
OVERALL S3 CLIENT COVERAGE
Some clients are very sensitive with regard to API behavior.
The essentials work.
Glitches are quickly fixed when caught.
PROMOTING CASSANDRA COMPACT STORAGE
WITH COMPACT STORAGE gives great benefits.
Not yet promoted or automatically converged on startup.
SIMPLE WEB INTERFACE
A simple JavaScript SPA would be nice.
PITHOS IN PRODUCTION
A WORD OF WARNING
Running an object-store is not necessarily for the faint of heart.
HOW WE USE IT
No multi-datacenter clusters.
Dedicated metadata cluster.
Dedicated "blobstore" clusters.
ELSEWHERE
Few known installations (in the 10s).
Always rather large.
Always used where cassandra previously existed.
MAINTENANCE (PITHOS)
A few cases generate orphan inodes, which must be pruned manually.
Internal tooling is used for this and should eventually be released.
Rather worry-free
MAINTENANCE (CASSANDRA)
The usual applies
Schedule regular repairs of your clusters
Follow releases
Best supported version: 2.1.x
Quorum is satisfactory in terms of performance.
SCALING
Pithos is stateless.
Colocate cassandra and pithos daemons.
Split blobstore and metastore keyspaces into separate clusters.
Splitting data and proxy nodes is worth investigating for huge deployments.
Haproxy to distribute queries to pithos instances.
PARTING WORDS
Try it out! (There's an all-in-one version)
Get involved
Docs need proof-reading, additions.
Some issues need to be tackled.
THANKS !
Pithos owes a lot to:
Max Penet (@mpenet) for the great alia & jet libraries
Datastax for the awesome cassandra java-driver
Its contributors
Apache Cassandra obviously
@pyr

Exoscale: Pithos: your personal S3 object store on cassandra