Riak CS in Cloudstack
 

The CloudStack European User group met on Thursday 11th for our quarterly meeting.

Stuart Mcall from Basho talked about their RiakCS technology & community

  • Welcome/Intro
  • Here’s the basics
  • Very high level discussion, segue into brief discussion of Riak
  • What you get is a platform on which you can host your own public and private clouds.
  • You can think of Riak in many ways as a distributed filesystem. Riak is awesome because all nodes are equal. It has distribution protocols that allow incredibly straightforward scaling and an even balance of load by tokenizing a huge keyspace and using consistent hashing, etc. This, however, is not conducive to large object storage because of latency and other network limitations when moving files around during topology changes. Also, developing on top of Riak is non-trivial, so interactions with the database can be a pain. Riak CS abstracts away the complexity of those interactions behind a simple, S3-compatible API, and uses Riak’s inherent functionality to provide a solution for large object storage.
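As a rough illustration of "tokenizing a huge keyspace and using consistent hashing", here is a toy sketch. The ring size of 64, the round-robin assignment of partitions to nodes, the replica count of 3, and the function names are all assumptions made for the example; Riak's real hashing and claim logic differ in detail.

```python
import hashlib

RING_SIZE = 64        # number of equal partitions (assumed for illustration)
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash bucket/key with SHA-1 and map it onto one of ring_size partitions."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (KEYSPACE // ring_size)


def preference_list(partition, nodes, n_val=3, ring_size=RING_SIZE):
    """Nodes owning the n_val consecutive partitions starting at `partition`.

    Assumes partitions are handed out to nodes round-robin (hypothetical).
    """
    return [nodes[((partition + i) % ring_size) % len(nodes)]
            for i in range(n_val)]


if __name__ == "__main__":
    nodes = ["riak1", "riak2", "riak3", "riak4", "riak5"]
    p = partition_for("kittens", "tabby.jpg")
    print(p, preference_list(p, nodes))
```

Because the partition is derived only from the hash, any node can compute where a key lives, which is what lets every node accept and route requests.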
  • A Riak CS stack is composed of 3 critical components. Riak CS exposes an API to the users and is responsible for logging/tracking stats. All the data is stored in and retrieved from Riak. Run multiple instances of Riak and Riak CS for scale. There’s a third component, a single instance of a piece of software called Stanchion, that is responsible for tying it all together. Stanchion in essence provides the S3-like behavior at an architectural level, ensures user and bucket uniqueness globally, etc.
  • 1-to-1 pairing, and why.
  • 1. User PUTs an object into Riak CS. The request is made via an S3 API and signed with their credentials. 2. Once authenticated, the object is chunked (remind why this is important). 3. As the object is chunked, the chunks are sent to Riak (you can use haproxy in the middle here). 4. Riak stores the chunks, yay!
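The chunking in step 2 can be pictured with a minimal sketch. The 1 MB chunk size comes from the upload-flow slide; the function name and the idea of yielding numbered blocks are assumptions for illustration, not Riak CS internals.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size shown on the upload-flow slide


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, block) pairs read sequentially from a binary stream."""
    index = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield index, block
        index += 1


if __name__ == "__main__":
    payload = io.BytesIO(b"x" * (2 * CHUNK_SIZE + 10))
    for index, block in iter_chunks(payload):
        print(index, len(block))
```

Streaming block by block means the whole object never has to sit in memory, and each block is small enough to be stored and moved like any other Riak value.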
  • Riak CS is multi-tenant. Each user is assigned an access_key and a secret_key. Users are authenticated by signing requests using a combination of both keys. If the keys are valid, the request is allowed; otherwise it is denied. User details are stored in the “user” bucket, identified by access_key. Furthermore, every user’s activity is tracked by Riak CS and stored for billing/metering purposes (more later).
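To make the signing step concrete, here is a minimal sketch that assumes the classic S3 "signature version 2" layout (HMAC-SHA1 over a newline-joined string, sent as an `AWS access_key:signature` header) and a request without any `x-amz-*` headers. The notes only say requests are signed with both keys, so treat the exact string layout and the function names as assumptions.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, method, resource, date,
                 content_md5="", content_type=""):
    """Return a base64 HMAC-SHA1 signature for a request (assumed v2 layout)."""
    string_to_sign = "\n".join(
        [method, content_md5, content_type, date, resource]
    )
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, signature):
    """Build the Authorization header value from the key and signature."""
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    sig = sign_request("my-secret", "PUT", "/kittens/tabby.jpg",
                       "Tue, 27 Mar 2012 21:15:45 +0000")
    print(authorization_header("MY-ACCESS-KEY", sig))
```

The server looks up the secret for the presented access_key, recomputes the signature, and compares; the secret itself never travels with the request.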
  • Objects are stored in buckets. Users can create and remove buckets as well as list their contents. Buckets are essentially a namespace, and are very much like folders. Bucket names must be globally unique, so if two users both try to create a bucket named “kittens”, whoever creates it first will own it, etc.
  • Put objects in buckets. Objects are chunked and replicated, but that all happens behind the scenes and is not exposed to the user.
  • Riak CS provides stats on user activity and total cluster operations, and ships with DTrace probes you can use to inspect/debug a live system. So at any given time you can monitor a Riak CS cluster for both expected behavior and anomalies. From an administrative perspective (as mentioned earlier), Riak CS will track each individual user’s activity, so that you can define usage limits and billing policies if necessary.
  • Riak CS, just like Riak, uses Boundary’s Folsom stats library for monitoring cluster operations. These stats start when Riak CS starts and are not persisted to disk. Get stats with an HTTP request to /riak-cs/stats. You’ll get back counters and histograms that track the total number of operations performed on blocks, buckets and objects. For instance, you can see the total number of GET or PUT operations on objects in the Riak CS cluster. These stats are going to be most useful if you’re trying to diagnose unexpected behavior. Hopefully that’s never the case, but shit happens.
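Fetching those stats could look roughly like the sketch below. Only the /riak-cs/stats path comes from the notes; the port 8080, the JSON response format, and the absence of authentication are assumptions (a real deployment may require a signed admin request), and the function names are hypothetical.

```python
import json
import urllib.request


def stats_url(host, port=8080):
    """Build the stats URL; port 8080 is an assumed listener port."""
    return f"http://{host}:{port}/riak-cs/stats"


def fetch_stats(host, port=8080, timeout=5.0):
    """GET the stats resource and decode it, assuming a JSON body."""
    with urllib.request.urlopen(stats_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a reachable Riak CS node; prints the names of the stats returned.
    for name in sorted(fetch_stats("127.0.0.1")):
        print(name)
```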
  • Riak CS has a reserved namespace for tracking user activity. This is the “usage” bucket, and it is the foundation for metering and building custom billing policies in Riak CS. Every time a user performs an operation, Riak CS stores this data in an object in the usage bucket identified by that user’s access_key. You can configure how often these reports are persisted, as well as whether users can request their own usage statistics.
  • Limitations = no periods greater than 31 days
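A client that needs usage data for a longer span has to work around the 31-day limit by issuing several queries. The sketch below shows one way to split a range into compliant windows; the function names are hypothetical and nothing here calls the actual usage API.

```python
from datetime import datetime, timedelta

MAX_PERIOD = timedelta(days=31)  # the limit mentioned in the notes


def validate_usage_period(start, end):
    """Raise ValueError if the period is empty or longer than 31 days."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_PERIOD:
        raise ValueError("usage periods cannot exceed 31 days")
    return start, end


def split_into_periods(start, end):
    """Split [start, end) into consecutive windows of at most 31 days."""
    periods = []
    cursor = start
    while cursor < end:
        upper = min(cursor + MAX_PERIOD, end)
        periods.append((cursor, upper))
        cursor = upper
    return periods


if __name__ == "__main__":
    for lo, hi in split_into_periods(datetime(2013, 1, 1), datetime(2013, 3, 15)):
        print(lo.date(), "->", hi.date())
```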
  • Introduce yourself

Riak CS in Cloudstack Presentation Transcript

  • Simple, Available Cloud Storage For Cloudstack
  • Overview
  • On March 27, 2012 Basho announced a new product called Riak CS
  • On September 5, 2012 Basho joined Apache Cloudstack
  • On March 20, 2013 Riak CS became open source
  • Riak CS is... enterprise cloud storage built on top of Riak, offering: S3-compatibility, multi-tenancy, per user reporting, large object storage
  • Enabling you to host your own PUBLIC & PRIVATE CLOUDS or... Reliable Storage Behind Apps
  • Basho's Commits: @john_burwell's contribution: S3-backed secondary storage feature in 4.1.0. Uses S3 to sync secondary storage across zones. Long term: (shhhhhh!) Native S3 Support, Federated authentication and authorization
  • DataPipe blog.datapipe.com/datapipe-cloudstack “Riak CS provides the high-performance, distributed datastore we need to deliver a sound foundation for our cloud storage needs now and for many years into the future” - Ed Laczynski, VP Cloud Strategy, Datapipe.
  • Yahoo! “Today, Yahoo! leverages Riak CS Enterprise to offer an S3-compatible public cloud storage service, as well as dedicated hosting options ... Yahoo! is highly supportive of open source software and we view Basho’s (OSS) announcement as a positive move that will work to accelerate its ability to innovate and ultimately strengthen our cloud platform.” - Shingo Saito, cloud product manager, Yahoo!
  • About Riak
  • Riak • Dynamo-inspired key/value store • Written in Erlang with C/C++ • Open source under Apache 2 license • Thousands of production deployments
  • Riak • High availability • Low-latency • Horizontal scalability • Fault-tolerance • Ops friendliness
  • Riak Masterless • No master/slave or different roles • All nodes are equal • Write availability and scalability • All nodes can accept/route requests
  • Riak No Sharding • Consistent hashing • Prevents “hot spots” • Lowers operational burden of scale • Data rebalanced automatically
  • Riak Availability and Fault-Tolerance • Automatically replicates data • Read and write data during hardware failure and network partition • Hinted handoff
  • How It Works
  • Riak CS, Stanchion, Riak
  • 1 Riak CS node for every node of Riak
  • Large Object [diagram: Riak CS nodes exposing S3 and Reporting APIs in front of Riak nodes] 1. User uploads an object 2. Riak CS breaks object into 1 MB chunks 3. Riak CS streams chunks to Riak nodes 4. Riak replicates and stores chunks
  • BASIC CONCEPTS: USERS • multi-tenancy: Riak CS will track individual usage/stats • users identified by access_key • users authenticated by secret_key
  • BASIC CONCEPTS: BUCKETS • users create buckets. • buckets are like folders. • store objects in buckets. • names are globally unique.
  • BASIC CONCEPTS: OBJECTS • stored in buckets. • objects are opaque. • store any file type.
  • Features
  • Riak CS Large Object Support • Started with 5GB / object • Now have multipart upload • Content agnostic
  • Riak CS S3-Compatible API • Use existing S3 libraries and tools • RESTful operations • Multipart upload • S3-style ACLs for object/bucket permissions • S3 authentication scheme
  • Riak CS Administration and Users • Interface for user creation, deletion, and credentials • Configure so only admins can create users
  • Riak CS New Stuff in Riak 1.3 • Multipart upload: parts between 5MB and 5GB • Support for GET range queries • Restrict access to buckets based on source IP
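The part-size bounds on this slide can be turned into a small planning helper. This is a sketch only: it treats 5MB and 5GB as binary units, and it assumes (as in S3) that the final part may be smaller than the minimum. The function name and the (offset, length) representation are made up for illustration.

```python
MIN_PART = 5 * 1024 ** 2  # 5 MB lower bound from the slide (binary units assumed)
MAX_PART = 5 * 1024 ** 3  # 5 GB upper bound from the slide


def plan_parts(object_size, part_size=MIN_PART):
    """Return (offset, length) pairs covering object_size bytes.

    Every part except possibly the last has exactly part_size bytes.
    """
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MB and 5 GB")
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    parts = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts


if __name__ == "__main__":
    for offset, length in plan_parts(12 * 1024 ** 2):
        print(offset, length)
```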
  • Riak CS
  • Riak CS Packages • Debian • Ubuntu • FreeBSD • Mac • Red Hat Enterprise • Fedora • SmartOS • Solaris • Source
  • Operations
  • built-in stats & DTrace support • track access & storage per user • inspect ops with DTrace probes • monitor total cluster ops
  • OPERATIONAL STATS exposed via HTTP resource: /riak-cs/stats • HISTOGRAMS & COUNTERS • block bucket object LIST KEYS, CREATE, GET, PUT, DELETE GET, PUT, DELETE DELETE, GET/PUT ACL HEAD, GET/PUT ACL
  • THE “USAGE” BUCKET: TRACK INDIVIDUAL USER’S ACCESS, STORAGE
  • QUERY USAGE STATS Storage and access statistics tracked on per-user basis, as rollups for slices of time •Operations, Count, BytesIn, BytesOut, + system and user error •Objects, Bytes
  • Enterprise
  • Multi-Datacenter Replication • For active backups, availability zones, disaster recovery, global traffic • Real-time or full-sync • 24/7 support • Per-node or storage-based pricing
  • SIGN UP FOR AN ENTERPRISE DEVELOPER TRIAL basho.com http://docs.basho.com/
  • Riak London: A distributed systems meet/drink up www.meetup.com/riak-london
  • github.com/basho twitter.com/basho docs.basho.com
  • Q&A @_stu_