
Riak CS in Cloudstack


The CloudStack European User group met on Thursday 11th for our quarterly meeting.


Stuart Mcall from Basho talked about their Riak CS technology and community.


Speaker notes
  • Welcome/Intro
  • Here’s the basics
  • Very high-level discussion; segue into a brief discussion of Riak
  • What you get is a platform on which you can host your own public and private clouds.
  • You can think of Riak in many ways as a distributed filesystem. Riak is awesome because all nodes are equal. It has distribution protocols that allow incredibly straightforward scaling and an even balance of load, by partitioning a huge keyspace and using consistent hashing. This, however, is not conducive to large object storage, because of latency and other network limitations when moving files around during topology changes. Also, developing on top of Riak is non-trivial, so interactions with the database can be a pain. Riak CS abstracts away the complexity of those interactions behind a simple, S3-compatible API, and uses Riak’s inherent functionality to provide a solution for large object storage.
  • A Riak CS stack is composed of 3 critical components. Riak CS exposes an API to users and is responsible for logging and tracking stats. All the data is stored in and retrieved from Riak. Run multiple instances of Riak and Riak CS for scale. There’s a third component, a single instance of a piece of software called Stanchion, that ties it all together. Stanchion in essence provides the S3-like behavior at an architectural level: it ensures user and bucket uniqueness globally, etc.
  • 1-to-1 pairing, and why.
  • 1. User PUTs an object into Riak CS; the request goes through the S3 API and is signed with their credentials. 2. Once authenticated, the object is chunked (remind why this is important). 3. As the object is chunked, chunks are sent to Riak (you can use HAProxy in the middle here). 4. Riak stores the chunks, yay!
  • Riak CS is multi-tenant. Each user is assigned an access_key and a secret_key. Users are authenticated by signing requests with a combination of both keys: the access_key identifies the user, and the secret_key is used to compute the signature. If the signature is valid, the request is allowed; otherwise, it is denied. User details are stored in the “user” bucket, identified by access_key. Furthermore, every user’s activity is tracked by Riak CS and stored for billing/metering purposes (more later).
  • Objects are stored in buckets. Users can create and remove buckets as well as list their contents. Buckets are essentially a namespace, and are very much like folders. Bucket names must be globally unique, so if two users both try to create a bucket named “kittens”, whoever creates that bucket first will own it.
  • Put objects in buckets. Objects are chunked and replicated, but that all happens behind the scenes and not exposed to the user.
  • Riak CS provides stats on user activity and total cluster operations, and ships with DTrace probes you can use to inspect/debug a live system. So at any given time you can monitor a Riak CS cluster for both expected behavior and anomalies. From an administrative perspective (as mentioned earlier), Riak CS tracks each individual user’s activity, so you can define usage limits and billing policies if necessary.
  • Riak CS, just like Riak, uses Boundary’s Folsom stats library for monitoring cluster operations. These stats start when Riak CS starts and are not persisted to disk. Get them with an HTTP request to /riak-cs/stats. You’ll get back counters and histograms that track the total number of operations performed on blocks, buckets and objects; for instance, the total number of GET or PUT operations on objects in the Riak CS cluster. These stats are most useful when you’re trying to diagnose unexpected behavior. Hopefully that’s never the case, but shit happens.
  • Riak CS has a reserved namespace for tracking user activity. This is the “usage” bucket, and it is the foundation for metering and building custom billing policies in Riak CS. Every time a user performs an operation, Riak CS stores this data in an object in the usage bucket identified by that user’s access_key. You can configure how frequently these reports are persisted, as well as whether users can request their own usage statistics.
  • Limitations = no periods greater than 31 days
  • Introduce yourself
Transcript of "Riak CS in Cloudstack"

    1. Simple, Available Cloud Storage for Cloudstack
    2. Overview
    3. On March 27, 2012, Basho announced a new product called Riak CS
    4. On September 5, 2012, Basho joined Apache Cloudstack
    5. On March 20, 2013, Riak CS became open source
    6. Riak CS is... enterprise cloud storage built on top of Riak, featuring S3 compatibility, multi-tenancy, per-user reporting, and large object storage
    7. Enabling you to host your own PUBLIC & PRIVATE CLOUDS, or… Reliable Storage Behind Apps
    8. Basho’s Commits: @john_burwell’s contribution: S3-backed secondary storage feature in 4.1.0, which uses S3 to sync secondary storage across zones. Long term (shhhhhh!): native S3 support; federated authentication and authorization
    9. DataPipe (blog.datapipe.com/datapipe-cloudstack): “Riak CS provides the high-performance, distributed datastore we need to deliver a sound foundation for our cloud storage needs now and for many years into the future” - Ed Laczynski, VP Cloud Strategy, Datapipe.
    10. Yahoo!: “Today, Yahoo! leverages Riak CS Enterprise to offer an S3-compatible public cloud storage service, as well as dedicated hosting options ... Yahoo! is highly supportive of open source software and we view Basho’s (OSS) announcement as a positive move that will work to accelerate its ability to innovate and ultimately strengthen our cloud platform.” - Shingo Saito, cloud product manager, Yahoo!
    11. About Riak
    12. Riak • Dynamo-inspired key/value store • Written in Erlang with C/C++ • Open source under Apache 2 license • Thousands of production deployments
    13. Riak • High availability • Low latency • Horizontal scalability • Fault tolerance • Ops friendliness
    14. Riak: Masterless • No master/slave or different roles • All nodes are equal • Write availability and scalability • All nodes can accept/route requests
    15. Riak: No Sharding • Consistent hashing • Prevents “hot spots” • Lowers operational burden of scale • Data rebalanced automatically
    16. Riak: Availability and Fault-Tolerance • Automatically replicates data • Read and write data during hardware failure and network partition • Hinted handoff
    17. How It Works
    18. Riak CS, Stanchion, Riak
    19. 1 Riak CS node for every node of Riak
    20. Large Object (diagram; each Riak CS node exposes an S3 API and a reporting API): 1. User uploads an object. 2. Riak CS breaks the object into 1 MB chunks. 3. Riak CS streams the chunks to Riak nodes. 4. Riak replicates and stores the chunks.
    21. Basic Concepts: USERS. Multi-tenancy: Riak CS will track individual usage/stats. Users are identified by access_key and authenticated by secret_key.
    22. Basic Concepts: BUCKETS. Users create buckets. Buckets are like folders. Store objects in buckets. Names are globally unique.
    23. Basic Concepts: OBJECTS. Stored in buckets. Objects are opaque. Store any file type.
    24. Features
    25. Riak CS: Large Object Support • Started with 5GB / object • Now have multipart upload • Content agnostic
    26. Riak CS: S3-Compatible API • Use existing S3 libraries and tools • RESTful operations • Multipart upload • S3-style ACLs for object/bucket permissions • S3 authentication scheme
    27. Riak CS: Administration and Users • Interface for user creation, deletion, and credentials • Configure so only admins can create users
    28. Riak CS: New Stuff in Riak CS 1.3 • Multipart upload: parts between 5MB and 5GB • Support for GET range queries • Restrict access to buckets based on source IP
    29. Riak CS
    30. Riak CS: Packages • Debian • Ubuntu • FreeBSD • Mac • Red Hat Enterprise • Fedora • SmartOS • Solaris • Source
    31. Operations
    32. Built-in stats & DTrace support: track access & storage per user • monitor total cluster ops • inspect ops with DTrace probes
    33. OPERATIONAL STATS, exposed via HTTP resource /riak-cs/stats. HISTOGRAMS & COUNTERS:
        • block: GET, PUT, DELETE
        • bucket: LIST KEYS, CREATE, DELETE, GET/PUT ACL
        • object: GET, PUT, DELETE, HEAD, GET/PUT ACL
    34. THE “USAGE” BUCKET: TRACK INDIVIDUAL USERS’ ACCESS & STORAGE
    35. QUERY USAGE STATS: storage and access statistics tracked on a per-user basis, as rollups for slices of time • Access: Operations, Count, BytesIn, BytesOut, plus system and user errors • Storage: Objects, Bytes
    36. Enterprise
    37. Multi-Datacenter Replication • For active backups, availability zones, disaster recovery, global traffic • Real-time or full-sync • 24/7 support • Per-node or storage-based pricing
    38. SIGN UP FOR AN ENTERPRISE DEVELOPER TRIAL: basho.com, http://docs.basho.com/
    39. Riak London: a distributed systems meet/drink-up, www.meetup.com/riak-london
    40. github.com/basho, twitter.com/basho, docs.basho.com
    41. Q&A @_stu_
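Slide 15 credits consistent hashing with preventing hot spots and rebalancing data automatically. The Python sketch below illustrates the idea under stated assumptions: roughly as in Riak, the 160-bit SHA-1 hash space is divided into `ring_size` equal partitions, and a key's replicas go to `n_val` consecutive partitions. Hashing the string `bucket/key` and assigning partitions to nodes round-robin are simplifications for illustration, not Riak's actual claim algorithm; `partition_index`, `owner` and `preference_list` are hypothetical names.

```python
import hashlib
from typing import List

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def partition_index(bucket: str, key: str, ring_size: int) -> int:
    """Map a bucket/key pair to one of `ring_size` equal partitions of the ring."""
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring_size must be a power of two")
    # Simplification: hash the string "bucket/key", not Riak's serialized {Bucket, Key} term.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_TOP // ring_size)


def owner(partition: int, nodes: List[str]) -> str:
    """Hypothetical round-robin claim: partition i belongs to nodes[i % len(nodes)]."""
    return nodes[partition % len(nodes)]


def preference_list(bucket: str, key: str, ring_size: int,
                    nodes: List[str], n_val: int = 3) -> List[str]:
    """Nodes holding the n_val replicas: the key's partition plus the next n_val - 1."""
    start = partition_index(bucket, key, ring_size)
    return [owner((start + i) % ring_size, nodes) for i in range(n_val)]
```

The key-to-partition mapping never depends on the node list, so when nodes join or leave, data moves a whole partition at a time; keeping that movement small is the claim algorithm's job, and the naive round-robin `owner` here would reassign most partitions. Round-robin has a second catch: when `ring_size` is not a multiple of the node count (64 partitions on 3 nodes, say), wrap-around can place two replicas on one node, so a real claim algorithm has to account for that.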
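Slide 20 and the matching speaker note walk through a large-object upload: Riak CS breaks the object into 1 MB chunks and streams them to Riak, which replicates and stores them. The sketch below covers only the splitting and reassembly step, assuming "1 MB" means 1 MiB. The sequence numbering and the names `chunk_stream`, `chunk_bytes` and `reassemble` are hypothetical, not Riak CS's block format, and writing chunks to Riak is left out.

```python
import io
from typing import BinaryIO, Iterable, Iterator, List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB chunk size from slide 20 (assumed to mean MiB)


def chunk_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, chunk) pairs without loading the whole object.

    Note: an unbuffered stream may return short reads, producing smaller chunks.
    """
    seq = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            return
        yield seq, chunk
        seq += 1


def chunk_bytes(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, bytes]]:
    """Convenience wrapper for an object already held in memory."""
    return list(chunk_stream(io.BytesIO(data), chunk_size))


def reassemble(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild the object; chunks may come back in any order, so sort by sequence number."""
    return b"".join(chunk for _, chunk in sorted(chunks, key=lambda c: c[0]))
```

Reading from a stream keeps memory use to about one chunk per upload, and every value written to Riak stays small, which sidesteps the large-object problems the notes attribute to moving big values around during topology changes.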
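Slide 26 lists the S3 authentication scheme, and the notes explain that users sign requests with their access_key and secret_key. Assuming the AWS Signature Version 2 scheme (the S3 scheme in use in 2013; check which versions your Riak CS release accepts), a client builds a canonical string-to-sign, computes HMAC-SHA1 over it with the secret key, and sends `Authorization: AWS <access_key>:<signature>`; the server recomputes the signature from its stored secret. This is a simplified sketch: the caller passes only `x-amz-*` headers and an already-built canonical resource (e.g. `/bucket/key`, plus sub-resources such as `?acl`), and repeated headers are not handled.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def string_to_sign(method: str, content_md5: str, content_type: str, date: str,
                   canonical_resource: str,
                   amz_headers: Optional[Dict[str, str]] = None) -> str:
    """Build a simplified Signature Version 2 string-to-sign."""
    # x-amz-* headers: lowercase names, sorted, one "name:value\n" line each.
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted((amz_headers or {}).items(), key=lambda kv: kv[0].lower())
    )
    return f"{method}\n{content_md5}\n{content_type}\n{date}\n{canonical_amz}{canonical_resource}"


def sign(secret_key: str, to_sign: str) -> str:
    """HMAC-SHA1 over the string-to-sign, base64-encoded."""
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, secret_key: str, to_sign: str) -> str:
    """Client side: the access_key identifies the user; the secret_key is never sent."""
    return f"AWS {access_key}:{sign(secret_key, to_sign)}"


def verify(secret_key: str, to_sign: str, signature: str) -> bool:
    """Server side: recompute from the stored secret and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, to_sign), signature)
```

Because only the signature travels over the wire, a valid request proves the sender holds the secret_key without revealing it, which is what lets Riak CS allow or deny each request per user.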
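The notes say Stanchion, a single instance, ensures user and bucket uniqueness globally, so that whoever creates “kittens” first owns it. The reason for one serialization point is that "check whether the name is free, then claim it" must be atomic; otherwise two users racing on different nodes could both succeed. The sketch below models that with an in-process lock standing in for the single Stanchion instance. `BucketRegistry` is a hypothetical class, not Stanchion's code or API.

```python
import threading
from typing import Dict, Optional


class BucketRegistry:
    """Hypothetical in-memory model of the job the notes give Stanchion:
    serialize bucket creation so bucket names stay globally unique."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # bucket name -> owner's access_key
        self._lock = threading.Lock()

    def create_bucket(self, name: str, access_key: str) -> bool:
        """Return True if this call created the bucket, False if the name is taken."""
        with self._lock:  # check-and-claim is atomic, so only one creator can win
            if name in self._owners:
                return False
            self._owners[name] = access_key
            return True

    def owner(self, name: str) -> Optional[str]:
        with self._lock:
            return self._owners.get(name)
```

Funnelling every create through one instance trades some availability for correctness on this narrow path, while ordinary object reads and writes still go straight to Riak CS and Riak.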
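Slide 28 gives the multipart limits: parts between 5MB and 5GB. A client uploading a large object has to plan its parts within those bounds. The sketch assumes the S3 conventions that the limits are MiB/GiB, that part numbers start at 1, and that only the final part may fall below the minimum; `plan_parts` is a hypothetical helper, not part of any Riak CS client.

```python
from typing import List, Tuple

MIN_PART = 5 * 1024 ** 2  # 5 MB lower bound from slide 28 (assumed MiB, as in S3)
MAX_PART = 5 * 1024 ** 3  # 5 GB upper bound from slide 28 (assumed GiB, as in S3)


def plan_parts(object_size: int, part_size: int = MIN_PART) -> List[Tuple[int, int, int]]:
    """Split an upload into (part_number, offset, length) triples.

    Every part except the last is exactly part_size; the final part may be
    smaller than MIN_PART (S3 behavior, assumed to carry over to Riak CS).
    """
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError(f"part_size must be between {MIN_PART} and {MAX_PART} bytes")
    if object_size < 0:
        raise ValueError("object_size must be non-negative")
    parts = []
    offset, number = 0, 1  # part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

For example, a 12 MiB object with the default part size becomes two 5 MiB parts followed by a 2 MiB final part; each part can then be uploaded (and retried) independently.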
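Slides 34 and 35 describe per-user access and storage rollups kept in the “usage” bucket, and the speaker notes add that a query cannot cover a period longer than 31 days. The sketch below builds a usage-query path for one user and time window and enforces that limit on the client side. The path layout `/riak-cs/usage/<access_key>/<options>/<start>/<end>`, the option letters (`a` access, `b` storage, `j` JSON, `x` XML) and the UTC `YYYYMMDDTHHMMSSZ` timestamps are assumptions to check against the Riak CS documentation for your release; `usage_path` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MAX_WINDOW = timedelta(days=31)  # limit stated in the speaker notes


def _stamp(moment: datetime) -> str:
    # Assumed timestamp format: UTC, e.g. 20130411T180000Z
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def usage_path(access_key: str, start: datetime, end: datetime,
               access: bool = True, storage: bool = True, fmt: str = "j") -> str:
    """Build a usage-query path (assumed layout) for one user and time window."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_WINDOW:
        raise ValueError("usage queries cannot span more than 31 days")
    if not (access or storage):
        raise ValueError("request access stats, storage stats, or both")
    if fmt not in ("j", "x"):
        raise ValueError("fmt must be 'j' (JSON) or 'x' (XML)")
    options = ("a" if access else "") + ("b" if storage else "") + fmt
    return f"/riak-cs/usage/{access_key}/{options}/{_stamp(start)}/{_stamp(end)}"
```

Rejecting an over-long window before sending the request keeps the error on the client, where a billing job can split a longer period into several queries of at most 31 days each.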