Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)
 

About Basho: Basho makes and distributes Riak CS. Built on Riak, Basho's open source, scalable datastore used by thousands in production, Riak CS is made for companies that need large file storage that can't go down.

About the speaker: Andy Gross, Basho's Chief Architect, will take you on a tour of Riak CS and talk about how and why Basho built it and the architecture that underpins it. He'll also highlight various use cases featuring Fortune 500 companies that rely on Riak CS.

  • Think of it like a big hash-table
  • X = throughput, compute power for MapReduce, storage, lower latency
  • Consistent hashing means:
    1) large, fixed-size key-space
    2) no rehashing of keys - always hash the same way
  • 1) Client requests a key
    2) Get handler starts up to service the request
    3) Hashes key to its owner partitions (N=3)
    4) Sends similar “get” request to those partitions
    5) Waits for R replies that concur (R=2)
    6) Resolves the object, replies to client
    7) Third reply may come back at any time, but FSM replies as soon as quorum is satisfied/violated
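The request flow in the note above can be sketched roughly as follows. This is a minimal illustration, not Riak's actual get handler: the 16-partition ring, the modulo placement, the integer versions, and the names `owner_partitions` and `quorum_get` are all assumptions made for the example. A real handler queries the replicas in parallel and resolves versions with vector clocks rather than integers.

```python
import hashlib

RING_SIZE = 16  # hypothetical partition count, chosen for the example
N = 3           # replicas per key
R = 2           # replies required before answering the client


def owner_partitions(key, n=N, ring_size=RING_SIZE):
    """Hash the key to its first partition, then take the next n-1 partitions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    first = int.from_bytes(digest, "big") % ring_size
    return [(first + i) % ring_size for i in range(n)]


def quorum_get(key, partitions, r=R):
    """Return the newest (version, value) among the first r replies.

    `partitions` maps partition index -> {key: (version, value)}.
    A partition that is missing from the mapping is treated as unreachable.
    """
    replies = []
    for index in owner_partitions(key):
        store = partitions.get(index)
        if store is None:
            continue  # unreachable replica: no reply
        replies.append(store.get(key, (0, None)))  # (0, None) means "not found"
        if len(replies) >= r:
            break  # quorum satisfied; do not wait for later replies
    if len(replies) < r:
        raise RuntimeError("quorum not met")
    return max(replies, key=lambda reply: reply[0])
```

With N=3 and R=2, the read still succeeds when one of the three owner partitions is unreachable, and fails only when two are.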

Presentation Transcript

  • Riak and Riak CS
    Andy Gross <@argv0>
    Chief Architect, Basho Technologies
    Silicon Valley Cloud Computing Group
    April 2, 2013
  • Basho
    • 120+ employees, offices in SF, MA, London, Japan
    • Founded in 2008, open sourced Riak in 2009
    • Sponsors of the Riak open source database (Apache 2)
    • Sell Enterprise features (multi-DC replication), support, training.
    • Riak CS (S3-compat storage) released in March 2012
  • Now Open Source (Apache 2)
    • Cloud storage software backed by Riak
    • S3 API
    • Formerly closed-source
    • Per-tenant reporting
    • Pluggable authentication
    • Detailed stats
    • DTrace support
  • REDACTED REDACTED REDACTED
  • what is a cloud service?
    • operationally simple
    • horizontally scalable
    • globally distributed
    • highly available
    • no SPOFs
    • fault tolerant
  • you can’t outsource these properties
    • operationally simple
    • horizontally scalable
    • globally distributed
    • highly available
    • no SPOFs
    • fault tolerant
  • “use pacemaker” = wrong answer
  • “use mysql best practices for redundancy” = wrong answer
  • “just plug it into a SAN” = wrong answer
  • all cloud services need reliable, distributed state storage
  • storage is the most important and hardest part
  • Riak CS uses Riak
  • What is Riak?
  • Key-Value store (plus extras)
    • Distributed, horizontally scalable
    • Eventually consistent
    • Fault-tolerant
    • Highly-available
    • Inspired by Amazon’s Dynamo
  • Key-Value
    • Simple operations - get, put, delete
    • Value is mostly opaque (some metadata)
    • Extras: MapReduce, Secondary Indexes, Full-text search (optional)
  • Distributed & Horizontally Scalable
    • Default configuration is in a cluster
    • Load and data are spread evenly via consistent hashing
    • Scalable: Add more nodes to get more X
  • Fault-Tolerant
    • Symmetry: All nodes participate equally
    • Decentralized: no central control, no SPOF
    • All data is replicated 3x by default
    • Cluster transparently survives... node failure, network partitions
    • Built on Erlang/OTP (designed for FT)
  • Highly-Available
    • Any node can serve client requests
    • Fallbacks (sloppy quorums) are used when nodes are down
    • Always accepts write requests
    • Accepts read request as long as R/N nodes are alive
    • Per-request quorums
  • Inspired by Amazon’s Dynamo
    • Masterless, peer-coordinated replication
    • Consistent hashing
    • Eventually consistent
    • Quorum reads and writes
    • Anti-entropy: read repair, hinted handoff
  • Large Object
    (diagram: several Riak CS instances, each exposing an S3 API and a Reporting API, in front of a set of Riak nodes)
    1. user uploads an object
    2. Riak CS breaks object into 1 MB chunks
    3. Riak CS streams chunks to Riak nodes
    4. Riak replicates and stores chunks
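The chunking step on this slide can be illustrated with a short sketch. It only shows how an upload stream might be cut into 1 MB pieces; the function name `iter_chunks` is hypothetical, and how Riak CS actually names, tracks, and streams its blocks is not covered by the slide and is not modelled here.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size given on the slide


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of stream
        yield chunk


# Usage: a 2.5 MB upload becomes two full chunks and one partial chunk.
upload = io.BytesIO(bytes(range(256)) * 10240)
chunk_sizes = [len(chunk) for chunk in iter_chunks(upload)]
```

Reading from a stream rather than a fully buffered object keeps memory use bounded by the chunk size, which matters when the uploaded objects are large.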
  • Principles
    • Always-writable
    • Incrementally scalable
    • Symmetrical
    • Decentralized
    • Focus on SLAs, tail latency
  • Techniques
    • Consistent Hashing
    • Vector Clocks
    • Read Repair
    • Anti-Entropy
    • Hinted Handoff
    • Gossip Protocol
  • Consistent Hashing
    • Invented by Danny Lewin and others @ MIT/Akamai
    • Minimizes remapping of keys when number of hash slots changes
    • Originally applied to CDNs, used in Dynamo for replica placement
    • Enables incremental scalability, even spread
    • Minimizes hot spots
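The "minimizes remapping" property can be seen in a small sketch of a generic consistent-hash ring. This is a textbook-style illustration under stated assumptions, not Riak's ring implementation: the `HashRing` class, the 64 points per node, and the node names are invented for the example.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a position in a large, fixed-size key space (160 bits)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class HashRing:
    """Generic consistent-hash ring; each node owns several points on the ring."""

    def __init__(self, nodes, points_per_node=64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [point for point, _ in self._points]

    def owner(self, key):
        """Return the node owning the first ring point clockwise from the key."""
        index = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[index][1]


# Usage: add a fifth node and see which keys change owner.
keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])
moved = [key for key in keys if before.owner(key) != after.owner(key)]
```

Keys are always hashed the same way, so every key in `moved` now belongs to the new node `e`; no key is shuffled between the existing nodes.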
  • Vector Clocks
    • Introduced by Mattern et al, in 1988
    • Extends Lamport’s timestamps (1978)
    • Each value in Dynamo tagged with vector clock
    • Allows detection of stale values, logical siblings
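How a vector clock separates stale values from siblings can be sketched in a few lines. The dictionary representation and the function names `increment`, `descends`, and `compare` are assumptions for illustration and do not reflect Riak's internal encoding.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify clock a relative to clock b."""
    a_descends, b_descends = descends(a, b), descends(b, a)
    if a_descends and b_descends:
        return "equal"
    if a_descends:
        return "newer"     # b is stale
    if b_descends:
        return "older"     # a is stale
    return "siblings"      # concurrent updates; neither supersedes the other


# Usage: two clients update the same value from a common ancestor.
base = increment({}, "client_x")
update_x = increment(base, "client_x")
update_y = increment(base, "client_y")
```

Here `update_x` is newer than `base`, while `update_x` and `update_y` are siblings because each contains an event the other has not seen.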
  • Read Repair
    • Update stale versions opportunistically on reads (instead of writes)
    • Pushes system toward consistency, after returning value to client
    • Reflects focus on a cheap, always-available write path
  • Hinted Handoff
    • Any node can accept writes for other nodes if they’re down
    • All messages include a destination
    • Data accepted by node other than destination is handed off when node recovers
    • As long as a single node is alive the cluster can accept a write
  • Anti-Entropy
    • Replicas maintain a Merkle Tree of keys and their versions/hashes
    • Trees periodically exchanged with peer vnodes
    • Merkle tree enables cheap comparison
    • Only values with different hashes are exchanged
    • Pushes system toward consistency
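The cheap comparison that a Merkle tree enables can be sketched with a two-level hash tree. This is a simplified illustration, not Riak's anti-entropy code: the four-bucket layout, SHA-256, and the names `build_tree` and `differing_keys` are assumptions.

```python
import hashlib


def _digest(data):
    return hashlib.sha256(data).hexdigest()


def build_tree(store, buckets=4):
    """Build a two-level hash tree over a {key: value_bytes} store.

    Returns (root_hash, leaf_hashes, leaves); each leaf maps key -> value hash.
    """
    leaves = [dict() for _ in range(buckets)]
    for key, value in store.items():
        index = int(_digest(key.encode("utf-8")), 16) % buckets
        leaves[index][key] = _digest(value)
    leaf_hashes = [_digest(repr(sorted(leaf.items())).encode("utf-8")) for leaf in leaves]
    root = _digest("".join(leaf_hashes).encode("utf-8"))
    return root, leaf_hashes, leaves


def differing_keys(tree_a, tree_b):
    """Return the keys whose value hashes differ, descending only where hashes differ."""
    root_a, hashes_a, leaves_a = tree_a
    root_b, hashes_b, leaves_b = tree_b
    if root_a == root_b:
        return set()  # one comparison proves the replicas agree
    differences = set()
    for hash_a, hash_b, leaf_a, leaf_b in zip(hashes_a, hashes_b, leaves_a, leaves_b):
        if hash_a == hash_b:
            continue  # whole bucket matches; skip its keys
        for key in leaf_a.keys() | leaf_b.keys():
            if leaf_a.get(key) != leaf_b.get(key):
                differences.add(key)
    return differences
```

When two replicas are in sync, a single root comparison is enough; when they differ, only the buckets with mismatched hashes are inspected, and only the differing keys need to be exchanged.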
  • Gossip Protocol
    • Decentralized approach to managing global state
    • Trades off atomicity of state changes for a decentralized approach
    • Volume of gossip can overwhelm networks without care
  • Hinted Handoff
    • Node fails
    • Requests go to fallback
    • Node comes back
    • “Handoff” - data returns to recovered node
    • Normal operations resume
    (diagram: hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”))
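The fail, fallback, and handoff sequence above can be sketched with plain in-memory objects. The `Node` class and the `write` and `handoff` functions are hypothetical names for this example; fallback selection in a real cluster follows the ring rather than simple list order.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # values this node owns
        self.hints = {}  # destination name -> {key: value} held on its behalf


def write(nodes, destination, key, value):
    """Write to the destination, or to a live fallback that records a hint."""
    if destination.up:
        destination.data[key] = value
        return destination
    for node in nodes:
        if node.up and node is not destination:
            node.hints.setdefault(destination.name, {})[key] = value
            return node
    raise RuntimeError("no live node can accept the write")


def handoff(nodes, recovered):
    """Mark the node as recovered and return the data fallbacks held for it."""
    recovered.up = True
    for node in nodes:
        pending = node.hints.pop(recovered.name, {})
        recovered.data.update(pending)
```

Because every write carries its destination, the fallback knows where the data belongs and can return it once the destination recovers.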
  • Anatomy of a Request
    (diagram: the client sends get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) to Riak; on the coordinating node the Get Handler (FSM) computes hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12 and sends the get to those partitions of the ring, shown as partitions 6 to 16; R=2; replicas hold v1 and v2)
  • Read Repair
    (diagram: the same get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) request through the Get Handler (FSM) on the coordinating node, ring partitions 6 to 16, R=2; replicas holding v1 end up with v2)
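Read repair as drawn on this slide can be sketched in two steps: answer the read from the newest version, then update the replicas that were stale. The integer versions, the partition names, and the functions `read` and `repair` are assumptions for illustration; Riak compares vector clocks rather than integers.

```python
def read(replicas, key):
    """Return the newest (version, value) and the names of stale replicas.

    `replicas` maps replica name -> {key: (version, value)}.
    """
    replies = {name: store.get(key) for name, store in replicas.items()}
    found = [reply for reply in replies.values() if reply is not None]
    if not found:
        return None, []
    newest = max(found, key=lambda reply: reply[0])
    stale = [name for name, reply in replies.items() if reply != newest]
    return newest, stale


def repair(replicas, key, newest, stale):
    """Push the newest version to stale replicas, after the client has its answer."""
    for name in stale:
        replicas[name][key] = newest


# Usage: one replica is behind and one never received the value.
replicas = {
    "p10": {"k": (1, "v1")},
    "p11": {"k": (2, "v2")},
    "p12": {},
}
newest, stale = read(replicas, "k")
repair(replicas, "k", newest, stale)
```

Splitting `read` from `repair` mirrors the point made earlier in the talk: the client gets its value first, and the system is pushed toward consistency afterwards.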
  • Riak Architecture
    (diagram labels)
    • Erlang/OTP Runtime
    • Client APIs: HTTP, Protocol Buffers, Erlang local client
    • Request Coordination: get, put, delete, map-reduce
    • Riak KV
    • Riak Core: membership, consistent hashing, handoff, node-liveness, gossip, buckets
    • vnode master, vnodes, storage backend
    • JS Runtime
  • riak is a solid foundation for building cloud services
  • Coming Soon:
    • Riak CS 1.4 (Q2)
    • Swift API
    • Keystone Integration
    • S3 Features
    • COPY Object
    • Object Versioning
    • Riak CS 1.5 (Q3)
    • Server side encryption
  • Coming Later (2014)
    • Erasure coding
    • Reduced redundancy storage
    • Native indexing/search
  • RICON East - May 13-14, NYC
    • A distributed systems conference for developers
    • Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more
    • Use discount code SVCloud20 for 20% off tickets
    • http://ricon.io/east.html
  • thanks! / questions?
    • download riakcs: http://docs.basho.com/riakcs/latest/riakcs-download
    • hack riakcs: http://github.com/basho/riak_cs
    • work at basho: http://bashojobs.theresumator.com
    • follow basho on twitter: http://twitter.com/basho