Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)

About Basho: Basho makes and distributes Riak CS. Built on Riak, Basho's open-source, scalable datastore used by thousands in production, Riak CS is made for companies that need large file storage that can't go down.

About the speaker: Andy Gross, Basho's Chief Architect, will take you on a tour of Riak CS, talk about how and why Basho built it, and explain the architecture that underpins it. He'll also highlight various use cases featuring Fortune 500 companies that rely on Riak CS.

Published in: Technology
  • Think of it like a big hash-table
  • X = throughput, compute power for MapReduce, storage, lower latency
  • Consistent hashing means: 1) large, fixed-size key-space 2) no rehashing of keys - always hash the same way
  • 1) Client requests a key
    2) Get handler starts up to service the request
    3) Hashes key to its owner partitions (N=3)
    4) Sends similar “get” request to those partitions
    5) Waits for R replies that concur (R=2)
    6) Resolves the object, replies to client
    7) Third reply may come back at any time, but FSM replies as soon as quorum is satisfied/violated
  • Transcript of "Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief Architect (Basho)"

    1. Riak and Riak CS
        • Andy Gross <@argv0>
        • Chief Architect, Basho Technologies
        • Silicon Valley Cloud Computing Group
        • April 2, 2013
    2. Basho
        • 120+ employees, offices in SF, MA, London, Japan
        • Founded in 2008, open sourced Riak in 2009
        • Sponsors of the Riak open source database (Apache 2)
        • Sell Enterprise features (multi-DC replication), support, training.
        • Riak CS (S3-compat storage) released in March 2012
    3. Now Open Source (Apache 2)
        • Cloud storage software backed by Riak
        • S3 API
        • Formerly closed-source
        • Per-tenant reporting
        • Pluggable authentication
        • Detailed stats
        • DTrace support
    4. REDACTED
    5. what is a cloud service?
        • operationally simple
        • horizontally scalable
        • globally distributed
        • highly available
        • no SPOFs
        • fault tolerant
    6. you can’t outsource these properties
        • operationally simple
        • horizontally scalable
        • globally distributed
        • highly available
        • no SPOFs
        • fault tolerant
    7. “use pacemaker” = wrong answer
    8. “use mysql best practices for redundancy” = wrong answer
    9. “just plug it into a SAN” = wrong answer
    10. all cloud services need reliable, distributed state storage
    11. storage is the most important and hardest part
    12. Riak CS uses Riak
    13. What is Riak?
    14. Key-Value store (plus extras)
        • Distributed, horizontally scalable
        • Eventually consistent
        • Fault-tolerant
        • Highly-available
        • Inspired by Amazon’s Dynamo
    15. Key-Value
        • Simple operations - get, put, delete
        • Value is mostly opaque (some metadata)
        • Extras
            • MapReduce
            • Secondary Indexes
            • Full-text search (optional)
    16. Distributed & Horizontally Scalable
        • Default configuration is in a cluster
        • Load and data are spread evenly via consistent hashing
        • Scalable: Add more nodes to get more X
    17. Fault-Tolerant
        • Symmetry: All nodes participate equally
        • Decentralized: no central control, no SPOF
        • All data is replicated 3x by default
        • Cluster transparently survives...
            • node failure
            • network partitions
        • Built on Erlang/OTP (designed for FT)
    18. Highly-Available
        • Any node can serve client requests
        • Fallbacks (sloppy quorums) are used when nodes are down
        • Always accepts write requests
        • Accepts read request as long as R/N nodes are alive
        • Per-request quorums
    19. Inspired by Amazon’s Dynamo
        • Masterless, peer-coordinated replication
        • Consistent hashing
        • Eventually consistent
        • Quorum reads and writes
        • Anti-entropy: read repair, hinted handoff
    20. [Diagram: a Large Object passing through Riak CS nodes (each with an S3 API and a Reporting API) to Riak nodes]
        • 1. user uploads an object
        • 2. Riak CS breaks object into 1 MB chunks
        • 3. Riak CS streams chunks to Riak nodes
        • 4. Riak replicates and stores chunks
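
The four upload steps on slide 20 can be sketched roughly as follows. This is a minimal illustration, not Riak CS code: the `block_key` layout, the `upload` helper, and the use of a plain dict as a stand-in for the Riak cluster are all assumptions made for the example. Only the 1 MB chunk size comes from the slide.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size shown on the slide


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, chunk) pairs until the stream is exhausted."""
    seq = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield seq, chunk
        seq += 1


def block_key(object_id: str, seq: int) -> str:
    # Hypothetical key layout; the real Riak CS key format is not given in the slides.
    return f"blocks/{object_id}/{seq}"


def upload(stream: BinaryIO, object_id: str, store: Dict[str, bytes]) -> List[str]:
    """Split the stream into chunks and put each chunk under its own key.

    `store` is a dict standing in for the Riak cluster, which would
    replicate each chunk on its own.
    """
    keys = []
    for seq, chunk in iter_chunks(stream):
        key = block_key(object_id, seq)
        store[key] = chunk
        keys.append(key)
    return keys
```

Because the object is read one chunk at a time, the uploader never needs to hold more than one chunk in memory, which is the point of streaming in step 3.
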
    21. Principles
        • Always-writable
        • Incrementally scalable
        • Symmetrical
        • Decentralized
        • Focus on SLAs, tail latency
    22. Techniques
        • Consistent Hashing
        • Vector Clocks
        • Read Repair
        • Anti-Entropy
        • Hinted Handoff
        • Gossip Protocol
    23. Consistent Hashing
        • Invented by Danny Lewin and others @ MIT/Akamai
        • Minimizes remapping of keys when number of hash slots changes
        • Originally applied to CDNs, used in Dynamo for replica placement
        • Enables incremental scalability, even spread
        • Minimizes hot spots
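
To make the "minimizes remapping" point on slide 23 concrete, here is a small sketch of the classic token-on-a-ring form of consistent hashing. It is an assumption-laden illustration: the `HashRing` class, the `vnodes` count, and the `node#i` token naming are invented for the example, and Riak itself divides the ring into fixed, equal partitions that nodes claim rather than hashing node names onto the ring.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    """Map a string onto the 160-bit SHA-1 key-space."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Illustrative ring: each node owns several tokens (points) on the ring."""

    def __init__(self, nodes=(), vnodes: int = 32):
        self.vnodes = vnodes                       # tokens per node (assumed value)
        self._tokens: List[Tuple[int, str]] = []   # sorted (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._tokens, (_hash(f"{node}#{i}"), node))

    def _start(self, key: str) -> int:
        # Index of the first token at or after the key's hash.
        return bisect.bisect_left(self._tokens, (_hash(key),))

    def owner(self, key: str) -> str:
        if not self._tokens:
            raise ValueError("ring is empty")
        return self._tokens[self._start(key) % len(self._tokens)][1]

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """First n distinct nodes found walking clockwise from the key."""
        start, result = self._start(key), []
        for i in range(len(self._tokens)):
            node = self._tokens[(start + i) % len(self._tokens)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

Keys are always hashed the same way, so adding a node only moves the keys that fall just before one of the new node's tokens; every other key keeps its owner.
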
    24. Vector Clocks
        • Introduced by Mattern et al, in 1988
        • Extends Lamport’s timestamps (1978)
        • Each value in Dynamo tagged with vector clock
        • Allows detection of stale values, logical siblings
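
The stale-versus-sibling distinction on slide 24 comes down to a partial-order comparison. The sketch below shows one way to express it; the function names and the dict-of-counters representation are assumptions for illustration, not Riak's internal encoding.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates seen from that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify two versions as equal, one stale, or concurrent siblings."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "b_is_stale"
    if b_ge_a:
        return "a_is_stale"
    return "siblings"
```

Two writers that both start from the same version and update it independently produce clocks where neither descends from the other, which is exactly the sibling case the slide mentions.
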
    25. Read Repair
        • Update stale versions opportunistically on reads (instead of writes)
        • Pushes system toward consistency, after returning value to client
        • Reflects focus on a cheap, always-available write path
    26. Hinted Handoff
        • Any node can accept writes for other nodes if they’re down
        • All messages include a destination
        • Data accepted by node other than destination is handed off when node recovers
        • As long as a single node is alive the cluster can accept a write
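
A toy model of the hinted handoff behaviour on slide 26 might look like the following. The `Node` class and the `write` and `handoff` helpers are hypothetical names for this sketch; real handoff in Riak operates on vnodes and reconciles versions with vector clocks, neither of which is modelled here.

```python
from typing import Dict


class Node:
    """Minimal stand-in for a cluster node."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}
        # Writes held on behalf of other nodes: destination name -> {key: value}
        self.hints: Dict[str, Dict[str, str]] = {}


def write(key: str, value: str, destination: Node, fallback: Node) -> str:
    """Store on the destination if it is up, else on the fallback with a hint.

    Returns the name of the node that accepted the write.
    """
    if destination.up:
        destination.data[key] = value
        return destination.name
    # The hint records which node the data is really meant for.
    fallback.hints.setdefault(destination.name, {})[key] = value
    return fallback.name


def handoff(fallback: Node, recovered: Node) -> int:
    """Return hinted data to the recovered node; report how many keys moved."""
    pending = fallback.hints.pop(recovered.name, {})
    # Simplification: overwrite blindly. A real system would compare versions.
    recovered.data.update(pending)
    return len(pending)
```

The write path never blocks on the failed node, which is how the cluster stays writable as long as some node is alive.
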
    27. Anti-Entropy
        • Replicas maintain a Merkle Tree of keys and their versions/hashes
        • Trees periodically exchanged with peer vnodes
        • Merkle tree enables cheap comparison
        • Only values with different hashes are exchanged
        • Pushes system toward consistency
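
The "cheap comparison" on slide 27 can be illustrated with a deliberately small, two-level hash tree (a root over a fixed number of leaf segments). The segment count, the function names, and the representation of a replica as a dict of key to version hash are assumptions for this sketch; production Merkle trees are deeper and are persisted on disk.

```python
import hashlib
from typing import Dict, Set

SEGMENTS = 16  # number of leaf segments (assumed value for illustration)


def _h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def segment_of(key: str) -> int:
    """Deterministically assign a key to one leaf segment."""
    return int(_h(key.encode("utf-8")), 16) % SEGMENTS


def build_tree(replica: Dict[str, str]) -> dict:
    """Build the tree for a replica given as {key: version_hash}."""
    segments = [dict() for _ in range(SEGMENTS)]
    for key, version_hash in replica.items():
        segments[segment_of(key)][key] = version_hash
    leaves = [_h(repr(sorted(seg.items())).encode("utf-8")) for seg in segments]
    root = _h("".join(leaves).encode("utf-8"))
    return {"root": root, "leaves": leaves, "segments": segments}


def differing_keys(a: dict, b: dict) -> Set[str]:
    """Compare roots first, then leaves, and only then individual keys."""
    if a["root"] == b["root"]:
        return set()  # one hash comparison proves the replicas agree
    out = set()
    for i in range(SEGMENTS):
        if a["leaves"][i] != b["leaves"][i]:
            seg_a, seg_b = a["segments"][i], b["segments"][i]
            for key in seg_a.keys() | seg_b.keys():
                if seg_a.get(key) != seg_b.get(key):
                    out.add(key)
    return out
```

When replicas agree, the exchange costs a single root comparison; when they differ, only the keys in mismatched segments are inspected.
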
    28. Gossip Protocol
        • Decentralized approach to managing global state
        • Trades off atomicity of state changes for a decentralized approach
        • Volume of gossip can overwhelm networks without care
    29. Hinted Handoff
        • Node fails
        • Requests go to fallback
        • Node comes back
        • “Handoff” - data returns to recovered node
        • Normal operations resume
        • [Ring diagram: hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)]
    30. Anatomy of a Request
        • [Diagram labels: client, Riak, Coordinating node, Cluster, The Ring (partitions 6 through 16 shown)]
        • get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
        • Get Handler (FSM)
        • hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
        • R=2
        • Replica versions shown: v1, v2
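
The quorum rule behind this request flow (N=3, R=2, and the note that the FSM replies "as soon as quorum is satisfied/violated") can be sketched as below. The `quorum_get` function and its return tuple are hypothetical; it treats "concur" as plain value equality, whereas the real get FSM resolves replies using vector clocks.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple


def quorum_get(
    replies: Iterable[Hashable], n: int = 3, r: int = 2
) -> Tuple[str, Optional[Hashable], int]:
    """Consume replica replies in arrival order and stop as early as possible.

    Returns (status, value, replies_consumed). Status is "ok" once r replies
    agree, or "error" once agreement among r replies has become impossible.
    """
    counts: Counter = Counter()
    received = 0
    for value in replies:
        received += 1
        counts[value] += 1
        if counts[value] >= r:
            return ("ok", value, received)        # quorum satisfied
        if max(counts.values()) + (n - received) < r:
            return ("error", None, received)      # quorum can no longer be met
    return ("error", None, received)
```

With the defaults, two matching replies are enough to answer the client, so the third reply can arrive late without adding latency to the request.
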
    31. Read Repair
        • [Diagram labels: client, Riak, Coordinating node, Cluster (partitions 6 through 16 shown)]
        • get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
        • Get Handler (FSM)
        • R=2
        • Replica versions shown: v1 and v2, with v1 replicas updated to v2
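
Read repair, as drawn on this slide, can be approximated with the sketch below. It is a simplification under stated assumptions: `read_repair` is a hypothetical helper, replicas are plain dicts, and an integer version number stands in for the vector clock that would really decide which copy is newer.

```python
from typing import Dict, List, Optional, Tuple

Replica = Dict[str, Tuple[int, str]]  # key -> (version, value)


def read_repair(replicas: List[Replica], key: str) -> Tuple[Optional[str], int]:
    """Return (newest_value, replicas_repaired) for a key.

    In a real system the value is returned to the client first and stale
    replicas are updated afterwards; here both happen in one call.
    """
    found = [rep[key] for rep in replicas if key in rep]
    if not found:
        return None, 0
    newest = max(found, key=lambda pair: pair[0])
    repaired = 0
    for rep in replicas:
        if rep.get(key) != newest:
            rep[key] = newest  # push the newest version to the stale replica
            repaired += 1
    return newest[1], repaired
```

Repairing on the read path keeps writes cheap: a write does not have to reach every replica, because later reads pull lagging replicas forward.
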
    32. Riak Architecture
        • Erlang/OTP Runtime
        • Riak KV
        • Client APIs
        • Request Coordination
        • Riak Core
        • get, put, delete, map-reduce
        • HTTP, Protocol Buffers, Erlang local client
        • membership, consistent hashing, handoff, node-liveness, gossip, buckets
        • vnodes
        • storage backend
        • JS Runtime
        • vnode master
    33. riak is a solid foundation for building cloud services
    34. Coming Soon:
        • Riak CS 1.4 (Q2)
            • Swift API
            • Keystone Integration
            • S3 Features
                • COPY Object
                • Object Versioning
        • Riak CS 1.5 (Q3)
            • Server side encryption
    35. Coming Later (2014)
        • Erasure coding
        • Reduced redundancy storage
        • Native indexing/search
    36. RICON East - May 13-14, NYC
        • A distributed systems conference for developers
        • Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more
        • Use discount code SVCloud20 for 20% off tickets
        • http://ricon.io/east.html
    37. thanks!/questions?
        • download riakcs: http://docs.basho.com/riakcs/latest/riakcs-download
        • hack riakcs: http://github.com/basho/riak_cs
        • work at basho: http://bashojobs.theresumator.com
        • follow basho on twitter: http:/twitter.com/basho