BlobSeer in NoSQL world

  1. BlobSeer. Presented by Viet-Trung TRAN, KerData Team
  2. BlobSeer: Architecture <ul><li>Clients </li></ul><ul><ul><li>Perform fine-grained blob accesses </li></ul></ul><ul><li>Providers </li></ul><ul><ul><li>Store the pages of the blob </li></ul></ul><ul><li>Provider manager </li></ul><ul><ul><li>Monitors the providers </li></ul></ul><ul><ul><li>Favours data load balancing </li></ul></ul><ul><li>Metadata providers </li></ul><ul><ul><li>Store information about page location </li></ul></ul><ul><li>Version manager </li></ul><ul><ul><li>Ensures concurrency control </li></ul></ul>[Diagram: clients, providers, metadata providers, provider manager, version manager]
  3. BlobSeer: What may be refined <ul><li>Hotspots/fault tolerance </li></ul><ul><ul><li>Fixed single version manager </li></ul></ul><ul><ul><li>Fixed provider manager </li></ul></ul><ul><li>Load balancing </li></ul><ul><ul><li>Version manager and provider manager may become hotspots </li></ul></ul><ul><ul><li>Fixed metadata providers </li></ul></ul>
  4. BlobSeer: What I am thinking of
  5. Background: Lightweight DHT (may not be correct) <ul><li>Use consistent hashing to distribute keys </li></ul><ul><ul><li>Load balancing </li></ul></ul><ul><ul><li>Fault tolerance </li></ul></ul><ul><ul><li>Elasticity </li></ul></ul><ul><li>Lookup cost: O(1) </li></ul><ul><ul><li>Based on a gossip overlay (borrowed from the NoSQL world) </li></ul></ul><ul><ul><li>Or based on the Kelips P2P prototype (I have only just learned about it) </li></ul></ul><ul><ul><li>Given a key, a node knows the exact destination in most cases </li></ul></ul><ul><ul><li>Overhead: acceptable, cf. the NoSQL world (Facebook Cassandra, Amazon Dynamo, Voldemort) </li></ul></ul><ul><li>I will try to solve the problems above by building BlobSeer on top of this DHT </li></ul>
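To make the consistent-hashing idea on this slide concrete, here is a minimal Python sketch. It is not BlobSeer code: `HashRing`, `ring_hash`, the virtual-node count and the choice of SHA-1 are all assumptions made for illustration. It assumes every node already knows the full membership (for example through gossip), so a lookup is a purely local computation and needs no routing hops.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns several points so that load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point at or after hash(key)."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (ring_hash(key),))
        if idx == len(self._points):  # wrap around the ring
            idx = 0
        return self._points[idx][1]
```

Elasticity shows up as follows: after `ring.add("n4")`, a key either stays on its previous node or moves to `n4`; no key moves between the old nodes.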
  6. Distributed version managers <ul><li>Distributed version managers: two levels </li></ul><ul><ul><li>Splitting the BLOB_ID namespace </li></ul></ul><ul><ul><ul><li>DHT-based </li></ul></ul></ul><ul><ul><ul><li>Fortunately, blobs are independent of each other </li></ul></ul></ul><ul><ul><ul><li>Hash (BLOB_ID) => ID of version manager server </li></ul></ul></ul><ul><ul><li>Splitting the version ID space per BLOB </li></ul></ul><ul><ul><ul><li>Can simply rely on DHT replication </li></ul></ul></ul><ul><ul><ul><li>Hash (BLOB_ID) => {neighbouring version managers} </li></ul></ul></ul><ul><ul><li>Lookup cost = O(1), equal to BlobSeer </li></ul></ul>
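The two mappings on this slide, Hash (BLOB_ID) => version manager and Hash (BLOB_ID) => {neighbouring version managers}, could look roughly like the sketch below. The function name `version_managers_for`, the `replicas` parameter and the SHA-1 hash are assumptions for illustration; the slides do not fix any of them. The first entry of the result plays the role of the master, the following ring neighbours act as replicas.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    # Hypothetical hash function; any uniform hash would do.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def version_managers_for(blob_id: str, managers, replicas: int = 3):
    """Return [master, neighbour, ...] for a blob, walking clockwise on the ring."""
    if not managers:
        raise ValueError("at least one version manager is required")
    ring = sorted((_position(m), m) for m in managers)
    start = bisect.bisect_left(ring, (_position(blob_id),)) % len(ring)
    count = min(replicas, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

Because blobs are independent of each other, each blob ID resolves to its own master and replica set without any coordination between blobs.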
  7. <ul><li>Concurrent writes/appends need to be serialized </li></ul><ul><ul><li>On the master </li></ul></ul><ul><ul><li>Blob.getlatest() </li></ul></ul><ul><ul><li>Blob.write() </li></ul></ul><ul><ul><li>Blob.append() </li></ul></ul><ul><li>Access to historical versions </li></ul><ul><ul><li>Randomly on {master, slaves} </li></ul></ul><ul><ul><li>Blob.read() </li></ul></ul><ul><ul><li>Blob.getsize() </li></ul></ul><ul><ul><li>Ask the master only when necessary </li></ul></ul><ul><li>Master periodically PUTS, or slaves PULL, versions to keep the serialized history in sync </li></ul><ul><ul><li>Version info is quite tiny </li></ul></ul>
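A minimal sketch of the master/slave split described on this slide, under stated assumptions: the class names `MasterVersionManager` and `SlaveVersionManager`, the method names, and the choice to track only the blob size per version are all hypothetical and not taken from BlobSeer. The master serializes writes and appends under a lock; a slave answers questions about historical versions locally and pulls from the master only when it is asked about a version it has not seen yet.

```python
import threading


class MasterVersionManager:
    """Serializes write/append and assigns version numbers (1, 2, 3, ...)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sizes = {}  # blob_id -> [size at version 1, size at version 2, ...]

    def _publish(self, blob_id, new_size):
        with self._lock:  # the serialization point
            history = self._sizes.setdefault(blob_id, [])
            current = history[-1] if history else 0
            history.append(new_size(current))
            return len(history)  # the newly assigned version

    def append(self, blob_id, length):
        return self._publish(blob_id, lambda size: size + length)

    def write(self, blob_id, offset, length):
        return self._publish(blob_id, lambda size: max(size, offset + length))

    def get_latest(self, blob_id):
        with self._lock:
            return len(self._sizes.get(blob_id, []))

    def versions_since(self, blob_id, version):
        """Version info published after `version` (it is tiny: one int each)."""
        with self._lock:
            return list(self._sizes.get(blob_id, [])[version:])


class SlaveVersionManager:
    """Serves historical version info; pulls from the master when behind."""

    def __init__(self, master):
        self._master = master
        self._sizes = {}

    def pull(self, blob_id):
        known = self._sizes.setdefault(blob_id, [])
        known.extend(self._master.versions_since(blob_id, len(known)))

    def get_size(self, blob_id, version):
        if version < 1:
            raise ValueError("versions start at 1")
        if version > len(self._sizes.get(blob_id, [])):
            self.pull(blob_id)  # ask the master only when necessary
        return self._sizes[blob_id][version - 1]
```

The sketch uses the PULL variant; the PUT variant would have the master call into each slave after publishing instead.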
  8. <ul><li>Eliminate the provider manager </li></ul><ul><ul><li>Provider manager keeps cluster state to answer clients’ requests </li></ul></ul><ul><ul><ul><li>Lookup costs O(1) </li></ul></ul></ul><ul><li>Providers can learn the system state by themselves </li></ul><ul><ul><li>Load and load balancing?? </li></ul></ul><ul><ul><li>Lookup costs O(1) </li></ul></ul><ul><ul><li>Use the presented DHT overlay to propagate providers’ states </li></ul></ul><ul><ul><ul><li>Gossip-based (limited to a cluster size of around 1000, but that is still good) </li></ul></ul></ul><ul><ul><ul><li>Or a lightweight P2P overlay (e.g. Kelips) </li></ul></ul></ul><ul><ul><ul><li>Hotspot as the number of clients and providers increases </li></ul></ul></ul><ul><li>A client randomly asks any provider </li></ul>
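How providers might learn the system state without a provider manager can be sketched with a simple push-pull gossip exchange. Everything here is an assumption for illustration: the `GossipNode` class, the per-provider `(version, load)` entry and the helper names are not from BlobSeer or from any specific gossip library. Each provider bumps a version counter when its own load changes, and on merge the entry with the higher version wins.

```python
import random


class GossipNode:
    """Hypothetical provider that gossips a map: provider -> (version, load)."""

    def __init__(self, name, load):
        self.name = name
        self.state = {name: (1, load)}

    def update_load(self, load):
        version, _ = self.state[self.name]
        self.state[self.name] = (version + 1, load)

    def merge(self, remote_state):
        # Keep whichever entry carries the higher version number.
        for provider, entry in remote_state.items():
            if provider not in self.state or entry[0] > self.state[provider][0]:
                self.state[provider] = entry

    def gossip_with(self, peer):
        snapshot = dict(self.state)
        self.merge(peer.state)   # pull
        peer.merge(snapshot)     # push


def gossip_round(nodes, rng):
    """Every node exchanges state with one randomly chosen other node."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.gossip_with(peer)


def least_loaded(node):
    """Answer a client's request locally, from whatever this node has learned."""
    return min(node.state, key=lambda p: node.state[p][1])
```

With this in place a client can ask any provider at random: after a few rounds every provider holds an approximate view of all loads and can answer `least_loaded` itself.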
  9. However!!! <ul><li>We will not want to use consistent hashing </li></ul>
  10. Architecture (layers): version managers, metadata managers, providers, clients; DHT with consistent hashing; distributed membership management (gossip-based; ZooKeeper, like Google’s Chubby); replication, fault tolerance, leader election
  11. Access scenarios <ul><li>Reading </li></ul><ul><ul><li>Hash the blob ID to find its associated version manager </li></ul></ul><ul><ul><li>Go down the metadata tree </li></ul></ul><ul><ul><li>Access providers </li></ul></ul><ul><ul><li>O(1) for every step, equal to the current BlobSeer design </li></ul></ul><ul><li>Writing </li></ul><ul><ul><li>The same as in BlobSeer but with no provider manager </li></ul></ul>
  12. Overview of the implementation <ul><li>Gossip-based DHT </li></ul><ul><li>We need 3 hash namespaces </li></ul><ul><ul><li>Version managers </li></ul></ul><ul><ul><li>Metadata providers </li></ul></ul><ul><ul><li>Providers </li></ul></ul><ul><li>Elasticity </li></ul><ul><ul><li>Is inherent if we use consistent hashing for the DHT </li></ul></ul><ul><li>Fault tolerance </li></ul><ul><ul><li>DHT-based </li></ul></ul><ul><li>Load balancing </li></ul><ul><ul><li>DHT-based </li></ul></ul>
  13. Advantages <ul><li>Keeps the current nice features of BlobSeer </li></ul><ul><li>Monolithic design </li></ul><ul><ul><li>Each node provides all capabilities: client, version manager, metadata manager and provider </li></ul></ul><ul><ul><li>Simpler/easier configuration and deployment (autonomic feature?) </li></ul></ul><ul><li>Load balancing </li></ul><ul><li>Fault tolerance </li></ul><ul><li>Elasticity </li></ul><ul><li>Compared to NoSQL key/value stores </li></ul><ul><ul><li>Efficient for one key with a value of TB size (versioning, throughput) </li></ul></ul>
  14. Some more discussions <ul><li>If a client is outside the BlobSeer storage cloud, it randomly chooses one node to communicate with. That node acts as a proxy server (as in Cassandra) </li></ul><ul><li>We may need only a small number of version managers and metadata managers </li></ul><ul><ul><li>Leader election (can be based on Apache ZooKeeper) </li></ul></ul><ul><ul><li>If we fix them, we will reduce overhead at the DHT level </li></ul></ul>[Diagram: client, BlobSeer cloud]
  15. BlobSeer in the NoSQL paradigm <ul><li>Document stores </li></ul><ul><li>Column stores </li></ul>
  16. {pages} distribution <ul><li>BlobSeer’s approach </li></ul><ul><ul><li>Distribute {pages} over different providers </li></ul></ul><ul><ul><li>{pages} are mapped to physical addresses of providers directly </li></ul></ul><ul><li>DHT’s approach </li></ul><ul><ul><li>DHT is used only to know who has {pages}, not to route {pages} </li></ul></ul><ul><ul><li>Must find a good way: should the {pages} of a single write be distributed over different providers? [YES or NO] </li></ul></ul><ul><ul><ul><li>Hopefully, page keys are picked by the client in BlobSeer </li></ul></ul></ul><ul><ul><li>DHT load balancing </li></ul></ul><ul><ul><li>DHT fault tolerance </li></ul></ul><ul><ul><li>Lookup cost: O(1) </li></ul></ul>
  17. <ul><li>Eliminate the provider manager </li></ul><ul><ul><li>Provider manager keeps cluster state to answer clients’ requests </li></ul></ul><ul><ul><ul><li>Lookup costs O(1) </li></ul></ul></ul><ul><ul><ul><li>Hotspot as the number of clients and providers increases </li></ul></ul></ul><ul><li>Providers can learn the system state by themselves </li></ul><ul><ul><li>Lookup costs O(1) </li></ul></ul><ul><ul><li>Use the presented DHT overlay to propagate providers’ states </li></ul></ul><ul><ul><ul><li>Gossip-based (limited to a cluster size of around 1000, but that is still good) </li></ul></ul></ul><ul><ul><ul><li>Or a lightweight P2P overlay (e.g. Kelips) </li></ul></ul></ul><ul><ul><li>Need a good way to distribute the {pages} of each separate write operation over the DHT? </li></ul></ul><ul><ul><ul><li>BlobSeer’s approach </li></ul></ul></ul><ul><ul><ul><li>DHT’s approach </li></ul></ul></ul>
