Quota enforcement for high-performance distributed storage systems

Kristal T. Pollack, Darrell D. E. Long, Richard A. Golding, Benjamin Reed, Ralph A. Becker-Szendy
IBM Almaden Research Center, San Jose, CA

Abstract

Storage systems manage quota to ensure that each user gets the storage they need, and that no one user can, even by accident, use up all available storage. This is difficult for large, distributed systems, especially those used for high-performance computing applications, because resource allocation occurs on many nodes concurrently. We present a scheme where quota is enforced asynchronously by intelligent storage servers: storage clients contact a shared management service to get vouchers, capability-like certificates that the clients can redeem at participating storage servers to allocate storage space. This approach produces low load on the shared management service, promotes good scaling, and allows the client to make decisions about which storage server(s) to use without communicating with the management service for further approval. Storage servers and the management service periodically reconcile voucher usage to ensure that clients do not cheat by spending the same voucher at multiple storage servers. We report on a simulation study that shows that this approach gives performance nearly as good as not enforcing quota at all, and that the load on the shared management server is remarkably low.

1 Introduction

Tracking and enforcing resource usage limits in a large distributed system is difficult because it requires maintaining a consistent view of total usage when consumption is occurring in several places concurrently. In a file system, for example, users must not use more than their storage quota. Many scientific applications involve tens of thousands of nodes all cooperating on a problem, all writing to shared files and consuming from the same pool of quota. The file system is typically built as a small cluster of metadata servers and a larger number of storage servers or disk arrays. We concentrate here on systems that use storage servers that provide intelligence similar to object storage, and so can track local storage allocation and enforce access control.

Existing quota systems trade off scalability and accuracy. A centralized quota tracking server can be byte-accurate as long as each client informs the server on each resource allocation, which inhibits scalable performance. Other systems use a centralized server but relax accuracy, either by tracking quota only in large granules or by using time-limited escrow mechanisms (which set aside a certain amount of resource for a client for a limited duration), both of which reduce the frequency with which a client must interact with the quota tracking server.

The existing quota systems also provide for a single quota-related policy for all clients. In a distributed file system like SanFS [10], for example, the centralized metadata server decides which logical disks in a storage pool a client should be allocated when it needs storage. This requires that the policy not only have an accurate record of how much quota each user has consumed, but also an accurate map of how much resource is available on every server on which that resource could be allocated.

We propose an alternative approach to tracking and enforcing resource limits. This approach borrows from microcash mechanisms: there is a centralized server that acts as a bank that issues vouchers to clients, which the client can spend to allocate resources on whatever server it wants. The client can withdraw enough vouchers to cover its needs for some period, during which time the client does not need to contact the bank. Servers are able to check the vouchers for authenticity. The vouchers are valid for a limited time, in order to handle clients that fail, and servers periodically reconcile their transactions with the bank to check that clients have behaved correctly.

This approach provides a different tradeoff than other quota servers. It provides excellent scalability (in most cases indistinguishable from not tracking resource usage at all) while providing byte-accurate but temporally-coarse accuracy similar to time-limited escrow. It reduces load on the centralized tracking service well below that of other mechanisms. It also decouples quota tracking from allocation policy, so that the quota server only needs to track how much quota a user has consumed, and does not need to be concerned with where that consumption has occurred, which reduces the load on the quota server and makes it easier to partition the quota tracking work across multiple servers. Further, each client can decide for itself where to allocate resources based on its own needs, which allows a client to customize its allocation based, for example, on how a particular file will be used.

2 System context

[Figure 1 diagram labels: Client; File management cluster; Storage servers; Disk; Authorization; Access.]

Figure 1: Basic distributed storage system architecture.

Figure 1 shows the architecture of the distributed storage systems we are investigating. In these systems, clients act on behalf of users. The clients communicate with a file system management service cluster to locate files and authorize actions. The authorization includes both checking permission to access data and permission to consume resources. The authorization is expressed using location-independent capabilities [11] and vouchers, which encode the client's rights to access files and to allocate resources respectively. Once the client has the capabilities and vouchers it needs, it communicates directly with storage servers to read and write data and to create and delete files. The storage server has the intelligence to manage internal resource allocation and to check capabilities for validity, similar to the object store model [8, 5].

We are investigating quota management as part of the K2 distributed storage system. For scalability reasons, K2 pushes decentralization as far as possible. Each node is an autonomous agent, acting in its own interest as much as possible while respecting community needs. We make this possible by each node acting as an enlightened rational agent, with algorithms that work to meet the node's needs while avoiding the "tragedy of the commons" [6], in which the limits on shared resources are not considered. In concrete terms for a storage system, this means that we want each client to be able to make its own allocation decisions that give the best result for the application the client is running (self-interest), while ensuring that users do not go over quota and that storage resources are not over-used (community interest).

Different files can have significantly different needs, and so the system allows a different layout for each file. One file may be located on a single storage server; another may be mirrored and striped across many storage servers. The client decides what the layout should be, based on expected needs derived either from application hints or inferred from file attributes. Peak file creation rates can be high in some scientific applications, and so it is important for good scalability to minimize the dependence on the shared file management service during file creation.

Scientific applications have characteristics different from what studies of end-user workstations have shown. The absolute numbers are several orders of magnitude larger: petabytes of data are being deployed now, with aggregate transfer rates of gigabytes per second, files in the terabytes, all being accessed by tens of thousands of clients. The clients are cooperating to run one application, and both read- and write-share files. The applications are bursty, as the clients synchronize as they move through phases of a computation and write out checkpoints concurrently or read the results of previous phases. Some of the files are only temporary, for communication between computation phases, while others are results of days of computation and must be carefully protected.

Because K2 is built for large distributed environments, its design works to minimize the trust required in any one component, in particular the client. While many clients may be part of a single homogeneous compute cluster, some clients will be different and potentially not under careful administrative control, for example user workstations used for visualizing results. The system assumes that clients authenticate the users that run on them, and that the clients can provide evidence of that authentication when communicating with the file management service and with storage servers [9]. Our design assumes that clients can crash-fail. While some clients can also be malicious, we do not focus on them; for example, we do not provide data access that can survive Byzantine behavior [1]. However, for resource quota management we do bound the effect that any client can have on the system.

3 Protocol operation

In this section we give an overview of the operation of the voucher-based quota system. Figure 2 shows the general flow of usage. A client first requests a voucher for storage resources from the quota server for the user, then sends IO requests to storage nodes, including the voucher when those requests may consume resources. If a client frees resources on a storage server, the storage server gives the client a "refund" voucher for the amount freed. The quota server and storage nodes periodically reconcile the set of vouchers that have been spent against those that have been issued, in order to detect clients that overuse a voucher.

[Figure 2 diagram labels: participants are file management (bank), client (for user), storage server 1, storage server 2; messages are request(userid, amount), voucher, IO request(voucher), IO request(voucher), allocate resource, leftover, check usage.]

Figure 2: Sequence of operations. A client begins by obtaining a voucher for a user from the quota server, then spending that voucher during IO requests to different storage nodes. Later, the quota server and storage nodes check that the client did not overuse a voucher.

Vouchers. A voucher is a record of a decision to allow a client to consume resources on behalf of a particular user. It is represented as a cryptographically-protected sequence of bytes:

    {epoch, expiry, user, amount, serial}auth

similar to capabilities used to authorize actions in Amoeba [11] and the T10 OSD [8]. The user field names the user whose quota is charged, and the amount field gives how much resource the voucher covers. The voucher has a unique serial number, which is used when storage servers reconcile voucher usage with the management server. Each voucher also records when it was issued (the epoch) and when it expires.

The voucher includes a signature or MAC generated using a secret key known only to the management and storage servers, which ensures that a client cannot forge a voucher. However, it does not prevent one client from eavesdropping on another, and so vouchers must be transmitted only over private channels. Issues like avoiding replay attacks do not require special mechanisms in vouchers if they are used in conjunction with authorization capabilities that provide defense against replay.

Each voucher is valid for only a limited duration, as recorded in its expiry field. This is used in handling failure (if a client crashes while holding an unused voucher, other clients can use that quota once the voucher expires) and in reconciling storage and management servers. These are discussed further below.

Getting vouchers. While a client could ask the management server for quota authorization on every I/O, this would put an unreasonable load on the management server. Instead, the client maintains a pool of vouchers, and only periodically communicates with the management server. The client tries to maintain enough vouchers to cover any allocation it expects to do in the near future, while allowing for other clients to share quota. This approach reduces load on the management server and improves client response latency.

The client has to decide when to request vouchers and how much resource to ask for; the management server has to decide how much of that request to grant. The management server must maintain the invariant that the vouchers granted for a user to any client (which represent potentially-used resources) plus the amount actually allocated do not go over the user's quota.

While there are many possible policies for deciding when and how much to ask for, we focus on a client requesting quota from the management server on a regular schedule. This generally makes the load on the management server proportional to the number of clients, rather than to the intensity of workload on those clients. The client uses its history of recent resource consumption to estimate how much it will likely use between the current request and the next request, and asks the management server for the difference between the estimated usage and the vouchers it already has on hand.

If actual usage is higher than anticipated, then the client will have to ask the management server for extra vouchers before its next scheduled request. The client can estimate the management server's response time and the short-term voucher usage rate to predict when to send a request to the management server before the client runs out of vouchers.

The client may have extra vouchers when net consumption is lower than anticipated, perhaps because it has been freeing rather than allocating storage. In that case the client can return some vouchers to the management server, making the quota available for other clients. This matters, for example, when one client is cleaning up old files while other clients are writing new data.

The management server determines how much to grant to a client based on its global information, including the total amount of vouchers outstanding for a user and estimated demand from all clients consuming that user's quota. Granting more to a client can reduce the number of request messages that the management server must process, but giving too much to one client can inhibit sharing across multiple clients. The management server must also not issue so many vouchers that a user could go over quota.

One reasonable heuristic is for the management server to give each client an amount proportional to its consumption rate, while reserving some quota in case new clients begin using quota. The client policy discussed above makes requests approximately proportional to consumption rate, and so the management server can give each client the same fraction $f$ of their requested amount. When there is plenty of quota, $f = 1$. As the number of clients increases or the amount of remaining quota decreases, each client gets a fraction $f = r / ((n + 1)\,\bar{r})$ of their requests, where $r$ is this client's consumption rate, $n$ is the number of clients consuming from that quota, and $\bar{r}$ is the average consumption rate over all active clients.

Using vouchers. Once a client has obtained a voucher, it can use the voucher to consume resources. The client picks which storage server it will use; the problem of selecting the server is outside the scope of this paper.

In the simplest way of using vouchers, the client sends its I/O request to the storage server, along with one or more vouchers that will cover any resource allocations the I/O request might require. The storage server keeps track of how much resource was actually consumed by the request, and may send the client a new voucher for any balance in its reply. The storage server keeps track of how much each user has consumed, plus any recently-spent vouchers. The vouchers are periodically reconciled with the management server in order to handle failure or to catch cheaters, as discussed below.

Consider a simple scenario: a client is trying to write 1 MB of data into an existing file. The client obtains a voucher for (say) 2 MB from the management server, then sends a write request to the storage server along with the voucher. The storage server determines how much resource is consumed. It might consume nothing, if the request only overwrites already-allocated blocks, or it might consume a full 1 MB, or something in between. The storage server will reply with a refund voucher for 2 MB minus the amount actually allocated.

Vouchers can also be used in a somewhat different way to solve a long-standing problem in object storage systems: ensuring that an operation will succeed when multiple clients could be consuming resources in the storage server. A client can use a voucher to reserve resources at the storage server to ensure that its later operations will have the resources to complete. This is particularly important for object stores because it is hard for a client to predict how much resource any one operation will consume.

Tracking and reconciling usage. The storage server keeps track of how much a user has consumed, and periodically reconciles voucher usage with the management server to catch cheaters and recover from failures.

[Figure 3 diagram labels: a timeline of epochs 23 to 27 along a time axis; regions marked Reconciled (just totals), Unreconciled epochs (totals and vouchers), and Current epoch; spans marked expired and unexpired; intervals K and X.]

Figure 3: How the storage server maintains consumption information over multiple epochs. X is the number of epochs before vouchers expire; K is the number of epochs in the past when reconciliation happens.

The system divides time into epochs, as illustrated in Figure 3. Each voucher is associated with the epoch in which it was issued, and the storage server keeps a list of vouchers that it has received for each epoch. It also tracks how much resource was consumed against those vouchers in the epoch. At some point there can be no more activity associated with an epoch, because all vouchers from that epoch will have expired, and the storage server can reconcile the list of vouchers used for that epoch with the management server. After reconciliation, the storage server can get rid of the list of vouchers and merge that epoch's consumption information into the record of overall reconciled consumption.

We summarize the formal model for tracking and reconciling quota for a single user as follows. (The exposition for a single user is clearer than for multiple users, but the rules are the same.) The user has a quota $Q$, and the management server has authorized some allocation $A$; the system works to keep $A \le Q$. The allocation $A$ is the amount of storage used on all storage servers plus the amount of any unredeemed vouchers: $A = \sum_{\forall d} S_d + U$, where $S_d$ is the amount used on storage server $d$ and $U$ is the amount of unredeemed vouchers.

For the management server to know $A$ accurately using this definition, it would need to be involved synchronously on every resource allocation, which defeats the intent of the voucher approach. Instead, the management server uses a conservative estimate of $A$: instead of the actual current usage $\sum_{\forall d} S_d$, it uses the usage determined at the last reconciliation, and instead of the actual unredeemed vouchers, it uses all vouchers issued since the last reconciliation. Formally, the management server knows the amount of resource the user had consumed as of the last reconciliation, which covered all epochs up to and including epoch $e$: $\sum_{\forall d} S_d^e$, where e de-
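The voucher record {epoch, expiry, user, amount, serial}auth from the "Vouchers" paragraph can be illustrated with a minimal sketch. The paper only says that the voucher carries "a signature or MAC" under a key shared by the management and storage servers; the choice of HMAC-SHA256, the JSON encoding, the key value, the function names `issue_voucher` and `verify_voucher`, and measuring expiry in epochs are all assumptions made here for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical secret shared by the management and storage servers only.
SHARED_KEY = b"example-key-not-from-the-paper"

# Field order of the protected record, as listed in the paper.
FIELDS = ("epoch", "expiry", "user", "amount", "serial")


def _mac(fields: dict) -> str:
    """Compute the MAC over the five voucher fields in a fixed order."""
    payload = json.dumps([fields[name] for name in FIELDS]).encode()
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()


def issue_voucher(epoch: int, expiry: int, user: str, amount: int, serial: int) -> dict:
    """Management server side: build a voucher and attach its MAC."""
    fields = dict(epoch=epoch, expiry=expiry, user=user, amount=amount, serial=serial)
    return {**fields, "auth": _mac(fields)}


def verify_voucher(voucher: dict, current_epoch: int) -> bool:
    """Storage server side: accept only authentic, unexpired vouchers."""
    try:
        expected = _mac(voucher)
    except KeyError:
        return False  # a required field is missing
    authentic = hmac.compare_digest(expected, str(voucher.get("auth", "")))
    return authentic and current_epoch <= voucher["expiry"]
```

A storage server using this sketch would reject a voucher whose amount had been altered by the client, because the recomputed MAC would no longer match. The sketch does nothing about eavesdropping or replay, which the paper leaves to private channels and to the accompanying capabilities.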
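The request and grant policies from the "Getting vouchers" paragraphs can be sketched as follows. The client asks for its estimated usage minus the vouchers it already holds, and the management server scales the request by $f = r / ((n + 1)\,\bar{r})$. The paper does not say how "plenty of quota" is detected, so it is a caller-supplied flag here; the function names, the cap of $f$ at 1, and the `headroom` parameter (quota not yet used or promised) are likewise assumptions for illustration.

```python
def request_amount(estimated_usage: int, vouchers_on_hand: int) -> int:
    """Client side: ask for the shortfall between expected use and held vouchers."""
    return max(0, estimated_usage - vouchers_on_hand)


def grant_fraction(r: float, n: int, r_bar: float, plenty_of_quota: bool) -> float:
    """Management server side: fraction f of a request to grant.

    r is this client's consumption rate, n the number of clients consuming
    from the quota, and r_bar the average rate over all active clients.
    """
    if plenty_of_quota or r_bar <= 0:
        return 1.0
    # Capping at 1.0 is an assumption: never grant more than was requested.
    return min(1.0, r / ((n + 1) * r_bar))


def grant(requested: int, r: float, n: int, r_bar: float,
          plenty_of_quota: bool, headroom: int) -> int:
    """Scale the request by f, then clamp so the user cannot exceed quota."""
    f = grant_fraction(r, n, r_bar, plenty_of_quota)
    return max(0, min(int(f * requested), headroom))
```

For instance, with four clients that all consume at the average rate, each request is scaled by 1/5, which leaves part of the quota unpromised for clients that have not yet started.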
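The epoch bookkeeping and the conservative estimate of $A$ described under "Tracking and reconciling usage" can be sketched in a few lines. This is a single-user illustration only; the class name `StorageServerLedger`, its methods, and the helper functions are hypothetical, and the message exchange with the management server is reduced to a return value.

```python
from collections import defaultdict


class StorageServerLedger:
    """Per-user record kept by one storage server, organized by epoch."""

    def __init__(self) -> None:
        self.reconciled_total = 0              # totals only, for reconciled epochs
        self.vouchers = defaultdict(list)      # epoch -> serials of vouchers seen
        self.consumed = defaultdict(int)       # epoch -> resource consumed

    def record(self, epoch: int, serial: int, consumed: int) -> None:
        """Note a spent voucher and the resource actually consumed against it."""
        self.vouchers[epoch].append(serial)
        self.consumed[epoch] += consumed

    def reconcile(self, epoch: int):
        """Report an expired epoch, drop its voucher list, and keep only the total."""
        serials = self.vouchers.pop(epoch, [])
        amount = self.consumed.pop(epoch, 0)
        self.reconciled_total += amount
        return serials, amount


def conservative_allocation(reconciled_usage_per_server, issued_since_reconciliation) -> int:
    """Management server estimate of A: last reconciled usage plus all
    vouchers issued since then, whether or not they have been redeemed."""
    return sum(reconciled_usage_per_server) + sum(issued_since_reconciliation)


def may_issue(quota: int, estimate: int, amount: int) -> bool:
    """Issue a new voucher only if the estimate stays within the quota Q."""
    return estimate + amount <= quota
```

Because the estimate counts every voucher issued since the last reconciliation as if it were fully spent, it can only overstate the true allocation, so keeping the estimate at or below the quota also keeps $A \le Q$.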
