
Object Compaction in Cloud for High Yield


In file systems, large sequential writes are more beneficial than small random writes, which is why many storage systems implement a log-structured file system. In the same way, the cloud favors large objects over small objects. Cloud providers place throttling limits on PUTs and GETs, so it takes significantly longer to upload many small objects than a single large object of the same aggregate size. Moreover, each PUT of a small object incurs a per-request cost.
At Netflix, a large volume of media assets and their associated metadata is generated and pushed to the cloud.
We propose a strategy to compact these small objects into larger blobs before uploading them to the cloud. We discuss how to select the relevant smaller objects and how to manage the indexing of these objects within a blob, along with the resulting changes to reads, overwrites, and deletes.
Finally, we showcase the potential impact of such a strategy on Netflix assets in terms of cost and performance.



1. Compacting Small Objects in Cloud for High Yield
Tejas Chopra, Senior Software Engineer at Netflix
2. Agenda
■ Introduction
■ Problem statement & overview
■ Compaction basics
■ Data structures: blobs, blob tables
■ File system operations
■ Results
■ Conclusion
3. Introduction
■ Senior Software Engineer, Netflix
■ Keynote speaker: distributed systems, cloud, blockchain
■ Senior Software Engineer, Box
■ Datrium, Samsung, Cadence, Tensilica
4. Overview
■ Cloud FS: provide a file system interface to objects
● Simple approach: each file → one object
● Cloud workloads: many small files, bursty
■ Small files are not beneficial for cloud storage
■ Request-level costs
● Uploading a 1 GiB file with 4 KiB PUTs costs 57x the price of storing the file
■ Throttling
● S3 best-practice guideline: above roughly 300 PUT/DELETE/LIST requests per second or 800 GET requests per second, S3 may rate limit
● One 4 MiB PUT is 1000x faster than 1000 4 KiB PUTs
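To make the 57x figure concrete, here is a back-of-the-envelope check using the S3 prices quoted on the next slide ($0.023/GB/month for storage, 0.0005 cents per PUT); it is illustrative arithmetic, not a billing calculation.

```python
# Back-of-the-envelope check of the 57x claim, using the S3 prices quoted
# on the next slide: $0.023/GB/month for storage, 0.0005 cents per PUT.
FILE_SIZE_GIB = 1
PUT_SIZE_KIB = 4
PRICE_PER_PUT = 0.0005 / 100            # 0.0005 cents -> dollars per PUT
STORAGE_PER_GB_MONTH = 0.023

num_puts = FILE_SIZE_GIB * 1024 * 1024 // PUT_SIZE_KIB   # 262,144 PUTs
request_cost = num_puts * PRICE_PER_PUT                  # ~$1.31
storage_cost = FILE_SIZE_GIB * STORAGE_PER_GB_MONTH      # ~$0.023 per month

print(f"request cost:          ${request_cost:.2f}")
print(f"one month of storage:  ${storage_cost:.3f}")
print(f"ratio:                 {request_cost / storage_cost:.0f}x")   # ~57x
```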
5. S3 Alternatives
■ EFS: generic file system, charges only for storage: $0.30/GB/month
■ DynamoDB: NoSQL; $0.25/GB/month for 1 write/s/month, $0.09/GB/month for 1 read/s/month
■ S3: $0.023/GB/month, 0.0005 cents per PUT
■ To store 1 TiB worth of 4 KiB files at 100 writes/s:
● S3 (as is): $1366; operation cost: 98%; 5 years of storage to match operation cost
● EFS: $307; operation cost: 0%
● DynamoDB: $303; operation cost: 16%; 5 days of storage to match operation cost
● S3 (with compaction): $24; operation cost: 0.001%; 27 seconds of storage to match operation cost
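The two S3 rows can be roughly reproduced from the same prices. A sketch, assuming ~1 GiB blobs (the blob size used later in the deck) and counting only storage and PUT requests:

```python
# Rough reproduction of the two S3 rows above: 1 TiB stored as individual
# 4 KiB files vs. the same data compacted into ~1 GiB blobs.
# Only storage and PUT-request costs are counted here.
PRICE_PER_PUT = 0.0005 / 100        # dollars per PUT
STORAGE_PER_GB_MONTH = 0.023

total_gib = 1024                    # 1 TiB of data
file_kib = 4
blob_gib = 1                        # assumed compaction target

storage = total_gib * STORAGE_PER_GB_MONTH              # ~$23.6 per month
puts_as_is = total_gib * 1024 * 1024 // file_kib        # one PUT per 4 KiB file
puts_compacted = total_gib // blob_gib                  # one PUT per ~1 GiB blob

for label, puts in [("S3 as-is", puts_as_is), ("S3 with compaction", puts_compacted)]:
    ops = puts * PRICE_PER_PUT
    total = storage + ops
    print(f"{label}: ${total:,.0f}  (operations: {100 * ops / total:.2f}% of total)")
# ~$1,366 as-is vs. ~$24 with compaction; the exact operation-cost percentage
# depends on the blob size and write pattern.
```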
6. What is compaction?
■ A pluggable module on the client side
■ The module packs data into GiB-sized blobs and maintains an index of where each file lives within the blob
■ A redundant, client-side global index tracks all files in all blobs
■ The embedded index in every blob provides fault tolerance
7. Data Structures: Blob
■ A single, immutable object in the cloud
■ Data from multiple files is packed into a blob; the footer records the byte ranges of each file within the blob
■ By reading all footers of all compacted blobs, we can recreate the world
■ SSTables could have been used, but they require sorting before persistence
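The deck does not spell out an on-disk blob layout, so here is a minimal sketch of the idea: file data segments followed by a JSON footer, with the footer's offset stored in a fixed-size trailer so the embedded index can be recovered from the blob alone. Names and encoding choices are illustrative only.

```python
import json
import struct

def pack_blob(files: dict[str, bytes]) -> bytes:
    """Pack {name: data} into one blob: data segments, JSON footer, 8-byte footer offset."""
    blob = bytearray()
    footer = {}
    for name, data in files.items():
        footer[name] = {"offset": len(blob), "length": len(data)}
        blob += data
    footer_offset = len(blob)
    blob += json.dumps(footer).encode()
    blob += struct.pack("<Q", footer_offset)   # trailer: where the footer starts
    return bytes(blob)

def read_footer(blob: bytes) -> dict:
    """Recover the embedded index from a blob without any external metadata."""
    (footer_offset,) = struct.unpack("<Q", blob[-8:])
    return json.loads(blob[footer_offset:-8])

blob = pack_blob({"a.json": b'{"k":1}', "b.json": b'{"k":2}'})
index = read_footer(blob)
ext = index["b.json"]
assert blob[ext["offset"]:ext["offset"] + ext["length"]] == b'{"k":2}'
```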
8. Data Structures: Blob Table
■ An optimization over reading all blob footers
■ A global table containing information about all files in all blobs
■ Redundant: it can always be recreated
■ Consistency is a challenge because blob creation is distributed
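A sketch of the Blob Table as a plain mapping from file name to (blob id, offset, length), rebuilt by replaying every blob's embedded footer in creation order; the footer shape follows the sketch above, and the replay-in-order rule mirrors how renames are recovered later in the deck.

```python
# Hypothetical sketch of the global Blob Table: file -> (blob id, offset, length).
# It is redundant by design: it can be rebuilt by reading every blob's footer.
from dataclasses import dataclass

@dataclass
class Extent:
    blob_id: str
    offset: int
    length: int

def rebuild_blob_table(footers: dict[str, dict]) -> dict[str, Extent]:
    """footers maps blob_id -> embedded index ({file: {"offset", "length"}}).

    Blobs are replayed in creation order, so a file rewritten into a newer blob
    overwrites its older extent (the same rule the deck uses for renames).
    """
    table: dict[str, Extent] = {}
    for blob_id in sorted(footers, key=lambda b: b.split(":")[-1]):  # <Date> suffix
        for name, ext in footers[blob_id].items():
            table[name] = Extent(blob_id, ext["offset"], ext["length"])
    return table

footers = {
    "10.0.0.1:4096:2021-01-01": {"a.json": {"offset": 0, "length": 7}},
    "10.0.0.1:8192:2021-01-02": {"a.json": {"offset": 0, "length": 9},  # newer version wins
                                 "b.json": {"offset": 9, "length": 5}},
}
table = rebuild_blob_table(footers)
assert table["a.json"].blob_id.endswith("2021-01-02")
```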
9. Compaction
■ Sufficient data must accumulate before it is compacted
■ Data is buffered on client worker nodes and is not durable until pushed to the cloud
■ The compaction policy is specified at mount time
■ Once data is compacted, the embedded indices within the blob are updated
■ Blob name (objectId): <clientIP>:<FooterOffset>:<Date>
■ After a blob is pushed to the cloud, the Blob Table is updated with the new extents for each file
■ The global Blob Table always holds the current location of each file
● If the file is buffered on a worker, it contains the worker's IP
● If the file is in a blob in the cloud, it contains the blob's objectId and the file's extent within the blob
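A sketch of the buffered write path under these rules, with a hypothetical size-based policy and a stand-in upload_blob() in place of the real cloud PUT; the embedded footer is omitted here for brevity (see the blob layout sketch above).

```python
import datetime
import socket

def upload_blob(blob_id: str, payload: bytes, extents: dict) -> None:
    # Stand-in for the real cloud PUT of the compacted blob.
    print(f"PUT {blob_id}: {len(payload)} bytes, {len(extents)} files")

class Compactor:
    """Hypothetical client-side buffer that flushes a blob once enough data accumulates."""

    def __init__(self, blob_table: dict, target_bytes: int = 1 << 30):
        self.blob_table = blob_table          # global table: file -> location
        self.buffer: dict[str, bytes] = {}    # not durable until pushed to the cloud
        self.buffered_bytes = 0
        self.target_bytes = target_bytes      # compaction policy, chosen at mount time
        self.worker_ip = socket.gethostbyname(socket.gethostname())

    def write(self, name: str, data: bytes) -> None:
        self.buffer[name] = data
        self.buffered_bytes += len(data)
        # While buffered, the global table points at this worker.
        self.blob_table[name] = {"worker": self.worker_ip}
        if self.buffered_bytes >= self.target_bytes:
            self.flush()

    def flush(self) -> None:
        offset, extents = 0, {}
        payload = bytearray()
        for name, data in self.buffer.items():
            extents[name] = {"offset": offset, "length": len(data)}
            payload += data
            offset += len(data)
        # offset now equals where an embedded footer would start.
        # Blob name per the deck: <clientIP>:<FooterOffset>:<Date>
        blob_id = f"{self.worker_ip}:{offset}:{datetime.date.today().isoformat()}"
        upload_blob(blob_id, bytes(payload), extents)
        # After the PUT succeeds, point the global table at the blob's extents.
        for name, ext in extents.items():
            self.blob_table[name] = {"blob_id": blob_id, **ext}
        self.buffer.clear()
        self.buffered_bytes = 0
```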
10. Reading Data
■ The path depends on whether data is still buffered or has been pushed to the cloud
■ The Blob Table contains the location of the data
● If it is on a client worker, a worker-to-worker transfer serves the read
■ Range reads are used for cloud blobs
● It may be worth fetching more than what is strictly needed: the essence of prefetching
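A sketch of the read path assuming the Blob Table entries produced by the write sketch, using an S3-style ranged GET (boto3's get_object with a Range header) for cloud blobs; the bucket name, the use of the blob's objectId as the S3 key, and the prefetch flag are assumptions for illustration.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "media-assets"        # hypothetical bucket name

def read_file(name: str, blob_table: dict, prefetch: bool = False) -> bytes:
    loc = blob_table[name]
    if "worker" in loc:
        # Data is still buffered on a client worker: fetch it worker-to-worker.
        return fetch_from_worker(loc["worker"], name)
    if prefetch:
        # Files compacted together often share read profiles, so pulling the
        # whole blob can amortize one GET across many future reads.
        blob = s3.get_object(Bucket=BUCKET, Key=loc["blob_id"])["Body"].read()
        return blob[loc["offset"]:loc["offset"] + loc["length"]]
    # Otherwise issue a ranged GET for just this file's extent within the blob.
    start, end = loc["offset"], loc["offset"] + loc["length"] - 1
    resp = s3.get_object(Bucket=BUCKET, Key=loc["blob_id"], Range=f"bytes={start}-{end}")
    return resp["Body"].read()

def fetch_from_worker(worker_ip: str, name: str) -> bytes:
    raise NotImplementedError("worker-to-worker transfer depends on the deployment")
```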
11. Deletes and Renames
■ Deletes
● When a client issues a delete, remove the entry from the Blob Table
● Periodically check the liveness of each blob
● If live files fall below a threshold, repack the blob
■ Renames
● Atomically update the file's name in the Blob Table
● Piggyback on future blobs' embedded indices
● During recovery from embedded indices, blobs are read in creation order, so the latest blobs carry the rename information
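A sketch of delete handling and the repack check; the threshold is a hypothetical tunable, and liveness is measured here in live bytes per blob as a stand-in for the "live files" check on the slide.

```python
LIVENESS_THRESHOLD = 0.5   # hypothetical: repack when less than half the blob is live

def delete_file(name: str, blob_table: dict) -> None:
    # A delete only drops the entry from the Blob Table; the bytes stay in the
    # immutable blob until a later repack reclaims them.
    blob_table.pop(name, None)

def blobs_to_repack(blob_table: dict, blob_sizes: dict[str, int]) -> list[str]:
    """Return blobs whose live-byte fraction has fallen below the threshold."""
    live_bytes = {blob_id: 0 for blob_id in blob_sizes}
    for loc in blob_table.values():
        if "blob_id" in loc:
            live_bytes[loc["blob_id"]] += loc["length"]
    return [blob_id for blob_id, size in blob_sizes.items()
            if size and live_bytes[blob_id] / size < LIVENESS_THRESHOLD]
```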
12. Fault Tolerance
■ Master
● Periodically back up the global Blob Table to the cloud
● On a crash, read the last checkpointed Blob Table and all blobs written after that time
■ Worker
● Files buffered for compaction are lost
● They can still be recovered if persisted on local disk
● If compacted and pushed to the cloud, the Blob Table contains all the information needed for recovery
● If compacted and pushed to the cloud but the Blob Table was not yet updated, the embedded indices contain the information needed for recovery
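A sketch of master recovery from the last checkpoint plus the blobs written after it; list_blobs and read_footer are assumed helpers, and blob creation time is taken from the <Date> component of the blob name.

```python
import datetime

def recover_blob_table(checkpoint: dict, checkpoint_date: datetime.date,
                       list_blobs, read_footer) -> dict:
    """Rebuild the global Blob Table after a master crash.

    checkpoint      last Blob Table backed up to the cloud
    list_blobs()    returns all blob ids (<clientIP>:<FooterOffset>:<Date>)
    read_footer(id) returns that blob's embedded index {file: {"offset", "length"}}
    """
    table = dict(checkpoint)
    newer = [b for b in list_blobs()
             if datetime.date.fromisoformat(b.split(":")[-1]) > checkpoint_date]
    # Replay in creation order so the most recent extent (and name) wins.
    for blob_id in sorted(newer, key=lambda b: b.split(":")[-1]):
        for name, ext in read_footer(blob_id).items():
            table[name] = {"blob_id": blob_id, **ext}
    return table
```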
13. Results
14. Conclusion
■ Cloud file systems should aggressively compact small objects
■ Small objects incur significant operation costs and rate limiting
■ Throughput increased by 60x; costs reduced by 25000x
■ Backups of smaller files can be archived using this strategy to improve backup times and costs
■ Compaction policies can group objects by create time, per application, etc. Learning could be applied to choose which objects to compact; objects with similar read profiles can benefit from prefetching the entire blob
15. Tejas Chopra | chopratejas@gmail.com | @chopra_tejas
