Prospero Media Storage
Managing 100TB of small files…
IGT – Event
July 2011
Numbers
  • 70TB used space
  • 700 million files
  • 200GB and 250,000 files uploaded every day
  • 1200Mbps bandwidth throughput at peak
  • 180TB of data served out monthly
  • 3700 hits per second at peak
  • 40 storage node servers, 300TB raw space
  • $0.13 per GB
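A quick back-of-envelope check of what these numbers imply per file. This is only a sketch: it assumes decimal units (1TB = 10^12 bytes) and a 30-day month, neither of which the slide states.

```python
# Derived figures from the "Numbers" slide.
# Assumptions (not in the slide): decimal units and a 30-day month.
TB = 10**12
GB = 10**9

used_bytes = 70 * TB
file_count = 700_000_000
avg_file_bytes = used_bytes / file_count

daily_upload_bytes = 200 * GB
daily_upload_files = 250_000
avg_upload_bytes = daily_upload_bytes / daily_upload_files

monthly_served_bytes = 180 * TB
seconds_per_month = 30 * 24 * 3600
avg_served_mbps = monthly_served_bytes * 8 / seconds_per_month / 10**6

print(f"average stored file:   {avg_file_bytes / 1000:.0f} KB")
print(f"average uploaded file: {avg_upload_bytes / 1000:.0f} KB")
print(f"average outbound rate: {avg_served_mbps:.0f} Mbps (peak is 1200 Mbps)")
```

Under these assumptions the average stored file is about 100KB, which is what "small files" means here, and the average outbound rate is a bit under half of the quoted peak.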
Motivation
Web 2.0 content serving paradigm shift
  • Too many files
      • 12M users x 1 file = very long tail
  • Too many connections
      • 1M users + keepalive = 1M connections
  • Living with modern content in Web 2.0
      • 1 file x (thumbnail + iPhone + Mac) = 3 file copies
Traditional Architecture
[Diagram, built up over five slides: HTTP requests at the top, IO requests flowing down to Centralized Storage (NAS, SAN, DAS etc.). Successive slides are captioned "HTTP: too many connections" and "Too much IO"; the last one adds a Cache between HTTP and the storage.]
“There are only two hard things in Computer Science: cache invalidation and naming things.” -- Tim Bray quoting Phil Karlton
Architecture goals
  • Symmetric identical server nodes
  • Simplified management and scaling
  • Linear scaling out
  • No functional / role servers
  • No single point of failure
  • No performance bottlenecks
  • Multiple datacenters support
  • DRP support
  • Geo load distribution
Meet Prospero
  • Distributed Web content storage system
  • Full blown HTTP support
  • Runs on low cost commodity hardware
  • Adjustable file level replication controls redundancy policy for every content type
  • Provides dynamic image manipulation
How do we do it?
Designed to fail
  • Fallback for every operation
      • Geographical, machine, storage medium
  • Write never fails
      • All files will reach their destination
  • Journaling
      • Tracking all uploaded files
      • Pending jobs
      • Guaranteed file distribution
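One way to read the journaling bullets is as an append-only log in which every distribution job stays pending until it is confirmed. The sketch below is an illustration of that idea only, not Prospero's implementation: the `UploadJournal` class, the JSON record layout, and the choice of target nodes are all hypothetical.

```python
import json
import os
import tempfile


class UploadJournal:
    """Hypothetical append-only journal: each (file, target) job is logged as
    pending and stays pending until a matching 'done' record is appended."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before returning

    def record_upload(self, name, targets):
        # One pending job per node the file still has to reach.
        for target in targets:
            self._append({"op": "pending", "file": name, "target": target})

    def mark_done(self, name, target):
        self._append({"op": "done", "file": name, "target": target})

    def pending_jobs(self):
        # Replay the log; whatever is still pending must be retried.
        pending = set()
        if not os.path.exists(self.path):
            return pending
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                job = (rec["file"], rec["target"])
                if rec["op"] == "pending":
                    pending.add(job)
                else:
                    pending.discard(job)
        return pending


# Usage: the target nodes here are made up for the example.
journal_path = os.path.join(tempfile.mkdtemp(), "uploads.journal")
journal = UploadJournal(journal_path)
journal.record_upload("37D815B5.jpg", ["2.static", "7.static"])
journal.mark_done("37D815B5.jpg", "2.static")
remaining = journal.pending_jobs()
print(remaining)
```

Because the pending set is rebuilt by replaying the file, a process that restarts after a crash sees the same outstanding jobs and can retry them.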
How do we achieve this
  • Control the input
      • Define the only unified API
  • Functional process isolation
      • Every function deserves its own process by default
      • Watchdogs
      • Monitors
      • Alerts
get 37D815B5.jpg
  • Go to 37 range servers
  • Fallback if not found
[Diagram: eight nodes, 0.static to 7.static, each reached over HTTP, arranged around four hash ranges: 00-1f, 20-3f, 40-5f, 60-7f.]
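The lookup on this slide can be sketched as follows. The first two hex characters of the file name select a range, the nodes for that range are tried first, and the remaining nodes act as fallback. The assignment of nodes to ranges in `RANGE_TABLE` is an assumption (the flattened diagram does not make it recoverable), and `candidates`, `fetch`, and the `get` callback are hypothetical names.

```python
# Hypothetical range table. Which nodes serve which range is an ASSUMPTION;
# the slide only shows four ranges and eight nodes.
RANGE_TABLE = {
    (0x00, 0x1F): ["0.static", "1.static"],
    (0x20, 0x3F): ["2.static", "3.static"],
    (0x40, 0x5F): ["4.static", "5.static"],
    (0x60, 0x7F): ["6.static", "7.static"],
}


def candidates(name):
    """Return nodes to try, in order: the range's own nodes, then all others."""
    prefix = int(name[:2], 16)  # "37D815B5.jpg" -> 0x37
    for (low, high), nodes in RANGE_TABLE.items():
        if low <= prefix <= high:
            others = [n for r, ns in RANGE_TABLE.items()
                      if r != (low, high) for n in ns]
            return nodes + others
    # The slide shows no ranges above 7f, so this sketch does not guess.
    raise LookupError(f"no range covers prefix {prefix:02x}")


def fetch(name, get):
    """Try each candidate node; `get(node, name)` returns bytes or None."""
    for node in candidates(name):
        data = get(node, name)
        if data is not None:
            return node, data
    return None, None


# Usage: simulate a file that is missing from its range nodes.
store = {"5.static": {"37D815B5.jpg": b"jpeg-bytes"}}
node, data = fetch("37D815B5.jpg", lambda n, f: store.get(n, {}).get(f))
print(node, data)
```

In the usage example the file is absent from the nodes assumed to own range 20-3f, so the request falls through until a node that has it answers.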
Fallback Example
Node Architecture
Real Life
It’s all about performance
  • Non blocking IO, readiness notification (epoll)
  • Asynchronous file IO (AIO)
  • Zero copy (sendfile)
  • Memory maps
  • Inter-process binary protocols
  • UNIX socket
  • Minimize dynamic memory allocation
  • lighttpd memory footprint: 50MB
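To make the zero copy bullet concrete, here is a minimal sketch using Python's `socket.sendfile()`, which uses the `sendfile` system call where the platform provides it and otherwise falls back to plain `send()`. It assumes a Unix-like system and a payload small enough to fit in the socket buffer. It illustrates the technique only and says nothing about how Prospero or lighttpd are actually wired.

```python
import socket
import tempfile

payload = b"x" * 4096  # stand-in for a small media file

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(payload)
    tmp.flush()

    # A connected pair of UNIX stream sockets stands in for client and server.
    sender, receiver = socket.socketpair()
    with sender, receiver, open(tmp.name, "rb") as src:
        # The kernel moves file data to the socket; the bytes are not
        # read into this process first when sendfile(2) is available.
        sent = sender.sendfile(src)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream

        received = b""
        while chunk := receiver.recv(65536):
            received += chunk

print(sent, len(received))
```

The application never calls `read()` on the file, which is the point of the technique: the data path stays inside the kernel.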
Lessons learnt
  • Be symmetric
  • Control the input
  • Design for failure
  • Performance matters again
  • Simple is hard but a must

Wix 10M Users Event - Prospero Media Storage

Editor's Notes

  • #20 AIO: Asynchronous I/O overlaps application processing with I/O operations for improved utilization of CPU and devices, and improved application performance, in a dynamic/adaptive manner, especially under high loads involving large numbers of I/O operations.
    Zero-copy: Hardware that supports gather can assemble data from multiple memory locations, eliminating another copy.
    Step one: the sendfile system call causes the file contents to be copied into a kernel buffer by the DMA engine.
    Step two: no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.