2. How to store large amounts of data?
• First, it should be cheap and require little manpower.
• They use OpenStack Swift and are the largest contributors to it.
• Research sequencing produces a lot of data.
• Hudson Alpha: they do sequence processing for biology, which takes storage,
compute and expertise.
• Previously there was no storage in place to do any of this.
• They had a low budget, so they decided to use Swift.
4. Data Avalanche
• Turnaround time should be reduced.
• Metadata
• Data Proliferation.
• Cost of Downtime
• HPC throughput
• Multiple generations of hardware.
5. Storage is expensive!
• Amazon charges $37 per TB.
• To store the data they needed 504 drives, but were then limited to 8 per
customer (S3), so they got a 4 PB rack at $150k.
• It should be durable, available and flexible.
• They used Cgate and Swift so they could manage it very easily.
• SwiftStack eradicates the difficulty of provisioning.
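
The cost figures above can be checked with a little arithmetic. The sketch below is only a rough comparison and rests on assumptions not stated in the notes: the 4 PB is read as 4000 TB (decimal units), the 504 drives are taken to be the drives in that rack, and the billing period behind the Amazon price is unknown, so it is left as a parameter. Function and constant names are made up for illustration. It also ignores replication overhead, power and staff.

```python
# Figures quoted in the notes; everything else here is an assumption.
AMAZON_USD_PER_TB = 37         # billing period not stated in the notes
RACK_CAPACITY_TB = 4 * 1000    # 4 PB, assuming 1 PB = 1000 TB
RACK_COST_USD = 150_000
DRIVE_COUNT = 504              # assumed to be the drives in the 4 PB rack


def rack_usd_per_tb(cost_usd, capacity_tb):
    """One-time purchase cost per TB of raw rack capacity."""
    return cost_usd / capacity_tb


def tb_per_drive(capacity_tb, drives):
    """Raw capacity each drive would need to reach the rack total."""
    return capacity_tb / drives


def cloud_cost_usd(capacity_tb, usd_per_tb, periods=1):
    """Cost of holding capacity_tb in the cloud for a number of billing periods."""
    return capacity_tb * usd_per_tb * periods


if __name__ == "__main__":
    print("rack $/TB (one-time):", rack_usd_per_tb(RACK_COST_USD, RACK_CAPACITY_TB))
    print("TB per drive:", round(tb_per_drive(RACK_CAPACITY_TB, DRIVE_COUNT), 2))
    print("cloud cost, one period:", cloud_cost_usd(RACK_CAPACITY_TB, AMAZON_USD_PER_TB))
```

Under these assumptions the rack works out to $37.50 per TB as a one-time cost, so a single billing period at the quoted Amazon price costs roughly as much as buying the rack outright.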
6. SwiftStack
• Cost is reduced
• Architecture is simple.
• Practical applications:
• Auto-discarding (expiring) objects, temp URLs for customers, a file system
gateway for object storage, and an off-site replica if not using a CDN.
• Future: erasure coding in place of full replicas, since this is a lot of
data; pipelining work; scaling.
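
Two of the practical applications above, temp URLs and auto-discarding objects, can be sketched in a few lines. This is a minimal illustration, not the speakers' setup. It assumes the cluster has Swift's TempURL middleware enabled with a secret key set on the account, and that SHA-256 signatures are allowed there. The key, account, container and object names are placeholders. The helper names make_temp_url and expiry_headers are made up for this example.

```python
import hmac
from hashlib import sha256


def make_temp_url(key, method, path, expires):
    """Build a signed, time-limited path for one object.

    key     -- the account's TempURL secret (placeholder in this sketch)
    method  -- HTTP method the link allows, e.g. "GET"
    path    -- object path such as /v1/AUTH_account/container/object
    expires -- Unix timestamp after which the link stops working
    """
    # Swift's TempURL middleware signs "METHOD\nEXPIRES\nPATH" with the key.
    message = f"{method}\n{expires}\n{path}"
    signature = hmac.new(key.encode(), message.encode(), sha256).hexdigest()
    return f"{path}?temp_url_sig={signature}&temp_url_expires={expires}"


def expiry_headers(seconds):
    """Headers to send with an object PUT or POST so Swift deletes it later."""
    # X-Delete-After takes a lifetime in seconds; X-Delete-At takes a timestamp.
    return {"X-Delete-After": str(seconds)}


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    url = make_temp_url("secret-key", "GET",
                        "/v1/AUTH_demo/results/sample.bam", 1700000000)
    print(url)
    print(expiry_headers(7 * 24 * 3600))  # discard after one week
```

The returned path is appended to the cluster's base URL and handed to the customer, who then needs no credentials of their own until the link expires.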