1. Swift in the Small
OpenStack Meetup
June 29, 2011
Computer History Museum
Mountain View, CA
Joe Arnold - Cloudscaling
twitter: @joearnold
blog: http://joearnold.com
- The theme of tonight is Corporate IT.
- The promise of OpenStack for Corporate IT is the ability to take advantage of
-- all the great tooling,
-- all the great services,
-- all the compatible applications that use infrastructure cloud services as a platform.
- It gives the ability to deploy cloud infrastructure in-house.
- Tonight I’ll be covering OpenStack Object Storage, Swift -- In the Small
- Show of hands: how many have downloaded and installed Swift?
2.
- Swift is an Object Storage system that was designed for scale.
- This was one of the first clusters we deployed.
- It’s a petabyte of usable storage. It can serve a lot of users.
- For the spinning disks of aluminum, bent sheet metal, forged iron for the racks, strands of glass, silicon wafers, etc., a deployment like this is a great deal at between $500,000 and a million dollars.
- But not everyone needs a petabyte out of the gate.
- Even for these deployments, we have staging clusters in the range of 80-100 TB.
3.
- The challenge for this ‘Corporate IT’ theme is what a small-scale Object Storage (Swift) cluster would look like.
- What does it take, and what compromises are made, when scaling down something designed for large scale?
- This, for example, is a 4U, 36-drive system from ComputerLINK. ComputerLINK was nice enough to provide a demo unit for the meetup tonight.
- I’ll be powering it up in a few minutes and if you’re interested, you can come over and we
can start pulling drives and watch data get replicated around.
4. [Diagram: copies of data placed across multiple zones]
Why is this a challenge? — Zones
- Swift is designed for large-scale deployments.
- The mechanisms for replication and data distribution are built on the concept that data is
distributed across isolated failure boundaries. These isolated failure boundaries are called
zones.
- Unlike RAID systems, data isn’t chopped up and distributed throughout the system.
- With Swift whole files are distributed throughout the system. Each copy of the data resides
in a different zone.
- Swift stores 3 copies of the data, so at least 4 zones are required (in case 1 zone fails).
- Preferably 5 zones (so that 2 zones can fail).
- In the big clusters, failure boundaries can be separate racks with their own networking
components.
- In medium deployments, a physical node can represent a zone.
- For smaller deployments with fewer than 4 nodes, drives need to be grouped together to form pseudo-failure boundaries.
- A grouping of drives is simply declared a zone.
- Here is a scheme for starting small and growing the cluster bit-by-bit (well... terabyte-by-terabyte).
5. 4 Disks 4 Zones
- For a single storage node the minimum configuration would have 4 drives for data + 1
boot drive.
- Each disk is a zone.
- If a single drive fails, its data will be replicated to the remaining 3 drives in the system.
- The system would grow 4 disks at a time (one in each zone) until the chassis was full.
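- As a rough sketch of what this single-node layout looks like with the ring builder (the IPs, device names, and weights here are made-up examples, not from the talk), each data drive is simply declared as its own zone when it is added to the ring:
    swift-ring-builder object.builder create 18 3 1
    swift-ring-builder object.builder add z1-10.0.0.1:6000/sdb1 100
    swift-ring-builder object.builder add z2-10.0.0.1:6000/sdc1 100
    swift-ring-builder object.builder add z3-10.0.0.1:6000/sdd1 100
    swift-ring-builder object.builder add z4-10.0.0.1:6000/sde1 100
    swift-ring-builder object.builder rebalance
- The account and container rings would be built the same way on their own ports; growing within the chassis just means adding more drives under the existing zones and rebalancing.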
6. [Diagram: Zone 1 and Zone 2 on one node, Zone 3 and Zone 4 on the other]
- The strategy here is to split the zones evenly across the two nodes.
- The addition of a second node does increase availability (assuming that load balancing is configured),
- but it does not create a master-slave configuration. If one of the nodes is down, ½ of your zones are unavailable.
- The good news is that if one of the nodes is down (½ of your zones), data is still accessible.
- This is because at least one of the three zones holding each object will still be up on the remaining node.
- The bad news is that there is still a 1 in 2 chance that writes will fail
- because at least two of three zones need to be written to for the write to be considered
successful.
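- A quick worked check of that 1 in 2 figure (my own back-of-the-envelope, assuming each object’s 3 replicas land in 3 of the 4 equally weighted zones and a write needs 2 of its 3 replica zones reachable). Say the node holding zones 1 and 2 is the one that is down:
-- replicas placed in zones {2,3,4} or {1,3,4}: two of the three zones are still up, so the write succeeds;
-- replicas placed in zones {1,2,4} or {1,2,3}: only one zone is up, so the write fails.
- Two of the four equally likely placements fail, which is the 1 in 2 chance above.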
7. [Diagram: Zone 1 + ⅓ Zone 4, Zone 2 + ⅓ Zone 4, Zone 3 + ⅓ Zone 4, one group per node]
- The addition of a third node further enables distribution of zones across the nodes.
- Something strange is going on here: whole zones are placed on each node,
- but zone 4 is broken up into thirds and distributed across the three nodes.
- This is done to enable smoother rebalancing when going to 4 nodes.
- Again, if a single node is down, data will be available, but there will be a 1 in 5 chance that a write would fail.
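- Ignoring the incremental moves it would actually take to get here from the earlier layouts, a fresh ring built for this three-node arrangement could look something like this (again, made-up IPs and device names):
    # node 1 holds zone 1, plus a third of zone 4
    swift-ring-builder object.builder add z1-10.0.0.1:6000/sdb1 100
    swift-ring-builder object.builder add z4-10.0.0.1:6000/sde1 100
    # node 2 holds zone 2, plus a third of zone 4
    swift-ring-builder object.builder add z2-10.0.0.2:6000/sdb1 100
    swift-ring-builder object.builder add z4-10.0.0.2:6000/sde1 100
    # node 3 holds zone 3, plus a third of zone 4
    swift-ring-builder object.builder add z3-10.0.0.3:6000/sdb1 100
    swift-ring-builder object.builder add z4-10.0.0.3:6000/sde1 100
    swift-ring-builder object.builder rebalance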
8. [Diagram: Zone 1, Zone 2, Zone 3, Zone 4, one whole zone per node]
- The strategy of breaking up Zone 4 into thirds with 3 nodes is to make this transition easier.
- The cluster can be configured with zone 4 entirely on that new server,
- then the remaining zones can slowly be rebalanced to fold in the newly vacated drives on their nodes (a rough command sketch follows below).
- Now, if a single node fails, writes will be successful as at least two zones will be available.
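- The fold-in itself could look something like this with swift-ring-builder (hypothetical IPs and device names; the new fourth node is 10.0.0.4): put zone 4 on the new node, drain the old zone-4 thirds off nodes 1-3, and rebalance:
    # add the new node's drives as zone 4
    swift-ring-builder object.builder add z4-10.0.0.4:6000/sdb1 100
    swift-ring-builder object.builder add z4-10.0.0.4:6000/sdc1 100
    # drain the old zone-4 thirds off nodes 1-3
    swift-ring-builder object.builder set_weight z4-10.0.0.1:6000/sde1 0
    swift-ring-builder object.builder set_weight z4-10.0.0.2:6000/sde1 0
    swift-ring-builder object.builder set_weight z4-10.0.0.3:6000/sde1 0
    swift-ring-builder object.builder rebalance
- Because rebalances respect min_part_hours (a given partition isn’t reshuffled again until that many hours have passed), the data migrates gradually over several rebalances; once the old zone-4 drives are empty they can be removed from the ring and re-added under zones 1-3.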
9.
- Why small-scale Swift?
- Using OpenStack Object Storage is a private-cloud alternative to S3, CloudFiles, etc.
- This enables private cloud builders to start out with a single machine in their own data center and scale up as their needs grow.
- Why not use RAID?
- Why not use a banana? :) It’s a different storage system, used for different purposes.
- Going with a private deployment of Object Storage gives something that looks and feels
just like Rackspace Cloud Files.
- App developers don’t need to attach a volume to use the storage system, and assets can be served directly to end users or to a CDN (see the quick curl sketch at the end of these notes).
- The bottom line is that a small deployment can transition smoothly into a larger
deployment.
- The great thing about OpenStack being open-source software is that it gives us the
freedom to build and design systems however we see fit.
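- To make that "no volumes to attach" point concrete, here is a quick curl sketch against a hypothetical proxy (swift.example.com, with tempauth-style test credentials; substitute your own endpoint and keys):
    # get a token and storage URL from the auth endpoint
    curl -i -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://swift.example.com:8080/auth/v1.0
    # create a container and upload an object using the returned X-Auth-Token / X-Storage-Url
    curl -X PUT -H 'X-Auth-Token: <token>' <storage-url>/assets
    curl -X PUT -H 'X-Auth-Token: <token>' -T logo.png <storage-url>/assets/logo.png
- Essentially the same requests work against Rackspace Cloud Files (with its own auth endpoint), which is what makes a small private deployment a smooth on-ramp.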