Understanding Amazon EBS Availability and Performance


Usage Rights

© All Rights Reserved

  • @bincheng111 As far as I know, EBS volumes do not give you a choice between those. ROW is something most people talk about in reference to ZFS snapshots, not EBS snapshots. For guides on the details of implementing and doing snapshots, Amazon has done a good job documenting this (as well as hundreds of blog and tutorial sites) so I won't try to out-do them here in a comment section :).
  • @destari How are the snapshots implemented: ROW or COW?
  • @bincheng111 First snapshots take a while because they snapshot every block on the volume. Subsequent snapshots take less time since they are only snapshotting the changes since the previous snapshot.
  • @bincheng111 I'm not sure what you mean by hardware RAID on EBS. There is no hardware on EC2 that you can control, so it is all software.
  • @bincheng111 Re: bursty workloads: you will need to understand your workload, and determine if you can handle spikes in IO performance with an average around 100 ops/s. If you can stay around that average, standard EBS might be fine for you. If you need sustained IO performance higher than that, you need provisioned IOPS EBS.

    Understanding Amazon EBS Availability and Performance: Presentation Transcript

    • AWS Summit 2013: Navigating the Cloud
      Understanding Amazon EBS Availability and Performance
      Eric Anderson, CopperEgg
      April 18, 2013
    • CopperEgg: EBS Use Case
      • How CopperEgg uses EBS
      • EBS vs. Provisioned IOPS EBS
      • EBS and RAID
      • Backup/snapshot best practices
      • File system selection and tuning
      • Monitoring, migrations, and planning
    • How CopperEgg uses EBS
      • Real-time monitoring (every 5 s)
        – System information
        – Processes
        – Synthetic HTTP/TCP/etc. checks
        – Application metrics
        – Tons more...
      • Requirements:
        – Store many terabytes of data
        – Persist the data over long periods of time
        – Backups (using snapshots)
        – High IO: 50-60k+ ops/s per node
      • SSD + Provisioned IOPS EBS
        – Consistent IO behavior (not spiky)
    • EBS vs. Provisioned IOPS EBS
      • Standard EBS
        – Good for low IO volume
        – Bursty workloads may be a good fit: do the math
      • Provisioned IOPS EBS
        – Great for steady IO patterns that need consistency
        – Not always more expensive than standard!
        – Be sure to use the IOPS you provision!
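One way to "do the math" is to compare a month of each volume type side by side. The sketch below assumes standard EBS is billed per GB-month plus per million I/O requests, and Provisioned IOPS EBS per GB-month plus per provisioned IOPS-month. The prices in the example call are illustrative placeholders, not quoted AWS rates, and the 30-day month is a simplification; substitute current pricing for your region.

```python
# Sketch: compare the monthly cost of standard vs. Provisioned IOPS EBS.
# ASSUMPTION: billing structure as described above; all prices are placeholders.

SECONDS_PER_MONTH = 30 * 24 * 3600  # simplified 30-day month


def standard_monthly_cost(size_gb, avg_iops, price_per_gb, price_per_million_io):
    """Storage cost plus the cost of every I/O request actually issued."""
    io_requests = avg_iops * SECONDS_PER_MONTH
    return size_gb * price_per_gb + io_requests / 1_000_000 * price_per_million_io


def piops_monthly_cost(size_gb, provisioned_iops, price_per_gb, price_per_iops_month):
    """Storage cost plus a flat charge for the IOPS you provision, used or not."""
    return size_gb * price_per_gb + provisioned_iops * price_per_iops_month


if __name__ == "__main__":
    # Hypothetical 100 GB volume averaging 500 IOPS, with placeholder prices.
    std = standard_monthly_cost(100, 500, price_per_gb=0.10, price_per_million_io=0.10)
    pio = piops_monthly_cost(100, 500, price_per_gb=0.125, price_per_iops_month=0.10)
    print(f"standard: ${std:.2f}  PIOPS: ${pio:.2f}")
```

With these made-up inputs the standard volume comes to about $139.60 and the PIOPS volume to $62.50, which illustrates how PIOPS can be the cheaper option for sustained IO. Because the PIOPS charge is flat, provisioned IOPS you never use raise the cost without any benefit.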
    • EBS and RAID
      • Which RAID?
        – Depends on your use case, but:
          • We use stripes (RAID 0) for most things
            – Good performance; we build our fault tolerance at a different level
          • RAID 10 (stripe of mirrors)
            – RAID 0-like performance, with better fault tolerance thanks to the mirrors
            – Twice the cost of RAID 0
          • RAID 0+1 (mirror of stripes)
            – Don't do this: same performance, worse fault tolerance
          • RAID 5 (stripe with parity)
            – Could be dangerous: software RAID 5 can be bad if you have any write caching enabled.
            – Maybe RAID 6 (dual parity) is an option...
      • Block size
        – Use an appropriate stripe size for best results
          • We use 64 KB, but you need to test various configurations to find the best fit for your application
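The sketch below makes two of the RAID points concrete under simplified assumptions. `raid0_locate` models how a RAID 0 stripe with a 64 KB chunk spreads consecutive chunks round-robin across members: it assumes equal-size members and ignores any on-disk metadata offset. `fatal_second_failure` shows why RAID 0+1 tolerates failures worse than RAID 10: it assumes an even number of volumes, two equal stripes for RAID 0+1, and a second failure that hits a uniformly random surviving volume before the rebuild finishes. Both helper names are hypothetical.

```python
from fractions import Fraction

CHUNK_SIZE = 64 * 1024  # 64 KB stripe (chunk) size, as used on the slide


def raid0_locate(offset, n_disks, chunk_size=CHUNK_SIZE):
    """Map a logical byte offset to (member index, byte offset on that member).

    Simplified RAID 0 model: equal-size members, chunks assigned round-robin,
    no on-disk metadata offset.
    """
    chunk_no, within_chunk = divmod(offset, chunk_size)
    member = chunk_no % n_disks
    row = chunk_no // n_disks
    return member, row * chunk_size + within_chunk


def fatal_second_failure(level, n_disks):
    """Probability that a second, uniformly random volume failure loses the array."""
    survivors = n_disks - 1
    if level == "raid10":
        # Only the failed volume's mirror partner is fatal.
        return Fraction(1, survivors)
    if level == "raid0+1":
        # The first failure takes its whole stripe offline, so any volume in
        # the other stripe (half the array) is now fatal.
        return Fraction(n_disks // 2, survivors)
    raise ValueError(f"unsupported level: {level}")
```

For an 8-volume array, `fatal_second_failure("raid10", 8)` is 1/7 while `fatal_second_failure("raid0+1", 8)` is 4/7: same disks and same striping, but a much higher chance that the next failure is fatal. With four members, `raid0_locate(256 * 1024, 4)` wraps back to member 0 at byte 65536, which is why the chunk size determines how a single large request gets split across volumes.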
    • Backup/snapshot best practices
      • Snapshot regularly
        – At least once per day, more often if you can
        – The first snapshot takes a while; subsequent ones are faster
        – Schedule snapshots for when your IO load is lowest to reduce the impact
          • We do it at around 9 pm CST
      • Use consistent naming for snapshots
        – {hostname}-{raid device}-{device}-{timestamp}
      • Use the API for creation
        – Faster kickoff, more likely to be consistent (script it!)
        – ec2-create-snapshot -d "{hostname}-{raid device}-{device}-{timestamp}" vol-d726382
      • Move older snapshots to S3/Glacier for long-term storage
      • RAID makes this a bit more complex:
        – Make sure you unmount/snapshot/remount your file system, or use fsfreeze to keep snapshots consistent!
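A scripted version of these practices might look like the sketch below: it builds names in the slide's `{hostname}-{raid device}-{device}-{timestamp}` format, finds snapshots older than a retention window, and produces (without executing) a freeze, snapshot-every-member, unfreeze command sequence for a RAID set. The UTC `%Y%m%d%H%M%S` timestamp format, the helper names, and the example host, devices, and volume IDs are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed timestamp format (UTC)


def snapshot_name(hostname, raid_device, device, when):
    """Build '{hostname}-{raid device}-{device}-{timestamp}'."""
    # Keep only the last path component so '/dev/md0' becomes 'md0' (an assumption).
    raid = raid_device.rsplit("/", 1)[-1]
    dev = device.rsplit("/", 1)[-1]
    return f"{hostname}-{raid}-{dev}-{when.strftime(TIMESTAMP_FORMAT)}"


def snapshot_time(name):
    """Recover the timestamp, which is always the last '-'-separated field."""
    stamp = name.rsplit("-", 1)[-1]
    return datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def older_than(names, now, days):
    """Snapshots past the retention window (candidates for long-term storage)."""
    cutoff = now - timedelta(days=days)
    return [n for n in names if snapshot_time(n) < cutoff]


def consistent_snapshot_plan(mount_point, hostname, raid_device, volumes, when):
    """Commands to snapshot every RAID member while the file system is frozen.

    `volumes` maps member device -> EBS volume ID (hypothetical values). The plan
    is only returned, not run; a real script must still run the unfreeze step
    if a snapshot command fails.
    """
    plan = [["fsfreeze", "-f", mount_point]]
    for device, volume_id in sorted(volumes.items()):
        name = snapshot_name(hostname, raid_device, device, when)
        plan.append(["ec2-create-snapshot", "-d", name, volume_id])
    plan.append(["fsfreeze", "-u", mount_point])
    return plan
```

Parsing the timestamp from the last field keeps retention logic working even when hostnames contain hyphens. Freezing once around all member snapshots, rather than per volume, is what keeps the set of snapshots consistent across the RAID.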
    • Choosing a good file system
      • We like ext3/4, but we love XFS
        – High performance, consistent
        – Robust, with lots of options for tweaking/adjusting as needed
      • Our favorite mount options (your mileage may vary):
        – inode64, noatime, nodiratime, attr2, nobarrier, logbufs=8, logbsize=256k, osyncisdsync, nobootwait, noauto
        – Yields great performance, reduces unnecessary writes, stable
      • We like ZFS a lot too, but we want to see more runtime on Linux first
        – But FreeBSD/ZFS would be a fine choice
      • However: test your workload!
        – File systems behave differently under different workloads
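To apply the mount options consistently, they can be rendered into an `/etc/fstab` line, as in the sketch below. The device `/dev/md0`, the mount point `/data`, and the helper name are hypothetical. Check each option against your kernel and distribution before using it: some of these XFS options (for example `nobarrier` and `osyncisdsync`) were deprecated and later removed in newer kernels, and `nobootwait` is understood only by Ubuntu's mountall.

```python
# Mount options from the slide; your mileage may vary.
XFS_OPTIONS = (
    "inode64", "noatime", "nodiratime", "attr2", "nobarrier",
    "logbufs=8", "logbsize=256k", "osyncisdsync", "nobootwait", "noauto",
)


def fstab_entry(device, mount_point, fstype="xfs", options=XFS_OPTIONS, dump=0, fsck_pass=0):
    """Return one /etc/fstab line: device, mount point, type, options, dump, pass."""
    fields = [device, mount_point, fstype, ",".join(options), str(dump), str(fsck_pass)]
    return "\t".join(fields)


if __name__ == "__main__":
    print(fstab_entry("/dev/md0", "/data"))
```

Keeping the option list in one place makes it easy to record exactly what changed when you test a different combination against your workload.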
    • EBS/file system performance tuning
      • Tuning file systems:
        – Set the I/O scheduler to 'deadline' (for each disk in the RAID array/EBS volume):
          • [as root] echo deadline > /sys/block/[disk device]/queue/scheduler
        – Adjust how aggressively the cache is written to disk (tune these back if your write IO is bursty):
          • vm.dirty_ratio=30
          • vm.dirty_background_ratio=20
      • Track what you change!
        – Before changing anything, monitor it
        – After you make the change, monitor it
        – Then KEEP monitoring it: things can change over time in unexpected ways
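The sketch below covers both tuning steps. `active_scheduler` reads the scheduler file format, where the kernel brackets the active choice (on newer multi-queue kernels the equivalent scheduler is named `mq-deadline`). `set_scheduler` is a Python equivalent of the slide's `echo` command and needs root on a real system. `dirty_thresholds_bytes` roughly estimates what the two `vm.dirty_*` percentages mean in bytes, assuming you supply the available (free plus reclaimable) memory the kernel computes them against; the helper names are hypothetical.

```python
import os


def active_scheduler(scheduler_file_text):
    """Return the bracketed scheduler, e.g. 'noop [deadline] cfq' -> 'deadline'."""
    for token in scheduler_file_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def set_scheduler(disks, scheduler="deadline", sysfs_block="/sys/block"):
    """Equivalent of `echo deadline > /sys/block/<disk>/queue/scheduler` per disk."""
    for disk in disks:
        path = os.path.join(sysfs_block, disk, "queue", "scheduler")
        with open(path, "w") as f:
            f.write(scheduler)


def dirty_thresholds_bytes(available_bytes, dirty_ratio=30, dirty_background_ratio=20):
    """Rough (background_flush_start, writer_throttle) thresholds in bytes.

    Background flusher threads start writing at dirty_background_ratio percent;
    processes generating writes must flush themselves at dirty_ratio percent.
    """
    background = available_bytes * dirty_background_ratio // 100
    throttle = available_bytes * dirty_ratio // 100
    return background, throttle
```

Recording `active_scheduler` output and the computed thresholds before and after a change is one simple way to track what you changed.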
    • Monitoring
      • Observing:
        – iostat -xcd -t 1
          • Watch the sum of r/s and w/s: this is your IOPS metric. For PIOPS, you want it close to the provisioned amount. We monitor this using CopperEgg custom metrics, and alert if it goes too low or too high.
        – grep -A 1 dirty /proc/vmstat
          • If nr_dirty approaches nr_dirty_threshold, you need to tune down the vm.dirty_* settings to flush writes more often.
          • Reference: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html
      • Useful stats to capture:
        – In /proc/fs/xfs/stat:
          • xs_trans* -> transactions
          • xs_read/write* -> read/write operation stats
          • xb_* -> buffer stats
      • Ignore SMART: it does not work for EBS
      • Watch the console log
        – Use the AWS API to look for warning signs of EBS issues
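The two checks described above can be scripted roughly as follows. `iops_by_device` sums r/s and w/s per device from the device section of `iostat -x` output (the header line and device rows only, without the timestamp or CPU lines); it locates columns by header name because column order varies between sysstat versions. `check_iops` compares the result with the provisioned amount, using low/high ratios that are hypothetical and should be tuned to your own alerting policy. `dirty_pressure` reports nr_dirty as a fraction of nr_dirty_threshold from `/proc/vmstat` content.

```python
def iops_by_device(iostat_device_report):
    """Sum r/s + w/s per device from one `iostat -x` device report."""
    rows = [line.split() for line in iostat_device_report.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    r_col, w_col = header.index("r/s"), header.index("w/s")
    return {row[0]: float(row[r_col]) + float(row[w_col]) for row in data}


def check_iops(observed, provisioned, low=0.5, high=1.0):
    """Classify observed IOPS against the provisioned amount (thresholds hypothetical)."""
    ratio = observed / provisioned
    if ratio < low:
        return "low"   # paying for IOPS you are not using
    if ratio > high:
        return "high"  # above the provisioned amount
    return "ok"


def dirty_pressure(vmstat_text):
    """nr_dirty as a fraction of nr_dirty_threshold, from /proc/vmstat content."""
    stats = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats["nr_dirty"] / stats["nr_dirty_threshold"]
```

A value from `dirty_pressure` that stays close to 1.0 is the "nr_dirty approaches nr_dirty_threshold" condition from the slide.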
    • Migrations and capacity planning
      • Using PIOPS?
        – Plan on a data migration path if you need to increase PIOPS
          • You can't (yet) increase IOPS on the fly
      • Migration steps from an EBS-backed RAID:
        1. Snapshot 1 hr before, then again, and again: each time it takes less time
        2. Stop all services
        3. Unmount the file system
        4. Stop the RAID (mdadm --stop /dev/md0)
        5. Take the final snapshot
        6. Create new volumes (with the higher provisioned IOPS) based on the last snapshot
        7. Attach the new volumes to the RAID: mdadm should detect the array and magically make it work
        8. Mount the file system
        9. Restart services
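Step 1 works because each incremental snapshot only has to copy what changed since the previous one. The toy model below illustrates that under strong simplifying assumptions: back-to-back snapshots, constant write and copy rates, and every write treated as newly changed data. Real EBS snapshot times will not follow it exactly; the function name and rates are hypothetical.

```python
def snapshot_durations(initial_gb, write_gb_per_min, copy_gb_per_min, rounds):
    """Toy model of back-to-back incremental snapshots (durations in minutes).

    The first snapshot copies `initial_gb`; each later one copies only the data
    written while the previous snapshot ran.
    """
    durations = []
    to_copy = initial_gb
    for _ in range(rounds):
        minutes = to_copy / copy_gb_per_min
        durations.append(minutes)
        to_copy = write_gb_per_min * minutes  # data changed during this snapshot
    return durations


if __name__ == "__main__":
    # Hypothetical: 60 GB changed, 0.5 GB/min of writes, 2 GB/min copy rate.
    print(snapshot_durations(60, 0.5, 2, 3))
```

With these made-up rates the durations are 30, 7.5, and 1.875 minutes: each round shrinks by the ratio of write rate to copy rate, which is why the final snapshot taken after stopping services (step 5) adds little downtime.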