
What every data programmer needs to know about disks


Published in: Technology

  1. What Every Data Programmer Needs to Know about Disks
     OSCON Data – July, 2011 - Portland
     Ted Dziuba
     @dozba
     tjdziuba@gmail.com
     Not proprietary or confidential. In fact, you’re risking a career by listening to me.
  2. Who are you and why are you talking?
     First job: Like college but they pay you to go.
     A few years ago: Technical troll for The Register.
     Recently: Co-founder of Milo.com, local shopping engine.
     Present: Senior Technical Staff for eBay Local
  3. The Linux Disk Abstraction
     Volume: /mnt/volume
     File System: xfs, ext
     Block Device: HDD, HW RAID array
  4. What happens when you read from a file?
         f = open("/home/ted/not_pirated_movie.avi", "rb")
         avi_header = f.read(56)
         f.close()
     Diagram: user buffer, page cache, disk controller, platter
  5. What happens when you read from a file?
     Diagram: user buffer, page cache, disk controller, platter
     - Main memory lookup
     - Latency: 100 nanoseconds
     - Throughput: 12 GB/sec on good hardware
  8. What happens when you read from a file?
     Diagram: user buffer, page cache, disk controller, platter
     - Needs to actuate a physical device
     - Latency: 10 milliseconds
     - Throughput: 768 MB/sec on SATA 3 (the 6 Gbit/s link rate; about 600 MB/sec of payload after 8b/10b encoding)
     - (Faster if you have a lot of money)
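The gap between the two read paths above can be observed with a small timing sketch. This is an illustration only: `time_read` is a hypothetical helper, and the scratch file it reads has just been written, so both reads here are served from the page cache. To see the slow path, point it at a large file that has not been read since boot, or drop the page cache first as root.

```python
import os
import tempfile
import time


def time_read(path, nbytes=56):
    """Return the seconds taken to open `path` and read its first `nbytes` bytes."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        f.read(nbytes)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical scratch file; it is already cached because we just wrote it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))
        path = tmp.name
    first = time_read(path)
    second = time_read(path)
    print(f"first read:  {first * 1e6:.1f} us")
    print(f"second read: {second * 1e6:.1f} us")
    os.remove(path)
```

The numbers include Python and system call overhead, so they will not match the raw hardware latencies on the slides; only the relative difference between a cached and an uncached read is meaningful.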
  12. Sidebar: The Horror of a 10ms Seek Latency
      A disk read is 100,000 times slower than a memory read.
      100 nanoseconds: time it takes you to write a really clever tweet
      10 milliseconds: time it takes to write a novel, working full time
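The 100,000x figure follows directly from the two latencies quoted on the slides. A quick check, with an extra rescaling that is my own illustration rather than the speaker's: if a memory read took one second, a disk seek would take more than a day.

```python
MEMORY_READ_NS = 100              # page cache hit, from the slides
DISK_SEEK_NS = 10 * 1000 * 1000   # 10 ms expressed in nanoseconds

# Integer arithmetic avoids floating-point noise in the ratio.
slowdown = DISK_SEEK_NS // MEMORY_READ_NS

# Rescale so that one memory read takes one second of human time.
scaled_seek_hours = slowdown / 3600

print(slowdown)                     # 100000
print(round(scaled_seek_hours, 1))  # 27.8
```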
  13. What happens when you write to a file?
          # key and value are bytes
          f = open("/home/ted/nosql_database.csv", "wb")
          f.write(key)
          f.write(b",")
          f.write(value)
          f.close()
      Diagram: user buffer, page cache, disk controller, platter
  14. What happens when you write to a file?
          # key and value are bytes
          f = open("/home/ted/nosql_database.csv", "wb")
          f.write(key)
          f.write(b",")
          f.write(value)
          f.close()
      Diagram: user buffer, page cache, disk controller, platter
      Annotations on the diagram:
      - You need to make this part happen.
      - Mark the page dirty, call it a day and go have a smoke.
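The "user buffer" box in the diagram is easy to observe from Python, because a buffered file object holds small writes in user space until it is flushed or closed. The sketch below is an illustration under that assumption (default buffering, a payload much smaller than the buffer); `sizes_before_and_after_flush` is a hypothetical helper name.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload=b"key,value\n"):
    """Write `payload`, returning the file size seen before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
            before = os.path.getsize(path)  # bytes still sit in Python's buffer
            f.flush()                        # hand the bytes to the kernel
            after = os.path.getsize(path)    # now visible via the page cache
        return before, after
    finally:
        os.remove(path)


print(sizes_before_and_after_flush())  # (0, 10)
```

Note that the second number only shows the data reached the kernel. Nothing here says it has reached the platter.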
  15. Aside: Stick your finger in the Linux Page Cache
      Before Linux 2.6.32 the kernel used “pdflush”; now it uses per-Backing Device Info (BDI) flush threads.
      Dirty pages: grep -i "dirty" /proc/meminfo
      /proc/sys/vm Love:
      - dirty_expire_centisecs: flush dirty pages older than this
      - dirty_ratio: flush once dirty pages exceed this percent of memory
      - dirty_writeback_centisecs: how often to wake up and start flushing
      Clear your page cache: echo 1 > /proc/sys/vm/drop_caches
      Crusty sysadmin’s hail-Mary pass: sync; sync; sync
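The same inspection can be scripted. The sketch below is a minimal, Linux-only illustration; `dirty_page_stats` and `read_vm_setting` are hypothetical helper names, and the parser assumes the usual `Name:   value kB` layout of /proc/meminfo.

```python
def dirty_page_stats(meminfo_text):
    """Return {'Dirty': kB, 'Writeback': kB} parsed from /proc/meminfo text."""
    stats = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("Dirty", "Writeback"):
            stats[name] = int(rest.split()[0])  # values are reported in kB
    return stats


def read_vm_setting(name):
    """Read one integer tunable from /proc/sys/vm (Linux only)."""
    with open("/proc/sys/vm/" + name) as f:
        return int(f.read().strip())
```

On a Linux box, `dirty_page_stats(open("/proc/meminfo").read())` gives the current dirty and in-flight page counts, and `read_vm_setting("dirty_ratio")` reads one of the tunables listed above.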
  18. Fsync: force a flush to disk
          import os
          # key and value are bytes
          f = open("/home/ted/nosql_database.csv", "wb")
          f.write(key)
          f.write(b",")
          f.write(value)
          f.flush()  # push Python's own buffer to the kernel before fsync
          os.fsync(f.fileno())
          f.close()
      Diagram: user buffer, page cache, disk controller, platter
      Also note, fsync() has a cousin, fdatasync(), that skips metadata not needed to read the data back.
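Wrapped up as a function, the durable write might look like the sketch below. It assumes `key` and `value` are bytes; `durable_write` and its `data_only` switch are hypothetical names for illustration. Python's documentation advises calling `flush()` on a buffered file object before `os.fsync()`, since otherwise the bytes may still be sitting in the user buffer. `os.fdatasync` is not available on every platform, so the sketch falls back to `os.fsync` when it is missing.

```python
import os


def durable_write(path, key, value, data_only=False):
    """Write b'key,value' to `path` and ask the kernel to push it to the device."""
    with open(path, "wb") as f:
        f.write(key + b"," + value)
        f.flush()  # user buffer -> page cache
        if data_only and hasattr(os, "fdatasync"):
            os.fdatasync(f.fileno())  # skips metadata not needed to read the data
        else:
            os.fsync(f.fileno())      # page cache -> disk controller
```

Whether the bytes are on the platter when this returns still depends on the caches further down the path.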
  19. Aside: point and laugh at MongoDB
      Mongo’s “fsync” command:
          > db.runCommand({fsync:1,async:true});
      wat.
      Also supports “journaling”, like a WAL in the SQL world, however…
      - It only fsync()s the journal every 100ms… “for performance”.
      - It’s not enabled by default.
  20. Fsync: bitter lies
          import os
          # key and value are bytes
          f = open("/home/ted/nosql_database.csv", "wb")
          f.write(key)
          f.write(b",")
          f.write(value)
          f.flush()  # push Python's own buffer to the kernel before fsync
          os.fsync(f.fileno())
          f.close()
      Diagram: user buffer, page cache, disk controller, platter
      Drives will lie to you.
  21. Fsync: bitter lies
      Diagram: page cache, disk controller, platter
      …it’s a cache!
      - Two types of caches: writethrough and writeback
      - Writeback is the demon
  22. (Just dropped in) to see what condition your caches are in
      A Typical Workstation
      Diagram: disk controller, platter
      - No controller cache
      - Writeback cache on disk
  23. (Just dropped in) to see what condition your caches are in
      A Good Server
      Diagram: disk controller, platter
      - Writethrough cache on controller
      - Writethrough cache on disk
  24. (Just dropped in) to see what condition your caches are in
      An Even Better Server
      Diagram: disk controller, platter
      - Battery-backed writeback cache on controller
      - Writethrough cache on disk
  25. (Just dropped in) to see what condition your caches are in
      The Demon Setup
      Diagram: disk controller, platter
      - Battery-backed writeback cache or writethrough cache on controller
      - Writeback cache on disk
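The four cache setups above can be condensed into one rule of thumb. The sketch below is my reading of the slides, not a statement about any particular drive or controller: a writeback cache on the disk makes an acknowledged write untrustworthy, and so does a writeback cache on the controller unless it is battery-backed (that second condition is an assumption the slides imply but do not state). `fsync_can_be_trusted` is a hypothetical name.

```python
def fsync_can_be_trusted(disk_cache, controller_cache=None, battery_backed=False):
    """Return True if an acknowledged write should survive a power loss.

    `disk_cache` and `controller_cache` are "writeback", "writethrough",
    or None when that layer has no cache.
    """
    if disk_cache == "writeback":
        return False  # the drive may acknowledge before the platter has the data
    if controller_cache == "writeback" and not battery_backed:
        return False  # assumption: volatile controller cache is just as risky
    return True


setups = {
    "typical workstation": fsync_can_be_trusted("writeback"),
    "good server": fsync_can_be_trusted("writethrough", "writethrough"),
    "even better server": fsync_can_be_trusted(
        "writethrough", "writeback", battery_backed=True
    ),
    "demon setup": fsync_can_be_trusted(
        "writeback", "writeback", battery_backed=True
    ),
}
print(setups)
```

The demon setup is the instructive case: a careful controller configuration does not help if the disk behind it is caching writes on its own.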
  26. Disks in a virtual environment
      The Trail of Tears to the Platter
      Diagram labels: user buffer, page cache, virtual controller, hypervisor, host page cache, physical controller, platter
  27. Disks in a virtual environment
      Why EC2 I/O is Slow and Unpredictable
      Shared Hardware
      - Physical Disk
      - Ethernet Controllers
      - Southbridge
      - How are the caches configured?
      - How big are the caches?
      - How many controllers?
      - How many disks?
      - RAID?
      Image Credit: ArsTechnica
  35. Aside: Amazon EBS
      Diagram: MySQL on Amazon EBS
      Please stop doing this.
  36. What’s Killing That Box?
          ted@u235:~$ iostat -x
          Linux 2.6.32-24-generic (u235)  07/25/2011  _x86_64_  (8 CPU)
          avg-cpu:   %user   %nice %system %iowait  %steal   %idle
                      0.15    0.14    0.05    0.00    0.00   99.66
          Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz    %util
          sda          0.00     3.27     0.01     2.38     0.58    45.23    19.21     0.24
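The iostat row above can be turned into named fields with a few lines of Python. This sketch only handles the columns that appear on the slide (a real `iostat -x` prints several more), and `parse_iostat_row` is a hypothetical helper. The `rsec/s` and `wsec/s` columns count 512-byte sectors, which is what the conversion to KB/sec relies on.

```python
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz %util"
ROW = "sda 0.00 3.27 0.01 2.38 0.58 45.23 19.21 0.24"


def parse_iostat_row(header, row):
    """Pair each column name after 'Device:' with the matching float in `row`."""
    names = header.split()[1:]
    fields = row.split()
    return fields[0], dict(zip(names, map(float, fields[1:])))


device, stats = parse_iostat_row(HEADER, ROW)

# Sectors are 512 bytes, so sectors/sec * 512 / 1024 gives KB/sec.
write_kb_per_sec = stats["wsec/s"] * 512 / 1024

print(device, stats["%util"], round(write_kb_per_sec, 1))  # sda 0.24 22.6
```

A `%util` of 0.24 with about 22.6 KB/sec of writes means this particular disk is close to idle; on a box that is being killed by I/O, `%util` sits near 100.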
  37. Cool Hardware Tricks
      Beginner Hardware Trick: SSDs
      - $2.50/GB (SSD) vs 7.5¢/GB (spinning disk)
      - Negligible seek time vs 10ms seek time
      - Not a lot of space
  39. Cool Hardware Tricks
      Intermediate Hardware Trick: RAID Controllers
      - Standard RAID Controller
      - SSD as writeback cache
      - Battery-backed
      - Adaptec “MaxIQ”
      - $1,200
      Image Credit: Tom’s Hardware
  44. Cool Hardware Tricks
      Advanced Hardware Trick: FusionIO
      - SSD Storage on the Northbridge (PCIe)
      - 6.0 GB/sec throughput. Gigabytes.
      - 30 microsecond latency (30k ns)
      - Roughly $20/GB
      - Top-line card > $100,000 for around 5TB
  48. Questions
      Questions & Heckling
      Thank You
      http://teddziuba.com/
      @dozba
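The prices and latencies quoted on the hardware slides can be put side by side with a little arithmetic. All inputs below are the 2011 figures from the slides; the only assumption added here is decimal units (1 TB = 1000 GB) for the 5 TB card.

```python
# Latencies in nanoseconds, from the slides.
MEMORY_NS = 100
FUSIONIO_NS = 30_000
DISK_SEEK_NS = 10_000_000

# Prices in dollars per GB, from the slides.
SSD_PER_GB = 2.50
DISK_PER_GB = 0.075
FUSIONIO_PER_GB = 20

fusion_vs_memory = FUSIONIO_NS // MEMORY_NS   # how much slower than RAM
disk_vs_fusion = DISK_SEEK_NS / FUSIONIO_NS   # how much slower a seek is
ssd_vs_disk_cost = SSD_PER_GB / DISK_PER_GB   # price premium of SSD over disk
five_tb_cost = 5 * 1000 * FUSIONIO_PER_GB     # assumes 1 TB = 1000 GB

print(fusion_vs_memory)            # 300
print(round(disk_vs_fusion))       # 333
print(round(ssd_vs_disk_cost, 1))  # 33.3
print(five_tb_cost)                # 100000
```

The last figure lines up with the "> $100,000 for around 5TB" bullet, so the per-GB price and the top-line card price on the slide are consistent with each other.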
