
How Shit Works: Storage



The beautiful thing about software engineering is that it gives you the warm and fuzzy illusion of total understanding: I control this machine because I know how it operates. This is the result of layers upon layers of successful abstractions, which hide immense sophistication and complexity. As with any abstraction, though, these sometimes leak, and that's when a good grounding in what's under the hood pays off.

This talk, the first in what will hopefully be a series, covers the fundamentals of storage, providing an overview of the three storage tiers commonly found on modern platforms (hard drives, RAM and CPU cache). You'll come away knowing a little bit about a lot of different moving parts under the hood; after all, isn't understanding how the machine operates what this is all about?

-- A talk given at GeeCON Kraków 2016.



1. How Shit Works: Storage
Tomer Gabel, Wix @ GeeCON Kraków 2016

2–3. Like all good stories…
• We’ll start with a question.
• “What’s wrong with this picture?”

4. MY, OH, MY. WHAT COULD IT BE?

5. Axioms
• Not a trick question
– Servers are properly configured
– System architecture makes sense
– No obvious bugs
– No scheduled jobs
• So what else goes bump in the night?

6. PROLOGUE: “A LAUGHABLE CLAIM”

7. I/O is simple
• Just open a file, write, flush, close
• Nothing to it, right?
[Diagram: Application → File → HDD]
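
The "open, write, flush, close" recipe hides a trap. In Python (as in most runtimes), `flush()` only hands buffered bytes to the kernel's page cache, and nothing yet forces them onto the device. A more durable write adds `os.fsync` on the file and, for a newly created file, on its directory. The sketch below assumes a POSIX system. `durable_write` is a hypothetical helper name, and even `fsync` only helps if every layer further down the stack honours cache-flush requests.

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data and ask the OS to persist it to the device."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's userspace buffer
        f.flush()              # hands the buffer to the kernel (page cache)
        os.fsync(f.fileno())   # asks the kernel to push file data to the device
    # A new file's directory entry is only durable once the directory is synced too.
    # O_DIRECTORY is POSIX-only, so this step is skipped where it does not exist.
    if hasattr(os, "O_DIRECTORY"):
        dir_path = os.path.dirname(os.path.abspath(path))
        dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

with tempfile.TemporaryDirectory() as workdir:
    durable_write(os.path.join(workdir, "example.bin"), b"hello")
```

Each step peels back one layer of the onion: the Python buffer, the kernel, and then (if the drive cooperates) the hardware.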

8. I/O is simple
• A little closer…
[Diagram: Application → File → Kernel (Virtual File System → file system (ext4) → Logical Volume Manager → I/O scheduler → SCSI driver stack) → HDD]

9. I/O is simple
• But really…
[Diagram: Application → File → Kernel (storage subsystem, system bus drivers) → Hardware (PCI Express bus → SATA controller) → HDD]

10. THE ONION OF ABSTRACTION

11. ACT I: THESE BOOTS ARE MADE FOR WALKIN’

12. Everybody knows…
• Sequential access is fast
• Random access is slow
• … so what?

13–14. Everybody knows…
“Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.”
-- MySQL Reference Manual (8.12.3)
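
To put a number on "at least one disk seek", the MySQL manual also gives a rule of thumb for B-tree lookups on large tables. As we understand it, the estimate is roughly log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1. The Python sketch below simply evaluates that expression. The exact formula and the example figures (500,000 rows, 1,024-byte index blocks, 3-byte MEDIUMINT keys, 4-byte data pointers) are assumptions, so check the manual for your MySQL version.

```python
import math

def estimated_seeks(row_count: int, index_block_length: int,
                    index_length: int, data_pointer_length: int) -> float:
    """Rule-of-thumb disk seeks to find one row through a B-tree index (assumed formula)."""
    # Entries per index block, counting only two-thirds of each block as used
    fanout = index_block_length / 3 * 2 / (index_length + data_pointer_length)
    # Tree depth (log of row_count in base fanout), plus the formula's extra seek
    return math.log(row_count) / math.log(fanout) + 1

seeks = estimated_seeks(500_000, 1024, 3, 4)
print(f"about {math.ceil(seeks)} seeks per lookup")
```

On a spinning disk every one of those seeks costs milliseconds, which is why the manual calls seeks a bottleneck once the index no longer fits in cache.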

15. But why?

16–19. Rotational Latency

20. Throughput
• So you understand latency…
• What about throughput?
• Depends on two factors:
– Areal density
– Newtonian physics

21. Areal Density
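
22. Interlude: Math
• Rotation is fixed
– Constant angular velocity (CAV)
• Newton tells us that… $v = \omega \cdot r$
• At a fixed angular velocity $\omega$, linear velocity grows with radius $r$, so throughput increases toward the outer edge of the platter!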
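
A quick way to see the effect is to evaluate $v = \omega \cdot r$ at an inner and an outer track. The Python sketch below does that for a 7200 RPM drive. The radii (2 cm and 4.5 cm) are hypothetical round numbers, and the throughput link assumes bit density along the track is roughly the same everywhere on the platter.

```python
import math

def linear_velocity(rpm: float, radius_m: float) -> float:
    """v = ω · r, with ω converted from revolutions per minute to radians per second."""
    omega = 2 * math.pi * rpm / 60
    return omega * radius_m

RPM = 7200
INNER_R, OUTER_R = 0.02, 0.045   # hypothetical track radii, in metres

v_inner = linear_velocity(RPM, INNER_R)
v_outer = linear_velocity(RPM, OUTER_R)
print(f"inner: {v_inner:.1f} m/s, outer: {v_outer:.1f} m/s")
print(f"outer/inner speed ratio: {v_outer / v_inner:.2f}x")
```

Under that density assumption, the same ratio applies to bits passing under the head per second, so sequential reads from the outer tracks are correspondingly faster.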
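
23. Interlude: Math
• Commodity drives are available at:
– 5400–15000 RPM
– Usually 7200 RPM
• What does it mean for latency?
$\frac{7200}{60} = 120$ revolutions/second
$\frac{1}{120}\,\text{s} \approx 0.00833\,\text{s} \approx 8.33\,\text{ms}$ per revolution!
• That is the worst case; on average the head waits half a revolution, about 4.17 ms.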
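
The same arithmetic generalises to any spindle speed. The Python sketch below prints the worst-case rotational latency (one full revolution) and the average (half a revolution) for the commodity speeds listed on the slide. It ignores seek time, which comes on top of the rotational delay.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    revs_per_second = rpm / 60
    return 1000 / revs_per_second

for rpm in (5400, 7200, 10000, 15000):
    full = rotation_ms(rpm)
    # On average the target sector is half a revolution away from the head
    print(f"{rpm:>5} RPM: {full:5.2f} ms worst case, {full / 2:5.2f} ms average")
```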

24. In practice?
• Modern drives give you:
– 200+ MB/s
– 300 IOPS
• Pure random access nets only 1.2 MB/s! (300 IOPS × 4 KB per request)
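
When every request needs its own seek, throughput collapses to IOPS × request size. The Python sketch below reproduces the slide's 1.2 MB/s, assuming 4 KB requests and decimal units (1 MB = 1000 KB). It also shows what happens with a hypothetical 64 KB request size.

```python
def random_throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput when every I/O pays for a seek: IOPS × request size, in decimal MB/s."""
    return iops * io_size_kb / 1000

SEQUENTIAL_MB_S = 200   # the slide's figure for sequential access

for size_kb in (4, 64):
    mb_s = random_throughput_mb_s(300, size_kb)
    print(f"{size_kb:>2} KB random I/O: {mb_s:.1f} MB/s "
          f"({SEQUENTIAL_MB_S / mb_s:.0f}x slower than sequential)")
```

Larger, batched requests amortise each seek over more data, which is one reason databases favour big pages and sequential logs.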

25. RIGHT. WHAT CAN WE DO ABOUT IT?

26. Fine-tuning
• Provision more RAM
• Careful index structure
– Represent IPv4 addresses as INT UNSIGNED for a 75% reduction
– Implement better UUIDs¹ for a 30% reduction
¹ “Store UUID in an optimized way”, Percona blog
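
The two index tricks boil down to storing compact binary values instead of text. The Python sketch below packs an IPv4 address into a 32-bit integer that fits MySQL's INT UNSIGNED, replacing up to 15 characters of text. It also turns a version-1 UUID into 16 bytes suitable for BINARY(16). The time-field reordering is an assumption, written in the spirit of the cited Percona post rather than copied from it. The helper names are hypothetical.

```python
import ipaddress
import uuid

def ip_to_uint(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a value that fits INT UNSIGNED (4 bytes)."""
    return int(ipaddress.IPv4Address(ip))

def uint_to_ip(n: int) -> str:
    """Inverse of ip_to_uint."""
    return str(ipaddress.IPv4Address(n))

def uuid1_to_ordered_bytes(u: uuid.UUID) -> bytes:
    """16-byte form of a version-1 UUID with its time fields reordered high-to-low.

    u.hex layout: time_low (8 hex digits), time_mid (4), time_hi_and_version (4),
    then clock_seq and node (16). Putting time_hi first makes byte order follow
    the timestamp, so UUIDs generated close together sort close together in an index.
    """
    h = u.hex
    return bytes.fromhex(h[12:16] + h[8:12] + h[0:8] + h[16:])

ip = "192.168.100.200"
print(len(ip), "characters as text vs 4 bytes as INT UNSIGNED")
u = uuid.uuid1()
print(len(str(u)), "characters as text vs", len(uuid1_to_ordered_bytes(u)), "bytes as BINARY(16)")
```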

27. … or use a sledgehammer!
• RAID 0 (and its variants) uses striping
• Data is distributed across multiple spindles
• If it sounds familiar…
– It is!
– We call it “sharding”
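
Striping is a deterministic placement function, much like a shard-routing function. The Python sketch below maps a logical block address to a disk and an offset under a simple round-robin layout. The stripe-unit size and function name are hypothetical, and real RAID implementations may lay data out differently.

```python
from typing import Tuple

def raid0_locate(lba: int, n_disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (disk index, block offset on that disk)."""
    stripe_no, within = divmod(lba, stripe_blocks)  # which stripe unit, and position inside it
    disk = stripe_no % n_disks                      # stripe units go round-robin across disks
    row = stripe_no // n_disks                      # full rows of stripe units before this one
    return disk, row * stripe_blocks + within

# 4 disks with 16-block stripe units: consecutive units land on different spindles
for lba in (0, 16, 32, 48, 64, 70):
    print(lba, "->", raid0_locate(lba, n_disks=4, stripe_blocks=16))
```

A large sequential read touches every disk in turn, so all spindles work in parallel. As with sharding, though, losing any one disk loses part of every large file, since RAID 0 has no redundancy.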

28. It’s turtles all the way down
• Don’t jump to conclusions!
– RAID 0 is impractical
– RAID 5 may be slow
– RAID 10 is expensive
– etc.
• Do your homework
• Benchmark!

29. ACT II: I’LL USE MY CREDIT CARD

30. Let’s talk SSDs
• Non-volatile memory (flash)
• Lots of IOPS
• Expensive :-)
• Same caveats apply…

31. Let’s talk SSDs
• Value starts at “1”
• Electrons accrue in the floating gate
• After programming, value becomes “0”
• Electrons are drained to reset value to “1”

32. Surprise and Terror
• “Draining” is destructive!
• Limited erase cycles
• Limited lifespan!

33. Wear Leveling
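
Because each erase wears a block out a little, the controller spreads erases across the whole chip rather than hammering the same block. The Python sketch below shows the simplest form of this, dynamic wear leveling, where the least-erased free block is always handed out next. The class name is hypothetical. Real controllers also move rarely changing ("cold") data around, which this toy model omits.

```python
import heapq
from typing import List, Tuple

class WearLeveler:
    """Toy dynamic wear leveling: always allocate the least-erased free block."""

    def __init__(self, n_blocks: int):
        self.erase_counts = [0] * n_blocks
        self._free: List[Tuple[int, int]] = [(0, b) for b in range(n_blocks)]  # (erases, block)
        heapq.heapify(self._free)

    def allocate(self) -> int:
        """Take the free block with the lowest erase count."""
        _, block = heapq.heappop(self._free)
        return block

    def release(self, block: int) -> None:
        """Erase a block that no longer holds live data and return it to the pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))

wl = WearLeveler(4)
for _ in range(100):
    wl.release(wl.allocate())   # rewrite one block's worth of data, 100 times
print(wl.erase_counts)
```

Without leveling, the same 100 rewrites could land on a single block and use up its erase budget four times as fast.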

34–37. Caveats, remember?
• Addressing
– Cells (1 bit): not addressable
– Pages (0.5–8 KB)
– Blocks (32–64 pages)
• Why do you care?
– Reads and writes operate on a page
– But erasure operates on a whole block
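
The asymmetry between page writes and block erases is the root of everything that follows. The Python sketch below models a single NAND block: erased pages read back as all 1s, a page can be written once, and the only way to rewrite it is to erase the whole block. The class name and the default sizes (64 pages of 4 KB) are hypothetical values within the slide's ranges. Multi-bit cells and spare areas are left out.

```python
class NandBlock:
    """Toy model of one NAND flash block: written page by page, erased all at once."""

    ERASED = 0xFF  # erased cells read back as 1s

    def __init__(self, pages_per_block: int = 64, page_size: int = 4096):
        self.page_size = page_size
        self.pages = [bytes([self.ERASED]) * page_size for _ in range(pages_per_block)]
        self.programmed = [False] * pages_per_block
        self.erase_count = 0

    def read(self, page: int) -> bytes:
        return self.pages[page]

    def program(self, page: int, data: bytes) -> None:
        """Write one page; it cannot be rewritten until the whole block is erased."""
        if len(data) != self.page_size:
            raise ValueError("writes must be exactly one page")
        if self.programmed[page]:
            raise ValueError(f"page {page} already written; erase the block first")
        self.pages[page] = bytes(data)
        self.programmed[page] = True

    def erase(self) -> None:
        """Reset every page in the block to all 1s; each erase wears the cells."""
        self.pages = [bytes([self.ERASED]) * self.page_size for _ in self.pages]
        self.programmed = [False] * len(self.pages)
        self.erase_count += 1
```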

38. Write Amplification
[Diagram: changing a single bit (Δ = 1 bit) forces a whole block to be erased and rewritten (Δ = 1 block!)]
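
Write amplification is usually quoted as a factor: bytes physically written to flash divided by bytes the host asked to write. The Python sketch below computes it for the worst case shown on the slide, a naive in-place update that rewrites a whole block to change one page. The page and block sizes are hypothetical. Real controllers avoid this by remapping updated pages to fresh locations, which keeps the factor far lower.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """WAF = bytes physically written to flash / bytes the host asked to write."""
    return flash_bytes / host_bytes

PAGE = 4096             # hypothetical page size, in bytes
PAGES_PER_BLOCK = 64    # hypothetical block size, in pages

# Naive read-modify-write: the host changes one page, the drive rewrites the whole block
waf = write_amplification(PAGE, PAGE * PAGES_PER_BLOCK)
print(f"write amplification: {waf:.0f}x")
```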

39. Surprising Results
• Defragmentation
– Relocates blocks
– Contiguous files
– Lower LBAs
– Background job
• Bad, bad, bad!
– No benefit with SSDs
– Major write load!

40. Background GC
[Diagram: live pages 7, 5, 6, 1 and 2, scattered across Blocks A–D, are compacted together (1 2 5 6 7)]
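
Garbage collection reclaims blocks by copying their still-live pages elsewhere and erasing the rest. The Python sketch below performs one step using a greedy victim policy, which picks the block with the fewest live pages so the copy is cheapest. This textbook policy is an assumption, not any particular controller's algorithm (vendors rarely document theirs). It also assumes all blocks have the same size.

```python
from typing import Dict, List, Optional, Tuple

# Page states in this toy model: None = erased, True = live data, False = stale
Page = Optional[bool]

def live_pages(pages: List[Page]) -> int:
    return sum(1 for p in pages if p)

def greedy_gc_step(blocks: Dict[str, List[Page]], free_block: str) -> Tuple[str, int]:
    """Reclaim one block; return (name of the erased victim, number of pages copied)."""
    if any(p is not None for p in blocks[free_block]):
        raise ValueError("destination block must be fully erased")
    # Greedy policy: the in-use block with the fewest live pages is cheapest to move
    victim = min((n for n in blocks if n != free_block), key=lambda n: live_pages(blocks[n]))
    copied = live_pages(blocks[victim])
    size = len(blocks[free_block])
    # Relocate the victim's live pages into the free block...
    blocks[free_block] = [True] * copied + [None] * (size - copied)
    # ...then erase the victim as a whole; it becomes the next free block
    blocks[victim] = [None] * len(blocks[victim])
    return victim, copied
```

Every copied page is an extra flash write the host never asked for, so GC is a major source of the write amplification described above.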
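
41. Surprising Results
• What happens when you delete a file?
– Not much
– A bit flips in the file table
– Space is not reclaimed: the drive still treats those pages as live
• Result?
– The SATA TRIM command, which lets the OS tell the drive which logical blocks no longer hold data
[Diagram: pages 7, 5, 6, 1 and 2 across Blocks A–D]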
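
TRIM matters because the drive cannot tell deleted data from live data on its own. Without that hint, garbage collection keeps copying pages that belong to deleted files. The Python sketch below counts how many pages GC would have to relocate from one block before and after the file system reports a deletion. The block contents and LBAs are hypothetical. On Linux, TRIM is typically issued with the `fstrim` utility or the `discard` mount option.

```python
from typing import List, Set

def pages_gc_must_copy(block: List[int], drive_live: Set[int]) -> int:
    """Pages GC must relocate before erasing a block: those the drive believes are live."""
    return sum(1 for lba in block if lba in drive_live)

block = [7, 5, 6, 1]          # LBAs stored in one physical block (hypothetical)
drive_live = {7, 5, 6, 1, 2}  # LBAs the SSD believes hold data
deleted_file = {5, 6}         # LBAs the file system has just freed

# Without TRIM, only the file table changes and the drive never hears about it
print("without TRIM:", pages_gc_must_copy(block, drive_live), "pages to copy")

# With TRIM, the OS tells the drive these LBAs no longer hold useful data
drive_live -= deleted_file
print("with TRIM:", pages_gc_must_copy(block, drive_live), "pages to copy")
```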

42. SSD Takeaways
• A moving target
– File systems
– Data structures
– Longevity
• As usual:
– Benchmark
– Monitor

43. EPILOGUE
“LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”

44. WE’RE DONE HERE!
… AND YES, WE’RE HIRING :-)
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog: http://engineering.wix.com
