2006-11-7 John Lazzaro (www.cs.berkeley.edu/~lazzaro) CS 152 Computer Architecture and Engineering Lecture 20 – Buses, Disks, and RAID + Flash Memory TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/
Last Time: Multithreading, Multiple Cores (1) Threads on two cores that use shared libraries conserve L2 memory. (2) Threads on two cores share memory via L2 cache operations. Much faster than 2 CPUs on 2 chips.
Today: Buses, Disks, RAID, and Flash Buses: Shared physical wires that carry signals between several devices (often “peripherals”). Buses let computers be expandable: add more memory, a better graphics card, webcam, etc. Disks: Store bits as the orientation of miniature “bar magnets” on a rotating platter. A mechanical device: slow and prone to failure.
Properties of bus structures ... Buses are an abstraction for communication: they help designers compose large, complex systems. Control lines: Control transactions, signal what is on the data lines. Data lines: Carry information across the interface.
Buses are defined in layers ... Layers: Transaction Protocols, Signal Timing on Wires, Wires, Electrical Properties, Mechanical Properties. Example: DIMM DRAM bus. The name of every wire is defined in a standards document. JEDEC: Joint Electron Device Engineering Council. Makes the DRAM bus standards.
Lower levels of DRAM bus specification Ideally, DIMMs made by any manufacturer should fit into any compliant socket, and work.
Upper levels of DRAM bus specification Collaboration between DRAM manufacturers (Samsung, Micron) and DRAM users (Intel, Cisco, ... ).
Bus wires shared between many DIMMs Apple Xserve G5 - has 8 DIMM slots, to support 8GB. The memory controller is the only “bus master”: it can start transactions on the bus, but the DIMMs cannot. DIMMs respond to transaction requests. Since the memory controller is the only bus master, and there are a small number of DIMM slots, bus sharing is easy: use dedicated wires to each slot.
Buses: pros and cons ... +++ Low cost. One set of wires from memory controller can support up to 8 DIMMs. --- Latency of bus increases with length of wires (needed to reach all 8 DIMM sockets), and the loading of 8 DIMMs. Must design for worst-case (8 DIMMs), even if only 1 DIMM is present. --- Shared wires limit maximum bandwidth from memory. If memory controller had 8 sets of dedicated wires, one per DIMM, memory bandwidth would be much better (but more expensive).
Buses turn a CPU into a product Case Study: Mac Mini (PowerPC edition)
Constraints: Size, low price (499 USD) Size fixed by the “form factor” (physical size) of desktop DIMMs. Laptop DRAM is smaller, but too expensive for $499 price.
User expansion via serial buses: USB, FireWire, Ethernet. Serial: Data is sent “bit by bit” over one logical wire. Serial pros and cons: +++ Low cost: a small number of wires costs less. Also, cheap wires and connectors can be used, since skew is less of a problem. +++ Fewer skew issues: sending data over many wires introduces “skew” (signals travel on each wire at a slightly different speed), and skew limits the speed and length of a bus. Serial buses use only one logical wire. --- When only using one wire, there is a bandwidth limit. Thus, DIMMs use many wires (a “parallel” bus, not “serial”).
Many other buses hidden from user Processor bus. How the CPU talks to everything else. Not standardized. CPU : PowerPC G4 (Freescale) Bus controller. Just 1 for low cost. High-end products have two: fast North Bridge, slow South Bridge.
Uses many standard parallel buses ... AGP 4X bus. Graphics chip. PCI bus : Boot ROM, USB 2. ATA/100 bus. For hard disk, DVD/CD ROM. PCI, ATA, AGP devices can be bus master, for Direct Memory Access (DMA). Disk can write RAM directly.
Parts + manufacturing cost: $283.37 Parts cost in volume: $274.69 Source: iSuppli corporation
Disks Programs request a “block” of data from a disk: Block = 0.5K to 4K bytes
Trick: use bar magnets to code bits Symbol N S
Write and read bar direction on a disk Longitudinal Recording: Today’s technology. Problem: Magnets tend to erase each other. Perpendicular Recording: New to market. [Figure: the bit pattern 1 0 0 recorded on the disk surface in each style.]
Data block written on a sector of a track Each ring is a “track”. A track is divided into “sectors”. A sector codes a fixed # of bytes (ex: 4K blocks). Outer tracks hold more sectors. Many more tracks and sectors than shown! 2005 desktop rotation speed: 7200 RPM
Read/write head mounted on “arm” A “seek”: When the arm is moved so that the heads are over the desired track. “Seek time”: Average time to move from one track to another track. Pessimistic estimate of real-world performance. 2005 desktop drive typical seek time: 8.5 milliseconds.
Disk Latency Equation Latency of a disk block read = Queueing Time (zero if no other accesses pending) + Controller Time (usually short) + Seek Time (2005: about 8 ms) + Rotation Time (1/2 of a full rotation: about 4.2 ms @ 7200 RPM) + Transfer Time (1 ms @ 7200 RPM)
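The latency equation can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and queueing and controller time are assumed to be zero. A full rotation at 7200 RPM takes 60,000 / 7200 ms, so the average half-rotation wait works out to roughly 4.2 ms.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def disk_read_latency_ms(seek_ms, rpm, transfer_ms, queue_ms=0.0, controller_ms=0.0):
    """Sum the five terms of the disk latency equation.

    Rotation time is taken as half of a full rotation: the average wait
    for the desired sector to arrive under the head.
    """
    avg_rotation_ms = rotation_time_ms(rpm) / 2.0
    return queue_ms + controller_ms + seek_ms + avg_rotation_ms + transfer_ms


# Slide figures: about 8 ms seek, 7200 RPM, 1 ms transfer.
# Assumption: no queueing delay and negligible controller time.
latency = disk_read_latency_ms(seek_ms=8.0, rpm=7200, transfer_ms=1.0)
print(f"Estimated block read latency: {latency:.2f} ms")
```

Seek and rotation dominate the total, which is why the slides treat the disk as a millisecond-scale device.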
OS interleaves a file onto disk sectors ... For this “toy” disk, an ~8 ms rotation time allows ~2 ms between the end of sector N and the start of sector N+1, enough time for an OS to request the next block for a linear read of a file. [Figure: one track with sectors numbered 0 to 15.]
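One way to get such a gap is to place consecutive logical sectors a fixed number of physical slots apart. The sketch below is an assumption-laden illustration, not the layout from the slide's figure: it assumes 16 sectors per track (the labels 0 to 15), an 8 ms rotation (0.5 ms per sector), and a stride of 5 slots, which leaves 4 sectors (2 ms) between logical neighbors.

```python
from math import gcd


def interleaved_layout(num_sectors, stride):
    """Return a list where layout[p] is the logical sector stored in physical slot p."""
    if gcd(num_sectors, stride) != 1:
        raise ValueError("stride must be coprime with num_sectors to fill every slot")
    layout = [-1] * num_sectors
    for logical in range(num_sectors):
        layout[(logical * stride) % num_sectors] = logical
    return layout


NUM_SECTORS = 16    # assumed from the sector labels 0..15
ROTATION_MS = 8.0   # "~8 ms rotation time" for the toy disk
STRIDE = 5          # hypothetical: logical N+1 sits 5 slots after logical N

layout = interleaved_layout(NUM_SECTORS, STRIDE)
sector_ms = ROTATION_MS / NUM_SECTORS
gap_ms = (STRIDE - 1) * sector_ms   # time between end of N and start of N+1

print(layout)
print(f"gap between consecutive logical sectors: {gap_ms} ms")
```

With this stride, a linear read of all 16 sectors takes 5 rotations instead of 16, since the OS no longer misses the next sector and waits a full turn for it.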
Why disk arrays? Reliability + throughput Goal: No Single Points of Failure. Fully dual redundant: each host reaches the disks through duplicated I/O controllers and array controllers. With duplicated paths, higher performance can be obtained when there are no failures. Recovery Groups (columns): Disk(s) can fail in a group w/o losing access to the data stored in the group.
RAID Level: Recovery Group Organization RAID 1: Disk Mirroring Each disk holds a copy of each block. +++ High availability. +++ High read bandwidth. --- High disk capacity cost.
Logical Blocks Bn on Array (5-disk Recovery Group):
D0   D1   D2   D3   D4
B0   B0   B0   B0   B0
B1   B1   B1   B1   B1
B2   B2   B2   B2   B2
...  ...  ...  ...  ...
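The mirroring behavior can be modeled in a few lines. This is a toy sketch with a hypothetical MirroredGroup class (disks are plain dictionaries), meant only to show why availability is high and capacity cost is high: every write goes to all disks, and a read succeeds as long as one disk survives.

```python
class MirroredGroup:
    """Toy model of a RAID 1 recovery group; each disk is a dict of blocks."""

    def __init__(self, num_disks=5):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no, data):
        # Every working disk receives its own copy of the block.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def fail(self, disk_no):
        # Simulate a disk failure: its contents are lost.
        self.failed.add(disk_no)
        self.disks[disk_no].clear()

    def read(self, block_no):
        # Any surviving disk can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")


group = MirroredGroup()
group.write(0, b"B0")
for d in range(4):          # lose four of the five disks
    group.fail(d)
survivor_data = group.read(0)
usable_fraction = 1 / len(group.disks)   # only 1/5 of raw capacity is unique data
print(survivor_data, usable_fraction)
```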
Recall from ECC lecture: Parity Codes Simple case: Two 1KB blocks of data (A and B). Create a third block, C: C = A xor B (do xor on each bit of block). Read all three blocks. If A or B is not available but C is, regenerate A or B: A = C xor B, B = C xor A. Why? (A xor B) xor B = (B xor B) xor A = A. The math is easy: the trick is system design! Examples: RAID, voice-over-IP parity FEC.
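The parity identity is easy to try out. The following is a minimal sketch, assuming random 1 KB blocks stand in for A and B; xor_blocks is a helper name invented for this example.

```python
import os


def xor_blocks(x, y):
    """Bitwise XOR of two equal-length byte blocks."""
    if len(x) != len(y):
        raise ValueError("blocks must be the same length")
    return bytes(a ^ b for a, b in zip(x, y))


BLOCK_SIZE = 1024                       # two 1 KB data blocks
block_a = os.urandom(BLOCK_SIZE)
block_b = os.urandom(BLOCK_SIZE)
block_c = xor_blocks(block_a, block_b)  # C = A xor B

# Pretend A (or B) is unavailable and regenerate it from the other two.
recovered_a = xor_blocks(block_c, block_b)   # A = C xor B
recovered_b = xor_blocks(block_c, block_a)   # B = C xor A

print(recovered_a == block_a, recovered_b == block_b)
```

Recovery works because B xor B is all zeros, so (A xor B) xor B leaves A unchanged.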
For lower capacity cost: parity codes RAID 3: Parity Disk 80% of disks hold unique data. +++ Low cost. --- Only one disk may fail. --- Parity disk limits write bandwidth. --- No read bandwidth gain.
Logical Blocks Bn on Array (5-disk Recovery Group):
D0   D1   D2   D3   Parity
B0   B1   B2   B3   P0
B4   B5   B6   B7   P1
B8   B9   B10  B11  P2
...  ...  ...  ...  ...
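The same parity idea extends to a stripe of four data blocks plus one parity block. The sketch below is illustrative and uses made-up helper names (locate, make_stripe, rebuild); it assumes logical block Bn lives on data disk n mod 4 in stripe n div 4, matching the layout above, and that all blocks in a stripe have equal length.

```python
from functools import reduce

DATA_DISKS = 4   # D0..D3 hold data; a fifth disk holds parity


def locate(logical_block):
    """Map logical block Bn to (stripe number, data disk index)."""
    return divmod(logical_block, DATA_DISKS)


def xor_all(blocks):
    """XOR together any number of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the four data blocks followed by their parity block."""
    return list(data_blocks) + [xor_all(data_blocks)]


def rebuild(stripe, failed_disk):
    """Regenerate the block on one failed disk from the four survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_disk]
    return xor_all(survivors)


stripe0 = make_stripe([b"B0B0", b"B1B1", b"B2B2", b"B3B3"])
print(locate(9))              # B9 is in stripe 2 on disk D1
print(rebuild(stripe0, 2))    # contents of D2 recovered without reading D2
usable_fraction = DATA_DISKS / (DATA_DISKS + 1)
```

Every write to any data block also updates the parity block in that stripe, which is why the single parity disk limits write bandwidth.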
The physics of non-volatile memory Two gates. But the middle one is not connected: a “floating gate”. 1. Electrons “placed” on floating gate stay there for many years (ideally). 2. 10,000 electrons on floating gate shift transistor threshold by 2V. 3. In a memory array, shifted transistors hold “0”, unshifted hold “1”. [Figure: floating-gate transistor cross-section and symbol; labels: p-, n+, dielectric, Vs, Vg, Vd, Ids.]
Moving electrons on/off floating gate A high drain voltage injects “hot electrons” onto floating gate. A high gate voltage “tunnels” electrons off of floating gate. 1. Hot electron injection and tunneling produce tiny currents, thus writes are slow. 2. High voltages damage the floating gate. Too many writes and a bit goes “bad”. [Figure: floating-gate transistor cross-section; labels: p-, n+, dielectric, Vs, Vg, Vd.]
Flash: Disk Replacement Presents memory to the CPU as a set of pages. Page format: 2048 Bytes (user data) + 64 Bytes (meta data). 1GB Flash: 512K pages 2GB Flash: 1M pages 4GB Flash: 2M pages Chip “remembers” for 10 years.
Reading a Page ... Flash Memory: Samsung K9WAG08U1A. Bus: 8-bit data or address (bi-directional), plus bus control lines. Page address in: 175 ns First byte out: 10,000 ns Clock out page bytes: 52,800 ns 33 MB/s Read Bandwidth
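The 33 MB/s figure can be reproduced from the three timing numbers. This sketch assumes that a full page of 2048 + 64 bytes (the page format given earlier) is clocked out on every read; the constant names are invented for the example.

```python
PAGE_BYTES = 2048 + 64      # assumed: user data plus meta data are both read
ADDRESS_IN_NS = 175         # page address in
FIRST_BYTE_NS = 10_000      # wait until the first byte is available
CLOCK_OUT_NS = 52_800       # clock out all page bytes over the 8-bit bus

total_ns = ADDRESS_IN_NS + FIRST_BYTE_NS + CLOCK_OUT_NS
ns_per_byte = CLOCK_OUT_NS / PAGE_BYTES

# bytes per ns * 1e9 ns/s / 1e6 bytes/MB = bytes per ns * 1000
bandwidth_mb_s = PAGE_BYTES / total_ns * 1000

print(f"page read time: {total_ns} ns")
print(f"bus cycle: {ns_per_byte} ns per byte")
print(f"read bandwidth: {bandwidth_mb_s:.1f} MB/s")
```

Under this assumption the bus moves one byte every 25 ns, and clocking the bytes out accounts for most of the page read time.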
Where Time Goes Page address in: 175 ns First byte out: 10,000 ns Clock out page bytes: 52,800 ns
Writing a Page ... A page lives in a block of 64 pages. 1GB Flash: 8K blocks 2GB Flash: 16K blocks 4GB Flash: 32K blocks To write a page: 1. Erase all pages in the block (cannot erase just one page). Time: 1,500,000 ns 2. May program each page individually, exactly once. Time: 200,000 ns per page. Block lifetime: 100,000 erase/program cycles.
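A rough upper bound on write throughput follows from the erase and program times. The sketch below is an estimate under stated assumptions: it counts only the 2048 user bytes per page, rewrites a whole block at once, and ignores the time to move data across the bus into the chip, so a real device would be slower.

```python
PAGES_PER_BLOCK = 64
USER_BYTES_PER_PAGE = 2048      # user data portion of a page
ERASE_NS = 1_500_000            # erase all pages in a block
PROGRAM_NS = 200_000            # program one page

# Rewrite an entire block: one erase, then program each of its pages once.
block_write_ns = ERASE_NS + PAGES_PER_BLOCK * PROGRAM_NS
block_user_bytes = PAGES_PER_BLOCK * USER_BYTES_PER_PAGE

# bytes per ns * 1000 = MB/s (with 1 MB = 1e6 bytes)
write_mb_s = block_user_bytes / block_write_ns * 1000

blocks_in_1gb = 2**30 // block_user_bytes

print(f"time to rewrite one block: {block_write_ns / 1e6} ms")
print(f"best-case write bandwidth: {write_mb_s:.2f} MB/s")
print(f"blocks in a 1GB flash: {blocks_in_1gb}")
```

Changing a single page in place is the worst case: it costs a full block erase plus reprogramming of every page that must be preserved.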
Block Failure Even when new, not all blocks work! 1GB: 8K blocks, 160 may be bad. 2GB: 16K blocks, 220 may be bad. 4GB: 32K blocks, 640 may be bad. During factory testing, Samsung writes good/bad info for each block in the meta data bytes (page format: 2048 Bytes user data + 64 Bytes meta data). After an erase/program, chip can say “write failed”, and block is now “bad”. OS must recover (migrate bad block data to a new block). Bits can also go bad “silently” (!!!).
Flash controllers: Chips or Verilog IP ... Flash memory controller handles write lifetime management, block failures, silent bit errors ... Software sees a “perfect” disk-like storage device.
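To make the controller's bookkeeping concrete, here is a toy sketch of two of its jobs: spreading erases across blocks (write lifetime management) and retiring blocks that fail. ToyFlashController and its methods are hypothetical names for illustration, not a real controller interface, and silent bit errors (which need error-correcting codes) are left out.

```python
class ToyFlashController:
    """Hypothetical bookkeeping for wear leveling and bad-block handling."""

    MAX_CYCLES = 100_000   # block lifetime in erase/program cycles

    def __init__(self, num_blocks, factory_bad=()):
        self.erase_counts = [0] * num_blocks
        self.bad = set(factory_bad)   # blocks marked bad during factory testing

    def pick_block(self):
        """Choose the least-worn good block for the next erase/program."""
        good = [b for b in range(len(self.erase_counts))
                if b not in self.bad and self.erase_counts[b] < self.MAX_CYCLES]
        if not good:
            raise RuntimeError("no usable blocks left")
        return min(good, key=lambda b: self.erase_counts[b])

    def erase_and_program(self, write_failed=False):
        """Use one block; if the chip reports 'write failed', retire it and retry."""
        block = self.pick_block()
        self.erase_counts[block] += 1
        if write_failed:
            self.bad.add(block)
            return self.erase_and_program()
        return block


ctrl = ToyFlashController(num_blocks=4, factory_bad={1})
used = [ctrl.erase_and_program() for _ in range(6)]
print(used)   # block 1 is never chosen; wear rotates over the good blocks
```

Always choosing the least-worn block keeps erase counts even, so no single block reaches its cycle limit long before the others.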
Conclusions: Buses, Disks, and RAID Buses make the machine. “Value” and “high-end” machines may have similar CPUs, different chip-sets. Disks are different: As mechanical devices, they move on a millisecond time frame, and suffer mechanical failure. System design must cope with this reality, not ignore it. Flash: Has its own set of non-idealities ...
Reminder: No Checkoff this Friday! TAs will provide “secret” MIPS machine code tests. Bonus points if these tests run by end of section. If not, TAs give you test code to use over the weekend. Final checkoff the following Friday ... Final report due following Monday, 11:59 PM