Your SlideShare is downloading. ×
CSE232A: Database System
                Principles

                  Notes 02: Hardware




                            ...
Storage Cost
                                                                              nearline offline
              ...
Cost of Disk Access

    • How many blocks were accessed ?
    • Clustered/consecutive ?




                             ...
Focus on: “Typical Disk”
         Controller
BUS


           Disk




                                    …
  Terms: Plat...
Key performance metric: Time
       to fetch block
I want                          block x
block X                        ...
Seek Time



     3 or 5x


  Time


          x
Few ms
               1                           N

                    ...
Transfer Rate: t


• “typical” t: 1 ® 3 MB/second
• transfer time: block size
                    t




                  ...
Practice Problem: Minimum Time

 • Head is at the start of the first sector of
   the block
 • Just compute transfer time
...
• So far: Random Block Access
• What about: Reading “Next” block?

Time to get = Block Size + Negligible
  block          ...
Suggested optimization

  • Cluster data in consecutive blocks
  • Give an extra point to algorithms that
    – exploit da...
• To Modify a Block?

To Modify Block:
    (a) Read Block
    (b) Modify in Memory
    (c) Write Block
    [(d) Verify?]

...
Double Buffering
Problem: Have a File
            » Sequence of Blocks B1, B2

           Have a Program
           » Proc...
Double Buffering

                process      process
 Memory:
                  C
                  A            B



 D...
Trend

• memory prices drop and memory capacities
  increase,
• transfer rates increase
• Disk access times do not increas...
At what level do we cope?


• Single Disk
  – e.g., Error Correcting Codes
• Disk Array



    Logical                 Phy...
Summary
Summary
• Secondary storage, mainly disks
• I/O times
• I/Os should be avoided,
          especially random ones…....
Upcoming SlideShare
Loading in...5
×

Hardware

205

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
205
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Hardware"

  1. 1. CSE232A: Database System Principles Notes 02: Hardware 1 Database System Architecture Query Processing Transaction Management SQL query Calls from Transactions (read,write) Parser Transaction relational algebra Manager Query View Hardware Rewriter definitions aspects of Concurrency Lock Controller Table and Optimizer Statistics & Catalogs & storing and query execution System Data retrieving data Recovery plan Manager Execution Buffer Engine Manager Data + Indexes Log Memory Hierarchy • Cache memory – On-chip and L2 – Caching outside control of DB Cost per byte system • RAM Capacity Access Speed – Addressable space includes virtual memory but DB systems avoid it – Main memory DBs rely more on OS • Disk – Access speed & Transfer rate – Winchester, arrays,… • Tertiary storage – Tapes, jukeboxes, DVDs 3 1
  2. 2. Storage Cost nearline offline 1015 tape & tape optical typical capacity (bytes) 1013 magnetic disks optical 1011 electronic disks secondary online 109 electronic tape main 107 from Gray & Reuter 105 cache 103 10-9 10-6 10-3 10-0 103 access time (sec) 4 Storage Cost from Gray & Reuter 104 cache electronic 102 online main tape dollars/MB electronic secondary magnetic optical nearline 100 tape & disks optical disks 10-2 offline tape 10-4 10-9 10-6 10-3 10-0 103 access time (sec) 5 Volatile Vs Non-Volatile Storage • Persistence important for transaction atomicity and durability • Even if database fits in main memory changes have to be written in non- volatile storage • Hard disk • RAM disks w/ battery • Flash memory 6 2
  3. 3. Cost of Disk Access • How many blocks were accessed ? • Clustered/consecutive ? 7 Moore’s Law: Different Rates of Improvement • Processor speed Clustered/sequential access-based algorithms Disk Transfer Rate • Main memory bit/$ become relatively • Disk bit/$ better • RAM access speed • Disk access speed • Disk transfer rate Access Time Disk 8 Moore’s Law: Different Rates of Improvement Cost of “miss” increases Cache Capacity RAM Capacity Access Time Disk 9 3
  4. 4. Focus on: “Typical Disk” Controller BUS Disk … Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap 10 Often different numbers Top View of sectors per track Sector Track Block (typically Gap multiple sectors) 11 “Typical” Numbers Diameter: 1 inch ® 15 inches Cylinders: 100 ® 20000 Surfaces: 1 (CDs) ® (Tracks/cyl) 2 (floppies) ® ® 5 (typical hd) ® 30 Sector Size: 512B ® 50K Capacity: 360 KB (old floppy) ® 30 GB 12 4
  5. 5. Key performance metric: Time to fetch block I want block x block X in memory ? Time = Seek Time (locate track) + Rotational Delay (locate sector)+ Transfer Time (fetch block) + Other (disk controller, …) 13 Track Seek Delay Where Head must go Track Where Head is 14 Rotational Delay Head Here Block I Want 15 5
  6. 6. Seek Time 3 or 5x Time x Few ms 1 N Cylinders Traveled 16 Average Random Seek Time N N å å SEEKTIME (i ® j) i=1 j=1 S= j¹i N(N-1) “Typical” S: 10 ms ® 40 ms 17 Average Rotational Delay R = 1/2 revolution “typical” R = 8.33 ms (7200 RPM) Assume we have to start reading from start of first sector 18 6
  7. 7. Transfer Rate: t • “typical” t: 1 ® 3 MB/second • transfer time: block size t 19 Other Delays • CPU time to issue I/O • Contention for controller • Contention for bus, memory “Typical” Value: 0 20 Practice Problem • Single surface • Rotation speed 7200rpm • 16,384 tracks • 128 sectors/track • 4096 bytes/sector • 4 sectors/block (16,384 bytes/block) • SEEKTIME (i ® j) = [1000 + (j-i)] µs • Neglect gaps • Calculate minimum, maximum, average time to fetch one block 21 7
  8. 8. Practice Problem: Minimum Time • Head is at the start of the first sector of the block • Just compute transfer time • 4 sectors cover 4/128 of a track • 1 full rotation takes 60/7200=8.33ms • Transfer time is 8.33 * 4 /128 = 0.26ms 22 Practice Problem: Maximum Time • Assume read must start at the first sector • Head is at innermost, required track is the outermost • Seek time = … • Head just missed the beginning • Rotational delay = … • Transfer time = … 23 Practice problem: Average time • Solve… 24 8
  9. 9. • So far: Random Block Access • What about: Reading “Next” block? Time to get = Block Size + Negligible block t - skip gap - switch track - once in a while, next cylinder 25 Rule of Random I/O: Expensive Thumb Sequential I/O: Much less • Ex: 1 KB Block » Random I/O: ~ 20 ms. » Sequential I/O: ~ 1 ms. 26 Practice Problem cont’d: Sustained Bandwidth over Track • Assume required blocks are consecutive on single track • What is the sustained bandwidth of fetching consecutive blocks? • 128 sectors/track * 4KB/sector in 8.33ms/track full rotation = 512KB/8.33ms = 61.46KB/ms 27 9
  10. 10. Suggested optimization • Cluster data in consecutive blocks • Give an extra point to algorithms that – exploit data clustering by avoiding “random” accesses – Read/write consecutive blocks 28 Example: 2-Phase Merge Sort P K AD L E ZW J C RH Y F X I Main Memory: 4 blocks READ P K A D L E ZW SORT A D K P L D WP E K P K Z WRITE A D K P L DWP E K P K Z … MERGE AD DP C K F WRITE C F H I J R X Y Improve by bringing max number of 29 blocks in memory Cost for Writing similar to Reading …. unless we want to verify! need to add (full) rotation + Block size t 30 10
  11. 11. • To Modify a Block? To Modify Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?] 31 Block Address: • Physical Device • Cylinder # • Surface # • Sector Once upon a time DBs had access to such – now it is the OS’s domain 32 Optimizations (in controller or O.S.) • Disk Scheduling Algorithms – e.g., elevator algorithm • Track (or larger) Buffer • Pre-fetch • Arrays • Mirrored Disks 33 11
  12. 12. Double Buffering Problem: Have a File » Sequence of Blocks B1, B2 Have a Program » Process B1 » Process B2 » Process B3 ... 34 Single Buffer Solution (1) Read B1 ® Buffer (2) Process Data in Buffer (3) Read B2 ® Buffer (4) Process Data in Buffer ... 35 Say P = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R) 36 12
  13. 13. Double Buffering process process Memory: C A B Disk: A B C D E F G done done 37 Say P ³ R P = Processing time/block R = IO time/block n = # blocks What is processing time? • Double buffering time = R + nP • Single buffering time = n(R+P) Improvement much more dramatic if consequtive blocks: … 38 Block Size Selection? • Big Block ® Amortize I/O Cost Unfortunately... • Big Block Þ Read in more useless stuff! and takes longer to read 39 13
  14. 14. Trend • memory prices drop and memory capacities increase, • transfer rates increase • Disk access times do not increase that much Þ blocks get bigger ... 40 Disk Failures (Sec 2.5) • Partial ® Total • Intermittent ® Permanent 41 Coping with Disk Failures • Detection – e.g. Checksum • Correction Þ Redundancy 42 14
  15. 15. At what level do we cope? • Single Disk – e.g., Error Correcting Codes • Disk Array Logical Physical 43 Operating System e.g., Stable Storage Logical Block Copy A Copy B 44 Database System • e.g., Log Current DB Last week’s DB 45 15
  16. 16. Summary Summary • Secondary storage, mainly disks • I/O times • I/Os should be avoided, especially random ones….. 46 16

×