Operating Systems
CMPSCI 377
Architecture
Emery Berger
University of Massachusetts Amherst
From the Operating Systems course (CMPSCI 377) at UMass Amherst, Fall 2007.

Operating Systems - Architecture

  1. Operating Systems
     CMPSCI 377: Architecture
     Emery Berger, University of Massachusetts Amherst
     UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
  2. Architecture: Hardware Support for Applications & OS
     - Architecture basics & details
     - Focus on characteristics exposed to the application programmer / OS
  3. The Memory Hierarchy
     - Registers
     - Caches: associativity, misses
     - Locality
  4. Registers
     - Register = dedicated name for one word of memory managed by the CPU
     - General-purpose: "AX", "BX", "CX" on x86
     - Special-purpose: "SP" = stack pointer, "FP" = frame pointer, "PC" = program counter
     [Diagram: stack with arguments arg0-arg2; SP and FP point into it]
  5. Registers (continued)
     - Changing processes: save current registers & load saved registers = context switch
  6. Caches
     - Access to main memory is "expensive": ~100 cycles (slow, relatively cheap)
     - Caches: small, fast, expensive memory
     - Hold recently accessed data (D$) or instructions (I$)
     - Different sizes & locations:
       Level 1 (L1): on-chip, smallish
       Level 2 (L2): on or next to the chip, larger
       Level 3 (L3): pretty large, on the bus
     - Manage lines of memory (32-128 bytes)
  7. Memory Hierarchy
     - Higher = small, fast, more $, lower latency
     - Lower = large, slow, less $, higher latency
       registers: 1-cycle latency
       L1 (separate D$ and I$): 2-cycle latency
       L2 (unified): 7-cycle latency
       RAM: 100-cycle latency
       disk: 40,000,000-cycle latency
       network: 200,000,000+ cycle latency
  8. Cache Jargon
     - Cache is initially cold
     - Accessing data initially misses: fetch from the lower level in the hierarchy and bring the line into the cache (populate the cache)
     - Next access: hit
     - Once the cache holds the most-frequently used data, it is "warmed up"
     - Context switch implications?
  9. Cache Details
     - An ideal cache would be fully associative, i.e., one LRU (least-recently used) queue; generally too expensive
     - Instead, partition memory addresses into separate bins, each divided into ways:
       1-way = direct-mapped
       2-way = 2 entries per bin
       4-way = 4 entries per bin, etc.
  10. Associativity Example
      - Hash memory addresses to different indices in the cache
      [Diagram: addresses hashed into cache indices]
  11. Miss Classification
      - First access = compulsory miss: unavoidable without prefetching
      - Too many items in a way = conflict miss: avoidable with higher associativity
      - No space in cache = capacity miss: avoidable with a larger cache
      - Invalidated = coherence miss: avoidable if the cache were unshared
  12. Exercise
      - Cache with 4 entries, 2-way associativity
      - Assume hash(x) = x % 4 (modulus)
      - How many misses? How many compulsory, conflict, and capacity misses?
  13-15. Solution
      - Trace: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10
      - Compulsory misses: 10
      - Conflict misses: 2
      - Capacity misses: 0
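The classification above can be checked with a small simulator. This is a sketch, not code from the course: it assumes "4 entries" means 4 set indices (so hash(x) = x % 4 selects the set) with 2 ways each and LRU replacement within a set, and it labels each non-compulsory miss by consulting a fully associative LRU shadow cache of the same total capacity (a shadow hit means higher associativity would have helped: conflict; a shadow miss means the cache is simply too small: capacity).

```python
from collections import OrderedDict

def classify_misses(trace, num_sets=4, ways=2):
    sets = [OrderedDict() for _ in range(num_sets)]   # per-set LRU state
    shadow = OrderedDict()                            # fully associative LRU cache
    capacity = num_sets * ways
    seen = set()
    counts = {"compulsory": 0, "conflict": 0, "capacity": 0, "hit": 0}

    for x in trace:
        s = sets[x % num_sets]                        # hash(x) = x % num_sets
        shadow_hit = x in shadow
        if x in s:
            s.move_to_end(x)                          # refresh LRU position
            counts["hit"] += 1
        else:
            if x not in seen:
                counts["compulsory"] += 1             # first-ever access
            elif shadow_hit:
                counts["conflict"] += 1               # more ways would have helped
            else:
                counts["capacity"] += 1               # a bigger cache would have helped
            if len(s) == ways:
                s.popitem(last=False)                 # evict the set's LRU entry
            s[x] = True
        # keep the shadow cache up to date
        if shadow_hit:
            shadow.move_to_end(x)
        else:
            if len(shadow) == capacity:
                shadow.popitem(last=False)
            shadow[x] = True
        seen.add(x)
    return counts

trace = [3, 7, 11, 2, 3, 7, 7, 9, 9, 6, 13, 7, 2, 5, 8, 10]
print(classify_misses(trace))   # matches the slides: 10 compulsory, 2 conflict, 0 capacity
```

The two conflict misses are the re-accesses of 3 and 7: both hash to the same set as 11, and three addresses do not fit in two ways.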
  16. Locality
      - Locality = re-use of recently used items
      - Temporal locality: re-use in time
      - Spatial locality: use of nearby items (same cache line, same page: 4K chunk)
      - Intuitively, greater locality = fewer misses
      - But the number of misses depends on cache layout, number of levels, associativity... it is machine-specific
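As an illustration of spatial locality (my sketch, not an example from the slides): traversing a 2-D array row by row touches elements in the order they are laid out, while traversing column by column strides across a full row on every step. In C the row-order loop is typically much faster; in Python the effect is muted because lists hold pointers, but the access pattern is the same.

```python
# Two traversal orders over the same N x N matrix.
N = 512
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for row in m:              # consecutive elements of one row: good spatial locality
        for x in row:
            total += x
    return total

def sum_col_major(m):
    total = 0
    for j in range(N):         # one element from each row per step: poor locality
        for i in range(N):
            total += m[i][j]
    return total

# Both orders compute the same sum; only the memory access pattern differs.
assert sum_row_major(matrix) == sum_col_major(matrix)
```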
  17-22. Quantifying Locality
      - Instead of counting misses, compute a hit curve from the LRU histogram
      - Assume a perfect LRU cache; ignore compulsory misses
      - Example trace: 3 7 7 2 3 7 (the LRU stack is reordered on each access, most recent on top)
  23-24. Quantifying Locality (continued)
      - Start with total misses on the right-hand side, subtract histogram values
      - Cumulative hits by cache size: 1 1 3 3 3 3
      - Normalized: 0.33 0.33 1 1 1 1
  25-28. Hit Curve Exercise
      - Derive the hit curve for the trace: 3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10
      - Cumulative hits for cache sizes 1-9: 1 2 2 2 3 3 4 5 6
      - Normalized, the curve rises from 0% to 100% (all 6 re-accesses hit once the cache holds 9 items)
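The LRU-histogram method from the previous slides can be written down directly; this is a sketch under the slides' assumptions (perfect LRU cache, compulsory misses ignored). An access's reuse distance is its depth in the LRU stack, and a cache of size c hits exactly the accesses with reuse distance at most c.

```python
def reuse_distances(trace):
    """Return the LRU stack distance of each access; cold misses get infinity."""
    stack = []                           # most-recently used at the front
    dists = []
    for x in trace:
        if x in stack:
            dists.append(stack.index(x) + 1)   # 1-based depth in the LRU stack
            stack.remove(x)
        else:
            dists.append(float("inf"))         # compulsory (cold) miss
        stack.insert(0, x)
    return dists

def hit_curve(trace):
    """Cumulative hit counts, and hit fractions, per cache size 1..max distance."""
    dists = [d for d in reuse_distances(trace) if d != float("inf")]
    max_d = int(max(dists))
    hits = [sum(1 for d in dists if d <= c) for c in range(1, max_d + 1)]
    total = len(dists)                   # re-accesses only; compulsory misses ignored
    return hits, [h / total for h in hits]

print(hit_curve([3, 7, 7, 2, 3, 7])[0])   # [1, 1, 3], as on the earlier slides
```

Applied to the exercise trace, the cumulative counts come out to 1 2 2 2 3 3 4 5 6 for cache sizes 1 through 9, matching the solution above.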
  29. Important CPU Internals
      - Issues that affect performance:
        pipelining
        branches & prediction
        system calls (kernel crossings)
  30. Scalar architecture + memory
      - Straight-up sequential execution: fetch an instruction, decode it, execute it
      - Problem: an instruction or data miss in the cache
      - Result: a stall; everything stops
      - How long to wait for a miss that goes all the way to RAM?
  31. Superscalar architectures
      - Out-of-order processors: a pipeline of instructions in flight
      - Instead of stalling on a load, guess! Branch prediction, value prediction
      - Predictors are based on history and location in the program
      - Speculatively execute instructions; actual results are checked asynchronously
      - If mispredicted, squash the instructions
      - Accurate prediction = massive speedup: it hides the latency of the memory hierarchy
  32. Pipelining and Branches
      - Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased
      - Branches cause bubbles in the pipeline, where some stages are left idle
      [Diagram: pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) stalled behind an unresolved branch]
  33. Branch Prediction
      - A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path
      [Diagram: the same pipeline stages continuing with speculative execution past the branch]
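To make the history-based prediction mentioned above concrete, here is a sketch (an illustration, not the slides' own example) of the classic 2-bit saturating-counter predictor: because the counter must be wrong twice before its prediction flips, a single anomalous outcome, such as a loop exit, does not disturb it.

```python
def predict_accuracy(outcomes, counter=2):
    """Run one 2-bit saturating counter over a sequence of branch outcomes
    (True = taken). States 0-1 predict not-taken, 2-3 predict taken.
    Returns the fraction of branches predicted correctly."""
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Saturating update: move toward taken (3) or not-taken (0).
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)

loop = [True] * 9 + [False]       # a loop branch: taken 9 times, then exits
print(predict_accuracy(loop))     # 0.9: only the loop exit is mispredicted
```

A predictor this simple already does well on loop-heavy code; real processors combine many such counters indexed by branch address and history.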
  34. Kernel Mode
      - Protects the OS from users
      - kernel = English for nucleus (think atom)
      - Only privileged code executes in the kernel
      - System call: enter kernel mode; flush the pipeline and save context; execute code in kernel land; return to user mode, restoring the context of where we were in user land
  35. Timers & Interrupts
      - Need to respond to events periodically, e.g., to change executing processes
      - Quantum: time limit for process execution (fairness)
      - When the timer goes off, an interrupt fires: the current process stops, the OS takes control through the interrupt handler, and the scheduler chooses the next process
      - Interrupts also signal I/O events: network packet arrival, disk read complete...
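The periodic-timer idea has a user-level analogue that can be tried directly (a Unix-only sketch using POSIX interval timers, not the kernel's internal mechanism): ask the OS for a recurring timer, and it interrupts the process with SIGALRM, transferring control to a handler, much as the hardware timer interrupt transfers control to the OS scheduler.

```python
# Unix-only sketch: a recurring interval timer delivers SIGALRM;
# the handler plays the role of the OS's timer-interrupt handler.
import signal
import time

ticks = 0

def on_timer(signum, frame):
    # "Interrupt handler": runs each time the timer fires.
    global ticks
    ticks += 1

signal.signal(signal.SIGALRM, on_timer)
signal.setitimer(signal.ITIMER_REAL, 0.05, 0.05)   # fire every 50 ms

time.sleep(0.3)                                    # the "process" doing its work
signal.setitimer(signal.ITIMER_REAL, 0, 0)         # cancel the timer
print(ticks >= 1)                                  # the handler ran at least once
```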
  36. To do
      - Read C/C++ notes for next week
      - First homework assigned next week; language: C/C++; due in 2 weeks
  37. The End