Operating Systems - Architecture

    Operating Systems - Architecture - Presentation Transcript

    1. Operating Systems CMPSCI 377 Architecture Emery Berger University of Massachusetts Amherst UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
    2. Architecture. Hardware support for applications & OS: architecture basics & details, with a focus on the characteristics exposed to the application programmer and the OS.
    3. The Memory Hierarchy. Registers; caches; associativity; misses; locality.
    4. Registers. Register = dedicated name for a word of memory managed by the CPU. General-purpose: “AX”, “BX”, “CX” on x86. Special-purpose: “SP” = stack pointer, “FP” = frame pointer, “PC” = program counter. [Figure: stack holding arg0, arg1, arg2, with SP and FP pointing into it.]
    5. Registers. Register = dedicated name for one word of memory managed by the CPU. General-purpose: “AX”, “BX”, “CX” on x86. Special-purpose: “SP” = stack pointer, “FP” = frame pointer, “PC” = program counter. Changing processes: save the current registers & load the saved registers = context switch. [Figure: stack holding arg0 and arg1, with SP and FP pointing into it.]
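
The save-and-load step of a context switch can be sketched with a toy model. This is only an illustration: the `Process` record, the register values, and the dictionary standing in for the CPU are all hypothetical, and a real kernel does this in assembly.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    """Hypothetical process record holding a saved copy of the CPU registers."""
    name: str
    saved: dict = field(default_factory=lambda: {"AX": 0, "SP": 0, "FP": 0, "PC": 0})

def context_switch(cpu, current, nxt):
    """Save the running process's registers, then load the next process's."""
    current.saved = dict(cpu)   # save current registers
    cpu.clear()
    cpu.update(nxt.saved)       # load saved registers

# The dictionary below stands in for the physical register file.
cpu = {"AX": 7, "SP": 0x7FF0, "FP": 0x7FF8, "PC": 0x400}
a = Process("A")
b = Process("B", {"AX": 1, "SP": 0x5FF0, "FP": 0x5FF8, "PC": 0x900})
context_switch(cpu, a, b)       # A stops running, B resumes where it left off
```

After the call, `cpu` holds B's registers and `a.saved` holds what A needs in order to resume later.
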
    6. Caches. Access to main memory is “expensive”: ~100 cycles (slow, but relatively cheap). Caches: small, fast, expensive memory that holds recently accessed data (D$) or instructions (I$). Different sizes & locations: Level 1 (L1): on-chip, smallish; Level 2 (L2): on or next to chip, larger; Level 3 (L3): pretty large, on bus. A cache manages lines of memory (32-128 bytes).
    7. Memory Hierarchy. Higher = small, fast, more $, lower latency. Lower = large, slow, less $, higher latency. Levels from top to bottom, with load and evict arrows between them: registers, 1-cycle latency; L1 (D$, I$ separate), 2-cycle latency; L2 (D$, I$ unified), 7-cycle latency; RAM, 100-cycle latency; disk, 40,000,000-cycle latency; network, 200,000,000+ cycle latency.
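
To see how much the upper levels hide RAM latency, here is a rough expected-cost calculation using the L1, L2, and RAM latencies from the slide. The hit rates are made-up inputs, and the model (a miss pays the level's latency and then tries the next level) is a simplifying assumption rather than anything the slides specify.

```python
# Latencies in cycles, taken from the slide.
LATENCY = {"L1": 2, "L2": 7, "RAM": 100}

def average_access_cycles(l1_hit_rate, l2_hit_rate):
    """Expected cycles per memory access for an L1 -> L2 -> RAM lookup.

    Assumes (for illustration) that a miss at one level pays that level's
    latency and then tries the next level down.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return LATENCY["L1"] + l1_miss * (LATENCY["L2"] + l2_miss * LATENCY["RAM"])

warm = average_access_cycles(0.95, 0.80)  # hypothetical hit rates
cold = average_access_cycles(0.0, 0.0)    # every access goes all the way to RAM
```
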
    8. Cache Jargon. The cache is initially cold: the first access to data misses, so the line is fetched from a lower level of the hierarchy and brought into the cache (populating it). Next access: hit. Once the cache holds the most frequently used data, it is “warmed up”. Context switch implications?
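
One way to explore the context-switch question is a tiny simulation. The sketch below assumes a fully associative LRU cache of four lines and two hypothetical processes whose working sets each fill it; none of these numbers come from the slides.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny fully associative LRU cache that only counts hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[line] = True

def run(cache, lines):
    """Access each line once; return the number of misses in this pass."""
    before = cache.misses
    for line in lines:
        cache.access(line)
    return cache.misses - before

cache = LRUCache(capacity=4)
proc_a = ["a0", "a1", "a2", "a3"]   # hypothetical working set of process A
proc_b = ["b0", "b1", "b2", "b3"]   # hypothetical working set of process B

cold_misses = run(cache, proc_a)    # cold cache: every access misses
warm_misses = run(cache, proc_a)    # warmed up: every access hits
run(cache, proc_b)                  # context switch: B evicts A's lines
after_switch = run(cache, proc_a)   # A finds the cache cold again
```

In this model a context switch costs more than the register save and restore: the resumed process has to warm the cache up again.
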
    9. Cache Details. The ideal cache would be fully associative, that is, an LRU (least-recently used) queue, but this is generally too expensive. Instead, partition memory addresses and put them into separate bins divided into ways: 1-way (direct-mapped) = 1 entry per bin; 2-way = 2 entries per bin; 4-way = 4 entries per bin, etc.
    10. Associativity Example. Hash memory addresses to different indices in the cache.
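
A minimal sketch of such a hash, assuming a 64-byte line and 4 bins (both values are picked for illustration; the slides only say lines are 32-128 bytes). Real hardware typically uses a bit field of the address, which is equivalent to this arithmetic when the sizes are powers of two.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_BINS = 4     # number of bins (assumed)

def bin_index(address):
    """Map a byte address to a bin: drop the offset within the line, then
    take the line number modulo the number of bins."""
    line_number = address // LINE_SIZE
    return line_number % NUM_BINS

# Addresses in the same line share a bin; lines NUM_BINS apart collide.
same_line = (bin_index(0x1000), bin_index(0x103F))
collide = (bin_index(0x1000), bin_index(0x1000 + LINE_SIZE * NUM_BINS))
next_line = bin_index(0x1040)
```
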
    11. Miss Classification. First access = compulsory miss (unavoidable without prefetching). Too many items mapped to the same bin = conflict miss (avoidable with higher associativity). No space in cache = capacity miss (avoidable if the cache were larger). Invalidated = coherence miss (avoidable if the cache were unshared).
    12. Exercise. Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). How many misses? # compulsory misses? # conflict misses? # capacity misses?
    13. Solution. Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). Trace: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10. # compulsory misses: 10. # conflict misses? # capacity misses?
    14. Solution (continued). # compulsory misses: 10. # conflict misses: 2. # capacity misses?
    15. Solution (continued). # compulsory misses: 10. # conflict misses: 2. # capacity misses: 0.
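
The counts can be checked with a small simulator. Two assumptions are baked in and are not stated on the slides: "4 entries" is read as 4 bins of 2 ways each (the reading under which hash(x) = x % 4 makes sense), and a non-compulsory miss is called a capacity miss only if a fully associative LRU cache of the same total size would also miss, otherwise a conflict miss. Replacement within a bin is LRU.

```python
from collections import OrderedDict

def classify_misses(trace, num_bins=4, ways=2):
    """Count compulsory, conflict and capacity misses for an LRU cache."""
    bins = [OrderedDict() for _ in range(num_bins)]   # per-bin LRU order
    full = OrderedDict()                              # fully associative twin
    seen = set()
    counts = {"compulsory": 0, "conflict": 0, "capacity": 0}

    for x in trace:
        # Update the fully associative reference cache of the same size.
        full_hit = x in full
        if full_hit:
            full.move_to_end(x)
        else:
            if len(full) >= num_bins * ways:
                full.popitem(last=False)
            full[x] = True

        # Update the set-associative cache; hash(x) = x % num_bins.
        b = bins[x % num_bins]
        if x in b:
            b.move_to_end(x)
        else:
            if x not in seen:
                counts["compulsory"] += 1
            elif full_hit:
                counts["conflict"] += 1
            else:
                counts["capacity"] += 1
            if len(b) >= ways:
                b.popitem(last=False)     # evict least recently used
            b[x] = True
        seen.add(x)
    return counts

trace = [3, 7, 11, 2, 3, 7, 7, 9, 9, 6, 13, 7, 2, 5, 8, 10]
result = classify_misses(trace)
```

Under these assumptions the two conflict misses are the second accesses to 3 and 7: both map to bin 3, which 3, 7, and 11 compete for.
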
    16. Locality. Locality = re-use of recently used items. Temporal locality: re-use in time. Spatial locality: use of nearby items (in the same cache line, or the same page, a 4K chunk). Intuitively, greater locality = fewer misses. The number of misses depends on cache layout, number of levels, associativity… and is machine-specific.
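
Spatial locality can be made concrete by counting line misses for two traversal orders of the same array. The line size, element size, array shape, and the 8-line LRU cache below are all assumed values chosen to make the effect obvious.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
ELEM_SIZE = 8    # bytes per array element (assumed)

def line_misses(indices, cache_lines=8):
    """Count misses for element indices using a small LRU cache of lines."""
    cache = []                     # most recently used line is last
    misses = 0
    for i in indices:
        line = (i * ELEM_SIZE) // LINE_SIZE
        if line in cache:
            cache.remove(line)
        else:
            misses += 1
            if len(cache) >= cache_lines:
                cache.pop(0)       # evict least recently used line
        cache.append(line)
    return misses

ROWS, COLS = 64, 64                # hypothetical row-major 2-D array
row_major = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
col_major = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

sequential = line_misses(row_major)   # neighbours share a cache line
strided = line_misses(col_major)      # each access lands on a different line
```

Both loops touch the same 4096 elements, but in this model the row-major order misses once per line while the column-major order misses on every access.
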
    17. Quantifying Locality. Instead of counting misses, compute a hit curve from an LRU histogram. Assume a perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 7 3; histogram x-axis 1-6.]
    18. Quantifying Locality (continued). [Figure: trace 3 7 7 2 3 7; LRU queue 7 3; histogram x-axis 1-6.]
    19. Quantifying Locality (continued). [Figure: trace 3 7 7 2 3 7; LRU queue 2 7 3; histogram x-axis 1-6.]
    20. Quantifying Locality (continued). [Figure: trace 3 7 7 2 3 7; LRU queue 2 7 3; histogram x-axis 1-6.]
    21. Quantifying Locality (continued). [Figure: trace 3 7 7 2 3 7; LRU queue 3 2 7; histogram x-axis 1-6.]
    22. Quantifying Locality (continued). [Figure: trace 3 7 7 2 3 7; LRU queue 3 2 7; histogram x-axis 1-6.]
    23. Quantifying Locality. Instead of counting misses, compute a hit curve from an LRU histogram. Start with total misses on the right-hand side; subtract histogram values. [Figure: curve values 1 1 3 3 3 3 over x-axis 1-6.]
    24. Quantifying Locality. Instead of counting misses, compute a hit curve from an LRU histogram. Start with total misses on the right-hand side; subtract histogram values; normalize. [Figure: normalized curve values .3 .3 1 1 1 1; y-axis 0%, 33%, 67%, 100%; x-axis 1-5.]
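
The procedure can be sketched in a few lines. This version accumulates the histogram from the left instead of subtracting from the right, which yields the same curve; the function names are made up for illustration. For the trace 3 7 7 2 3 7 it produces the values shown in the figures (1 1 3 3 3 3, then one third, one third, and 1 after normalizing).

```python
def lru_histogram(trace):
    """Histogram of LRU queue positions (1 = most recent) at which hits occur.

    First accesses (compulsory misses) are ignored.
    """
    queue = []                          # most recently used item first
    hist = {}
    for x in trace:
        if x in queue:
            pos = queue.index(x) + 1
            hist[pos] = hist.get(pos, 0) + 1
            queue.remove(x)
        queue.insert(0, x)
    return hist

def hit_curve(trace, max_size):
    """Cumulative hits for LRU cache sizes 1..max_size, raw and normalized."""
    hist = lru_histogram(trace)
    total = sum(hist.values())
    hits, running = [], 0
    for size in range(1, max_size + 1):
        running += hist.get(size, 0)
        hits.append(running)
    normalized = [h / total for h in hits] if total else [0.0] * max_size
    return hits, normalized

hits, normalized = hit_curve([3, 7, 7, 2, 3, 7], max_size=6)
```

Reading the curve: a perfect LRU cache with room for 3 items already captures every non-compulsory access in this trace, so a larger cache would not help.
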
    25. Hit Curve Exercise. Derive the hit curve for the following trace: 3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10.
    26. Hit Curve Exercise (continued). [Figure: empty histogram with x-axis 1-9 above the trace.]
    27. Hit Curve Exercise (continued). [Figure: curve values 1 2 2 2 3 3 4 5 6 over x-axis 1-9, above the trace.]
    28. Hit Curve Exercise (continued). [Figure: curve values 1 2 2 2 3 3 4 5 6; y-axis 0%, 33%, 67%, 100%; x-axis 1-9.]
    29. Important CPU Internals. Issues that affect performance: pipelining; branches & prediction; system calls (kernel crossings).
    30. Scalar architecture + memory… Straight-up sequential execution: fetch instruction, decode it, execute it. Problem: an instruction or data miss in the cache. Result: a stall, where everything stops. How long is the wait for a miss that goes all the way to RAM?
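
A back-of-the-envelope answer, assuming one cycle per instruction when nothing misses and the ~100-cycle RAM latency quoted earlier in the deck. The instruction and miss counts are hypothetical.

```python
def cycles_with_stalls(instructions, misses, miss_penalty=100, base_cpi=1):
    """Total cycles when every cache miss stalls the whole processor."""
    return instructions * base_cpi + misses * miss_penalty

no_misses = cycles_with_stalls(1000, 0)
some_misses = cycles_with_stalls(1000, 20)   # hypothetical: 2% of instructions miss
slowdown = some_misses / no_misses
```

In this simple model, a 2% miss rate is enough to triple the running time.
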
    31. Superscalar architectures. Out-of-order processors keep a pipeline of instructions in flight. Instead of stalling on a load, guess! Branch prediction and value prediction, with predictors based on history and location in the program. Speculatively execute instructions; actual results are checked asynchronously; if mispredicted, squash the instructions. Accurate prediction = massive speedup: it hides the latency of the memory hierarchy.
    32. Pipelining and Branches. Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle. [Figure: pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) with an unresolved branch.]
    33. Branch Prediction. A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path. [Figure: pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) with speculative execution.]
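
The slides do not say which prediction scheme they have in mind. As one common textbook example of a history-based predictor, here is a sketch of a 2-bit saturating counter for a single branch; the loop pattern fed to it is hypothetical.

```python
def simulate_two_bit_predictor(outcomes, state=2):
    """Run a 2-bit saturating counter over branch outcomes (True = taken).

    States 0-1 predict not taken, states 2-3 predict taken. Returns the
    number of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# Hypothetical loop branch: taken 9 times, then falls through; repeated 3 times.
loop_branch = ([True] * 9 + [False]) * 3
correct = simulate_two_bit_predictor(loop_branch)
accuracy = correct / len(loop_branch)
```

The counter mispredicts only the loop exit each time, because a single not-taken outcome is not enough to flip its prediction.
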
    34. Kernel Mode. Protects the OS from users. Kernel = English for nucleus (think atom). Only privileged code executes in the kernel. A system call: enters kernel mode; flushes the pipeline and saves context; executes code in kernel land; returns to user mode, restoring context (where we were in user land).
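
From user code, a system call looks like an ordinary function call. The sketch below uses Python's `os` module, whose functions are thin wrappers over operating-system services; on a POSIX system these particular calls are expected to map onto the `pipe`, `write`, `read`, and `close` system calls. The mode switch and context save happen below this level and are not visible here.

```python
import os

# Each os.* call below asks the kernel to do work on the program's behalf.
r, w = os.pipe()                 # kernel creates a pipe, returns two descriptors
written = os.write(w, b"hello")  # enter the kernel to copy bytes into the pipe
data = os.read(r, 5)             # enter the kernel to copy bytes back out
os.close(r)
os.close(w)
```
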
    35. Timers & Interrupts. Need to respond to events periodically, for example to change the executing process. Quantum: time limit for process execution. Fairness: when the timer goes off, it raises an interrupt; the current process stops, the OS takes control through the interrupt handler, and the scheduler chooses the next process. Interrupts also signal I/O events: network packet arrival, disk read complete…
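
The timer-driven cycle can be sketched as a simulation. The slide does not name a scheduling policy; round-robin is used here as one simple choice, and the process names and running times are hypothetical.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate timer-driven scheduling.

    Returns the order in which processes get the CPU as (name, ticks_run)
    pairs.
    """
    ready = deque(burst_times.items())      # (name, remaining ticks)
    schedule = []
    while ready:
        name, remaining = ready.popleft()   # scheduler chooses next process
        ran = min(quantum, remaining)       # run until done or the timer fires
        schedule.append((name, ran))
        if remaining > ran:                 # timer interrupt: back of the queue
            ready.append((name, remaining - ran))
    return schedule

schedule = round_robin({"A": 5, "B": 2, "C": 4}, quantum=2)
```

With a quantum of 2, no process waits for more than the other two processes' quanta before running again, which is the fairness property the timer provides.
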
    36. To do. Read C/C++ notes for next week. First homework assigned next week. Language: C/C++. Will be due in 2 weeks.
    37. The End
