Operating Systems
         CMPSCI 377
         Architecture
                   Emery Berger
University of Massachusetts Amherst




UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
Architecture
    Hardware Support for Applications & OS
        Architecture basics & details
        Focus on characteristics exposed to application programmer / OS

The Memory Hierarchy
    Registers
    Caches
        Associativity
        Misses
    Locality

Registers
    Register = dedicated name for one word of memory managed by CPU
        General-purpose: “AX”, “BX”, “CX” on x86
        Special-purpose:
            “SP” = stack pointer
            “FP” = frame pointer
            “PC” = program counter
        [Figure: stack diagram showing arg0 and arg1, with SP and FP pointing into the stack]
    Change processes: save current registers & load saved registers = context switch

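The save-and-load step of a context switch can be sketched as follows. This is a toy model only: the register file is a Python dict, and the names `CPU`, `Process` and `context_switch` are hypothetical. A real kernel does this in a few lines of architecture-specific assembly.

```python
from dataclasses import dataclass, field

# Register names taken from the slide; a real x86 CPU has more.
REGISTER_NAMES = ("AX", "BX", "CX", "SP", "FP", "PC")


@dataclass
class Process:
    name: str
    # Copy of the registers kept while the process is not running.
    saved: dict = field(default_factory=lambda: dict.fromkeys(REGISTER_NAMES, 0))


@dataclass
class CPU:
    regs: dict = field(default_factory=lambda: dict.fromkeys(REGISTER_NAMES, 0))


def context_switch(cpu, current, nxt):
    """Save the running process's registers, then load the next one's."""
    current.saved = dict(cpu.regs)   # save current registers
    cpu.regs = dict(nxt.saved)       # load saved registers


cpu = CPU()
a, b = Process("A"), Process("B")
b.saved["PC"] = 0x400    # B was paused earlier at this (made-up) address
cpu.regs["PC"] = 0x100   # A is running here
cpu.regs["AX"] = 7

context_switch(cpu, a, b)            # A -> B
pc_while_b_runs = cpu.regs["PC"]
context_switch(cpu, b, a)            # B -> A: A resumes where it left off
```

After the second switch, A's registers are restored exactly, which is what lets a process be paused and resumed without noticing.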
Caches
    Access to main memory: “expensive”
        ~ 100 cycles (slow, relatively cheap)
    Caches: small, fast, expensive memory
        Hold recently-accessed data (D$) or instructions (I$)
        Different sizes & locations
            Level 1 (L1) – on-chip, smallish
            Level 2 (L2) – on or next to chip, larger
            Level 3 (L3) – pretty large, on bus
        Manage lines of memory (32-128 bytes)

Memory Hierarchy
    Higher = small, fast, more $, lower latency
    Lower = large, slow, less $, higher latency

        Level        Latency                       Notes
        registers    1-cycle latency
        L1           2-cycle latency               D$, I$ separate
        L2           7-cycle latency               D$, I$ unified
        RAM          100 cycle latency
        Disk         40,000,000 cycle latency
        Network      200,000,000+ cycle latency

    [Figure: arrows between levels labeled "load" and "evict"]

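To see why the upper levels matter, here is a rough average-access-time calculation using the L1, L2 and RAM latencies from the hierarchy above. The hit rates are illustrative assumptions, not measurements, and the function name `average_access_cycles` is hypothetical.

```python
def average_access_cycles(levels, memory_latency):
    """Average cycles per access for a chain of caches backed by RAM.

    levels: list of (latency_cycles, hit_rate) pairs, from L1 downward.
    Every access that reaches a level pays that level's latency.
    """
    total = 0.0
    reach = 1.0                       # fraction of accesses that get this far
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= (1.0 - hit_rate)     # only misses continue downward
    return total + reach * memory_latency


# Latencies from the slide; hit rates are assumed for illustration.
L1 = (2, 0.90)
L2 = (7, 0.80)
RAM_LATENCY = 100

amat = average_access_cycles([L1, L2], RAM_LATENCY)
no_cache = average_access_cycles([], RAM_LATENCY)
print(f"with caches: {amat:.1f} cycles, without: {no_cache:.1f} cycles")
```

Under these assumed hit rates only 2% of accesses go all the way to RAM, so the average stays close to the L1 latency.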
Cache Jargon
    Cache initially cold
    Accessing data initially misses
        Fetch from lower level in hierarchy
        Bring line into cache (populate cache)
        Next access: hit
    Once cache holds most-frequently used data: “warmed up”
    Context switch implications?

Cache Details
    Ideal cache would be fully associative
        That is, LRU (least-recently used) queue
        Generally too expensive
    Instead, partition memory addresses and put into separate bins divided into ways
        1-way or direct-mapped
        2-way = 2 entries per bin
        4-way = 4 entries per bin, etc.

Associativity Example
    Hash memory based on addresses to different indices in cache

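One common way to do this hashing is to drop the byte offset within the cache line and take the line number modulo the number of bins. The sketch below assumes a 64-byte line and 4 bins; both values, and the name `bin_index`, are chosen for illustration only.

```python
LINE_SIZE = 64   # bytes per cache line (assumed; the slides give 32-128)
NUM_BINS = 4     # number of bins in this toy cache (assumed)


def bin_index(address):
    """Map a byte address to the bin its cache line falls into."""
    line_number = address // LINE_SIZE   # drop the offset within the line
    return line_number % NUM_BINS        # simple modulus "hash"


addresses = [0, 8, 64, 128, 192, 256, 320]
bins = [bin_index(a) for a in addresses]
for a, b in zip(addresses, bins):
    print(f"address {a:4d} -> line {a // LINE_SIZE} -> bin {b}")
```

Addresses 0 and 8 share a line and therefore a bin, while addresses 0 and 256 are different lines that compete for the same bin.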
Miss Classification
    First access = compulsory miss
        Unavoidable without prefetching
    Too many items in bin = conflict miss
        Avoidable if we had higher associativity
    No space in cache = capacity miss
        Avoidable if cache were larger
    Invalidated = coherence miss
        Avoidable if cache were unshared

Exercise
    Cache with 4 entries (bins), 2-way associativity
        Assume hash(x) = x % 4 (modulus)
    How many misses?
        # compulsory misses?
        # conflict misses?
        # capacity misses?
    Trace: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10

Solution
    Cache with 4 entries (bins), 2-way associativity
        Assume hash(x) = x % 4 (modulus)
    How many misses?
        # compulsory misses?    10
        # conflict misses?      2
        # capacity misses?      0
    Trace: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10

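The counts above can be checked with a small simulation. The sketch assumes 4 bins of 2 ways each with LRU replacement inside each bin; the slides do not state the replacement policy, so that is an assumption. A repeat miss is counted as a conflict miss if a fully associative LRU cache of the same total size would have hit, and as a capacity miss otherwise. The function name `classify_misses` is hypothetical.

```python
from collections import OrderedDict


def classify_misses(trace, num_bins, ways):
    """Return (compulsory, conflict, capacity) miss counts for an LRU cache."""
    bins = [OrderedDict() for _ in range(num_bins)]   # one LRU list per bin
    full = OrderedDict()                              # fully associative reference
    total_entries = num_bins * ways
    seen = set()
    compulsory = conflict = capacity = 0

    for x in trace:
        # Reference cache: fully associative, same total number of entries.
        full_hit = x in full
        if full_hit:
            full.move_to_end(x)
        else:
            if len(full) == total_entries:
                full.popitem(last=False)              # evict least recently used
            full[x] = True

        # Actual cache: hash(x) = x % num_bins selects the bin.
        b = bins[x % num_bins]
        if x in b:
            b.move_to_end(x)                          # hit
            continue
        if len(b) == ways:
            b.popitem(last=False)                     # evict LRU entry in this bin
        b[x] = True

        if x not in seen:
            compulsory += 1
        elif full_hit:
            conflict += 1
        else:
            capacity += 1
        seen.add(x)

    return compulsory, conflict, capacity


trace = [3, 7, 11, 2, 3, 7, 7, 9, 9, 6, 13, 7, 2, 5, 8, 10]
result = classify_misses(trace, num_bins=4, ways=2)
print(result)
```

The two conflict misses are the second accesses to 3 and 7: both map to bin 3 along with 11, so three items compete for two ways.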
Locality
    Locality = re-use of recently-used items
        Temporal locality: re-use in time
        Spatial locality: use of nearby items
            In same cache line, same page (4K chunk)
    Intuitively – greater locality = fewer misses
        # misses depends on cache layout, # of levels, associativity…
        Machine-specific

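A small experiment illustrates spatial locality: walking a 2D array row by row touches neighboring addresses, while walking it column by column does not. The cache model here is a toy (fully associative, LRU, 16 lines of 64 bytes), and all sizes and names are assumptions chosen for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64     # bytes per cache line (assumed)
ELEM_SIZE = 8      # bytes per array element (assumed)
CACHE_LINES = 16   # toy fully associative LRU cache (assumed)
N = 64             # N x N array stored row by row


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in the toy cache."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)           # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) == CACHE_LINES:
                cache.popitem(last=False)     # evict least recently used line
            cache[line] = True
    return misses


def address(row, col):
    """Byte address of element (row, col) in row-major layout."""
    return (row * N + col) * ELEM_SIZE


row_major = [address(r, c) for r in range(N) for c in range(N)]
col_major = [address(r, c) for c in range(N) for r in range(N)]

row_misses = count_misses(row_major)
col_misses = count_misses(col_major)
print(f"row-major: {row_misses} misses, column-major: {col_misses} misses")
```

Both loops touch the same 4096 elements, but in this model the row-major order misses once per line while the column-major order misses on every access.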
Quantifying Locality
        Instead of counting misses,
    

        compute hit curve from LRU histogram
            Assume perfect LRU cache
        

                  Ignore compulsory misses
              



        3                7             7              2              3             7




7
3
                     1       2   3    4    5     6


            UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science       17
Quantifying Locality
    Instead of counting misses, compute hit curve from LRU histogram
        Start with total misses on right hand side
        Subtract histogram values
        Normalize

        Cache size:    1     2     3     4     5     6
        Hits:          1     1     3     3     3     3
        Normalized:    .3    .3    1     1     1     1

    [Figure: hit curve, y-axis from 0% to 100%, x-axis cache size]

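The histogram and hit curve can be computed with a short script. This is a minimal sketch under the slide's assumptions (perfect LRU, compulsory misses ignored); the function names are hypothetical, and it accumulates the histogram from the left, which yields the values shown above for this trace.

```python
def lru_histogram(trace):
    """Histogram of LRU stack depths; compulsory misses are ignored.

    histogram[d] = number of accesses that found their item at depth d
    (1 = most recently used) in a perfect LRU stack.
    """
    stack = []                       # most recently used item first
    histogram = {}
    for x in trace:
        if x in stack:
            depth = stack.index(x) + 1
            histogram[depth] = histogram.get(depth, 0) + 1
            stack.remove(x)
        stack.insert(0, x)           # x becomes the most recently used
    return histogram


def hit_curve(trace, max_size):
    """hits[s - 1] = hits a perfect LRU cache with s entries would get."""
    histogram = lru_histogram(trace)
    hits, running = [], 0
    for size in range(1, max_size + 1):
        running += histogram.get(size, 0)
        hits.append(running)
    return hits


trace = [3, 7, 7, 2, 3, 7]
hits = hit_curve(trace, 6)
normalized = [h / hits[-1] for h in hits]
print(hits)
print([round(v, 2) for v in normalized])
```

An access found at depth d hits in any LRU cache with at least d entries, which is why a running sum over the histogram gives the hit count for every cache size at once.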
Hit Curve Exercise
    Derive hit curve for following trace:
    3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10

Hit Curve Exercise
    Derive hit curve for following trace:
    3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10

        Cache size:    1    2    3    4    5    6    7    8    9
        Hits:          1    2    2    2    3    3    4    5    6

    [Figure: hit curve, y-axis from 0% to 100%, x-axis cache size 1 to 9]

Important CPU Internals
    Issues that affect performance
        Pipelining
        Branches & prediction
        System calls (kernel crossings)

Scalar architecture + memory…
    Straight-up sequential execution
        Fetch instruction
        Decode it
        Execute it
    Problem: instruction or data miss in cache
        Result – stall: everything stops
        How long to wait for miss all the way to RAM?

Superscalar architectures
    Out-of-order processors
        Pipeline of instructions in flight
        Instead of stalling on load, guess!
            Branch prediction
            Value prediction
                Predictors based on history, location in program
        Speculatively execute instructions
            Actual results checked asynchronously
            If mispredicted, squash instructions
    Accurate prediction = massive speedup
        Hides latency of memory hierarchy

Pipelining and Branches

Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle.

[Figure: pipeline diagram. Legend: instruction fetch, instruction decode, execute, memory access, write back, unresolved branch]

Branch Prediction

A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.

[Figure: pipeline diagram. Legend: instruction fetch, instruction decode, execute, memory access, write back, speculative execution]

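To make history-based prediction concrete, here is a sketch of a 2-bit saturating counter, a classic textbook scheme. The slides do not say which predictor a given processor uses, so treat this as one illustrative possibility; the function name and the loop-branch trace are assumptions.

```python
def simulate_two_bit_predictor(outcomes):
    """Return how many branch outcomes a 2-bit saturating counter predicts.

    Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
    outcomes: iterable of booleans, True = the branch was taken.
    """
    counter = 2                      # start at "weakly taken" (arbitrary choice)
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct


# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
correct = simulate_two_bit_predictor(loop_branch)
print(f"{correct}/{len(loop_branch)} predicted correctly")
```

The counter mispredicts only the loop exit each time, because a single not-taken outcome is not enough to flip its prediction.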
Kernel Mode
    Protects OS from users
        kernel = English for nucleus
            Think atom
    Only privileged code executes in kernel
    System call –
        Enters kernel mode
            Flushes pipeline, saves context
        Executes code in kernel land
        Returns to user mode, restoring context
            Where we are in user land

Timers & Interrupts
    Need to respond to events periodically
        Change executing processes
            Quantum – time limit for process execution
    Fairness – when timer goes off, interrupt
        Current process stops
        OS takes control through interrupt handler
        Scheduler chooses next process
    Interrupts also signal I/O events
        Network packet arrival, disk read complete…

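The timer-driven sequence above (process stops, OS takes control, scheduler chooses the next process) can be modeled as a round-robin queue. This is a simulation sketch, not kernel code: the process names, run times and quantum are made up, and round-robin is only one possible scheduling policy.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Simulate timer-driven scheduling; return the slices in the order run.

    burst_times: dict of process name -> total time units it needs.
    quantum: time units a process may run before the timer interrupt fires.
    """
    ready = deque(burst_times.items())
    timeline = []                                # (name, units run) per slice
    while ready:
        name, remaining = ready.popleft()        # scheduler chooses next process
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:                        # timer fired before it finished
            ready.append((name, remaining))      # back of the queue
    return timeline


timeline = round_robin({"A": 5, "B": 2, "C": 4}, quantum=2)
print(timeline)
```

No process runs for more than one quantum at a time, which is the fairness property the timer interrupt provides.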
To do
    Read C/C++ notes for next week
    First homework assigned next week
        Language: C/C++
        Will be due in 2 weeks

The End





More Related Content

Viewers also liked

2.2 Demonstrate the understanding of Programming Life Cycle
2.2 Demonstrate the understanding of Programming Life Cycle2.2 Demonstrate the understanding of Programming Life Cycle
2.2 Demonstrate the understanding of Programming Life CycleFrankie Jones
 
Computer function-and-interconnection 3
Computer function-and-interconnection 3Computer function-and-interconnection 3
Computer function-and-interconnection 3Mujaheed Sulantingan
 
Chapter 3 Computer Organization
Chapter 3 Computer OrganizationChapter 3 Computer Organization
Chapter 3 Computer OrganizationFrankie Jones
 
top level view of computer function and interconnection
top level view of computer function and interconnectiontop level view of computer function and interconnection
top level view of computer function and interconnectionSajid Marwat
 
Chapter 2 Boolean Algebra (part 2)
Chapter 2 Boolean Algebra (part 2)Chapter 2 Boolean Algebra (part 2)
Chapter 2 Boolean Algebra (part 2)Frankie Jones
 
Introduction to programming principles languages
Introduction to programming principles languagesIntroduction to programming principles languages
Introduction to programming principles languagesFrankie Jones
 
2.1 Understand problem solving concept
2.1 Understand problem solving concept2.1 Understand problem solving concept
2.1 Understand problem solving conceptFrankie Jones
 
Basic concepts of information technology and the internet
Basic concepts of information technology and the internetBasic concepts of information technology and the internet
Basic concepts of information technology and the internetFrankie Jones
 
Chapter 2 Data Representation on CPU (part 1)
Chapter 2 Data Representation on CPU (part 1)Chapter 2 Data Representation on CPU (part 1)
Chapter 2 Data Representation on CPU (part 1)Frankie Jones
 
2.3 Apply the different types of algorithm to solve problem
2.3 Apply the different types of algorithm to solve problem2.3 Apply the different types of algorithm to solve problem
2.3 Apply the different types of algorithm to solve problemFrankie Jones
 
Input Output - Computer Architecture
Input Output - Computer ArchitectureInput Output - Computer Architecture
Input Output - Computer ArchitectureMaruf Abdullah (Rion)
 
Chapter 1 computer hardware and flow of information
Chapter 1 computer hardware and flow of informationChapter 1 computer hardware and flow of information
Chapter 1 computer hardware and flow of informationFrankie Jones
 
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMINGChapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMINGFrankie Jones
 

Viewers also liked (14)

2.2 Demonstrate the understanding of Programming Life Cycle
2.2 Demonstrate the understanding of Programming Life Cycle2.2 Demonstrate the understanding of Programming Life Cycle
2.2 Demonstrate the understanding of Programming Life Cycle
 
Computer function-and-interconnection 3
Computer function-and-interconnection 3Computer function-and-interconnection 3
Computer function-and-interconnection 3
 
Chapter 3 Computer Organization
Chapter 3 Computer OrganizationChapter 3 Computer Organization
Chapter 3 Computer Organization
 
top level view of computer function and interconnection
top level view of computer function and interconnectiontop level view of computer function and interconnection
top level view of computer function and interconnection
 
Chapter 2 Boolean Algebra (part 2)
Chapter 2 Boolean Algebra (part 2)Chapter 2 Boolean Algebra (part 2)
Chapter 2 Boolean Algebra (part 2)
 
Introduction to programming principles languages
Introduction to programming principles languagesIntroduction to programming principles languages
Introduction to programming principles languages
 
2.1 Understand problem solving concept
2.1 Understand problem solving concept2.1 Understand problem solving concept
2.1 Understand problem solving concept
 
Basic concepts of information technology and the internet
Basic concepts of information technology and the internetBasic concepts of information technology and the internet
Basic concepts of information technology and the internet
 
Chapter 2 Data Representation on CPU (part 1)
Chapter 2 Data Representation on CPU (part 1)Chapter 2 Data Representation on CPU (part 1)
Chapter 2 Data Representation on CPU (part 1)
 
2.3 Apply the different types of algorithm to solve problem
2.3 Apply the different types of algorithm to solve problem2.3 Apply the different types of algorithm to solve problem
2.3 Apply the different types of algorithm to solve problem
 
Bus interconnection
Bus interconnectionBus interconnection
Bus interconnection
 
Input Output - Computer Architecture
Input Output - Computer ArchitectureInput Output - Computer Architecture
Input Output - Computer Architecture
 
Chapter 1 computer hardware and flow of information
Chapter 1 computer hardware and flow of informationChapter 1 computer hardware and flow of information
Chapter 1 computer hardware and flow of information
 
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMINGChapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
 

Similar to Operating Systems - Architecture

Operating Systems - Virtual Memory
Operating Systems - Virtual MemoryOperating Systems - Virtual Memory
Operating Systems - Virtual MemoryEmery Berger
 
Memory Management for High-Performance Applications
Memory Management for High-Performance ApplicationsMemory Management for High-Performance Applications
Memory Management for High-Performance ApplicationsEmery Berger
 
Processes and Threads
Processes and ThreadsProcesses and Threads
Processes and ThreadsEmery Berger
 
Virtual Memory and Paging
Virtual Memory and PagingVirtual Memory and Paging
Virtual Memory and PagingEmery Berger
 
javascript teach
javascript teachjavascript teach
javascript teachguest3732fa
 
JSBootcamp_White
JSBootcamp_WhiteJSBootcamp_White
JSBootcamp_Whiteguest3732fa
 
Reconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationReconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationEmery Berger
 
Operating Systems - Distributed Parallel Computing
Operating Systems - Distributed Parallel ComputingOperating Systems - Distributed Parallel Computing
Operating Systems - Distributed Parallel ComputingEmery Berger
 
A Re-Introduction to JavaScript
A Re-Introduction to JavaScriptA Re-Introduction to JavaScript
A Re-Introduction to JavaScriptSimon Willison
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
 
Rubish- A Quixotic Shell
Rubish- A Quixotic ShellRubish- A Quixotic Shell
Rubish- A Quixotic Shellguest3464d2
 
Operating Systems - Advanced File Systems
Operating Systems - Advanced File SystemsOperating Systems - Advanced File Systems
Operating Systems - Advanced File SystemsEmery Berger
 
Lightweight Grids With Terracotta
Lightweight Grids With TerracottaLightweight Grids With Terracotta
Lightweight Grids With TerracottaPT.JUG
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
Make Your Life Easier With Maatkit
Make Your Life Easier With MaatkitMake Your Life Easier With Maatkit
Make Your Life Easier With MaatkitMySQLConference
 
Garbage Collection without Paging
Garbage Collection without PagingGarbage Collection without Paging
Garbage Collection without PagingEmery Berger
 
Google_A_Behind_the_Scenes_Tour_-_Jeff_Dean
Google_A_Behind_the_Scenes_Tour_-_Jeff_DeanGoogle_A_Behind_the_Scenes_Tour_-_Jeff_Dean
Google_A_Behind_the_Scenes_Tour_-_Jeff_DeanHiroshi Ono
 

Similar to Operating Systems - Architecture (20)

Operating Systems - Virtual Memory
Operating Systems - Virtual MemoryOperating Systems - Virtual Memory
Operating Systems - Virtual Memory
 
Memory Management for High-Performance Applications
Memory Management for High-Performance ApplicationsMemory Management for High-Performance Applications
Memory Management for High-Performance Applications
 
Processes and Threads
Processes and ThreadsProcesses and Threads
Processes and Threads
 
Virtual Memory and Paging
Virtual Memory and PagingVirtual Memory and Paging
Virtual Memory and Paging
 
javascript teach
javascript teachjavascript teach
javascript teach
 
JSBootcamp_White
JSBootcamp_WhiteJSBootcamp_White
JSBootcamp_White
 
Reconsidering Custom Memory Allocation
Reconsidering Custom Memory AllocationReconsidering Custom Memory Allocation
Reconsidering Custom Memory Allocation
 
Gpu Join Presentation
Gpu Join PresentationGpu Join Presentation
Gpu Join Presentation
 
Operating Systems - Distributed Parallel Computing
Operating Systems - Distributed Parallel ComputingOperating Systems - Distributed Parallel Computing
Operating Systems - Distributed Parallel Computing
 
Mapreduce Pact06 Keynote
Mapreduce Pact06 KeynoteMapreduce Pact06 Keynote
Mapreduce Pact06 Keynote
 
A Re-Introduction to JavaScript
A Re-Introduction to JavaScriptA Re-Introduction to JavaScript
A Re-Introduction to JavaScript
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Rubish- A Quixotic Shell
Rubish- A Quixotic ShellRubish- A Quixotic Shell
Rubish- A Quixotic Shell
 
Operating Systems - Advanced File Systems
Operating Systems - Advanced File SystemsOperating Systems - Advanced File Systems
Operating Systems - Advanced File Systems
 
Lightweight Grids With Terracotta
Lightweight Grids With TerracottaLightweight Grids With Terracotta
Lightweight Grids With Terracotta
 
sysprog3 Part2
sysprog3 Part2sysprog3 Part2
sysprog3 Part2
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
Make Your Life Easier With Maatkit
Make Your Life Easier With MaatkitMake Your Life Easier With Maatkit
Make Your Life Easier With Maatkit
 
Garbage Collection without Paging
Garbage Collection without PagingGarbage Collection without Paging
Garbage Collection without Paging
 
Google_A_Behind_the_Scenes_Tour_-_Jeff_Dean
Google_A_Behind_the_Scenes_Tour_-_Jeff_DeanGoogle_A_Behind_the_Scenes_Tour_-_Jeff_Dean
Google_A_Behind_the_Scenes_Tour_-_Jeff_Dean
 

More from Emery Berger

Doppio: Breaking the Browser Language Barrier
Doppio: Breaking the Browser Language BarrierDoppio: Breaking the Browser Language Barrier
Doppio: Breaking the Browser Language BarrierEmery Berger
 
Dthreads: Efficient Deterministic Multithreading
Dthreads: Efficient Deterministic MultithreadingDthreads: Efficient Deterministic Multithreading
Dthreads: Efficient Deterministic MultithreadingEmery Berger
 
Programming with People
Programming with PeopleProgramming with People
Programming with PeopleEmery Berger
 
Stabilizer: Statistically Sound Performance Evaluation
Stabilizer: Statistically Sound Performance EvaluationStabilizer: Statistically Sound Performance Evaluation
Stabilizer: Statistically Sound Performance EvaluationEmery Berger
 
DieHarder (CCS 2010, WOOT 2011)
DieHarder (CCS 2010, WOOT 2011)DieHarder (CCS 2010, WOOT 2011)
DieHarder (CCS 2010, WOOT 2011)Emery Berger
 
Operating Systems - File Systems
Operating Systems - File SystemsOperating Systems - File Systems
Operating Systems - File SystemsEmery Berger
 
Operating Systems - Advanced Synchronization
Operating Systems - Advanced SynchronizationOperating Systems - Advanced Synchronization
Operating Systems - Advanced SynchronizationEmery Berger
 
Operating Systems - Synchronization
Operating Systems - SynchronizationOperating Systems - Synchronization
Operating Systems - SynchronizationEmery Berger
 
MC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained EnvironmentsMC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained EnvironmentsEmery Berger
 
Vam: A Locality-Improving Dynamic Memory Allocator
Vam: A Locality-Improving Dynamic Memory AllocatorVam: A Locality-Improving Dynamic Memory Allocator
Vam: A Locality-Improving Dynamic Memory AllocatorEmery Berger
 
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management
Quantifying the Performance of Garbage Collection vs. Explicit Memory ManagementQuantifying the Performance of Garbage Collection vs. Explicit Memory Management
Quantifying the Performance of Garbage Collection vs. Explicit Memory ManagementEmery Berger
 
DieHard: Probabilistic Memory Safety for Unsafe Languages
DieHard: Probabilistic Memory Safety for Unsafe LanguagesDieHard: Probabilistic Memory Safety for Unsafe Languages
DieHard: Probabilistic Memory Safety for Unsafe LanguagesEmery Berger
 
Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf ...
Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf ...Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf ...
Exploiting Multicore CPUs Now: Scalability and Reliability for Off-the-shelf ...Emery Berger
 
Composing High-Performance Memory Allocators with Heap Layers
Composing High-Performance Memory Allocators with Heap LayersComposing High-Performance Memory Allocators with Heap Layers
Composing High-Performance Memory Allocators with Heap LayersEmery Berger
 
CRAMM: Virtual Memory Support for Garbage-Collected Applications
CRAMM: Virtual Memory Support for Garbage-Collected ApplicationsCRAMM: Virtual Memory Support for Garbage-Collected Applications
CRAMM: Virtual Memory Support for Garbage-Collected ApplicationsEmery Berger
 




Operating Systems - Architecture

  • 1. Operating Systems CMPSCI 377 Architecture Emery Berger University of Massachusetts Amherst UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
  • 2. Architecture: Hardware Support for Applications & OS. Architecture basics & details. Focus on characteristics exposed to application programmer / OS.
  • 3. The Memory Hierarchy: Registers. Caches (associativity, misses). Locality.
  • 4. Registers: Register = dedicated name for word of memory managed by CPU. General-purpose: “AX”, “BX”, “CX” on x86. Special-purpose: “SP” = stack pointer, “FP” = frame pointer, “PC” = program counter. [Figure: stack diagram with arg0, arg1, arg2 and SP and FP markers]
  • 5. Registers: Register = dedicated name for one word of memory managed by CPU. General-purpose: “AX”, “BX”, “CX” on x86. Special-purpose: “SP” = stack pointer, “FP” = frame pointer, “PC” = program counter. Change processes: save current registers & load saved registers = context switch. [Figure: stack diagram with arg0, arg1 and SP and FP markers]
  • 6. Caches: Access to main memory is “expensive”: ~100 cycles (slow, relatively cheap). Caches: small, fast, expensive memory. Hold recently-accessed data (D$) or instructions (I$). Different sizes & locations: Level 1 (L1): on-chip, smallish; Level 2 (L2): on or next to chip, larger; Level 3 (L3): pretty large, on bus. Manage lines of memory (32-128 bytes).
  • 7. Memory Hierarchy: Higher = small, fast, more $, lower latency. Lower = large, slow, less $, higher latency. Levels from top to bottom: registers (1-cycle latency); L1, with D$ and I$ separate (2-cycle latency); L2, with D$ and I$ unified (7-cycle latency); RAM (100-cycle latency); disk (40,000,000-cycle latency); network (200,000,000+ cycle latency). [Figure arrows between levels: load, evict]
  • 8. Cache Jargon: Cache initially cold. Accessing data initially misses: fetch from lower level in hierarchy, bring line into cache (populate cache). Next access: hit. Once cache holds most-frequently used data: “warmed up”. Context switch implications?
  • 9. Cache Details: Ideal cache would be fully associative, that is, an LRU (least-recently used) queue. Generally too expensive. Instead, partition memory addresses and put into separate bins divided into ways: 1-way or direct-mapped; 2-way = 2 entries per bin; 4-way = 4 entries per bin, etc.
  • 10. Associativity Example: Hash memory addresses to different indices in the cache.
  • 11. Miss Classification: First access = compulsory miss (unavoidable without prefetching). Too many items in way = conflict miss (avoidable if we had higher associativity). No space in cache = capacity miss (avoidable if cache were larger). Invalidated = coherence miss (avoidable if cache were unshared).
  • 12. Exercise: Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). How many misses? # compulsory misses? # conflict misses? # capacity misses?
  • 13. Solution: Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). How many misses? # compulsory misses: 10. # conflict misses? # capacity misses? [Figure, as extracted: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10]
  • 14. Solution: Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). How many misses? # compulsory misses: 10. # conflict misses: 2. # capacity misses? [Figure, as extracted: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10]
  • 15. Solution: Cache with 4 entries, 2-way associativity. Assume hash(x) = x % 4 (modulus). How many misses? # compulsory misses: 10. # conflict misses: 2. # capacity misses: 0. [Figure, as extracted: 3 7 11 2 3 7 7 9 9 6 13 7 2 5 8 10]
  • 16. Locality: Locality = re-use of recently-used items. Temporal locality: re-use in time. Spatial locality: use of nearby items, in same cache line or same page (4K chunk). Intuitively, greater locality = fewer misses. # misses depends on cache layout, # of levels, associativity… Machine-specific.
  • 17. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 7 3; histogram positions 1-6]
  • 18. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 7 3; histogram positions 1-6]
  • 19. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 2 7 3; histogram positions 1-6]
  • 20. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 2 7 3; histogram positions 1-6]
  • 21. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 3 2 7; histogram positions 1-6]
  • 22. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Assume perfect LRU cache. Ignore compulsory misses. [Figure: trace 3 7 7 2 3 7; LRU queue 3 2 7; histogram positions 1-6]
  • 23. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Start with total misses on right hand side. Subtract histogram values. [Figure: values 1 1 3 3 3 3 at positions 1-6]
  • 24. Quantifying Locality: Instead of counting misses, compute hit curve from LRU histogram. Start with total misses on right hand side. Subtract histogram values. Normalize. [Figure: hit curve with values .3 .3 1 1 1 1; y-axis 0%, 33%, 67%, 100%; x-axis 1-5]
  • 25. Hit Curve Exercise: Derive hit curve for following trace: 3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10
  • 26. Hit Curve Exercise: Derive hit curve for following trace: 3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10. [Figure: histogram positions 1-9]
  • 27. Hit Curve Exercise: Derive hit curve for following trace: 3 5 4 2 8 3 6 9 9 6 13 7 2 5 8 10. [Figure: values 1 2 2 2 3 3 4 5 6 at positions 1-9]
  • 28. Hit Curve Exercise: Derive hit curve for following trace. [Figure: hit curve with values 1 2 2 2 3 3 4 5 6 at positions 1-9; y-axis 0%, 33%, 67%, 100%]
  • 29. Important CPU Internals: Issues that affect performance: pipelining; branches & prediction; system calls (kernel crossings).
  • 30. Scalar architecture + memory…: Straight-up sequential execution: fetch instruction, decode it, execute it. Problem: instruction or data miss in cache. Result: stall, everything stops. How long to wait for miss all the way to RAM?
  • 31. Superscalar architectures: Out-of-order processors. Pipeline of instructions in flight. Instead of stalling on load, guess! Branch prediction, value prediction. Predictors based on history, location in program. Speculatively execute instructions. Actual results checked asynchronously. If mispredicted, squash instructions. Accurate prediction = massive speedup. Hides latency of memory hierarchy.
  • 32. Pipelining and Branches: Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle. [Figure: pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) with an unresolved branch]
  • 33. Branch Prediction: A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path. [Figure: pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) with speculative execution]
  • 34. Kernel Mode: Protects OS from users. Kernel = English for nucleus (think atom). Only privileged code executes in kernel. System call: enters kernel mode; flushes pipeline, saves context; executes code in kernel land; returns to user mode, restoring context. Context = where we are in user land.
  • 35. Timers & Interrupts: Need to respond to events periodically. Change executing processes. Quantum: time limit for process execution. Fairness: when timer goes off, interrupt. Current process stops. OS takes control through interrupt handler. Scheduler chooses next process. Interrupts also signal I/O events: network packet arrival, disk read complete…
  • 36. To do: Read C/C++ notes for next week. First homework assigned next week. Language: C/C++. Will be due in 2 weeks.
  • 37. The End
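
The LRU-histogram procedure from the Quantifying Locality and Hit Curve Exercise slides can be sketched in Python. This is a minimal illustration, not code from the course: the names lru_histogram and hit_curve and the max_size parameter are assumptions introduced here. The sketch models a perfect LRU queue, ignores compulsory misses, records the queue depth at which each re-access is found, and accumulates those counts into a hit curve.

```python
from collections import Counter


def lru_histogram(trace):
    """Simulate a perfect LRU queue over the trace.

    Returns (histogram, compulsory), where histogram[d] is the number of
    re-accesses found at 1-indexed depth d of the queue, and compulsory is
    the number of first-time accesses (hypothetical helper, not from the slides).
    """
    queue = []              # most recently used item first
    histogram = Counter()
    compulsory = 0
    for item in trace:
        if item in queue:
            depth = queue.index(item) + 1   # 1 = most recently used
            histogram[depth] += 1
            queue.remove(item)
        else:
            compulsory += 1                 # first access: ignored in the curve
        queue.insert(0, item)
    return histogram, compulsory


def hit_curve(trace, max_size):
    """Cumulative hits for LRU caches holding 1..max_size items."""
    histogram, _ = lru_histogram(trace)
    hits, total = [], 0
    for size in range(1, max_size + 1):
        total += histogram[size]
        hits.append(total)
    return hits


def normalize(hits):
    """Scale a hit curve by the hits at the largest size (assumed nonzero)."""
    return [h / hits[-1] for h in hits]


small_trace = [3, 7, 7, 2, 3, 7]
exercise_trace = [3, 5, 4, 2, 8, 3, 6, 9, 9, 6, 13, 7, 2, 5, 8, 10]

print(hit_curve(small_trace, 6))        # [1, 1, 3, 3, 3, 3]
print(hit_curve(exercise_trace, 9))     # [1, 2, 2, 2, 3, 3, 4, 5, 6]
print(lru_histogram(exercise_trace)[1]) # 10 first-time accesses
```

Dividing each value by the last one gives the normalized curve: for the short trace, a one-entry or two-entry LRU cache captures one third of the non-compulsory accesses and a three-entry cache captures all of them. A Python list keeps the queue simple at the cost of linear-time lookups, which is fine for traces this small.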