2. Characteristics of memory system
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
2
5. Capacity
• Word size
• The natural unit of organisation
• The size of the word is typically equal to the number of bits used to represent an
integer and to the instruction length.
• Number of words
• or Bytes
6. Unit of Transfer
• For main memory, this is the number of bits read out of or written
into memory at a time.
• Internal
• Usually governed by data bus width
• External
• Usually a block which is much larger than a word
• Addressable unit
• Smallest location which can be uniquely addressed
• Word internally
• Cluster on M$ disks
7. Access Methods (1)
• Sequential
• Start at the beginning and read through in order
• Access time depends on location of data and previous location
• e.g. tape
• Direct
• Individual blocks have unique address
• Access is by jumping to vicinity plus sequential search
• Access time depends on location and previous location
• e.g. disk
8. Access Methods (2)
• Random
• Individual addresses identify locations exactly
• Access time is independent of location or previous access
• e.g. RAM
• Associative
• Data is located by a comparison with contents of a portion of the store
• Access time is independent of location or previous access
• e.g. cache
9. Memory Hierarchy
• Registers
• In CPU
• Internal or Main memory
• May include one or more levels of cache
• “RAM”
• External memory
• Backing store
11. Performance
• Access time
• Time between presenting the address and getting the valid data
• Memory Cycle time
• Time may be required for the memory to “recover” before next access
• Cycle time is access + recovery
• Transfer Rate
• Rate at which data can be moved
17. So you want fast?
• It is possible to build a computer which uses only static RAM (see
later)
• This would be very fast
• This would need no cache
• How can you cache cache?
• This would cost a very large amount
18. Locality of Reference
• During the course of the execution of a program, memory references
tend to cluster
• e.g. loops
22. Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is in each
cache slot
28. Cache Mapping
• we discussed three cache mapping functions, i.e., methods of
addressing to locate data within a cache.
• Direct
• Full Associative
• Set Associative
28
29. Cache Mapping
• RAM is divided into blocks of memory locations.
• In other words, memory locations are grouped into blocks of 2n
locations
• where n represents the number of bits used to identify a word within
a block.
29
30. Mapping Function
• Cache of 64kByte
• Cache block of 4 bytes
• i.e. cache is 16k (214) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
• (224=16M)
32. Direct Mapping
• Each block of main memory maps to only one cache line
• i.e. if a block is in cache, it must be in one specific place
• Address is in two parts
• Least Significant w bits identify unique word
• Most Significant s bits specify one memory block
• The MSBs are split into a cache line field r and a tag of s-r (most
significant)
33. Direct Mapping
Address Structure
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
• 8 bit tag (=22-14)
• 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
Tag s-r Line or Slot r Word w
8 14 2
35. Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2s+w words or bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2s+ w/2w = 2s
• Number of lines in cache = m = 2r
• Size of tag = (s – r) bits
36. Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
• If a program accesses 2 blocks that map to the same line repeatedly, cache
misses are very high
38. Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
40. Tag 22 bit
Word
2 bit
Associative Mapping Address Structure
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which 16 bit word is
required from 32 bit data block
• e.g.
• Address Tag Data Cache line
• FFFFFC FFFFFC 24682468 3FFF
41. Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2s+w words or bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2s+ w/2w = 2s
• Number of lines in cache = undetermined
• Size of tag = s bits
42. Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
• e.g. Block B can be in any line of set i
• e.g. 2 lines per set
• 2 way associative mapping
• A given block can be in one of 2 lines in only one set
44. Set Associative Mapping
Address Structure
• Use set field to determine cache set to look in
• Compare tag field to see if we have a hit
Tag 9 bit Set 13 bit
Word
2 bit
46. Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2s+w words or bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2d
• Number of lines in set = k
• Number of sets = v = 2d
• Number of lines in cache = kv = k * 2d
• Size of tag = (s – d) bits
48. Replacement Algorithms (2)
Associative & Set Associative
• Hardware implemented algorithm (speed)
• Least Recently used (LRU)
• e.g. in 2 way set associative
• Which of the 2 block is lru?
• First in first out (FIFO)
• replace block that has been in cache longest
• Least frequently used
• replace block which has had fewest hits
• Random
49. Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page replacement algorithm to select a victim frame
(of that process)
- Write victim frame to disk
3. Bring the desired page into the (newly) free frame; update the page and frame tables
4. Continue the process by restarting the instruction that caused the trap.
• If the page in not in memory, first reference to that page will trap to operating system:
page fault
52. Evaluation
Evaluate algorithm by running it on a particular string of memory
references (reference string) and computing the number of page faults on
that string
String is just page numbers, not full addresses
Repeated access to the same page does not cause a page fault
Trace the memory reference of a process
0100, 0432, 0101, 0612, 0102, 0104, 0101, 0611, 0102
Page size 100B
Reference string 1, 4, 1, 6, 1, 6
In all our examples, the reference string is
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
53. Associates a time with each frame when the page was brought into memory
When a must be replaced, the oldest one is chosen
Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
First-In-First-Out (FIFO) Algorithm
Limitation:
A variable is initialized early and constantly used
54. FIFO Page Replacement
• How to track ages of pages?
• Just use a FIFO queue to hold all the pages in memory
• Replace the page at the head
• Insert at tail
3 frames (3 pages can be in memory at a time per process)
15 page faults
55. Optimal Algorithm
• Replace page that will not be used for longest period of time.
• Optimal Page Replacement algorithm used future reference.
• How do you know this?
• Can’t read the future
• Used for measuring how well your algorithm performs
57. Least Recently Used (LRU) Algorithm
Use past knowledge rather than future
Past is the proxy of future
Replace page that has not been used in the most of the time
Associate time of last use with each page
12 faults – better than FIFO but worse than OPT
Generally good algorithm and frequently used
But how to implement?
60. Input/Output Problems
• Wide variety of peripherals
• Delivering different amounts of data
• At different speeds
• In different formats
• All slower than CPU and RAM
• Need I/O modules
65. Cont..
• The interface to the I/O module is in the form of control, data,
and status signals.
• Control signals determine the function that the device will
perform, such as send data to the I/O module (INPUT or
READ), accept data from the I/O module (OUTPUT or WRITE),
report status, or perform some control function particular to
the device (e.g., position a disk head).
• Data are in the form of a set of bits to be sent to or received
from the I/O module.
• Status signals indicate the state of the device. Examples are
READY/NOT-READY to show whether the device is ready for
data transfer.
65
66. Cont..
• The transducer converts data from electrical to other
forms of energy during output and from other forms
to electrical during input.
• Typically, a buffer is associated with the transducer to
temporarily hold data being transferred between the
I/O module and the external environment
66
67. I/O Module Function
• Control & Timing
• CPU Communication
• Device Communication
• Data Buffering
• Error Detection
68. I/O Steps
• Data transfer between external devices to Processor.
• The processor interrogates the I/O module to check the status
of the attached device.
• The I/O module returns the device status.
• If the device is operational and ready to transmit, the
processor requests the transfer of data, by means of a
command to the I/O module.
• The I/O module obtains a unit of data (e.g., 8 or 16 bits) from
the external device.
• The data are transferred from the I/O module to the
processor.
70. Cont..
• The module connects to the rest of the computer through a
set of signal lines (e.g., system bus lines).
• Data transferred to and from the module are buffered in one
or more data registers.
• There may also be one or more status registers that provide
current status information. A status register may also function
as a control register, to accept detailed control information
from the processor.
• The logic within the module interacts with the processor via a
set of control
70
71. Cont..
• The processor uses the control lines to issue commands to the
I/O module.
• The module must also be able to recognize and generate
addresses associated with the devices it controls.
• Each I/O module has a unique address or, if it controls more
than one external device, a unique set of addresses.
• Finally, the I/O module contains logic specific to the interface
with each device that it controls.
71