Cache memory is a small, fast memory located close to the processor that stores frequently accessed data. There are three main types of cache mapping: direct mapped, set associative, and fully associative. A cache hit occurs when the requested data is in the cache, while a cache miss requires accessing slower main memory. Virtual memory uses main memory as a cache for secondary storage through address translation. Translation lookaside buffers cache recent virtual-to-physical address translations to improve performance. Interrupts allow I/O devices to signal the processor asynchronously, with interrupt service routines executing in response.
1. Cache Memory:
Cache – A safe place for storing or hiding things, like a desk
that holds the books you are using from the library
The simplest way to assign a location of main memory data in
the cache is to assign the cache location based on the address
of the word in memory. This cache structure is called direct
mapped, since each memory location is mapped directly to
exactly one location in the cache.
For example, almost all direct-mapped caches use this
mapping to find a block:
(Block address) modulo (Number of blocks in the cache)
2. Cache Memory:
If the number of entries in the cache is a power of 2, then the
modulo can be computed simply by using the low-order
log2(cache size in blocks) bits of the address.
Thus, an 8-block cache uses the three lowest bits (8 = 2^3) of
the block address.
Similarly, a 4-block cache uses the two lowest bits as the
block address, i.e. 00, 01, 10 and 11
3. Cache Memory:
(Block address) modulo (Number of blocks in the cache)
For example, the memory addresses between 1 (00001 in
binary) and 29 (11101 in binary) map to locations 1 (001) and
5 (101) in a direct-mapped cache of eight words.
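The mapping rule above can be sketched in a few lines of Python (the function names here are illustrative): for a power-of-two cache size, the modulo is just the low-order bits of the block address.

```python
# Direct-mapped placement: (block address) modulo (number of cache blocks).
def cache_index(block_address, num_blocks):
    """Cache location of a memory block in a direct-mapped cache."""
    return block_address % num_blocks

# Equivalent bit-mask form when num_blocks is a power of two:
# keep only the low-order log2(num_blocks) bits of the block address.
def cache_index_bits(block_address, num_blocks):
    return block_address & (num_blocks - 1)

# The slide's example: addresses 1 (00001) and 29 (11101), 8-block cache.
print(cache_index(1, 8))        # 1 (binary 001)
print(cache_index(29, 8))       # 5 (binary 101)
print(cache_index_bits(29, 8))  # 5, same answer via the low-order bits
```

Both forms agree because `num_blocks - 1` is a mask of exactly log2(num_blocks) one-bits.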
4. Cache Memory:
Tag - A field in a table used for a memory hierarchy that
contains the address information required to identify whether
the associated block in the hierarchy corresponds to a
requested word.
5. Cache Memory:
Valid Bit – A field in the tables of a memory hierarchy that
indicates whether the associated block in the hierarchy
contains valid data: 1 if valid data is present, 0 otherwise
6. Cache Memory:
Accessing a cache:
Cache Hit – The requested data is present in the cache
Cache Miss – The requested data is not present in the cache
and must be fetched from the slower main memory
Initially the cache is empty, so any requested data is not yet
in the cache and every access is a cache miss
9. Cache Memory:
The memory address is divided into
Tag field – compared with the tag field stored in the cache
Cache index – used to select the block in the cache
10. Cache Memory:
32-bit addresses
The cache size is 2^n blocks, so n bits are used for the index
The block size is 2^m words (2^(m+2) bytes), so m bits are used
for the word within the block, and two bits are used for the
byte part of the address
The size of the tag field is therefore 32 − (n + m + 2)
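This field breakdown can be sketched as follows (the function and field names are illustrative); the tag width is whatever remains of the 32-bit address after the index, word-offset, and byte-offset bits:

```python
# Address-field widths for a 32-bit byte address, a cache of 2**n blocks,
# and blocks of 2**m words (2**(m+2) bytes). Field names are illustrative.
def field_sizes(n, m):
    return {
        "byte_offset": 2,         # selects a byte within a 4-byte word
        "word_offset": m,         # selects a word within the block
        "index": n,               # selects one of the 2**n cache blocks
        "tag": 32 - (n + m + 2),  # the remaining upper address bits
    }

# e.g. 1024 one-word blocks: n = 10, m = 0, so the tag is 20 bits.
print(field_sizes(10, 0))
```

The four field widths always sum to the full 32-bit address width.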
11. Cache Memory:
When the CPU tries to read from memory, the address will be
sent to a cache controller.
•The lowest k bits of the address will index a block in the
cache.
•If the block is valid and the tag matches the upper (m - k)
bits of the m-bit address, then that data will be sent to the
CPU.
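These lookup steps can be sketched as a tiny simulation (class and method names are illustrative; one-word blocks are assumed for simplicity):

```python
# A tiny direct-mapped cache with one-word blocks. Each entry holds
# (valid, tag, data); the index is the low bits of the block address,
# the tag is the remaining upper bits.
class DirectMappedCache:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.entries = [(False, None, None)] * num_blocks

    def read(self, block_address):
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        valid, stored_tag, data = self.entries[index]
        if valid and stored_tag == tag:
            return ("hit", data)          # valid entry with matching tag
        return ("miss", None)             # must fetch from main memory

    def fill(self, block_address, data):
        index = block_address % self.num_blocks
        tag = block_address // self.num_blocks
        self.entries[index] = (True, tag, data)

cache = DirectMappedCache(8)
print(cache.read(29))        # miss: the cache starts empty
cache.fill(29, "block-29")
print(cache.read(29))        # hit: valid bit set, tags match
print(cache.read(21))        # miss: same index (5) but a different tag
```

Blocks 21 and 29 both map to index 5, which is exactly the contention that set-associative caches (later slides) are designed to reduce.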
13. Cache Memory:
Handling Cache Misses:
Steps to be taken on an instruction cache miss:
1. Send the original PC value (current PC – 4) to the memory.
2. Instruct main memory to perform a read and wait for the
memory to complete its access.
3. Write the cache entry, putting the data from memory in the
data portion of the entry, writing the upper bits of the address
(from the ALU) into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will
refetch the instruction, this time finding it in the cache.
14. Cache Memory:
Handling Writes:
The cache and main memory are said to be inconsistent if
they contain different data for the same location, e.g. a store
instruction may write new data to a cache block without that
data being updated in main memory
Write-through:
A scheme in which writes always update both the cache and
the next lower level of the memory hierarchy, ensuring that
data is always consistent between the two.
15. Cache Memory:
Handling Writes:
With the write-through scheme, every write into the cache
must also be written to main memory, which decreases
performance by consuming extra processor clock cycles.
One solution is a write buffer:
16. Cache Memory:
Handling Writes:
Write buffer:
• A queue that holds data while the data is waiting to be
written to memory.
• After writing the data into the cache and into the write
buffer, the processor can continue execution.
• When a write to main memory completes, the entry in the
write buffer is freed.
• If the write buffer is full when the processor reaches a
write, the processor must stall until there is an empty
position in the write buffer.
17. Cache Memory:
Handling Writes:
Write-back:
An alternative to a write-through scheme is a scheme called
write-back.
A scheme that handles writes by updating values only to the
block in the cache, then writing the modified block to the
lower level of the hierarchy when the block is replaced.
Advantage: performance improves because modified data is
written to main memory only on replacement, not on every store
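A toy comparison of the two write policies (the counter class is illustrative and not timing-accurate) shows why write-back reduces memory traffic when the same block is written repeatedly:

```python
# Toy write-policy counters (illustrative, not timing-accurate).
# Write-through: every store also writes main memory.
# Write-back: a store only marks the cached block dirty; main memory
# is written once, when the dirty block is replaced.
class WriteCounter:
    def __init__(self, policy):
        self.policy = policy
        self.memory_writes = 0
        self.dirty = False

    def store(self):
        if self.policy == "write-through":
            self.memory_writes += 1   # cache and memory both updated
        else:
            self.dirty = True         # write-back: mark the block modified

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.memory_writes += 1   # write the modified block out once
            self.dirty = False

for policy in ("write-through", "write-back"):
    c = WriteCounter(policy)
    for _ in range(100):              # 100 stores to the same block
        c.store()
    c.evict()
    print(policy, c.memory_writes)    # 100 memory writes vs. just 1
```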
18. Cache Memory:
Split Cache:
A scheme in which a level of a memory hierarchy is composed
of two independent caches that operate in parallel with each
other, with one handling instructions and one handling data
19. Measuring and Improving Cache Performance:
Ways to improve cache performance:
Reducing the miss rate by reducing the probability that two
different memory blocks will contend for the same cache
location.
Reducing the miss penalty by adding an additional level to the
hierarchy (multilevel caching)
20. Measuring and Improving Cache Performance:
CPU time can be divided into
the clock cycles that the CPU spends executing the program
the clock cycles that the CPU spends waiting for the memory
system.
CPU time = (CPU execution clock cycles + Memory-stall clock
cycles) * Clock cycle time
21. Measuring and Improving Cache Performance:
Memory-stall clock cycles can be defined as the sum of the
stall cycles coming from reads plus those coming from writes
Memory-stall clock cycles = (Read-stall cycles + Write-stall
cycles)
The read-stall cycles can be defined in terms of the number of
read accesses per program, the read miss rate, and the miss
penalty in clock cycles for a read:
Read-stall cycles = (Reads/Program) × Read miss rate × Read miss penalty
22. Measuring and Improving Cache Performance:
Write-stall cycles:
For a write-through scheme, we have two sources of stalls:
write misses, which usually require that we fetch the block
before continuing the write and
write buffer stalls, which occur when the write buffer is full
when a write occurs.
23. Measuring and Improving Cache Performance:
In most write-through cache organizations, the read and write
miss penalties are the same (the time to fetch the block from
memory).
If we assume that the write buffer stalls are negligible, we can
combine the reads and writes by using a single miss rate and
miss penalty:
Memory-stall clock cycles = (Memory accesses/Program) × Miss rate × Miss penalty
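The equations on the preceding slides can be put together in a small worked example (all the numbers below are assumed for illustration):

```python
# CPU time = (execution cycles + memory-stall cycles) * clock cycle time
# Memory-stall cycles = memory accesses * miss rate * miss penalty
instructions = 1_000_000
cpi_base = 2.0              # CPI assuming a perfect cache (assumed value)
accesses_per_instr = 1.36   # instruction fetch + data accesses (assumed)
miss_rate = 0.04
miss_penalty = 100          # clock cycles per miss
cycle_time = 0.5e-9         # seconds per cycle (a 2 GHz clock)

exec_cycles = instructions * cpi_base
stall_cycles = instructions * accesses_per_instr * miss_rate * miss_penalty
cpu_time = (exec_cycles + stall_cycles) * cycle_time

print(stall_cycles)   # 5,440,000 stall cycles
print(cpu_time)       # stalls dwarf the 2,000,000 execution cycles
```

Even a modest 4% miss rate with a 100-cycle penalty makes memory stalls dominate execution time, which is the motivation for the miss-reduction schemes that follow.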
24. Schemes for Reducing Cache Misses:
Cache misses can be reduced by the choice of block placement
scheme:
• Direct Mapped Cache
• Set Associative Cache
• Fully Associative Cache
27. Set Associative Cache
A cache that has a fixed number of locations (at least two)
where each block can be placed.
A set-associative cache with n locations for a block is called
an n-way set-associative cache.
28. Set Associative Cache
•2-way set associative – each set contains 2 blocks
•4-way set associative – each set contains 4 blocks
•8-way set associative – each set contains 8 blocks
•E.g. in a 2-way set-associative cache with 4 blocks, there are
two sets (set 0 and set 1) of two blocks each, for a total of 4
blocks.
Set = Block No. in MM % No. of sets
The least recently used block in the set is replaced when needed
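The set-mapping rule and LRU replacement described above can be sketched as follows (class names are illustrative; an OrderedDict per set tracks recency of use):

```python
# n-way set-associative placement with LRU replacement.
# Set = (block number in main memory) % (number of sets)
from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; its order records recency of use.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block):
        s = self.sets[block % self.num_sets]
        if block in s:
            s.move_to_end(block)      # refresh: now most recently used
            return "hit"
        if len(s) >= self.ways:
            s.popitem(last=False)     # evict the least recently used block
        s[block] = True
        return "miss"

# The slide's example: 2-way, 4 blocks total, i.e. two sets of two.
cache = SetAssociativeCache(num_sets=2, ways=2)
print(cache.access(0))   # miss (cold)
print(cache.access(2))   # miss: block 2 also maps to set 0
print(cache.access(0))   # hit, and block 0 becomes most recently used
print(cache.access(4))   # miss: set 0 is full, so LRU block 2 is evicted
print(cache.access(2))   # miss: block 2 was just replaced
```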
30. Fully Associative
•A block can be placed in any free location in the cache
memory
The least recently used block is replaced when needed
31. Schemes for Reducing Cache Misses:
Multilevel Caches:
Two miss rates
Global miss rate – The fraction of references that miss in all
levels of a multilevel cache.
Local miss rate – The fraction of references to one level of a
cache that miss; used in multilevel hierarchies.
32. Virtual Memory:
A technique that uses main memory as a “cache” for
secondary storage.
Physical address - An address in main memory.
Program’s own address space - A separate range of memory
locations accessible only to the program
Virtual memory implements the translation of a program’s
address space to physical addresses.
33. Virtual Memory:
Need for Virtual Memory:
To allow efficient and safe sharing of memory among multiple
programs, such as for the memory needed by multiple virtual
machines for cloud computing
to remove the programming burden of a small, limited
amount of main memory, i.e. when the executing program is
larger than main memory.
Overlays – a program larger than main memory is divided
into pieces, and the pieces that are mutually exclusive are
identified so that they can occupy the same memory in turn
34. Virtual Memory:
Protection:
A set of mechanisms for ensuring that multiple processes
sharing the processor, memory, or I/O devices cannot
interfere, intentionally or unintentionally, with one another
by reading or writing each other’s data.
These mechanisms also isolate the operating system from a
user process.
35. Virtual Memory:
Page – A block in the virtual memory
Page Fault – An event that occurs when the requested page is
not present in main memory, i.e. a miss in virtual memory
Virtual Address - An address that corresponds to a location in
virtual space and is translated by address mapping to a
physical address when memory is accessed.
Address mapping or address translation – Translation of
virtual address into physical address by mapping the pages of
virtual memory into main memory i.e. a virtual address is
mapped to an address used to access memory
36. Virtual Memory:
Relocation – A technique that maps the virtual addresses
used by a program to different physical addresses before the
addresses are used to access memory.
37. Virtual Memory:
The virtual address is broken into
•A virtual page number and
•A page offset
The page fault frequency can be reduced by optimizing page
replacement: since placement is fully flexible, any page can be
replaced when a page fault occurs.
Clever and flexible replacement schemes reduce the page
fault rate
38. Virtual Memory:
Translation of Virtual Page Number to Physical Page Number:
The physical page number forms the upper portion of the
physical address and the page offset the lower portion. The
number of bits in the page offset determines the page size
39. Virtual Memory:
Page Table:
Used by the virtual memory system to hold the
virtual-to-physical address translations.
The table, which is stored in memory, is typically indexed by
the virtual page number
Each entry in the table contains the physical page number for
that virtual page if the page is currently in memory.
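A minimal sketch of a page-table lookup, assuming 4 KiB pages and a Python dict standing in for the table (all names and numbers are illustrative):

```python
# Page-table translation sketch: the virtual page number indexes the
# table; a resident entry supplies the physical page number. The page
# size and table contents below are assumed for illustration.
PAGE_OFFSET_BITS = 12                       # 4 KiB pages

def translate(page_table, virtual_address):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise LookupError("page fault: page not in main memory")
    ppn = page_table[vpn]                   # physical page number
    return (ppn << PAGE_OFFSET_BITS) | offset

page_table = {0: 5, 1: 9}                   # VPN -> PPN for resident pages
print(hex(translate(page_table, 0x1234)))   # VPN 1 -> PPN 9, so 0x9234
```

Note that the page offset passes through unchanged; only the upper bits of the address are translated.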
42. Virtual Memory:
Swap Space:
The space on the disk reserved for the full virtual memory
space of a process.
Reference bit / Use bit:
A field that is set whenever a page is accessed and that is
used to implement LRU or other replacement schemes.
Used to replace the pages in virtual memory
43. Virtual Memory:
Dirty Bit:
Used to track whether a page has been written since it was
read into memory
It is set when any word in the page is written
If the operating system chooses to replace the page, the dirty
bit indicates whether the page needs to be written out before
its location in memory can be given to another page.
Hence, a modified page is often called a dirty page.
44. Translation-Lookaside Buffer:
Since the page tables are stored in main memory, every
memory access logically takes twice as long: one access to
obtain the translation and another to access the data.
Instead of accessing the page table on every reference, a
special cache that keeps track of recently used translations is
maintained, called the Translation-Lookaside Buffer (TLB),
otherwise called a translation cache
e.g. a piece of paper to record the location of set of books in
the catalog rather than searching the entire catalog.
45. Translation-Lookaside Buffer:
Locality of reference:
Locality of reference states that, instead of loading the entire
process into main memory, the OS can load only those pages
that are frequently accessed by the CPU, along with only the
page table entries corresponding to those pages.
The TLB follows the concept of locality of reference: it
contains only the entries of the pages that are frequently
accessed by the CPU.
48. Translation-Lookaside Buffer:
If the probability of a TLB hit is P (the TLB hit rate), then the
probability of a TLB miss (the TLB miss rate) is (1 − P).
Therefore, the effective access time can be defined as:
EAT = P(t + m) + (1 − P)(t + k·m + m)
where P → TLB hit rate,
t → time taken to access the TLB,
m → time taken to access main memory,
k = 1 if single-level paging has been implemented.
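The EAT formula can be checked with a quick calculation (the timing values below are assumed for illustration):

```python
# EAT = P(t + m) + (1 - P)(t + k*m + m), with k = 1 for single-level paging.
def effective_access_time(P, t, m, k=1):
    return P * (t + m) + (1 - P) * (t + k * m + m)

# Assumed values: 90% TLB hit rate, 20 ns TLB access, 100 ns memory access.
print(effective_access_time(0.9, 20, 100))   # about 130 ns
```

A hit costs one TLB probe plus one memory access (120 ns); a miss adds an extra memory access for the page-table walk (220 ns), so the average lands close to the hit cost when P is high.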
49. Interrupts:
While executing operations, if an I/O device is not ready, the
processor often wastes time repeatedly checking whether the
device is ready and waiting if it is not.
To eliminate this continuous checking and waiting, an I/O
device can send a signal to the processor indicating that it is
ready; this signal is called an interrupt.
Instead of waiting for a device to become ready, the processor
can execute other operations.
50. Interrupts:
e.g. consider a task of computing calculations and displaying
the results once every ten seconds.
The task involves two routines, namely COMPUTE and
DISPLAY.
COMPUTE – performs the calculations
DISPLAY – displays the results every ten seconds.
A timer counts the seconds. The processor normally executes
the COMPUTE routine; upon receiving an interrupt from the
timer it executes the DISPLAY routine and then resumes
COMPUTE.
51. Interrupts:
Interrupt Service Routine (ISR):
The routine that is executed immediately in response to an
interrupt request is called interrupt service routine.
52. Interrupts:
When an interrupt is received while executing instruction i,
the PC value (i + 1) is immediately stored in temporary storage.
After servicing the interrupt, program execution resumes
from the PC value in the temporary storage.
An interrupt sent by an I/O device to the processor is
acknowledged by the processor by means of an Interrupt
Acknowledge signal.
53. Interrupts:
There may be two kinds of interrupt handling:
one that saves all the register contents before transferring
control to the interrupt service routine, and another that does
not save the contents.
An alternative approach is to provide a duplicate set of
processor registers used by the interrupt service routine,
which eliminates the need to save and restore register values.
The duplicate registers are called shadow registers.
54. Interrupts:
The processor has a status register (PS) providing information
about the current state of operation.
Interrupt Enable (IE) of the status register is used for enabling
/ disabling interrupts.
When IE = 1, the processor accepts and processes interrupts
from any I/O device
When IE = 0, the processor simply ignores interrupts and
continues with normal execution.
57. Handling interrupts from multiple devices:
Solution:
When a device raises an interrupt request, it sets a bit called
IRQ to 1
The first device found with IRQ = 1 is serviced first
Polling – all interrupts are serviced by branching to the same
service program. This program then checks each device to see
whether it is the one generating the interrupt. The order of
checking is determined by the priority that has been set: the
device with the highest priority is checked first, and the
remaining devices are checked in descending order of priority.
An alternative scheme is to use vectored interrupts.
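The priority-polling order can be sketched as follows (the device names and the IRQ-bit representation are illustrative):

```python
# Priority polling: the common service program checks each device's
# IRQ bit in priority order and services the first one found set.
def poll_and_service(devices):
    """devices: (name, irq_bit) pairs, highest priority first."""
    for name, irq in devices:
        if irq:
            return name          # service the highest-priority requester
    return None                  # no device is requesting an interrupt

# Assumed priority order: disk > network > keyboard.
devices = [("disk", False), ("network", True), ("keyboard", True)]
print(poll_and_service(devices))   # network: first IRQ bit found set
```

The keyboard's pending request is deferred until the next poll, which is the cost polling pays for its simplicity; vectored interrupts (next slide) avoid the scan entirely.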
58. Interrupts:
Vectored Interrupts:
Each interrupt-generating device has its own interrupt service
routine to service its interrupts.
Interrupt vectors – hold the addresses of the interrupt service
routines.