Contents
1. Introduction
2. Placement of Caches
3. Cache Addressing
4. Working Principles of Cache Memory
5. Levels of Cache Memory
6. Cache Mapping
7. Methods of Cache Mapping
   7.1 Direct cache mapping
   7.2 Associative cache mapping
   7.3 Set associative cache mapping
8. Interaction Policies with Main Memory
9. Conclusion
1. Introduction
What is Cache Memory?
Cache memory is a small, volatile computer memory that provides high-speed data access to a
processor and stores frequently used programs, applications, and data. Being volatile, it retains
data only while the computer is powered on. Cache memory is the fastest memory in a
computer. It is typically embedded directly on the processor chip or integrated close to the
processor and main random access memory (RAM).
Cache memory, also called CPU memory, is high-speed static random access memory (SRAM)
that a microprocessor can access more quickly than it can access regular random access
memory (RAM). This memory is typically integrated directly into the CPU chip or placed on a
separate chip that has a separate bus interconnect with the CPU. The purpose of cache memory
is to store the program instructions and data that are used repeatedly in the operation of
programs, or information that the CPU is likely to need next. The processor can access this
information quickly from the cache rather than having to fetch it from the computer's main
memory. Fast access to these instructions and data increases the overall speed of the program.
2. Placement of Cache Memory
A cache is a memory that holds data recently used by the processor. A block of memory cannot
be placed arbitrarily in the cache; where it may go is restricted by the "placement policy". In
other words, the placement policy determines where a particular memory block can be placed
when it is brought into the cache.
Caches in modern systems are located on the same chip as the cores. Typically, a system has
three levels of cache. The first level is the L1, which sits closest to each core; it can be accessed
at least once per cycle and responds to a read hit within a handful of cycles. L1 caches are
generally split between instruction (L1-I) and data (L1-D) accesses, and are usually 64 kB or
less.
When a read or write address is not found in the L1, a miss request is made to the L2 cache,
which is either private to a core or shared by a cluster of cores. L2 caches often hold both
instructions and data, but are sometimes split so that the L2-D is private to a core while the
L2-I is shared between a cluster of cores. This cache generally has a hit latency under 15 cycles
(sometimes significantly under). When an access misses in the L2, the L3 cache is accessed. L2
caches are usually around 256 kB.
The L3 cache is usually shared by all cores on a processor and is much larger than the aggregate
capacity of all the L1 and L2 caches; some L3 caches reach 48 MB. The access latency of the L3
can be a few tens of cycles. When a request misses in the L3, main memory is accessed, with
hundreds of cycles of latency. Avoiding that latency is the whole purpose of caches.
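This latency gap can be observed from software. The following is a minimal C sketch; the pointer-chasing technique, the working-set sizes, and the iteration count are illustrative assumptions rather than anything taken from this text. As the working set outgrows each cache level, the measured time per load jumps.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per dependent load over a shuffled pointer chain.
   The random cycle defeats hardware prefetching, so each load pays the
   latency of whichever level of the hierarchy holds the working set. */
static double chase(size_t n, long iters) {
    size_t *next = malloc(n * sizeof *next);
    if (!next) return 0.0;
    for (size_t i = 0; i < n; i++) next[i] = i;
    /* Sattolo's algorithm: shuffle into one random cycle over all elements */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    volatile size_t idx = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < iters; k++) idx = next[idx];  /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}

int main(void) {
    /* Working sets chosen to straddle a typical L1 (~32 kB), L2 (~256 kB),
       L3 (a few MB) and main memory. */
    size_t kib[] = {16, 128, 1024, 8192, 65536};
    for (int i = 0; i < 5; i++)
        printf("%6zu KiB: %5.1f ns per load\n", kib[i],
               chase(kib[i] * 1024 / sizeof(size_t), 20000000L));
    return 0;
}

On typical hardware the printed latency rises in steps as the array spills out of L1, then L2, then L3, matching the cycle counts described above.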
3. Cache Addressing
Logical cache (virtual cache)
A logical cache stores data using virtual addresses, so the cache can be accessed before the
MMU performs address translation, which makes access faster. Because different applications
use the same virtual address space, however, such a cache generally must be flushed on a
context switch.
Physical cache
Physical caches use physical addresses and do not need flushing on a context switch, so data is
preserved within the cache. The disadvantage is that all accesses must go through the memory
management unit, incurring delays. Particular care must also be exercised when pages are
swapped to and from disk: if the processor does not invalidate the associated cache entries, the
cache contents will differ from the main memory contents by virtue of the new page that has
been swapped in.
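As a rough illustration of that invalidation step, here is a small C sketch under assumed structures (a physically addressed cache modeled as a flat array of lines): every line whose address falls inside the swapped page has its valid bit cleared.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 1024

typedef struct {
    bool     valid;
    uint64_t addr;     /* physical address of the cached block */
} CacheLine;

/* Clear the valid bit of every line caching data from the page being
   replaced, so stale contents cannot mask the newly swapped-in page. */
static void invalidate_page(CacheLine lines[], uint64_t base, uint64_t size) {
    for (int i = 0; i < NUM_LINES; i++)
        if (lines[i].valid && lines[i].addr >= base && lines[i].addr < base + size)
            lines[i].valid = false;
}

int main(void) {
    static CacheLine lines[NUM_LINES];
    lines[3] = (CacheLine){ true, 0x2040 };       /* falls inside the page */
    invalidate_page(lines, 0x2000, 4096);
    printf("line 3 valid after swap: %d\n", lines[3].valid);   /* prints 0 */
    return 0;
}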
4. Working Principles of Cache Memory
When an application starts, or data is to be read or written, the data and instructions
associated with the operation are moved from a slow storage device (a magnetic hard disk, an
optical drive, etc.) to a faster one. This faster device is RAM (random access memory),
specifically DRAM (dynamic RAM). RAM sits here because it can supply data and instructions to
the processor at a far faster rate than slow storage devices; in effect it serves as a cache for the
storage devices. But although RAM is much faster than storage, the processor runs faster still,
and RAM cannot deliver data and instructions at the rate the processor needs. So a device
faster than RAM is needed to keep pace with the processor, and the required data is therefore
moved to the next, faster level of memory, known as cache memory. Cache is also a type of
RAM, but it is static RAM (SRAM). SRAM is faster and costlier than DRAM because each bit is
stored in a flip-flop (six transistors), whereas DRAM stores each bit as charge on one transistor
and a capacitor. Moreover, SRAM need not be refreshed periodically (because of its bistable
latching circuitry), unlike DRAM, which also makes it faster.
Whenever the processor needs to perform an action or execute an instruction, it first checks its
registers. If the required instruction or data is not present there, it looks in the first level of
cache memory (L1); if the data is not there either, it goes on to the second and then the third
level of cache. Whenever the data needed by the processor is not found in a cache, it is called a
CACHE MISS, and the resulting delay slows execution. If the data is found in cache memory, it is
called a CACHE HIT. If the data is not found in any level of cache, the processor checks RAM,
and if that also fails, it goes to the slower storage device.
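The lookup order just described can be summarized in code. Below is a minimal C sketch; the level names, latencies, and stub probe functions are illustrative assumptions, not real hardware values.

#include <stdbool.h>
#include <stdio.h>

/* One level of the memory hierarchy: a probe that reports a hit, plus an
   illustrative access cost in cycles. */
typedef struct {
    const char *name;
    bool (*lookup)(unsigned addr);   /* true on hit */
    int  cost;                       /* assumed latency in cycles */
} Level;

/* Probe each level in turn; every miss adds that level's latency before
   falling through -- which is exactly why a CACHE MISS slows execution. */
static long memory_access(Level lv[], int n, unsigned addr) {
    long cycles = 0;
    for (int i = 0; i < n; i++) {
        cycles += lv[i].cost;
        if (lv[i].lookup(addr)) {
            printf("hit in %s after %ld cycles\n", lv[i].name, cycles);
            return cycles;
        }
    }
    return cycles;                   /* found nowhere: would be a fault */
}

static bool miss(unsigned a) { (void)a; return false; }
static bool hit(unsigned a)  { (void)a; return true;  }

int main(void) {
    Level lv[] = {
        {"registers", miss, 1}, {"L1", miss, 4},   {"L2", miss, 12},
        {"L3", hit, 40},        {"RAM", hit, 200}, {"disk", hit, 1000000},
    };
    memory_access(lv, 6, 0x1234);    /* misses registers/L1/L2, hits in L3 */
    return 0;
}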
5. Levels of Cache Memory
CPU cache is divided into three main levels: L1, L2, and L3. The hierarchy is ordered by speed,
and consequently by the size of each cache.
L1 (Level 1) cache is the fastest memory present in a computer system. In terms of access
priority, the L1 cache holds the data the CPU is most likely to need while completing a given
task. As far as size goes, the L1 cache typically goes up to 256 KB, though some powerful CPUs
are now taking it close to 1 MB, and some server chips (like Intel's top-end Xeon CPUs) carry
between 1 and 2 MB of L1 cache.
L1 cache is also usually split two ways, into an instruction cache and a data cache. The
instruction cache holds the instructions for the operations the CPU has to perform, while the
data cache holds the data on which those operations are performed.
L2 (Level 2) cache is slower than L1, but bigger. Its size typically ranges from 256 KB to 8 MB,
although newer, powerful CPUs tend to go past that. The L2 cache holds data that is likely to be
accessed by the CPU next. In most modern CPUs, the L1 and L2 caches are located on the CPU
cores themselves, with each core getting its own cache.
L3 (Level 3) cache is the largest cache memory unit, and also the slowest. It can range from
4 MB to upwards of 50 MB. Modern CPUs reserve dedicated space on the die for the L3 cache,
and it takes up a large chunk of that space.
6. Cache Mapping
Cache mapping is the method by which the contents of main memory are brought into
the cache and referenced by the CPU. The mapping method used directly affects the
performance of the entire computer system.
Cache Mapping Function
Mapping functions decide which main memory block occupies which line of the cache. Since
there are fewer cache lines than main memory blocks, an algorithm is needed to make this
decision. In the simplest scheme, one block from main memory maps into only one possible
line of cache memory.
Why is cache mapping needed?
The cache holds frequently requested data and instructions so that they are immediately
available to the CPU when needed, reducing the average time to access data from main
memory. Because the cache is a smaller, faster memory that stores copies of data from
frequently used main memory locations, a rule is needed for deciding where each copy may
live.
7. Methods of Cache Mapping
The three types of mapping used for cache memory are:
1) Direct mapping
2) Associative mapping
3) Set associative mapping
7.1 Direct Mapping
Direct mapping assigns each memory block to one specific line in the cache. If that line is
already occupied when a new block needs to be loaded, the old block is evicted. An address is
split into two parts: an index field and a tag field. Each location in RAM has one specific place
in the cache where its data will be held. Consider the cache to be like an array: part of the
address is used as an index into the cache to identify where the data will be held. Since a data
block from RAM can occupy only one specific line in the cache, it must always replace whatever
block was already there, so no replacement algorithm is needed.
Direct mapping is the simplest approach: each main memory address maps to exactly one
cache block. For example, take a 16-byte main memory and a 4-byte cache. Memory locations
0, 4, 8 and 12 all map to cache block 0; addresses 1, 5, 9 and 13 map to cache block 1; and so
on.
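That example can be verified with a few lines of C (a sketch; the block size of one byte matches the example above):

#include <stdio.h>

int main(void) {
    const int CACHE_BLOCKS = 4;             /* 4-byte cache, 1-byte blocks */
    for (int addr = 0; addr < 16; addr++)   /* 16-byte main memory */
        printf("address %2d -> cache block %d (tag %d)\n",
               addr, addr % CACHE_BLOCKS, addr / CACHE_BLOCKS);
    return 0;
}
/* Addresses 0, 4, 8 and 12 all land in cache block 0; 1, 5, 9 and 13 in
   block 1, and so on. The stored tag records which of them is resident. */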
Advantages of Direct Mapping
In direct mapping, each memory block is mapped to exactly one block in the cache. The
technique is simple and the mapping scheme is easy to implement; replacement is
straightforward; no search is required to find a block in the cache; and the hardware is simple
and cheap.
Disadvantages of Direct Mapping
Each block of main memory maps to a fixed location in the cache. Therefore, if two different
blocks map to the same cache location and are referenced repeatedly, they will be continually
swapped in and out. Because many addresses can collide on the same block, a cache line may
be repeatedly evicted even though other blocks sit empty or are used less frequently, resulting
in poor cache utilization.
7.2 Associative Cache Mapping
Associative mapping uses an associative memory, which is accessed by its contents. Each line
of cache memory accommodates both a main memory address and the contents of that
address; for this reason such memory is also called content-addressable memory (CAM). It
allows any block of main memory to be stored in any line of the cache. A number of hardware
schemes have been developed for translating main memory addresses to cache memory
addresses. The user does not need to know about the address translation, which has the
advantage that cache memory enhancements can be introduced into a computer without a
corresponding need to modify application software. The choice of cache mapping scheme
affects cost and performance, and no single method is best for all situations.
Advantages of Associative Mapping
Any main memory block can be placed in any cache slot. Regardless of how irregular the data
and program references are, if any slot is available, the block can be stored in the cache.
Disadvantages of Associative Mapping
Considerable hardware overhead is needed for cache bookkeeping: there must be a mechanism
for searching the tag memory in parallel.
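In hardware, the CAM compares all tags simultaneously; software can only model that search as a loop. A minimal C sketch under assumed structures:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 8

typedef struct {
    bool     valid;
    uint64_t tag;        /* full block address: any block can use any line */
} Line;

/* Fully associative lookup: this loop stands in for the parallel tag
   comparison that a real CAM performs in a single hardware step. */
static int find_line(const Line cache[], uint64_t block_addr) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == block_addr)
            return i;    /* hit: the line holding the block */
    return -1;           /* miss: block not cached */
}

int main(void) {
    Line cache[NUM_LINES] = {0};
    cache[5] = (Line){ true, 0xABC };
    printf("block 0xABC -> line %d\n", find_line(cache, 0xABC));  /* 5 */
    printf("block 0x123 -> line %d\n", find_line(cache, 0x123));  /* -1 */
    return 0;
}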
7.3 Set Associative Mapping
This form of mapping is an enhanced form of direct mapping that removes its main drawback:
it addresses the possible thrashing of the direct-mapped method. Instead of having exactly one
line that a block can map to in the cache, a few lines are grouped together into a set, and a
block in memory can then map to any one of the lines of a specific set. Set-associative mapping
thus allows two or more blocks of main memory with the same index address to be present in
the cache at once. It combines the best of the direct and associative cache mapping techniques.
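A minimal C sketch of a lookup in an assumed 2-way set associative cache: the index selects a set exactly as in direct mapping, and only the few ways within that set are then searched associatively.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 64
#define WAYS      2      /* 2-way set associative (assumed geometry) */

typedef struct { bool valid; uint64_t tag; } Way;

static Way cache[NUM_SETS][WAYS];

/* The block address selects one set (the direct-mapped part); the tag is
   then compared against every way of that set (the associative part). */
static bool lookup(uint64_t block_addr) {
    unsigned set = block_addr % NUM_SETS;
    uint64_t tag = block_addr / NUM_SETS;
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;     /* hit in way w */
    return false;            /* miss: either way of this set may be refilled */
}

int main(void) {
    cache[130 % NUM_SETS][0] = (Way){ true, 130 / NUM_SETS };
    printf("block 130: %s\n", lookup(130) ? "hit" : "miss");  /* hit */
    printf("block 66:  %s\n", lookup(66)  ? "hit" : "miss");  /* same set, other tag: miss */
    return 0;
}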
8. Interaction Policies with Main Memory
Reads dominate processor cache accesses: all instruction fetches are reads, and most
instructions do not write to memory. A block can be read at the same time that the tag is read
and compared, so the block read begins as soon as the block address is available. If the read
turns out to be a miss, there is no benefit, but also no harm: the value read is simply ignored.
The read policies are:
Read Through - the word is read from main memory directly to the CPU.
No Read Through - the block is read from main memory into the cache, and the word is then
read from the cache to the CPU.
Such is not the case for writes. Modifying a block cannot begin until the tag is checked to see
whether the address is a hit. Also, the processor specifies the size of the write, usually between
1 and 8 bytes, and only that portion of the block may be changed. Reads, in contrast, can access
more bytes than necessary without any problem.
The write policy on a write hit often distinguishes cache designs:
Write Through - the information is written both to the block in the cache and to the block in
the lower-level memory.
Advantages
- a read miss never results in writes to main memory
- easy to implement
- main memory always has the most current copy of the data (consistent)
Disadvantages
- writes are slower
- every write needs a main memory access
- as a result, it uses more memory bandwidth
Write Back
Write back proceeds as follows:
i. Updates are initially made in the cache only.
ii. The update (dirty) bit for the cache slot is set when an update occurs.
iii. If a block is to be replaced, it is written to main memory only if its update bit is set.
iv. Other caches can get out of sync.
v. I/O must access main memory through the cache.
vi. N.B. typically about 15% of memory references are writes.
Advantages
- writes occur at the speed of the cache memory
- multiple writes within a block require only one write to main memory
- as a result, it uses less memory bandwidth
Disadvantages
- harder to implement
- main memory is not always consistent with the cache
- reads that cause a replacement may trigger writes of dirty blocks to main memory
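The difference between the two write-hit policies comes down to a few lines, as this C sketch shows (structure and function names are assumptions): write through pays a memory access on every write, while write back only sets the update (dirty) bit and pays once, at eviction.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t tag;
    bool     dirty;          /* the "update bit" from the list above */
    uint8_t  data[64];
} Line;

/* Stub standing in for a main-memory write (just counts accesses). */
static long mem_writes;
static void memory_write(const Line *l) { (void)l; mem_writes++; }

/* Write hit, write through: cache and main memory both updated now. */
static void write_hit_through(Line *l, int off, uint8_t b) {
    l->data[off] = b;
    memory_write(l);                 /* every write costs a memory access */
}

/* Write hit, write back: cache only; the memory write is deferred. */
static void write_hit_back(Line *l, int off, uint8_t b) {
    l->data[off] = b;
    l->dirty = true;                 /* remember the block is newer */
}

/* On replacement, a dirty block must be written back before reuse. */
static void evict(Line *l) {
    if (l->dirty) memory_write(l);
    l->dirty = false;
}

int main(void) {
    Line l = {7, false, {0}};
    for (int i = 0; i < 10; i++) write_hit_through(&l, 0, 1);
    printf("write through: %ld memory writes\n", mem_writes);   /* 10 */
    mem_writes = 0;
    for (int i = 0; i < 10; i++) write_hit_back(&l, 0, 1);
    evict(&l);
    printf("write back:    %ld memory writes\n", mem_writes);   /* 1 */
    return 0;
}

Note how the run demonstrates the advantage listed above: multiple writes within a block require only one write to main memory under write back.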
There are two common options on a write miss:
Write Allocate - the block is loaded into the cache on a write miss, followed by the write-hit
action.
No Write Allocate - the block is modified in main memory and not loaded into the cache.
Although either write-miss policy could be used with write through or write back, write-back
caches generally use write allocate (hoping that subsequent writes to the block will be captured
by the cache), while write-through caches often use no write allocate (since subsequent writes
to the block will still have to go to memory).
Write Through with Write Allocate:
- on a hit, it writes to the cache and to main memory;
- on a miss, it updates the block in main memory and brings the block into the cache.
Bringing the block into the cache on a miss does not make much sense in this combination,
because the next hit to the block will generate a write to main memory anyway (per the
write-through policy).
Write Through with No Write Allocate:
- on a hit, it writes to the cache and to main memory;
- on a miss, it updates the block in main memory without bringing the block into the cache.
Subsequent writes to the block will update main memory anyway, because the write-through
policy is employed; so some time is saved by not bringing the block into the cache on a miss,
since doing so appears useless anyway.
Write Back with Write Allocate:
- on a hit, it writes to the cache, setting the dirty bit for the block; main memory is not updated;
- on a miss, it updates the block in main memory and brings the block into the cache.
Subsequent writes to the same block, if it originally caused a miss, will now hit in the cache and
merely set the dirty bit. That eliminates extra memory accesses and results in very efficient
execution compared with the Write Through with Write Allocate combination.
Write Back with No Write Allocate:
- on a hit, it writes to the cache, setting the dirty bit for the block; main memory is not updated;
- on a miss, it updates the block in main memory without bringing the block into the cache.
Subsequent writes to the same block, if it originally caused a miss, will miss every time and
result in very inefficient execution.
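The four combinations can be compared with a toy one-line cache that follows the descriptions above and counts main-memory accesses (a sketch; the workload of ten writes to one block is an assumption chosen to make the contrast visible):

#include <stdio.h>
#include <stdbool.h>

typedef struct {
    int  tag;       /* which block the single cache line holds (-1 = empty) */
    bool dirty;     /* set when the cached copy is newer than memory */
    long mem_ops;   /* main-memory accesses performed so far */
} Cache;

static void cache_write(Cache *c, int block, bool wthrough, bool wallocate) {
    if (c->tag == block) {                    /* write hit */
        if (wthrough) c->mem_ops++;           /* write through: memory too */
        else          c->dirty = true;        /* write back: defer */
    } else {                                  /* write miss */
        c->mem_ops++;                         /* update the block in memory */
        if (wallocate) {                      /* write allocate: bring it in */
            if (!wthrough && c->dirty) c->mem_ops++;   /* evict dirty block */
            c->mem_ops++;                     /* fetch the block */
            c->tag   = block;
            c->dirty = false;
        }
    }
}

int main(void) {
    const char *name[] = {"WT+WA ", "WT+NWA", "WB+WA ", "WB+NWA"};
    bool wt[] = {true, true, false, false};
    bool wa[] = {true, false, true, false};
    for (int p = 0; p < 4; p++) {
        Cache c = {-1, false, 0};
        for (int i = 0; i < 10; i++)          /* ten writes to block 7 */
            cache_write(&c, 7, wt[p], wa[p]);
        printf("%s: %2ld main-memory accesses\n", name[p], c.mem_ops);
    }
    return 0;   /* prints 11, 10, 2, 10: write back + write allocate wins */
}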
On a read hit there is little choice of policy: the block is simply read from the cache to the CPU.
9. Conclusion
Cache memory, then, is a high-speed random access memory used by a system's central
processing unit to store data and instructions temporarily. It decreases execution time by
keeping the most frequently used and most probably needed data and instructions "closer" to
the processor, where the CPU can get at them quickly.